Re: -Os versus -O2

2007-06-25 Thread Segher Boessenkool
In my experience, -Os produced faster code on gcc-2.95 than -O2 or 
-O3.


On what CPU?  The effect of different optimisations varies
hugely between different CPUs (and architectures).


x86


That's not a CPU, that's an architecture.  I hope you
understand there are very big differences between different
members of the x86 family and you don't compare 2.95 on a
Pentium class CPU to 3.x on an Opteron or 4.x on a Pentium4
or something like that.


With gcc-3.3, -Os showed roughly the same performance as -O2 for me on
various programs. However, with gcc-3.4, I noticed a slowdown with
-Os. And with gcc-4, using -Os optimizes only for size, even if the
output code is slow as hell. I've had programs whose speed dropped
by 70% using -Os on gcc-4.


Well you better report those!  http://gcc.gnu.org/bugzilla


No, -Os is for size only:

   -Os Optimize for size.  -Os enables all -O2 optimizations
   that do not typically increase code size.  It also
   performs further optimizations designed to reduce code
   size.


That is not "for size only".  Please read again.

A 70% speed decrease is something that should be at least
investigated, even if then perhaps it is decided GCC already
does the "right thing".


So it is expected that speed can be reduced using -Os. I won't report
a thing which is already documented!


A few percentage points slower is expected, and 20% would be
explainable, but 70%?

-O2 and -Os are supposed to differ in _minor_ ways.  Such
a huge performance drop is unexpected.  If you file the PR,
feel free to blame me for reporting it at all.


But in some situations, it's desirable to have the smallest possible
kernel whatever its performance. This goes for installation CDs, for
instance.


There are much better ways to achieve that.


Optimizing is not a matter of choosing *one* way, but of combining
everything you have.


Yes of course.  I'm just saying -Os is a pretty minor step
in the overall making-things-smaller game.  Leaving out XFS
helps a whole megabyte on my default target, for example.


For instance, on a smart boot loader, I have
a kernel which is about 300 kB, or 700 kB with the initramfs. Among
the tricks I used:
  - -Os
  - -march=i386
  - align everything to 0
  - replace gzip with p7zip

Even if each of them reduces overall size by only 5%, the net result is
0.95^4 = 0.81, i.e. a 19% gain, for the same set of features. This is
something to consider.


Sure.  I don't think making -Os mean "as small as possible
in all cases" (or, rather, introducing a new option for that)
would help terribly much over the current -Os meaning -- a
few percent at most.  That's not to say that no such optimisations
are added anymore, but mostly they turn out not to decrease
speed at all and so are enabled at any -O level :-)


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: -Os versus -O2

2007-06-25 Thread Willy Tarreau
On Mon, Jun 25, 2007 at 09:08:23AM +0200, Segher Boessenkool wrote:
> >In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3.
> 
> On what CPU?  The effect of different optimisations varies
> hugely between different CPUs (and architectures).

x86

> >It was not only because of cache considerations, but because gcc used
> >different tricks to avoid poor optimizations, and in the end, the CPU
> >ended up executing the alternative code faster.
> 
> -Os is "as fast as you can without bloating the code size",
> so that is the expected result for CPUs that don't need
> special hand-holding around certain performance pitfalls.
> 
> >With gcc-3.3, -Os showed roughly the same performance as -O2 for me on
> >various programs. However, with gcc-3.4, I noticed a slowdown with
> >-Os. And with gcc-4, using -Os optimizes only for size, even if the
> >output code is slow as hell. I've had programs whose speed dropped
> >by 70% using -Os on gcc-4.
> 
> Well you better report those!  http://gcc.gnu.org/bugzilla

No, -Os is for size only:

   -Os Optimize for size.  -Os enables all -O2 optimizations
   that do not typically increase code size.  It also
   performs further optimizations designed to reduce code
   size.

So it is expected that speed can be reduced using -Os. I won't report
a thing which is already documented!

> >But in some situations, it's desirable to have the smallest possible
> >kernel whatever its performance. This goes for installation CDs, for
> >instance.
> 
> There are much better ways to achieve that.

Optimizing is not a matter of choosing *one* way, but of combining
everything you have. For instance, on a smart boot loader, I have
a kernel which is about 300 kB, or 700 kB with the initramfs. Among
the tricks I used:
  - -Os
  - -march=i386
  - align everything to 0
  - replace gzip with p7zip

Even if each of them reduces overall size by only 5%, the net result is
0.95^4 = 0.81, i.e. a 19% gain, for the same set of features. This is
something to consider.

Regards,
Willy



Re: -Os versus -O2

2007-06-25 Thread Segher Boessenkool

-Os is "as fast as you can without bloating the code size",
so that is the expected result for CPUs that don't need
special hand-holding around certain performance pitfalls.


this sounds like you are saying that people wanting performance should 
pick -Os.


That is true on most CPUs.  Some CPUs really really need
some of the things that -Os disables (compared to -O2) for
decent performance though (branch target alignment...)

what should people pick who care more about code size than anything 
else? (examples being embedded development where you may be willing to 
sacrifice speed to avoid having to add additional chips to the design)


-Os and tune some options.  There has been extensive work done
over the last few years to make GCC more suitable for embedded
targets, btw.  But -O1/-O2/-O3/-Os gives you only four choices;
it's really not so hard to understand, I hope, that for more
specific goals you need to add more specific options?


Segher



Re: -Os versus -O2

2007-06-25 Thread Segher Boessenkool

Also note that whether or not it is profitable to unroll
a particular loop depends largely on how "hot" that loop
is, and GCC doesn't know much about that if you don't feed
it profiling information (it can guess a bit, sure, but it
can guess wrong too).


actually, what you are saying is that the compiler can't know enough 
to figure out how to optimize for speed. it will just do what you tell 
it to, either unroll loops or not.


It bases its optimisation decisions on the options you give
it, the profile feedback information you did or did not give
it, and a whole bunch of heuristics.

this argues that both -O2 and -Os are incorrect for a project to use and 
instead the project needs to make its own decisions on this.


For optimal performance, you need to fine-tune options, yes,
per file (or even per function!)

if this is the true feeling of the gcc team I'm very disappointed, it 
feels like a huge step backwards.


I speak only for myself.  However, this is the only way it _can_
be; the compiler isn't clairvoyant.  Some of the heuristics sure
could use some tuning, but they stay heuristics.


Segher



Re: -Os versus -O2

2007-06-25 Thread david

On Mon, 25 Jun 2007, Segher Boessenkool wrote:


 In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3.


On what CPU?  The effect of different optimisations varies
hugely between different CPUs (and architectures).


 It was not only because of cache considerations, but because gcc used
 different tricks to avoid poor optimizations, and in the end, the CPU
 ended up executing the alternative code faster.


-Os is "as fast as you can without bloating the code size",
so that is the expected result for CPUs that don't need
special hand-holding around certain performance pitfalls.


this sounds like you are saying that people wanting performance should 
pick -Os.


what should people pick who care more about code size than anything else? 
(examples being embedded development where you may be willing to sacrifice 
speed to avoid having to add additional chips to the design)


David Lang


Re: -Os versus -O2

2007-06-25 Thread david

On Mon, 25 Jun 2007, Segher Boessenkool wrote:


 then do we need a new option 'optimize for best overall performance' that
 goes for size (and the corresponding wins there) most of the time, but is
 ignored where it makes a huge difference?


That's -Os mostly.  Some awful CPUs really need higher
loop/label/function alignment though to get any
performance; you could add -falign-xxx options for those.


 in reality this was a flaw in gcc that on modern CPUs, with the larger
 difference between CPU speed and memory speed, it still preferred to unroll
 loops (eating more memory and blowing out the CPU cache) when it shouldn't
 have.


You told it to unroll loops, so it did.  No flaw.  If you
feel the optimisations enabled by -O2 should depend on the
CPU tuning selected, please file a PR.

Also note that whether or not it is profitable to unroll
a particular loop depends largely on how "hot" that loop
is, and GCC doesn't know much about that if you don't feed
it profiling information (it can guess a bit, sure, but it
can guess wrong too).


actually, what you are saying is that the compiler can't know enough to 
figure out how to optimize for speed. it will just do what you tell it to, 
either unroll loops or not.


this argues that both -O2 and -Os are incorrect for a project to use and 
instead the project needs to make its own decisions on this.


if this is the true feeling of the gcc team I'm very disappointed, it 
feels like a huge step backwards.


David Lang


Re: -Os versus -O2

2007-06-25 Thread Segher Boessenkool

In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3.


On what CPU?  The effect of different optimisations varies
hugely between different CPUs (and architectures).


It was not only because of cache considerations, but because gcc used
different tricks to avoid poor optimizations, and in the end, the CPU
ended up executing the alternative code faster.


-Os is "as fast as you can without bloating the code size",
so that is the expected result for CPUs that don't need
special hand-holding around certain performance pitfalls.


With gcc-3.3, -Os showed roughly the same performance as -O2 for me on
various programs. However, with gcc-3.4, I noticed a slowdown with
-Os. And with gcc-4, using -Os optimizes only for size, even if the
output code is slow as hell. I've had programs whose speed dropped
by 70% using -Os on gcc-4.


Well you better report those!  http://gcc.gnu.org/bugzilla


But in some situations, it's desirable to have the smallest possible
kernel whatever its performance. This goes for installation CDs, for
instance.


There are much better ways to achieve that.


Segher



Re: -Os versus -O2

2007-06-25 Thread Segher Boessenkool
then do we need a new option 'optimize for best overall performance' 
that goes for size (and the corresponding wins there) most of the 
time, but is ignored where it makes a huge difference?


That's -Os mostly.  Some awful CPUs really need higher
loop/label/function alignment though to get any
performance; you could add -falign-xxx options for those.

in reality this was a flaw in gcc that on modern CPUs, with the larger 
difference between CPU speed and memory speed, it still preferred to 
unroll loops (eating more memory and blowing out the CPU cache) when 
it shouldn't have.


You told it to unroll loops, so it did.  No flaw.  If you
feel the optimisations enabled by -O2 should depend on the
CPU tuning selected, please file a PR.

Also note that whether or not it is profitable to unroll
a particular loop depends largely on how "hot" that loop
is, and GCC doesn't know much about that if you don't feed
it profiling information (it can guess a bit, sure, but it
can guess wrong too).


Segher




Re: -Os versus -O2

2007-06-24 Thread Willy Tarreau
On Sun, Jun 24, 2007 at 06:33:15PM -0700, [EMAIL PROTECTED] wrote:
> On Sun, 24 Jun 2007, Arjan van de Ven wrote:
> 
> >On Sun, 2007-06-24 at 18:08 -0700, [EMAIL PROTECTED] wrote:
> >>>
> >>>on a system level, size can help performance because you have more
> >>>memory available for other things.  It also reduces download size and
> >>>gives you more space on the live CD
> >>>
> >>>if you want to make things bigger again, please do this OUTSIDE the
> >>>"optimize for size" option. Because that TELLS you to go for size.
> >>
> >>then do we need a new option 'optimize for best overall performance' that
> >>goes for size (and the corresponding wins there) most of the time, but is
> >>ignored where it makes a huge difference?
> >
> >that isn't so easy. Anything which doesn't have a performance tradeoff
> >is in -O2 already. So every single thing in -Os costs you performance on
> >a micro level.
> 
> this has not been true in the past (assuming that it's true today)
> 
> ok, if you look at a micro-enough level this may be true, but completely 
> ignoring things like download times, the optimizations almost always boil 
> down to trying to avoid jumps, loops, and decision logic at the expense of 
> space.
> 
> however recent CPUs are significantly better at handling jumps and loops, 
> and the cost of cache misses is significantly worse.
> 
> is the list of what's included in -O2 vs -Os different for different 
> CPUs? what about within a single family of processors? (even in the x86 
> family the costs of jumps, loops, and cache misses varies drastically)
> 
> my understanding was that the optimizations for O2 were pretty fixed.
> 
> >The translation to macro level depends greatly on how things are used
> >(you even have to factor in download times etc)... so that is a fair
> >question to leave up to the user... which is what there is today.
> 
> ignore things like download time for the moment. it's not significant to 
> most people as they don't download things that often, and when they do 
> they are almost always downloading lots of stuff they don't need (drivers 
> for example)
> 
> users are trying to get better performance 90+% of the time when they 
> select -Os. That's why it got moved out of CONFIG_EMBEDDED.

In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3.
It was not only because of cache considerations, but because gcc used
different tricks to avoid poor optimizations, and in the end, the CPU
ended up executing the alternative code faster.

With gcc-3.3, -Os showed roughly the same performance as -O2 for me on
various programs. However, with gcc-3.4, I noticed a slowdown with
-Os. And with gcc-4, using -Os optimizes only for size, even if the
output code is slow as hell. I've had programs whose speed dropped
by 70% using -Os on gcc-4. But their size was smaller than with older
versions.

But in some situations, it's desirable to have the smallest possible
kernel whatever its performance. This goes for installation CDs, for
instance.

Willy



Re: -Os versus -O2

2007-06-24 Thread david

On Mon, 25 Jun 2007, Adrian Bunk wrote:


On Sun, Jun 24, 2007 at 09:34:05PM -0400, Jeff Garzik wrote:

Adrian Bunk wrote:

The interesting questions are:
Does -Os still sometimes generate faster code with gcc 4.2?
If yes, why?


Smaller code can mean fewer page faults, fewer cache invalidations, etc.

It's not just a matter of compiler code generation, gotta look at the whole
picture.


the picture gets even murkier when you consider that even if neither 
option overflows the CPU cache, the one that takes more space in the cache 
leaves less space for the userspace code that the system is 
actually there to run.



Sure, but my point is that if the kernel is considered special and the
best optimization for the kernel is therefore between -Os and -O2, we
should try to find this point of best optimization.

This should address Arjan's point that -Os might not be the best choice for
best performance (and it's actually our fault if gcc generates stupid
but small code when we use -Os).


what can be done to find the horribly bad but small code among the "it's 
smaller and would be less efficient if you didn't consider the cache" 
majority?


David Lang


Re: -Os versus -O2

2007-06-24 Thread Adrian Bunk
On Sun, Jun 24, 2007 at 09:34:05PM -0400, Jeff Garzik wrote:
> Adrian Bunk wrote:
>> The interesting questions are:
>> Does -Os still sometimes generate faster code with gcc 4.2?
>> If yes, why?
>
> Smaller code can mean fewer page faults, fewer cache invalidations, etc.
>
> It's not just a matter of compiler code generation, gotta look at the whole 
> picture.

Sure, but my point is that if the kernel is considered special and the 
best optimization for the kernel is therefore between -Os and -O2, we 
should try to find this point of best optimization.

This should address Arjan's point that -Os might not be the best choice for 
best performance (and it's actually our fault if gcc generates stupid 
but small code when we use -Os).

>   Jeff

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: -Os versus -O2

2007-06-24 Thread Rene Herman

On 06/25/2007 03:33 AM, [EMAIL PROTECTED] wrote:

is the list of what's included in -O2 vs -Os different for different 
CPUs? what about within a single family of processors? (even in the x86 
family the costs of jumps, loops, and cache misses vary drastically)


At least not in the example Duron/Athlon case. Both are -march=athlon{,-4}, but 
64K versus 256K L2, which I'd expect to be an important difference in the -Os 
versus -O2 behaviour.


Rene.


Re: -Os versus -O2

2007-06-24 Thread Rene Herman

On 06/25/2007 03:23 AM, Rene Herman wrote:


On 06/25/2007 02:41 AM, Adrian Bunk wrote:


The interesting questions are:
Does -Os still sometimes generate faster code with gcc 4.2?
If yes, why?


I would wager that the CPU type makes more of a difference than the 
compiler version. That is, I'd expect my Duron with it's "puny" 64K L1 
to have a very different profile than it's Athlon brother with 256K L1.


Sorry, that should've been L2. And "its" ...

 > Not to mention CPUs with as little as 8K L1 (P1).


I can't quote numbers -- it's a bit hard to test those things anyway as 
it's a system-global effect and not su much that's easily isolated in a 
dedicated benchmark.


And while I'm at it, "and not so much one that's [ ...]".

Rene.


Re: -Os versus -O2

2007-06-24 Thread Jeff Garzik

Adrian Bunk wrote:

The interesting questions are:
Does -Os still sometimes generate faster code with gcc 4.2?
If yes, why?


Smaller code can mean fewer page faults, fewer cache invalidations, etc.

It's not just a matter of compiler code generation, gotta look at the 
whole picture.


Jeff




Re: -Os versus -O2

2007-06-24 Thread david

On Sun, 24 Jun 2007, Arjan van de Ven wrote:


On Sun, 2007-06-24 at 18:08 -0700, [EMAIL PROTECTED] wrote:


on a system level, size can help performance because you have more
memory available for other things.  It also reduces download size and
gives you more space on the live CD

if you want to make things bigger again, please do this OUTSIDE the
"optimize for size" option. Because that TELLS you to go for size.


then do we need a new option 'optimize for best overall performance' that
goes for size (and the corresponding wins there) most of the time, but is
ignored where it makes a huge difference?


that isn't so easy. Anything which doesn't have a performance tradeoff
is in -O2 already. So every single thing in -Os costs you performance on
a micro level.


this has not been true in the past (assuming that it's true today)

ok, if you look at a micro-enough level this may be true, but completely 
ignoring things like download times, the optimizations almost always boil 
down to trying to avoid jumps, loops, and decision logic at the expense of 
space.


however recent CPUs are significantly better at handling jumps and loops, 
and the cost of cache misses is significantly worse.


is the list of what's included in -O2 vs -Os different for different 
CPUs? what about within a single family of processors? (even in the x86 
family the costs of jumps, loops, and cache misses vary drastically)


my understanding was that the optimizations for -O2 were pretty fixed.


The translation to macro level depends greatly on how things are used
(you even have to factor in download times etc)... so that is a fair
question to leave up to the user... which is what there is today.


ignore things like download time for the moment. it's not significant to 
most people as they don't download things that often, and when they do 
they are almost always downloading lots of stuff they don't need (drivers 
for example)


users are trying to get better performance 90+% of the time when they 
select -Os. That's why it got moved out of CONFIG_EMBEDDED.


David Lang



Re: -Os versus -O2

2007-06-24 Thread Adrian Bunk
On Sun, Jun 24, 2007 at 05:58:46PM -0700, Arjan van de Ven wrote:
> 
> > I wouldn't care if CONFIG_CC_OPTIMIZE_FOR_SIZE was hidden behind
> > CONFIG_EMBEDDED, but as long as it's available as a general purpose
> > option we have to consider its performance.
> 
> I think you are missing the point. You tell the kernel to
> OPTIMIZE_FOR_SIZE. *over performance*. Sure. Performance shouldn't be
> EXTREMELY pathetic, but it's not; and if it were, it's a problem with
> the gcc version you have (and if you are a distro, you can surely fix
> that)

My point is commit c45b4f1f1e149c023762ac4be166ead1818cefef

CC_OPTIMIZE_FOR_SIZE is currently known as an experimental feature to 
improve the _performance_.

> > The interesting questions are:
> > Does -Os still sometimes generate faster code with gcc 4.2?
> > If yes, why?
> 
> on a system level, size can help performance because you have more
> memory available for other things.

For a given gcc version, there's a finite number of differences between 
-Os and -O2. 

The interesting question is for which differences with gcc 4.2 we want 
the -Os version in the kernel for best performance. This should then be 
controllable through gcc options.

> It also reduces download size and 
> gives you more space on the live CD

That's a different point.

If you don't care about performance but care about size then -Os is 
the best choice.

> if you want to make things bigger again, please do this OUTSIDE the
> "optimize for size" option. Because that TELLS you to go for size.

Agreed, but CONFIG_CC_OPTIMIZE_FOR_SIZE should again be under 
CONFIG_EMBEDDED.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: -Os versus -O2

2007-06-24 Thread Rene Herman

On 06/25/2007 02:41 AM, Adrian Bunk wrote:


The interesting questions are:
Does -Os still sometimes generate faster code with gcc 4.2?
If yes, why?


I would wager that the CPU type makes more of a difference than the compiler 
version. That is, I'd expect my Duron with it's "puny" 64K L1 to have a very 
different profile than it's Athlon brother with 256K L1. Not to mention CPUs 
with as little as 8K L1 (P1).


I can't quote numbers -- it's a bit hard to test those things anyway as it's 
a system-global effect and not su much that's easily isolated in a dedicated 
benchmark.


Rene.


Re: -Os versus -O2

2007-06-24 Thread Arjan van de Ven
On Sun, 2007-06-24 at 18:08 -0700, [EMAIL PROTECTED] wrote:
> >
> > on a system level, size can help performance because you have more
> > memory available for other things.  It also reduces download size and
> > gives you more space on the live CD
> >
> > if you want to make things bigger again, please do this OUTSIDE the
> > "optimize for size" option. Because that TELLS you to go for size.
> 
> then do we need a new option 'optimize for best overall performance' that 
> goes for size (and the corresponding wins there) most of the time, but is 
> ignored where it makes a huge difference?

that isn't so easy. Anything which doesn't have a performance tradeoff
is in -O2 already. So every single thing in -Os costs you performance on
a micro level.

The translation to macro level depends greatly on how things are used
(you even have to factor in download times etc)... so that is a fair
question to leave up to the user... which is what there is today.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: -Os versus -O2

2007-06-24 Thread david

On Sun, 24 Jun 2007, Arjan van de Ven wrote:


I wouldn't care if CONFIG_CC_OPTIMIZE_FOR_SIZE was hidden behind
CONFIG_EMBEDDED, but as long as it's available as a general purpose
option we have to consider its performance.


I think you are missing the point. You tell the kernel to
OPTIMIZE_FOR_SIZE. *over performance*. Sure. Performance shouldn't be
EXTREMELY pathetic, but it's not; and if it were, it's a problem with
the gcc version you have (and if you are a distro, you can surely fix
that)



The interesting questions are:
Does -Os still sometimes generate faster code with gcc 4.2?
If yes, why?


on a system level, size can help performance because you have more
memory available for other things.  It also reduces download size and
gives you more space on the live CD

if you want to make things bigger again, please do this OUTSIDE the
"optimize for size" option. Because that TELLS you to go for size.


then do we need a new option 'optimize for best overall performance' that 
goes for size (and the corresponding wins there) most of the time, but is 
ignored where it makes a huge difference?


I started using -Os several years ago, even when it was hidden in the 
embedded menu because in many cases the smaller binary ended up being 
faster.


in reality this was a flaw in gcc: on modern CPUs, with the larger 
difference between CPU speed and memory speed, it still preferred to unroll 
loops (eating more memory and blowing out the cpu cache) when it shouldn't 
have.


if that has been fixed in later versions of gcc this would be a good 
thing. if it hasn't (possibly in part due to gcc optimizations being 
designed to be cross-platform) then either the current 'go for size' or a 
hybrid 'performance' option is needed.


David Lang


Re: -Os versus -O2

2007-06-24 Thread Arjan van de Ven

> I wouldn't care if CONFIG_CC_OPTIMIZE_FOR_SIZE was hidden behind
> CONFIG_EMBEDDED, but as long as it's available as a general purpose
> option we have to consider its performance.

I think you are missing the point. You tell the kernel to
OPTIMIZE_FOR_SIZE. *over performance*. Sure. Performance shouldn't be
EXTREMELY pathetic, but it's not; and if it were, it's a problem with
the gcc version you have (and if you are a distro, you can surely fix
that)

> 
> The interesting questions are:
> Does -Os still sometimes generate faster code with gcc 4.2?
> If yes, why?

on a system level, size can help performance because you have more
memory available for other things.  It also reduces download size and
gives you more space on the live CD

if you want to make things bigger again, please do this OUTSIDE the
"optimize for size" option. Because that TELLS you to go for size.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org


