Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-23 Thread Boyd Stephen Smith Jr.
On Saturday 23 September 2006 17:21, Peter Humphrey <[EMAIL PROTECTED]> 
wrote about 'Re: [gentoo-amd64]  Re: gcc 4.1 upgrade - bad desktop 
interactivity anyone?':
> On Saturday 23 September 2006 19:52, Duncan wrote:
> > However, the only difference (CFLAGS wise) that I'm aware of for the
> > AMD dual-cores is that they now incorporate SSE3, while my old 242s
> > and I presume your 246s don't.
>
> Nope. SSE and SSE2, but not SSE3. According to /proc/cpuinfo, that is.

I can verify that the 275s do support SSE3 (flag: pni) from 
my /proc/cpuinfo:
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 33
model name  : Dual Core AMD Opteron(tm) Processor 275
stepping: 2
cpu MHz : 2200.000
fpu : yes
fpu_exception   : yes
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 
3dnowext 3dnow pni lahf_lm cmp_legacy
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp


-- 
"If there's one thing we've established over the years,
it's that the vast majority of our users don't have the slightest
clue what's best for them in terms of package stability."
-- Gentoo Developer Ciaran McCreesh




Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-23 Thread Jason Booth
On Saturday 23 September 2006 16:21, Peter Humphrey wrote:
> Nope. SSE and SSE2, but not SSE3. According to /proc/cpuinfo, that is.

The flag in cpuinfo is pni for "Prescott New Instructions".
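
For anyone checking their own box, here is a minimal sketch of that
lookup. The name has_cpu_flag is made up for illustration, and the sketch
only assumes the usual "flags" line format of /proc/cpuinfo.

```shell
#!/bin/sh
# has_cpu_flag FLAG [FILE]
# Succeeds if FLAG appears as a whole word on a "flags" line of FILE
# (default: /proc/cpuinfo). has_cpu_flag is a hypothetical helper name.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Put each word of the flags line(s) on its own line, then look for
    # an exact whole-line match so that "sse" does not match "sse2".
    grep '^flags' "$file" | tr -s ' \t' '\n\n' | grep -qxF -- "$flag"
}

# Example: SSE3 is reported as "pni", not "sse3".
# if has_cpu_flag pni; then echo "SSE3 available"; else echo "no SSE3"; fi
```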

Cheers,

Jason
-- 
gentoo-amd64@gentoo.org mailing list



Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-23 Thread Peter Humphrey
On Saturday 23 September 2006 19:52, Duncan wrote:

> However, the only difference (CFLAGS wise) that I'm aware of for the AMD
> dual-cores is that they now incorporate SSE3, while my old 242s and I
> presume your 246s don't.

Nope. SSE and SSE2, but not SSE3. According to /proc/cpuinfo, that is.

-- 
Rgds
Peter



[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-23 Thread Duncan
Peter Humphrey <[EMAIL PROTECTED]> posted
[EMAIL PROTECTED], excerpted below, on  Sat, 23 Sep
2006 14:39:11 +:

> Which model of Opteron are your CPUs? I have a feeling they differ from my
> 246s, and I've been wondering how I ought to tune your helpfully explained
> flags to suit my box.

I'm running 242s at present, so they should be fairly similar.

I plan on upgrading to dual-cores later this year or early next, when the
prices seem to be down to a reasonable level as the new socket format
takes over, and will run that for another couple years before I even think
of upgrading mobo/cpu/memory again, at which point I'll have been running
the same base mobo and platform for over five years(!!), and expect to
upgrade to a single socket 8-core model as mid-grade. (Of course by then
AMD's multi-socket co-processor model or a variation thereof may have
taken the market by storm, and I might as a result be buying a two or more
socket mobo with one for CPU and the other for GPU, or some such.)

However, the only difference (CFLAGS wise) that I'm aware of for the AMD
dual-cores is that they now incorporate SSE3, while my old 242s and I
presume your 246s don't.  The other changes I'll be making at the upgrade
will be in terms of kernel config.  Naturally, with dual Opterons, I'm
already running SMP, but I have it set for two max, and with the
dual-cores, that will of course change to four.  Additionally, there's
only one level of CPU/core zoning ATM, while there will be two levels
then, as the pair of cores on the same CPU will cooperate even more closely
than the two in separate sockets, connected by the HyperTransport bus, do.
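
To see what the CPU limit in the kernel config has to cover, one can
count logical CPUs and sockets from /proc/cpuinfo. A rough sketch,
assuming the kernel reports "processor" and "physical id" lines (some
kernels or single-core chips may omit the latter); cpu_counts is a name
invented here.

```shell
#!/bin/sh
# cpu_counts [FILE]
# Prints the number of logical CPUs and of distinct physical packages
# (sockets) listed in FILE (default: /proc/cpuinfo).
# cpu_counts is a hypothetical helper name.
cpu_counts() {
    awk -F': *' '
        /^processor/   { cpus++ }          # one entry per logical CPU
        /^physical id/ { seen[$2] = 1 }    # remember each socket id once
        END {
            for (id in seen) sockets++
            printf "logical=%d sockets=%d\n", cpus, sockets
        }' "${1:-/proc/cpuinfo}"
}

# Example: two dual-core Opterons should report logical=4 sockets=2.
# cpu_counts
```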

The big difference in CFLAGS at this point is between Intel and AMD
products, and since we are both running AMD, that's not an issue.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-23 Thread Peter Humphrey
On Thursday 14 September 2006 20:08, Duncan wrote:

> Here's my CFLAGS/CXXFLAGS:

...etc.

Which model of Opteron are your CPUs? I have a feeling they differ from my 
246s, and I've been wondering how I ought to tune your helpfully explained 
flags to suit my box.

-- 
Rgds
Peter



[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-15 Thread Duncan
"Mark Knecht" <[EMAIL PROTECTED]> posted
[EMAIL PROTECTED], excerpted
below, on  Fri, 15 Sep 2006 11:06:47 -0700:

> On 9/14/06, Mark Knecht <[EMAIL PROTECTED]> wrote:
>> Hi,
>>I'm just curious whether anyone besides me is noticing their
>> machine feeling somewhat sluggish since doing the gcc-4.1 upgrade?
> 
> I noticed this morning that MythTV's frontend program is often using
> >90% CPU when viewed in top.
> 
> It never used more than 10% before the upgrade to gcc-4.
> 
> Clearly this is at least part of the problem here.

Indeed, that would explain your observations.  Perhaps either the
front-end or some library it loads is one of the few programs that just
doesn't work quite right with gcc-4.1 yet.  Good detective work!

So it would appear you should try recompiling it with gcc-3.x again, and
see if that eliminates the problem.  If not, you'll have to check its
dependency tree and try recompiling the libraries it depends on as well.
Get that 90% off the CPU and maybe you'll see the better general
efficiency of gcc-4.1, regardless of
whether you try my cflags or not.  In fact, that's what I'd recommend you
do, before trying my cflags.  You'd then have a better base on which to
measure whether my cflags made a difference for you or not, as opposed to
what gcc-4.1.x itself did.
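
The test suggested above might look roughly like this on Gentoo.
gcc-config and emerge --oneshot are the standard tools; the profile name
and the helper gcc_rebuild_plan are placeholders for illustration, and the
helper only prints the commands so they can be reviewed before being run
as root.

```shell
#!/bin/sh
# gcc_rebuild_plan PROFILE PACKAGE...
# Prints (does not run) the commands to rebuild PACKAGE(s) with another
# installed gcc. Hypothetical helper; list real profiles with
# `gcc-config -l`.
gcc_rebuild_plan() {
    profile=$1
    shift
    printf 'gcc-config %s\n' "$profile"   # switch the active compiler
    printf '. /etc/profile\n'             # refresh the running shell
    printf 'emerge --oneshot %s\n' "$*"   # rebuild, leave world file alone
    printf 'gcc-config -l\n'              # afterwards: select gcc-4.1 again
}

# Example (the profile name is an assumption; check `gcc-config -l`):
# gcc_rebuild_plan x86_64-pc-linux-gnu-3.4.6 media-tv/mythtv
```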

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-15 Thread Duncan
"Mark Knecht" <[EMAIL PROTECTED]> posted
[EMAIL PROTECTED], excerpted
below, on  Thu, 14 Sep 2006 17:43:19 -0700:

>Now, you are very adept at this. Your explanations make sense to
> the level I've considered them. (Not very far right now...) Main
> questions:

Adept, perhaps, but don't take my observations as being from God or
anything! =8^)  I try to be fairly cautious with my CFLAGS, but if
anything quits working, I know how to undo them and try with a more
generic set, and in fact do so from time to time on individual packages,
before filing bugs on them.  Sometimes it's my CFLAGS, tho usually my
config doesn't matter a whit to the bug, as I've been reasonably cautious
in my choices to begin with and don't tend to enable stuff like the unsafe
floating-point math options that give folks problems from time to time.

In particular, as you can see from the -ftree-vectorize subthread, I tend
to stay with the defaults when I can't explain with some degree of
confidence exactly what the effect of a flag might be and why I might or
might not want it.  I don't know enough about that area to do that, so
I've stayed well away from it in my CFLAGS.

> 1) What can be done to test this out at my end without making a 2-day
> commitment to rebuild the complete machine? Is it possible to rebuild
> only portions of the machine using a different set of flags or is it a
> system wide commitment requiring that I rebuild 575 packages as I did
> last weekend?

In general, you /can/ rebuild only a part of your system and test that,
before making further changes.  However, it's important to use a bit of
(un?)common sense when doing so, or your results won't be worth much. 
Basically, in order to see how an optimization affects something, you
must have some awareness of the shared libraries it uses and to what
extent it uses them, recompiling enough of the heavily used dependencies
that the critical parts of your test applications (including the libraries
they load) are using the new optimizations.
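
One way to get that awareness is to ask the dynamic linker which
libraries a binary pulls in. A small sketch: libs_from_ldd is a made-up
helper that reads ldd output, and equery belongs (from gentoolkit, assumed
to be installed) maps the files back to packages.

```shell
#!/bin/sh
# libs_from_ldd: read `ldd` output on stdin and print the resolved
# library paths, one per line. Hypothetical helper for illustration.
libs_from_ldd() {
    # Lines of interest look like:
    #   libX11.so.6 => /usr/lib64/libX11.so.6 (0x...)
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Example (the binary name is an assumption):
# ldd "$(command -v mythfrontend)" | libs_from_ldd | xargs equery belongs
```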

One lib that all applications make some use of is glibc, so it can be worth
recompiling.  It's a big recompile on its own, but of course nowhere near
as big as recompiling the entire system. =8^)  However, glibc is a special
case in some aspects for a number of reasons. The glibc ebuild is pretty
conservative with the flags it allows, and actually replaces -Os with -O2,
due to problems -Os had mainly on x86, back in the gcc-3.2 and 3.3 era. 
Since the system is pretty horribly broken if glibc breaks, to the point
you are likely to have to boot to a backup or liveCD to fix it, this isn't
an unreasonable policy at all.

Nonetheless, after making doubly sure I had tested-working backups, I
decided to see just what the effect of taking out that -Os -> -O2 replace
in the glibc ebuild might be.  For a while I actually ran a glibc I had
built after having removed that replace.  The system continued to work
just fine with a -Os compiled glibc, it didn't break or anything, but it
didn't seem to be much better either and in some cases seemed worse.  It
turns out that glibc is built in a much more modular fashion than many
libraries, so an app will only load the parts of it it needs, not the
parts it doesn't, and that -Os doesn't work so well with this rather
extreme (compared to most libs) modularization.  As well, as I said, glibc
is used by everything on the system, which meant that having bypassed one
of the safeties in the glibc ebuild, I could never be sure whether a bug I
was experiencing was due to my strange glibc, or to some problem with the
package the bug was showing up in or one of its other dependencies.  I
concluded that it simply wasn't worth bypassing the safeties in the
ebuild, and since then, have left them there.

Thus, with glibc anyway, simply switching to -Os in your CFLAGS won't make
any difference, since the ebuild replaces that with -O2 anyway.  The
/other/ CFLAGS might make a difference, but -Os itself won't, unless you
bypass the replace in the ebuild, and as my experimentation demonstrated
well enough for me, that's really not worth the trouble.  As I said, the
other CFLAGS may make a bit of difference tho, so you might consider it
anyway, if you decide to try them.
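
The effect of that replacement can be pictured with a stand-in for what
the ebuild does to the flag list. This is not the ebuild's code (Gentoo
ebuilds normally use the replace-flags helper from flag-o-matic.eclass for
this); replace_flag below is a simplified illustration only.

```shell
#!/bin/sh
# replace_flag OLD NEW FLAG...
# Prints the FLAG list with every exact occurrence of OLD swapped for NEW.
# Simplified, hypothetical stand-in for an ebuild-side flag replacement.
replace_flag() {
    old=$1
    new=$2
    shift 2
    out=
    for f in "$@"; do
        if [ "$f" = "$old" ]; then
            f=$new
        fi
        out="${out:+$out }$f"    # append, separated by single spaces
    done
    printf '%s\n' "$out"
}

# Example: what glibc would effectively be built with under -Os CFLAGS.
# replace_flag -Os -O2 -march=k8 -Os -pipe
```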

For X users, another library that's going to be commonly used is libX11. 
You'll probably want to recompile xorg-server (assuming modular-X) as
well, plus whatever xf86-video-* driver you use, and libXcomposite if you
use the composite extension (transparent windows and the like).  Together,
those will be pretty critical for performance of any X app.  For OpenGL
accelerated apps, mesa is likely to be critical to performance as well,
for any functions not handled by hardware.
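
A trial rebuild of just those pieces could be driven from the command
line. This sketch assumes that CFLAGS/CXXFLAGS set in the environment
take precedence over /etc/make.conf for that one emerge run;
trial_emerge_cmd is a hypothetical helper that only prints the command,
and the package list is an example to adjust (add your own xf86-video-*
driver).

```shell
#!/bin/sh
# trial_emerge_cmd CFLAGS CXXFLAGS PACKAGE...
# Prints (does not run) a one-off emerge command that rebuilds the given
# packages with trial flags. Hypothetical helper for illustration.
trial_emerge_cmd() {
    cflags=$1
    cxxflags=$2
    shift 2
    printf 'CFLAGS="%s" CXXFLAGS="%s" emerge --oneshot %s\n' \
        "$cflags" "$cxxflags" "$*"
}

# Example:
# trial_emerge_cmd "-march=k8 -Os -pipe" "-march=k8 -Os -pipe" \
#     x11-libs/libX11 x11-base/xorg-server media-libs/mesa
```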

For anything written in C++ (almost anything KDE, among other packages),
libstdc++, a part of gcc, will be critical.  Other than for C++
apps/libraries, recompiling gcc with new CFLAGS shouldn't make that much
difference in how the app runs, tho it might make some diff

[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-15 Thread Mark Knecht

On 9/14/06, Mark Knecht <[EMAIL PROTECTED]> wrote:
> Hi,
>    I'm just curious whether anyone besides me is noticing their
> machine feeling somewhat sluggish since doing the gcc-4.1 upgrade?

I noticed this morning that MythTV's frontend program is often using
>90% CPU when viewed in top.

It never used more than 10% before the upgrade to gcc-4.

Clearly this is at least part of the problem here.

I'm interested in Duncan's flags and how to convert the machine
successfully. Is it a complete rebuild?

- Mark



Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-15 Thread Greg Bur
On 9/14/06, Mark Knecht <[EMAIL PROTECTED]> wrote:
On 9/14/06, Duncan <[EMAIL PROTECTED]> wrote:> "Mark Knecht" <[EMAIL PROTECTED]> posted> 
[EMAIL PROTECTED], excerpted> below, on  Thu, 14 Sep 2006 07:15:42 -0700:2) What about building the kernel? How do the standardmake && make modules_install
command make any use of the flags in /etc/make.conf?I believe you have to modify the Makefile in /usr/src/linux to enable additional optimizations.  I have noticed in recent kernels that there is an option to compile using -Os as well however I have not used that yet simply because I try to play it safe, especially with my kernels.  I would be interested in hearing feedback as to which "safe" optimizations can be used when building a kernel.  
   This machine is a fairly standard desktop running Xorg-7, Gnome andjust a few apps most of the time. However I am an audio oriented
I share your concern here as well.  One app in particular that comes to mind is lyx which in the past has not gotten along well with heavy optimization, at least for me.  Granted this was nearly two years ago when I, like many newcomers to Gentoo, got a bit ridiculous with the CFLAGS.  Duncan's post was very educational and has made me reconsider trying additional optimizations again.  I am definitely interested in hearing recommendations for proceeding even though like Mark I recently completed a rebuild of my system recently. I am looking forward to the continuation of this thread.
Regards,Greg 
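
On the kernel question: the kernel build takes its compiler flags from
its own Makefile and .config rather than from /etc/make.conf, and the -Os
option mentioned here is, as far as I know, CONFIG_CC_OPTIMIZE_FOR_SIZE.
A minimal sketch for checking what a given tree is set to; kconfig_get is
a made-up helper name and the default path is the usual Gentoo symlink.

```shell
#!/bin/sh
# kconfig_get NAME [FILE]
# Prints the value of option NAME from a kernel .config (default:
# /usr/src/linux/.config). Prints nothing if the option is not set.
# kconfig_get is a hypothetical helper name.
kconfig_get() {
    name=$1
    file=${2:-/usr/src/linux/.config}
    # Set options appear as NAME=value; unset ones as "# NAME is not set".
    sed -n "s/^${name}=//p" "$file"
}

# Example: prints "y" if the kernel is configured to build with -Os.
# kconfig_get CONFIG_CC_OPTIMIZE_FOR_SIZE
```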


Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-14 Thread Mark Knecht

On 9/14/06, Duncan <[EMAIL PROTECTED]> wrote:
> "Mark Knecht" <[EMAIL PROTECTED]> posted
> [EMAIL PROTECTED], excerpted
> below, on  Thu, 14 Sep 2006 07:15:42 -0700:
>
> > I'm just curious whether anyone besides me is noticing their machine
> > feeling somewhat sluggish since doing the gcc-4.1 upgrade? Mine seems to
> > be using a lot of memory. Alt-tabbing between windows seems slow.
> > Ethernet traffic in my browser is causing pretty noticeable
> > interruptions in things like MythTV.
> >
> > The machine is still quite usable, but it doesn't feel as snappy as it
> > did last week.
> >
> > I made no changes in /etc/make.conf for the upgrade. Everything is
> > pretty basic as far as I can tell:
> >
> > CFLAGS="-march=k8 -O2 -pipe"
> > CXXFLAGS="${CFLAGS}"
>
> I've noticed rather the opposite, here.  gcc-4.1.1 compiled binaries are
> /dramatically/ faster and more efficient than 3.x.  However, I'm using a
> rather more elaborate CFLAGS/CXXFLAGS, and it's my conviction that gcc-4.1
> does better at optimizing exactly the way you've told it to.  That is, if
> you've given it inefficient optimizations, I'm convinced it makes a bad
> thing worse, while if you've chosen your optimizations well, it makes a
> good thing dramatically better.
>
> Here's my CFLAGS/CXXFLAGS:
>
> CFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks
> -freorder-blocks-and-partition -combine -funit-at-a-time -ftree-pre
> -fgcse-sm -fgcse-las -fgcse-after-reload -fmerge-all-constants"
>
> CXXFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks
> -funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload
> -fmerge-all-constants"
>
> As I said, with the above, there's a /dramatic/ improvement in
> performance between gcc-3.x and gcc-4.1.x.
>
> --
> Duncan - List replies preferred.   No HTML msgs.


Hi Duncan,
  As always, very deep thanks for the answer. Very informative and interesting.

  Now, you are very adept at this. Your explanations make sense to
the level I've considered them. (Not very far right now...) Main
questions:

1) What can be done to test this out at my end without making a 2-day
commitment to rebuild the complete machine? Is it possible to rebuild
only portions of the machine using a different set of flags or is it a
system wide commitment requiring that I rebuild 575 packages as I did
last weekend?

2) What about building the kernel? How does the standard

make && make modules_install

command make any use of the flags in /etc/make.conf?

  This machine is a fairly standard desktop running Xorg-7, Gnome and
just a few apps most of the time. However I am an audio oriented
person so my kernel is rt-sources from the proaudio overlay. (Ingo
Molnar's patches to the kernel.org kernels and not a Gentoo kernel.) I
need to ensure that the audio stuff (Jack, Ardour, Aqualung, 1394 hard
drives) continue to work well.

  Your ideas are most welcome.

Thanks,
Mark



Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-14 Thread Richard Freeman

Duncan wrote:


Hmm - no -ftree-vectorize?  Care to comment on that?  I hear that it can
be buggy with a few packages, but I'm guessing it is worth having in
there in general.




[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?

2006-09-14 Thread Duncan
"Mark Knecht" <[EMAIL PROTECTED]> posted
[EMAIL PROTECTED], excerpted
below, on  Thu, 14 Sep 2006 07:15:42 -0700:

> I'm just curious whether anyone besides me is noticing their machine
> feeling somewhat sluggish since doing the gcc-4.1 upgrade? Mine seems to
> be using a lot of memory. Alt-tabbing between windows seems slow.
> Ethernet traffic in my browser is causing pretty noticeable
> interruptions in things like MythTV.

> The machine is still quite usable, but it doesn't feel as snappy as it
> did last week.
> 
> I made no changes in /etc/make.conf for the upgrade. Everything is
> pretty basic as far as I can tell:
> 
> CFLAGS="-march=k8 -O2 -pipe"

> CXXFLAGS="${CFLAGS}"

I've noticed rather the opposite, here.  gcc-4.1.1 compiled binaries are
/dramatically/ faster and more efficient than 3.x.  However, I'm using a
rather more elaborate CFLAGS/CXXFLAGS, and it's my conviction that gcc-4.1
does better at optimizing exactly the way you've told it to.  That is, if
you've given it inefficient optimizations, I'm convinced it makes a bad
thing worse, while if you've chosen your optimizations well, it makes a
good thing dramatically better.

Here's my CFLAGS/CXXFLAGS:

CFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks
-freorder-blocks-and-partition -combine -funit-at-a-time -ftree-pre
-fgcse-sm -fgcse-las -fgcse-after-reload -fmerge-all-constants"

CXXFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks
-funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload
-fmerge-all-constants"

The general strategy here is to take advantage of size optimization -- on
modern CPUs, L1 and L2 cache are FAR FAR faster than main memory, and
raw CPU cycles run circles around even cache speeds.  Thus, optimizing
for CPU speed at the expense of size makes little sense, because all those
saved cycles and more are likely to be spent waiting for memory to return
code that /would/ have fit in the cache were it size optimized.

Thus, for example, where traditional optimizations unroll loops into
flat code where possible, to avoid the expense of the jump back to the top
of the loop, that spreads out the loop to several times its original code
size, thus taking far more room in fast cache and forcing the CPU to wait
far more often for code to be fetched from main memory.  I prefer to keep
the loops, making the code smaller and thus allowing more of it to fit in
faster cache.  I believe that for most code, this technique will result in
faster execution in the real world, despite the theoretical loss of a CPU
cycle here or there due to jumping back to the top of the loop.
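
The size half of this argument is easy to check for a given source file.
A rough sketch, assuming gcc and binutils' size are installed;
text_bytes and compare_opt are names made up here, and smaller code is
not by itself proof of faster execution, so timing the result is still
needed.

```shell
#!/bin/sh
# text_bytes: read Berkeley-format `size` output on stdin and print the
# text (code) segment size from the first data row.
text_bytes() {
    awk 'NR == 2 { print $1 }'
}

# compare_opt SRC.c: compile one C file at several optimization settings
# and report the resulting code size. Hypothetical helper.
compare_opt() {
    src=$1
    obj="${TMPDIR:-/tmp}/opt_test.$$.o"
    for opt in "-O2" "-Os" "-O2 -funroll-loops"; do
        # $opt is left unquoted on purpose so that "-O2 -funroll-loops"
        # is passed to gcc as two separate flags.
        gcc $opt -c "$src" -o "$obj" || return 1
        printf '%-20s %s bytes\n' "$opt" "$(size "$obj" | text_bytes)"
    done
    rm -f "$obj"
}

# Example (loop.c is any C file with a few hot loops):
# compare_opt loop.c
```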

The -freorder-blocks-and-partition, OTOH, can make code slightly larger,
but the effect is the same as the above, increasing execution speed.  What
this optimization does is separate code that is used often from that which
is seldom used, so the "hot" code is smaller and fits better in high speed
cache, while the "cold" code ends up in slower main memory most of the
time.  While a lower percentage of the code may be in cache due to the
larger size, cache will be used far more effectively, as more "hot" code
will be retained therein, with the cold code that's not used so often
allowed to drop out of cache into main memory.  This particular
optimization doesn't work well with C++, however, so it's in my CFLAGS but
not my CXXFLAGS.

Likewise with -combine, which allows the compiler to optimize across
multiple source files at a time.  It's only implemented for C at this time
(according to the gcc manpage), so it's in my CFLAGS but omitted from my
CXXFLAGS.
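
Keeping the two flag sets in sync by hand is error-prone, so one option
is to derive the C++ set from the C set by dropping the C-only flags. A
sketch only: strip_flags is a hypothetical helper, and whether its output
can be used directly depends on how your make.conf is maintained (here it
would be run by hand and the result pasted in).

```shell
#!/bin/sh
# strip_flags "FLAGS" DROP...
# Prints FLAGS with every flag that exactly matches one of DROP removed.
# Hypothetical helper for illustration.
strip_flags() {
    flags=$1
    shift
    out=
    for f in $flags; do          # unquoted: split the string into flags
        keep=1
        for drop in "$@"; do
            if [ "$f" = "$drop" ]; then
                keep=0
            fi
        done
        if [ "$keep" = 1 ]; then
            out="${out:+$out }$f"
        fi
    done
    printf '%s\n' "$out"
}

# Example: derive CXXFLAGS from CFLAGS by dropping the two C-only flags.
# strip_flags "$CFLAGS" -combine -freorder-blocks-and-partition
```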

The other strategy here is to make as full a use of the extra registers
available to amd64 in 64-bit mode (as opposed to 32-bit x86 mode) as
possible.  Registers operate at the speed of the CPU, no wait at all, as
there is for even L1 cache, so it pays to use them as efficiently as
possible.  Several of the flags (-frename-registers of course, -fweb, etc)
in my CFLAGS are therefore designed to encourage gcc to do this.

All the flags I've not mentioned specifically are designed to further the
three common goals mentioned above: making as efficient a use as possible
of the speed of (1) registers and (2) cache memory, and (3) allowing gcc to
optimize over as wide a scope as possible (whole units with
-funit-at-a-time, or even multiple units with -combine).  Of course, see
the gcc manpage for additional details.

As I said, with the above, there's a /dramatic/ improvement in
performance between gcc-3.x and gcc-4.1.x.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
