Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
On Saturday 23 September 2006 17:21, Peter Humphrey <[EMAIL PROTECTED]> wrote about 'Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?':
> On Saturday 23 September 2006 19:52, Duncan wrote:
> > However, the only difference (CFLAGS wise) that I'm aware of for the
> > AMD dual-cores is that they now incorporate SSE3, while my old 242s
> > and I presume your 246s don't.
>
> Nope. SSE and SSE2, but not SSE3. According to /proc/cpuinfo, that is.

I can verify that the 275s do support SSE3 (flag: pni) from my /proc/cpuinfo:

vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 275
stepping : 2
cpu MHz : 2200.000
fpu : yes
fpu_exception : yes
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

-- "If there's one thing we've established over the years, it's that the vast majority of our users don't have the slightest clue what's best for them in terms of package stability." -- Gentoo Developer Ciaran McCreesh
Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
On Saturday 23 September 2006 16:21, Peter Humphrey wrote: > Nope. SSE and SSE2, but not SSE3. According to /proc/cpuinfo, that is. The flag in cpuinfo is pni for "Prescott New Instructions". Cheers, Jason -- gentoo-amd64@gentoo.org mailing list
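Since SSE3 shows up in /proc/cpuinfo as pni rather than sse3, a whole-word check on the flags line avoids the confusion above. This is a minimal sketch for x86 Linux; the function name has_cpu_flag is made up for illustration.

```shell
#!/bin/sh
# has_cpu_flag FLAG [FILE]: succeed if FLAG appears as a whole word in the
# "flags" line of FILE (default: /proc/cpuinfo). Hypothetical helper name.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Strip the "flags :" prefix, then match the flag as a whole word so
    # that, for example, "sse" does not match "sse2".
    sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' "$file" 2>/dev/null |
        grep -qw -- "$flag"
}

if has_cpu_flag pni; then
    echo "SSE3 is available (pni flag present)"
else
    echo "no pni flag found"
fi
```

Searching for "sse3" on these CPUs finds nothing even when SSE3 is supported, which is exactly the trap in this subthread.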
Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
On Saturday 23 September 2006 19:52, Duncan wrote: > However, the only difference (CFLAGS wise) that I'm aware of for the AMD > dual-cores is that they now incorporate SSE3, while my old 242s and I > presume your 246s don't. Nope. SSE and SSE2, but not SSE3. According to /proc/cpuinfo, that is. -- Rgds Peter -- gentoo-amd64@gentoo.org mailing list
[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
Peter Humphrey <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Sat, 23 Sep 2006 14:39:11 +: > Which model of Opteron are your CPUs? I have a feeling they differ from my > 246s, and I've been wondering how I ought to tune your helpfully explained > flags to suit my box. I'm running 242s at present, so they should be fairly similar. I plan on upgrading to dual-cores later this year or early next, when the prices seem to be down to a reasonable level as the new socket format takes over, and will run that for another couple years before I even think of upgrading mobo/cpu/memory again, at which point I'll have been running the same base mobo and platform for over five years(!!), and expect to upgrade to a single socket 8-core model as mid-grade. (Of course by then AMD's multi-socket co-processor model or a variation thereof may have taken the market by storm, and I might as a result be buying a two or more socket mobo with one for CPU and the other for GPU, or some such.) However, the only difference (CFLAGS wise) that I'm aware of for the AMD dual-cores is that they now incorporate SSE3, while my old 242s and I presume your 246s don't. The other changes I'll be making at the upgrade will be in terms of kernel config. Naturally, with dual Opterons, I'm already running SMP, but I have it set for two max, and with the dual-cores, that will of course change to four. Additionally, there's only one level of CPU/core zoning ATM, while there will be two levels then, as the pair of cores on the same CPU will cooperate even closer than the two in separate sockets but connected by hypertransport bus do. The big difference in CFLAGS at this point is between Intel and AMD products, and since we are both running AMD, that's not an issue. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- gentoo-amd64@gentoo.org mailing list
Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
On Thursday 14 September 2006 20:08, Duncan wrote: > Here's my CFLAGS/CXXFLAGS: ...etc. Which model of Opteron are your CPUs? I have a feeling they differ from my 246s, and I've been wondering how I ought to tune your helpfully explained flags to suit my box. -- Rgds Peter -- gentoo-amd64@gentoo.org mailing list
[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
"Mark Knecht" <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Fri, 15 Sep 2006 11:06:47 -0700: > On 9/14/06, Mark Knecht <[EMAIL PROTECTED]> wrote: >> Hi, >>I'm just curious whether anyone besides me is noticing their >> machine feeling somewhat sluggish since doing the gcc-4.1 upgrade? > > I noticed this morning that MythTV's frontend program is often using > >90% CPU when viewed in top. > > It never used more than 10% before the upgrade to gcc-4. > > Clearly this is at least part of the problem here. Indeed, that would explain your observations. Perhaps either the front-end or some library it loads is one of the few programs that just doesn't work quite right with gcc-4.1 yet. Good detective work! So it would appear you have to try recompiling it with gcc-3.x again, and see if that eliminates the problem. If not, you'll have to check its dependency tree and try recompiling it. Get that 90% off the CPU and maybe you'll see the better general efficiency of gcc-4.1, regardless of whether you try my cflags or not. In fact, that's what I'd recommend you do, before trying my cflags. You'd then have a better base on which to measure whether my cflags made a difference for you or not, as opposed to what gcc-4.1.x itself did. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- gentoo-amd64@gentoo.org mailing list
[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
"Mark Knecht" <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Thu, 14 Sep 2006 17:43:19 -0700:
>Now, you are very adept at this. Your explanations make sense to
> the level I've considered them. (Not very far right now...) Main
> questions:

Adept, perhaps, but don't take my observations as being from God or anything! =8^) I try to be fairly cautious with my CFLAGS, but if anything quits working, I know how to undo them and try with a more generic set, and in fact do so from time to time on individual packages, before filing bugs on them. Sometimes it's my CFLAGS, tho usually my config doesn't matter a whit to the bug, as I've been reasonably cautious in my choices to begin with and don't tend to enable stuff like the unsafe floating-point math options that give folks problems from time to time.

In particular, as you can see from the -ftree-vectorize subthread, I tend to stay with the defaults when I can't explain with some degree of confidence exactly what the effect of a flag might be and why I might or might not want it. I don't know enough about that area to do that, so I've stayed well away from it in my CFLAGS.

> 1) What can be done to test this out at my end without making a 2-day
> commitment to rebuild the complete machine. Is it possible to rebuild
> only portions of the machine using a different set of flags or is it a
> system wide commitment requiring that I rebuild 575 packages as I did
> last weekend?

In general, you /can/ rebuild only a part of your system and test that, before making further changes. However, it's important to use a bit of (un?)common sense when doing so, or your results won't be worth much.
Basically, in order to see how an optimization affects something, you must have some awareness of the shared libraries it uses and to what extent it uses them, recompiling enough of the heavily used dependencies that the critical parts of your test applications (including the libraries they load) are using the new optimizations.

One lib that all applications make some use of is glibc, so it can be worth recompiling. It's a big recompile on its own, but of course nowhere near as big as recompiling the entire system. =8^) However, glibc is a special case in some aspects for a number of reasons. The glibc ebuild is pretty conservative with the flags it allows, and actually replaces -Os with -O2, due to problems -Os had mainly on x86, back in the gcc-3.2 and 3.3 era. Since the system is pretty horribly broken if glibc breaks, to the point you are likely to have to boot to a backup or liveCD to fix it, this isn't an unreasonable policy at all.

Nonetheless, after making doubly sure I had tested-working backups, I decided to see just what the effect of taking out that -Os -> -O2 replace in the glibc ebuild might be. For a while I actually ran a glibc I had built after having removed that replace. The system continued to work just fine with a -Os compiled glibc, it didn't break or anything, but it didn't seem to be much better either and in some cases seemed worse. It turns out that glibc is built in a much more modular fashion than many libraries, so an app will only load the parts of it that it needs, not the parts it doesn't, and that -Os doesn't work so well with this rather extreme (compared to most libs) modularization. As well, as I said, glibc is used by everything on the system, which meant that having bypassed one of the safeties in the glibc ebuild, I could never be sure whether a bug I was experiencing was due to my strange glibc, or to some problem with the package the bug was showing up in or one of its other dependencies.
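A partial rebuild of this kind might be sketched as follows. It relies on the assumption that Portage lets CFLAGS/CXXFLAGS from the environment take precedence over /etc/make.conf for a single emerge run; the trial flags are placeholders, and the package atoms are the usual Gentoo names for the X libraries this message goes on to mention. By default the script only prints the command.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Set RUN=1 (as root, on a Gentoo system) to really run it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Placeholder trial flags; substitute the set you want to evaluate.
TRIAL_CFLAGS="-march=k8 -Os -pipe"
TRIAL_CXXFLAGS="-march=k8 -Os -pipe"

# Heavily shared libraries worth rebuilding before judging the result.
# Add your x11-drivers/xf86-video-* driver to this list as well.
PKGS="x11-libs/libX11 x11-base/xorg-server x11-libs/libXcomposite media-libs/mesa"

rebuild_subset() {
    # --oneshot rebuilds the packages without adding them to the world file.
    run env CFLAGS="$TRIAL_CFLAGS" CXXFLAGS="$TRIAL_CXXFLAGS" \
        emerge --oneshot $PKGS
}

rebuild_subset
```

The printed line loses the shell quoting around the flag strings, so it is meant for reading, not for pasting back. Ebuilds that filter or replace flags, as the glibc ebuild is described as doing here, will still apply their own rules.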
I concluded that it simply wasn't worth bypassing the safeties in the ebuild, and since then, have left them there. Thus, with glibc anyway, simply switching to -Os in your CFLAGS won't make any difference, since the ebuild replaces that with -O2 anyway. The /other/ CFLAGS might make a difference, but -Os itself won't, unless you bypass the replace in the ebuild, and as my experimentation demonstrated well enough for me, that's really not worth the trouble. As I said, the other CFLAGS may make a bit of difference tho, so you might consider it anyway, if you decide to try them.

For X users, another library that's going to be commonly used is libX11. You'll probably want to recompile xorg-server (assuming modular-X) as well, plus whatever xf86-video-* driver you use, and libXcomposite if you use the composite extension (transparent windows and the like). Together, those will be pretty critical for performance of any X app. For OpenGL accelerated apps, mesa is likely to be critical to performance as well, for any functions not handled by hardware.

For anything written in C++, almost anything KDE among other packages, gcc's libstdc++, a part of gcc, will be critical. Other than for C++ apps/libraries, recompiling gcc with new CFLAGS shouldn't make that much difference in how the app runs, tho it might make some diff
[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
On 9/14/06, Mark Knecht <[EMAIL PROTECTED]> wrote: Hi, I'm just curious whether anyone besides me is noticing their machine feeling somewhat sluggish since doing the gcc-4.1 upgrade? I noticed this morning that MythTV's frontend program is often using 90% CPU when viewed in top. It never used more than 10% before the upgrade to gcc-4. Clearly this is at least part of the problem here. I'm interested in Duncan's flags and how to convert the machine successfully. Is it a complete rebuild? - Mark -- gentoo-amd64@gentoo.org mailing list
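Because the frontend's CPU usage jumped only after the compiler change, one way to test the suspicion is to rebuild just that package with the older compiler, as suggested in reply to this message. The sketch below assumes Gentoo's gcc-config tool; the profile names and the package atom are examples only, and the real profile names should be taken from the output of `gcc-config -l`. It prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set RUN=1 (as root, on a Gentoo system) to really run them.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Assumed example values, not taken from the thread.
OLD_GCC=${OLD_GCC:-x86_64-pc-linux-gnu-3.4.6}
NEW_GCC=${NEW_GCC:-x86_64-pc-linux-gnu-4.1.1}
PKG=${PKG:-media-tv/mythtv}

rebuild_with_old_gcc() {
    run gcc-config -l              # list the installed compiler profiles
    run gcc-config "$OLD_GCC"      # make the 3.x compiler the active one
    run emerge --oneshot "$PKG"    # rebuild only the suspect package
    run gcc-config "$NEW_GCC"      # switch back to 4.1 afterwards
}

rebuild_with_old_gcc
```

If the CPU usage drops back after this rebuild, the package itself is the likely culprit; if it does not, the same steps can be repeated on its heaviest dependencies.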
Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
On 9/14/06, Mark Knecht <[EMAIL PROTECTED]> wrote:
> On 9/14/06, Duncan <[EMAIL PROTECTED]> wrote:
> > "Mark Knecht" <[EMAIL PROTECTED]> posted
> > [EMAIL PROTECTED], excerpted
> > below, on Thu, 14 Sep 2006 07:15:42 -0700:
>
> 2) What about building the kernel? How do the standard
> make && make modules_install command make any use of the flags in /etc/make.conf?

I believe you have to modify the Makefile in /usr/src/linux to enable additional optimizations. I have noticed in recent kernels that there is an option to compile using -Os as well; however, I have not used that yet, simply because I try to play it safe, especially with my kernels. I would be interested in hearing feedback as to which "safe" optimizations can be used when building a kernel.

> This machine is a fairly standard desktop running Xorg-7, Gnome and
> just a few apps most of the time. However I am an audio oriented

I share your concern here as well. One app in particular that comes to mind is lyx, which in the past has not gotten along well with heavy optimization, at least for me. Granted, this was nearly two years ago when I, like many newcomers to Gentoo, got a bit ridiculous with the CFLAGS. Duncan's post was very educational and has made me reconsider trying additional optimizations again. I am definitely interested in hearing recommendations for proceeding even though, like Mark, I recently completed a rebuild of my system. I am looking forward to the continuation of this thread.

Regards,
Greg
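As far as I know, the kernel build does not read /etc/make.conf at all (that file belongs to Portage), and the -Os option mentioned above is a kernel configuration setting. The sketch below assumes it is the symbol CONFIG_CC_OPTIMIZE_FOR_SIZE and that the kernel tree is at /usr/src/linux; both are assumptions, and the function name is made up.

```shell
#!/bin/sh
# kernel_size_optimized [CONFIG]: succeed if the given kernel .config
# (default: /usr/src/linux/.config) enables CONFIG_CC_OPTIMIZE_FOR_SIZE.
# Hypothetical helper; the symbol name is an assumption about this option.
kernel_size_optimized() {
    grep -q '^CONFIG_CC_OPTIMIZE_FOR_SIZE=y' \
        "${1:-/usr/src/linux/.config}" 2>/dev/null
}

if kernel_size_optimized; then
    echo "kernel is configured to build with -Os"
else
    echo "kernel is not configured for -Os (or no .config was found)"
fi
```

Changing the setting through `make menuconfig` is safer than editing the kernel Makefile by hand, since the choice is then recorded in .config.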
Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
On 9/14/06, Duncan <[EMAIL PROTECTED]> wrote:
> "Mark Knecht" <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Thu, 14 Sep 2006 07:15:42 -0700:
> > I'm just curious whether anyone besides me is noticing their machine
> > feeling somewhat sluggish since doing the gcc-4.1 upgrade? Mine seems to
> > be using a lot of memory. Alt-tabbing between windows seems slow.
> > Ethernet traffic in my browser is causing pretty noticeable
> > interruptions in things like MythTV.
> > The machine is still quite usable, but it doesn't feel as snappy as it
> > did last week.
> >
> > I made no changes in /etc/make.conf for the upgrade. Everything is
> > pretty basic as far as I can tell:
> >
> > CFLAGS="-march=k8 -O2 -pipe"
> > CXXFLAGS="${CFLAGS}"
>
> I've noticed rather the opposite, here. gcc-4.1.1 compiled binaries are /dramatically/ faster and more efficient than 3.x. However, I'm using a rather more elaborate CFLAGS/CXXFLAGS, and it's my conviction that gcc-4.1 does better at optimizing exactly the way you've told it to. That is, if you've given it inefficient optimizations, I'm convinced it makes a bad thing worse, while if you've chosen your optimizations well, it makes a good thing dramatically better.
>
> Here's my CFLAGS/CXXFLAGS:
>
> CFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks -freorder-blocks-and-partition -combine -funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload -fmerge-all-constants"
> CXXFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks -funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload -fmerge-all-constants"
>
> As I said, with the above, there's a /dramatic/ improvement in performance between gcc-3.x and gcc-4.1.x.
>
> -- Duncan - List replies preferred. No HTML msgs.

Hi Duncan,

As always, very deep thanks for the answer. Very informative and interesting. Now, you are very adept at this. Your explanations make sense to the level I've considered them. (Not very far right now...)

Main questions:

1) What can be done to test this out at my end without making a 2-day commitment to rebuild the complete machine. Is it possible to rebuild only portions of the machine using a different set of flags or is it a system wide commitment requiring that I rebuild 575 packages as I did last weekend?

2) What about building the kernel? How do the standard make && make modules_install command make any use of the flags in /etc/make.conf?

This machine is a fairly standard desktop running Xorg-7, Gnome and just a few apps most of the time. However I am an audio oriented person so my kernel is rt-sources from the proaudio overlay. (Ingo Molnar's patches to the kernel.org kernels and not a Gentoo kernel.) I need to ensure that the audio stuff (Jack, Ardour, Aqualung, 1394 hard drives) continue to work well.

Your ideas are most welcome.

Thanks,
Mark
-- gentoo-amd64@gentoo.org mailing list
Re: [gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
Duncan wrote: Hmm - no -ftree-vectorize? Care to comment on that? I hear that it can be buggy with a few packages, but I'm guessing it is worth having in there in general.
[gentoo-amd64] Re: gcc 4.1 upgrade - bad desktop interactivity anyone?
"Mark Knecht" <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Thu, 14 Sep 2006 07:15:42 -0700:
> I'm just curious whether anyone besides me is noticing their machine
> feeling somewhat sluggish since doing the gcc-4.1 upgrade? Mine seems to
> be using a lot of memory. Alt-tabbing between windows seems slow.
> Ethernet traffic in my browser is causing pretty noticeable
> interruptions in things like MythTV.
> The machine is still quite usable, but it doesn't feel as snappy as it
> did last week.
>
> I made no changes in /etc/make.conf for the upgrade. Everything is
> pretty basic as far as I can tell:
>
> CFLAGS="-march=k8 -O2 -pipe"
> CXXFLAGS="${CFLAGS}"

I've noticed rather the opposite, here. gcc-4.1.1 compiled binaries are /dramatically/ faster and more efficient than 3.x. However, I'm using a rather more elaborate CFLAGS/CXXFLAGS, and it's my conviction that gcc-4.1 does better at optimizing exactly the way you've told it to. That is, if you've given it inefficient optimizations, I'm convinced it makes a bad thing worse, while if you've chosen your optimizations well, it makes a good thing dramatically better.

Here's my CFLAGS/CXXFLAGS:

CFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks -freorder-blocks-and-partition -combine -funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload -fmerge-all-constants"
CXXFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks -funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload -fmerge-all-constants"

The general strategy here is to take advantage of size optimization -- on modern CPUs, L1 and L2 cache are FAR FAR faster than main memory, and raw CPU cycles run circles around even cache speeds. Thus, optimizing for CPU speed at the expense of size makes little sense, because all those saved cycles and more are likely to be spent waiting for memory to return code that /would/ have fit in the cache were it size optimized.
Thus, for example, where traditional optimizations unroll loops into flat code where possible, to avoid the expense of the jump back to the top of the loop, that spreads out the loop to several times its original code size, thus taking far more room in fast cache and forcing the CPU to wait far more often for code to be fetched from main memory. I prefer to keep the loops, making the code smaller and thus allowing more of it to fit in faster cache. I believe that for most code, this technique will result in faster execution in the real world, despite the theoretical loss of a CPU cycle here or there due to jumping back to the top of the loop. The -freorder-blocks-and-partition, OTOH, can make code slightly larger, but the effect is the same as the above, increasing execution speed. What this optimization does is separate code that is used often from that which is seldom used, so the "hot" code is smaller and fits better in high speed cache, while the "cold" code ends up in slower main memory most of the time. While a lower percentage of the code may be in cache due to the larger size, cache will be used far more effectively, as more "hot" code will be retained therein, with the cold code that's not used so often allowed to drop out of cache into main memory. This particular optimization doesn't work well with C++, however, so it's in my CFLAGS but not my CXXFLAGS. Likewise with -combine, which allows the compiler to optimize across multiple source files at a time. It's only implemented for C at this time (according to the gcc manpage), so it's in my CFLAGS but omitted from my CXXFLAGS. The other strategy here is to make as full a use of the extra registers available to amd64 in 64-bit mode (as opposed to 32-bit x86 mode) as possible. Registers operate at the speed of the CPU, no wait at all, as there is for even L1 cache, so it pays to use them as efficiently as possible. 
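The size cost of unrolling can be checked on a small scale by compiling one loop both ways and comparing the size of the generated code. This is only a rough sketch: it assumes gcc and the binutils `size` tool are installed, the C function is invented for the test, and a single tiny loop says nothing certain about real programs.

```shell
#!/bin/sh
# Compare the code size of one loop built for size against the same loop
# built with unrolling enabled. loop.c is a made-up test case.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/loop.c" <<'EOF'
/* Hypothetical test function: double every element of an array. */
void scale(float *a, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] *= 2.0f;
}
EOF

# text_size FLAGS...: compile loop.c with FLAGS and print the size in bytes
# of the object file's text (code) section, as reported by `size`.
text_size() {
    gcc "$@" -c "$workdir/loop.c" -o "$workdir/loop.o" || return 1
    size "$workdir/loop.o" | awk 'NR == 2 { print $1 }'
}

if command -v gcc >/dev/null 2>&1 && command -v size >/dev/null 2>&1; then
    echo "-Os:                $(text_size -Os) bytes of code"
    echo "-O2 -funroll-loops: $(text_size -O2 -funroll-loops) bytes of code"
else
    echo "gcc or size not found; nothing compared"
fi
```

Whether the smaller variant actually runs faster depends on how much of the surrounding program then fits in cache, so a size comparison like this supports the reasoning above without proving it.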
Several of the flags (-frename-registers of course, -fweb, etc) in my CFLAGS are therefore designed to encourage gcc to do this. All the flags I've not mentioned specifically are designed to further the three common goals mentioned above, making as efficient a use as possible of the speed of (1) registers and (2) cache memory, by allowing gcc to optimize over as wide a scope (3, whole units with unit-at-a-time, or even multiple units with -combine) as possible. Of course, see the gcc manpage for additional details. As I said, with the above, there's a /dramatic/ improvement in performance between gcc-3.x and gcc-4.1.x. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- gentoo-amd64@gentoo.org mailing list
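The difference between the two flag sets posted above can be checked mechanically. The small script below is a sketch with a made-up helper name; it prints the flags that appear in CFLAGS but not in CXXFLAGS, which should be exactly the two C-only options discussed in this message.

```shell
#!/bin/sh
# The two flag sets exactly as posted in this message.
CFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks -freorder-blocks-and-partition -combine -funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload -fmerge-all-constants"
CXXFLAGS="-march=k8 -Os -pipe -frename-registers -fweb -freorder-blocks -funit-at-a-time -ftree-pre -fgcse-sm -fgcse-las -fgcse-after-reload -fmerge-all-constants"

# only_in_first A B: print each word of A that is not a word of B,
# one per line. Hypothetical helper name.
only_in_first() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;                 # also present in B: skip it
            *) printf '%s\n' "$flag" ;;     # missing from B: report it
        esac
    done
}

echo "Flags used for C only:"
only_in_first "$CFLAGS" "$CXXFLAGS"
```

Padding the second list with spaces makes the match whole-word, so -freorder-blocks is not mistaken for -freorder-blocks-and-partition.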