Re: [gentoo-dev] Re: LTO use in the tree

2014-04-27 Thread Rich Freeman
On Sat, Apr 26, 2014 at 10:37 PM, C. Bergström
cbergst...@pathscale.com wrote:
 #2 The only reference to anything which the compiler could impact is
 Use Boyer-Moore (and unroll its inner loop a few times). Finding out which
 flag controls that for ${CC} would have some importance. It's almost
 certainly combined with -O3 and/or some standalone loop-related
 optimization. (Nothing depending on LTO). If they were really clever or
 determined, there are probably a few GCC or other pragmas which could give a
 hint about unrolling.

So, I'll certainly agree that package-specific CFLAG tuning will
always be superior to just setting some flag at the system level and
walking away.

And yet, in the same paragraph you mention -O3, which is tantamount to
just setting a flag and walking away.  That turns on 14 things you
probably don't really need.

I run -flto at the system level since in my experience it only causes
problems with a handful of packages, and when it does provide a
benefit I get it.  For the most part it just means my compiles at 2AM
take longer and use a bit more RAM, neither of which is a concern.  If I
do run into a bug, that is just an opportunity to log it and
contribute (though to date I haven't been submitting -flto issues as
bugs as it is still a bit new).

I think LTO is becoming mainstream-enough that we should consider it
supported in the sense that packages should filter it if it is known
not to work.  We certainly do that with things like -O2/3/s if they
don't work.  However, it still should be considered a somewhat
experimental flag and enabling it will involve bumps.  Also, it will
always involve a RAM tradeoff, so there may be cases where it isn't
filtered because it does work just fine, but it won't work for your
system with 4GB of RAM (or 8, or 16 even).  If maintainers want to add
logic to test before building (as is sometimes done for /var/tmp with
very large packages) they are welcome to do so, but I think that is
going above-and-beyond.
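
As a rough illustration of the kind of pre-build test described above, here is a minimal shell sketch. Inside an ebuild the usual tools would be filter-flags from flag-o-matic.eclass and the memory check from check-reqs.eclass; the standalone helpers below (strip_lto, mem_total_kb) and the 8 GiB threshold are assumptions made up for illustration, not a measured requirement.

```shell
# Remove any -flto or -flto=N token from a flag string (a rough stand-in
# for what filter-flags does inside an ebuild).
strip_lto() {
    out=
    for flag in $1; do
        case $flag in
            -flto|-flto=*) ;;                    # drop LTO flags
            *) out="${out:+$out }$flag" ;;       # keep everything else
        esac
    done
    printf '%s\n' "$out"
}

# Total physical RAM in KiB, read from /proc/meminfo (Linux only).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null
}

# Hypothetical threshold: 8 GiB expressed in KiB.
min_kb=8388608
CFLAGS="-Os -pipe -flto=1"
total=$(mem_total_kb)
if [ "${total:-0}" -lt "$min_kb" ]; then
    # Not enough RAM by our made-up rule: build without LTO.
    CFLAGS=$(strip_lto "$CFLAGS")
fi
echo "CFLAGS=$CFLAGS"
```

Whether the flag survives depends on the machine, so the printed CFLAGS will differ between a small box and a large one.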

Rich



Re: [gentoo-dev] Re: LTO use in the tree

2014-04-27 Thread C. Bergström

On 04/27/14 06:23 PM, Rich Freeman wrote:

 On Sat, Apr 26, 2014 at 10:37 PM, C. Bergström
 cbergst...@pathscale.com wrote:

 #2 The only reference to anything which the compiler could impact is
 Use Boyer-Moore (and unroll its inner loop a few times). Finding out which
 flag controls that for ${CC} would have some importance. It's almost
 certainly combined with -O3 and/or some standalone loop-related
 optimization. (Nothing depending on LTO). If they were really clever or
 determined, there are probably a few GCC or other pragmas which could give a
 hint about unrolling.

 So, I'll certainly agree that package-specific CFLAG tuning will
 always be superior to just setting some flag at the system level and
 walking away.

 And yet, in the same paragraph you mention -O3, which is tantamount to
 just setting a flag and walking away.  That turns on 14 things you
 probably don't really need.

I was trying to give a simplified example... no need to nitpick my reply.
(Every compiler defines -O3 differently, and even the flag to unroll
loops and its threshold may differ.)

 I run -flto at the system level since in my experience it only causes
 problems with a handful of packages, and when it does provide a
 benefit I get it.

Can you name a single package that you use which receives a measurable
benefit from LTO? (Just asking)


I don't disagree about enabling it, filing bug reports or many other 
things. I'm just curious if you have any hard numbers... (You seem 
passionate and sorry if this seems like I'm putting you on the spot)


/*
Side note
IPA (aka whole program and LTO) is by far the hardest optimization I've
ever personally had to debug/engineer/tune in a compiler. Making it
robust needs passionate users who file good reduced test cases. For a
single source file you have creduce or delta, but what options are there
for automated reduction of whole-program problems?

*/




Re: [gentoo-dev] Re: LTO use in the tree

2014-04-27 Thread Rich Freeman
On Sun, Apr 27, 2014 at 7:41 AM, C. Bergström
cbergst...@pathscale.com wrote:
 On 04/27/14 06:23 PM, Rich Freeman wrote:
 And yet, in the same paragraph you mention -O3, which is tantamount to
 just setting a flag and walking away.  That turns on 14 things you
 probably don't really need.

 I was trying to give a simplified example... no need to nitpick my reply.
 (Every compiler defines -O3 differently, and even the flag to unroll loops
 and its threshold may differ.)

Sorry if it came across aggressively.  I was just pointing out that
the reason one sets CFLAGs generically is to avoid the trouble of
optimizing the optimizer.  This always comes at a cost - I tend to
use -Os, but no doubt some packages would benefit from a different
global optimization, let alone specific optimizations.

That was just the point I wanted to make about LTO - I think it is of
general usefulness since it has the potential to help, and rarely
hurts.  The only problem with it is that the implementation is
immature.


 Can you name a single package that you use which receives a measurable
 benefit from LTO? (Just asking)

Alas, I cannot.  There are some general benchmarks out there, and they
seem to vary from little to no effect to significant.  More
CPU-intensive software seems the most likely to benefit.  No doubt the
benefits of LTO will improve as it matures.

Rich



Re: [gentoo-dev] Re: LTO use in the tree

2014-04-27 Thread Joshua Kinard
On 04/27/2014 07:23, Rich Freeman wrote:
 On Sat, Apr 26, 2014 at 10:37 PM, C. Bergström
 cbergst...@pathscale.com wrote:
 #2 The only reference to anything which the compiler could impact is
 Use Boyer-Moore (and unroll its inner loop a few times). Finding out which
 flag controls that for ${CC} would have some importance. It's almost
 certainly combined with -O3 and/or some standalone loop-related
 optimization. (Nothing depending on LTO). If they were really clever or
 determined, there are probably a few GCC or other pragmas which could give a
 hint about unrolling.
 
 So, I'll certainly agree that package-specific CFLAG tuning will
 always be superior to just setting some flag at the system level and
 walking away.
 
 And yet, in the same paragraph you mention -O3, which is tantamount to
 just setting a flag and walking away.  That turns on 14 things you
 probably don't really need.
 
 I run -flto at the system level since in my experience it only causes
 problems with a handful of packages, and when it does provide a
 benefit I get it.  For the most part it just means my compiles at 2AM
 take longer and use a bit more RAM, neither of which is a concern.  If I
 do run into a bug, that is just an opportunity to log it and
 contribute (though to date I haven't been submitting -flto issues as
 bugs as it is still a bit new).

My curiosity, as I have not attempted LTO yet on any machine, is what are
the RAM requirements?  Is it a hard limit, wherein the compiler simply fails
if there isn't enough RAM, or does it just start hitting swap real hard?
Those of us using older archs where the RAM is limited might have to be more
cautious w/ LTO.  I.e., my SGI O2 maxes right now at 512MB.  It can go to
1GB if the odd memory/PROM issue is ever worked out.  But 512MB is it for
now, so what are my odds of successfully using LTO on that?

Especially if LTO helps to reduce the final binary size, that's less data
being shuffled around main memory and the CPU caches, which, although it means
slower compile times, might make such a machine a bit snappier.  Though, I
dread how long GCC will take to build itself w/ LTO.  The O2 already needs
~18hrs for 4.8.  I haven't tried 4.9 on it yet.

-- 
Joshua Kinard
Gentoo/MIPS
ku...@gentoo.org
4096R/D25D95E3 2011-03-28

The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between.

--Emperor Turhan, Centauri Republic



[gentoo-dev] Last rites: dev-python/python-gnutls

2014-04-27 Thread Manuel Rüger
# Manuel Rüger mr...@gentoo.org (28 Apr 2014)
# Fails to build with gnutls-3, on behalf of python herd
# See bug #446016
dev-python/python-gnutls



Re: [gentoo-dev] Re: LTO use in the tree

2014-04-27 Thread Joshua Kinard
On 04/26/2014 20:34, C. Bergström wrote:
 On 04/27/14 02:58 AM, Martin Vaeth wrote:
 Rich Freeman ri...@gentoo.org wrote:
 FWIW the list of packages I have issues with include:
 Not sure whether this is the right place to post it.
 It's interesting to see that rather lengthy list. From a compiler engineer
 perspective I'd like to toss in my opinion
[snip]

What compiler, out of curiosity?

-- 
Joshua Kinard



Re: [gentoo-dev] Re: LTO use in the tree

2014-04-27 Thread Rich Freeman
On Sun, Apr 27, 2014 at 6:56 PM, Joshua Kinard ku...@gentoo.org wrote:

 My curiosity, as I have not attempted LTO yet on any machine, is what are
 the RAM requirements?  Is it a hard limit, wherein the compiler simply fails
 if there isn't enough RAM, or does it just start hitting swap real hard?

It just allocates RAM, and the OS does the rest.  I've seen it invoke
the OOM killer.  That was back when I only had 8GB of RAM.  Now I have
16GB and I only need to disable LTO on the really big packages.

Of course, if you set an appropriate ulimit then the process will just
terminate more gracefully.  I'd highly recommend doing just that if
you have a lot of swap available.
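
A minimal sketch of the ulimit approach, assuming a shell whose ulimit builtin supports -v (bash and dash do). The helper name run_capped and the 4 GiB figure are hypothetical; pick a limit that suits the machine.

```shell
# Hypothetical helper: run a command with a capped virtual address space,
# so an oversized LTO link fails on allocation instead of driving the box
# into swap or the OOM killer.
run_capped() {
    limit_kb=$1
    shift
    # The subshell keeps the lowered limit from leaking into the caller.
    ( ulimit -v "$limit_kb" && "$@" )
}

# Example (not executed here): cap a build at roughly 4 GiB.
# run_capped 4194304 emerge --oneshot some-category/some-package
```

Because the limit is set inside a subshell, the calling shell keeps its original limits after the command finishes.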

 Those of us using older archs where the RAM is limited might have to be more
 cautious w/ LTO.  I.e., my SGI O2 maxes right now at 512MB.  It can go to
 1GB if the odd memory/PROM issue is ever worked out.  But 512MB is it for
 now, so what are my odds of successfully using LTO on that?

About zero.  Well, I'm sure it will work fine for hello.c, especially
if you eliminate any function calls inside of it.


 Especially if LTO helps to reduce the final binary size, that's less data
 being shuffled around main memory and the CPU caches, which, although it means
 slower compile times, might make such a machine a bit snappier.  Though, I
 dread how long GCC will take to build itself w/ LTO.  The O2 already needs
 ~18hrs for 4.8.  I haven't tried 4.9 on it yet.

Yeah, good luck with that...  :)

I'd be curious as to what you find.  You can always try it out by
picking a small package and doing a CFLAGS=foo emerge bar.  Be sure to
only use -j1 -flto=1 as well.
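
The one-off trial suggested above could look roughly like the following sketch. The helper name lto_trial, the -O2 baseline, and the atom app-misc/hello are placeholders chosen for illustration. Some build systems only pass LDFLAGS at link time, in which case the flag may need to go there as well.

```shell
# Hypothetical helper: try LTO on a single package with one job everywhere.
# It prints the command instead of running it unless RUN=1 is set.
lto_trial() {
    atom=$1
    flags="-O2 -flto=1"      # -flto=1 keeps GCC's link-time step to one job
    if [ "${RUN:-0}" = 1 ]; then
        CFLAGS="$flags" CXXFLAGS="$flags" MAKEOPTS="-j1" \
            emerge --oneshot "$atom"
    else
        echo "CFLAGS='$flags' CXXFLAGS='$flags' MAKEOPTS='-j1' emerge --oneshot $atom"
    fi
}

lto_trial app-misc/hello     # placeholder atom; substitute any small package
```

Setting the variables on the command line affects only that one emerge run, so make.conf stays untouched.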

Rich



Re: [gentoo-dev] Re: LTO use in the tree

2014-04-27 Thread Joshua Kinard
On 04/27/2014 19:08, Rich Freeman wrote:
 On Sun, Apr 27, 2014 at 6:56 PM, Joshua Kinard ku...@gentoo.org wrote:

 My curiosity, as I have not attempted LTO yet on any machine, is what are
 the RAM requirements?  Is it a hard limit, wherein the compiler simply fails
 if there isn't enough RAM, or does it just start hitting swap real hard?
 
 It just allocates RAM, and the OS does the rest.  I've seen it invoke
 the OOM killer.  That was back when I only had 8GB of RAM.  Now I have
 16GB and I only need to disable LTO on the really big packages.
 
 Of course, if you set an appropriate ulimit then the process will just
 terminate more gracefully.  I'd highly recommend doing just that if
 you have a lot of swap available.

My favourite, starting long compiles on slow boxen, only to wake up to
discover they failed in the final five minutes of the build over something
as trite as low memory :)


 Those of us using older archs where the RAM is limited might have to be more
 cautious w/ LTO.  I.e., my SGI O2 maxes right now at 512MB.  It can go to
 1GB if the odd memory/PROM issue is ever worked out.  But 512MB is it for
 now, so what are my odds of successfully using LTO on that?
 
 About zero.  Well, I'm sure it will work fine for hello.c, especially
 if you eliminate any function calls inside of it.

About zero?  So, some floating point value infinitely between 0 and 1?  Hmm,
maybe I'll try it once I get my SGI Octane to boot Linux again.



 Especially if LTO helps to reduce the final binary size, that's less data
 being shuffled around main memory and the CPU caches, which, although it means
 slower compile times, might make such a machine a bit snappier.  Though, I
 dread how long GCC will take to build itself w/ LTO.  The O2 already needs
 ~18hrs for 4.8.  I haven't tried 4.9 on it yet.
 
 Yeah, good luck with that...  :)
 
 I'd be curious as to what you find.  You can always try it out by
 picking a small package and doing a CFLAGS=foo emerge bar.  Be sure to
 only use -j1 -flto=1 as well.

O2 only has one CPU, so it's always -j1.  SMP on my other MIPS machines
doesn't work yet (either Linux isn't supported, or I haven't debugged SMP
code yet).

-- 
Joshua Kinard



Re: [gentoo-dev] Re: LTO use in the tree

2014-04-27 Thread Joshua Kinard
On 04/27/2014 20:40, C. Bergström wrote:

 On those old SGI MIPS machines use MIPSPro. It had better (LTO/whole
 program) optimizations than GCC more than 10 years ago (imho and gcc may
 have caught up now in 4.9). Just add the -ipa flag and test. In fairness,
 there are primarily 3 limitations with MIPSPro IPA

[snip]

That's if they ran IRIX.  They run Linux :)

-- 
Joshua Kinard