On Sep 13, 2006, at 9:42 PM, Gary Kline wrote:
-funroll-loops is as likely to decrease performance for a particular
program as it is to help.

        Isn't the compiler intelligent enough to have a reasonable
        limit, N, on how far it will unroll a loop, to ensure a faster
        runtime?  Something much less than 1000, say; possibly less than 100.

Of course; in fact, N is probably closer to 4 or 8 than it is to 100.
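
To make the tradeoff concrete, here is a rough sketch of what an unroll factor of 4 does to a simple loop (a hand-written illustration with a made-up function, not actual gcc output):

/* Original loop. */
void scale(float *a, int n, float k)
{
    int i;

    for (i = 0; i < n; i++)
        a[i] *= k;
}

/*
 * Roughly what unrolling by 4 produces: fewer branch tests per
 * element, but more instructions competing for the I-cache, plus a
 * cleanup loop for the leftover iterations.
 */
void scale_unrolled(float *a, int n, float k)
{
    int i;

    for (i = 0; i + 3 < n; i += 4) {
        a[i]     *= k;
        a[i + 1] *= k;
        a[i + 2] *= k;
        a[i + 3] *= k;
    }
    for (; i < n; i++)
        a[i] *= k;
}

If the loop body is tiny, saving the loop overhead can win; if the unrolled body pushes the hot code out of the instruction cache, it loses.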

        At least, if the initialization and end-loop code *plus* the
        loop code itself were too large for the cache, my thought is that
        gcc would back out.

Unless you've indicated that the compiler should target a specific CPU architecture, there is no way for it to know whether the size of the L1 cache on the machine doing the compile is the same as, or even similar to, the size of the cache on the system where the code will run.
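
(On FreeBSD that usually means setting CPUTYPE in /etc/make.conf, or putting an explicit -march/-mtune value in your CFLAGS -- something like CPUTYPE?=pentium4, though the exact spelling depends on the release and your CPU. Only then does gcc have any model of the caches and pipeline it is optimizing for.)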

        I may be giving RMS too much credit; but
        if memory serves, the compiler was GNU's first project.  And
        Stallman was into GOFAI, &c, for better/worse.[1]  Anyway, for now
        I'll comment out the unroll-loops arg.

cd /usr/src/contrib/gcc && grep Stallman ChangeLog

...returns no results.  A tool I wrote suggests:

% histogram.py -F'  ' -f 2,3 -p @ -c 10 ChangeLog
61 Kazu Hirata <[EMAIL PROTECTED]>
51 Eric Botcazou <[EMAIL PROTECTED]>
48 Jan Hubicka <[EMAIL PROTECTED]>
39 Richard Sandiford <[EMAIL PROTECTED]>
37 Alan Modra <[EMAIL PROTECTED]>
30 Richard Henderson <[EMAIL PROTECTED]>
29 Joseph S. Myers <[EMAIL PROTECTED]>
27 Jakub Jelinek <[EMAIL PROTECTED]>
25 Zack Weinberg <[EMAIL PROTECTED]>
22 Mark Mitchell <[EMAIL PROTECTED]>
20 John David Anglin <[EMAIL PROTECTED]>
20 Ulrich Weigand <[EMAIL PROTECTED]>
17 Rainer Orth <[EMAIL PROTECTED]>
16 Kelley Cook <[EMAIL PROTECTED]>
16 Roger Sayle <[EMAIL PROTECTED]>
13 David Edelsohn <[EMAIL PROTECTED]>
12 Aldy Hernandez <[EMAIL PROTECTED]>
11 Stephane Carrez <[EMAIL PROTECTED]>
11 Ian Lance Taylor <[EMAIL PROTECTED]>
10 Andrew Pinski <[EMAIL PROTECTED]>
10 Kaz Kojima <[EMAIL PROTECTED]>
10 James E Wilson <[EMAIL PROTECTED]>


A safe optimizer must assume that an arbitrary assignment via a
pointer dereference can change any value in memory, which means it
has to spill and reload any data being cached in CPU registers
around the use of the pointer, except for consts, variables declared
as "register", and possibly function arguments being passed via
registers rather than on the stack (cf. "register windows" on SPARC
hardware, or HP/PA's calling conventions).
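
As a sketch of what that means in practice (a made-up function, not anything from your code): under the C aliasing rules a store through a float * cannot legally modify an int, so with strict aliasing the compiler may load *len once and keep it in a register for the whole loop; with -fno-strict-aliasing it has to assume f[i] might overlap *len and reload it after every store.

void zero_floats(float *f, int *len)
{
    int i;

    /* With -fstrict-aliasing, *len may stay in a register here;
     * with -fno-strict-aliasing it is reloaded every iteration. */
    for (i = 0; i < *len; i++)
        f[i] = 0.0f;
}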
        
        Well, I'd added the no-strict-aliasing flag to make.conf!
        Pointers give me indigestion ... even after all these years.
        Thanks for your insights.  And the URL.

You're welcome.

        gary

[1]. Seems to me that "good old-fashioned AI" techniques would work in
     something like a compiler, where you probably have a good idea of
     most heuristics.   -gk

Of course. With -O or -O2 the compiler enables those optimizations which are almost certain to improve performance and code size for most programs. Optimizations which are not helpful on average are not enabled by default, and won't be until the compiler can identify, at compile time, the situations where they are known to help.

Using non-default optimization options isn't like discovering buried treasure that nobody else was aware of; the options aren't enabled by default for good reason, usually because the tradeoffs they make aren't helpful in general (yet), or because they have known bugs which result in faulty executables being produced.

--
-Chuck

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
