Re: Input for an "optimized" slide

Zach Pfeffer Fri, 19 Aug 2011 12:14:44 -0700

Thanks Bero. Sending this extremely useful information out to a wider audience.


Alex,

I think you'll probably be very interested in this for your Mozilla work.

>>   -O3
>>      * What it is, does, available on
>
> -O3 enables several additional compiler optimizations such as tree
> vectorizing and loop unswitching, and optimizes for speed over code
> size somewhat more aggressively than -O2, e.g. by inlining all calls
> to small static functions.
> It is available on any platform supported by gcc.
>
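To make "loop unswitching" concrete, here is a rough hand-written sketch of the transformation. The function names are made up for illustration, and whether gcc actually unswitches a given loop at -O3 depends on its own heuristics.

```c
#include <stddef.h>

/* Original form: the test on `scale` does not change inside the loop,
 * yet it is evaluated on every iteration. */
void fill_plain(int *dst, const int *src, size_t n, int scale)
{
    for (size_t i = 0; i < n; i++) {
        if (scale)
            dst[i] = src[i] * 2;
        else
            dst[i] = src[i];
    }
}

/* Hand-unswitched form: the invariant test is hoisted out and the loop
 * is duplicated, one copy per branch. This is faster per iteration but
 * larger, which is the speed-over-size trade-off described above. */
void fill_unswitched(int *dst, const int *src, size_t n, int scale)
{
    if (scale) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * 2;
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}
```

Both functions produce the same output; the second is roughly the shape the optimizer aims for when it unswitches the first.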
>>   OpenMP
>>      * What it is, does, available on
>
> OpenMP is a simple API that makes it easier for a programmer to make
> use of multi-core or multi-processor systems, e.g. by automatically
> splitting marked loops into several threads.
> Example:
>
> #pragma omp parallel for
> for(int i=0; i<100; i++)
>    do_something(i);
>
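> This splits the 100 iterations across a team of threads (by default
> usually one per CPU core), so at most 100 threads would have work to do.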
>
>
> It is available on platforms supported by gcc that can use libgomp,
> gcc's OpenMP library. This includes most platforms that support POSIX
> threads - but -- initially -- not Android.
>
>
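A slightly fuller sketch of the same idea, for anyone who wants to try it: a loop that accumulates a result needs a reduction clause so that each thread keeps its own partial sum. The function name sum_squares is only an illustration. Built with "gcc -O2 -fopenmp" the loop runs in parallel; without -fopenmp the pragma is ignored and the code runs serially with the same result.

```c
/* Sum of i*i for i in [0, n). The reduction clause gives every thread a
 * private copy of `total` and adds the copies together at the end, so
 * the threads do not race on a shared variable. */
long sum_squares(int n)
{
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += (long)i * i;

    return total;
}
```

The number of threads can be capped at run time with the OMP_NUM_THREADS environment variable.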
>>   Loop parallelization
>>      * What it is, does, available on
>
> Loop parallelization takes OpenMP a step further by automatically
> determining which loops are suitable for "#pragma omp parallel for"
> and similar constructs. This allows code that was written without
> multiprocessing in mind (such as most code written specifically for
> ARM platforms - multicore/SMP ARM systems are quite new) to take
> advantage of multicore/SMP systems (to some extent) without having to
> modify the code.
>
> Compiler flag: -ftree-parallelize-loops=X (where X is the number of
> threads to be optimized for - typically the number of CPU cores in the
> target system)
>
> Available on anything supported by gcc that has both libgomp and
> graphite (incl. CLooG, PPL or ISL) - the original Android toolchain
> has neither of those.
>
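As a hedged illustration of what "suitable" means here: the first loop below has fully independent iterations and is the kind of candidate the parallelizer looks for; the second carries a dependency from one iteration to the next and cannot simply be split across threads. The function names are hypothetical, and whether gcc actually parallelizes the first loop (e.g. with "gcc -O2 -ftree-parallelize-loops=4") depends on its cost model and the trip count.

```c
/* Candidate: iteration i reads only a[i] and b[i] and writes only
 * out[i]. `restrict` promises the compiler that the arrays do not
 * overlap, so there is no dependency between iterations. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = k * a[i] + b[i];
}

/* Not a candidate as written: iteration i needs the value that
 * iteration i-1 just stored (a loop-carried dependency). */
void prefix_sum(int *v, int n)
{
    for (int i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```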
>> ...and any other optimizations that you've done.
>
> None of the following is enabled yet (but the support in the toolchain
> is there now), but I'm planning to enable them step by step once we
> have systems built w/ the new toolchain that actually boot:
>
> binutils: --hash-style=gnu
>    By default, ld creates SysV style hash tables for function tables
> in shared libraries. With --hash-style=gnu, we switch to GNU style
> hashes, making symbol lookup a lot faster. (details:
> http://sourceware.org/ml/binutils/2006-10/msg00377.html)
>
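For reference, a small sketch of the two symbol-name hash functions involved, with function names chosen here for illustration. Note that the hash function alone is not where most of the speedup comes from: the GNU-style section also carries a Bloom filter and a different chain layout that let the dynamic linker reject missing symbols early, and that part is not shown.

```c
#include <stdint.h>

/* Classic SysV ELF hash, as used by the .hash section. */
uint32_t sysv_hash(const char *name)
{
    uint32_t h = 0, g;

    while (*name) {
        h = (h << 4) + (unsigned char)*name++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* GNU-style hash, as used by the .gnu.hash section: start from 5381
 * and compute h * 33 + c for each byte of the name. */
uint32_t gnu_hash(const char *name)
{
    uint32_t h = 5381;

    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h;
}
```

On the link line this is typically passed through the compiler driver as -Wl,--hash-style=gnu.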
> binutils: -Bsymbolic-functions
>    Speed up the dynamic linker by binding references to global
> functions in shared libraries where it is known that this doesn't
> break things (it's safe for libraries that don't have any users trying
> to override their symbols - it's probably safe to assume e.g. skia and
> opengl could benefit).
> (details: 
> http://www.fkf.mpg.de/edv/docs/intel_composer/Documentation/en_US/compiler_f/main_for/copts/common_options/option_bsymbolic_functions.htm)
>
> binutils/gcc: -flto, -fwhole-program
>    Link-Time Optimization - causes code to be optimized again at link
> time, when the compiler knows what functions are called from what
> parts of the code, what functions are only called with constant
> parameters, etc.
>
> gcc: -mtune=cortex-a9 (or whatever the actual target CPU is)
>    The Android build system uses -march=armv7-a, which is good -- but
> it doesn't do any tuning for the specific CPU type (e.g. cortex-a8 vs.
> cortex-a9).
>
> gcc: -fvisibility-inlines-hidden
>    Don't export C++ inline methods in shared libraries. Makes the
> symbol table smaller, improving startup time and disk space efficiency.
>
> gcc: -fstrict-aliasing -Werror=strict-aliasing
>    Currently, Android uses -fno-strict-aliasing unconditionally for
> thumb code, to work around some pieces of code that violate strict
> aliasing rules. Using -Werror=strict-aliasing, we can determine what
> pieces of code are affected, and fix them, or limit the use of
> -fno-strict-aliasing to the specific files that need it - enabling the
> rather useful strict-aliasing optimization for the rest of the build.
>
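A minimal sketch of the kind of fix this usually involves, assuming a 32-bit IEEE-754 float (true on the ARM targets discussed here, but still an assumption of this example). The function name float_bits is made up for illustration. The violating form is shown only in a comment; the memcpy form is well defined and gcc normally compiles it down to a plain register move.

```c
#include <stdint.h>
#include <string.h>

/* Typical violation: reading a float object through a uint32_t lvalue.
 *
 *     uint32_t bits = *(uint32_t *)&f;
 *
 * With strict aliasing enabled the compiler may assume a uint32_t
 * access never touches a float, and reorder or drop the access. */

/* Conforming replacement: copy the bytes instead of casting the
 * pointer. The result is the raw bit pattern of `f`. */
uint32_t float_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    return u;
}
```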
> gcc: Investigate Graphite optimizations that aren't even enabled at -O3:
>   -fgraphite-identity -floop-block -floop-interchange
> -floop-strip-mine -ftree-loop-distribution -ftree-loop-linear
>
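To give a feel for what one of these does, here is loop interchange written out by hand. This is only a sketch with made-up names and sizes; it says nothing about when Graphite would decide to apply the transformation itself.

```c
#define ROWS 64
#define COLS 64

/* Before: the inner loop walks down a column, so consecutive accesses
 * are COLS ints apart in memory and make poor use of the cache. */
long sum_by_columns(int m[ROWS][COLS])
{
    long total = 0;

    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            total += m[i][j];
    return total;
}

/* After interchange: the two loops are swapped so the inner loop walks
 * along a row, touching adjacent memory. The result is identical. */
long sum_by_rows(int m[ROWS][COLS])
{
    long total = 0;

    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            total += m[i][j];
    return total;
}
```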

_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev
