GCC extension proposal (loop unswitching)

2008-03-30 Thread Nils Pipenbrinck
Hi there. I'm a long-time mailing-list lurker, and I use GCC quite a bit
for embedded software development.


Most of the stuff I write is performance critical, and I always find
myself in the same situation: I spend countless hours unswitching
loops by hand because the built-in loop unswitcher is not always smart
when multiple variables can be unswitched. Also, you cannot
enable/disable loop unswitching on a per-function basis.
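
To make the request concrete, here is a minimal sketch of what
unswitching by hand means. The function names and the doubling operation
are made up for illustration. The first version tests a loop-invariant
flag on every iteration; the second hoists the test out of the loop and
duplicates the loop body, which is where the extra code size comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Original form: 'scale' never changes inside the loop, yet it is
   tested on every iteration. */
void scale_or_copy(int32_t *dst, const int32_t *src, size_t n, int scale)
{
    for (size_t i = 0; i < n; i++) {
        if (scale)
            dst[i] = src[i] * 2;
        else
            dst[i] = src[i];
    }
}

/* Hand-unswitched form: the invariant test is done once, and each
   branch gets its own copy of the loop. */
void scale_or_copy_unswitched(int32_t *dst, const int32_t *src, size_t n,
                              int scale)
{
    if (scale) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * 2;
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}
```

With two or three such invariant flags the number of loop copies grows
to four or eight, which is why I would like to control this per
function.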


I like to structure my source by function (so you find related stuff in
the same source file). Therefore I always have non-critical control code
and critical inner-loop functions in the same translation unit. This is
bad because I can't simply enable the optimization without sacrificing
code space (and with it performance, due to instruction cache misses).


Some kind of pragma/attribute to hint the compiler into making a better
decision and enabling/disabling unswitching for entire functions would
be a great help for my work. I have no real idea what such a pragma
should look like because I'm not familiar with the GCC internals.



I hope it's OK to post requests like this to the mailing list.

Cheers, and keep up the good work,

   Nils



US-CERT Vulnerability Note VU#162289

2008-04-07 Thread Nils Pipenbrinck


> If you know of a non-GCC compiler that optimizes away
> the test (so that the function always returns 0), please
> post here, and let me know the name, version number,
> command-line options, etc. you used to demonstrate that.


The lovely TI Code Composer Studio compiler does the same optimization. 
It's mostly for DSP code and not a mainstream compiler though.


Cmdline: ./cl6x -O3 test.c

Version of the compiler is 6.0.8 (latest)
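
For readers without the original thread at hand, the kind of test in
question is a pointer wraparound check. The sketch below is illustrative
only: the function names are invented here and this is not the code from
the vulnerability note. The first function relies on pointer overflow,
which is undefined behaviour in C, so an optimizer may treat the
comparison as always false. The second compares lengths instead and
stays well defined, assuming buf and buf_end point into the same object
with buf <= buf_end.

```c
#include <stddef.h>

/* Fragile: if buf + len would wrap around, forming that pointer is
   undefined behaviour, so the compiler may assume it never happens
   and fold this function to "return 0". */
int wraps_fragile(const char *buf, size_t len)
{
    return buf + len < buf;
}

/* Well defined: compare the requested length against the space that
   is actually left between buf and buf_end. */
int fits_in_buffer(const char *buf, const char *buf_end, size_t len)
{
    return len <= (size_t)(buf_end - buf);
}
```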

Optimizations are good. I love them!

Nils




Byte permutation optimization

2008-07-12 Thread Nils Pipenbrinck

Hi there.

I recently came across a nice micro-optimization to permute bytes
within a dword. On x86 all 24 combinations can be done using a maximum
of 3 instructions (bswap, rotate32 and rotate16).
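
As a minimal sketch of the idea, here is one of the 24 permutations
(swapping the two upper bytes) written once with masks and shifts and
once as a chain of the three primitives. The helper names are made up
for this example, and the mapping of each helper to a single x86
instruction is my assumption about what the compiler would emit.

```c
#include <stdint.h>

/* Reverse all four bytes (x86: bswap). */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Rotate the whole dword left by n bits, n = 8, 16 or 24 (x86: rol). */
uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Swap the two low bytes, keep the upper half (x86: 16-bit rotate by 8). */
uint32_t swap_low16(uint32_t x)
{
    return (x & 0xFFFF0000u) | ((x & 0x000000FFu) << 8) |
           ((x & 0x0000FF00u) >> 8);
}

/* Home-brewed version: swap the two upper bytes with masks and shifts. */
uint32_t swap_high_naive(uint32_t x)
{
    return ((x & 0xFF000000u) >> 8) | ((x & 0x00FF0000u) << 8) |
           (x & 0x0000FFFFu);
}

/* Same permutation as a three-step sequence of the primitives. */
uint32_t swap_high_composed(uint32_t x)
{
    return rotl32(swap_low16(rotl32(x, 16)), 16);
}
```

For the input 0x11223344 both variants produce 0x22113344. A pass would
have to recognise the first form and emit the second.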


I'd like to give it a try and write an optimization pass that detects 
such dword permutations and replaces them with the optimal sequences. In 
my tests this seems to be a good win in size and speed. I have the 
feeling that the register-allocator likes the smaller sequences as well.


Anyway, I don't know where to start. I've browsed the source-code and 
searched for related optimization passes that I could use as a 
boiler-plate for my experiments.


So far I found two places that look interesting to me:

 In optabs.c I've seen the code that combines two shifts and one OR
into a rotate. Looks like a hard-coded special case to me.


 In tree-ssa-math-opts.c I've seen code that scans for some common
math operations and replaces them with faster code.


Since the codebase is huge I have the feeling that I have overlooked
something. Does some kind of infrastructure to detect patterns within an
SSA tree already exist somewhere else?  Where would be the best place
in gcc to add an automatic byteswap detection?


I don't know if I'll ever finish the experiment and submit a patch. The
code-base is *huge* and scary, but I'd at least like to give it a try.


 Nils Pipenbrinck





Something general (beginner related)

2008-08-19 Thread Nils Pipenbrinck

Hi folks.

Maybe some of you remember the post I wrote more than a month
ago. I was (and still am) new to the GCC codebase and had the evil plan
to add an optimization pass to do byte permutations (capturing all
home-brewed bswap things and all the other 23 possible byte permutations
as well).


Well - I had a lot of time to kill during the last weeks, and I had a
lot of time to read into the source. Business trips without internet
access are a perfect opportunity to read such stuff...


Anyway, after all this passive reading, I've been able to add the
humble beginnings of my pass. I'm now able to  and I've been able to run
code that does a printf as soon as I find a structure in the SSA tree
that is a potential candidate for my optimization.


However, getting to this point has been very painful and I'm far away
from changing anything in the SSA.


Let's get to the point: The documentation sucks. If you want to learn
how things work, the wiki and the documentation are of little help. You
have to read other code, step through other optimizations and do a lot
of reverse-engineering and code-reading to understand how all the
different things fit together and relate to each other.


I think a well-commented pseudo-pass (well documented as in: written for
someone who knows C but has no idea about the GCC magic) that does
something stupid but valid, like re-inventing the neg operator by using
not and subtraction, could act as very nice boilerplate code for people
like me.
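
At the C level, the rewrite such a toy pass would perform looks like the
sketch below. It only illustrates the identity and is not GCC pass code;
the function name is made up, and unsigned arithmetic is used so that
the wraparound is well defined.

```c
#include <stdint.h>

/* Negation rebuilt from NOT and subtraction: -x == ~(x - 1)
   in 32-bit modular (two's complement) arithmetic. */
uint32_t neg_via_not_sub(uint32_t x)
{
    return ~(x - 1u);
}
```

A pass doing this would look for a negate statement in the SSA form and
replace it with a subtract followed by a bitwise not.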


Someone who is into the SSA structures and the GCC internals should
be able to write something up within a day or a half.


Just an idea.. And just to let you know how hard it can be: Finding out 
what GTY means *can* take half a day...


  Nils





Re: printf enhancement

2010-01-22 Thread Nils Pipenbrinck
Alfred M. Szmidt wrote:
>Since it is possible to use the 0b prefix to specify a binary
>number in GCC/C++, will there be any resistance to add %b format
>specifier to the printf family format strings?
>
> You can do that yourself by using the hook facility for printf, see
> (libc) Customizing Printf in the GNU C library manual; this is a GNU
> extention and not supported by ISO C.
>   
If you add support for it, you may break code that already registers %b
using this functionality, no?
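
To illustrate the kind of code I mean, here is a minimal sketch of a
program that registers its own %b through the hook facility. It assumes
glibc (register_printf_specifier from <printf.h>); the handler and
function names are made up for this example.

```c
#include <printf.h>
#include <stdio.h>
#include <string.h>

/* Handler: prints the unsigned int argument in binary, no padding. */
static int print_binary(FILE *stream, const struct printf_info *info,
                        const void *const *args)
{
    (void)info;
    unsigned int value = *(const unsigned int *)args[0];
    char digits[sizeof(unsigned int) * 8];
    size_t n = 0;
    int written = 0;

    /* Collect digits least significant first. */
    do {
        digits[n++] = (char)('0' + (value & 1u));
        value >>= 1;
    } while (value != 0);

    /* Emit them most significant first. */
    while (n > 0) {
        if (fputc(digits[--n], stream) == EOF)
            return -1;
        written++;
    }
    return written;
}

/* Arginfo: tells printf that %b consumes one int-sized argument. */
static int binary_arginfo(const struct printf_info *info, size_t n,
                          int *argtypes, int *size)
{
    (void)info;
    (void)size;
    if (n > 0)
        argtypes[0] = PA_INT;
    return 1;
}

/* Returns 0 on success, -1 on failure. */
int install_binary_specifier(void)
{
    return register_printf_specifier('b', print_binary, binary_arginfo);
}
```

After install_binary_specifier() succeeds, a %b conversion of the value
5 goes through this handler and yields 101. A program like this depends
on its own definition of %b, which is where a library-provided %b with
different semantics could collide.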

Cheers,
Nils



Re: Antwort: Re: Antwort: Re: [PATCH]: bump minimum MPFR version, (includes some fortranbits)

2008-10-14 Thread Nils Pipenbrinck

Markus Milleder wrote:

> I don't think anybody who tries to build GCC from source will have any
> problem building MPFR first.
  

Not entirely true:

Those of us who use cygwin and want to use the latest GCC have to first
compile a non-MPFR GCC (e.g. 4.1.x) before they can compile the latest
MPFR and link GCC to it.


This is not a big deal if you know that you have to do that, but if you
don't know why the MPFR build fails and which snapshot you have to use
as an intermediate step it can be very frustrating.


I would welcome a configuration option that disables all the
MPFR-related things. That would make compiling GCC on a naked cygwin
installation *much* easier.



Cheers,
   Nils Pipenbrinck




Re: Antwort: Re: Antwort: Re: [PATCH]: bump minimum MPFR version, (includes some fortranbits)

2008-10-14 Thread Nils Pipenbrinck

Andrew Pinski wrote:

> On Tue, Oct 14, 2008 at 1:28 PM, Nils Pipenbrinck
> <[EMAIL PROTECTED]> wrote:
>> Not entirely true:
>>
>> Those of us who use cygwin and want to use the latest GCC have to first
>> compile a non-MPFR GCC (e.g. 4.1.x) before they can compile the latest MPFR
>> and link GCC to it.
>
> I don't really see any issue here.  Because to compile GCC you need a
> compiler to begin with so compiling MPFR to start is easy, now if MPFR
> does not support older GCCs, we might need to rethink this.
  
Cygwin comes with a GCC 3.4.something out of the box. To compile MPFR
you need a 4.1 compiler.  So you have to compile everything twice. And
worse: You have to know that you have to do this. There is no
information about that issue.


Besides that: compiling GCC under cygwin takes *much* longer than under
Linux, for example. The configure scripts alone (and there is more than
one) need around 20 minutes.


Nils



Re: [RFC] Remove -frtl-abstract-sequences in 4.5

2008-11-25 Thread Nils Pipenbrinck

Diego Novillo wrote:

> On Mon, Nov 24, 2008 at 12:23, Mark Mitchell <[EMAIL PROTECTED]> wrote:
>> David Edelsohn wrote:
>>> It currently is broken on many platforms.  Why not remove it now?  What is
>>> the purpose of keeping a pass that does not work correctly and developers
>>> cannot use?
  


As a user I'd like to point out that I would jump up and down for joy if
-frtl-abstract-sequences worked.


I do a lot of embedded work for a wide range of targets using GCC and 
often I find myself in situations where another 20 to 30% of 
code-reduction could improve the performance of my code a lot. My 
targets often just have a simplistic 8k direct mapped code-cache.


For critical things I did just what -frtl-abstract-sequences did (along 
with some hand optimizing) and I've seen performance improvements of 50% 
and more just by getting my working set of code to fit into the cache-size.
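
As a rough source-level analogy of what I mean by doing it by hand (the
real pass works on RTL, not on C, and the function names and constants
below are made up): two routines that contain the same instruction
sequence are rewritten to share one out-of-line copy, trading a call for
smaller code. The noinline attribute is GCC-specific and keeps the
compiler from undoing the sharing.

```c
#include <stdint.h>

/* Before: both functions carry their own copy of the same sequence. */
uint32_t hash_a_dup(uint32_t x)
{
    uint32_t v = x + 1u;
    v ^= v >> 16;
    v *= 0x45d9f3bu;
    v ^= v >> 16;
    return v;
}

uint32_t hash_b_dup(uint32_t x)
{
    uint32_t v = x * 3u;
    v ^= v >> 16;
    v *= 0x45d9f3bu;
    v ^= v >> 16;
    return v;
}

/* After: the common sequence lives in one place only. */
__attribute__((noinline))
static uint32_t mix(uint32_t v)
{
    v ^= v >> 16;
    v *= 0x45d9f3bu;
    v ^= v >> 16;
    return v;
}

uint32_t hash_a(uint32_t x) { return mix(x + 1u); }
uint32_t hash_b(uint32_t x) { return mix(x * 3u); }
```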


Once the feature is removed: Will there ever be any chance that the 
feature will be re-implemented?


Maybe someone with contacts could convince NXP, ARM, NEC or another
low/middle-end microcontroller manufacturer to pay someone to fix the
current code instead of removing it. These days probably no one will do
so, but maybe in a year or two when we're all less paranoid about the
world economy someone could be convinced. The potential performance
improvement for small-cache architectures should not be underestimated.


However, in its current state the feature is useless and should be
removed (in the tests I've done so far it hangs gcc even with very
simple code).


Cheers,
 Nils



Re: [ARM] Implement __builtin_bswap32() via ARMv6 "rev" instruction

2008-12-08 Thread Nils Pipenbrinck

Ian Lance Taylor wrote:

> Unfortunately we need more than that: we need a signed piece of paper
> disclaiming copyright.


This is something I stumbled over some months ago when I studied the
submission rules:


I am not a lawyer, but as far as I know in my country (Germany) it is
not possible to disclaim copyright (called Urheberrecht here - it's not
exactly the same but close). You can give away the usage rights to your
code at will and for free (by putting your code into the public domain,
for example), but disclaiming "Urheberrecht" is not possible.


How does the FSF deal with this issue?




Broken optimization of pow (x, 1.5) and other halfs of integers..

2009-02-27 Thread Nils Pipenbrinck

Hi folks.

While optimizing some of my code I replaced powf (x, 1.5f) with x *
sqrt(x). Out of curiosity I checked whether GCC does this optimization
and found it in the code. It's in expand_builtin_pow in the file
builtins.c (gcc 4.3.1 source).


However, GCC does not apply this optimization for one reason or another.
If I change the constant to 3.0f I get optimized code from another
special-case expansion branch in that function (whole integers). Any
ideas what's wrong here?



The test-code, compiled with gcc (4.3.1) test.c 
-funsafe-math-optimizations -O3 -S


#include <math.h>
float test (float arg)
{
 return powf (arg, 1.5f);
}

If the optimization is triggered the disassembly shouldn't have a call 
to _powf  anymore.


Should I file a bug-report?

Cheers,
   Nils