Hi Edouard!

> Also, I have a more general question: why did you switch to
> intrinsics? Certain (well, almost all) versions of gcc are known to
> produce horrible code using intrinsics. I remember from personal experience
> that in a lot of cases intrinsics led to very poor mmx/sse code with
> storms of loads/stores from/to the stack. Moreover, this was no-op code like:
> movq [esp], mm0
> movq mm0, [esp]
>
I have been meaning to update my blog post on intrinsics, now that I
have used them a lot more since it was written.

http://sh0dan.blogspot.com/2009/10/intrinsics-in-gcc.html

Code generation is generally very nice. I've tested on gcc 4.3, and
with some tweaks here and there the generated code looks very
convincing.

> At that time, gcc devs explained that this was caused by some sort
> of aliasing problem that gcc can't solve all by itself. Writing to
> memory was necessary because gcc can't determine if other pointers
> could alias the region. But I don't see wide usage of the restrict keyword
> in the rawstudio sources.

I haven't really run into any aliasing problems. It may be that I have
been lucky, or simply don't depend on them. I always re-arrange my
data into aligned stack variables before using them in intrinsics.
Other methods, such as _mm_set1_ps() and similar, work but may lead
to wonky code generation.
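
As a rough sketch of the "aligned stack variables" approach described
above: the helper name scale4 is hypothetical (it is not taken from the
Rawstudio sources), and the alignment syntax assumes GCC's
__attribute__((aligned(16))). The inputs are copied into aligned stack
arrays first, so the aligned load _mm_load_ps() can be used safely.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: multiplies four floats by a common gain.
 * `in` and `out` may be unaligned; the data is copied into 16-byte
 * aligned stack arrays before any intrinsic touches it. */
static void scale4(const float *in, float gain, float *out)
{
    float src[4] __attribute__((aligned(16)));
    float mul[4] __attribute__((aligned(16)));
    float dst[4] __attribute__((aligned(16)));
    int i;

    for (i = 0; i < 4; i++) {
        src[i] = in[i];
        mul[i] = gain;   /* filled by hand instead of _mm_set1_ps() */
    }
    assert(((uintptr_t)src & 15) == 0);  /* aligned load is valid */

    __m128 v = _mm_load_ps(src);
    __m128 g = _mm_load_ps(mul);
    _mm_store_ps(dst, _mm_mul_ps(v, g));

    for (i = 0; i < 4; i++)
        out[i] = dst[i];
}
```

In real code the copies would normally be amortized over a whole row
of pixels rather than done for four values at a time.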


> Do you have a very specific "good" version of gcc you advise ?
> What speedup/slowdown did you get using the intrinsics instead of
> inline asm ?
> Is that motivated by the ease of maintenance (I mean, you write the code
> once and it uses more registers on x86_64 than on ia32)?

All code is primarily "written" for 64 bit, but I find that the
compiled 32 bit is surprisingly nice. To sum up I feel that the
biggest wins are:

* Much easier C-integration.
This makes "housekeeping" much easier. Loops are written in C, C
can be used for pointer work, etc.

* More readable.
The syntax may be stupid (why use "_mm_add_epi32()" instead of a name
that reflects the instruction, like "_mm_paddd()"?), but you do use real
variable names and you write your code as if it were C.

* Let the compiler choose registers and code order.
This way you can still write readable code, and if you write your code
so you give the compiler room to work, it seems to be very capable of
reordering instructions and keeping track of registers, so that you
don't get a large number of dependency stalls. If you write an
intrinsic function it is usually completely inlined into the code,
saving registers and function call time, and enabling the inlined
instructions to be moved around in the calling function.
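
To illustrate the points in the list above, here is a minimal sketch
with the loop and pointer work in plain C and a small intrinsic helper
that the compiler is free to inline. The names add_bias and
add_bias_all are hypothetical and not from the Rawstudio sources;
unaligned loads and stores are used only to keep the example short.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: adds a bias to four 32-bit integers.
 * _mm_add_epi32() corresponds to the PADDD instruction. */
static inline __m128i add_bias(__m128i pixels, __m128i bias)
{
    return _mm_add_epi32(pixels, bias);
}

/* Adds bias[0..3] to each group of four ints in `data`.
 * `count` must be a multiple of 4. */
static void add_bias_all(int *data, int count, const int bias[4])
{
    __m128i vbias = _mm_loadu_si128((const __m128i *)bias);
    int i;

    assert(count % 4 == 0);
    for (i = 0; i < count; i += 4) {       /* ordinary C loop */
        __m128i *p = (__m128i *)(data + i); /* ordinary C pointer work */
        __m128i pixels = _mm_loadu_si128(p);
        _mm_storeu_si128(p, add_bias(pixels, vbias));
    }
}
```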

But to sum up, you HAVE to write your intrinsics so your compiler has
some possibility to reorder your instructions. That way you get a huge
win, especially on 64 bits.
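
One way to leave the compiler that room, shown here only as a sketch
under my own assumptions (the function name sum_floats is hypothetical,
and GCC's aligned attribute is assumed), is to keep two independent
accumulators. The two add chains do not depend on each other, so the
compiler can interleave them instead of waiting on a single result.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical example: sums `n` floats, where `n` is a multiple of 8.
 * acc0 and acc1 form two independent dependency chains. */
static float sum_floats(const float *data, int n)
{
    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();
    float lanes[4] __attribute__((aligned(16)));
    int i;

    assert(n % 8 == 0);
    for (i = 0; i < n; i += 8) {
        acc0 = _mm_add_ps(acc0, _mm_loadu_ps(data + i));
        acc1 = _mm_add_ps(acc1, _mm_loadu_ps(data + i + 4));
    }

    /* Combine the two chains, then add the four lanes in plain C. */
    _mm_store_ps(lanes, _mm_add_ps(acc0, acc1));
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

With a single accumulator every add would have to wait for the
previous one, which is the kind of dependency stall mentioned above.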

I use gcc 4.3, and that is where I check the generated code. With some
minor tweaks it produces very nice code with few or no register spills
to the stack, and with dependent instructions well interleaved where
that is possible, especially on 64 bit.

When you write in straight assembler your code is "dead". If you use
intrinsics, the compiler can, at least in principle, make it faster
as the compiler improves (llvm-gcc for instance), or optimize it for
specific architectures.

The "move to memory, read from memory" is usually only something the
compiler does in an unoptimized debug compile (-O0, typically used
together with -g).

> Just curious about this intrinsics move :-)

Coming from doing primarily MSVC inline assembler, I looked for a good,
flexible way to write assembler. Inline GCC assembler is horrible,
nothing less. Not only is it AT&T syntax, but on x86-32 you have 4 GP
registers to pass parameters. So after I wrote the de-noiser assembler
in that, I was very much looking for something else. Straight
assembler (nasm, yasm) is not to my taste, so I decided to look
at how useful intrinsics are. And so far, I like them very much.



Regards, Klaus Post

http://www.klauspost.com
