Re: __restrict, architecture intrinsics vs asm, consoles, and other stuff

Iain Buclaw Sat, 24 Sep 2011 05:40:32 -0700

== Quote from Manu Evans (turkey...@gmail.com)'s article
> > > How can I do this in a nice way in D? I'm long sick of writing
> > > unsightly vector classes in C++, but fortunately using vendor
> > > specific compiler intrinsics usually leads to decent code
> > > generation. I can currently imagine an equally ugly (possibly worse)
> > > hardware vector library in D, if it's even possible. But perhaps
> > > I've missed something here?
> > Your C++ vector code should be amenable to translation to D, so that
> > effort of yours isn't lost, except that it'd have to be in inline asm
> > rather than intrinsics.
> But sadly, in that case, it wouldn't work. Without an intrinsic hardware
> vector type, there's no way to pass vectors to functions in registers,
> and also, using explicit asm, you tend to end up with endless unnecessary
> loads and stores, and potentially a lot of redundant
> shuffling/permutation. This will differ radically between architectures
> too.
> I think I read in another post too that functions containing inline asm
> will not be inlined?
> How does the D compiler go at optimising code around inline asm blocks?
> Most compilers have a lot of trouble optimising around inline asm blocks,
> and many don't even attempt to do so...
> How does GDC compare to DMD? Does it do a good job?
> I really need to take the weekend and do a lot of experiments I think.


GDC is just the same as DMD (same runtime library implementation for vector
array operations).


You can define vector types in the language through GCC's vector_size
attribute though (it is exposed as a pragma in GDC), then use a union to
interface between the vector type and the corresponding static array. It's
deliberately UGLY, and you are PRONE to hitting lots of brick walls if you
don't handle these types in a very specific way. :~)

Stock example:

pragma(attribute, vector_size(16))  // 16 bytes = 4 floats
  typedef float __v4sf_t;

union __v4sf {
  float[4] f;
  __v4sf_t v;
}


__v4sf a = {[1,2,3,4]},
       b = {[1,2,3,4]},
       c;

c.v = a.v + b.v;
assert(c.f == [2,4,6,8]);


The assignment compiles down to ~5 instructions:
movaps -0x88(%ebp),%xmm1
movaps -0x78(%ebp),%xmm0
addps  %xmm1,%xmm0
movaps %xmm0,-0x68(%ebp)
flds   -0x68(%ebp)

This is far quicker than c[] = a[] + b[] because the vector add is inlined
rather than dispatched to an external library call.
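
For comparison, here is a rough sketch of the same union trick written
directly against GCC's vector_size attribute in C, which is the attribute the
GDC pragma above maps onto. The names v4sf_t, v4sf, v4sf_add and scalar_add
are made up for illustration and are not part of any library. It assumes GCC
or Clang; whether the add really becomes a single addps depends on the target
and optimisation flags.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang vector extension: a 16-byte vector holding four floats. */
typedef float v4sf_t __attribute__((vector_size(16)));

/* Union giving per-element access to the vector, mirroring __v4sf above. */
typedef union {
    float  f[4];
    v4sf_t v;
} v4sf;

/* Vector add: the compiler is free to emit one packed add for this. */
static inline v4sf v4sf_add(v4sf a, v4sf b)
{
    v4sf c;
    c.v = a.v + b.v;
    return c;
}

/* Plain element-by-element add, standing in for c[] = a[] + b[]. */
static inline void scalar_add(const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < 4; i++)
        c[i] = a[i] + b[i];
}
```

Both functions produce the same four sums. The difference lies in how the
values travel: with the vector type, a, b and c can stay in SSE registers
across the call.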

Regards
Iain
