Re: __restrict, architecture intrinsics vs asm, consoles, and other stuff

Manu Evans Fri, 23 Sep 2011 14:45:22 -0700

== Quote from Walter Bright (newshou...@digitalmars.com)'s article
> D doesn't have __restrict. I'm going to argue that it is unnecessary. AFAIK,
> __restrict is most used in writing vector operations. D, on the other hand,
> has a dedicated vector operation syntax:
>    a[] += b[] * c;
> where a[] and b[] are required to not be overlapping, hence enabling
> parallelization of the operation.


Use of __restrict is certainly not limited to your example. It's applicable
basically anywhere a pointer is dereferenced on either side of a write through
any other pointer, or of a function call (since the call could potentially do
anything). In those cases the value held from the previous dereference is
invalidated and must be reloaded needlessly, unless the pointer is explicitly
marked restrict.

http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html
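
A minimal sketch of the reload problem, in C99 (which spells the keyword
`restrict`; `__restrict` is the common compiler-specific spelling). The function
names are made up for illustration. In the first version the compiler has to
assume that a store to out[i] might modify *scale, so it may have to reload
*scale on every iteration; in the second, the caller promises there is no
overlap.

```c
#include <stddef.h>

/* Without restrict: out[i] may alias *scale as far as the compiler knows,
   so *scale may have to be reloaded from memory after every store. */
void scale_plain(float *out, const float *in, const float *scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * *scale;
}

/* With restrict: the caller promises that the three pointers do not overlap,
   so the compiler is free to load *scale once and keep it in a register. */
void scale_restrict(float *restrict out, const float *restrict in,
                    const float *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * *scale;
}
```

Both versions compute the same result when the buffers really are distinct; the
difference is only in what the optimiser is allowed to assume. Calling the
restrict version with overlapping buffers is undefined behaviour.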

For RISC architectures in particular, __restrict is mandatory when optimising
certain hot functions without making a mess of your code (declaring stack
locals all over the place), and I think I've run into cases where even that's
not enough.
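
For reference, this is roughly what the "stack locals" workaround looks like in
C. The struct and function names are hypothetical, chosen only for this sketch.
Copying the fields into locals whose address is never taken means a store
through out cannot alias them, so they can stay in registers across the loop.

```c
#include <stddef.h>

/* Hypothetical parameter block, for illustration only. */
typedef struct {
    float gain;
    float bias;
} Params;

/* Workaround without restrict: read the fields into locals up front.
   A local whose address is never taken cannot be modified by a store
   through out, so the compiler may keep it in a register. */
void apply_with_locals(float *out, const float *in, const Params *p, size_t n)
{
    const float gain = p->gain;
    const float bias = p->bias;
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * gain + bias;
}
```

It works, but every pointer-held value touched in a hot loop needs the same
treatment by hand, which is the mess referred to above. Note that it also
changes behaviour if out really does overlap *p.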

> D does have some intrinsics, like sin() and cos(). They tend to get added on a
> strictly as-needed basis, not a speculative one.
> D has no current intention to replace the inline assembler with intrinsics.
> As for custom intrinsics, Don Clugston wrote an amazing piece of
> demonstration D code a while back that would take a string representing a
> floating point expression, and would literally compile it (using Compile Time
> Function Execution) and produce a string literal of inline asm functions,
> which were then compiled by the inline assembler.
> So yes, it is entirely possible and practical for end users to write custom
> intrinsics.

I hadn't thought of using compile-time functions for that; that's really nice.
I'm not sure if that'll be enough to generate good code in all cases, but I'll
do some experiments and see where it goes.
The main problem with writing (intelligently generated) inline asm vs using
intrinsics is that, at the level of the C (or D) source code, you don't have
enough context to know the state of the register assignment, or to produce the
appropriate loads/stores. Also, the opcodes selected to perform the operation
may change with context. (Again, specific examples are hard to fabricate, but
I've had them pop up consistently over the years.)

Also, I think someone else said that functions containing inline asm can't be
inlined? Is that correct? If so, I assume that's intended to be fixed?

> > As an extension from that, why is there no hardware vector support
> > in the language? Surely a primitive vector4 type would be a sensible
> > thing to have?
> The language supports it now (see the aforementioned vector syntax), it's just
> that the vector code gen isn't done (currently it is just implemented using 
> loops).

Are you referring to the comment about special casing a float[4]? I can see why
one might reach for that as a solution, but it sounds like a really bad idea to
me...

> > Is it possible in D currently to pass vectors to functions by value
> > in registers? Without an intrinsic vector type, it would seem
> > impossible.
> Vectors (statically dimensioned arrays) are currently passed by value (unlike
> C or C++).

Do you mean something like a memcpy to the stack, or somehow using the hardware
vector registers to pass the arguments to the function properly?
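
To make the second option concrete, here is a small sketch in C of a vector
type the compiler itself understands. It assumes GCC or Clang: the vector_size
attribute is a compiler extension, not standard C, and the names float4 and
madd are made up for this example. Whether the arguments actually travel in
SIMD registers depends on the target and its calling convention.

```c
/* Assumption: GCC or Clang. vector_size is a compiler extension,
   not part of standard C. */
typedef float float4 __attribute__((vector_size(16)));

/* float4 is passed and returned by value. On targets whose calling
   convention supports it, that typically means in vector registers,
   with no round trip through memory. */
float4 madd(float4 a, float4 b, float4 c)
{
    /* Element-wise multiply and add; the compiler chooses the opcodes. */
    return a * b + c;
}
```

Because the type is known to the compiler, register allocation and
load/store placement are left to the optimiser rather than fixed by hand.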

> > How can I do this in a nice way in D? I'm long sick of writing
> > unsightly vector classes in C++, but fortunately using vendor
> > specific compiler intrinsics usually leads to decent code
> > generation. I can currently imagine an equally ugly (possibly worse)
> > hardware vector library in D, if it's even possible. But perhaps
> > I've missed something here?
> Your C++ vector code should be amenable to translation to D, so that effort of
> yours isn't lost, except that it'd have to be in inline asm rather than 
> intrinsics.

But sadly, in that case, it wouldn't work. Without an intrinsic hardware vector
type, there's no way to pass vectors to functions in registers. Also, with
explicit asm you tend to end up with endless unnecessary loads and stores, and
potentially a lot of redundant shuffling/permutation. This will differ
radically between architectures too.
I think I read in another post too that functions containing inline asm will
not be inlined?
How well does the D compiler optimise code around inline asm blocks? Most
compilers have a lot of trouble optimising around inline asm blocks, and many
don't even attempt to do so...

How does GDC compare to DMD? Does it do a good job?
I really need to take the weekend and do a lot of experiments, I think.
