== Quote from Walter Bright (newshou...@digitalmars.com)'s article
> D doesn't have __restrict. I'm going to argue that it is unnecessary. AFAIK,
> __restrict is most used in writing vector operations. D, on the other hand,
> has a dedicated vector operation syntax:
>     a[] += b[] * c;
> where a[] and b[] are required to not be overlapping, hence enabling
> parallelization of the operation.

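To make the aliasing issue concrete, here is a minimal sketch in C99 rather than D (restrict is the standard C spelling of __restrict). The names scale_add and scale_add_restrict are made up for illustration, and the loop has the same shape as a[] += b[] * c. In the first version the compiler generally has to assume that a store to dst[i] could change *scale, so it may reload it on every iteration. In the second version the caller promises that the three pointers do not overlap.

```c
#include <stddef.h>

/* No restrict: dst, src and scale may alias, so after each store to
   dst[i] the compiler may have to reload *scale (and src[i + 1]). */
void scale_add(float *dst, const float *src, const float *scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i] * *scale;
}

/* With restrict: the caller promises the three objects do not overlap,
   so *scale can be kept in a register for the whole loop. */
void scale_add_restrict(float *restrict dst, const float *restrict src,
                        const float *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i] * *scale;
}
```

Both functions compute the same result when the arguments really are distinct. Calling the restrict version with overlapping arguments is undefined behaviour, and what code each version compiles to depends on the compiler and target.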
Use of __restrict is certainly not limited to your example. It's applicable
basically anywhere a pointer is dereferenced on either side of a write through
any other pointer, or of a function call (since the call could potentially do
anything). In those cases the value held from the previous dereference is
invalidated and must be reloaded needlessly, unless the pointer is explicitly
marked restrict.

http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html

For RISC architectures in particular, __restrict is mandatory when optimising
certain hot functions without making a mess of your code (declaring stack
locals all over the place), and I think I've run into cases where even that's
not enough.

> D does have some intrinsics, like sin() and cos(). They tend to get added on a
> strictly as-needed basis, not a speculative one.
> D has no current intention to replace the inline assembler with intrinsics.
> As for custom intrinsics, Don Clugston wrote an amazing piece of
> demonstration D code a while back that would take a string representing a
> floating point expression, and would literally compile it (using Compile Time
> Function Execution) and produce a string literal of inline asm functions,
> which were then compiled by the inline assembler.
> So yes, it is entirely possible and practical for end users to write custom
> intrinsics.

I hadn't thought of using compile-time functions for that; that's really nice.
I'm not sure it'll be enough to generate good code in all cases, but I'll do
some experiments and see where it goes.

The main problem with writing (intelligently generated) inline asm vs using
intrinsics is that, at the level of the C (or D) source code, you don't have
enough context to know the state of the register assignment, or to produce the
appropriate loads/stores. Also, the opcodes selected to perform the operation
may change with context.
(Again, specific examples are hard to fabricate, but I've had them
consistently pop up over the years.)

Also, I think someone else said that you can't inline functions that contain
inline asm? Is that correct? If so, I assume that's intended to be fixed?

> > As an extension from that, why is there no hardware vector support
> > in the language? Surely a primitive vector4 type would be a sensible
> > thing to have?
> The language supports it now (see the aforementioned vector syntax), it's just
> that the vector code gen isn't done (currently it is just implemented using
> loops).

Are you referring to the comment about special casing a float[4]? I can see
why one might reach for that as a solution, but it sounds like a really bad
idea to me...

> > Is it possible in D currently to pass vectors to functions by value
> > in registers? Without an intrinsic vector type, it would seem
> > impossible.
> Vectors (statically dimensioned arrays) are currently passed by value (unlike
> C or C++).

Do you mean that as a memcpy to the stack, or as actually using the hardware
vector registers to pass the arguments to the function properly?

> > How can I do this in a nice way in D? I'm long sick of writing
> > unsightly vector classes in C++, but fortunately using vendor
> > specific compiler intrinsics usually leads to decent code
> > generation. I can currently imagine an equally ugly (possibly worse)
> > hardware vector library in D, if it's even possible. But perhaps
> > I've missed something here?
> Your C++ vector code should be amenable to translation to D, so that effort of
> yours isn't lost, except that it'd have to be in inline asm rather than
> intrinsics.

But sadly, in that case, it wouldn't work. Without an intrinsic hardware
vector type, there's no way to pass vectors to functions in registers. Also,
using explicit asm, you tend to end up with endless unnecessary loads and
stores, and potentially a lot of redundant shuffling/permutation.
This will differ radically between architectures too.

I think I read in another post that functions containing inline asm will not
be inlined? How does the D compiler go at optimising code around inline asm
blocks? Most compilers have a lot of trouble optimising around inline asm
blocks, and many don't even attempt to do so...

How does GDC compare to DMD? Does it do a good job?

I really need to take the weekend and do a lot of experiments, I think.
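
As a reference point for those experiments, this is roughly what an intrinsic vector type looks like from the C side. It is a minimal sketch that uses the GCC/Clang vector_size extension (not standard C, and not any D feature). The names float4, madd4 and sum4 are made up for illustration. Because the compiler knows the type, it is free to keep values in vector registers across the call and choose the opcodes itself. Whether arguments actually travel in registers depends on the target ABI.

```c
/* GCC/Clang extension: a 16-byte vector of four floats. */
typedef float float4 __attribute__((vector_size(16)));

/* Element-wise a * b + c, taking and returning vectors by value.
   No explicit loads, stores or shuffles appear in the source. */
float4 madd4(float4 a, float4 b, float4 c)
{
    return a * b + c;
}

/* Horizontal sum via element subscripting (also part of the extension). */
float sum4(float4 v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

The same operations written as inline asm would need the operands pinned to specific registers or memory locations, which is where the extra loads and stores tend to come from.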