On 11/08/2009 11:28 PM, Robert Jacques wrote:
> By design, D asm blocks are separated from the optimizer: no code
> motion, etc. occurs. D2 just changed fixed-size arrays to value types,
> which provide most of the functionality of a small vector struct.
> However, actual SSE optimization of these types is probably going to
> wait until x64 support, since a bunch of 32-bit chips don't support them.
>
> P.S. For what it's worth, I do research which involves volumetric
> ray-tracing. I've always found memory to bottleneck computations. Also,
> why not look into CUDA/OpenCL/DirectCompute?

Yeah, I've discovered that having either the constraint-based __asm() from LDC or actual intrinsics probably makes optimization opportunities more frequent. But if it at least inlined the regular asm blocks for me, I'd be most of the way there. The LDC guys tell me they haven't included the LLVM vector intrinsics yet because they would need either a custom type in the frontend or the D2 fixed-size-arrays-as-value-types functionality. I might take a stab at some of that in LDC in the future to see if I can get it to work, but I'm not an expert in compilers by any stretch of the imagination.
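
Since the value-type static arrays and the opaque asm blocks are the crux here, here's a minimal D2 sketch of both, assuming DMD-style inline asm. The helper names (vecAdd, vecAddSSE) are made up for illustration, and the asm version is exactly the kind of block the optimizer treats as a black box (no inlining, no code motion across it):

// (a) a D2 fixed-size array used as a small-vector value type
float[4] vecAdd(float[4] a, float[4] b)
{
    float[4] r;
    r[] = a[] + b[];        // array-wise op; a and b were copied by value
    return r;
}

// (b) the same operation as a hand-written SSE asm block; the compiler
// won't inline it or move surrounding code across it
float[4] vecAddSSE(float[4] a, float[4] b)
{
    float[4] r;
    asm
    {
        movups XMM0, a;     // load 4 floats from a
        movups XMM1, b;     // load 4 floats from b
        addps  XMM0, XMM1;  // packed single-precision add
        movups r, XMM0;     // store result into r
    }
    return r;
}

void main()
{
    float[4] x = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] y = x;          // value copy: y is independent of x
    y[0] = 10.0f;
    assert(x[0] == 1.0f);    // x is unchanged, unlike old slice semantics
    auto s = vecAdd(x, y);
}

With intrinsics or constraint-based __asm(), the backend could still schedule, inline, and vectorize around code like vecAddSSE; with a plain asm block it can't, which is the limitation being discussed above.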

-Mike

PS: As for trying CUDA/OpenCL/DirectCompute, I haven't gotten into it much for a few reasons:

* The standards and APIs are still evolving
* I refuse to pigeon-hole myself into Windows (I'm typing this from a Fedora 11 box, and at work we're a Linux shop doing movie VFX).
* Larrabee (yes, yes, semi-vaporware until Intel gets their crap together) will allow something much closer to standard CPU code. I really think that's the direction the GPU makers are heading in general, so why hobble myself with cruddy GPU memory/threading models to code around right now?
* GPUs keep changing, and every change brings with it subtle (and sometimes drastic) effects on your code's performance and results from card to card. It's a nightmare to maintain, and every project we've done trying to do production rendering on the GPU (even just relighting) has ended in tears and gnashing of teeth. Everyone just eventually throws up their hands and goes back to optimized CPU rendering in the VFX industry (Pixar, ILM, and Tippett have all done that, just to name a few).

Good, solid general purpose CPUs with caches, decently wide SIMD with scatter/gather, and plenty of hardware threads are the wave of the future. (Or was that the past? I can't remember.)

GPUs are slowly converging back to that, except that currently they have a programmer-managed cache (texture mem), and they execute multiple threads concurrently over the same instructions in groups (warps, in CUDA-speak?). They'll eventually add the 'feature' of a more automatically-managed cache, and better memory throughput when allowing warps to be smaller and more flexible. And they'll look nearly identical to all the multi-core CPUs again when it happens.
