On 11/08/2009 11:28 PM, Robert Jacques wrote:
> By design, D asm blocks are separated from the optimizer: no code
> motion, etc. occurs. D2 just changed fixed-size arrays to value types,
> which provide most of the functionality of a small vector struct.
> However, actual SSE optimization of these types is probably going to
> wait until x64 support, since a bunch of 32-bit chips don't support them.
>
> P.S. For what it's worth, I do research which involves volumetric
> ray-tracing. I've always found memory to bottleneck computations. Also,
> why not look into CUDA/OpenCL/DirectCompute?

Yeah, I've discovered that having either the constraint-based __asm() from LDC or actual intrinsics probably makes optimization opportunities more frequent. But if it at least inlined the regular asm blocks for me, I'd be most of the way there. The LDC guys tell me they haven't included the LLVM vector intrinsics yet because they would need either a custom type in the frontend or the D2 fixed-size-arrays-as-value-types functionality. I might take a stab at some of that in LDC in the future to see if I can get it to work, but I'm not an expert in compilers by any stretch of the imagination.
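
Since the value-type static arrays and the opaque asm blocks are the crux here, here's a minimal D2 sketch of both, assuming DMD-style inline asm. The helper names (vecAdd, vecAddSSE) are made up for illustration, and the asm version is exactly the kind of block the optimizer treats as a black box (no inlining, no code motion across it):

// (a) a D2 fixed-size array used as a small-vector value type
float[4] vecAdd(float[4] a, float[4] b)
{
    float[4] r;
    r[] = a[] + b[];        // array-wise op; a and b were copied by value
    return r;
}

// (b) the same operation as a hand-written SSE asm block; the compiler
// won't inline it or move surrounding code across it
float[4] vecAddSSE(float[4] a, float[4] b)
{
    float[4] r;
    asm
    {
        movups XMM0, a;     // load 4 floats from a
        movups XMM1, b;     // load 4 floats from b
        addps  XMM0, XMM1;  // packed single-precision add
        movups r, XMM0;     // store result into r
    }
    return r;
}

void main()
{
    float[4] x = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] y = x;          // value copy: y is independent of x
    y[0] = 10.0f;
    assert(x[0] == 1.0f);    // x is unchanged, unlike old slice semantics
    auto s = vecAdd(x, y);
}

With intrinsics or constraint-based __asm(), the backend could still schedule, inline, and vectorize around code like vecAddSSE; with a plain asm block it can't, which is the limitation being discussed above.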

-Mike

PS: As for trying CUDA/OpenCL/DirectCompute, I haven't gotten into it much for a few reasons:

* The standards and APIs are still evolving
* I refuse to pigeon-hole myself into Windows (I'm typing this from a Fedora 11 box, and at work we're a Linux shop doing movie VFX).
* Larrabee (yes, yes, semi-vaporware until Intel gets their crap together) will allow something much closer to standard CPU code. I really think that's the direction the GPU makers are heading in general, so why hobble myself with cruddy GPU memory/threading models to code around right now?
* GPUs keep changing, and every change brings with it subtle (and sometimes drastic) effects on your code's performance and results from card to card. It's a nightmare to maintain, and every project we've done trying to do production rendering on the GPU (even just relighting) has ended in tears and gnashing of teeth. Everyone just eventually throws up their hands and goes back to optimized CPU rendering in the VFX industry (Pixar, ILM, and Tippett have all done that, just to name a few).

Good, solid general purpose CPUs with caches, decently wide SIMD with scatter/gather, and plenty of hardware threads are the wave of the future. (Or was that the past? I can't remember.)

GPUs are slowly converging back to that, except that currently they have a programmer-managed cache (texture mem), and they execute multiple threads concurrently over the same instructions in groups (warps, in CUDA-speak?). They'll eventually add the 'feature' of a more automatically-managed cache, and better memory throughput when allowing warps to be smaller and more flexible. And they'll look nearly identical to all the multi-core CPUs again when it happens.
