On Tue, 2010-10-12 at 16:29 +0200, Stéphane Letz wrote:
> I've done some test using OpenCL in the context of the Faust project
> (http://faust.grame.fr/). Up to now results are not really good, and I
> guess CUDA/OpenCL will be usable only in specific cases.

What kinds of parallelism have you been exploring?

I have found that the multiple-channel-strip approach, with mixdown to subgroups, is straightforward for a DAW, as well as for polysynths.

Global sync points between multiprocessors (for cascaded processing) work up to a limit, after which the sync cost, which grows roughly with the square of the number of multiprocessors, eats up the computational value of the added multiprocessor. I am syncing the 6 MPs on a GT220 every 16 samples at 96 kHz with a penalty in the 5% range. On higher-end cards this approach is not very useful, though, and Nvidia staffers discourage it as well.

In theory, the vector granularity need be no larger than 32 elements to be efficient; that is how the hardware multithreading works (threads are scheduled in 32-wide warps). In practice, for this use case, you will find yourself thrashing the instruction cache if you use too many divergent warps. Two, perhaps three, completely different code paths on each MP work well.

192 or 256 threads are the minimum needed to hide instruction latency. That leads to the conclusion that the effective vector, as seen by the outside world, needs to be at most 128 elements wide (256 threads / 2 code paths, which is what I currently use) and possibly as low as 64 (192 threads / 3 code paths).

> I'll probably now test if directly using CUDA would give some benefit.
> Maybe we can share some ideas?

CUDA is nice :)

> Stéphane

/j

-- 
every time you use a parallel fifth, bach kills a kitten.
http://www.youtube.com/watch?v=43RdmmNaGfQ
_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev
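For readers who want to check the numbers in the reply above, the thread-count and sync-rate arithmetic can be sketched in a few lines of Python. This is a minimal sketch, assuming 32-thread warps and an even split of a multiprocessor's threads between its code paths; the function names are hypothetical and nothing here calls CUDA.

```python
"""Back-of-the-envelope numbers from the reply (a sketch, not a benchmark).

Assumptions: 32-thread warps, threads split evenly between code paths.
All function names are hypothetical.
"""

WARP_SIZE = 32  # threads the hardware schedules together as one warp


def effective_vector_width(threads_per_mp: int, code_paths: int) -> int:
    """Vector elements handled per code path on one multiprocessor."""
    if threads_per_mp % code_paths != 0:
        raise ValueError("threads must split evenly across code paths")
    return threads_per_mp // code_paths


def whole_warps(width: int) -> bool:
    """True if a code path of this width fills complete warps, so no
    single warp has to execute more than one code path."""
    return width % WARP_SIZE == 0


def sync_interval_us(samples_per_sync: int, sample_rate_hz: int) -> float:
    """Wall-clock time between global sync points, in microseconds."""
    return samples_per_sync / sample_rate_hz * 1e6


if __name__ == "__main__":
    for threads, paths in [(256, 2), (192, 3)]:
        w = effective_vector_width(threads, paths)
        print(f"{threads} threads / {paths} paths -> width {w}, "
              f"{w // WARP_SIZE} warps per path, whole warps: {whole_warps(w)}")
    # Syncing every 16 samples at 96 kHz, as described for the GT220.
    print(f"sync every {sync_interval_us(16, 96_000):.1f} us, "
          f"{96_000 // 16} syncs per second")
```

Both configurations give widths that are whole multiples of the warp size (128 = 4 warps, 64 = 2 warps), which is one plausible reading of why they behave well: each code path occupies complete warps instead of sharing one. The sync figure shows how tight the budget is, with a global sync roughly every 167 µs (6000 times per second).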