Hello Elliot

I can't really comment on the suitability of PyCUDA for your application,
but I can relate what I have observed as an inexperienced user with a signal
processing application.  The following are results for a single-precision
64-tap filter processing 4.5 million samples on a Dell 14z laptop with an
NVIDIA 9400M GPU.  This is a very small GPU with 2 streaming multiprocessors
(SMs), i.e. 16 arithmetic units.

Pure Python (no NumPy or other packages): 554 sec
Python using weave.inline for a C inner loop: 1.5 sec
First attempt at PyCUDA: 0.37 sec
Optimized PyCUDA: 0.03 sec
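
For reference, the workload being timed can be sketched in a few lines of
NumPy.  This is my own reconstruction of what a 64-tap single-precision
filter over 4.5 million samples computes (the tap values and use of
np.convolve are illustrative assumptions, not the actual benchmark code):

```python
import numpy as np

# Hypothetical reconstruction of the benchmarked workload: a 64-tap
# single-precision FIR filter applied to a long stream of samples.
taps = np.hanning(64).astype(np.float32)           # any 64 coefficients
samples = np.random.randn(4_500_000).astype(np.float32)

# Each output sample is a dot product of the taps with a sliding window
# of the input; this per-window dot product is what the GPU parallelizes.
filtered = np.convolve(samples, taps, mode="valid").astype(np.float32)

# 'valid' mode yields one output per fully-overlapping window:
# 4_500_000 - 64 + 1 outputs.
print(filtered.shape)
```

Because every output window is independent, the problem maps naturally onto
one GPU thread per output sample.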

The key change going from my first attempt at PyCUDA to the optimized
version was learning how the different types of GPU memory are accessed and
using that information to refactor the code.  Getting the first attempt
running wasn't much more difficult than writing C code; optimizing the code,
however, took substantially longer.
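
To give a feel for why the memory refactoring matters, here is my own
back-of-the-envelope illustration (the block size and tap count are
assumptions for the sake of the arithmetic, not measurements from my code):
a naive kernel has every thread read its full tap window from slow global
memory, so neighbouring threads re-read almost all of the same samples,
whereas a block that stages one tile of input in fast shared memory loads
each sample from global memory roughly once.

```python
# Back-of-the-envelope count of global-memory reads per thread block for
# a 64-tap FIR, comparing a naive kernel with a shared-memory tiled one.
TAPS = 64
BLOCK = 256            # threads (= output samples) per block; an assumption

# Naive: each of the 256 threads reads its 64-sample window from global
# memory independently.
naive_reads = BLOCK * TAPS

# Tiled: the block cooperatively loads one contiguous tile of input --
# its 256 output positions plus a (TAPS - 1)-sample halo -- into shared
# memory once, and each thread then reads from the on-chip copy.
tiled_reads = BLOCK + TAPS - 1

# 16384 vs 319 global reads: roughly a 50x reduction in global traffic.
print(naive_reads, tiled_reads, naive_reads / tiled_reads)
```

The actual speed-up depends on how well the naive kernel's reads coalesce
and on arithmetic intensity, but this is the kind of accounting that drove
the 0.37 sec to 0.03 sec improvement.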

I would expect you would obtain at least an additional factor-of-ten
speed-up for the optimized PyCUDA code on one of the larger GPU boards.
The unoptimized code may be constrained by memory access time and probably
wouldn't see the full additional 10x improvement.

Regards,

Jim

On Tue, Jan 19, 2010 at 11:22 PM, Elliot Hallmark <[email protected]> wrote:

> Hey there.
>
> I've been working with others on a non-sequential optical ray tracing
> program in Python (modeling optical systems, not rendering fancy images).
>
> I was curious about using cuda to speed up processing, but I do not know
> enough about what GPU's do well and not so well.  Also, I wonder if there
> are any open projects that have already written code that would work for
> this.
>
> The program currently takes an array of origins and directions.  I use
> 5,000 rays, but more like 100,000 would be excellent.  Then intersections
> between those rays and each of the dozen or so optical elements are
> calculated.  Truth tests determine which of all of those intersections have
> children (i.e., are the real rays).  This iterates until all of the rays
> are absorbed or at infinity.
>
> Soon, that whole process will be iterated many times while the computer
> tries to optimise the optical system.
>
> Even with only 5,000 rays, 5,000 rays * 12 elements * 10 bounces * 30
> iterations for optimization quickly becomes a lot of processing.
>
> Is this problem suited for CUDA?  Is there any active or open development
> on this already?  Also, I wonder how much of our code would have to be
> rewritten to accommodate this.  There is talk of rewriting the intersection
> calculations in faster C, and if it is a comparable amount of work to do it
> in CUDA instead, then great!  If we should have just written everything in
> PyCUDA to start with, then I guess I'll be back if I ever want to write a
> new raytracer.
>
> thanks.
>
> Our raytracer can be found at:
> http://bitbucket.org/bryancole/raytrace/overview/
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://tiker.net/mailman/listinfo/pycuda_tiker.net
>
>
