2009/8/6 Erik Tollerud <erik.tolle...@gmail.com>:
> Note that this is from a "user" perspective, as I have no particular plan of
> developing the details of this implementation, but I've thought for a long
> time that GPU support could be great for numpy (I would also vote for OpenCL
> support over CUDA, although conceptually they seem quite similar)...
> But what exactly would the large-scale plan be?  One of the advantages of
> GPGPUs is that they are particularly suited to rather complicated
> parallelizable algorithms,

You mean simple parallelizable algorithms, I suppose?

> and the numpy-level basic operations are just the
> simple arithmetic operations.  So while I'd love to see it working, it's
> unclear to me exactly how much is gained at the core numpy level, especially
> given that it's limited to single-precision on most GPUs.
> Now linear algebra or FFTs on a GPU would probably be a huge boon, I'll
> admit - especially if it's in the form of a drop-in replacement for the
> numpy or scipy versions.
> By the way, I noticed no one mentioned the GPUArray class in pycuda (and it
> looks like there's something similar in pyopencl) - seems like that's
> already done a fair amount of the work...
> http://documen.tician.de/pycuda/array.html#pycuda.gpuarray.GPUArray
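
For the curious, here is a minimal sketch of what GPUArray already handles
(assuming pycuda is installed and a CUDA context can be created; note the
single-precision caveat above):

import numpy as np
import pycuda.autoinit          # sets up a CUDA context on the default device
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath

a = np.random.randn(4, 4).astype(np.float32)  # most GPUs: float32 only

a_gpu = gpuarray.to_gpu(a)      # host -> device copy
b_gpu = cumath.sin(2 * a_gpu)   # elementwise work runs on the GPU
b = b_gpu.get()                 # device -> host copy, back to a numpy array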
>
>
> On Thu, Aug 6, 2009 at 10:41 AM, James Bergstra <bergs...@iro.umontreal.ca>
> wrote:
>>
>> On Thu, Aug 6, 2009 at 1:19 PM, Charles R
>> Harris<charlesr.har...@gmail.com> wrote:
>> > It almost looks like you are reimplementing numpy, in C++ no less. Is
>> > there
>> > any reason why you aren't working with a numpy branch and just adding
>> > ufuncs?
>>
>> I don't know how that would work.  The ufuncs need a datatype to work
>> with, and AFAIK, it would break everything if a numpy ndarray pointed
>> to memory on the GPU.  Could you explain what you mean a little more?
>>
>> > I'm also curious if you have thoughts about how to use the GPU
>> > pipelines in parallel.
>>
>> Current thinking for ufunc type computations:
>> 1) divide up the tensors into subtensors whose dimensions have
>> power-of-two sizes (this permits a fast integer -> ndarray coordinate
>> computation using bit shifting),
>> 2) launch a kernel for each subtensor in its own stream to use
>> parallel pipelines,
>> 3) sync and return (see the sketch below).
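
To make step 1's bit-shifting and step 2's streams concrete, here is a
rough pycuda sketch (hedged: the kernel name and sizes are made up for
illustration, and the coordinates are recomputed only to show the trick;
a real ufunc would use them to index strided inputs):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# Elementwise kernel for a 2-D subtensor whose dims are powers of two:
# the flat index decomposes into coordinates with a shift and a mask
# instead of an integer divide and modulo.
mod = SourceModule("""
__global__ void scale(float *dst, const float *src, int log2_ncols, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int row = i >> log2_ncols;              /* i / ncols */
    int col = i & ((1 << log2_ncols) - 1);  /* i % ncols */
    /* row/col would index strided inputs; here they just rebuild i */
    dst[(row << log2_ncols) + col] = 2.0f * src[i];
}
""")
scale = mod.get_function("scale")

# Two power-of-two-sized subtensors, each in its own stream (step 2),
# so the launches are free to overlap on the hardware.
subtensors = [gpuarray.to_gpu(np.random.randn(8, 16).astype(np.float32))
              for _ in range(2)]
outputs = [gpuarray.empty_like(t) for t in subtensors]
streams = [drv.Stream() for _ in range(2)]

for src, dst, s in zip(subtensors, outputs, streams):
    n = src.size
    scale(dst.gpudata, src.gpudata, np.int32(4), np.int32(n),  # log2(16)==4
          block=(128, 1, 1), grid=((n + 127) // 128, 1), stream=s)

drv.Context.synchronize()        # step 3: sync before using the results
results = [dst.get() for dst in outputs]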
>>
>> This is a pain to do without automatic code generation though.
>> Currently we're using macros, but that's not pretty.
>> C++ has templates, which we don't really use yet but were planning
>> to; these have some power to generate code.
>> The 'theano' project (www.pylearn.org/theano), for which cuda-ndarray
>> was created, has a more powerful code-generation mechanism similar to
>> weave.  That mechanism is used in theano-cuda-ndarray.
>> Scipy.weave could be very useful for generating code for specific
>> shapes/ndims on demand, if weave could use nvcc.
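
As a hedged sketch of that on-demand specialization (with pycuda's
SourceModule standing in for weave+nvcc, and the helper name invented
for illustration):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

_kernel_cache = {}

def specialized_double(log2_dims):
    """Hypothetical helper: generate, compile, and cache an elementwise
    kernel with the index decomposition for fixed power-of-two dims
    baked in as constants, so nvcc can fold the shifts and masks."""
    key = tuple(log2_dims)
    if key not in _kernel_cache:
        decomp, shift = [], 0
        for d, lg in reversed(list(enumerate(log2_dims))):
            decomp.append("int c%d = (i >> %d) & %d;  /* coord %d */"
                          % (d, shift, (1 << lg) - 1, d))
            shift += lg
        source = """
        __global__ void f(float *dst, const float *src, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            %s
            dst[i] = 2.0f * src[i];  /* c0..c%d available for striding */
        }
        """ % ("\n            ".join(decomp), len(log2_dims) - 1)
        _kernel_cache[key] = SourceModule(source).get_function("f")
    return _kernel_cache[key]

# Usage: an (8, 16) subtensor -> log2 dims (3, 4).
x = gpuarray.to_gpu(np.random.randn(8, 16).astype(np.float32))
y = gpuarray.empty_like(x)
f = specialized_double((3, 4))
n = x.size
f(y.gpudata, x.gpudata, np.int32(n),
  block=(128, 1, 1), grid=((n + 127) // 128, 1))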
>>
>> James



-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher