2009/8/6 Erik Tollerud <erik.tolle...@gmail.com>:
> Note that this is from a "user" perspective, as I have no particular plan
> of developing the details of this implementation, but I've thought for a
> long time that GPU support could be great for numpy (I would also vote
> for OpenCL support over CUDA, although conceptually they seem quite
> similar)...
> But what exactly would the large-scale plan be? One of the advantages of
> GPGPUs is that they are particularly suited to rather complicated
> parallelizable algorithms,
You mean simple parallelizable algorithms, I suppose?

> and the numpy-level basic operations are just the simple arithmetic
> operations. So while I'd love to see it working, it's unclear to me
> exactly how much is gained at the core numpy level, especially given
> that it's limited to single precision on most GPUs.
> Now linear algebra or FFTs on a GPU would probably be a huge boon, I'll
> admit - especially if it's in the form of a drop-in replacement for the
> numpy or scipy versions.
> By the way, I noticed no one mentioned the GPUArray class in pycuda (and
> it looks like there's something similar in pyopencl) - it seems like
> that's already done a fair amount of the work...
> http://documen.tician.de/pycuda/array.html#pycuda.gpuarray.GPUArray
>
> On Thu, Aug 6, 2009 at 10:41 AM, James Bergstra
> <bergs...@iro.umontreal.ca> wrote:
>>
>> On Thu, Aug 6, 2009 at 1:19 PM, Charles R
>> Harris <charlesr.har...@gmail.com> wrote:
>> > It almost looks like you are reimplementing numpy, in C++ no less.
>> > Is there any reason why you aren't working with a numpy branch and
>> > just adding ufuncs?
>>
>> I don't know how that would work. The ufuncs need a datatype to work
>> with, and AFAIK it would break everything if a numpy ndarray pointed
>> to memory on the GPU. Could you explain what you mean a little more?
>>
>> > I'm also curious if you have thoughts about how to use the GPU
>> > pipelines in parallel.
>>
>> Current thinking for ufunc-type computations:
>> 1) divide the tensors into subtensors whose dimensions have
>> power-of-two sizes (this permits a fast integer -> ndarray coordinate
>> computation using bit shifting),
>> 2) launch a kernel for each subtensor in its own stream to use the
>> parallel pipelines,
>> 3) sync and return.
>>
>> This is a pain to do without automatic code generation, though.
>> Currently we're using macros, but that's not pretty.
>> C++ has templates, which we don't really use yet but were planning to
>> use; these have some power to generate code.
>> The 'theano' project (www.pylearn.org/theano), for which cuda-ndarray
>> was created, has a more powerful code generation mechanism similar to
>> weave. This algorithm is used in theano-cuda-ndarray.
>> Scipy.weave could be very useful for generating code for specific
>> shapes/ndims on demand, if weave could use nvcc.
>>
>> James

--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
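
Returning to James's point 1) above: here is a minimal pure-Python sketch
of the flat-index-to-coordinates trick he describes. The shape, the log2
table, and the helper name are illustrative assumptions, not code taken
from cuda-ndarray:

    # Sketch: map a flat element index to ndarray coordinates using only
    # shifts and masks; this works when every dimension is a power of two.
    # Shape and names below are hypothetical, for illustration only.
    shape = (4, 8, 16)       # power-of-two dimensions
    log2_dims = (2, 3, 4)    # log2 of each dimension

    def coords(flat_index):
        """Return (i, j, k) for flat_index in C (row-major) order."""
        out = []
        for axis in reversed(range(len(shape))):
            out.append(flat_index & ((1 << log2_dims[axis]) - 1))  # low bits
            flat_index >>= log2_dims[axis]                # shift to next axis
        return tuple(reversed(out))

    assert coords(0) == (0, 0, 0)
    assert coords(2*128 + 3*16 + 7) == (2, 3, 7)   # strides are (128, 16, 1)

In a CUDA kernel the same computation replaces the integer division and
modulo that arbitrary strides would require, which is presumably where the
power-of-two restriction buys its speed.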