Thanks all for the replies. My goal is simple. At least, I thought it was
simple :-)

I have a function where I calculate the dot product:

    def F(a, b):
        return np.dot(a.T, b)

I need to do this 8k times. The maximum size of 'a' and 'b' is
(3 million, 1). For smaller sizes of a and b, linalg.dot works great, but
I want a more efficient way using the GPU. Perhaps the GPU isn't the way
to go, since the data is too large for its memory?
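For what it's worth, a minimal CPU sketch of the batch I'm describing
(sizes shrunk so it runs quickly; names and sizes here are illustrative):

    import numpy as np

    # Illustrative sizes; the real problem is n = 3,000,000 and k = 8000
    # pairs, where each stacked array would be ~96 GB, so in practice the
    # pairs would have to be streamed through in row chunks.
    n, k = 100000, 8

    # Stack the k column vectors into two (k, n) arrays so the whole
    # batch becomes one vectorized operation instead of k np.dot calls.
    A = np.random.rand(k, n).astype(np.float32)
    B = np.random.rand(k, n).astype(np.float32)

    # Each a_i.T.dot(b_i) is a single scalar, so the batch is just a
    # row-wise multiply-and-sum.
    dots = np.einsum('ij,ij->i', A, B)

    # Equivalent to: np.array([np.dot(A[i], B[i]) for i in range(k)])

Since each pair reduces to one scalar, the memory pressure comes entirely
from holding the inputs, not the results.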
On Mon, Nov 23, 2015 at 2:26 PM, Stanley Seibert <s...@mtrr.org> wrote:
> From the cuBLAS-XT description (https://developer.nvidia.com/cublas):
>
> "By using a streaming design, cuBLAS-XT efficiently manages transfers
> across the PCI-Express bus automatically, which allows input and output
> data to be stored on the host's system memory. This provides out-of-core
> operation: the size of operand data is only limited by system memory
> size, not by GPU on-board memory size."
>
> So I don't think cuBLAS-XT can help unless you have more than 95 GB of
> system RAM. If that is not the case, I think you have to step back and
> think about what you need to do with this array ultimately, and where
> you want to stage the data if you need to compute all 95 GB of it at
> once.
>
>> On Nov 23, 2015, at 12:58 PM, Keith Brown <keith6...@gmail.com> wrote:
>>
>> Correct. My result matrix will be too large.
>>
>> <sigh>
>>
>> I would have thought cublasXt would take care of this for me, doing
>> some sort of divide and conquer.
>>
>> Is there a way to attack this sort of problem?
>>
>> On Mon, Nov 23, 2015 at 11:38 AM, Jonas Bardino <bard...@nbi.ku.dk> wrote:
>>> Ehmm, I'm not sure I understand exactly what you're doing, but to me
>>> it sounds like you're trying to calculate the dot product of a
>>> 160080 x 3 matrix and a similar one transposed, i.e. a 3 x 160080
>>> matrix. That would give you a 160080 x 160080 result matrix, which
>>> surely won't fit in your 3 GB of GPU memory.
>>>
>>> Cheers, Jonas
>>>
>>> On 2015-11-23 17:10, Keith Brown wrote:
>>>> I have two small matrices of shape (160080, 3) and type float32, and
>>>> I am calculating their dot product. While doing this, I keep getting
>>>> pycuda._driver.MemoryError: cuMemAlloc failed: out of memory.
>>>>
>>>> I have 2 cards, each with 3 GB of memory. Each matrix takes about
>>>> 1875 kilobytes. I am not sure why this is occurring.
>>>>
>>>>     x = np.ones((160080, 3L)).astype(np.float32)
>>>>     a_gpu = gpuarray.to_gpu(x)
>>>>     b_gpu = gpuarray.to_gpu(x)
>>>>     c_gpu = linalg.dot(a_gpu, b_gpu, 'N', 'T', handle=handle)
>>>>
>>>> My handle is a cublasXt handle (not regular cublas, since cublasXt
>>>> apparently does better memory handling).
>>>>
>>>> Any idea what is going on?
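Following up on Stanley's point: if the full 95 GB product is never needed
at once, a divide-and-conquer version can compute it one row block at a
time and reduce each block before moving on. A rough CPU sketch, where the
block size and the per-block row sum are placeholders for whatever
reduction is actually wanted:

    import numpy as np

    x = np.ones((160080, 3), dtype=np.float32)

    block = 1024                      # rows per tile; tune to fit memory
    acc = np.zeros(x.shape[0], dtype=np.float64)

    for start in range(0, x.shape[0], block):
        stop = min(start + block, x.shape[0])
        # One (block, 160080) tile of the full x.dot(x.T) product;
        # at float32 each tile is about 650 MB instead of 95 GB.
        tile = x[start:stop].dot(x.T)
        # Consume the tile immediately; the row sum here is just a
        # stand-in for whatever is actually needed from the result.
        acc[start:stop] = tile.sum(axis=1)

The same tiling carries over to the GPU: upload one row block, multiply
against x.T, reduce on the device, and copy back only the reduced result.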