Thanks all for the replies.

My goal is simple. At least, I thought it was simple :-)

I have a function where I calculate the dot product:

import numpy as np

def F(a, b):
    return np.dot(a.T, b)

I need to do this 8k times. The max size of 'a' and 'b' is (3 million, 1).

For smaller sizes of 'a' and 'b', linalg.dot works great. But I want a
more efficient way of doing this on the GPU.

Perhaps the GPU isn't the way to go, since the memory required is too large?
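Since each 'a' and 'b' is (N, 1), every F(a, b) is just a 1x1 scalar, so
the 8k calls are really 8k independent scalar dot products, each of them
memory-bound rather than compute-bound. A minimal, untested CPU-side
sketch of batching them in chunks; it assumes the vectors are stored as
columns of two hypothetical on-disk float32 arrays ('A.dat' and 'B.dat'),
since 8000 columns of 3 million rows is roughly 96 GB and won't fit in RAM:

import numpy as np

# Hypothetical layout: the 8k vectors are the columns of two (N, K)
# float32 memmaps on disk; the full ~96 GB cannot be held in memory.
N, K = 3000000, 8000
A = np.memmap('A.dat', dtype=np.float32, mode='r', shape=(N, K))
B = np.memmap('B.dat', dtype=np.float32, mode='r', shape=(N, K))

dots = np.empty(K, dtype=np.float32)
step = 100  # columns per pass (~1.2 GB per side); tune to available RAM
for j in range(0, K, step):
    # one dot product per column pair: dots[k] = A[:, k] . B[:, k]
    dots[j:j+step] = np.einsum('ij,ij->j', A[:, j:j+step], B[:, j:j+step])

If the GPU route is still tempting: each pair moves 24 MB over PCIe for a
single multiply-add pass, so the transfer, not the arithmetic, sets the speed.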

On Mon, Nov 23, 2015 at 2:26 PM, Stanley Seibert <s...@mtrr.org> wrote:
> From the cuBLAS-XT description:
>
> (https://developer.nvidia.com/cublas)
>
> "By using a streaming design, cuBLAS-XT efficiently manages transfers across 
> the PCI-Express bus automatically, which allows input and output data to be 
> stored on the host’s system memory. This provides out-of-core operation – the 
> size of operand data is only limited by system memory size, not by GPU 
> on-board memory size."
>
> So I don’t think cuBLAS-XT can help unless you have more than 95 GB of system 
> RAM.  If that is not the case, I think you have to step back and think about 
> what you need to do with this array ultimately, and where you want to stage 
> the data if you need to compute all 95 GB of it at once.
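
If the full ~95 GB result can be staged in host memory, or an np.memmap on
disk, the divide and conquer can be done by hand with plain cuBLAS, one
tile at a time. A rough, untested sketch; it assumes scikit-cuda's
skcuda.linalg (which the snippet further down already uses) and a
hypothetical helper dot_tiled with a tile size picked for a 3 GB card:

import numpy as np
import pycuda.autoinit
from pycuda import gpuarray
from skcuda import linalg

linalg.init()

def dot_tiled(x, out, tile=16384):
    # Compute x.dot(x.T) into the preallocated host array 'out',
    # one (tile x tile) block at a time. 'out' can be an np.memmap
    # when the result (~95 GB for n=160080) exceeds system RAM.
    n = x.shape[0]
    for i in range(0, n, tile):
        a_gpu = gpuarray.to_gpu(x[i:i+tile])
        for j in range(0, n, tile):
            b_gpu = gpuarray.to_gpu(x[j:j+tile])
            # each float32 block is ~1 GB at tile=16384, well under 3 GB
            out[i:i+tile, j:j+tile] = linalg.dot(a_gpu, b_gpu, 'N', 'T').get()

PCIe transfers will dominate, but it finishes instead of dying in cuMemAlloc.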
>
>
>> On Nov 23, 2015, at 12:58 PM, Keith Brown <keith6...@gmail.com> wrote:
>>
>> Correct. My result matrix will be too large.
>>
>> <sigh>
>>
>> I would think cublasXt would take care of this for me. I thought it
>> would do some sort of divide and conquer.
>>
>> Is there a way to attack this sort of problem?
>>
>> On Mon, Nov 23, 2015 at 11:38 AM, Jonas Bardino <bard...@nbi.ku.dk> wrote:
>>> Ehmm, I'm not sure I understand exactly what you do, but to me it sounds
>>> like you try to calculate the dot product of a 160080 x 3 matrix and a
>>> similar one transposed, i.e. a 3 x 160080 matrix. That would give you a
>>> 160080 x 160080 matrix result - which surely won't fit your 3GB of GPU
>>> memory.
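
(Indeed: 160080 x 160080 x 4 bytes is about 102.5e9 bytes, i.e. the ~95 GB
figure Stanley mentions above.)
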
>>>
>>> Cheers, Jonas
>>>
>>> On 2015-11-23 17:10, Keith Brown wrote:
>>>> I have 2 small matrices of shape (160080, 3) and type float32, and I
>>>> am calculating their dot product. While doing this, I keep getting
>>>> pycuda._driver.MemoryError: cuMemAlloc failed: out of memory.
>>>>
>>>> I have 2 cards, each with 3GB of memory. Each matrix takes about 1875
>>>> kilobytes. I am not sure why this is occurring.
>>>>
>>>> import numpy as np
>>>> import pycuda.autoinit
>>>> from pycuda import gpuarray
>>>> from skcuda import linalg
>>>> x = np.ones((160080, 3), dtype=np.float32)
>>>> a_gpu = gpuarray.to_gpu(x)
>>>> b_gpu = gpuarray.to_gpu(x)
>>>> c_gpu = linalg.dot(a_gpu, b_gpu, 'N', 'T', handle=handle)
>>>>
>>>> My handle is a cublasXt one (not regular cublas, since cublasXt
>>>> apparently does better memory handling).
>>>>
>>>> Any idea what is going on?
>>>>

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
