Does anyone have any thoughts? Is this feasible?
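One sanity check on sizes: with 'a' and 'b' of shape (3 million, 1), np.dot(a.T, b) is a 1x1 result (an inner product), so each of the 8k calls is cheap; the huge-result problem only appears for the dot(a, b.T) orientation. If the inner product is what's needed, it also streams nicely in chunks, so only a slice has to be resident at a time. A minimal CPU-only sketch (chunk size is arbitrary, data is made up):

```python
import numpy as np

# np.dot(a.T, b) for column vectors is a scalar inner product, so it can
# be accumulated chunk by chunk; only one chunk is in flight at a time.
n, chunk = 3_000_000, 500_000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, 1)).astype(np.float32)
b = rng.standard_normal((n, 1)).astype(np.float32)

acc = 0.0
for i in range(0, n, chunk):
    acc += float(np.dot(a[i:i + chunk].T, b[i:i + chunk])[0, 0])

full = float(np.dot(a.T, b)[0, 0])  # one-shot result; same value up to rounding
```

The same chunking maps directly onto the GPU (copy a chunk over, reduce it with cublasSdot or a gpuarray dot, discard it), so GPU memory only ever has to hold one chunk.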

On Mon, Nov 23, 2015 at 3:14 PM, Keith Brown <keith6...@gmail.com> wrote:
> Thanks all for the replies.
>
> My goal is simple. At least, I thought it was simple :-)
>
> I have function where I calculate the dot product
>
> def F(a,b):
>   return np.dot(a.T,b)
>
> I need to do this 8k times. The max size of 'a' and 'b' is (3 million, 1).
>
> For smaller sizes of 'a' and 'b', linalg.dot works great, but I want a
> more efficient way using the GPU.
>
> Perhaps the GPU isn't the way to go, since the data is too large for its memory?
>
> On Mon, Nov 23, 2015 at 2:26 PM, Stanley Seibert <s...@mtrr.org> wrote:
>> From the cuBLAS-XT description:
>>
>> (https://developer.nvidia.com/cublas)
>>
>> "By using a streaming design, cuBLAS-XT efficiently manages transfers across 
>> the PCI-Express bus automatically, which allows input and output data to be 
>> stored on the host’s system memory. This provides out-of-core operation – 
>> the size of operand data is only limited by system memory size, not by GPU 
>> on-board memory size."
>>
>> So I don’t think cuBLAS-XT can help unless you have more than 95 GB of 
>> system RAM.  If that is not the case, I think you have to step back and 
>> think about what you need to do with this array ultimately, and where you 
>> want to stage the data if you need to compute all 95 GB of it at once.
>>
>>
>>> On Nov 23, 2015, at 12:58 PM, Keith Brown <keith6...@gmail.com> wrote:
>>>
>>> Correct. My result matrix will be too large.
>>>
>>> <sigh>
>>>
>>> I would have thought cublasXt would take care of this for me; I thought
>>> it would do some sort of divide-and-conquer.
>>>
>>> Is there a way to attack this sort of problem?
>>>
>>> On Mon, Nov 23, 2015 at 11:38 AM, Jonas Bardino <bard...@nbi.ku.dk> wrote:
>>>> Ehmm, I'm not sure I understand exactly what you do, but to me it sounds
>>>> like you try to calculate the dot product of a 160080 x 3 matrix and a
>>>> similar one transposed, i.e. a 3 x 160080 matrix. That would give you a
>>>> 160080 x 160080 matrix result - which surely won't fit your 3GB of GPU
>>>> memory.
>>>>
>>>> Cheers, Jonas
>>>>
>>>> On 2015-11-23 17:10, Keith Brown wrote:
>>>>> I have 2 small matrices of shape (160080, 3) and type float32, and I am
>>>>> calculating their dot product. While doing this, I keep getting
>>>>> pycuda._driver.MemoryError: cuMemAlloc failed: out of memory.
>>>>>
>>>>> I have 2 cards, each with 3GB of memory. Each matrix takes about 1875
>>>>> kilobytes. I am not sure why this is occurring.
>>>>>
>>>>> x = np.ones((160080, 3)).astype(np.float32)
>>>>> a_gpu = gpuarray.to_gpu(x)
>>>>> b_gpu = gpuarray.to_gpu(x)
>>>>> c_gpu = linalg.dot(a_gpu, b_gpu, 'N', 'T', handle=handle)
>>>>>
>>>>> My handle is a cublasXt handle (not regular cublas, since blasxt apparently
>>>>> does better memory handling).
>>>>>
>>>>> Any idea what is going on?
>>>>>
>>>>> _______________________________________________
>>>>> PyCUDA mailing list
>>>>> PyCUDA@tiker.net
>>>>> http://lists.tiker.net/listinfo/pycuda
>>>>>
>>>
>>
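On the out-of-core case from the thread above: the full product is 160080 x 160080 in float32, i.e. 160080^2 * 4 bytes, which is about 95 GiB, matching Stanley's figure. The usual fallback is tiling: compute one block of C = A @ A.T at a time and reduce or consume it immediately, never materializing C. A toy CPU-only sketch (the tile size and the per-row-sum reduction are illustrative choices, not from the thread):

```python
import numpy as np

# Tiled evaluation of C = A @ A.T without ever materializing C.
# Each (tile x tile) block is computed, reduced, and discarded.
N, K, tile = 1000, 3, 256          # toy sizes; the thread's N is 160080
rng = np.random.default_rng(1)
A = rng.standard_normal((N, K)).astype(np.float32)

row_sums = np.zeros(N, dtype=np.float64)   # example reduction over C's rows
for i in range(0, N, tile):
    for j in range(0, N, tile):
        block = A[i:i + tile] @ A[j:j + tile].T   # one tile of C
        row_sums[i:i + tile] += block.sum(axis=1)

dense = (A @ A.T).sum(axis=1)   # full-matrix check, feasible only at toy size
```

This is roughly what cuBLAS-XT does internally, streaming tiles through GPU memory; the catch, as noted in the thread, is that it still expects the complete result to land somewhere (host RAM), so tiling only helps if you can consume each tile instead of keeping all of C.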
