Hi Jayanth,

I can run an 8192x8192 transform on a Tesla C2050 without problems. I
think you are limited by the available video memory; see my previous
message in this thread --- an 8192x4096 complex64 buffer takes 256 MB, and
you have to factor in the temporary buffers PyFFT creates.
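For a quick sanity check, here is a rough estimate of the device memory the
transform needs. The factor of 2 for the temporary buffer is an assumption
based on the `allocate(buffer_size * 2)` call in your traceback; the exact
overhead depends on the PyFFT version:

```python
def fft_memory_mb(shape, bytes_per_element=8, temp_factor=2):
    # Rough device-memory footprint in MB for an in-place complex64 FFT:
    # the data buffer itself plus PyFFT's temporary buffers.
    # temp_factor=2 is an assumption matching allocate(buffer_size * 2).
    n = 1
    for dim in shape:
        n *= dim
    data_bytes = n * bytes_per_element
    temp_bytes = data_bytes * temp_factor
    return (data_bytes + temp_bytes) / (1024.0 ** 2)

for shape in [(4096, 4096), (8192, 4096), (8192, 8192)]:
    print(shape, fft_memory_mb(shape), "MB")
```

By this estimate an 8192x8192 transform already needs about 1.5 GB, which
would explain the failure on a 1 GB card.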

By the way, I would recommend switching from PyFFT to Reikna
(http://reikna.publicfields.net). PyFFT is no longer maintained, and
Reikna includes its code along with some additional features and
optimizations (a more robust block/grid size finder, temporary array
management, launch optimizations, and so on). Your code would look
like:

import numpy
import reikna.cluda as cluda
from reikna.fft import FFT

api = cluda.cuda_api()
thr = api.Thread.create()

# Or, if you want to use an external stream,
#
# import pycuda.driver as cuda
# from pycuda.tools import make_default_context
#
# cuda.init()
# context = make_default_context()
# stream = cuda.Stream()
# thr = api.Thread(stream)

data = numpy.ones((4096, 4096), dtype=numpy.complex64)
gpu_data = thr.to_device(data)  # transfer the array to the GPU

fft = FFT(data).compile(thr)
fft(gpu_data, gpu_data)
result = gpu_data.get()

print result
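Incidentally, the row/column decomposition suggested further down in the
thread is easy to verify on the CPU with NumPy --- a 2D FFT is just a batch
of 1D FFTs along one axis followed by a batch along the other:

```python
import numpy

a = numpy.random.rand(64, 32) + 1j * numpy.random.rand(64, 32)

full_2d = numpy.fft.fft2(a)  # direct 2D transform
# Row-wise 1D FFTs first, then column-wise 1D FFTs
two_pass = numpy.fft.fft(numpy.fft.fft(a, axis=1), axis=0)

print(numpy.allclose(full_2d, two_pass))  # the two agree up to rounding
```

On the GPU each pass becomes a batch of 1D transforms, so no single plan has
to cover the full problem size at once.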


On Fri, Dec 6, 2013 at 3:43 PM, Jayanth Channagiri
<[email protected]> wrote:
> Dear Ahmed
>
> Thank you for the resourceful reply.
>
> But the CUFFT limit is ~2^27, and the benchmarks on the CUFFT site only go
> up to 2^25. In my case, I am able to reach only up to 2^24, so I must be
> missing another factor. Is this limited by my GPU's memory?
> Also, in the same table, you can see that the "Maximum width and height
> for a 2D texture reference bound to a CUDA array" is 65000*65000, which is
> way higher than what I can reach. My GPU has a compute capability of 2.x.
> Thank you for the idea of performing two separate sequential 1D FFTs; I
> will look into it. The thing is, my problem doesn't stop at 2D. My goal
> is to perform a 3D FFT, and I am not sure if I can calculate it that way.
>
>
> For others in the list, here I am sending the complete traceback of the
> error message.
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.7/dist-
> packages/spyderlib/widgets/externalshell/sitecustomize.py", line 493, in
> runfile
>     execfile(filename, namespace)
>   File "/home/jayanth/Dropbox/fft/fft1d_AB.py", line 99, in <module>
>     plan.execute(gpu_data)
>   File
> "/usr/local/lib/python2.7/dist-packages/pyfft-0.3.8-py2.7.egg/pyfft/plan.py",
> line 271, in _executeInterleaved
>     batch, data_in, data_out)
>   File
> "/usr/local/lib/python2.7/dist-packages/pyfft-0.3.8-py2.7.egg/pyfft/plan.py",
> line 192, in _execute
>     self._tempmemobj = self._context.allocate(buffer_size * 2)
>
> pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
>
> Also, here is the simple program I was using to calculate the FFT
> with pyfft:
> from pyfft.cuda import Plan
> import numpy
> import pycuda.driver as cuda
> from pycuda.tools import make_default_context
> import pycuda.gpuarray as gpuarray
>
> cuda.init()
> context = make_default_context()
> stream = cuda.Stream()
>
> plan = Plan((4096, 4096), stream=stream)  # create the plan
> data = numpy.ones((4096, 4096), dtype=numpy.complex64)  # single-precision
> # test data filled with ones
> gpu_data = gpuarray.to_gpu(data)  # convert to a GPU array
> plan.execute(gpu_data)  # run the FFT in place
> result = gpu_data.get()  # fetch the result
>
> This is just a simple program to calculate the 2D FFT of a 4096 * 4096
> array. It works well for this array size or smaller. But as soon as I
> increase it to larger values like 8192*8192 or 8192*4096, it gives an
> error message saying out of memory.
> So I wanted to know the reason behind it and how to overcome it.
> If you can execute the same code, kindly let me know whether you hit the
> same limits on your respective GPUs.
>
> Thank you
>
>
>
> ________________________________
> Date: Thu, 5 Dec 2013 20:27:45 -0500
> Subject: Re: [PyCUDA] cuMemAlloc failed: out of memory
> From: [email protected]
> To: [email protected]
> CC: [email protected]
>
>
> I ran into a similar issue:
> http://stackoverflow.com/questions/13187443/nvidia-cufft-limit-on-sizes-and-batches-for-fft-with-scikits-cuda
>
> The long and short of it is that CUFFT seems to have a limit of
> approximately 2^27 elements that it can operate on, in any combination of
> dimensions. In the StackOverflow post above, I was trying to make a plan for
> large batches of the same 1D FFTs and hit this limitation. You'll also
> notice that the benchmarks on the CUFFT site
> https://developer.nvidia.com/cuFFT go up to sizes of 2^25.
>
> I hypothesize that this is related to the 2^27 "Maximum width for a 1D
> texture reference bound to linear memory" limit that we see in Table 12 of
> the CUDA C Programming Guide
> http://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities.
>
> So since 4096**2 is 2^24, increasing to 8192 by 8192 gets very close to the
> limit, even though you'd think 2D FFTs would not be governed by the same
> limits as 1D FFT batches.
>
> You should be able to achieve 8192 by 8192 and larger 2D FFTs by performing
> two separate sequential 1D FFTs, one horizontal and the other vertical. The
> runtimes should nominally be the same (they are for CPU FFTs), and the
> answer will be the same, up to machine precision.
>
>
> On Thu, Dec 5, 2013 at 9:53 AM, Jayanth Channagiri <[email protected]>
> wrote:
>
> Hello
>
> I have an NVIDIA 2000 GPU. It has 192 CUDA cores and 1 GB of GDDR5 memory.
>
> I am trying to calculate an FFT on the GPU using pyfft.
> I am able to calculate the FFT only up to an array size of 4096 x 4096.
>
> But as soon as I increase the array size, it gives an error message
> saying:
> pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
>
> Can anyone please tell me if this error means that my GPU is not sufficient
> for this array size? Or is it my computer's memory? Or a programming
> error? What is the maximum array size you can achieve on a GPU?
> Is there any information on how else I can calculate such huge arrays?
>
> Thank you very much in advance for the help, and sorry if this is too
> basic a question.
>
> Jayanth
>
>
>
>
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
>
>
