Hi Ahmed,

On Fri, Dec 6, 2013 at 12:27 PM, Ahmed Fasih <[email protected]> wrote:
> I ran into a similar issue:
> http://stackoverflow.com/questions/13187443/nvidia-cufft-limit-on-sizes-and-batches-for-fft-with-scikits-cuda

A batch of 10000 arrays of 64x1024 complex64 values (8 bytes per element) amounts to about 5 GB of data, which wouldn't fit in the 2.5 GB of memory on a C2050 anyway :) Even on a C2070 it probably wouldn't work, since it would require at least one temporary intermediate array of the same size.

> I hypothesize that this is related to the 2^27 "Maximum width for a 1D
> texture reference bound to linear memory" limit that we see in Table 12 of
> the CUDA C Programming Guide
> http://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities.

I doubt that CUFFT uses textures internally; I don't see any advantage in them over normal global memory. My guess is that it has something to do with grid size limitations or with the data sizes of the variables used internally for indexing.

Also, I don't think that's what is happening in Jayanth's case; for him it's probably just a lack of [free] global memory. An 8192x8192 complex64 array is about 500 MB; add an output array and one or two temporary ones and you can easily exceed the capacity of your video card.

> You should be able to achieve 8096 by 8096 and larger 2D FFTs by performing
> two separate sequential 1D FFTs, one horizontal and the other vertical. The
> runtimes should nominally be the same (they are for CPU FFTs), and the
> answer will be the same, up to machine precision.

Isn't that how multidimensional FFTs are usually implemented?

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
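
P.S. For what it's worth, the row-column decomposition from the last quoted paragraph is easy to check on a small array. Below is a minimal sketch in plain Python. The naive `dft` is only a stand-in for a real 1D FFT routine (it is not CUFFT or any library API), and all function names are made up for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) 1D DFT; a stand-in for a real 1D FFT routine."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft2_row_column(a):
    """2D DFT as two passes of 1D transforms: every row, then every column."""
    rows = [dft(row) for row in a]                  # horizontal pass
    cols = [dft(list(col)) for col in zip(*rows)]   # vertical pass, on the transpose
    return [list(row) for row in zip(*cols)]        # transpose back


def dft2_direct(a):
    """2D DFT straight from the definition, used only as a reference."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]


def max_abs_diff(p, q):
    """Largest element-wise absolute difference between two 2D lists."""
    return max(abs(x - y) for pr, qr in zip(p, q) for x, y in zip(pr, qr))


# Small 3x4 complex test array (arbitrary values).
a = [[complex(3 * r + c, r - c) for c in range(4)] for r in range(3)]
err = max_abs_diff(dft2_row_column(a), dft2_direct(a))
print(err)  # expected to be at rounding-error level
```

With a real FFT library each pass would presumably be one batched 1D transform, and the column pass needs either a strided layout or an explicit transpose, which is where extra temporary memory can creep in.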
