There are fixed per-call startup costs (kernel launch overhead, and here the device-to-host copy that `.get()` performs) that do not amortize well over only 400 elements.
What happens when you vary the size of the array over several orders of magnitude?

Eli

On Mon, Apr 9, 2012 at 2:05 PM, Serra, Mr. Efren, Contractor, Code 7542
<efren.serra....@nrlmry.navy.mil> wrote:
> import numpy
> """
> """
> import pycuda.driver as cuda
> import pycuda.tools
> import pycuda.gpuarray as gpuarray
> import pycuda.autoinit, pycuda.compiler
>
> a=numpy.arange(400)
> a_gpu=gpuarray.arange(400,dtype=numpy.float32)
>
> start=cuda.Event()
> end=cuda.Event()
> start.record()
> gpuarray.sum(a_gpu).get()/a.size
> end.record()
> end.synchronize()
> print "GPU array time: %fs" %(start.time_till(end)*1e-3)
>
> start.record()
> numpy.sum(a)/a.size
> end.record()
> end.synchronize()
> print "numpy array time: %fs" %(start.time_till(end)*1e-3)
>
> GPU array time: 0.000377s
> numpy array time: 0.000001s
>
> Efren A. Serra (Contractor)
> DeVine Consulting, Inc.
> Naval Research Laboratory
> Marine Meteorology Division
> 7 Grace Hopper Ave., STOP 2
> Monterey, CA 93943
> Code 7542
> Office: 831-656-4650
>
> _______________________________________________
> PyCUDA mailing list
> PyCUDA@tiker.net
> http://lists.tiker.net/listinfo/pycuda
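The suggested sweep could be sketched as below (a rough illustration, not a tested benchmark; `cpu_mean_time` and `gpu_mean_time` are hypothetical helper names, and the GPU path is guarded so the script also runs on machines without PyCUDA):

```python
import timeit
import numpy

def cpu_mean_time(n, repeats=20):
    # Best-of-N wall-clock time of numpy.sum(a)/a.size for an array of n floats.
    a = numpy.arange(n, dtype=numpy.float32)
    return min(timeit.repeat(lambda: numpy.sum(a) / a.size,
                             number=1, repeat=repeats))

# Sweep several orders of magnitude, as suggested above.
sizes = [10**k for k in range(2, 8)]
for n in sizes:
    t = cpu_mean_time(n)
    print("n=%-9d  numpy: %.6fs  (%.2e s/element)" % (n, t, t / n))

# The same sweep on the GPU; guarded import so the script degrades gracefully.
try:
    import pycuda.autoinit
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray

    def gpu_mean_time(n, repeats=20):
        # Time gpuarray.sum plus the device-to-host .get() of the result,
        # using CUDA events as in the original post.
        a_gpu = gpuarray.arange(n, dtype=numpy.float32)
        start, end = cuda.Event(), cuda.Event()
        best = float("inf")
        for _ in range(repeats):
            start.record()
            gpuarray.sum(a_gpu).get() / n
            end.record()
            end.synchronize()
            best = min(best, start.time_till(end) * 1e-3)
        return best

    for n in sizes:
        t = gpu_mean_time(n)
        print("n=%-9d  gpu:   %.6fs  (%.2e s/element)" % (n, t, t / n))
except ImportError:
    print("PyCUDA not available; skipped GPU sweep")
```

On the CPU side, the per-element cost should drop sharply as n grows, since numpy's fixed call overhead dominates at n=100; the GPU column would show where (if anywhere, for a plain sum) the device overtakes the host.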