Also, are you using numpy with MKL?  Those numpy times are really fast.
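
For what it's worth, CUDA events are meant for timing work queued on the GPU, so the numpy figures below may not reflect the real host-side cost. Here is a minimal sketch of timing host code with the standard-library timeit module instead. The helper name best_time and the repeat counts are my own choices, not anything from PyCUDA, and the built-in sum over a list stands in for numpy.sum so the snippet runs without numpy installed.

```python
import timeit


def best_time(fn, repeat=5, number=100):
    """Return the best per-call wall-clock time of fn() in seconds."""
    # timeit.repeat returns one total per round; each total covers `number` calls.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    # The minimum is the round least disturbed by other activity on the machine.
    return min(totals) / number


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to span a few orders of magnitude.
    for n in (4000, 40000, 400000):
        data = list(range(n))
        t = best_time(lambda: sum(data), repeat=3, number=10)
        print("Array size: %d  host time: %.6fs" % (n, t))
```

With numpy available, the same helper should work with something like best_time(lambda: numpy.sum(a)).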

-----Original Message-----
From: pycuda-boun...@tiker.net [mailto:pycuda-boun...@tiker.net] On Behalf Of 
Serra, Mr. Efren, Contractor, Code 7542
Sent: Monday, April 09, 2012 4:26 PM
To: 'Eli Stevens (Gmail)'
Cc: pycuda@tiker.net
Subject: Re: [PyCUDA] numpy.sum 377x faster than gpuarray.sum

Array size: 4,000
GPU array time: 0.000420s
numpy array time: 0.000001s

Array size: 40,000
GPU array time: 0.001648s
numpy array time: 0.000002s

Array size: 400,000
GPU array time: 0.000576s
numpy array time: 0.000002s

Array size: 4,000,000
GPU array time: 0.001961s
numpy array time: 0.000001s

Eli, I have just started to experiment with PyCUDA and was hoping to use it to 
compute the mean and standard deviation of some atmospheric data; however, the 
numbers above don't show much promise.

Efren A. Serra (Contractor)
DeVine Consulting, Inc.
Naval Research Laboratory
Marine Meteorology Division
7 Grace Hopper Ave., STOP 2
Monterey, CA 93943
Code 7542
Office: 831-656-4650

-----Original Message-----
From: Eli Stevens (Gmail) [mailto:wickedg...@gmail.com]
Sent: Monday, April 09, 2012 2:11 PM
To: Serra, Mr. Efren, Contractor, Code 7542
Cc: pycuda@tiker.net
Subject: Re: [PyCUDA] numpy.sum 377x faster than gpuarray.sum

There are fixed startup costs that do not amortize well over only 400 elements.

What happens when you vary the size of the array over several orders of 
magnitude?
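
To make the amortization point concrete, here is a rough sketch of a linear cost model: run time = fixed overhead + n * per-element cost. The function names and all per-element costs are made up for illustration; only the ~0.0004s overhead is loosely based on the GPU timing reported for 400 elements. It is a toy model under those assumptions, not a measurement.

```python
def modeled_time(n, fixed_overhead, per_element):
    """Modeled run time in seconds: a fixed startup cost plus linear work."""
    return fixed_overhead + n * per_element


def break_even_size(gpu_overhead, gpu_per_element, cpu_per_element):
    """Smallest n at which the modeled GPU time drops below the CPU time.

    Assumes zero CPU startup cost. Returns None if the GPU never wins
    in this model (its per-element cost is not lower than the CPU's).
    """
    gain = cpu_per_element - gpu_per_element
    if gain <= 0:
        return None
    # Solve gpu_overhead + n * gpu_per_element < n * cpu_per_element for n.
    return int(gpu_overhead / gain) + 1


if __name__ == "__main__":
    # Hypothetical costs: only the overhead is loosely taken from this thread.
    gpu_overhead = 0.0004      # seconds, roughly the 400-element GPU time
    gpu_per_element = 1e-10    # seconds per element, assumed
    cpu_per_element = 1e-9     # seconds per element, assumed
    for n in (400, 4000, 40000, 400000, 4000000):
        gpu = modeled_time(n, gpu_overhead, gpu_per_element)
        cpu = modeled_time(n, 0.0, cpu_per_element)
        print("n=%8d  modeled GPU: %.6fs  modeled CPU: %.6fs" % (n, gpu, cpu))
    print("break-even n:", break_even_size(gpu_overhead, gpu_per_element, cpu_per_element))
```

In this model the fixed cost dominates at small n, and the GPU only comes out ahead once n * (CPU cost - GPU cost) exceeds the overhead.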

Eli

On Mon, Apr 9, 2012 at 2:05 PM, Serra, Mr. Efren, Contractor, Code
7542 <efren.serra....@nrlmry.navy.mil> wrote:
> import numpy
> """
> """
> import pycuda.driver as cuda
> import pycuda.tools
> import pycuda.gpuarray as gpuarray
> import pycuda.autoinit, pycuda.compiler
>
> a=numpy.arange(400)
> a_gpu=gpuarray.arange(400,dtype=numpy.float32)
>
> start=cuda.Event()
> end=cuda.Event()
> start.record()
> gpuarray.sum(a_gpu).get()/a.size
> end.record()
> end.synchronize()
> print "GPU array time: %fs" %(start.time_till(end)*1e-3)
>
> start.record()
> numpy.sum(a)/a.size
> end.record()
> end.synchronize()
> print "numpy array time: %fs" %(start.time_till(end)*1e-3)
>
> GPU array time: 0.000377s
> numpy array time: 0.000001s
>
> Efren A. Serra (Contractor)
> DeVine Consulting, Inc.
> Naval Research Laboratory
> Marine Meteorology Division
> 7 Grace Hopper Ave., STOP 2
> Monterey, CA 93943
> Code 7542
> Office: 831-656-4650
>
>
> _______________________________________________
> PyCUDA mailing list
> PyCUDA@tiker.net
> http://lists.tiker.net/listinfo/pycuda
