Hi Simone,

It may help to wait for the queue to finish before taking the end timestamp:

ta = time.time()
for fooo in range(rep):
  prg_b.sum_b(queue, (szz,), None, a_array.data, b_array.data,
              dest_array.data)
queue.finish()  # block until every queued kernel has actually run
tb = time.time()

Otherwise you only measure how long it takes to insert the kernel calls into
the queue, not how long the kernels take to run.
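
The effect can be reproduced without a GPU. What follows is a minimal sketch, not PyOpenCL code: `FakeQueue`, `launch` and `timed` are hypothetical names, and a single worker thread stands in for the device. The only assumption carried over from PyOpenCL is that enqueueing returns immediately while `finish()` blocks until the queued work is done.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


class FakeQueue:
    """Hypothetical stand-in for a command queue: work runs asynchronously."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def enqueue(self, seconds):
        # Returns immediately, like a kernel launch; the work runs later.
        self._pending.append(self._pool.submit(time.sleep, seconds))

    def finish(self):
        # Blocks until all queued work has completed.
        wait(self._pending)
        self._pending.clear()


def timed(queue, enqueue, rep, sync):
    """Time `rep` enqueues, optionally waiting for the work to complete."""
    queue.finish()  # drain earlier work so it is not charged to this loop
    start = time.time()
    for _ in range(rep):
        enqueue()
    if sync:
        queue.finish()
    return time.time() - start


queue = FakeQueue()


def launch():
    # Each simulated kernel takes 10 ms to run.
    queue.enqueue(0.01)


t_enqueue = timed(queue, launch, 20, sync=False)
t_total = timed(queue, launch, 20, sync=True)
queue.finish()

print("enqueue only: %f" % t_enqueue)
print("with finish : %f" % t_total)
```

The first figure only reflects the cost of submitting work, while the second includes the time the work takes to run. With PyOpenCL the same pattern would presumably apply by passing the real `queue` and a function that wraps the `prg_b.sum_b(...)` call.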

Best regards,
Bogdan

On Thu, Dec 20, 2012 at 12:04 AM, Simone Riva <[email protected]> wrote:
> I've written the test code below, in which the call to the OpenCL program
> is placed inside a loop.
> But after about 150 iterations I'm experiencing a dramatic loss of
> performance, and execution becomes very slow.
>
> What's the best way to call an OpenCL program in a Python for loop, like
> the example below, without any loss of performance?
>
> This is the output (the two loops perform exactly the same operation):
>
> start ....
> Prg  : 0.256917
>
> start b ....
> Prg b: 1.663486
>
>
> Thanks.
>
> The code
> ----------------------------------------------------------------------
>
> import pyopencl as cl
> import pyopencl.array as cla
> import numpy
> import numpy.linalg as la
> import time
>
> lnn = 100000
> szz = lnn*32
>
> a = numpy.random.rand(szz,3).astype(numpy.float32)
> b = numpy.random.rand(szz,3).astype(numpy.float32)
> c = numpy.random.rand(szz,3).astype(numpy.float32)
>
> ctx = cl.create_some_context()
> queue = cl.CommandQueue(ctx)
> queue2 = cl.CommandQueue(ctx)
>
> mf = cl.mem_flags
>
> a_array = cla.to_device( queue , a )
> b_array = cla.to_device( queue , b )
>
> dest_array = cla.Array( queue , (szz,3) , numpy.float32 )
> dest_array_b = cla.Array( queue , (szz,3) , numpy.float32 )
>
> prg_b = cl.Program(ctx, """
>     __kernel void sum_b(__global const float *a,
>        __global const float *b, __global float *c)
>     {
>       int i = get_global_id(0);
>
>       float m = sqrt( pown( a[3*i] , 2 )  + pown( a[3*i+1] , 2 )
>                       + pown( a[3*i+2] , 2 ) ) ;
>
>       c[3*i] = i*10.0f  + m ;
>       c[3*i+1] = i*10.0f + 1 ;
>       c[3*i+2] = i*10.0f + 2 ;
>
>     }
>     """).build()
>
>
>
> rep = 400
>
> print("\nstart ....")
>
> ta = time.time()
> for fooo in range(rep):
>   prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>               dest_array.data )
> tb = time.time()
>
> print( "Prg  : %f" % (tb - ta) )
>
> #dest_array.get( queue , c )
> #print dest_array
>
> print("\nstart b ....")
>
> taa = time.time()
> for foo in range(rep):
>   prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>               dest_array_b.data )
> tbb = time.time()
>
> print( "Prg b: %f" % (tbb - taa) )
>
> #dest_array_b.get( queue , c )
> #print ( dest_array_b - dest_array )
>
> _______________________________________________
> PyOpenCL mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pyopencl
>
