Hi Simone,

It may help to synchronize with the queue before measuring time:

ta = time.time()
for fooo in range(rep):
    prg_b.sum_b(queue, (szz,), None, a_array.data, b_array.data, dest_array.data)
queue.finish()
tb = time.time()

Otherwise you only measure the speed of insertion into the queue: each kernel call returns as soon as the kernel has been enqueued, not when it has finished running.

Best regards,
Bogdan

On Thu, Dec 20, 2012 at 12:04 AM, Simone Riva <[email protected]> wrote:
> I've written the test code below, where I've put the call to the OpenCL
> program in a loop.
> But after about 150 iterations I experience a dramatic loss of
> performance, and execution becomes far too slow.
>
> What's the best way to call an OpenCL program inside a Python for loop,
> like the example below, without any loss of performance?
>
> This is the output (the two loops do exactly the same operation):
>
> start ....
> Prg : 0.256917
>
> start b ....
> Prg b: 1.663486
>
> Tnx.
>
> The code
> ----------------------------------------------------------------------
>
> import pyopencl as cl
> import pyopencl.array as cla
> import numpy
> import numpy.linalg as la
> import time
>
> lnn = 100000
> szz = lnn*32
>
> a = numpy.random.rand(szz,3).astype(numpy.float32)
> b = numpy.random.rand(szz,3).astype(numpy.float32)
> c = numpy.random.rand(szz,3).astype(numpy.float32)
>
> ctx = cl.create_some_context()
> queue = cl.CommandQueue(ctx)
> queue2 = cl.CommandQueue(ctx)
>
> mf = cl.mem_flags
>
> a_array = cla.to_device( queue , a )
> b_array = cla.to_device( queue , b )
>
> dest_array = cla.Array( queue , (szz,3) , numpy.float32 )
> dest_array_b = cla.Array( queue , (szz,3) , numpy.float32 )
>
> prg_b = cl.Program(ctx, """
> __kernel void sum_b(__global const float *a,
>     __global const float *b, __global float *c)
> {
>     int i = get_global_id(0);
>
>     float m = sqrt( pown( a[3*i] , 2 ) + pown( a[3*i+1] , 2 ) + pown( a[3*i+2] , 2 ) ) ;
>
>     c[3*i] = i*10.0f + m ;
>     c[3*i+1] = i*10.0f + 1 ;
>     c[3*i+2] = i*10.0f + 2 ;
>
> }
> """).build()
>
> rep = 400
>
> print("\nstart ....")
>
> ta = time.time()
> for fooo in range(rep):
>     prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>                 dest_array.data )
> tb = time.time()
>
> print( "Prg : %f" % (tb - ta) )
>
> #dest_array.get( queue , c )
> #print dest_array
>
> print("\nstart b ....")
>
> taa = time.time()
> for foo in range(rep):
>     prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>                 dest_array_b.data )
> tbb = time.time()
>
> print( "Prg b: %f" % (tbb - taa) )
>
> #dest_array_b.get( queue , c )
> #print ( dest_array_b - dest_array )
>
> _______________________________________________
> PyOpenCL mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pyopencl
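
To see why the missing queue.finish() changes the numbers, here is a minimal sketch that needs no GPU. It assumes a hypothetical FakeQueue class (not part of PyOpenCL) that mimics an in-order command queue by running jobs on a single background thread, and a hypothetical fake_kernel that just sleeps. Timing the loop without a final finish() measures only how fast jobs are enqueued; adding it measures the work itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeQueue:
    """Hypothetical stand-in for an in-order OpenCL command queue.

    enqueue() returns immediately; one background worker runs the
    submitted jobs one after another, like a device draining its queue.
    """

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)  # in-order execution
        self._pending = []

    def enqueue(self, fn, *args):
        # Like a kernel launch: schedule the work and return at once.
        self._pending.append(self._pool.submit(fn, *args))

    def finish(self):
        # Like queue.finish(): block until every queued job has completed.
        for fut in self._pending:
            fut.result()
        self._pending.clear()


def fake_kernel(seconds):
    # Hypothetical "kernel" whose only work is to take a fixed amount of time.
    time.sleep(seconds)


def time_loop(queue, rep, seconds, sync):
    ta = time.perf_counter()
    for _ in range(rep):
        queue.enqueue(fake_kernel, seconds)
    if sync:
        queue.finish()  # wait for the work before stopping the clock
    tb = time.perf_counter()
    if not sync:
        queue.finish()  # drain outside the timed region so runs don't overlap
    return tb - ta


if __name__ == "__main__":
    q = FakeQueue()
    unsynced = time_loop(q, 20, 0.01, sync=False)
    synced = time_loop(q, 20, 0.01, sync=True)
    print("without finish(): %f s" % unsynced)
    print("with finish():    %f s" % synced)
```

With 20 jobs of 10 ms each, the synchronized timing should come out at roughly 0.2 s or a little more, while the unsynchronized one should be a small fraction of that; exact figures depend on the machine. In PyOpenCL itself, a kernel call returns an event, so waiting on that event or calling queue.finish() plays the role of finish() here.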
