I suspect that you have a case of numerical overflow. Can you transfer the
results back from the device and see how many of the elements in the array
are inf?
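
A minimal sketch of that check, assuming the result has already been copied to a host NumPy array (for a PyOpenCL array, `dest_array.get()` from the quoted script would do that). `count_inf` is a hypothetical helper name, and the small stand-in array is only there so the sketch runs without a device:

```python
import numpy as np


def count_inf(host_array):
    """Return (number of +/-inf entries, total number of entries)."""
    host_array = np.asarray(host_array)
    return int(np.isinf(host_array).sum()), int(host_array.size)


# With PyOpenCL the host copy would come from the device array, e.g.
#   result = dest_array.get()   # dest_array as in the quoted script
# A small stand-in array is used here instead (assumption for illustration).
result = np.array([[1.0, np.inf, 3.0],
                   [-np.inf, 5.0, 6.0]], dtype=np.float32)

n_inf, n_total = count_inf(result)
print("%d of %d elements are inf" % (n_inf, n_total))
```

If the count is zero, overflow is probably not the cause and the slowdown has to be looked for elsewhere.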

David
On Dec 19, 2012 11:01 AM, "Simone Riva" <[email protected]> wrote:

> I've written the test code below, in which the call to the OpenCL
> program is made inside a loop. After about 150 iterations I experience
> a dramatic loss of performance, and execution becomes very slow.
>
> What is the best way to call an OpenCL program in a Python for loop, like
> the example below, without any loss of performance?
>
> This is the output (the two loops perform exactly the same operation):
>
> start ....
> Prg  : 0.256917
>
> start b ....
> Prg b: 1.663486
>
>
> Thanks.
>
> The code
> ----------------------------------------------------------------------
>
> import pyopencl as cl
> import pyopencl.array as cla
> import numpy
> import numpy.linalg as la
> import time
>
> lnn = 100000
> szz = lnn*32
>
> a = numpy.random.rand(szz,3).astype(numpy.float32)
> b = numpy.random.rand(szz,3).astype(numpy.float32)
> c = numpy.random.rand(szz,3).astype(numpy.float32)
>
> ctx = cl.create_some_context()
> queue = cl.CommandQueue(ctx)
> queue2 = cl.CommandQueue(ctx)
>
> mf = cl.mem_flags
>
> a_array = cla.to_device( queue , a )
> b_array = cla.to_device( queue , b )
>
> dest_array = cla.Array( queue , (szz,3) , numpy.float32 )
> dest_array_b = cla.Array( queue , (szz,3) , numpy.float32 )
>
> prg_b = cl.Program(ctx, """
>     __kernel void sum_b(__global const float *a,
>         __global const float *b, __global float *c)
>     {
>       int i = get_global_id(0);
>
>       float m = sqrt( pown( a[3*i] , 2 ) + pown( a[3*i+1] , 2 )
>                     + pown( a[3*i+2] , 2 ) ) ;
>
>       c[3*i] = i*10.0f  + m ;
>       c[3*i+1] = i*10.0f + 1 ;
>       c[3*i+2] = i*10.0f + 2 ;
>
>     }
>     """).build()
>
>
>
> rep = 400
>
> print("\nstart ....")
>
> ta = time.time()
> for fooo in range(rep):
>   prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>               dest_array.data )
> tb = time.time()
>
> print( "Prg  : %f" % (tb - ta) )
>
> #dest_array.get( queue , c )
> #print( dest_array )
>
> print("\nstart b ....")
>
> taa = time.time()
> for foo in range(rep):
>   prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>               dest_array_b.data )
> tbb = time.time()
>
> print( "Prg b: %f" % (tbb - taa) )
>
> #dest_array_b.get( queue , c )
> #print ( dest_array_b - dest_array )
>
> _______________________________________________
> PyOpenCL mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pyopencl
>
>