In the benchmark-all.py program, it seems that since the inner loop in
the normal CPU example is over range(1000):
for i in range(1000):
    for j in range(1000):
        c_result[i] = a[i] + b[i]
        c_result[i] = c_result[i] * (a[i] + b[i])
        c_result[i] = c_result[i] * (a[i] / 2.0)
....
the loop that's running inside the kernel should be <= 1000 instead of
< 1000, or perhaps the loop should start at zero:
prg = cl.Program(ctx, """
    __kernel void sum(__global const float *a,
    __global const float *b, __global float *c)
    {
        int loop;
        int gid = get_global_id(0);
        /* for(loop=1; loop<1000;loop++) */
        for (loop = 0; loop < 1000; loop++)
        {
            c[gid] = a[gid] + b[gid];
            c[gid] = c[gid] * (a[gid] + b[gid]);
            c[gid] = c[gid] * (a[gid] / 2.0);
        }
    }
    """).build()
Is this observation correct? Or am I still missing something with how these
threads work? (I am new to the OpenCL programming model).
--Keith Brafford