In the benchmark-all.py program, it seems that, since the inner loop in the
plain CPU example runs over range(1000), i.e. exactly 1000 passes:

    for i in range(1000):
        # j is unused: the body is simply repeated 1000 times per element
        for j in range(1000):
            c_result[i] = a[i] + b[i]
            c_result[i] = c_result[i] * (a[i] + b[i])
            c_result[i] = c_result[i] * (a[i] / 2.0)
    # ...

the loop running inside the kernel should use *<= 1000* instead of *< 1000*
(as written it starts at 1, so it makes only 999 passes), or else the loop
should start at zero:

    import pyopencl as cl

    # ctx is a cl.Context created earlier in the program
    prg = cl.Program(ctx, """
        __kernel void sum(__global const float *a,
                          __global const float *b, __global float *c)
        {
            int loop;
            int gid = get_global_id(0);  /* each work-item handles one element */
            /* original: for(loop=1; loop<1000;loop++) */
            for (loop = 0; loop < 1000; loop++)
            {
                c[gid] = a[gid] + b[gid];
                c[gid] = c[gid] * (a[gid] + b[gid]);
                c[gid] = c[gid] * (a[gid] / 2.0);
            }
        }
        """).build()
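The off-by-one can be checked without a GPU. Below is a minimal pure-Python sketch, assuming a 1-D launch in which work-item gid handles element gid (as get_global_id(0) does in the kernel). The helpers c_style_loop_count and emulate_kernel are hypothetical names for illustration, not part of PyOpenCL or benchmark-all.py. The sketch counts the passes made by each loop header and runs the kernel body one work-item at a time.

```python
# Hypothetical helpers for illustration only; not part of PyOpenCL or
# benchmark-all.py. Assumes a 1-D launch where work-item gid handles
# element gid, mirroring get_global_id(0) in the kernel.

def c_style_loop_count(start, stop, inclusive=False):
    """Passes made by `for (loop = start; loop < stop; loop++)`,
    or by `loop <= stop` when inclusive is True."""
    end = stop + 1 if inclusive else stop
    return len(range(start, end))


def emulate_kernel(a, b, start, stop, inclusive=False):
    """Run the kernel body for every work-item, one gid at a time.

    Returns the output list and the number of inner-loop passes per gid.
    """
    n = len(a)
    c = [0.0] * n
    passes = c_style_loop_count(start, stop, inclusive)
    for gid in range(n):          # one outer iteration per work-item
        for _ in range(passes):   # the loop inside the kernel
            c[gid] = a[gid] + b[gid]
            c[gid] = c[gid] * (a[gid] + b[gid])
            c[gid] = c[gid] * (a[gid] / 2.0)
    return c, passes


if __name__ == "__main__":
    print(len(range(1000)))                             # 1000: CPU inner loop
    print(c_style_loop_count(1, 1000))                  # 999: loop=1; loop<1000
    print(c_style_loop_count(1, 1000, inclusive=True))  # 1000: loop=1; loop<=1000
    print(c_style_loop_count(0, 1000))                  # 1000: loop=0; loop<1000
```

Because the body never reads `loop` and recomputes c[gid] from a and b on every pass, the emulated output is the same for 999 and 1000 passes; the loop bounds change only the amount of work done, which is what a timing comparison measures.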

Is this observation correct? Or am I still missing something about how these
threads work? (I am new to the OpenCL programming model.)

--Keith Brafford
_______________________________________________
PyOpenCL mailing list
[email protected]
http://tiker.net/mailman/listinfo/pyopencl_tiker.net
