Dear PyOpenCL community,

I have two implementations of the same algorithm using local (shared) memory,
and I do not understand why one works and the other does not.
The second one is the one that gives wrong results.

hist is a cyclic histogram with 36 bins. We want to replace each bin by the
average of itself and its two neighbours, and repeat this smoothing 6 times.

WORKGROUP_SIZE = 128 in this case

__local volatile float hist[36];
__local volatile float hist2[WORKGROUP_SIZE];
int lid0 = get_local_id(0);
/*
        Version 1 (gives the expected result): apply smoothing 6 times
*/

for (j=0; j<6; j++) {
        if (lid0 == 0) {
                hist2[0] = hist[0]; //save unmodified hist
                hist[0] = (hist[35] + hist[0] + hist[1]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        if (0 < lid0 && lid0 < 35) {
                hist2[lid0]=hist[lid0];
                hist[lid0] = (hist2[lid0-1] + hist[lid0] + hist[lid0+1]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        if (lid0 == 35) {
                hist[35] = (hist2[34] + hist[35] + hist[0]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
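}

/*
        Version 2 (gives wrong results): 3 iterations of 2 smoothing passes,
        hist -> hist2 -> hist, i.e. 6 passes in total
*/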
for (j=0; j<3; j++) {
        if (lid0 < 36 ) {
                prev = (lid0 == 0 ? 35 : lid0 - 1);
                next = (lid0 == 35 ? 0 : lid0 + 1);
                hist2[lid0] = (hist[prev] + hist[lid0] + hist[next]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        if (lid0 < 36 ) {
                prev = (lid0 == 0 ? 35 : lid0 - 1);
                next = (lid0 == 35 ? 0 : lid0 + 1);
                hist[lid0] = (hist2[prev] + hist2[lid0] + hist2[next]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
}
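
To compare the two update orders off-device, they can be written as a sequential C sketch. This is only an illustration under an assumption: that the first version is meant as an in-place sweep in which bin 35 reads the already-updated hist[0], as the listing above does. The function names and the NBINS macro are hypothetical and not part of our kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NBINS 36  /* histogram length used in the kernel fragments */

/* In-place sweep following version 1: every bin uses the saved OLD value of
 * its left neighbour and the not-yet-updated right neighbour, except the
 * last bin, which reads the ALREADY UPDATED hist[0] (assumed intent). */
void smooth_inplace_sweep(float *hist, size_t n, int passes)
{
    assert(n >= 3);
    for (int p = 0; p < passes; p++) {
        float prev = hist[n - 1];            /* old left neighbour of bin 0 */
        for (size_t i = 0; i < n; i++) {
            float old = hist[i];             /* keep the unmodified value */
            float next = hist[(i + 1 == n) ? 0 : i + 1];
            hist[i] = (prev + old + next) / 3.0f;
            prev = old;
        }
    }
}

/* Double-buffered smoothing following version 2: every bin of a pass is
 * computed from the values of the previous pass only. Copying tmp back is
 * equivalent to alternating between hist and hist2. */
void smooth_double_buffer(float *hist, size_t n, int passes)
{
    float tmp[NBINS];
    assert(n >= 3 && n <= NBINS);
    for (int p = 0; p < passes; p++) {
        for (size_t i = 0; i < n; i++) {
            size_t prev = (i == 0) ? n - 1 : i - 1;
            size_t next = (i + 1 == n) ? 0 : i + 1;
            tmp[i] = (hist[prev] + hist[i] + hist[next]) / 3.0f;
        }
        for (size_t i = 0; i < n; i++) {
            hist[i] = tmp[i];
        }
    }
}
```

With a single spike (hist[0] = 9, all other bins 0) and one pass, the sweep gives hist[35] = 1 while the double-buffered version gives hist[35] = 3, so under this assumption the two listings are not expected to agree bin for bin at the wrap-around.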

Do you have any idea why the second version is wrong?
We tested on two platforms (NVIDIA and AMD CPU) and spent the whole day debugging.

How can one debug this kind of problem?

Thanks a lot.

Cheers,
-- 
Jérôme Kieffer
Data analysis unit - ESRF
