Dear PyOpenCL community,
I have two implementations of the same algorithm using shared (local) memory,
and I do not understand why one works and the other does not.
The second one is the one that gives wrong results.
hist is a cyclic histogram with 36 bins; each bin should be replaced by the
average of itself and its two neighbours, and this smoothing is applied 6 times.
WORKGROUP_SIZE = 128 in this case.

Declarations shared by both versions:

    __local volatile float hist[36];
    __local volatile float hist2[WORKGROUP_SIZE];
    int lid0 = get_local_id(0);

First version (gives the expected result):

    /*
        Apply smoothing 6 times
    */
    for (j=0; j<6; j++) {
        if (lid0 == 0) {
            hist2[0] = hist[0]; //save unmodified hist
            hist[0] = (hist[35] + hist[0] + hist[1]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        if (0 < lid0 && lid0 < 35) {
            hist2[lid0]=hist[lid0];
            hist[lid0] = (hist2[lid0-1] + hist[lid0] + hist[lid0+1]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        if (lid0 == 35) {
            hist[35] = (hist2[34] + hist[35] + hist[0]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }
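
For reference, a minimal host-side sketch of what one pass of the first loop computes, assuming the work-items effectively run one after another in increasing lid0. That ordering is only an assumption: OpenCL guarantees no ordering between barriers, and the middle branch reads hist2[lid0-1] and hist[lid0+1] while neighbouring work-items may be writing them. The function names smooth_in_place and circular_average are hypothetical, and Python floats are double precision rather than float32. Under this assumption the pass matches a pure circular 3-point average everywhere except bin 35, which sees the already-updated hist[0].

```python
def smooth_in_place(hist):
    """One pass of the first loop, emulated serially (assumed lid0 order)."""
    n = len(hist)
    h = list(hist)
    saved = [0.0] * n  # plays the role of hist2

    # lid0 == 0
    saved[0] = h[0]
    h[0] = (h[n - 1] + h[0] + h[1]) / 3.0

    # 0 < lid0 < n-1: left neighbour comes from the saved (old) value
    for i in range(1, n - 1):
        saved[i] = h[i]
        h[i] = (saved[i - 1] + h[i] + h[i + 1]) / 3.0

    # lid0 == n-1: h[0] has already been overwritten at this point
    h[n - 1] = (saved[n - 2] + h[n - 1] + h[0]) / 3.0
    return h


def circular_average(hist):
    """One pass of a pure circular 3-point average (old values only)."""
    n = len(hist)
    return [(hist[(i - 1) % n] + hist[i] + hist[(i + 1) % n]) / 3.0
            for i in range(n)]


# Impulse in bin 0 makes the wrap-around behaviour easy to see.
hist = [0.0] * 36
hist[0] = 3.0

a = smooth_in_place(hist)
b = circular_average(hist)
differing = [i for i in range(36) if abs(a[i] - b[i]) > 1e-12]
print(differing)  # -> [35] under the serial-order assumption
```

If the reference results come from a serial in-place loop of this shape, a difference at the wrap-around bin would be expected rather than a bug in the kernel.
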

Second version (gives wrong results):

    for (j=0; j<3; j++) {
        if (lid0 < 36 ) {
            prev = (lid0 == 0 ? 35 : lid0 - 1);
            next = (lid0 == 35 ? 0 : lid0 + 1);
            hist2[lid0] = (hist[prev] + hist[lid0] + hist[next]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        if (lid0 < 36 ) {
            prev = (lid0 == 0 ? 35 : lid0 - 1);
            next = (lid0 == 35 ? 0 : lid0 + 1);
            hist[lid0] = (hist2[prev] + hist2[lid0] + hist2[next]) / 3.0f;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }

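
As a sanity check on the second loop, here is a small host-side sketch. Each phase reads one buffer and writes the other, with a barrier in between, so a serial emulation should be faithful up to float32 rounding (Python floats are doubles). The names circular_average and smooth_ping_pong are hypothetical. Three iterations of two phases amount to six circular 3-point averages, so an impulse should spread symmetrically over 6 bins on each side and keep its total.

```python
def circular_average(hist):
    """One pass of a circular 3-point average, reading only old values."""
    n = len(hist)
    return [(hist[(i - 1) % n] + hist[i] + hist[(i + 1) % n]) / 3.0
            for i in range(n)]


def smooth_ping_pong(hist, iterations=3):
    """Emulate the second loop: hist -> hist2, then hist2 -> hist."""
    h = list(hist)
    for _ in range(iterations):
        h2 = circular_average(h)   # first phase writes hist2
        h = circular_average(h2)   # second phase writes hist
    return h


hist = [0.0] * 36
hist[0] = 1.0
out = smooth_ping_pong(hist)

# Symmetry around the impulse: bin i should match bin -i (mod 36).
symmetric = all(abs(out[i] - out[(36 - i) % 36]) < 1e-12 for i in range(36))
total = sum(out)
print(symmetric, total, out[6], out[7])
```
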
Do you have any idea why the second version is wrong?
We tested on two platforms (NVIDIA and AMD CPU) and spent the whole day debugging :(
How would you debug this kind of problem?
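
One approach that might help, sketched under assumptions: copy hist to a global debug buffer after every smoothing pass in both kernels (that part is not shown and is an assumption about how the kernel could be instrumented), then look for the first pass and bin where the two traces disagree. The helper first_divergence and the toy traces below are hypothetical.

```python
def first_divergence(trace_a, trace_b, tol=1e-6):
    """Return (pass, bin, value_a, value_b) for the first disagreement.

    trace_a and trace_b are lists of histograms, one per smoothing pass.
    Returns None when every compared value agrees within the tolerance.
    """
    for p, (ha, hb) in enumerate(zip(trace_a, trace_b)):
        for i, (x, y) in enumerate(zip(ha, hb)):
            if abs(x - y) > tol * max(1.0, abs(x), abs(y)):
                return p, i, x, y
    return None


# Toy traces with 4 bins and 2 passes, standing in for real dumps.
trace_a = [[1.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
trace_b = [[1.0, 1.0, 0.0, 1.0 / 3.0], [0.5, 0.5, 0.5, 0.4]]

result = first_divergence(trace_a, trace_b)
print(result)
```

Knowing the first diverging pass and bin narrows the search to a single read, instead of comparing only the final histograms.
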
Thanks a lot.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
_______________________________________________
PyOpenCL mailing list
http://lists.tiker.net/listinfo/pyopencl