heya,

[..]
> Actually, I even wonder if OpenCL is relevant for this as it's a linear
> operation performed on only one pixel at the time over the flatten array. I
> wouldn't be surprised if the OpenCL version were slower on some systems than
> a good SSE2 version.

maybe on some systems. the thing with opencl is that you need to copy the
buffer to the gpu and back at the end. if one module interrupts the pipeline,
you need to copy even more (get your input buffer back to the cpu, process,
copy back to the gpu). this slows down the whole process significantly, even
if the module would run at the same speed on both devices.

> Considering the code itself, my only remarks are for this line:
> for(size_t k = 1; k < (size_t)ch * roi_out->width * roi_out->height;
> k++)
> First, is there a reason why you are using a size_t type? int or unsigned
> would be fine I think, and you wouldn't need a cast.

you definitely want 64 bits for the counter if you go width*height (times
channel count here, too), since that product can exceed what a 32-bit int
holds for very large buffers. size_t happens to be an unsigned 64-bit int on
many systems (it is only 32 bits wide on 32-bit targets). using stdint.h you
could use uint64_t to be even clearer and maybe more portable.

note that you could have used a nested loop for y and for x together with an
openmp annotation "collapse(2)" to get similar results.

and yes, please start at 0 :)

cheers,
 jo

___________________________________________________________________________
darktable developer mailing list
to unsubscribe send a mail to darktable-dev+unsubscr...@lists.darktable.org
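
ps: to make the counter remark concrete, here is a minimal sketch of both loop variants. this is not the actual module code: scale_flat, scale_nested, element_count and the gain parameter are made-up names for illustration, and the per-value multiply just stands in for whatever the module does per pixel. only the two ideas (a 64-bit flat counter starting at 0, or nested y/x loops with collapse(2)) come from the discussion above. the pragmas are simply ignored when compiling without openmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical helper: number of float values in a ch-channel buffer.
 * every factor is widened to 64 bits before multiplying, so the product
 * cannot wrap around the way an int * int * int product could. */
uint64_t element_count(int width, int height, int ch)
{
  assert(width >= 0 && height >= 0 && ch >= 0);
  return (uint64_t)ch * (uint64_t)width * (uint64_t)height;
}

/* variant 1: one flat loop over all values, with a size_t counter.
 * note that it starts at 0, so the first value is processed too. */
void scale_flat(float *buf, int width, int height, int ch, float gain)
{
  /* cast the first operand so the whole product is computed in size_t */
  const size_t n = (size_t)ch * width * height;
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
  for(size_t k = 0; k < n; k++) buf[k] *= gain;
}

/* variant 2: nested y/x loops; collapse(2) lets openmp distribute the
 * combined y*x iteration space, so plain int counters are enough and
 * only the pixel offset needs the wide type. */
void scale_nested(float *buf, int width, int height, int ch, float gain)
{
#ifdef _OPENMP
#pragma omp parallel for collapse(2) schedule(static)
#endif
  for(int y = 0; y < height; y++)
    for(int x = 0; x < width; x++)
    {
      float *px = buf + (size_t)ch * ((size_t)y * width + x);
      for(int c = 0; c < ch; c++) px[c] *= gain;
    }
}
```

both variants touch exactly the same values. the nested one costs a little index arithmetic per pixel but keeps x and y available if the module ever needs them.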