heya,

[..]
> Actually, I even wonder if OpenCL is relevant for this, as it's a linear
> operation performed on one pixel at a time over the flattened array. I
> wouldn't be surprised if the OpenCL version were slower on some systems than
> a good SSE2 version.

maybe on some systems. the thing with opencl is that you need to copy
the buffer to the gpu at the start and back at the end. if one module
in the middle has no opencl path and interrupts the gpu pipeline, you
need more copies (download its input buffer to the cpu, process it,
upload the result back to the gpu). this slows down the whole pipeline
significantly, even if that module would run at the same speed on both
devices.


> Regarding the code itself, my only remarks concern this line:
>       for(size_t k = 1; k < (size_t)ch * roi_out->width * roi_out->height; k++)
> First, is there a reason why you are using a size_t type? int or unsigned 
> would be fine I think, and you wouldn't need a cast.

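you definitely want 64 bits for the counter if you go over width*height
(times the channel count here, too): int is 32 bits on all common
platforms, so that product can overflow for large buffers, and the cast
on ch is what makes the multiplication happen in size_t instead of int.
size_t happens to be an unsigned 64-bit integer on 64-bit systems. using
stdint.h you could use uint64_t to be even clearer and maybe more
portable. note that you could also have used nested loops over y and x
together with the openmp clause "collapse(2)" to get similar results.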

and yes, please start at 0: with k = 1 the first value in the buffer is never processed :)

cheers,
 jo
___________________________________________________________________________
darktable developer mailing list
to unsubscribe send a mail to darktable-dev+unsubscr...@lists.darktable.org
