[darktable-devel] improved OpenCL latency

Ulrich Pegelow Sun, 20 Jan 2013 08:21:59 -0800

Hi guys,

I recently did some work on our OpenCL pixelpipe in order to improve
latency. The results are in branch opencl and I am planning to merge
them into master in the next few days.


Before doing that I would like users who run OpenCL to give it a try and 
tell me their observations.

Here on my box with a new Radeon HD7950 :) the effect is quite
remarkable. If I process a demanding pixelpipe (nlmeans + denoiseprofile
+ equalizer + shadhi + ...) current git master gives me a typical
timing like this:

[dev_process_image] pixel pipeline processing took 1,060 secs (0,120 CPU)

With the improved version from branch opencl it's about twice as fast:

[dev_process_image] pixel pipeline processing took 0,562 secs (0,068 CPU)

Timings were done when fully zoomed into the image (100% view).

As there are some new configuration parameters needed for fine-tuning,
all this requires a bit of documentation. Further below you can find a
draft version, which is meant to go into the user manual later.

Best wishes,

Ulrich



OpenCL performance optimization

There are some configuration parameters in
$HOME/.config/darktable/darktablerc that help to fine-tune your system's
OpenCL performance. Performance in this context mostly means the latency
of darktable during interactive work, i.e. how long it takes to
reprocess your pixelpipe. For a comfortable workflow it is essential to
keep latency low.

To get profiling info, start darktable from a terminal with

darktable -d opencl -d perf

After each reprocessing of the pixelpipe - caused by a module parameter
change, zooming, panning, etc. - you will get the total time and the
time spent in each of our OpenCL kernels. The most reliable value is the
total time spent in the pixelpipe. [Please note that the timings given
for each individual module are unreliable when running the OpenCL
pixelpipe asynchronously (see opencl_async_pixelpipe below).]
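
If you want to compare the total times of several runs, a small script
can pull them out of the terminal output. The following is only a
sketch: the pattern is derived from the two sample lines quoted earlier
in this mail, and the helper name parse_total is made up for
illustration. It accepts a decimal comma, as some locales print it.

```python
import re

# Pattern for the total-time line; derived only from the two sample
# lines in this mail, so other darktable versions may print it differently.
TOTAL_RE = re.compile(
    r"\[dev_process_image\] pixel pipeline processing took "
    r"(\d+[.,]\d+) secs \((\d+[.,]\d+) CPU\)"
)


def parse_total(line):
    """Return (wall_secs, cpu_secs), or None if the line does not match."""
    m = TOTAL_RE.search(line)
    if m is None:
        return None
    # Normalise a decimal comma to a decimal point before converting.
    wall, cpu = (float(g.replace(",", ".")) for g in m.groups())
    return wall, cpu


master = parse_total(
    "[dev_process_image] pixel pipeline processing took 1,060 secs (0,120 CPU)"
)
branch = parse_total(
    "[dev_process_image] pixel pipeline processing took 0,562 secs (0,068 CPU)"
)
speedup = master[0] / branch[0]
print(f"speedup: {speedup:.2f}x")  # prints "speedup: 1.89x"
```

Only the total time is extracted here, since that is the value that
stays reliable with an asynchronous pixelpipe.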

To allow for fast pixelpipe processing with OpenCL it is essential
that we keep the GPU busy. Any interrupt or stall in the data flow
adds significantly to the total processing time. This is especially
important for the small image buffers we need to handle during
interactive work. They can be processed quickly by a fast GPU. However,
even short-term stalls of the pixelpipe will easily become a bottleneck.

On the other hand darktable's performance during file exports is more or
less only governed by the speed of our algorithms and the horsepower
of your GPU. Short-term stalls will not have a noticeable effect on the
total time of an export.


darktable comes with default settings that should deliver a decent GPU 
performance on most systems. However, if you want to fiddle around a bit 
by yourself and try to optimize things further, here follows a 
description of the relevant configuration parameters.

opencl_async_pixelpipe

This boolean flag controls how often we block the OpenCL pixelpipe and 
get a status on success/failure of all the kernels that have been run. 
For optimum latency set this to TRUE, so darktable runs the pixelpipe 
asynchronously and tries to use as few interrupts as possible. If you 
experience OpenCL errors like failing kernels, set the parameter to 
FALSE. darktable will then interrupt after each module so you can more 
easily isolate the problem.

opencl_number_event_handles

Event handles are used so we can monitor success/failure of kernels and 
profiling info even if the pixelpipe is run asynchronously. The number 
of event handles is a limited resource of your OpenCL driver. Luckily we 
can recycle them, but there is a limited number we can use at the same 
time. Unfortunately, there is no way to find out what the resource 
limits are; so we need to guess. A value of 100 (default) is probably a 
good choice in most cases. If your driver runs out of free handles you 
will experience failing OpenCL kernels with error code -5 
(CL_OUT_OF_RESOURCES); reduce the number in that case. You can also set 
this parameter to 0, which means that darktable assumes no restriction 
in the number of event handles; this is normally not a good choice. A 
value of -1 will block darktable from using any event handles. This will 
prevent darktable from properly monitoring the success of your OpenCL 
kernels. Any failures will likely lead to garbled output without 
darktable taking notice.
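
The three kinds of values described above can be summarised in a few
lines of Python. This is just an illustration of the text, not
darktable's code; the function name describe_event_handles is made up,
and values below -1 are rejected only because the text does not define
them.

```python
def describe_event_handles(value):
    """Describe what a given opencl_number_event_handles value means."""
    if value == -1:
        # No event handles at all: kernel failures go unnoticed.
        return "event handles disabled"
    if value == 0:
        # darktable assumes the driver imposes no limit.
        return "no limit assumed"
    if value > 0:
        # At most this many handles in use at once; they get recycled.
        return f"at most {value} handles in use at the same time"
    raise ValueError("values below -1 are not described")


for v in (100, 0, -1):
    print(v, "->", describe_event_handles(v))
```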

opencl_synch_cache

This parameter, if set to TRUE, will force darktable to fetch image 
buffers from your GPU after each module and store them in its pixelpipe 
cache. This is a very resource-consuming operation. It only makes sense
if you have a rather slow GPU. In that case darktable might in fact save 
some time when module parameters have changed, as it can go back to some 
cached intermediate state and reprocess only part of the pixelpipe. In 
most cases this parameter should be set to FALSE (default).

opencl_micro_nap

In an ideal case you keep your GPU busy at 100% when reprocessing the
pixelpipe. That's good. On the other hand your GPU is also needed to do
regular GUI updates. It might happen that there is not enough time
left for this task. The consequence would be a jerky reaction of your
GUI on panning, zooming or when moving sliders. darktable can add small
naps into its pixelpipe processing to let the GPU catch its breath and
do GUI related stuff. Parameter opencl_micro_nap controls the duration
of these naps in microseconds. You need to experiment in order to find
an optimum value for your system. Values of 0, 100, 500 and 1000 are
good starting points to try. Defaults to zero.
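
When trying out several values of the parameters described above it can
be handy to change them with a script rather than by hand. The sketch
below assumes that darktablerc holds one key=value entry per line and
that booleans are written as TRUE/FALSE; the helper name set_params is
made up. It works on a string, so reading and writing the real file
(preferably while darktable is not running, as it may rewrite the file
on exit) is left to you.

```python
def set_params(text, params):
    """Return text with every key in params set to the given value.

    Assumes one key=value entry per line. Keys that are not present
    yet are appended at the end.
    """
    remaining = dict(params)
    out = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Example content only; a real darktablerc has many more entries.
sample = "opencl_micro_nap=0\nopencl_synch_cache=TRUE\n"
tuned = set_params(
    sample,
    {"opencl_micro_nap": "500", "opencl_async_pixelpipe": "TRUE"},
)
print(tuned, end="")
```

Changing one parameter at a time and comparing the total pixelpipe time
after each change makes it easier to see which setting helped.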


_______________________________________________
darktable-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/darktable-devel
