Hi guys,

I recently did some work on our OpenCL pixelpipe in order to improve latency. The results are in branch opencl, and I am planning to merge them into master in the next few days.
Before doing that, I would like users who run OpenCL to give it a try and tell me their observations.

Here on my box with a new Radeon HD7950 :) the effect is quite remarkable. If I process a demanding pixelpipe (nlmeans + denoiseprofile + equalizer + shadhi + ...), current git master gives me a typical timing like this:

    [dev_process_image] pixel pipeline processing took 1,060 secs (0,120 CPU)

With the improved version from branch opencl it is about twice as fast:

    [dev_process_image] pixel pipeline processing took 0,562 secs (0,068 CPU)

Timings were taken when fully zoomed into the image (100% view).

As some new configuration parameters are needed for fine-tuning, all this requires a bit of documentation. Further below you can find a draft version, which is meant to go into the user manual later.

Best wishes,
Ulrich


OpenCL performance optimization

There are some configuration parameters in $HOME/.config/darktable/darktablerc that help to fine-tune your system's OpenCL performance. Performance in this context mostly means the latency of darktable during interactive work, i.e. how long it takes to reprocess your pixelpipe. For a comfortable workflow it is essential to keep latency low.

In order to get profiling info, start darktable from a terminal with

    darktable -d opencl -d perf

After each reprocessing of the pixelpipe (caused by a module parameter change, zooming, panning, etc.) you will get the total time and the time spent in each of our OpenCL kernels. The most reliable value is the total time spent in the pixelpipe.

[Please note that the timings given for each individual module are unreliable when running the OpenCL pixelpipe asynchronously (see opencl_async_pixelpipe below).]

To allow for fast pixelpipe processing with OpenCL it is essential that we keep the GPU busy. Any interruption or stalled data flow will add significantly to the total processing time. This is especially important for the small image buffers we need to handle during interactive work.
They can be processed quickly by a fast GPU. However, even short-term stalls of the pixelpipe will easily become a bottleneck.

On the other hand, darktable's performance during file exports is more or less governed only by the speed of our algorithms and the horsepower of your GPU. Short-term stalls will not have a noticeable effect on the total time of an export.

darktable comes with default settings that should deliver decent GPU performance on most systems. However, if you want to fiddle around a bit yourself and try to optimize things further, here follows a description of the relevant configuration parameters.

opencl_async_pixelpipe

This boolean flag controls how often we block the OpenCL pixelpipe and get a status on success/failure of all the kernels that have been run. For optimum latency set this to TRUE, so darktable runs the pixelpipe asynchronously and tries to use as few interrupts as possible. If you experience OpenCL errors like failing kernels, set the parameter to FALSE. darktable will then interrupt after each module so you can more easily isolate the problem.

opencl_number_event_handles

Event handles are used so we can monitor success/failure of kernels and collect profiling info even if the pixelpipe is run asynchronously. Event handles are a limited resource of your OpenCL driver. Luckily we can recycle them, but only a limited number can be in use at the same time. Unfortunately, there is no way to find out what the resource limits are, so we need to guess. A value of 100 (default) is probably a good choice in most cases. If your driver runs out of free handles you will experience failing OpenCL kernels with error code -5 (CL_OUT_OF_RESOURCES); reduce the number in that case. You can also set this parameter to 0, which means that darktable assumes no restriction on the number of event handles; this is normally not a good choice. A value of -1 will stop darktable from using any event handles.
This will prevent darktable from properly monitoring the success of your OpenCL kernels. Any failures will likely lead to garbled output without darktable taking notice.

opencl_synch_cache

This parameter, if set to TRUE, will force darktable to fetch image buffers from your GPU after each module and store them in its pixelpipe cache. This is a very resource-consuming operation. It only makes sense if you have a rather slow GPU. In that case darktable might in fact save some time when module parameters have changed, as it can go back to some cached intermediate state and reprocess only part of the pixelpipe. In most cases this parameter should be set to FALSE (default).

opencl_micro_nap

In an ideal case you keep your GPU busy at 100% when reprocessing the pixelpipe. That's good. On the other hand, your GPU is also needed for regular GUI updates, and there might not be enough time left for this task. The consequence would be a jerky reaction of your GUI on panning, zooming or when moving sliders. darktable can add small naps into its pixelpipe processing to let the GPU catch its breath and do GUI-related work. The parameter opencl_micro_nap controls the duration of these naps in microseconds. You need to experiment in order to find an optimum value for your system. Values of 0, 100, 500 and 1000 are good starting points to try. Defaults to zero.
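
When experimenting with the parameters described in this mail, the total pixelpipe time is the value worth comparing between runs. The following is only a rough sketch of how that comparison could be automated in Python; it is not part of darktable. It assumes the terminal output of "darktable -d opencl -d perf" has been captured as text and that the total-time lines look like the two examples quoted in this mail. The function names pipeline_times and median_latency are made up for illustration, and other darktable versions or locales may format the line differently.

```python
import re
from statistics import median

# Pattern modelled on the total-time line quoted in this mail, e.g.
#   [dev_process_image] pixel pipeline processing took 0,562 secs (0,068 CPU)
# Assumption: the decimal separator may be "," or "." depending on the locale.
_TOTAL_RE = re.compile(
    r"\[dev_process_image\] pixel pipeline processing took "
    r"(\d+[.,]\d+) secs \((\d+[.,]\d+) CPU\)"
)


def pipeline_times(log_text):
    """Return all total pixelpipe times (in seconds) found in captured output."""
    return [
        float(match.group(1).replace(",", "."))
        for match in _TOTAL_RE.finditer(log_text)
    ]


def median_latency(log_text):
    """Median total pixelpipe time, or None if no timing line was found."""
    times = pipeline_times(log_text)
    return median(times) if times else None
```

One way to use it: capture the output of one session per setting (for example one session per opencl_micro_nap value), repeat the same zoom and pan actions in each, and compare the medians. The median is used here rather than the mean so that a single slow run, such as the first one after startup, does not dominate the result.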
