I just pushed another version to branch opencl.

In my admittedly very demanding little benchmark I squeezed roughly
another 30% out of the latency. A pixelpipe run now takes 0.42 seconds,
compared with 1.53 seconds on my [email protected] CPU.

The change was to switch from image objects to global float buffers in
nlmeans and denoiseprofile.

I would be interested to learn how this works on NVIDIA systems. Just
compare the timings of git master with those of branch opencl.
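
For such a comparison, here is a minimal sketch (not part of darktable)
that summarizes the "pixel pipeline processing took ..." lines darktable
prints when started with -d opencl -d perf. The function names and the
idea of saving each run's terminal output to a text file are assumptions
made for illustration; the line format is the one quoted in this thread,
including the decimal commas some locales produce.

```python
import re
from statistics import mean

# Matches e.g. "[dev_process_image] pixel pipeline processing took 1.803 secs (5.832 CPU)".
# Both "1.803" and "1,060" are accepted because the output follows the locale.
TIMING = re.compile(
    r"\[(dev_process_\w+)\] pixel pipeline processing took "
    r"(\d+[.,]\d+) secs \((\d+[.,]\d+) CPU\)"
)


def pipeline_times(log_text, stage="dev_process_image"):
    """Return the wall-clock seconds of every timing line for one stage."""
    # Collapse all whitespace first so lines wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    return [
        float(wall.replace(",", "."))
        for name, wall, _cpu in TIMING.findall(flat)
        if name == stage
    ]


def summarize(log_text, stage="dev_process_image"):
    """Return (count, mean, min, max) of the timings, or None if there are none."""
    times = pipeline_times(log_text, stage)
    if not times:
        return None
    return len(times), mean(times), min(times), max(times)
```

Feeding it the saved output of one session on master and one on branch
opencl, with the same image and the same zooming and panning steps, gives
two summaries that can be compared directly.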

Ulrich

On 20.01.2013 18:24, Max Killer wrote:
> Should have gone to the list as well...
>
>
> -------- Original Message --------
> Subject:      Re: [darktable-devel] improved OpenCL latency
> Date:         Sun, 20 Jan 2013 18:16:57 +0100
> From:         Max Killer <[email protected]>
> To:   [email protected]
>
>
>
> On Sun 20 Jan 2013 05:20:44 PM CET, Ulrich Pegelow wrote:
>> Hi guys,
>>
>> I recently did some work on our OpenCL pixelpipe in order to improve
>> latency. The results are in branch opencl and I am planning to merge
>> them into master in the next few days.
>>
>> Before doing that I would like users who run OpenCL to give it a try and
>> tell me their observations.
>>
>> Here on my box with a new Radeon HD7950 :) the effect is quite
>> remarkable. If I process a demanding pixelpipe (nlmeans + denoiseprofile
>> + equalizer + shadhi + ...) current git master would give me a typical
>> timing like that:
>>
>> [dev_process_image] pixel pipeline processing took 1,060 secs (0,120 CPU)
>>
>> With the improved version from branch opencl it's about twice as fast:
>>
>> [dev_process_image] pixel pipeline processing took 0,562 secs (0,068 CPU)
>>
>> Timings were done when fully zoomed into the image (100% view).
>>
>> As there are some new configuration parameters needed for fine-tuning,
>> all this requires a bit of documentation. Further below you can find a
>> draft version, which is meant to go into the user manual later.
>>
>> Best wishes,
>>
>> Ulrich
>>
>>
>>
>> OpenCL performance optimization
>>
>> There are some configuration parameters in
>> $HOME/.config/darktable/darktablerc that help to fine-tune your system's
>> OpenCL performance. Performance in this context mostly means the latency
>> of darktable during interactive work, i.e. how long it takes to
>> reprocess your pixelpipe. For a comfortable workflow it is essential to
>> keep latency low.
>>
>> To get profiling info, start darktable from a terminal with
>>
>> darktable -d opencl -d perf
>>
>> After each reprocessing of the pixelpipe - caused by a module parameter
>> change, zooming, panning, etc. - you will get the total time and the
>> time spent in each of our OpenCL kernels. The most reliable value is the
>> total time spent in the pixelpipe. [Please note that the timings given
>> for each individual module are unreliable when running the OpenCL
>> pixelpipe asynchronously (see opencl_async_pixelpipe below).]
>>
>> To allow for a fast pixelpipe processing with OpenCL it is essential
>> that we keep the GPU busy. Any interrupts or a stalled data flow will
>> strongly add to the total processing time. This is especially important
>> for the small image buffers we need to handle during interactive work.
>> They can be processed quickly by a fast GPU. However, even short-term
>> stalls of the pixelpipe will easily become a bottleneck.
>>
>> On the other hand, darktable's performance during file exports is more
>> or less governed only by the speed of our algorithms and the horsepower
>> of your GPU. Short-term stalls will not have a noticeable effect on the
>> total time of an export.
>>
>>
>> darktable comes with default settings that should deliver a decent GPU
>> performance on most systems. However, if you want to fiddle around a bit
>> by yourself and try to optimize things further, here follows a
>> description of the relevant configuration parameters.
>>
>> opencl_async_pixelpipe
>>
>> This boolean flag controls how often we block the OpenCL pixelpipe and
>> get a status on success/failure of all the kernels that have been run.
>> For optimum latency set this to TRUE, so darktable runs the pixelpipe
>> asynchronously and tries to use as few interrupts as possible. If you
>> experience OpenCL errors like failing kernels, set the parameter to
>> FALSE. darktable will then interrupt after each module so you can more
>> easily isolate the problem.
>>
>> opencl_number_event_handles
>>
>> Event handles are used so we can monitor success/failure of kernels and
>> profiling info even if the pixelpipe is run asynchronously. The number
>> of event handles is a limited resource of your OpenCL driver. Luckily we
>> can recycle them, but there is a limited number we can use at the same
>> time. Unfortunately, there is no way to find out what the resource
>> limits are; so we need to guess. A value of 100 (default) is probably a
>> good choice in most cases. If your driver runs out of free handles you
>> will experience failing OpenCL kernels with error code -5
>> (CL_OUT_OF_RESOURCES); reduce the number in that case. You can also set
>> this parameter to 0, which means that darktable assumes no restriction
>> in the number of event handles; this is normally not a good choice. A
>> value of -1 will block darktable from using any event handles. This will
>> prevent darktable from properly monitoring the success of your OpenCL
>> kernels. Any failures will likely lead to garbled output without
>> darktable taking notice.
>>
>> opencl_synch_cache
>>
>> This parameter, if set to TRUE, will force darktable to fetch image
>> buffers from your GPU after each module and store them in its pixelpipe
>> cache. This is a very resource consuming operation. It only makes sense
>> if you have a rather slow GPU. In that case darktable might in fact save
>> some time when module parameters have changed, as it can go back to some
>> cached intermediate state and reprocess only part of the pixelpipe. In
>> most cases this parameter should be set to FALSE (default).
>>
>> opencl_micro_nap
>>
>> In an ideal case you keep your GPU busy at 100% when reprocessing the
>> pixelpipe. That's good. On the other hand your GPU is also needed to do
>> regular GUI updates. It might happen that there is not enough time
>> left for this task. The consequence would be a jerky reaction of your
>> GUI on panning, zooming or when moving sliders. darktable can add small
>> naps into its pixelpipe processing to let the GPU catch its breath and
>> do GUI related stuff. Parameter opencl_micro_nap controls the duration
>> of these naps in microseconds. You need to experiment in order to find
>> an optimum value for your system. Values of 0, 100, 500 and 1000 are
>> good starting points to try. Defaults to zero.
>>
>>
>> ------------------------------------------------------------------------------
>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>> MVPs and experts. ON SALE this month only -- learn more at:
>>http://p.sf.net/sfu/learnmore_123012
>> _______________________________________________
>> darktable-devel mailing list
>>[email protected]
>>https://lists.sourceforge.net/lists/listinfo/darktable-devel
>
> Hello Ulrich,
>
> I tried your opencl branch and have some results. I had a lot of iops
> active, zoomed in to 100% and panned using the preview window on the left.
>
> With your optimizations:
> [dev_process_image] pixel pipeline processing took 1.803 secs (5.832 CPU)
> [dev_process_preview] pixel pipeline processing took 0.582 secs (1.584 CPU)
> [dev_process_image] pixel pipeline processing took 1.192 secs (0.172 CPU)
> [dev_process_image] pixel pipeline processing took 1.502 secs (0.280 CPU)
> [dev_process_image] pixel pipeline processing took 1.230 secs (0.132 CPU)
> [dev_process_image] pixel pipeline processing took 1.631 secs (0.300 CPU)
> [dev_process_image] pixel pipeline processing took 1.207 secs (0.200 CPU)
> [dev_process_image] pixel pipeline processing took 1.377 secs (0.200 CPU)
> [dev_process_image] pixel pipeline processing took 1.239 secs (0.140 CPU)
> [dev_process_image] pixel pipeline processing took 1.760 secs (0.312 CPU)
> [dev_process_image] pixel pipeline processing took 1.209 secs (0.092 CPU)
> [dev_process_image] pixel pipeline processing took 1.587 secs (0.288 CPU)
> [dev_process_image] pixel pipeline processing took 1.183 secs (0.128 CPU)
> [dev_process_image] pixel pipeline processing took 1.165 secs (0.128 CPU)
> [dev_process_image] pixel pipeline processing took 1.125 secs (0.140 CPU)
>
> Standard master:
> [dev_process_image] pixel pipeline processing took 1.280 secs (2.408 CPU)
> [dev_process_preview] pixel pipeline processing took 0.574 secs (1.608 CPU)
> [dev_process_image] pixel pipeline processing took 1.733 secs (0.340 CPU)
> [dev_process_image] pixel pipeline processing took 1.691 secs (0.316 CPU)
> [dev_process_image] pixel pipeline processing took 1.773 secs (0.364 CPU)
> [dev_process_image] pixel pipeline processing took 1.634 secs (0.288 CPU)
> [dev_process_image] pixel pipeline processing took 1.666 secs (0.280 CPU)
> [dev_process_image] pixel pipeline processing took 1.678 secs (0.360 CPU)
> [dev_process_image] pixel pipeline processing took 1.673 secs (0.316 CPU)
> [dev_process_image] pixel pipeline processing took 1.691 secs (0.228 CPU)
> [dev_process_image] pixel pipeline processing took 1.658 secs (0.256 CPU)
> [dev_process_image] pixel pipeline processing took 1.659 secs (0.260 CPU)
> [dev_process_image] pixel pipeline processing took 1.641 secs (0.300 CPU)
>
> I attached my logs and my clinfo log as well.
> If you need more tests, let me know.
>
> hal
>
>
