I just pushed another version to branch opencl. In my (admittedly very demanding) little benchmark I squeezed out roughly another 30% of latency, so a pixelpipe run now takes 0.42 seconds. That compares with 1.53 seconds on my [email protected] CPU.
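For scale, the two figures above give a GPU-over-CPU speedup of roughly 3.6x. If "ca. 30%" is read as a 30% cut in wall-clock time (an interpretation, not something the mail states), the previous branch version would have needed about 0.6 s on the same benchmark:

```latex
\text{speedup} = \frac{t_\text{CPU}}{t_\text{GPU}} = \frac{1.53\,\text{s}}{0.42\,\text{s}} \approx 3.6,
\qquad
t_\text{before} \approx \frac{t_\text{after}}{1 - 0.30} = \frac{0.42\,\text{s}}{0.70} = 0.60\,\text{s}
```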
What I did was switch from image objects to global float buffers in nlmeans and denoiseprofile. I would be interested to learn how this works on NVIDIA systems. Just compare git master with branch opencl.

Ulrich

On 20.01.2013 18:24, Max Killer wrote:
> Should have gone to the list as well...
>
>
> -------- Original Message --------
> Subject: Re: [darktable-devel] improved OpenCL latency
> Date: Sun, 20 Jan 2013 18:16:57 +0100
> From: Max Killer <[email protected]>
> To: [email protected]
>
>
>
> On Sun 20 Jan 2013 05:20:44 PM CET, Ulrich Pegelow wrote:
>> Hi guys,
>>
>> I recently did some work on our OpenCL pixelpipe in order to improve
>> latency. The results are in branch opencl and I am planning to merge
>> them into master in the next few days.
>>
>> Before doing that I would like users who run OpenCL to give it a try and
>> tell me their observations.
>>
>> Here on my box with a new Radeon HD7950 :) the effect is quite
>> remarkable. If I process a demanding pixelpipe (nlmeans + denoiseprofile
>> + equalizer + shadhi + ...) current git master gives me a typical
>> timing like this:
>>
>> [dev_process_image] pixel pipeline processing took 1,060 secs (0,120 CPU)
>>
>> With the improved version from branch opencl it's about twice as fast:
>>
>> [dev_process_image] pixel pipeline processing took 0,562 secs (0,068 CPU)
>>
>> Timings were taken when fully zoomed into the image (100% view).
>>
>> As some new configuration parameters are needed for fine-tuning,
>> all this requires a bit of documentation. Further below you can find a
>> draft version, which is meant to go into the user manual later.
>>
>> Best wishes,
>>
>> Ulrich
>>
>>
>>
>> OpenCL performance optimization
>>
>> There are some configuration parameters in
>> $HOME/.config/darktable/darktablerc that help to fine-tune your system's
>> OpenCL performance. Performance in this context mostly means the latency
>> of darktable during interactive work, i.e.
>> how long it takes to reprocess your pixelpipe. For a comfortable
>> workflow it is essential to keep latency low.
>>
>> To get profiling info, start darktable from a terminal with
>>
>> darktable -d opencl -d perf
>>
>> After each reprocessing of the pixelpipe (caused by a module parameter
>> change, zooming, panning, etc.) you will get the total time and the
>> time spent in each of our OpenCL kernels. The most reliable value is the
>> total time spent in the pixelpipe. [Please note that the timings given
>> for each individual module are unreliable when running the OpenCL
>> pixelpipe asynchronously (see opencl_async_pixelpipe below).]
>>
>> To allow for fast pixelpipe processing with OpenCL it is essential
>> that we keep the GPU busy. Any interrupts or a stalled data flow will
>> add strongly to the total processing time. This is especially important
>> for the small image buffers we need to handle during interactive work.
>> They can be processed quickly by a fast GPU. However, even short-term
>> stalls of the pixelpipe will easily become a bottleneck.
>>
>> On the other hand, darktable's performance during file exports is more
>> or less only governed by the speed of our algorithms and the horsepower
>> of your GPU. Short-term stalls will not have a noticeable effect on the
>> total time of an export.
>>
>>
>> darktable comes with default settings that should deliver decent GPU
>> performance on most systems. However, if you want to fiddle around a bit
>> by yourself and try to optimize things further, here is a description
>> of the relevant configuration parameters.
>>
>> opencl_async_pixelpipe
>>
>> This boolean flag controls how often we block the OpenCL pixelpipe and
>> get a status on success/failure of all the kernels that have been run.
>> For optimum latency set this to TRUE, so darktable runs the pixelpipe
>> asynchronously and tries to use as few interrupts as possible.
>> If you experience OpenCL errors like failing kernels, set the parameter
>> to FALSE. darktable will then interrupt after each module so you can
>> more easily isolate the problem.
>>
>> opencl_number_event_handles
>>
>> Event handles are used so we can monitor the success/failure of kernels
>> and collect profiling info even if the pixelpipe is run asynchronously.
>> The number of event handles is a limited resource of your OpenCL driver.
>> Luckily we can recycle them, but there is a limit to how many we can use
>> at the same time. Unfortunately, there is no way to find out what the
>> resource limits are, so we need to guess. A value of 100 (default) is
>> probably a good choice in most cases. If your driver runs out of free
>> handles you will experience failing OpenCL kernels with error code -5
>> (CL_OUT_OF_RESOURCES); reduce the number in that case. You can also set
>> this parameter to 0, which means that darktable assumes no restriction
>> on the number of event handles; this is normally not a good choice. A
>> value of -1 will block darktable from using any event handles. This will
>> prevent darktable from properly monitoring the success of your OpenCL
>> kernels. Any failures will likely lead to garbled output without
>> darktable taking notice.
>>
>> opencl_synch_cache
>>
>> This parameter, if set to TRUE, will force darktable to fetch image
>> buffers from your GPU after each module and store them in its pixelpipe
>> cache. This is a very resource-consuming operation. It only makes sense
>> if you have a rather slow GPU. In that case darktable might in fact save
>> some time when module parameters have changed, as it can go back to some
>> cached intermediate state and reprocess only part of the pixelpipe. In
>> most cases this parameter should be set to FALSE (default).
>>
>> opencl_micro_nap
>>
>> In an ideal case you keep your GPU busy at 100% when reprocessing the
>> pixelpipe. That's good.
>> On the other hand, your GPU is also needed for regular GUI updates.
>> It might happen that there is not enough time left for this task.
>> The consequence would be a jerky GUI response when panning, zooming
>> or moving sliders. darktable can add small naps into its pixelpipe
>> processing to let the GPU catch its breath and do GUI-related work.
>> Parameter opencl_micro_nap controls the duration of these naps in
>> microseconds. You need to experiment in order to find an optimum value
>> for your system. Values of 0, 100, 500 and 1000 are good starting
>> points to try. Defaults to zero.
>>
>> _______________________________________________
>> darktable-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/darktable-devel
>
> Hello Ulrich,
>
> I tried your opencl branch and have some results. Setup: a lot of active
> iops, zoomed in to 100%, and panning using the preview window on the left.
>
> With your optimizations:
> [dev_process_image] pixel pipeline processing took 1.803 secs (5.832 CPU)
> [dev_process_preview] pixel pipeline processing took 0.582 secs (1.584 CPU)
> [dev_process_image] pixel pipeline processing took 1.192 secs (0.172 CPU)
> [dev_process_image] pixel pipeline processing took 1.502 secs (0.280 CPU)
> [dev_process_image] pixel pipeline processing took 1.230 secs (0.132 CPU)
> [dev_process_image] pixel pipeline processing took 1.631 secs (0.300 CPU)
> [dev_process_image] pixel pipeline processing took 1.207 secs (0.200 CPU)
> [dev_process_image] pixel pipeline processing took 1.377 secs (0.200 CPU)
> [dev_process_image] pixel pipeline processing took 1.239 secs (0.140 CPU)
> [dev_process_image] pixel pipeline processing took 1.760 secs (0.312 CPU)
> [dev_process_image] pixel pipeline processing took 1.209 secs (0.092 CPU)
> [dev_process_image] pixel pipeline processing took 1.587 secs (0.288 CPU)
> [dev_process_image] pixel pipeline processing took 1.183 secs (0.128 CPU)
> [dev_process_image] pixel pipeline processing took 1.165 secs (0.128 CPU)
> [dev_process_image] pixel pipeline processing took 1.125 secs (0.140 CPU)
>
> Standard master:
> [dev_process_image] pixel pipeline processing took 1.280 secs (2.408 CPU)
> [dev_process_preview] pixel pipeline processing took 0.574 secs (1.608 CPU)
> [dev_process_image] pixel pipeline processing took 1.733 secs (0.340 CPU)
> [dev_process_image] pixel pipeline processing took 1.691 secs (0.316 CPU)
> [dev_process_image] pixel pipeline processing took 1.773 secs (0.364 CPU)
> [dev_process_image] pixel pipeline processing took 1.634 secs (0.288 CPU)
> [dev_process_image] pixel pipeline processing took 1.666 secs (0.280 CPU)
> [dev_process_image] pixel pipeline processing took 1.678 secs (0.360 CPU)
> [dev_process_image] pixel pipeline processing took 1.673 secs (0.316 CPU)
> [dev_process_image] pixel pipeline processing took 1.691 secs (0.228 CPU)
> [dev_process_image] pixel pipeline processing took 1.658 secs (0.256 CPU)
> [dev_process_image] pixel pipeline processing took 1.659 secs (0.260 CPU)
> [dev_process_image] pixel pipeline processing took 1.641 secs (0.300 CPU)
>
> I attached my logs and my clinfo log as well.
> If you need more tests, let me know.
>
> hal
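Comparing master and branch opencl from series like the two above is easier once each log is reduced to a mean and a range. The following C sketch does that for the [dev_process_image] lines printed by `darktable -d opencl -d perf`. It assumes exactly the line format shown in this thread and accepts both "." and "," as decimal separator, since both appear above. All names (parse_image_pipe_secs, summarize, struct timing_summary, the TIMING_MAIN switch) are hypothetical and made up for this example; they are not part of darktable.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Summary of a series of pixelpipe timings (hypothetical helper type). */
struct timing_summary {
    size_t n;
    double mean, min, max;
};

/* Return the wall-clock seconds from a line such as
 *   [dev_process_image] pixel pipeline processing took 1,060 secs (0,120 CPU)
 * or -1.0 if the line is not a [dev_process_image] timing line.
 * A decimal comma (as in the German-locale log above) is treated as a dot. */
double parse_image_pipe_secs(const char *line)
{
    static const char tag[] = "[dev_process_image]";
    static const char marker[] = "processing took ";

    if (strstr(line, tag) == NULL)
        return -1.0;
    const char *p = strstr(line, marker);
    if (p == NULL)
        return -1.0;
    p += sizeof marker - 1;

    /* Copy the number, turning ',' into '.' so strtod() in the C locale accepts it. */
    char num[32];
    size_t i = 0;
    while (i < sizeof num - 1 &&
           (p[i] == ',' || p[i] == '.' || (p[i] >= '0' && p[i] <= '9'))) {
        num[i] = (p[i] == ',') ? '.' : p[i];
        i++;
    }
    num[i] = '\0';

    char *end;
    double secs = strtod(num, &end);
    return (end == num) ? -1.0 : secs;
}

/* Mean, minimum and maximum of n timings; all zero if n == 0. */
struct timing_summary summarize(const double *t, size_t n)
{
    struct timing_summary s = { n, 0.0, 0.0, 0.0 };
    if (n == 0)
        return s;
    double sum = 0.0;
    s.min = s.max = t[0];
    for (size_t i = 0; i < n; i++) {
        sum += t[i];
        if (t[i] < s.min) s.min = t[i];
        if (t[i] > s.max) s.max = t[i];
    }
    s.mean = sum / (double)n;
    return s;
}

#ifdef TIMING_MAIN
/* Read a darktable debug log on stdin and print a one-line summary. */
int main(void)
{
    static double t[4096];
    size_t n = 0;
    char line[512];

    while (n < sizeof t / sizeof t[0] && fgets(line, sizeof line, stdin) != NULL) {
        double secs = parse_image_pipe_secs(line);
        if (secs >= 0.0)
            t[n++] = secs;
    }
    struct timing_summary s = summarize(t, n);
    printf("%zu runs: mean %.3f s, min %.3f s, max %.3f s\n",
           s.n, s.mean, s.min, s.max);
    return 0;
}
#endif
```

For example, capture a session with `darktable -d opencl -d perf > perf.log 2>&1`, build with `cc -std=c99 -DTIMING_MAIN -o pipetimes pipetimes.c` and run `./pipetimes < perf.log`, once for master and once for branch opencl. The first run in each series above also shows a much higher CPU time than the rest, so you may want to drop it before comparing.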
