On 08.04.2017 at 14:29, Ulrich Pegelow wrote:
> Hi,
>
> I added a bit more flexibility concerning OpenCL device scheduling
> into master. There is a new selection box in preferences (core
> options) that allows to choose among a few typical presets.
>
> The main target is modern systems with very fast GPUs. By default and
> "traditionally", darktable distributes work between CPU and GPU in the
> darkroom: the GPU processes the center (full) view and the CPU is
> responsible for the preview (navigation) panel. Now that GPUs are
> getting faster and faster, there are systems where the GPU so strongly
> outperforms the CPU that it makes more sense to process the preview
> and the full pixelpipe sequentially on the GPU.
>
> For that reason the "OpenCL scheduling profile" parameter has three
> options:
>
> * "default" describes the old behavior: work is split between GPU and
> CPU and works best for systems where CPU and GPU performance are on a
> similar level.
>
> * "very fast GPU" tackles the case described above: in darkroom view
> both pixelpipes are sequentially processed by the GPU. This is meant
> for GPUs which strongly outperform the CPU on that system.
>
> * "multiple GPUs" is meant for systems with more than one OpenCL
> device so that the full and the preview pixelpipe get processed by
> separate GPUs.
>
> At first startup, darktable tries to find the best-suited profile
> based on some benchmarking. You may change the profile at any time;
> the change takes effect immediately.
>
> I am interested in your experience, both in terms of automatic
> detection of the best suited profile and in terms of overall
> performance. Please note that this is all about system latency and
> perceived system responsiveness in the darkroom view. Calling
> darktable with '-d perf' will only give you limited insights, so you
> will mostly need to rely on your own judgement.
>

Hi Ulrich,

1. Gorgeous, thank you very much!

For me, the benchmarking seems to DTRT™ (do the right thing): it picks
the "very fast GPU" profile with a 2016 NVIDIA GeForce GTX 1060 6 GB
and an old 2009 AMD Phenom II X4 2.5 GHz 65 W quad-core. The code is
compiled with -O2 -march=native, with OpenMP and OpenCL enabled, and I
get this:

[opencl_init] here are the internal numbers and names of OpenCL devices
available to darktable:
[opencl_init]           0       'GeForce GTX 1060 6GB'
[opencl_init] FINALLY: opencl is AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is ON.
[opencl_create_kernel] successfully loaded kernel `zero' (0) for device 0
[...]
[opencl_init] benchmarking results: 0.029428 seconds for fastest GPU
versus 0.382860 seconds for CPU.
[opencl_init] set scheduling profile for very fast GPU.
[opencl_priorities] these are your device priorities:
[opencl_priorities]             image   preview export  thumbnail
[opencl_priorities]             0       0       0       0
[opencl_priorities] show if opencl use is mandatory for a given pixelpipe:
[opencl_priorities]             image   preview export  thumbnail
[opencl_priorities]             1       1       1       1
[opencl_synchronization_timeout] synchronization timout set to 0
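
As a side note for anyone who wants to switch profiles without the GUI:
the chosen profile lands in darktablerc. If I read my config right, the
relevant line is

opencl_scheduling_profile=very fast GPU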

2. What bothers me, though, are the timeouts and their defaults. In
practice the darkroom works OK-ish, but the lighttable does not. When a
truckload of small thumbnails needs to be regenerated (say, the
lighttable zoomed out to show 10 columns of images), it *appears* (not
yet corroborated with measurements) that bumping up the timeouts
considerably helps to avoid latencies, as though things were
deadlocking and waiting for the timer to break the lock. It might be an
internal issue with the synchronization, though - how fine-grained is
the retry? Is it sleep-and-retry, or does it use some form of
semaphores and signalling at the system level between threads?
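
To make the question concrete, here is a compile-ready pthread sketch
of the two styles I mean - purely illustrative, not darktable's actual
code:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool device_free = false;

/* variant 1: sleep-and-retry -- poll a flag with a fixed nap between
   attempts; latency is bounded below by the nap length */
void wait_polling(void)
{
  for(;;)
  {
    pthread_mutex_lock(&lock);
    const bool ok = device_free;
    pthread_mutex_unlock(&lock);
    if(ok) return;
    usleep(5000); /* 5 ms nap; each miss adds up to 5 ms of latency */
  }
}

/* variant 2: condition variable -- the waiter blocks in the kernel and
   is woken the moment the releasing thread signals */
void wait_signalled(void)
{
  pthread_mutex_lock(&lock);
  while(!device_free)
    pthread_cond_wait(&cond, &lock);
  pthread_mutex_unlock(&lock);
}

/* releasing side for variant 2 */
void release_device(void)
{
  pthread_mutex_lock(&lock);
  device_free = true;
  pthread_cond_signal(&cond);
  pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
  (void)arg;
  wait_signalled(); /* swap in wait_polling() to compare behaviors */
  printf("device acquired\n");
  return NULL;
}

int main(void)
{
  pthread_t t;
  pthread_create(&t, NULL, worker, NULL);
  usleep(10000); /* pretend the device is busy for 10 ms */
  release_device();
  pthread_join(t, NULL);
  return 0;
}

With many concurrent thumbnail pipes, the naps in a polling variant
would stack up, which would roughly match what I am seeing.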

I am running with these - possibly ridiculously high - timeout settings
of 3000, i.e. roughly 15 s if I read the units right (the values seem
to count 5 ms intervals). This is normally enough to process an entire
export including a few CPU segments (say, raw denoise - I need it on
some high-ISO images, ISO 6400+, to avoid black blotches or green
stipples, though I have some concerns about its overall quality that
don't belong in this thread):

opencl_mandatory_timeout=3000
pixelpipe_synchronization_timeout=3000

3. Would it be sensible to set one of these timeouts considerably higher
than the other?

4. Could '-d perf' log when a timeout changes a scheduling decision
(i.e. when a timeout causes a job to be dispatched to a different
device, logging both the original intent and the actual dispatch
target), and
4b. could we possibly get a complete scheduler trace including all
dispatch attempts? That might help with debugging in the long run.
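
Something along these lines - the [opencl_scheduler] tag and the whole
format are made up here, just to show the kind of information I mean:

[opencl_scheduler] full pipe: want device 0 ('GeForce GTX 1060 6GB')
[opencl_scheduler] full pipe: device 0 busy, retry 3/200
[opencl_scheduler] full pipe: timeout reached, falling back to CPU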

