Re: [darktable-dev] OpenCL scheduling profiles
On 09.04.2017 at 17:29, Matthias Andree wrote:
>> What's your number of background threads (fourth entry in core options)?
>
> It's currently set to 2, and if removed from the configuration file with
> darktable stopped, will revert to 2 when darktable gets restarted and closed
> next time.
>
> Note I see this quite often, but I don't see where that time comes from:
>
> [dev] took 4,787 secs (5,388 CPU) to load the image.
> [dev] took 4,787 secs (5,388 CPU) to load the image.

You might try higher values like six or eight. The main advantage of many background threads is that they hide I/O latency, and that might be a main issue here.

> Looking at iotop it appears that the prime concern however is that it maxes
> out the external USB3 HDD reading from NTFS... reducing to 1 thread stalled
> the UI at first but came back with some 30 thumbnails all at once.

It might easily be that the main issue on your system is stalling I/O (for whatever reason). Please run some experiments from a very fast storage medium (SSD, RAM disk) to find out if this is the main cause.

> I sometimes see modules like highlight reconstruction, CA correction, or
> demosaic ("Entrastern") still being dispatched to the CPU, which is very
> slow, when it's normally dispatched to the GPU. Statistics below. It seems
> the only module that is supposed to be on the CPU is Gamma, and it's so
> blazingly fast that we don't need to care. Sorry for the German, but you get
> the idea. This is only from launching darktable in lighttable view:

There are some modules for which no OpenCL code is available (Amaze demosaic, raw denoise, color input/output profile with LittleCMS2), but I cannot say if this is the main cause here. At least several of the modules from the output below have OpenCL support. Please try further to isolate whether slow CPU processing correlates with specific images and their history stacks.
> $ grep 'on CPU' /tmp/dt-perf-opencl.log | sort -k7 | uniq -f6 -c | sort -nr
>     124 [dev_pixelpipe] took 0,000 secs (0,000 CPU) processed `Gamma' on CPU, blended on CPU [thumbnail]
>       6 [dev_pixelpipe] took 0,026 secs (0,076 CPU) processed `Entrastern' on CPU, blended on CPU [thumbnail]
>       5 [dev_pixelpipe] took 0,276 secs (0,832 CPU) processed `Chromatische Aberration' on CPU, blended on CPU [thumbnail]
>       5 [dev_pixelpipe] took 0,019 secs (0,060 CPU) processed `Spitzlicht-Rekonstruktion' on CPU, blended on CPU [thumbnail]
>       2 [dev_pixelpipe] took 0,118 secs (0,348 CPU) processed `Raw-Schwarz-/Weißpunkt' on CPU, blended on CPU [thumbnail]
>       2 [dev_pixelpipe] took 0,052 secs (0,140 CPU) processed `Weißabgleich' on CPU, blended on CPU [thumbnail]
>       2 [dev_pixelpipe] took 0,023 secs (0,036 CPU) processed `Tonemapping' on CPU, blended on CPU [thumbnail]
>       2 [dev_pixelpipe] took 0,008 secs (0,016 CPU) processed `Objektivkorrektur' on CPU, blended on CPU [thumbnail]
>       2 [dev_pixelpipe] took 0,001 secs (0,004 CPU) processed `Ausgabefarbprofil' on CPU, blended on CPU [thumbnail]
>       2 [dev_pixelpipe] took 0,001 secs (0,000 CPU) processed `Eingabefarbprofil' on CPU, blended on CPU [thumbnail]
>       2 [dev_pixelpipe] took 0,000 secs (0,000 CPU) processed `Schärfen' on CPU, blended on CPU [thumbnail]
>       2 [dev_pixelpipe] took 0,000 secs (0,000 CPU) processed `Basiskurve' on CPU, blended on CPU [thumbnail]
>       1 [dev_pixelpipe] took 3,126 secs (9,444 CPU) processed `Raw-Entrauschen' on CPU, blended on CPU [thumbnail]
>       1 [dev_pixelpipe] took 0,000 secs (0,000 CPU) processed `Drehung' on CPU, blended on CPU [thumbnail]

___
darktable developer mailing list
to unsubscribe send a mail to darktable-dev+unsubscr...@lists.darktable.org
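The grep/sort/uniq pipeline above shows one representative line per module plus a count. As a rough complement, here is a minimal Python sketch that sums the wall-clock time per module for everything processed on the CPU. It assumes only the `[dev_pixelpipe]` line layout visible in the quoted output, including the decimal comma of the German locale; the names `LINE_RE`, `parse_seconds` and `summarize_cpu_modules` are made up for this illustration and are not part of darktable.

```python
import re
from collections import defaultdict

# Matches lines shaped like the ones quoted in this thread, for example:
# [dev_pixelpipe] took 0,276 secs (0,832 CPU) processed `Chromatische Aberration' on CPU, blended on CPU [thumbnail]
LINE_RE = re.compile(
    r"\[dev_pixelpipe\] took (?P<wall>[\d.,]+) secs \((?P<cpu>[\d.,]+) CPU\) "
    r"processed `(?P<module>[^']+)' on CPU"
)


def parse_seconds(text):
    """Accept both '0,276' (decimal comma) and '0.276' (decimal point)."""
    return float(text.replace(",", "."))


def summarize_cpu_modules(lines):
    """Return {module: (count, total wall seconds)} for modules run on the CPU."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a pixelpipe line, or the module ran on the GPU
        entry = totals[match.group("module")]
        entry[0] += 1
        entry[1] += parse_seconds(match.group("wall"))
    return {name: (count, wall) for name, (count, wall) in totals.items()}
```

Passing an open file handle for the perf log to `summarize_cpu_modules` and sorting the result by total time would show at a glance whether a single slow module (such as the 3,126 s raw denoise run above) dominates, which the per-module count alone does not reveal.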
Re: [darktable-dev] All of a sudden, darktable Thread Seg faults
On Sun, Apr 9, 2017 at 3:43 PM, Ulrich Pegelow wrote:
> On 09.04.2017 at 09:31, Roman Lebedev wrote:
>>
>> On Sun, Apr 9, 2017 at 9:59 AM, Ulrich Pegelow wrote:
>>>
>>> On 08.04.2017 at 20:04, Roman Lebedev wrote:
>>>> Well, that is *very* strange indeed.
>>>> If it *reliably* happens for you, then maybe you could also bisect
>>>> this within the submodule itself?
>>>
>>> Very clear result:
>>
>> Aha, now that makes rather no sense.
>> It is likely caused by just one raw image, if you can find it, i'll
>> take it from here.
>>
>>> 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 is the first bad commit
>>> commit 7f087325d09e2b6d4ecc392f7aee44dd29fafe62
>>> Author: Roman Lebedev
>>> Date: Sat Apr 1 13:11:57 2017 +0300
>>>
>>> ThrowException(): and how about this?
>>>
>>> :04 04 84dd635a545bf913c916bc075f152a6718f05b1b
>>> ba3c956bb4ffb0b8f54b55ac4aaaf879334f7327 M src
>>
>> This commit was reverted in the very next commit, so what is the next
>> bad commit?
>>
>
> Looks like none of the following commits solves the issue.
>
> However, looking at the changes in question I found that the following patch
> in master brings darktable back to normal:
>
> diff --git a/src/librawspeed/common/RawspeedException.h b/src/librawspeed/common/RawspeedException.h
> index 692d3f9..b0ebee6 100644
> --- a/src/librawspeed/common/RawspeedException.h
> +++ b/src/librawspeed/common/RawspeedException.h
> @@ -32,7 +32,7 @@
>  namespace RawSpeed {
>
>  template
> -[[noreturn]] static inline void __attribute__((noreturn, format(printf, 1, 2)))
> +[[noreturn]] void __attribute__((noreturn, format(printf, 1, 2)))
>  ThrowException(const char* fmt, ...) {
>    static constexpr size_t bufSize = 8192;
>  #if defined(HAVE_THREAD_LOCAL)
>
> That means reverting the change from commit
> 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 which has not yet been reverted in
> commit 0967e3c8a528cca0800cc5289cba5c212a385a6b.
>
> Don't ask me

And, pushed.
> Ulrich
Re: [darktable-dev] OpenCL scheduling profiles
On 09.04.2017 at 11:00, Matthias Andree wrote:
> On 08.04.2017 at 14:29, Ulrich Pegelow wrote:
>
> 2. What bothers me though are the timeouts and their defaults. In practice,
> the darkroom works ok-ish, but the lighttable does not. When a truckload of
> small thumbnails (say, lighttable zoomed out to show 10 columns of images)
> needs to be regenerated for the lighttable, it *appears* (not yet
> corroborated with measurements) that bumping up the timeouts considerably
> helps to avoid latencies, as though things were deadlocking and waiting for
> the timer to break the lock. Might be an internal issue with the
> synchronization though - how fine-grained is the re-attempt? Is it
> sleep-and-retry, or does it use some form of semaphores and signalling at
> the system level between threads?

What's your number of background threads (fourth entry in core options)?
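To make the two options in the quoted question concrete, here is a minimal Python sketch contrasting sleep-and-retry with a signalled wait. It is purely illustrative and makes no claim about how darktable actually synchronizes its pixelpipes; the `Device` class and its method names are hypothetical.

```python
import threading
import time


class Device:
    """Hypothetical stand-in for a GPU that one pixelpipe can hold at a time."""

    def __init__(self):
        self._cond = threading.Condition()
        self._busy = False

    def try_acquire(self):
        """Take the device if it is free; never blocks."""
        with self._cond:
            if self._busy:
                return False
            self._busy = True
            return True

    def acquire_polling(self, timeout, interval=0.005):
        """Sleep-and-retry: a release may go unnoticed for up to one interval."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if self.try_acquire():
                return True
            time.sleep(interval)
        return False

    def acquire_signalled(self, timeout):
        """Condition variable: the waiter is woken when release() notifies."""
        with self._cond:
            if not self._cond.wait_for(lambda: not self._busy, timeout=timeout):
                return False  # timed out while the device was still busy
            self._busy = True
            return True

    def release(self):
        with self._cond:
            self._busy = False
            self._cond.notify()
```

With polling, many waiters each add up to one interval of delay per hand-over, which could look like the latency described above when dozens of thumbnails queue up; with signalling, the timeout only matters when the device really stays busy.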
Re: [darktable-dev] All of a sudden, darktable Thread Seg faults
On Sun, Apr 9, 2017 at 3:43 PM, Ulrich Pegelow wrote:
> On 09.04.2017 at 09:31, Roman Lebedev wrote:
>>
>> On Sun, Apr 9, 2017 at 9:59 AM, Ulrich Pegelow wrote:
>>>
>>> On 08.04.2017 at 20:04, Roman Lebedev wrote:
>>>> Well, that is *very* strange indeed.
>>>> If it *reliably* happens for you, then maybe you could also bisect
>>>> this within the submodule itself?
>>>
>>> Very clear result:
>>
>> Aha, now that makes rather no sense.
>> It is likely caused by just one raw image, if you can find it, i'll
>> take it from here.
>>
>>> 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 is the first bad commit
>>> commit 7f087325d09e2b6d4ecc392f7aee44dd29fafe62
>>> Author: Roman Lebedev
>>> Date: Sat Apr 1 13:11:57 2017 +0300
>>>
>>> ThrowException(): and how about this?
>>>
>>> :04 04 84dd635a545bf913c916bc075f152a6718f05b1b
>>> ba3c956bb4ffb0b8f54b55ac4aaaf879334f7327 M src
>>
>> This commit was reverted in the very next commit, so what is the next
>> bad commit?
>>
>
> Looks like none of the following commits solves the issue.
>
> However, looking at the changes in question I found that the following patch
> in master brings darktable back to normal:
>
> diff --git a/src/librawspeed/common/RawspeedException.h b/src/librawspeed/common/RawspeedException.h
> index 692d3f9..b0ebee6 100644
> --- a/src/librawspeed/common/RawspeedException.h
> +++ b/src/librawspeed/common/RawspeedException.h
> @@ -32,7 +32,7 @@
>  namespace RawSpeed {
>
>  template
> -[[noreturn]] static inline void __attribute__((noreturn, format(printf, 1, 2)))
> +[[noreturn]] void __attribute__((noreturn, format(printf, 1, 2)))
>  ThrowException(const char* fmt, ...) {
>    static constexpr size_t bufSize = 8192;
>  #if defined(HAVE_THREAD_LOCAL)
>
> That means reverting the change from commit
> 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 which has not yet been reverted in
> commit 0967e3c8a528cca0800cc5289cba5c212a385a6b.
>
> Don't ask me

Okay, thank you for debugging this :)
I'll try to push that later today.

> Ulrich

Roman.
Re: [darktable-dev] All of a sudden, darktable Thread Seg faults
On 09.04.2017 at 09:31, Roman Lebedev wrote:
> On Sun, Apr 9, 2017 at 9:59 AM, Ulrich Pegelow wrote:
>>
>> On 08.04.2017 at 20:04, Roman Lebedev wrote:
>>> Well, that is *very* strange indeed.
>>> If it *reliably* happens for you, then maybe you could also bisect
>>> this within the submodule itself?
>>
>> Very clear result:
>
> Aha, now that makes rather no sense.
> It is likely caused by just one raw image, if you can find it, i'll
> take it from here.
>
>> 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 is the first bad commit
>> commit 7f087325d09e2b6d4ecc392f7aee44dd29fafe62
>> Author: Roman Lebedev
>> Date: Sat Apr 1 13:11:57 2017 +0300
>>
>> ThrowException(): and how about this?
>>
>> :04 04 84dd635a545bf913c916bc075f152a6718f05b1b
>> ba3c956bb4ffb0b8f54b55ac4aaaf879334f7327 M src
>
> This commit was reverted in the very next commit, so what is the next
> bad commit?

Looks like none of the following commits solves the issue.

However, looking at the changes in question I found that the following patch in master brings darktable back to normal:

diff --git a/src/librawspeed/common/RawspeedException.h b/src/librawspeed/common/RawspeedException.h
index 692d3f9..b0ebee6 100644
--- a/src/librawspeed/common/RawspeedException.h
+++ b/src/librawspeed/common/RawspeedException.h
@@ -32,7 +32,7 @@
 namespace RawSpeed {

 template
-[[noreturn]] static inline void __attribute__((noreturn, format(printf, 1, 2)))
+[[noreturn]] void __attribute__((noreturn, format(printf, 1, 2)))
 ThrowException(const char* fmt, ...) {
   static constexpr size_t bufSize = 8192;
 #if defined(HAVE_THREAD_LOCAL)

That means reverting the change from commit 7f087325d09e2b6d4ecc392f7aee44dd29fafe62 which has not yet been reverted in commit 0967e3c8a528cca0800cc5289cba5c212a385a6b.

Don't ask me

Ulrich
Re: [darktable-dev] OpenCL scheduling profiles
On 08.04.2017 at 14:29, Ulrich Pegelow wrote:
> Hi,
>
> I added a bit more flexibility concerning OpenCL device scheduling
> into master. There is a new selection box in preferences (core
> options) that allows to choose among a few typical presets.
>
> The main target are modern systems with very fast GPUs. By default and
> "traditionally" darktable distributes work between CPU and GPU in the
> darkroom: the GPU processes the center (full) view and the CPU is
> responsible for the preview (navigation) panel. Now that GPUs get
> faster and faster there are systems where the GPU so strongly
> outperforms the CPU that it makes more sense to process preview and
> full pixelpipe on the GPU sequentially.
>
> For that reason the "OpenCL scheduling profile" parameter has three
> options:
>
> * "default" describes the old behavior: work is split between GPU and
> CPU and works best for systems where CPU and GPU performance are on a
> similar level.
>
> * "very fast GPU" tackles the case described above: in darkroom view
> both pixelpipes are sequentially processed by the GPU. This is meant
> for GPUs which strongly outperform the CPU on that system.
>
> * "multiple GPUs" is meant for systems with more than one OpenCL
> device so that the full and the preview pixelpipe get processed by
> separate GPUs.
>
> At first startup darktable tries to find the best suited profile based
> on some benchmarking. You may at any time change the profile, this
> takes effect immediately.
>
> I am interested in your experience, both in terms of automatic
> detection of the best suited profile and in terms of overall
> performance. Please note that this is all about system latency and
> perceived system responsiveness in the darkroom view. Calling
> darktable with '-d perf' will only give you limited insights so you
> need to mostly rely on your own judgement.
>

Hi Ulrich,

1. gorgeous, thank you very much!
For me, the benchmarking seems to DTRT™ (do the right thing): it picks the "very fast GPU" profile with a 2016 NVIDIA GeForce GTX 1060 6 GB and an old 2009 AMD Phenom II X4 2.5 GHz 65 W quad-core. The code is compiled with -O2 -march=native, with OpenMP and OpenCL enabled, and I get this:

[opencl_init] here are the internal numbers and names of OpenCL devices available to darktable:
[opencl_init] 0 'GeForce GTX 1060 6GB'
[opencl_init] FINALLY: opencl is AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is ON.
[opencl_create_kernel] successfully loaded kernel `zero' (0) for device 0
[...]
[opencl_init] benchmarking results: 0.029428 seconds for fastest GPU versus 0.382860 seconds for CPU.
[opencl_init] set scheduling profile for very fast GPU.
[opencl_priorities] these are your device priorities:
[opencl_priorities]     image   preview export  thumbnail
[opencl_priorities]     0       0       0       0
[opencl_priorities] show if opencl use is mandatory for a given pixelpipe:
[opencl_priorities]     image   preview export  thumbnail
[opencl_priorities]     1       1       1       1
[opencl_synchronization_timeout] synchronization timout set to 0

2. What bothers me though are the timeouts and their defaults. In practice, the darkroom works ok-ish, but the lighttable does not. When a truckload of small thumbnails (say, lighttable zoomed out to show 10 columns of images) needs to be regenerated for the lighttable, it *appears* (not yet corroborated with measurements) that bumping up the timeouts considerably helps to avoid latencies, as though things were deadlocking and waiting for the timer to break the lock. Might be an internal issue with the synchronization though - how fine-grained is the re-attempt? Is it sleep-and-retry, or does it use some form of semaphores and signalling at the system level between threads?

I am running with these - possibly ridiculously high - timeout settings (15 s).
This is normally enough to process an entire export including a few CPU segments (say, raw denoise; I need it on some high-ISO images, ISO 6400+, to avoid black blotches or green stipples, but I have some concerns about its quality altogether which don't belong in this thread).

opencl_mandatory_timeout=3000
pixelpipe_synchronization_timeout=3000

3. Would it be sensible to set one of these timeouts considerably higher than the other?

4. Can we have -d perf log timeouts that change the scheduling decision (i.e. when a timeout causes a job to be dispatched to a different device, log the original intent and the actual dispatch target), and

4b. possibly a complete scheduler trace including all dispatch attempts? Might help debugging in the long run.
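The numbers in this message can be cross-checked with a few lines of arithmetic. The sketch below uses the benchmark figures from the log and the two timeout settings as given; the 5 ms step size is an assumption inferred only from the message pairing a value of 3000 with "15 s", so please verify it against the darktable documentation. The function names are hypothetical.

```python
# Benchmark figures copied from the [opencl_init] log line in this message.
GPU_SECONDS = 0.029428
CPU_SECONDS = 0.382860

# ASSUMPTION: one timeout step equals 5 ms. This is inferred from the message,
# which pairs a setting of 3000 with "15 s"; it is not taken from the source code.
ASSUMED_MS_PER_STEP = 5


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU finished the benchmark than the CPU."""
    return cpu_seconds / gpu_seconds


def timeout_seconds(steps, ms_per_step=ASSUMED_MS_PER_STEP):
    """Convert a timeout setting in steps to seconds under the assumed step size."""
    return steps * ms_per_step / 1000


print(f"GPU is about {speedup(CPU_SECONDS, GPU_SECONDS):.1f}x faster than the CPU")
print(f"3000 steps = {timeout_seconds(3000):.0f} s")
```

A GPU that is roughly 13 times faster than the CPU fits the log's choice of the "very fast GPU" profile; which ratio darktable actually uses as the cut-off is not stated in this thread.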