On 13-10-05 03:04 PM, Christian Kanzian wrote:
> On Saturday, 5 October 2013, Patrick Shanahan wrote:
>> * Ochal Christophe <[email protected]> [10-05-13 14:42]:
>>> On 2013-10-05 20:32, Tobias Ellinghaus wrote:
>>>> On Saturday, 5 October 2013, 19:44:37, Ochal Christophe wrote:
>>>>
>>>> [...]
>>>>
>>>>> My desktop consists of an i7 with 16GB ram, NVidia card, mSATA SSD
>>>>> (boot disk) and 2 large SATA drives in mirroring mode and it still
>>>>> takes 15 to 30 seconds to process an image from my Nikon D00
>>>>
>>>> Is that for exporting or clicking around in darkroom mode?
>>>
>>> Exporting, I'd go nuts otherwise :D
>>
>> odd, my i7 w/12gb and system on ssd w/2 large sata drives @ raid 1 takes
>> 5-6 sec's per image and *most* images are 24mb/raw from D7100.  Images
>> from my 12mb/raw D3 take 3-5 sec's.
>> But since I usually process 600-1500 images at a time and export at the
>> end, export time makes little difference to me as I usually walk away or
>> do something else in the meantime.
>>
>> Still intending to expand ram to 24gb in the very near future.  I believe
>> the amount of ram on your video card is a large determinant of speed.
>
> So for my setup (i7 2600; 16 GB RAM; GeForce GT 640 2 GB RAM with nvidia
> driver 325.15; no SSD; Pentax K5 DNG) it takes 2 to 3 sec's, or 15-20 sec's
> with denoising.
>
> Half a year ago I replaced my nvidia card (512 MB) with the GT 640 (2024 MB)
> to get OpenCL working, but OpenCL is slower than CPU only in my case. Also, X11
> is not responsive during export with OpenCL enabled. I tried different settings
> as suggested, but with no big changes.
>
> Without OpenCL: [dev_process_export] pixel pipeline processing took 13.919
> secs (89.910 CPU)
>
> With OpenCL: [dev_process_export] pixel pipeline processing took 24.525 secs
> (17.977 CPU)
>
> Export time depends heavily on the applied image settings/modules. To compare
> different setups, a kind of benchmark (same RAW, same processing) should be run.
>
>
> I would take:
> - 16 GB RAM
> - i7 CPU
> - recent nvidia card with 2 GB RAM
> - SSD for the system
> - 2x2TB drives with RAID 1
>
> All the best,
> Christian
>
> chri@chk64:~$ darktable -d opencl -d perf
> [opencl_init] device 0 `GeForce GT 640' has sm_20 support.
> [opencl_init] device 0 `GeForce GT 640' supports image sizes of 32768 x 32768
> [opencl_init] device 0 `GeForce GT 640' allows GPU memory allocations of up to
> 511MB
> [opencl_init] device 0: GeForce GT 640
>       GLOBAL_MEM_SIZE:          2047MB
>       MAX_WORK_GROUP_SIZE:      1024
>       MAX_WORK_ITEM_DIMENSIONS: 3
>       MAX_WORK_ITEM_SIZES:      [ 1024 1024 64 ]
>       DRIVER_VERSION:           325.15
>       DEVICE_VERSION:           OpenCL 1.1 CUDA
>

Similar setup here to Christian's, but with a slightly older video card:

[opencl_init] device 0 `GeForce GT 630' has sm_20 support.
[opencl_init] device 0 `GeForce GT 630' supports image sizes of 32768 x 32768
[opencl_init] device 0 `GeForce GT 630' allows GPU memory allocations of up to 511MB
[opencl_init] device 0: GeForce GT 630
      GLOBAL_MEM_SIZE:          2047MB
      MAX_WORK_GROUP_SIZE:      1024
      MAX_WORK_ITEM_DIMENSIONS: 3
      MAX_WORK_ITEM_SIZES:      [ 1024 1024 64 ]
      DRIVER_VERSION:           304.88
      DEVICE_VERSION:           OpenCL 1.1 CUDA

Note the older driver version being used.

I also shoot Pentax (K5IIs in this case) and work with DNG files when 
doing astrophotography, because DeepSkyStacker cannot handle PEF files. 
After a tweaking session (mainly adjusting the tone curve, slight 
denoising since DSS already does most of the heavy lifting there, and a 
saturation boost via color zones), here is the final export run:

[pixelpipe_process] [export] using device 0
[dev_pixelpipe] took 0.000 secs (0.000 CPU) initing base buffer [export]
[dev_pixelpipe] took 0.144 secs (0.156 CPU) processing `white balance' [export]
[dev_pixelpipe] took 0.044 secs (0.012 CPU) processing `highlight reconstruction' [export]
[default_process_tiling_cl_ptp] use tiling on module 'denoiseprofile' for image with full size 4950 x 3284
[default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 4100 x 3284 and overlap 32
[default_process_tiling_cl_ptp] tile (0, 0) with 4100 x 3284 at origin [0, 0]
[opencl_denoiseprofile] couldn't enqueue kernel! -4, devid 0
[default_process_tiling_opencl_ptp] couldn't run process_cl() for module 'denoiseprofile' in tiling mode: 0
[opencl_pixelpipe] failed to run module 'denoiseprofile'. fall back to cpu path
[dev_pixelpipe] took 12.437 secs (39.652 CPU) processing `denoise (profiled)' [export]
[dev_pixelpipe] took 0.207 secs (0.172 CPU) processing `input color profile' [export]
[dev_pixelpipe] took 0.123 secs (0.040 CPU) processing `color zones' [export]
[dev_pixelpipe] took 0.065 secs (0.024 CPU) processing `tone curve' [export]
[dev_pixelpipe] took 0.069 secs (0.028 CPU) processing `levels' [export]
[dev_pixelpipe] took 0.082 secs (0.036 CPU) processing `output color profile' [export]
[dev_pixelpipe] took 0.152 secs (0.076 CPU) processing `channel mixer' [export]
[opencl_profiling] spent  0.2753 seconds in [Write Image (from host to device)]
[opencl_profiling] spent  0.0415 seconds in whitebalance_4f
[opencl_profiling] spent  0.0426 seconds in highlights_4f
[opencl_profiling] spent  0.3006 seconds in [Read Image (from device to host)]
[opencl_profiling] spent  0.0491 seconds in denoiseprofile_precondition
[opencl_profiling] spent  1.0551 seconds in denoiseprofile_decompose
[opencl_profiling] spent  0.1123 seconds in colorin
[opencl_profiling] spent  0.1210 seconds in colorzones
[opencl_profiling] spent  0.0619 seconds in tonecurve
[opencl_profiling] spent  0.0658 seconds in levels
[opencl_profiling] spent  0.0775 seconds in colorout
[opencl_profiling] spent  0.0453 seconds in channelmixer
[opencl_profiling] spent  0.0200 seconds in blendop_mask_rgb
[opencl_profiling] spent  0.0782 seconds in blendop_rgb
[opencl_profiling] spent  2.3464 seconds totally in command queue (with 1 event missing)
[dev_process_export] pixel pipeline processing took 13.704 secs (41.432 CPU)
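
For anyone who wants to compare runs like the one above, here is a minimal Python sketch that ranks modules by wall-clock time. It assumes the `[dev_pixelpipe] took ... processing` line format seen in this log; the names `MODULE_LINE` and `summarize` are made up for illustration and are not part of darktable. It tolerates log lines that a mail client has wrapped.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [dev_pixelpipe] took 0.144 secs (0.156 CPU) processing `white balance' [export]
MODULE_LINE = re.compile(
    r"\[dev_pixelpipe\] took (?P<wall>\d+\.\d+) secs \((?P<cpu>\d+\.\d+) CPU\) "
    r"processing `(?P<module>[^']+)' \[(?P<pipe>\w+)\]"
)


def summarize(log_text, pipe="export"):
    """Return (module, wall_seconds) pairs for one pipe, slowest first."""
    # Collapse all whitespace, including line breaks added by mail wrapping,
    # so that a wrapped log entry matches the same pattern as an unwrapped one.
    flat = " ".join(log_text.split())
    totals = defaultdict(float)
    for match in MODULE_LINE.finditer(flat):
        if match.group("pipe") == pipe:
            totals[match.group("module")] += float(match.group("wall"))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Feeding it the pasted log text (for example `summarize(open("perf.log").read())`, where `perf.log` is a hypothetical file holding the output) lists `denoise (profiled)` first for this run.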

Not sure why OpenCL cannot handle denoising on the GT 630, but the CPU 
fallback for that module consumes the bulk of the pipeline processing time.
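
One possible lead, not a diagnosis: in the OpenCL headers, error code -4 is CL_MEM_OBJECT_ALLOCATION_FAILURE, so the kernel may have failed because a buffer could not be allocated on the card. The rough arithmetic below estimates the size of one tile buffer. It assumes 4 channels of 32-bit floats per pixel (suggested by the `_4f` kernel names, but still an assumption), and the log does not show how many such buffers the module needs. The function names are hypothetical.

```python
def buffer_mib(width, height, channels=4, bytes_per_channel=4):
    """Size in MiB of one image buffer (assumed layout: 4 x float32 per pixel)."""
    return width * height * channels * bytes_per_channel / 2**20


def max_buffers(total_mib, one_buffer_mib):
    """Upper bound on how many buffers of that size fit into total_mib."""
    return int(total_mib // one_buffer_mib)


# Tile size and memory figures taken from the log output in this mail.
tile_mib = buffer_mib(4100, 3284)
print(f"one tile buffer: {tile_mib:.1f} MiB")
print(f"fits under the 511MB per-allocation limit: {tile_mib < 511}")
print(f"buffers that fit in 2047MB at most: {max_buffers(2047, tile_mib)}")
```

A single tile buffer is well under the 511MB allocation limit, so if memory is the cause it would have to be the total across several buffers (plus whatever X11 already uses on the card) rather than any one allocation.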

Jack

_______________________________________________
Darktable-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/darktable-users
