Hi,

Only now, with the new ati-drivers 13.6, can I enable OpenCL in
darktable:

[opencl_init] device 0 `Tahiti' supports image sizes of 16384 x 16384
[opencl_init] device 0 `Tahiti' allows GPU memory allocations of up to
1024MB
[opencl_init] device 0: Tahiti 
     GLOBAL_MEM_SIZE:          1845MB
     MAX_WORK_GROUP_SIZE:      256
     MAX_WORK_ITEM_DIMENSIONS: 3
     MAX_WORK_ITEM_SIZES:      [ 256 256 256 ]
     DRIVER_VERSION:           1214.3 (VM)
     DEVICE_VERSION:           OpenCL 1.2 AMD-APP (1214.3)
...
discarding CPU device 1 `Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz' as it
will not deliver any performance gain.


The result is really great. For a test picture, exporting it
(with a lot of heavy processing like sady/denoise) took:
before : 29.837 secs (202.660 CPU)
after  : 8.412  secs (10.730 CPU)

So we can really see an improvement in speed & responsiveness (as it
uses far less CPU).


1) But some plugins are slower on the GPU than on the CPU. This is not a
fallback to the CPU path:
[opencl_summary_statistics] device 'Tahiti': 1468 out of 1468 events
were successful and 0 events lost


Test with 1.2 :
cpu [dev_pixelpipe] took 0.347 secs (1.789 CPU) processing `raw
denoise' [export]
gpu [dev_pixelpipe] took 0.432 secs (1.852 CPU) processing `raw
denoise' [export]
cpu [dev_pixelpipe] took 0.318 secs (0.310 CPU) processing
`watermark' [export]
gpu [dev_pixelpipe] took 0.476 secs (0.582 CPU) processing
`watermark' [export]
cpu [dev_pixelpipe] took 0.029 secs (0.205 CPU) processing
`gamma' [export]
gpu [dev_pixelpipe] took 0.033 secs (0.232 CPU) processing
`gamma' [export]
Devel :
cpu [dev_pixelpipe] took 0.317 secs (1.638 CPU) processing `raw
denoise' [export]
gpu [dev_pixelpipe] took 0.371 secs (1.642 CPU) processing `raw
denoise' [export]
cpu [dev_pixelpipe] took 0.205 secs (0.199 CPU) processing
`watermark' [export]
gpu [dev_pixelpipe] took 1.351 secs (2.157 CPU) processing
`watermark' [export]
cpu [dev_pixelpipe] took 0.024 secs (0.175 CPU) processing
`gamma' [export]
gpu [dev_pixelpipe] took 0.024 secs (0.186 CPU) processing
`gamma' [export]


1.1) Especially for the watermark module (where there is a big difference
between 1.2 & the devel version).
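To make the comparison above easier to read, here is a minimal Python sketch that parses timing lines in the form quoted in this mail and computes the GPU/CPU wall-time ratio per module. The leading "cpu"/"gpu" word is the annotation I added by hand, not something darktable prints, and the names parse_timings and gpu_over_cpu are only illustrative.

```python
import re

# Matches annotated timing lines as quoted in this mail, e.g.
#   cpu [dev_pixelpipe] took 0.347 secs (1.789 CPU) processing `raw denoise' [export]
# The leading "cpu"/"gpu" word is a hand-added annotation (an assumption of this sketch).
TIMING_RE = re.compile(
    r"(?P<path>cpu|gpu) \[dev_pixelpipe\] took (?P<wall>\d+\.\d+) secs "
    r"\((?P<cpu>\d+\.\d+) CPU\) processing `(?P<module>[^']+)' \[export\]"
)

def parse_timings(text):
    """Return {"cpu": {module: wall_secs}, "gpu": {module: wall_secs}}."""
    # Collapse all whitespace so lines wrapped by the mail client still match.
    flat = " ".join(text.split())
    timings = {"cpu": {}, "gpu": {}}
    for m in TIMING_RE.finditer(flat):
        timings[m.group("path")][m.group("module")] = float(m.group("wall"))
    return timings

def gpu_over_cpu(timings):
    """Return {module: gpu_wall / cpu_wall}; a value above 1.0 means the GPU path is slower."""
    return {
        module: timings["gpu"][module] / wall
        for module, wall in timings["cpu"].items()
        if module in timings["gpu"] and wall > 0
    }

# The 1.2 timings from this mail, including the mail client's line wrapping.
SAMPLE_1_2 = """
cpu [dev_pixelpipe] took 0.347 secs (1.789 CPU) processing `raw
denoise' [export]
gpu [dev_pixelpipe] took 0.432 secs (1.852 CPU) processing `raw
denoise' [export]
cpu [dev_pixelpipe] took 0.318 secs (0.310 CPU) processing
`watermark' [export]
gpu [dev_pixelpipe] took 0.476 secs (0.582 CPU) processing
`watermark' [export]
cpu [dev_pixelpipe] took 0.029 secs (0.205 CPU) processing
`gamma' [export]
gpu [dev_pixelpipe] took 0.033 secs (0.232 CPU) processing
`gamma' [export]
"""

for module, ratio in sorted(gpu_over_cpu(parse_timings(SAMPLE_1_2)).items()):
    print(f"{module}: GPU/CPU = {ratio:.2f}")
```

Only wall-clock seconds are compared here; the CPU-seconds figure in parentheses is matched but ignored.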


2) Also, I needed to change the parameter opencl_memory_headroom to
350 (default 300) to prevent this kind of error when using the equalizer
module:

[default_process_tiling_cl_ptp] use tiling on module 'atrous' for image
with full size 3374 x 5064
[default_process_tiling_cl_ptp] (1 x 3) tiles with max dimensions 3374 x
2726 and overlap 256
[default_process_tiling_cl_ptp] tile (0, 0) with 3374 x 2726 at origin
[0, 0]
[opencl_atrous] couldn't enqueue kernel! -4
[default_process_tiling_opencl_ptp] couldn't run process_cl() for module
'atrous' in tiling mode: 0
[opencl_pixelpipe] failed to run module 'atrous'. fall back to cpu path
[dev_pixelpipe] took 7.192 secs (37.128 CPU) processing
`equalizer' [export]
[opencl_pixelpipe] couldn't copy image to opencl device for module
tonecurve
[opencl_pixelpipe] failed to run module 'tonecurve'. fall back to cpu
path
[opencl_pixelpipe (b)] late opencl error detected while copying back to
cpu buffer: -4
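As a reading aid for the log above, here is a small Python sketch that pulls the negative status codes out of the [opencl_...] lines and maps them to the constant names from the OpenCL headers. It assumes the number darktable prints is the raw OpenCL status code; under that assumption -4 is CL_MEM_OBJECT_ALLOCATION_FAILURE, which would fit a problem that goes away with more memory headroom. The table is deliberately partial and the helper name is made up.

```python
import re

# A few OpenCL status codes as defined in CL/cl.h (partial table).
CL_STATUS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# An "[opencl_...]" tag, its message, then a trailing negative integer.
# Assumption of this sketch: that integer is the raw OpenCL status code.
ERROR_RE = re.compile(r"\[(opencl_[^\]]+)\]([^\[]*?)\s(-\d+)(?=\s|$)")

def decode_opencl_errors(log_text):
    """Return a list of (tag, code, name) for each opencl line ending in a negative code."""
    # Collapse whitespace so lines wrapped by the mail client still match.
    flat = " ".join(log_text.split())
    found = []
    for m in ERROR_RE.finditer(flat):
        code = int(m.group(3))
        found.append((m.group(1), code, CL_STATUS.get(code, "unknown")))
    return found

# Excerpt of the log from this mail, including the line wrapping.
SAMPLE_LOG = """
[opencl_atrous] couldn't enqueue kernel! -4
[default_process_tiling_opencl_ptp] couldn't run process_cl() for module
'atrous' in tiling mode: 0
[opencl_pixelpipe] failed to run module 'atrous'. fall back to cpu path
[opencl_pixelpipe (b)] late opencl error detected while copying back to
cpu buffer: -4
"""

for tag, code, name in decode_opencl_errors(SAMPLE_LOG):
    print(f"{tag}: {code} ({name})")
```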

3) To prevent regressions & catch performance drops, we could imagine a
test script which enables nearly all modules, then exports one file (&
does the same without OpenCL), then reports the results on a graph?
I don't know how to do that or whether it's really hard ... (lua ?)
On this, I was thinking of the work done for Firefox:

http://arewefastyet.com/
https://areweslimyet.com/
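To give an idea of the comparison step of such a script, here is a rough Python sketch (not Lua, just for illustration). It takes per-module wall times from two runs and reports the modules that got slower by more than a tolerance. The function name and the 10% tolerance are arbitrary choices; the numbers are the GPU timings from this mail (1.2 as baseline, devel as candidate). Running the exports and drawing the graph are not covered.

```python
def find_regressions(baseline, candidate, tolerance=0.10):
    """Return [(module, baseline_secs, candidate_secs, ratio)] for modules that
    are slower in candidate than in baseline by more than the tolerance,
    sorted with the worst slowdown first."""
    regressions = []
    for module, base in baseline.items():
        new = candidate.get(module)
        if new is None or base <= 0:
            continue  # module missing from one run, or unusable baseline
        if new > base * (1.0 + tolerance):
            regressions.append((module, base, new, new / base))
    return sorted(regressions, key=lambda item: item[3], reverse=True)

# GPU wall times in seconds, taken from the logs in this mail.
GPU_1_2 = {"raw denoise": 0.432, "watermark": 0.476, "gamma": 0.033}
GPU_DEVEL = {"raw denoise": 0.371, "watermark": 1.351, "gamma": 0.024}

for module, base, new, ratio in find_regressions(GPU_1_2, GPU_DEVEL):
    print(f"{module}: {base:.3f}s -> {new:.3f}s ({ratio:.2f}x slower)")
```

With a single export per version the timings are noisy, so a real script would probably want several runs per configuration before flagging anything.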



Regards



_______________________________________________
darktable-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/darktable-devel
