Hi,

Your problem is related to this line:

[opencl_profiling] spent 6.8647 seconds in [Write Image (from host to device)]

(taken from one of the OpenCL runs in your attached debug output).

This figure is the time needed to transfer data from host memory (your main RAM) to the graphics card's memory. This step is inherently slow on all OpenCL systems because the data has to travel over the comparatively slow PCI bus. In your case, however, it is extremely slow.

I am seeing something similar, albeit not to that extent, with an older GeForce GTS450 which I use here in a dual-GPU system (about 1.5s in Write Image...). My primary GPU, an AMD HD7950, is fast (about 0.02s in Write Image...). Not sure if this is related to the dual-GPU setup, though.

Maybe other users of low- to mid-range Nvidia GPUs could report their figures.
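
To make such figures easy to compare, here is a minimal Python sketch that sums the time per profiling event from darktable's debug output. It assumes the lines look exactly like the `[opencl_profiling]` line quoted above; the function name `sum_profiling` is only illustrative and not part of darktable.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [opencl_profiling] spent 6.8647 seconds in [Write Image (from host to device)]
# (assumption: this is the only line format we need to handle)
PROFILE_RE = re.compile(
    r"\[opencl_profiling\] spent\s+([0-9.]+) seconds in \[(.*)\]\s*$"
)


def sum_profiling(lines):
    """Return {event name: total seconds} over all profiling lines found."""
    totals = defaultdict(float)
    for line in lines:
        match = PROFILE_RE.search(line)
        if match:
            totals[match.group(2)] += float(match.group(1))
    return dict(totals)


# Example using the line quoted in this mail.
sample = [
    "[opencl_profiling] spent 6.8647 seconds in "
    "[Write Image (from host to device)]",
]
for name, secs in sorted(sum_profiling(sample).items(), key=lambda kv: -kv[1]):
    print(f"{secs:10.4f} s  {name}")
```

The debug output itself should be obtainable by starting darktable with `-d opencl -d perf` and saving what it prints to the terminal; the lines of that file can then be passed to `sum_profiling`.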

Ulrich


On 21.03.2016 at 13:32, Jamie Kitson wrote:
Hi,

I have an Asus UX31VD with 10GB RAM, an i7-3517U and both Intel HD4000
and NVidia GeForce GT 620M GPUs. When I switch OpenCL on in Darktable it
runs much slower, as I think these numbers show:

[dev_pixelpipe] took 24.632 secs (22.407 CPU) processed `lens
correction' on GPU with tiling, blended on CPU [export]

[dev_pixelpipe] took 4.648 secs (17.720 CPU) processed `lens correction'
on CPU with tiling, blended on CPU [export]


[dev_pixelpipe] took 0.264 secs (0.320 CPU) processed `shadows and
highlights' on GPU, blended on GPU [full]

[dev_pixelpipe] took 0.050 secs (0.167 CPU) processed `shadows and
highlights' on CPU, blended on CPU [full]
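
For comparison, the GPU/CPU ratio per module can be worked out from lines like these with a small Python sketch. It assumes the `[dev_pixelpipe]` format shown above (possibly wrapped across lines by the mail client) and keeps only one timing per module and device; the helper names are made up for illustration.

```python
import re

# Matches entries such as:
# [dev_pixelpipe] took 4.648 secs (17.720 CPU) processed `lens correction' on CPU ...
PIPE_RE = re.compile(
    r"\[dev_pixelpipe\] took ([0-9.]+) secs \(([0-9.]+) CPU\) "
    r"processed `([^']+)' on (GPU|CPU)"
)


def module_times(text):
    """Return {(module, device): wall-clock seconds} from the perf output."""
    flat = " ".join(text.split())  # undo line wrapping
    return {
        (m.group(3), m.group(4)): float(m.group(1))
        for m in PIPE_RE.finditer(flat)
    }


def gpu_slowdown(times):
    """Return {module: GPU time / CPU time} for modules timed on both."""
    return {
        module: secs / times[(module, "CPU")]
        for (module, device), secs in times.items()
        if device == "GPU" and (module, "CPU") in times
    }


log = """
[dev_pixelpipe] took 24.632 secs (22.407 CPU) processed `lens
correction' on GPU with tiling, blended on CPU [export]
[dev_pixelpipe] took 4.648 secs (17.720 CPU) processed `lens correction'
on CPU with tiling, blended on CPU [export]
[dev_pixelpipe] took 0.264 secs (0.320 CPU) processed `shadows and
highlights' on GPU, blended on GPU [full]
[dev_pixelpipe] took 0.050 secs (0.167 CPU) processed `shadows and
highlights' on CPU, blended on CPU [full]
"""
for module, ratio in gpu_slowdown(module_times(log)).items():
    print(f"{module}: GPU takes {ratio:.1f}x the CPU time")
```

For the four lines above, both modules come out at roughly 5.3 times slower on the GPU than on the CPU.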


Reading the Darktable OpenCL post [1], it seems that this could be caused
by a lack of graphics memory. According to various sources (Windows
included) my machine has 2GB of graphics RAM, and according to the
Darktable OpenCL post, Darktable itself won't try to use a GPU with less
than 1GB(?) of memory. However, the only ways I know of checking the
amount of graphics memory in Linux show less than 2GB, e.g.:

clinfo:
   Global memory size                              1073479680 (1024MiB)
   Max memory allocation                           268369920 (255.9MiB)
   Unified memory for Host and Device              No
   Integrated memory (NV)                          No
   Global Memory cache type                        Read/Write
   Global Memory cache size                        32768
   Global Memory cache line                        128 bytes
   Local memory type                               Local
   Local memory size                               49152 (48KiB)

lspci:
     Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
     Memory at e0000000 (64-bit, prefetchable) [size=256M]
     Memory at f0000000 (64-bit, prefetchable) [size=32M]

lshw:
        resources: irq:16 memory:f6000000-f6ffffff
        memory:e0000000-efffffff memory:f0000000-f1ffffff
        ioport:e000(size=128) memory:f7000000-f707ffff

Darktable:
[opencl_init] device 0 `GeForce GT 620M' allows GPU memory allocations
of up to 255MB
[opencl_init] device 0: GeForce GT 620M
      GLOBAL_MEM_SIZE:          1024MB
      MAX_WORK_GROUP_SIZE:      1024
      MAX_WORK_ITEM_DIMENSIONS: 3
      MAX_WORK_ITEM_SIZES:      [ 1024 1024 64 ]
      DRIVER_VERSION:           361.28
      DEVICE_VERSION:           OpenCL 1.1 CUDA

Actually dmesg does report 2GB:
[    1.965859] [drm] Memory usable by graphics device = 2048M

So my questions are:
Am I right in thinking that the Darktable OpenCL slowness is likely down
to a lack of video memory?
From what I've read on the internet it seems that many people run
Darktable using OpenCL on Optimus systems without this issue, can anyone
vouch for that?
Does anyone have any idea how I can either a) have Linux/OpenCL
recognise how much video memory I really have or b) speed up Darktable
with OpenCL?
Would the AMD/ATI instructions [2] help in my case?

Thanks, Jamie Kitson
____________________________________________________________________________
darktable user mailing list
to unsubscribe send a mail to [email protected]