Re: [petsc-dev] SNES ex19 not using GPU despite passing the options

Karl Rupp Tue, 14 Jan 2014 14:50:39 -0800

Hi Mani,

> Thanks for the reply. That fixed it. I get only a 10% speed up using the
> cusp options. Is the residual evaluation at each iteration happening on
> the CPU or the GPU?

The residual evaluation happens on the CPU unless there is a dedicated
kernel provided for this (which is not the case in ex19).

> Is there any way one can do the residual evaluation
> on the GPU too, after the data has been transferred?

Technically it is possible by extracting the underlying GPU buffers from
the vector objects and by manually managing the Field data. Frankly, I
don't know about the current state of the local-to-global mappings; you
likely have to do quite some copying of data between host and device
manually.

> Ex42 shows how it
> can be done using cusp but it looks really ugly and I want to use
> OpenCL. Basically can I do something like this?
>
> DMGetLocalVector(da, &localX); //Vector is now in GPU.
> DMDAVecGetArray(da, localX, &x); //Array is on GPU.
>
> //Create buffers for OpenCL
> buffer = cl::Buffer(context, CL_MEM_USE_HOST_PTR |
>                              CL_MEM_READ_WRITE,
>                     sizeofarray, &x[X2Start-Ng][X1Start-Ng],
>                     &clErr);
>
> (I'm hoping that here CL_MEM_USE_HOST_PTR will give a pointer to the
> data already on the GPU)
>
> // Launch OpenCL kernels and now map the buffers to read off the data.
>
> DMDAVecRestoreArray(da, localX, &x);
> DMRestoreLocalVector(da, &localX);
>
> I think the question is whether DMDAVecGetArray will return a pointer to
> the data on the GPU or not.

*VecGetArray() will always return a pointer due to the inability to
overload functions in C. Buffers in OpenCL are of type cl_mem, so this
won't work. Also, you won't be able to copy a two-dimensional array with
just one pointer &x[][]. As far as I know, we don't have any API which
provides GPU buffers directly, but maybe Matt added some functions for
this to work with FEM recently.

As far as I can tell, only providing the kernel won't suffice because we
don't have the GPU implementations for 'Field' data available. Hence,
you would have to copy the x and b arrays manually and then copy
everything back, which is most likely too much of a performance hit to
be worth the effort. Since GPUs are getting more and more integrated
into CPUs, it's questionable whether it's worth the time to implement
such additional memory management for accelerators if they disappear in
their discrete PCI-Express form a few years from now...
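
For reference, the manual copy route described above can be sketched in
plain C. This is only a minimal sketch under loud assumptions: the Field
struct is assumed to mirror the four components used in ex19, with double
standing in for PetscScalar; FakeDeviceBuffer and every fake_* helper are
hypothetical names that use malloc and memcpy in place of a real device;
and the "kernel" is a placeholder f = x - b, not the ex19 residual. With
OpenCL, the stand-ins would presumably correspond to clCreateBuffer,
clEnqueueWriteBuffer and clEnqueueReadBuffer operating on cl_mem handles.
Note that CL_MEM_USE_HOST_PTR asks OpenCL to use the given host memory as
the buffer's storage; it does not locate data that already lives on the
GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to mirror ex19's Field; double replaces PetscScalar so the
   sketch builds without PETSc. */
typedef struct { double u, v, omega, temp; } Field;

/* Hypothetical stand-in for a device buffer. A real cl_mem is an opaque
   handle and cannot be dereferenced like this. */
typedef struct { void *storage; size_t bytes; } FakeDeviceBuffer;

/* Stand-in for device allocation; returns 0 on success, -1 on failure. */
static int fake_device_alloc(FakeDeviceBuffer *d, size_t bytes)
{
    d->storage = malloc(bytes);
    d->bytes = d->storage ? bytes : 0;
    return d->storage ? 0 : -1;
}

static void fake_device_free(FakeDeviceBuffer *d)
{
    free(d->storage);
    d->storage = NULL;
    d->bytes = 0;
}

/* Stand-in for a host-to-device transfer of the whole buffer. */
static void fake_host_to_device(FakeDeviceBuffer *d, const void *host)
{
    memcpy(d->storage, host, d->bytes);
}

/* Stand-in for a device-to-host transfer of the whole buffer. */
static void fake_device_to_host(void *host, const FakeDeviceBuffer *d)
{
    memcpy(host, d->storage, d->bytes);
}

/* Placeholder kernel working only on the "device" copies: f = x - b. */
static void fake_residual_kernel(const FakeDeviceBuffer *x,
                                 const FakeDeviceBuffer *b,
                                 FakeDeviceBuffer *f, size_t n)
{
    const Field *xd = (const Field *)x->storage;
    const Field *bd = (const Field *)b->storage;
    Field *fd = (Field *)f->storage;
    for (size_t i = 0; i < n; ++i) {
        fd[i].u     = xd[i].u     - bd[i].u;
        fd[i].v     = xd[i].v     - bd[i].v;
        fd[i].omega = xd[i].omega - bd[i].omega;
        fd[i].temp  = xd[i].temp  - bd[i].temp;
    }
}

/* One residual evaluation: upload x and b, run the kernel, download f.
   Returns 0 on success, -1 if a buffer could not be allocated. */
int residual_via_staging(const Field *x, const Field *b, Field *f, size_t n)
{
    FakeDeviceBuffer dx = {NULL, 0}, db = {NULL, 0}, df = {NULL, 0};
    size_t bytes = n * sizeof(Field);
    int status = -1;

    assert(x && b && f);
    if (fake_device_alloc(&dx, bytes) == 0 &&
        fake_device_alloc(&db, bytes) == 0 &&
        fake_device_alloc(&df, bytes) == 0) {
        fake_host_to_device(&dx, x);   /* copy x to the device */
        fake_host_to_device(&db, b);   /* copy b to the device */
        fake_residual_kernel(&dx, &db, &df, n);
        fake_device_to_host(f, &df);   /* copy the result back */
        status = 0;
    }
    fake_device_free(&dx);
    fake_device_free(&db);
    fake_device_free(&df);
    return status;
}
```

Calling residual_via_staging(x, b, f, n) on host arrays of n entries
returns 0 on success. The two uploads and one download per call are the
extra traffic referred to above, and they would recur at every residual
evaluation.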


Best regards,
Karli
