Hello,

If you compile with the keep=True option, you should be able to find the PTX file generated by the compiler, e.g.:

In [127]: mod = SourceModule(s, keep=True )
*** compiler output in c:\users\wright\appdata\local\temp\tmpvzledt

In that folder I find "kernel.ptx", which contains details of the nvcc compiler and the options used, along with the assembler output. If you compile your C-based kernel using nvcc with the -ptx option, you should be able to diff the two outputs.
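
A minimal sketch of that comparison in Python, assuming both PTX files have already been saved to disk. The helper names strip_ptx_noise and diff_ptx and the default file labels are made up for illustration. It drops blank lines and // comment lines (which is where header details such as compiler version and options typically sit), so only differences in the generated code should show up:

```python
import difflib


def strip_ptx_noise(ptx_text):
    """Drop blank lines and full-line // comments from PTX text."""
    kept = []
    for line in ptx_text.splitlines():
        stripped = line.strip()
        # Skip empty lines and comment-only lines (banner, options, paths).
        if not stripped or stripped.startswith("//"):
            continue
        kept.append(stripped)
    return kept


def diff_ptx(ptx_a, ptx_b, name_a="pycuda.ptx", name_b="nvcc.ptx"):
    """Return unified-diff lines for two PTX texts; empty list if they match."""
    return list(
        difflib.unified_diff(
            strip_ptx_noise(ptx_a),
            strip_ptx_noise(ptx_b),
            fromfile=name_a,   # label only, hypothetical name
            tofile=name_b,     # label only, hypothetical name
            lineterm="",
        )
    )
```

For example, diff_ptx(open(path_a).read(), open(path_b).read()) returns an empty list when the two files contain the same instructions, where path_a and path_b stand for wherever your two PTX files ended up.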

If the PTX files match and the timing still does not, then you might want to try configuring PyCUDA with --cuda-trace as another way to track down the differences.

Cheers

Jon

On 04/04/2012 10:39, Michiel Bruinink wrote:
Hello,
I have written a Cuda program that calculates lots of Gauss fits. When I
use that same program with PyCuda, the time it takes to do the
calculations is almost 3x the time it takes with nvcc.
With nvcc it takes 380 ms and with PyCuda it takes 1110 ms, while the
outcome of the calculations is the same.
There is no difference in the device code, because I use the same file
for the device code in both cases.
How is this possible?
Does anybody have an idea?
I am not sure, but could it have something to do with array declarations
inside a device function?
# define lenP 6
# define nPoints 100000
...
__device__ void someFunction()
{
float residu[nPoints], newResidu[nPoints], pNew[lenP], b[lenP],
deltaP[lenP];
float A[lenP*lenP], Jacobian[nPoints*lenP], B[lenP*lenP];
...
}
Thanks,
Michiel.


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
