Hello,
If you compile with the keep=True option, you should find the PTX file
generated by the compiler, e.g.:
In [127]: mod = SourceModule(s, keep=True )
*** compiler output in c:\users\wright\appdata\local\temp\tmpvzledt
In that folder I find "kernel.ptx", which records the nvcc compiler
version and options used, followed by the assembler output. If you
compile your C-based kernel using nvcc with the -ptx option, you should
be able to diff the two outputs.
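In case a diff tool is not handy on Windows, here is a minimal sketch of that comparison using only the Python standard library. The function name diff_ptx and the ignore_comments switch are made up for illustration and are not part of PyCUDA. Treating every line that starts with "//" as a comment is an assumption about how the PTX header is laid out.

```python
import difflib
from pathlib import Path


def diff_ptx(pycuda_ptx, nvcc_ptx, ignore_comments=False):
    """Return the unified-diff lines between two PTX files.

    With ignore_comments=True, lines starting with "//" are dropped first,
    so only the directives and instructions are compared. Leave it False
    to also compare the header comments, where the compiler version and
    options are recorded.
    """

    def load(path):
        lines = [ln.rstrip() for ln in Path(path).read_text().splitlines()]
        if ignore_comments:
            lines = [ln for ln in lines if not ln.lstrip().startswith("//")]
        return lines

    return list(
        difflib.unified_diff(
            load(pycuda_ptx),
            load(nvcc_ptx),
            fromfile=str(pycuda_ptx),
            tofile=str(nvcc_ptx),
            lineterm="",
        )
    )


# Hypothetical usage; substitute the real locations of the two files:
#   for line in diff_ptx("pycuda_kernel.ptx", "nvcc_kernel.ptx"):
#       print(line)
```

An empty result with ignore_comments=True would suggest the generated code is the same and only the header differs, in which case the options listed in the header are the next thing to compare.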
If the PTX files match and the timing still does not, you might want to
try configuring PyCUDA with --cuda-trace as another way to track down
the differences.
Cheers
Jon
On 04/04/2012 10:39, Michiel Bruinink wrote:
Hello,
I have written a CUDA program that calculates lots of Gauss fits. When I
run that same program through PyCUDA, the calculations take almost 3x as
long as they do when compiled with nvcc.
With nvcc it takes 380 ms and with PyCUDA it takes 1110 ms, while the
outcome of the calculations is the same.
There is no difference in the device code, because I use the same file
for the device code in both cases.
How is this possible?
Does anybody have an idea?
I am not sure, but could it have something to do with array declarations
inside a device function?
#define lenP 6
#define nPoints 100000
...
__device__ void someFunction()
{
    float residu[nPoints], newResidu[nPoints], pNew[lenP], b[lenP],
          deltaP[lenP];
    float A[lenP*lenP], Jacobian[nPoints*lenP], B[lenP*lenP];
    ...
}
Thanks,
Michiel.
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda