On Wed, 2010-09-29 at 05:30 -0700, jmcarval wrote:
[cut]
> >Try to change it to arch = "sm_10" 
> > and so on, and check whether you get incorrect 14 in such a case.
> Sorry, Rybak. Can't put it to work. Thanks anyway.
> 
> Are there any other tests that I can do to help debug this?
> The sample I've posted can also trivially demonstrate the difficulties I'm
> having with the sum, min and max functions with arrays of 4 or more
> elements.

I started analysing how CUDA generates code, and reading about PTX,
nvcc, and so on.
In summary: nvcc can generate either PTX (assembly) or CUBIN (binary).
PTX is compiled later by the driver (I am not sure about the details).
A CUBIN is a binary for a particular chip.
Because PyCUDA already calls nvcc to compile CUDA code,
it would be pointless to generate PTX only to trigger one more
compilation a moment later.
So, quite correctly, PyCUDA asks nvcc to generate CUBIN.
If I have made a mistake, please correct me.
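
To make the two output modes concrete, here is a minimal sketch that only
builds the nvcc command lines and does not run them. The file names are
hypothetical, and this is not necessarily the exact command that PyCUDA
assembles internally; it only shows the --cubin and --ptx switches.

```python
def nvcc_command(source, output, arch, binary=True):
    """Return an nvcc argument list that produces CUBIN or PTX.

    binary=True  -> --cubin (binary for one particular architecture)
    binary=False -> --ptx   (assembly, compiled again later)
    """
    mode = "--cubin" if binary else "--ptx"
    return ["nvcc", mode, "-arch=" + arch, "-o", output, source]


# Hypothetical file names, used for illustration only.
cubin_cmd = nvcc_command("kernel.cu", "kernel.cubin", "sm_20")
ptx_cmd = nvcc_command("kernel.cu", "kernel.ptx", "sm_20", binary=False)

print(" ".join(cubin_cmd))
print(" ".join(ptx_cmd))
```

The lists can be passed to subprocess if one wants to compare both outputs
by hand.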

Unfortunately, a CUBIN is generated for a particular family of compute
capabilities. This means that you cannot ask nvcc to generate code
for sm_11 and expect it to work on sm_20 (here, Fermi).
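
A small sketch of that restriction, under one assumption taken from my
reading of the NVIDIA documentation: a CUBIN built for capability X.y runs
only on devices with capability X.z where z >= y. The helper names
arch_flag and cubin_runs_on are made up for this example and are not part
of PyCUDA.

```python
def arch_flag(compute_capability):
    """Turn a (major, minor) tuple into an nvcc arch string such as 'sm_20'."""
    major, minor = compute_capability
    return "sm_%d%d" % (major, minor)


def cubin_runs_on(built_for, device):
    """Assumed rule: same major revision, device minor >= build minor."""
    return built_for[0] == device[0] and built_for[1] <= device[1]


fermi = (2, 0)
print(arch_flag(fermi))
print(cubin_runs_on((1, 1), fermi))   # sm_11 binary on an sm_20 device
print(cubin_runs_on((1, 1), (1, 3)))  # sm_11 binary on an sm_13 device

# With PyCUDA and a GPU present, the tuple could come from the device:
#   cc = pycuda.driver.Device(0).compute_capability()
#   mod = pycuda.compiler.SourceModule(source, arch=arch_flag(cc))
```

So instead of hard-coding arch, it seems safer to derive it from the device
that will actually run the kernel.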

Sorry for that mistake; I hope I did not mix something up
while reading the NVIDIA docs and PyCUDA sources.

-- 
Tomasz Rybak <bogom...@post.pl> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda