Hello Apostolis,

There are two errors:
1. You are trying to use a 32x32 block (1024 threads), but this size is only
   supported by devices of compute capability 2.0 (Teslas and probably other
   newer cards; look it up in the programming guide). Older cards (such as
   mine) allow at most 512 threads per block, so I had to change it to 16x16
   (with corresponding changes to other parts of the code). You did not say
   exactly which error you are getting, but keep this in mind.

2. When you create the output array as

       rot_im = zeros((width,height))

   it has dtype=float64 by default. You have to explicitly set it to the same
   type as curr_im (float32), for example by writing

       rot_im = zeros((width,height)).astype(curr_im.dtype)

With these changes your code works correctly on my system.

Best regards,
Bogdan

On Sun, Oct 30, 2011 at 6:28 PM, Apostolis Glenis <apostgle...@gmail.com> wrote:
> I tried adapting the SDK naive transpose example for a class project that
> I'm working on. (The class project isn't about transpose, but it is related.)
> Could you please tell me what is wrong with my code?
>
> Apostolis
>
> _______________________________________________
> PyCUDA mailing list
> PyCUDA@tiker.net
> http://lists.tiker.net/listinfo/pycuda
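
P.S. In case it helps with point 1, here is a rough sketch of the block-size check and the matching grid computation. The helper names (block_fits, grid_for) and the lookup table are made up for illustration and are not part of PyCUDA; the limits are the 512 and 1024 threads per block mentioned above for compute capability 1.x and 2.0.

```python
# Hypothetical helpers, not part of PyCUDA. Limits per block as stated above:
# 512 threads for compute capability 1.x, 1024 threads for 2.0.
MAX_THREADS_PER_BLOCK = {1: 512, 2: 1024}  # keyed by major compute capability


def block_fits(block, cc_major):
    """Return True if an (x, y) block stays within the per-block thread limit."""
    threads = block[0] * block[1]
    return threads <= MAX_THREADS_PER_BLOCK[cc_major]


def grid_for(width, height, block):
    """Number of blocks needed to cover a width x height image, rounding up."""
    bx, by = block
    return ((width + bx - 1) // bx, (height + by - 1) // by)
```

With a 16x16 block you would then pass block=(16, 16, 1) and the tuple returned by grid_for as grid= when calling the kernel. If the image size is not a multiple of 16, the kernel also needs a bounds check, since the last row and column of blocks extend past the image.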