Hi Tomasz,

On Mon, 21 Mar 2011 20:15:35 +0100, "Tomasz Rybak" <bogom...@post.pl> wrote:
> I attach patch updating pycuda.tools.DeviceData and
> pycuda.tools.OccupancyRecord
> to take new devices into consideration. I have tried to maintain "style" of
> those classes
> and introduced changes only when necessary. I have done changes using my old
> notes
> and NVIDIA Occupancy Calculator. Unfortunately I currently do not have
> access to Fermi
> to test those fully.

-        self.smem_granularity = 16
+        if dev.compute_capability() >= (2,0):
+            self.smem_granularity = 128
+        else:
+            self.smem_granularity = 512

Way back in March, you submitted this patch, where smem_granularity is
documented as the number of threads taking part in a simultaneous smem
access. The new values just seem wrong. What am I missing, or rather,
what did you have in mind?

In any case, I've reverted them to 16/32 in git.

Thanks,
Andreas
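
For reference, a minimal sketch of how the reverted selection might look. It assumes that "16/32" means 16 threads for devices below compute capability 2.0 and 32 threads for 2.0 and above; the function name is hypothetical and this is not the actual code in pycuda.tools.DeviceData.

```python
def smem_granularity_for(compute_capability):
    """Return the assumed number of threads taking part in one
    simultaneous shared-memory access.

    compute_capability is a (major, minor) tuple, the same shape that
    dev.compute_capability() returns in the patch above.
    """
    # Assumption: 32 threads on compute capability >= 2.0, 16 threads
    # on older devices. This is a reading of "16/32", not confirmed
    # by the message itself.
    if compute_capability >= (2, 0):
        return 32
    return 16


if __name__ == "__main__":
    for cc in [(1, 1), (1, 3), (2, 0), (2, 1)]:
        print(cc, smem_granularity_for(cc))
```

The tuple comparison mirrors the check used in the patch, so (1, 3) falls into the older branch and (2, 0) into the newer one.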
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda