Hi Tomasz,

On Mon, 21 Mar 2011 20:15:35 +0100, "Tomasz Rybak"
<bogom...@post.pl> wrote:
> I attach patch updating pycuda.tools.DeviceData and
> pycuda.tools.OccupancyRecord to take new devices into consideration.
> I have tried to maintain "style" of those classes and introduced
> changes only when necessary. I have done changes using my old notes
> and NVIDIA Occupancy Calculator. Unfortunately I currently do not
> have access to Fermi to test those fully.

-        self.smem_granularity = 16
+        if dev.compute_capability() >= (2,0):
+            self.smem_granularity = 128
+        else:
+            self.smem_granularity = 512

Way back in March, you submitted this patch, where smem_granularity is
documented as the number of threads taking part in a simultaneous smem
access. The new values just seem wrong. What am I missing, or rather,
what did you have in mind?

In any case, I've reverted them to 16/32 in git.
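
For reference, here is a minimal sketch of what the reverted selection
amounts to. The helper name is hypothetical and is not part of
pycuda.tools. It assumes 32 applies to compute capability 2.0 and later
(a full warp per smem access) and 16 to earlier devices (a half-warp).
It takes the capability as a plain tuple, so it runs without a GPU:

```python
def smem_granularity_for(compute_capability):
    """Hypothetical helper, not part of PyCUDA.

    Returns the number of threads assumed to take part in one
    simultaneous shared-memory access for a (major, minor) tuple.
    """
    # Assumption: compute capability 2.0 and later services a full
    # warp (32 threads) at once; earlier devices service a half-warp
    # (16 threads).
    if compute_capability >= (2, 0):
        return 32
    return 16


if __name__ == "__main__":
    for cc in [(1, 1), (1, 3), (2, 0), (2, 1)]:
        print(cc, smem_granularity_for(cc))
```

Python compares tuples element by element, so (1, 3) >= (2, 0) is
False and (2, 1) >= (2, 0) is True, which is what the
compute_capability() check in the patch relies on.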

Thanks,
Andreas


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda