Have you tried the double2 or double4 data types? From memory, I have seen
benchmarks on float showing that float{2,3,4} is faster than plain float.
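
A rough sketch of the kind of comparison I mean is below. I have not seen the
attached test_pycuda_speed.py, so the kernel here is an assumed plain copy
kernel, not the one in that file. The names vector_type,
make_copy_kernel_source, copy_bandwidth_gbs and time_copy are made up for
illustration. The default block size of 160 is taken from the output quoted
below; the array size is arbitrary. Only widths 1, 2 and 4 are covered.

```python
# Sketch only: builds a copy kernel for a scalar or vector type and times it.
# Helper names are hypothetical; the kernel is an assumed simple copy.

SCALAR_BYTES = {"float": 4, "double": 8}


def vector_type(scalar, width):
    """Return the CUDA type name, e.g. ("double", 2) -> "double2"."""
    if scalar not in SCALAR_BYTES or width not in (1, 2, 4):
        raise ValueError("unsupported type: %s x %d" % (scalar, width))
    return scalar if width == 1 else "%s%d" % (scalar, width)


def make_copy_kernel_source(scalar, width):
    """CUDA C source for a copy kernel; n counts vector elements."""
    return """
__global__ void copy_vec(const %(t)s *src, %(t)s *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[i];
}
""" % {"t": vector_type(scalar, width)}


def copy_bandwidth_gbs(n_bytes, seconds):
    """A copy reads and writes every byte once, so traffic is 2 * n_bytes."""
    return 2.0 * n_bytes / seconds / 1e9


def time_copy(scalar="double", width=2, n_vec=1 << 22, block=160, repeats=10):
    """Needs a CUDA GPU and PyCUDA; returns the best bandwidth in GB/s."""
    import numpy as np
    import pycuda.autoinit  # noqa: F401 (creates the context)
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    dtype = np.float32 if scalar == "float" else np.float64
    src = gpuarray.to_gpu(np.ones(n_vec * width, dtype=dtype))
    dst = gpuarray.empty_like(src)
    source = make_copy_kernel_source(scalar, width)
    func = SourceModule(source).get_function("copy_vec")
    grid = ((n_vec + block - 1) // block, 1)

    best = float("inf")
    for _ in range(repeats):
        start, end = drv.Event(), drv.Event()
        start.record()
        func(src.gpudata, dst.gpudata, np.int32(n_vec),
             block=(block, 1, 1), grid=grid)
        end.record()
        end.synchronize()
        best = min(best, end.time_since(start) * 1e-3)  # ms -> s
    return copy_bandwidth_gbs(src.nbytes, best)
```

Calling time_copy("double", 1) and time_copy("double", 2) on the same card
and dividing each result by the theoretical maximum should show whether the
wider loads help. I have not run this on a C2070, so I cannot say what
numbers to expect.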

Fred

On Thu, Feb 16, 2012 at 2:57 PM, Jesse Lu <jess...@stanford.edu> wrote:
> Hi everyone,
>
> I ran a simple experiment today, which consisted of trying to maximize the
> memory (device memory) throughput of a very simple kernel. I was slightly
> disappointed that I was only able to achieve 72% of the theoretical maximum
> bandwidth. My GPU is a C2070. The file is attached and is executed using:
>
> $ python test_pycuda_speed.py
> 0.72196600476 utilization (1.0 is perfect utilization).
> Achieved bandwidth: 98 GB/s
> Theoretical maximum bandwidth: 136 GB/s
> Fastest kernel execution time: 0.000777023971081
> Optimum block shape: (160, 1, 1)
> .
> ----------------------------------------------------------------------
> Ran 1 test in 0.814s
>
> OK
>
> The questions that I have are:
>
> How close can others get to the theoretical peak bandwidth?
> Any suggested tweaks to increase performance?
>
> Thanks!
>
> Jesse
>
> _______________________________________________
> PyCUDA mailing list
> PyCUDA@tiker.net
> http://lists.tiker.net/listinfo/pycuda
>

