Hi,
I did some testing with the number of threads: I varied the thread count and
recorded how long, in seconds, the pyopencl kernel took to execute. The
results are below, with a rough sketch of the timing loop after them:

   - Number of threads -- Time (seconds)
   - 10,000 -- 202
   - 20,000 -- 170
   - 24,000 -- 209
   - 30,000 -- 224
   - 30,714 -- 659
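
A minimal sketch of one way to collect such timings, using OpenCL profiling
events (the "scale" kernel and the buffer contents are placeholders, not the
actual kernel; only the global sizes match the table above):

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
# Profiling-enabled queue so kernel events carry start/end timestamps.
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

# Placeholder kernel: one work-item per array element.
prg = cl.Program(ctx, """
__kernel void scale(__global float *a)
{
    a[get_global_id(0)] *= 2.0f;
}
""").build()

for n in (10000, 20000, 24000, 30000, 30714):
    a = np.random.rand(n).astype(np.float32)
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                    hostbuf=a)
    evt = prg.scale(queue, (n,), None, buf)
    evt.wait()
    # Profiling timestamps are in nanoseconds.
    print(n, "work-items:", (evt.profile.end - evt.profile.start) * 1e-9, "s")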

Thanks
Aseem

On Wed, Jun 6, 2018 at 1:54 AM, Sven Warris <s...@warris.nl> wrote:

> Hi Aseem,
>
> This may be caused by memory access collisions and/or a lack of coalesced
> memory access. This technical report gives some pointers:
> https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf
> Do you use atomic operations? Or maybe you have too many thread fences?
> I have no problem starting many threads; the number of threads alone is
> not the issue.
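>
> For illustration (hypothetical kernels, not taken from your code): coalesced
> access means neighbouring work-items touch neighbouring addresses, while a
> strided pattern scatters them across memory:
>
> # Hypothetical kernel sources illustrating the two access patterns;
> # neither is from the original code.
> coalesced_src = """
> __kernel void copy_coalesced(__global const float *in, __global float *out)
> {
>     int gid = get_global_id(0);
>     out[gid] = in[gid];   /* work-item i touches element i: coalesced */
> }
> """
> strided_src = """
> __kernel void copy_strided(__global const float *in, __global float *out,
>                            int stride, int n)
> {
>     int gid = get_global_id(0);
>     int i = (gid * stride) % n;
>     out[i] = in[i];       /* accesses spread across memory: poorly coalesced */
> }
> """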
>
> Cheers,
> Sven
>
>
> On 6-6-2018 at 8:37, aseem hegshetye wrote:
>
> Hi,
> Does GPU speed drop exponentially as the number of threads increases beyond
> a certain point? I have been setting the number of threads equal to the
> number of transactions in the data under consideration.
> On a Tesla K80 I see an exponential drop in speed above 30,290 threads.
> If so, is it best practice to keep the number of threads low and iterate
> over the data to get results at optimum speed?
> How do I find the best number of threads for a GPU?
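>
> For what it's worth, a minimal sketch of the limits that can be queried
> through pyopencl (the "noop" kernel below is just a placeholder, not my
> actual kernel):
>
> import pyopencl as cl
>
> ctx = cl.create_some_context()
> dev = ctx.devices[0]
>
> # Device-wide limits reported by the driver.
> print("compute units:      ", dev.max_compute_units)
> print("max work-group size:", dev.max_work_group_size)
>
> # Placeholder kernel; the per-kernel work-group limit can be lower than
> # the device limit once register/local-memory use is taken into account.
> knl = cl.Program(ctx, """
> __kernel void noop(__global float *a) { a[get_global_id(0)] += 1.0f; }
> """).build().noop
> print("kernel work-group size:",
>       knl.get_work_group_info(
>           cl.kernel_work_group_info.WORK_GROUP_SIZE, dev))
> print("preferred size multiple:",
>       knl.get_work_group_info(
>           cl.kernel_work_group_info.PREFERRED_WORK_GROUP_SIZE_MULTIPLE, dev))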
>
> Thanks
> Aseem
>
>
_______________________________________________
PyOpenCL mailing list
PyOpenCL@tiker.net
https://lists.tiker.net/listinfo/pyopencl
