Hi, I did some testing with the number of threads. I varied the thread count and recorded the time in seconds the PyOpenCL kernel took to execute. Here are the results:
No. of threads | Time (s)
10,000         | 202
20,000         | 170
24,000         | 209
30,000         | 224
30,714         | 659

Thanks
Aseem

On Wed, Jun 6, 2018 at 1:54 AM, Sven Warris <s...@warris.nl> wrote:
> Hi Aseem,
>
> This may be caused by memory access collisions and/or a lack of coalesced
> memory access. This technical report gives some pointers:
> https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf
> Do you use atomic operations? Or maybe you have too many thread fences?
> I have no problem starting many threads: the number of threads alone is
> not the issue.
>
> Cheers,
> Sven
>
> Op 6-6-2018 om 8:37 schreef aseem hegshetye:
>> Hi,
>> Does GPU speed drop sharply as the number of threads increases beyond a
>> certain point? I used to set the number of threads equal to the number
>> of transactions in the data under consideration.
>> For a Tesla K80 I see a sharp drop in speed above 30,290 threads.
>> If true, is it best practice to keep the number of threads low and
>> iterate over the data to get results at optimum speed?
>> How do I find the best number of threads for a GPU?
>>
>> Thanks
>> Aseem
>
> _______________________________________________
> PyOpenCL mailing list
> PyOpenCL@tiker.net
> https://lists.tiker.net/listinfo/pyopencl
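On the "keep the number of threads low and iterate over the data" question: one common pattern is to cap the global size and let each work-item loop over several elements with a stride (a grid-stride loop), rather than launching one work-item per transaction. Below is a minimal host-side sketch in plain Python of how such a launch could be planned; the function names (`plan_launch`, `thread_work`) and the example cap and group size are my own illustrations, not K80-specific facts or PyOpenCL API.

```python
import math

def plan_launch(n_items, max_threads, group_size=256):
    """Cap the thread count, then round the global size up to a
    multiple of the work-group size (OpenCL requires the global
    size to be divisible by the local size when one is given)."""
    threads = min(n_items, max_threads)
    threads = math.ceil(threads / group_size) * group_size
    per_thread = math.ceil(n_items / threads)  # loop iterations per work-item
    return threads, per_thread

def thread_work(gid, n_threads, n_items):
    """Indices one work-item would touch in a grid-stride loop:
    gid, gid + n_threads, gid + 2*n_threads, ... (< n_items)."""
    return list(range(gid, n_items, n_threads))

# Example: 30,714 transactions, capped at roughly 20,000 threads
threads, per_item = plan_launch(30_714, 20_000)
print(threads, per_item)  # 20224 work-items, 2 elements each
```

Inside the kernel, the equivalent loop would step from `get_global_id(0)` in strides of `get_global_size(0)`; the stride keeps neighbouring work-items reading neighbouring addresses, which helps the coalescing issue Sven mentioned.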