Joe <homerun4...@gmail.com> writes:
> Hello,
>
> I have a question and hope that you can help me.
>
> A block is working on a common problem, the threads are iterating
> through a part of the problem each.
> Now if some condition is met, a thread should write its threadId
> to a 1D output which is smaller than the total number of threads.
>
> I would rather not store all of the results as integers,
> since the condition is only met in very rare cases.
>
> The two options I found would be
>
> 1.) to store all results in a bitfield which is as long as there are
> threads and use bitwise atomicAnd.
>
> 2.) share a common index within a block which is and use the
> return value of atomicAdd to store the threadId there.
>
> Is one of these ideas to be preferred? Or do you have
> better suggestions to do this?

This sounds tricky. A reasonable design might be to allocate space so
that every block has room to write out two or three times its expected
number of outputs, use a scan within each block to compute indices, and
have some sort of failure indication (plus a do-over) if a block
overruns its allocated output space.

Andreas

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
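
To make the indexing in the suggested design concrete, here is a rough
CPU-side model in plain Python. It is not GPU code and not taken from
the thread: the names compact_block and compact, the slack parameter,
and the choice to double the capacity on a retry are all assumptions
made for illustration. On a device, the prefix sum would be a parallel
block-level scan in shared memory and the do-over would be a kernel
relaunch with a larger output buffer.

```python
from itertools import accumulate


def compact_block(flags, block_start, capacity):
    """Model one thread block (hypothetical helper).

    flags[i] is True if thread i of this block met the condition.
    Returns (ids, overflowed), where ids are the global thread ids that
    fit into this block's output region of `capacity` slots.
    """
    # Exclusive prefix sum over the 0/1 flags: offsets[i] is the output
    # slot thread i would write to if its flag is set.
    offsets = [0] + list(accumulate(int(f) for f in flags))
    total = offsets[-1]  # number of hits in this block

    out = [-1] * capacity
    for tid, flag in enumerate(flags):
        # Each "thread" writes only if it has a hit and its slot exists.
        if flag and offsets[tid] < capacity:
            out[offsets[tid]] = block_start + tid

    # The overflow flag is the failure indication for this block.
    return out[:min(total, capacity)], total > capacity


def compact(flags, block_size, expected_hits_per_block, slack=2):
    """Run all blocks; redo everything with more room on overflow."""
    capacity = max(1, slack * expected_hits_per_block)
    while True:
        results, overflow = [], False
        for start in range(0, len(flags), block_size):
            ids, over = compact_block(
                flags[start:start + block_size], start, capacity
            )
            results.extend(ids)
            overflow = overflow or over
        if not overflow:
            return results
        # Assumed retry policy: double the per-block capacity.
        capacity *= 2


# Rare condition: only thread ids divisible by 97 produce an output.
flags = [i % 97 == 0 for i in range(1024)]
hits = compact(flags, block_size=256, expected_hits_per_block=1)
```

In this run the first block has three hits but only two slots, so the
first pass reports an overflow and the second pass, with four slots per
block, succeeds. Because the scan assigns slots in thread order, the
output comes out sorted, which an atomicAdd-based counter would not
guarantee.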