Joe <homerun4...@gmail.com> writes:

> Hello,
>
> I have a question and hope that you can help me.
>
> A block is working on a common problem, and each thread iterates
> through its own part of the problem.
> Now if some condition is met, a thread should write its threadId
> to a 1D output which is smaller than the total number of threads.
>
> I would rather not store all of the results as integers,
> since the condition is only met in very rare cases.
>
> The two options I found would be
>
> 1.) to store all results in a bitfield which is as long as there are
> threads and use bitwise atomicOr to set the bits.
>
> 2.) to share a common index within a block and use the
> return value of atomicAdd to store the threadId there.
>
> Is one of these ideas to be preferred? Or do you have
> a better suggestion for doing this?

This sounds tricky. A reasonable design might be to allocate space so
that every block has room to write out two or three times its expected
number of outputs, use a scan within each block to compute output
indices, and have some sort of failure indication (plus a do-over) if
the allocated output space is overrun.
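
As a rough illustration of the design described above, here is a minimal
CUDA sketch, not a tuned implementation. It assumes one flag per thread
and a fixed block size. The names condition_met, compact_thread_ids,
capacity, counts and overflow are made up for this example, and the
predicate is only a placeholder for the real (rare) condition. The scan
is a plain Hillis-Steele scan in shared memory; error checking is left
out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // the kernel assumes blockDim.x == BLOCK_SIZE

// Hypothetical stand-in for the rare condition from the question.
__device__ bool condition_met(int gid) { return gid % 97 == 0; }

// Each block owns `capacity` slots in `out`. counts[b] receives the number
// of hits in block b; *overflow is set if any block had more hits than slots.
__global__ void compact_thread_ids(int *out, int *counts, int *overflow,
                                   int capacity)
{
    __shared__ int scan[BLOCK_SIZE];
    const int tid = threadIdx.x;
    const int gid = blockIdx.x * BLOCK_SIZE + tid;
    const int flag = condition_met(gid) ? 1 : 0;

    // Inclusive Hillis-Steele scan of the per-thread flags.
    scan[tid] = flag;
    __syncthreads();
    for (int offset = 1; offset < BLOCK_SIZE; offset <<= 1) {
        const int add = (tid >= offset) ? scan[tid - offset] : 0;
        __syncthreads();
        scan[tid] += add;
        __syncthreads();
    }

    const int total = scan[BLOCK_SIZE - 1];  // hits in this block
    const int slot = scan[tid] - flag;       // exclusive prefix = output index

    // Only write while there is room; surplus hits are dropped and reported.
    if (flag && slot < capacity)
        out[blockIdx.x * capacity + slot] = gid;

    if (tid == 0) {
        counts[blockIdx.x] = total;
        if (total > capacity)
            atomicExch(overflow, 1);  // failure indication for the host
    }
}

int main()
{
    const int num_blocks = 4;
    int capacity = 2;  // deliberately small so the first pass overflows
    bool done = false;

    while (!done) {
        int *d_out, *d_counts, *d_overflow;
        cudaMalloc(&d_out, num_blocks * capacity * sizeof(int));
        cudaMalloc(&d_counts, num_blocks * sizeof(int));
        cudaMalloc(&d_overflow, sizeof(int));
        cudaMemset(d_overflow, 0, sizeof(int));

        compact_thread_ids<<<num_blocks, BLOCK_SIZE>>>(d_out, d_counts,
                                                       d_overflow, capacity);

        int overflow = 0;
        cudaMemcpy(&overflow, d_overflow, sizeof(int), cudaMemcpyDeviceToHost);

        if (!overflow) {
            std::vector<int> out(num_blocks * capacity), counts(num_blocks);
            cudaMemcpy(out.data(), d_out, out.size() * sizeof(int),
                       cudaMemcpyDeviceToHost);
            cudaMemcpy(counts.data(), d_counts, counts.size() * sizeof(int),
                       cudaMemcpyDeviceToHost);
            for (int b = 0; b < num_blocks; ++b)
                for (int i = 0; i < counts[b]; ++i)
                    printf("block %d: thread %d\n", b, out[b * capacity + i]);
            done = true;
        } else {
            capacity *= 2;  // the do-over: retry with more room per block
        }

        cudaFree(d_out);
        cudaFree(d_counts);
        cudaFree(d_overflow);
    }
    return 0;
}
```

Because the indices come from a scan rather than from atomicAdd, the
thread ids within each block come out in ascending order, and the only
atomic operation is the rarely taken overflow flag.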

Andreas

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
