On 10/22/15 05:55, Bernd Schmidt wrote:
> On 10/22/2015 10:12 AM, Jakub Jelinek wrote:
>
>> So, is the worker broadcast buffer effectively a file scope .shared
>> variable?  My worry is that as .shared is quite a limited resource, if you
>> compile many TUs and each allocates its own broadcast buffer, you will run
>> out of shared memory.  Is there any way to share the broadcast buffers
>> between different TUs (other than LTO)?
>
> I think LTO is the mechanism, nvptx-lto1 only ever produces one assembly file.
> So I'm not really concerned about this.

Correct. PTX has no equivalent of common or weak symbols, so we can't do the ELF thing of emitting a common definition and having the linker pick the largest.


> One other thing about this occurred to me yesterday - I was worried about
> thread-safety with a single static buffer - couldn't code execute multiple
> kernels at the same time? I googled a bit, and could not find a definitive
> answer as to whether all shared memory is allocated at kernel launch, or
> just the dynamic portion.

AFAICT a single CTA doesn't execute multiple kernels concurrently.
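
To illustrate the point, here is a minimal CUDA sketch (not the code the nvptx backend emits). It assumes a hypothetical kernel `broadcast_kernel` with a statically sized `__shared__` array `bcast_buf` standing in for the worker broadcast buffer. Each thread block (CTA) gets its own instance of a static `__shared__` variable, so blocks do not see each other's copy of the buffer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a compiler-generated worker broadcast buffer.
// A statically sized __shared__ array has one instance per thread block (CTA).
__global__ void broadcast_kernel(const int *in, int *out)
{
    __shared__ int bcast_buf[1];

    // One thread per block publishes a value ...
    if (threadIdx.x == 0)
        bcast_buf[0] = in[blockIdx.x];
    __syncthreads();

    // ... and every thread of the same block reads it back.
    out[blockIdx.x * blockDim.x + threadIdx.x] = bcast_buf[0];
}

int main()
{
    const int blocks = 4, threads = 32;
    int h_in[blocks] = {10, 20, 30, 40};
    int h_out[blocks * threads];
    int *d_in = nullptr, *d_out = nullptr;

    cudaMalloc(&d_in, sizeof h_in);
    cudaMalloc(&d_out, sizeof h_out);
    cudaMemcpy(d_in, h_in, sizeof h_in, cudaMemcpyHostToDevice);

    broadcast_kernel<<<blocks, threads>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof h_out, cudaMemcpyDeviceToHost);

    // Check that each thread saw the value published by its own block.
    int mismatches = 0;
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < threads; ++t)
            if (h_out[b * threads + t] != h_in[b])
                ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches != 0;
}
```

The sketch only shows the per-block scoping of a static `__shared__` buffer; it does not by itself answer when the hardware reserves the static versus the dynamic portion.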

nathan
