Re: [OpenACC 2/11] PTX backend changes

Bernd Schmidt Thu, 22 Oct 2015 02:56:00 -0700

On 10/22/2015 10:12 AM, Jakub Jelinek wrote:

@@ -2129,6 +3242,19 @@ nvptx_file_end (void)
    FOR_EACH_HASH_TABLE_ELEMENT (*needed_fndecls_htab, decl, tree, iter)
      nvptx_record_fndecl (decl, true);
    fputs (func_decls.str().c_str(), asm_out_file);
+
+  if (worker_bcast_hwm)
+    {
+      /* Define the broadcast buffer.  */
+
+      worker_bcast_hwm = (worker_bcast_hwm + worker_bcast_align - 1)
+       & ~(worker_bcast_align - 1);
+
+      fprintf (asm_out_file, "// BEGIN VAR DEF: %s\n", worker_bcast_name);
+      fprintf (asm_out_file, ".shared .align %d .u8 %s[%d];\n",
+              worker_bcast_align,
+              worker_bcast_name, worker_bcast_hwm);
+    }


So, is the worker broadcast buffer effectively a file scope .shared
variable?  My worry is that as .shared is quite limited resource, if you
compile many TUs and each allocates its own broadcast buffer you run out of
shared memory.  Is there any way how to share the broadcast buffers in
between different TUs (other than LTO)?

I think LTO is the mechanism, nvptx-lto1 only ever produces one assembly file. So I'm not really concerned about this.

One other thing about this occurred to me yesterday - I was worried about thread-safety with a single static buffer - couldn't code execute multiple kernels at the same time? I googled a bit, and could not actually find a definitive answer as to whether all shared memory is allocated at kernel launch, or just the dynamic portion?



Bernd
