On Wed, Oct 21, 2015 at 05:40:24PM +0300, Alexander Monakov wrote:
> On Wed, 21 Oct 2015, Jakub Jelinek wrote:
> > > -#if defined HAVE_TLS || defined USE_EMUTLS
> > > +#if defined __nvptx__
> > > +extern struct gomp_thread *nvptx_thrs;
> > 
> > What kind of address space is this variable?  It should be
> > a per-CTA var, so that different teams have different, and
> > simultaneous target regions have different too.
> 
> As written it's in global accelerator memory.  Indeed it's broken with
> simultaneous target regions, and to unbreak that I'd like to place it in
> shared memory (but that would require expanding address-space support
> a bit more, exposing shared-memory space to C source code).

Or declare the pointer in inline asm and read it from inline asm
(and ditto for its initialization)?
Perhaps at least as a short-term solution.

> > I'm surprised that for team.c you chose to adjust the shared source,
> > rather than copy and remove all the cruft you don't need/want.
> > 
> > That includes the LIBGOMP_USE_PTHREADS guarded parts, all the thread binding
> > stuff etc.  I'd like to see at least for comparison how much actually
> > remained in there.
> 
> Diffstat for the copy/remove patch is 66+/474-, almost all of removed 470
> lines are in gomp_team_start, which counts only ~150 lines after removals.

I think I prefer having config/nvptx/team.c include the toplevel team.c,
where you ifdef out all of gomp_thread_start and gomp_team_start, and
define them yourself in config/nvptx/team.c.  After removing the non-PTX
related stuff, gomp_thread_start is about 65 lines including comments/blank
lines, and gomp_team_start is 85 lines.  Note I've also removed the nested
parallelism handling from there; you can't support that without dynamic
parallelism.  And you'll surely want to tweak it even more.
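
To illustrate the layout, here is a minimal single-file sketch of the
ifdef-out-and-override pattern.  It is not the actual libgomp code: the guard
macro TARGET_PROVIDES_TEAM_START, the helper clamp_nthreads, the simplified
team_start signature and the limits 64 and 8 are all made up for the example.
In the real tree the two parts would be separate files, with the second one
including the first.

```c
/* Hypothetical guard macro; the thread does not name one.  In the
   two-file layout the target file would define it before including
   the shared file.  */
#define TARGET_PROVIDES_TEAM_START 1

/* ---- Part 1: stands in for the shared, toplevel team.c ---- */

/* Shared helper that stays common to all targets.  */
static int
clamp_nthreads (int requested, int limit)
{
  if (requested < 1)
    return 1;
  return requested > limit ? limit : requested;
}

#ifndef TARGET_PROVIDES_TEAM_START
/* Generic version, compiled out when the target supplies its own.  */
int
team_start (int requested)
{
  return clamp_nthreads (requested, 64);
}
#endif

/* ---- Part 2: stands in for config/nvptx/team.c, which would consist
   of the #define above, an #include of the toplevel team.c, and then
   the target's own definitions ---- */

/* Target-specific replacement: trimmed body with a smaller
   (illustrative) thread limit.  */
int
team_start (int requested)
{
  return clamp_nthreads (requested, 8);
}
```

With the guard defined only the second team_start is compiled, so
team_start (100) yields 8 here; everything outside the guarded region of
the shared part, such as the helper, remains shared.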

        Jakub
