On Wed, Dec 02, 2015 at 09:14:03AM -0500, Nathan Sidwell wrote:
> On 12/02/15 08:46, Jakub Jelinek wrote:
> 
> >Or does the OpenACC execution model not allow anything like that, i.e.
> >have some function with an automatic variable pass the address of that
> >variable to some other function and that other function use #acc loop kind
> >that expects the caller to be at the worker level and splits the work among
> >the threads in the warp, on the array section pointed by that passed in
> >pointer?  See the OpenMP testcase I've posted in this thread.
> 
> There are two cases to consider
> 
> 1) the caller (& address taker) is already partitioned.  Thus the callers'
> frames are already copied.  The caller takes the address of the object in
> its own frame.
> 
> An example would be calling say __mulcd3 where the return value location is
> passed by pointer.
> 
> 2) the caller is not partitioned and calls a function containing a
> partitioned loop.  The caller takes the address of its instance of the
> variable.  As part of the RTL expansion we have to convert addresses (to be
> stored in registers) to the generic address space.  That conversion creates
> a pointer that may be used by any thread (on the same CTA)[*].  The function
> call is  executed by all threads (they're partially un-neutered before the
> call).  In the partitioned loop, each thread ends up accessing the location
> in the frame of the original calling active thread.
> 
> [*]  although .local is private to each thread, it's placed in memory that
> is reachable from anywhere, provided a generic address is used.  Essentially
> it's like TLS and genericization is simply adding the thread pointer to the
> local memory offset to create a generic address.

I believe Alex' testing revealed that if you take address of the same .local
objects in several threads, the addresses are the same, and therefore you
refer to your own .local space rather than the other thread's.  Which is why
the -msoft-stack stuff has been added.
Perhaps we need to use it everywhere, at least for OpenMP, and do it
selectively, non-addressable vars can stay .local, addressable vars proven
not to escape to other threads (or other functions that could access them
from other threads) would go to soft stack.

        Jakub

Reply via email to