https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

--- Comment #15 from torvald at gcc dot gnu.org ---
From an ISO C/C++ conformance perspective, this is not a bug:

* Pre-C11/C++11, threading was basically undefined.

* Starting with C11 and C++11, "threads of execution" (as they're called in
C++11 and more recent, often abbreviated in the standard as "threads") are the
lowest level of how execution is modelled.  They are defined as single flows of
control.  Nothing is promised about any resources that may be used to implement
threads of execution (e.g., similar to the "execution context" notion mentioned
in comment #10).

* Thread-specific state is bound to a particular thread of execution (e.g.,
regarding lifetime).  A thread of execution accessing a __thread variable
accesses the thread-specific state of this thread of execution in the abstract
machine.  (Of course, one can still access other threads' thread-specific
state through pointers initially provided by those threads.)

* Only the standards' mechanisms can create threads of execution.  There is
nothing in these standards that would break up the concept of a single flow of
control (i.e., that what "looks" like a single flow of control in a program is
actually not always the same thread of execution when executed).  (Also note
that fork/join parallelism is not a counter-example to this.)
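
To make the point about pointers concrete, here is a minimal sketch, assuming
POSIX threads and GCC's __thread.  All names in it (tls_value, published,
read_other_threads_tls) are made up for illustration.  A worker thread hands
out a pointer to its own thread-specific object, and the calling thread reads
the worker's state through that pointer while its own copy stays untouched.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One instance of this object exists per thread of execution. */
static __thread int tls_value = 0;

/* Hypothetical handshake state shared by both threads. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int *published = NULL;   /* pointer provided by the worker */
static int reader_done = 0;

static void *worker(void *arg)
{
    (void)arg;
    tls_value = 42;             /* writes the worker's own copy */

    pthread_mutex_lock(&lock);
    published = &tls_value;     /* hand out a pointer to that copy */
    pthread_cond_broadcast(&cond);
    /* Stay alive until the reader is finished: the lifetime of the
       object is bound to this thread. */
    while (!reader_done)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns the value read from the worker's copy through the pointer.
   *same_object is set to 1 if that pointer equals the caller's own
   &tls_value, and to 0 otherwise. */
int read_other_threads_tls(int *same_object)
{
    pthread_t t;
    int seen;
    int rc;

    published = NULL;
    reader_done = 0;
    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&lock);
    while (published == NULL)
        pthread_cond_wait(&cond, &lock);
    seen = *published;                          /* the worker's state */
    *same_object = (published == &tls_value);   /* compare with ours  */
    reader_done = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return seen;
}
```

The worker deliberately waits before returning, because the pointer is only
usable while the thread that owns the object is still alive.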


That said, I can see that this doesn't match nicely with the fact that we have
things like swapcontext elsewhere.  Do we have any data on what the performance
impact would be if the compiler assumed that calls to functions it
cannot analyze could switch the thread of execution?  Data for several
architectures and different TLS models would be helpful.

Coming back to C++, currently I think there is only one Technical Specification
(TS) that allows breaking up a single flow of control: .then() in the
Concurrency TS (whose specification is certainly not ready for the standard). 
Maybe the Networking TS has something similar, but I can't remember right now. 
There are a few proposals that either allow it (Task Blocks, targeting
Parallelism TS version 2) or require it for good performance ("stackful"
coroutines).
The "stackless" coroutines in the upcoming Coroutines TS are not really
affected by thread-specific state; it's not the coroutines code that would
potentially switch threads, but any runtime that would supply a particular
Awaitable implementation that then may switch threads (e.g., if using .then()).
 The Coroutines TS does not specify any actual runtime.

However, I don't think the existence of some C++ proposals that may switch
threads necessarily means that the compiler would have to take this into
account if those proposals become part of the standard.  The other way
this could play out is that the standard simply makes using thread-specific
state undefined for those threads of execution that can use these proposals.

Overall, I think it may be useful to experiment with a command line switch or
something like that, primarily to assess how large the performance degradation
would be if the compiler assumed that threads can switch on function calls.

(In reply to Jakub Jelinek from comment #14)
> Even if we have an option that avoids CSE of TLS addresses across function
> calls (or attribute for specific function), what would you expect to happen
> when user takes address of TLS variables himself:
> __thread int a;
> void
> foo ()
> {
>   int *p = &a;
>   *p = 10;
>   bar (); // Changes threads
>   *p += 10;
> }

I think that p would point to the initial thread's TLS, even after bar().   The
user would be wrong to assume that it still is the initial thread's object "a"
after having called bar().
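
A rough sketch of that behaviour, assuming POSIX threads: it does not
reproduce a real context switch (no swapcontext involved).  Instead, the part
of foo() after bar() is simply run on a second thread, which stands in for
"bar() changed threads".  The names continuation, foo_after_bar and
foo_split_across_threads are made up for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static __thread int a;      /* as in the quoted example */

struct continuation {
    int *p;                 /* pointer taken before the "switch"     */
    int p_is_my_a;          /* out: does p equal this thread's &a?   */
    int my_a_after;         /* out: this thread's own a afterwards   */
};

/* Second half of foo(), executed by a different thread. */
static void *foo_after_bar(void *arg)
{
    struct continuation *k = arg;

    *k->p += 10;                    /* updates the initial thread's a */
    k->p_is_my_a = (k->p == &a);    /* &a here is this thread's a     */
    k->my_a_after = a;
    return NULL;
}

/* First half of foo().  Returns the calling thread's a after the
   second half has run on the other thread. */
int foo_split_across_threads(struct continuation *k)
{
    pthread_t t;
    int rc;

    a = 0;
    k->p = &a;
    *k->p = 10;

    /* Stand-in for "bar (); // Changes threads". */
    rc = pthread_create(&t, NULL, foo_after_bar, k);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;

    return a;
}
```

Under these assumptions the initial thread's a ends up as 20, while the a
belonging to the thread that ran the second half is never touched and p does
not compare equal to that thread's &a.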
