On Saturday, 7 November 2015 at 08:37:40 UTC, Jacob Carlborg wrote:
On 2015-11-06 19:46, bitwise wrote:

Currently, the compiler just calls ___tls_get_addr(void *p) to get the thread local copy of a global. If that function signature is altered to take a pointer to the image as well, the problem is solved.

Hehe, you make it sound so easy. Perhaps I missed something and you know more than I do. But as far as I know you have two options:

1. Implement native TLS. This will require modifications to the compiler and minor tweaks in the runtime

2. Continue to use the custom TLS implementation but add support for dynamic libraries. This will require modifications to the compiler (as you said above) and major changes to the runtime

The native TLS implementation works as you described above (roughly). I can hardly believe that the code Apple added to the dynamic linker to implement TLS is not necessary. I don't see how you can get around implementing the same code as the dynamic linker does.

I also think that this is a good opportunity to change to native TLS. I don't like the situation we have now: "Yeah, D is compatible with C, except TLS on OS X."

Well, I'm speaking in relative terms when I say easy... ;)

Right now, TLS has a fairly simple implementation. DMD puts any global TLS vars into their own section in the binary. Then, at the point where those vars are accessed in code, DMD inserts a call to ___tls_get_addr(void*) to map the address of the var to some thread-specific block of memory. When ___tls_get_addr() is first called on a thread, it lazily allocates a block of memory for that thread, memcpy's the TLS vars from the TLS section in the binary, and stores that thread-local copy using pthread_setspecific(). Any subsequent calls to ___tls_get_addr() simply use pthread_getspecific() to retrieve that block of memory, and map the received address to the corresponding address in that block.
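To make that concrete, here is a rough C sketch of the scheme as I described it. It is not the actual druntime code: `tls_image`, `sketch_tls_get_addr` and `value_seen_by_new_thread` are made-up names, and a plain static array stands in for the TLS section of the binary.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stands in for the TLS section of the binary, i.e. the
   initial values of all thread-local globals. */
static int tls_image[4] = {1, 2, 3, 4};

static pthread_key_t tls_key;
static pthread_once_t tls_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    /* free() runs on each thread's block when that thread exits. */
    if (pthread_key_create(&tls_key, free) != 0) abort();
}

/* Hypothetical stand-in for ___tls_get_addr(): maps an address inside
   tls_image to the matching address in the calling thread's copy. */
void *sketch_tls_get_addr(void *p) {
    char *base = (char *)tls_image;
    char *addr = p;
    assert(addr >= base && addr < base + sizeof tls_image);

    pthread_once(&tls_once, make_key);
    char *block = pthread_getspecific(tls_key);
    if (block == NULL) {
        /* First access on this thread: allocate and copy the initial values. */
        block = malloc(sizeof tls_image);
        if (block == NULL) abort();
        memcpy(block, tls_image, sizeof tls_image);
        if (pthread_setspecific(tls_key, block) != 0) abort();
    }
    return block + (addr - base);
}

static void *reader(void *p) {
    /* Runs on a fresh thread, so it sees a fresh copy of the initial values. */
    return (void *)(intptr_t)*(int *)sketch_tls_get_addr(p);
}

/* Helper for illustration: the value a brand-new thread sees for *p. */
int value_seen_by_new_thread(int *p) {
    pthread_t t;
    void *result = NULL;
    if (pthread_create(&t, NULL, reader, p) != 0) abort();
    if (pthread_join(t, &result) != 0) abort();
    return (int)(intptr_t)result;
}
```

Writing through the pointer returned on one thread leaves both the original section and every other thread's copy untouched, which is the whole point of the indirection.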

So, since binaries will not be mapped to overlapping address ranges, I can loop over all the binary images, find the one whose TLS range contains the argument of ___tls_get_addr(), and map the pointer to the appropriate block of memory.
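The lookup itself would be something along these lines. Again, this is only a sketch: the `TlsImage` record and `find_tls_image` are hypothetical, and the real thing would get the image list from the runtime's own bookkeeping rather than from a caller-supplied array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record describing one loaded image's TLS section. */
typedef struct {
    const char *tls_start; /* first byte of the image's TLS section */
    size_t      tls_size;  /* size of that section in bytes */
} TlsImage;

/* Linear scan: return the image whose TLS range contains p, or NULL.
   This relies on the ranges of different images never overlapping. */
const TlsImage *find_tls_image(const TlsImage *images, size_t count,
                               const void *p) {
    assert(images != NULL || count == 0);
    uintptr_t addr = (uintptr_t)p;
    for (size_t i = 0; i < count; ++i) {
        uintptr_t start = (uintptr_t)images[i].tls_start;
        if (addr >= start && addr - start < images[i].tls_size)
            return &images[i];
    }
    return NULL;
}
```

The cost is one comparison per loaded image on every TLS access, so it grows with the number of images.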

I am concerned that looping over all binary images on every TLS access will have performance implications, but for now, this solution is good enough. Later, ___tls_get_addr() can be amended to also take a pointer to the image from which the TLS var originated, allowing constant-time lookup. I believe Martin has already done this for linux/fbsd, but I haven't had time to look at this specific issue.
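For comparison, a sketch of what the constant-time variant might look like if the image were passed in. This is my guess at the shape, not what the linux/fbsd code actually does; `TlsImage`, `tls_image_init` and `sketch_tls_get_addr_for` are made-up names, and I am assuming one pthread key per image.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-image record; each image owns its own pthread key. */
typedef struct {
    const char   *tls_start; /* initial values (the image's TLS section) */
    size_t        tls_size;
    pthread_key_t key;       /* selects this image's per-thread block */
} TlsImage;

/* Register an image; returns 0 on success, like pthread_key_create(). */
int tls_image_init(TlsImage *img, const void *start, size_t size) {
    assert(img != NULL);
    img->tls_start = start;
    img->tls_size = size;
    return pthread_key_create(&img->key, free);
}

/* Assumed two-argument form: the caller says which image p belongs to,
   so no search over the loaded images is needed. */
void *sketch_tls_get_addr_for(TlsImage *img, void *p) {
    char *block = pthread_getspecific(img->key);
    if (block == NULL) {
        /* First access to this image's TLS on this thread. */
        block = malloc(img->tls_size);
        if (block == NULL) abort();
        memcpy(block, img->tls_start, img->tls_size);
        if (pthread_setspecific(img->key, block) != 0) abort();
    }
    return block + ((const char *)p - img->tls_start);
}
```

The trade-off is that the compiler has to emit the extra argument at every TLS access, which is why it needs a compiler change and not just a runtime one.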

So... I've got a basic implementation working at this point. The global ctors are now used instead of that infernal dyld callback to initialize sections. I've tried loading a shared library dynamically, and everything seems to work. Next on the list is to work out how all this interacts with threads. Martin seems to have already solved this too, so it should be fairly straightforward. Currently, linking a dylib statically throws "thread.d(2916): Unable to suspend thread", but otherwise it seems to work as expected.

Anyways, I am open to any help on the TLS stuff if you've got time.

     Bit
