On 21 nov 2011, at 20:12, Walter Bright wrote:
> 
> On 11/21/2011 9:17 AM, Jacob Carlborg wrote:
>> 
>>> One reason dmd doesn't support dynlib yet is because I haven't done much 
>>> research into how dynlib actually works.
>> 
>> Ok, I see. I thought that you might know since you have developed a C++ 
>> compiler as well. I assume dynamic libraries can be used with DMC. Note that 
>> when I say "dylib" I mean the general term "dynamic library" and not the Mac 
>> OS X specific implementation.
>> 
> 
> DLLs on Windows work very differently from dynlibs on other systems. You have 
> to approach each as its own animal.


I've done some research into how TLS is implemented in the ELF format. I don't 
understand everything, but I think I now have at least a somewhat better 
understanding of TLS.

I've started to think about whether it's possible to implement TLS on Mac OS X 
in the same way as it's implemented on Linux, but with just the help of the 
compiler and druntime. From what I've read and understood, this is basically 
what happens, and what's different compared to regular variables:

* Some form of relocation happens 
* The TLS sections are initialized
* The regular sections are not used
* The regular symbol table is not used

I'm not sure if it's possible to do the relocation, but the initialization 
shouldn't be any problem (I think). I'm also not sure whether the second symbol 
table can be made to work. This is how I'm thinking:

There are a couple of things that need to be done at program start to have TLS 
working. It shouldn't matter if that's done by the dynamic linker or the 
application itself (druntime). That's assuming an application can do everything 
that needs to be done, i.e. relocation.
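
As a rough illustration of the initialization step, here is a minimal sketch 
in C of what druntime (or a dynamic linker) would have to do for each new 
thread: copy the initialized TLS data and zero the rest. The names 
tls_template and tls_block_create are made up for this example and are not 
taken from dmd, druntime or the TLS reference. Relocation is not covered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one module's TLS template: an
   initialization image (like .tdata on ELF) followed by space that
   must start out zeroed (like .tbss). */
typedef struct {
    const void *init_image;  /* bytes copied into each thread's block */
    size_t      init_size;   /* size of the initialized part */
    size_t      total_size;  /* initialized part plus zero-filled part */
} tls_template;

/* Allocate and initialize a fresh TLS block for one thread.
   Returns NULL if the allocation fails; the caller owns the block. */
static void *tls_block_create(const tls_template *t)
{
    assert(t->init_size <= t->total_size);

    unsigned char *block = malloc(t->total_size ? t->total_size : 1);
    if (block == NULL)
        return NULL;

    /* Copy the initialized data, then clear the remainder. */
    if (t->init_size > 0)
        memcpy(block, t->init_image, t->init_size);
    memset(block + t->init_size, 0, t->total_size - t->init_size);
    return block;
}
```

Nothing in this step depends on who performs it, which is why I think it 
could be done by the application itself at thread start.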

Of course this is just how I'm thinking and I could be completely wrong. I also 
have no idea how close your implementation of TLS on Mac OS X is to the 
implementation on Linux.

Now about getting TLS to work with dynamic libraries.

What's happening now in the ___tls_get_addr function is that there is only one 
TLS section/segment, bracketed by the __tls_beg and __tls_end segments. The 
problem is that there is only one pair of these begin and end segments. 
According to the TLS reference I read, a thread-local variable is identified by 
a reference to the object and the offset of the variable in the thread-local 
storage section. So the problem now is how to get the object in which this 
variable is defined.

I don't fully understand how this object is accessed. I think it's either passed 
to the __tls_get_addr function or accessed inside the function using assembly 
instructions. What's passed to the __tls_get_addr is an argument of the type 
"tls_index". The type is defined as follows for the IA-32 ABI:

typedef struct
{
    unsigned long int ti_module;
    unsigned long int ti_offset;
} tls_index;

Using the GNU variant of the ABI, the parameter is passed to the function in 
the %eax register. The reference says that to load the thread pointer in the 
%eax register the following code would be used:

movl %gs:0, %eax

I don't know if the object is the thread pointer or if it's the ti_module field 
in the tls_index struct. The name would suggest it's the field in the struct.
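
As I read the reference, ti_module is a module ID that selects the object, 
while the thread pointer is only used to find a per-thread vector of TLS 
blocks (the reference calls it the dtv). A minimal sketch in C of that lookup 
follows. The thread_vector type and the tls_lookup function are hypothetical 
names for this example, and the vector is passed in explicitly instead of 
being found through %gs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    unsigned long int ti_module;
    unsigned long int ti_offset;
} tls_index;

/* Hypothetical per-thread vector: slot m holds the base address of
   module m's TLS block for this thread. Slot 0 is unused here, so
   `blocks` must have module_count + 1 entries. */
typedef struct {
    size_t  module_count;
    void  **blocks;
} thread_vector;

/* Resolve (module, offset) to an address inside the calling thread's
   copy of that module's TLS block. */
static void *tls_lookup(const thread_vector *tv, const tls_index *ti)
{
    assert(ti->ti_module >= 1 && ti->ti_module <= tv->module_count);

    unsigned char *base = tv->blocks[ti->ti_module];
    assert(base != NULL);  /* block must already be allocated */
    return base + ti->ti_offset;
}
```

With one slot per loaded object, a single pair of begin and end symbols is no 
longer enough: each dynamic library would need its own block and its own 
module ID.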

To call the __tls_get_addr function the following assembly instructions are 
used for the general dynamic model for the IA-32 ABI, the GNU version:

0x00 leal x@tlsgd(,%ebx,1),%eax
0x07 call ___tls_get_addr@plt

In the above code, "x" in the "leal" instruction is the variable to be 
accessed. Since you have already implemented TLS for Linux, I assume you 
already know how to call this function as described in the reference 
(depending on what TLS model is used).

The general dynamic TLS model can be used everywhere and can access variables 
defined anywhere else. There are other models available but these are limited 
in different ways compared to the general model.

I've also read your article at Dr. Dobb's about implementing TLS on Mac OS X. 
You write:

"my benchmarks show it to be 10 times slower than a simple access to a shared 
global."

I don't understand why it has to be like this. If TLS is implemented in the 
same way as on Linux, but in the druntime instead of the dynamic linker (as 
suggested in the beginning), I don't see why it would be any slower than on 
Linux. I mean, the same tasks need to be performed regardless of whether 
they're done by the dynamic linker or druntime.

Also note that TLS is really fast on Mac OS X: pthread_getspecific is 
implemented using three assembly instructions and pthread_self using two 
instructions. There are inline versions available of these functions to remove 
the function call overhead.
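
To make the idea concrete, here is a minimal sketch in C of a per-thread block 
that is looked up with pthread_getspecific and created lazily on first access. 
The pthread calls are the standard POSIX ones. The names tls_init_image, 
tls_make_key and tls_current_block are made up for this example, and this is 
not how druntime actually does it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical initialization image for the thread-local data. */
static const int tls_init_image[2] = { 42, 0 };

static pthread_key_t  tls_key;
static pthread_once_t tls_key_once = PTHREAD_ONCE_INIT;

/* Create the key once; `free` releases a thread's block at thread exit. */
static void tls_make_key(void)
{
    int rc = pthread_key_create(&tls_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's block, allocating and initializing it
   on first use. Returns NULL if the block cannot be set up. */
static void *tls_current_block(void)
{
    pthread_once(&tls_key_once, tls_make_key);

    void *block = pthread_getspecific(tls_key);
    if (block == NULL) {
        block = malloc(sizeof tls_init_image);
        if (block == NULL)
            return NULL;
        memcpy(block, tls_init_image, sizeof tls_init_image);
        if (pthread_setspecific(tls_key, block) != 0) {
            free(block);
            return NULL;
        }
    }
    return block;
}
```

After the first access, each lookup is one pthread_getspecific call plus an 
offset addition, so the cost should mostly depend on how cheap that call is.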

BTW, according to this:

http://stackoverflow.com/questions/2436772/thread-local-storage-macosx

GCC 4.5+ on Mac OS X supports the __thread keyword but it's emulated.
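
For reference, this is roughly what the __thread keyword gives you from the 
programmer's point of view, whether it's emulated or not. It's a small sketch 
in C using the GCC/Clang __thread extension; the names counter, worker and 
run_worker_and_get_its_counter are only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* One copy of this variable exists per thread; each copy starts at 0. */
static __thread int counter = 0;

/* Runs in a new thread: modifies that thread's own copy only. */
static void *worker(void *arg)
{
    (void)arg;
    counter += 5;
    return (void *)(intptr_t)counter;
}

/* Start a worker thread, wait for it, and return the value its copy
   of `counter` had when it finished, or -1 on failure. */
static int run_worker_and_get_its_counter(void)
{
    pthread_t thread;
    void *result = NULL;

    if (pthread_create(&thread, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_join(thread, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}
```

If the calling thread sets counter to 1 before starting the worker, the worker 
still sees its own fresh copy and the caller's value is unchanged afterwards.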

TLS reference: http://www.akkadia.org/drepper/tls.pdf

-- 
/Jacob Carlborg

_______________________________________________
dmd-internals mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/dmd-internals
