On 2010-10-15 01:22, Sean Kelly wrote:
Jacob Carlborg Wrote:

Thread.getThis() calls pthread_getspecific which is just three
instructions on Mac OS X, so I guess that's not why it's so slow. The
only thing I can think of is first moving the if statement into the
assert and then trying to inline as many of the function calls as possible.

Swapping the assert and the executable code would save you a jump, but inlining 
the call to ___tls_get_addr would be a bit trickier.  We'd probably have to 
expose Thread.sm_this as an extern (C) symbol, move the function to object.d 
and explicitly do the pthread_getspecific call there.  If that would be enough 
for the compiler to inline the calls then it shouldn't be too hard to test, but 
I'm worried that the call generation may happen too late.  I guess it wouldn't 
be too hard to figure out from looking at the asm output though (PIC code 
weirdness notwithstanding).

I think it would save more than just a jump. When compiling in release mode, the compiler still has to generate code for the if statement, but with an assert it can skip the check entirely. See the assembly at the bottom.
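To make the if-versus-assert point concrete, here is a rough C analogue, not the druntime code. It assumes that defining NDEBUG plays the role of dmd's -release switch, and the names tls_image, offset_with_if and offset_with_assert are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a thread's static TLS image. */
static char tls_image[64];

/* Explicit check: the two comparisons and the branch to abort()
   are emitted in every build, optimized or not. */
static ptrdiff_t offset_with_if(const char *p)
{
    if (p < tls_image || p >= tls_image + sizeof tls_image)
        abort();
    return p - tls_image;
}

/* assert: the whole check expands to nothing when NDEBUG is defined,
   so only the subtraction is left. */
static ptrdiff_t offset_with_assert(const char *p)
{
    assert(p >= tls_image && p < tls_image + sizeof tls_image);
    return p - tls_image;
}
```

Compiling this once with -DNDEBUG and once without, then comparing the generated assembly for the two functions, should show the same kind of difference as the listings at the bottom.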

I was thinking of inlining Thread.getThis() as a first step, then inlining pthread_getspecific as a second step. I don't know whether we can inline pthread_getspecific because of license issues, but at least an inline version is available. After that, inlining the call to ___tls_get_addr could help as well.
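For reference, the lookup being discussed boils down to something like the following C sketch. It is only an illustration under assumptions: this_key stands in for Thread.sm_this, and struct thread_info, init_this_key, set_this and get_this are hypothetical names, not the runtime's.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread record, standing in for the runtime's Thread object. */
struct thread_info { int id; };

/* Plays the role of Thread.sm_this: one key shared by all threads. */
static pthread_key_t this_key;

/* Must run once before any thread calls set_this() or get_this().
   Returns 0 on success, like pthread_key_create itself. */
static int init_this_key(void)
{
    return pthread_key_create(&this_key, NULL);
}

/* Record the calling thread's own object. Returns 0 on success. */
static int set_this(struct thread_info *t)
{
    return pthread_setspecific(this_key, t);
}

/* The hot path: a thin wrapper that a compiler can inline, leaving only
   the pthread_getspecific call. Yields NULL if nothing was set. */
static inline struct thread_info *get_this(void)
{
    return pthread_getspecific(this_key);
}
```

Once the wrapper is visible to the caller, the only remaining call on the hot path is pthread_getspecific itself, which is the second step mentioned above.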

Both of the following versions are compiled with "dmd -c -O -release".

___tls_get_addr in thread.d compiled with the if statement:

___tls_get_addr:
                push    EBP
                mov     EBP,ESP
                push    EAX
                mov     EDX,EAX
                push    EBX
                call    L2000
L2000:          pop     EBX
                cmp     03AAh[EBX],EDX
                ja      L2011
                cmp     03AEh[EBX],EDX
                ja      L2012
L2011:          hlt
L2012:          mov     -4[EBP],EDX
                call    L217E
                mov     EAX,054h[EAX]
                add     EAX,-4[EBP]
                sub     EAX,03AAh[EBX]
                pop     EBX
                mov     ESP,EBP
                pop     EBP
                ret
                nop

That is 22 lines for the code above. Here is the same function compiled with an assert instead:

___tls_get_addr:
                push    EBP
                mov     EBP,ESP
                sub     ESP,8
                mov     -4[EBP],EAX
                call    L216E
                mov     EAX,054h[EAX]
                add     EAX,-4[EBP]
                call    L200D
L200D:          pop     ECX
                sub     EAX,038Dh[ECX]
                mov     ESP,EBP
                pop     EBP
                ret

This is just 13 lines of code. I don't really know assembly, but I can see that the version with the if statement has a lot more instructions than the one with the assert.

--
/Jacob Carlborg
