Has anyone actually looked into why this is failing?

By the rules of Discrete Math/Boolean Algebra:

        something xor something = 0
        something_else xor 0 = something_else

                so

        valA ^ ( valA ^ valB ) = valB
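
If you want to convince yourself of that identity in C, here is a
minimal sketch. The function name xor_cancels is just something I made
up for illustration, and I am assuming plain 64-bit unsigned values:

```c
#include <stdint.h>

/* Returns 1 when valA ^ (valA ^ valB) gives back valB, 0 otherwise. */
static int xor_cancels(uint64_t valA, uint64_t valB)
{
    uint64_t mixed = valA ^ valB;   /* holds the bits of both inputs */
    return (valA ^ mixed) == valB;  /* valA ^ valA = 0, 0 ^ valB = valB */
}
```

Calling xor_cancels(3, 8) should come back as 1, and so should any other
pair, since the identity does not depend on the values.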

If that is not working then you may have found a compiler bug (probably
in the register allocator).  The appropriate output code should resemble
this (AT&T syntax: op src, dest):

mov var_a, %r1
mov var_b, %r2
mov %r1, %r3
xor %r2, %r3
xor %r3, %r1
xor %r3, %r2
mov %r1, var_a
mov %r2, var_b

or optimized:

mov var_a, %r1
xor var_b, %r1
xor %r1, var_a
xor %r1, var_b

In other words: load the values, manipulate the values, store the
values.  If something happens in the middle of those steps, like the
registers being needed elsewhere (and not saved), then it will fail.
The first listing has two memory fetches and two memory writes, the
same as a plain swap, plus the extra xor work.  Is it really faster
than this?

mov var_a, %r1
mov var_b, %r2
mov %r2, var_a
mov %r1, var_b
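
In C terms the four movs are just the ordinary swap through a
temporary.  A minimal sketch (swap_long is a name I picked, not
something from the code under discussion, and the asm in the comments
is only a rough correspondence):

```c
/* Swap two longs through a temporary: two loads and two stores,
   the same memory traffic as the four-mov listing. */
static void swap_long(long *a, long *b)
{
    long tmp = *a;  /* roughly: mov var_a, %r1 */
    *a = *b;        /* roughly: mov var_b, %r2 ; mov %r2, var_a */
    *b = tmp;       /* roughly: mov %r1, var_b */
}
```

With long a = 3, b = 8, calling swap_long(&a, &b) leaves a == 8 and
b == 3, and it keeps working when both pointers refer to the same
variable.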

The macro, without the type casting:

#define BINSWAP(a, b)  ((a) ^= (b) ^= (a) ^= (b))

seems to rely on evaluation-order behavior that isn't clearly
documented.  The expression modifies a twice with no sequence point in
between, which the C standard leaves undefined, so it may come out
differently on different architectures, not to mention how different
compilers will choose to deal with it.  Hence the failure on x86-64.
For those curious, compiling:

#define BINSWAP(a, b) \
   (((long) (a)) ^= ((long) (b)) ^= ((long) (a)) ^= ((long) (b)))

int main( void )
{
  long a = 3;
  long b = 8;

  asm( "noop;noop;noop" );
  BINSWAP(a,b);
  asm( "noop;noop;noop" );

}

yields:

        noop;noop;noop
        movq    -16(%rbp), %rdx
        leaq    -8(%rbp), %rax 
        xorq    %rdx, (%rax)   
        movq    -8(%rbp), %rdx 
        leaq    -16(%rbp), %rax
        xorq    %rdx, (%rax)   
        movq    -16(%rbp), %rdx
        leaq    -8(%rbp), %rax 
        xorq    %rdx, (%rax)   
        noop;noop;noop

If you enable -O[123] then you will need to use the values a and b
before and after the BINSWAP call or they will be optimized away.  And
using immediate values like I did will cause the compiler to just set
the different registers that are used to access them in reverse order.
In other words the swap gets optimized out.  The above code is without
-O and is clearly more complicated (by more than double) than it needs
to be.
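
If the xor trick is still wanted, one way to sidestep the
evaluation-order question is to split the chain into three separate
statements.  This is only a sketch under my own assumptions:
xor_swap_long is a made-up name, and the same-address check is my
addition, there because xor-ing a variable with itself zeroes it.

```c
/* XOR swap written as three statements, so each assignment is
   finished before the next one starts. */
static void xor_swap_long(long *a, long *b)
{
    if (a == b)     /* same object: x ^ x would wipe it to 0 */
        return;
    *a ^= *b;       /* *a = a0 ^ b0 */
    *b ^= *a;       /* *b = b0 ^ (a0 ^ b0) = a0 */
    *a ^= *b;       /* *a = (a0 ^ b0) ^ a0 = b0 */
}
```

With long a = 3, b = 8, calling xor_swap_long(&a, &b) leaves a == 8 and
b == 3.  Whether it beats the plain swap is a separate question.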

Just my $0.02,
-- 
Tres


