Bruce Evans wrote:
Right.  Though I've never seen unnecessary's __volatiles significantly
affecting i386 code.  This is because the code in the asms can't be
removed completely, and can't be moved much either.  With out of order
execution, the type of moves that are permitted (not across dependencies)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
are precisely the type of moves that the CPU's scheduler can do or undo
no matter how the compiler orders the code.

I disagree. For example look at the use of in_addword() in dev/sk/if_sk.cv in line 2819:
 csum1 = htons(csum & 0xffff);
 csum2 = htons((csum >> 16) & 0xffff);
 ipcsum = in_addword(csum1, ~csum2 & 0xffff);
 /* checksum fixup for IP options */
 len = hlen - sizeof(struct ip);
 if (len > 0) {
   return;
 }

The calculation will be executed even if the following if (len > 0) leaves the function and the value of ipcsum is unused. If in_addword() is not marked volatile it can be moved after the if and not be executed in all cases. csum1 and csum2 can be moved after the if, too.

No, volatile has no effect on whether the above calculation will be
executed, since the early return has no dependencies on the caclulation.

The volatile induces a dependency.

Old versions of gcc used to handle volatile like that, but this changed
in gcc-3 or earlier.  gcc.info now says:

% The `volatile' keyword indicates that the instruction has important
% side-effects.  GCC will not delete a volatile `asm' if it is reachable.
                                                      ^^^^^^^^^^^^^^^^^^^

This is not about whether the code is reachable or not (it is reachable), it is about whether the result is used (i.e. whether the code is dead).

% (The instruction can still be deleted if GCC can prove that
% control-flow will never reach the location of the instruction.)  Note
% that even a volatile `asm' instruction can be moved relative to other
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
% code, including across jump instructions.  For example, on many targets
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

jump != conditional jump.
If it moves the *volatile* asm statement across the if, it would only appear on *some* execution paths, which is wrong. It is perfectly fine to move it, when the statement is not volatile, though.

Even if gcc didn't move the caclulation, then CPUs with out of order
execution might schedule it so that it is effectively never executed
(most likely by executing it in otherwise-unused pipelines while the
main pipeline returns).  This is valid for the same reasons that gcc
can move the volatile asms -- the return doesn't depend on the result
of the caclulation.

This is for the CPU to decide. If the assembler block really contains "important" stuff like memory barriers, writes to machine control registers etc., the CPU will not "effectively never execute" the code. The compiler does not know this, all it sees is the word "volatile".

The above C code is fairly bad, but generates not so bad code on i386:

% %     movl    %esi, %eax
% #APP
%     xchgb %ah, %al        # byte operations can be slow; this one not
                # too bad, but I wonder if rorw $8 is better
                # (rorl $16 is already used for corresponding
                # 32-bit operations) where there is no xchg
                # alternative
% #NO_APP

And this again is an example why not to use inline assembler, but let the compiler decide this:

unsigned short swap16(unsigned short x)
{
  return x >> 8 | x << 8;
}

is compiled to

swap16:
        movl    4(%esp), %eax
        rolw    $8, %ax
        ret

The compiler is even able to do optimisations, which it absolutely cannot do, if inline assembler is used, example:

unsigned short id(unsigned short x)
{
  return swap16(swap16(x));
}

results in

id:
        movl    4(%esp), %eax
        ret

Maybe the MD htons() macros should be replaced by MI code.

%     shrl    $16, %esi
%     movl    %esi, %edx
% #APP
%     xchgb %dh, %dl        # as above
% #NO_APP
%     notl    %edx        # poor asm code -- the top 16 bits are unused
                # except here to stall for merging them with
                # the previous byte operation

The compiler simply does not know, that the inline assembler only operates on parts of the register. Another reason not to use inline assembler:

u_int g(u_short x)
{
  return ~swap16(x);
}

g:
        movl    4(%esp), %eax
        rolw    $8, %ax
        movzwl  %ax, %eax         # avoid stall
        notl    %eax
        ret


Regards
        Christoph
_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Reply via email to