* H. Peter Anvin <h...@zytor.com> wrote:

> I think you misunderstand partial register stalls.  They happen (on some 
> microarchitectures) when you write part of a register and then use the whole 
> register.

Yes, there's no partial register stall in this or later code handling these 
values.

> > "setbe %al" insn has a register merge stall: it needs to combine previous
> > %eax value with new value for the lowest byte. Subsequent "movzbl %al,%edi"
> > in turn depends on its completion.
> > 
> > This patch replaces "setbe %al + movzbl %al,%edi" pair of insns with
> > "xor %edi,%edi" before the comparison, and conditional "inc %edi".
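A minimal C sketch of the two instruction shapes described in the patch. It assumes MAX_ERRNO is 4095 (its value in Linux's include/linux/err.h) and models 32-bit registers as uint32_t. The function names are hypothetical, and placing the "inc" on the path where the BE condition holds is my reading of the patch description. The only point is that both shapes leave the same 0/1 value in EDI, because "below or equal" after CMPL is an unsigned comparison against -MAX_ERRNO.

```c
#include <stdint.h>

#define MAX_ERRNO 4095u  /* value used in Linux include/linux/err.h */

/* Original shape: cmpl $-MAX_ERRNO,%eax ; setbe %al ; movzbl %al,%edi */
static uint32_t edi_setbe_movzbl(uint32_t eax)
{
    uint8_t al = (eax <= (uint32_t)-MAX_ERRNO);  /* setbe: unsigned CF|ZF */
    return (uint32_t)al;                         /* movzbl %al,%edi */
}

/* Patched shape as described (hypothetical placement of the inc):
 * xor %edi,%edi before the compare, inc %edi where BE holds. */
static uint32_t edi_xor_inc(uint32_t eax)
{
    uint32_t edi = 0;                            /* xor %edi,%edi */
    if (eax <= (uint32_t)-MAX_ERRNO)             /* cmpl + jbe */
        edi++;                                   /* inc %edi */
    return edi;
}
```

For example, both functions return 1 for EAX = 0 and 0 for EAX = 0xFFFFFFF2 (-14).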

So here's the code in wider context:

>    cmpl      $-MAX_ERRNO, %eax     /* is it an error ? */
>    jbe       1f
>    movslq    %eax, %rsi            /* if error sign extend to 64 bits */
> 1: setbe     %al                   /* 1 if error, 0 if not */
>    movzbl    %al, %edi             /* zero-extend that into %edi */
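The quoted sequence can be modeled in C to make its data flow explicit. This is a sketch under stated assumptions: the struct and function names are hypothetical, the prior contents of RSI are passed in because whatever sets RSI beforehand lies outside the quoted lines, and the int32_t conversion relies on the two's-complement behavior of mainstream compilers. The flags are computed once by the CMPL and consumed by both the JBE and the SETBE (MOVSLQ does not modify flags). Under this model EDI ends up 1 exactly on the path where the JBE is taken, i.e. when EAX is not in the errno range.

```c
#include <stdint.h>

#define MAX_ERRNO 4095u  /* value used in Linux include/linux/err.h */

/* The two flags of "cmpl $imm,%eax" that JBE and SETBE consume. */
struct flags { int cf, zf; };

static struct flags cmpl(uint32_t imm, uint32_t eax)
{
    struct flags f = { eax < imm, eax == imm };  /* CF: unsigned borrow, ZF: equal */
    return f;
}

/* Hypothetical container for the two argument registers set up here. */
struct audit_args { uint64_t rsi; uint32_t edi; };

static struct audit_args model_sequence(uint32_t eax, uint64_t rsi_in)
{
    struct audit_args out = { rsi_in, 0 };
    struct flags f = cmpl((uint32_t)-MAX_ERRNO, eax);  /* computed once */
    int be = f.cf || f.zf;                             /* "below or equal" */

    if (!be)                       /* jbe 1f not taken: error path */
        out.rsi = (uint64_t)(int64_t)(int32_t)eax;     /* movslq %eax,%rsi */

    /* 1: flags are unchanged, so SETBE reuses the CMPL result */
    uint8_t al = (uint8_t)be;      /* setbe %al */
    out.edi = al;                  /* movzbl %al,%edi */
    return out;
}
```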

What happens here is that at the point the SETBE executes, it needs to know the
previous 32-bit value of EAX. But the preceding JBE already needs it (it needs
the CF and ZF result of the CMPL comparison), so there's no real additional
dependency.

(The MOVSLQ of EAX will likewise already have the full value of EAX, because
the JBE already needs it.)

Furthermore, the following SETBE sets an entirely new value for the 8-bit AL.
That entirely new value will be handled by modern uarchs with register renaming
(marking it as a rename for the low byte of EAX), which gives the new value a
separate, independent path to be computed and used; that renamed register value
is then moved into EDI (zero-extended).

The CPU might eventually have to merge the previous value of EAX with the new
value for AL, but nothing in this piece of code depends on the merged value. If
something did depend on the full value, then _that_ would create a partial
register stall.

And as it happens, there's no such subsequent dependency, because we call a C 
function right away:

       call    __audit_syscall_exit

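and RAX is a freely available (call-clobbered) register used for the return
value. It is overwritten early in the execution of __audit_syscall_exit() by an
SBB of EAX with itself, which writes all of EAX (the result is 0 or -1,
depending on the carry flag):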

    28d4:       19 c0                   sbb    %eax,%eax

which will fully overwrite the previous partial value without extra
dependencies.
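The architectural effect of that SBB can be written out in C to show why the old partial value does not matter to the result: EAX - EAX - CF is -CF, so the outcome depends only on the carry flag. This sketches the computed value only (the function name is hypothetical); how a particular CPU tracks the register dependency of such an instruction is a microarchitectural question this model does not answer.

```c
#include <stdint.h>

/* Value written by "sbb %eax,%eax": EAX - EAX - CF, i.e. -CF.
 * The old EAX cancels out, so the result is 0 or 0xFFFFFFFF
 * depending only on CF, and all 32 bits of EAX are written. */
static uint32_t sbb_eax_eax(uint32_t eax, unsigned cf)
{
    return eax - eax - (cf & 1u);  /* unsigned wrap-around, as on the CPU */
}
```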

So the real motivation of the patch is to simplify the setting of EDI to 0 or 1
by using a branch we already execute.

Thanks,

        Ingo
