davezarzycki added a comment.

In D70157#1772227 <https://reviews.llvm.org/D70157#1772227>, @annita.zhang 
wrote:

> > 
> > 
> >>> Third, I have not seen a justification for why complexity for instruction 
> >>> prefix padding is necessary.  All the affected CPUs support multi-byte 
> >>> nops, so we're talking about a *single micro op* difference between the 
> >>> nop form and prefix form.  Can anyone point to a performance delta due to 
> >>> this?  If not, I'd suggest we should start with the nop form, and then 
> >>> build the prefix form in a generic manner for all alignment varieties.
> >> 
> >> +1.
> > 
> > +1. Starting from just NOP padding sounds like a simple and good first step. We 
> > can explore segment override prefixes in the future.
>
> I think it's a good suggestion to start with NOP padding as the first step. 
> In our previous experiment, we saw that the prefix padding was slightly better 
> than NOP padding, but not much. We will retest the NOP padding and get back to 
> you.


For whatever it may be worth: Agner Fog's empirical research on x86 pipelines 
and his review of manufacturer optimization guidelines also conclude that 
prefixes are often preferable to NOPs on modern x86 processors. (See: 
https://www.agner.org/optimize/microarchitecture.pdf) This arguably isn't 
surprising: the decoder needs to be good at finding instruction boundaries, 
but it isn't responsible for interpreting instructions, so it cannot skip a 
NOP. A NOP of any size still occupies a decode slot and therefore dilutes 
decode bandwidth, whereas extra prefixes only lengthen an instruction that 
has to be decoded anyway.
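
To make the decode-slot argument concrete, here is a rough sketch (not code 
from this patch) that counts what each padding strategy costs the decoder. 
The helper names are hypothetical, the maximum NOP length is a parameter 
rather than a claim about any particular CPU, and the only hard fact used is 
the 15-byte x86 instruction length limit. It deliberately ignores any 
per-prefix decode penalties that some microarchitectures may have.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of extra instructions the decoder has to
// handle when `pad` bytes are filled with NOPs of at most `maxNopLen`
// bytes each (maxNopLen is an assumption supplied by the caller).
std::size_t extraInstrsWithNops(std::size_t pad, std::size_t maxNopLen) {
  return (pad + maxNopLen - 1) / maxNopLen;
}

// Hypothetical helper: absorb `pad` bytes as redundant prefixes on the
// preceding instructions, whose encoded lengths are given in `instrLens`.
// Each instruction may grow up to the 15-byte x86 limit. Returns the
// number of bytes that could not be absorbed and would still need NOPs.
// No new instructions are created for the absorbed bytes.
std::size_t padWithPrefixes(std::vector<std::size_t> &instrLens,
                            std::size_t pad) {
  constexpr std::size_t kMaxInstrLen = 15;
  for (std::size_t &len : instrLens) {
    if (pad == 0)
      break;
    std::size_t room = len < kMaxInstrLen ? kMaxInstrLen - len : 0;
    std::size_t take = std::min(room, pad);
    len += take;
    pad -= take;
  }
  return pad;
}
```

For example, 10 bytes of padding with a 9-byte NOP limit adds two 
instructions to the decode stream, while the same 10 bytes can be absorbed 
by a single 3-byte instruction without adding any.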


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
