> Does that mean in some situations, if you have a small, tight loop, it
> might be better to optimise over speed in some very rare cases? For
> example, turning MOV EAX, $FFFFFFFF into OR EAX, $FF to squeeze out a
> few extra bytes, even though the instruction introduces a false dependency.

A latency of 4 clock cycles is a lot. As long as the dependency can be
resolved in less time than that, there will be some performance gain.
The performance penalty is not a fixed 20%. It depends on what code comes
before it: long-latency instructions have time to catch up with the rest of
the code. It is even possible to cancel the penalty out completely, by
placing the call so that the ret falls into the next 64-byte line.
This is a place where tricky optimizations can be done.
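
To make the 64-byte line arithmetic concrete, here is a rough sketch in C of
how much padding would be needed in front of a call. It assumes one reading
of the above, namely that the return address (the byte right after the call)
should be the first byte of a 64-byte line, and that the call is a 5-byte
near CALL rel32. The function name and the constants are made up for
illustration; nothing here is taken from the compiler.

```c
#include <stdint.h>

/* Assumed line size, taken from the "64-byte line" in the text above. */
#define LINE_SIZE 64u
/* Assumed call encoding: near CALL rel32 = opcode E8 + 4-byte displacement. */
#define CALL_LEN 5u

/* Hypothetical helper: number of padding bytes to put in front of a CALL
   that would otherwise start at call_addr, so that its return address
   (the first byte after the CALL) lands on the start of a 64-byte line. */
static unsigned pad_before_call(uintptr_t call_addr)
{
    uintptr_t ret_addr = call_addr + CALL_LEN;
    return (unsigned)((LINE_SIZE - (ret_addr % LINE_SIZE)) % LINE_SIZE);
}
```

For a call that would start at the (made up) address $401030, the return
address is $401035, so 11 bytes of padding move it to $401040. Whether the
padding pays for itself depends on the surrounding code, as said above.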

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
