That should read "optimise SIZE over speed", sorry. I missed a word out.
On 22/11/2019 09:04, J. Gareth Moreton wrote:
Does that mean that, with a small, tight loop, it might in some rare
cases be better to optimise over speed? For example, turning
MOV EAX, $FFFFFFFF into OR EAX, -1 (encoded with the sign-extended 8-bit
immediate $FF) to save a couple of bytes, even though the OR introduces
a false dependency on the previous value of EAX.
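For reference, a small C sketch of the two encodings being compared. The byte sequences are the standard x86 encodings (MOV EAX, imm32 is B8 followed by a 4-byte immediate; OR r/m32, imm8 is 83 /1 with an 8-bit immediate that is sign-extended). The helper names are hypothetical, and the OR model ignores flags. It only illustrates that the result never depends on the old EAX even though the instruction still reads EAX, which is where the false dependency comes from.

```c
#include <assert.h>
#include <stdint.h>

/* MOV EAX, imm32: opcode B8 (+rd, EAX = 0) then a 4-byte immediate. */
static const uint8_t mov_eax_all_ones[] = { 0xB8, 0xFF, 0xFF, 0xFF, 0xFF };

/* OR r/m32, imm8: opcode 83 /1, ModRM C8 selects EAX directly,
   then an 8-bit immediate that the CPU sign-extends to 32 bits. */
static const uint8_t or_eax_minus_one[] = { 0x83, 0xC8, 0xFF };

static_assert(sizeof mov_eax_all_ones == 5, "MOV EAX, imm32 is 5 bytes");
static_assert(sizeof or_eax_minus_one == 3, "OR EAX, imm8 is 3 bytes");

/* Hypothetical helper: models the sign extension of the imm8 operand
   without relying on implementation-defined signed conversions. */
static uint32_t sign_extend_imm8(uint8_t imm8)
{
    return (imm8 & 0x80u) ? (0xFFFFFF00u | imm8) : (uint32_t)imm8;
}

/* Hypothetical helper: value of EAX after OR EAX, imm8 (flags ignored).
   With imm8 = 0xFF the result is 0xFFFFFFFF whatever EAX held, but the
   instruction still has EAX as an input operand. */
static uint32_t or_eax_imm8(uint32_t eax, uint8_t imm8)
{
    return eax | sign_extend_imm8(imm8);
}
```

The saving is 2 bytes per occurrence, which is why it would only be worth considering when code size matters more than the extra dependency.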
Gareth aka. Kit
On 22/11/2019 08:29, Marģers . via fpc-devel wrote:
On 10/11/2019 at 11:17, Marģers . via fpc-devel wrote:
Most processors have a fairly large uop cache (up to 2048 uops in the
newest generations, iirc), so wouldn't this only affect the first
iteration? Do you have a reference (an Agner Fog page or similar) or a
more detailed explanation of this?
I have to retract my statement; I don't have evidence to back it up.
The code that led me to those conclusions has been discarded.
I have read most of what is published on Agner Fog's page, and there is
nothing there I can point to as a reference.
No problem. I was just interested; I have written some SSE/AVX code over
the last few years and hadn't heard of this.
I did some research. In the manual "The microarchitecture of Intel, AMD
and VIA CPUs" from Agner Fog's page, section 20.17 "Cache and memory
access" gives:
Level 1 code: 64 kB, 4 way, 256 sets, 64 B line size, per core.
Latency: 4 clocks.
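The quoted figures are self-consistent: 4 ways × 256 sets × 64 B = 65536 B = 64 kB. A minimal C sketch of that arithmetic and of how an address maps onto a line and a set, assuming a simplified model where the set index is taken directly from the address bits just above the 6-bit line offset (real hardware details may differ; the macro and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Level 1 code cache parameters as quoted (per core). */
#define L1I_SIZE_BYTES (64u * 1024u)
#define L1I_WAYS       4u
#define L1I_LINE_BYTES 64u

/* sets = size / (ways * line size) = 65536 / (4 * 64) = 256 */
#define L1I_SETS (L1I_SIZE_BYTES / (L1I_WAYS * L1I_LINE_BYTES))
static_assert(L1I_SETS == 256u, "geometry should match the quoted 256 sets");

/* Byte offset within a 64 B line: the low 6 address bits. */
static unsigned line_offset(uintptr_t addr)
{
    return (unsigned)(addr % L1I_LINE_BYTES);
}

/* Set index in the simplified model: the next 8 bits above the offset. */
static unsigned set_index(uintptr_t addr)
{
    return (unsigned)((addr / L1I_LINE_BYTES) % L1I_SETS);
}
```

For example, addresses 0x1000 and 0x1040 fall in consecutive sets (64 and 65), while 0x4000 wraps back to set 0.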
I also created some performance tests and found that when a loop
crossed a 64 B line boundary, performance dropped by 20%, while the
measurement error was 2%.
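The 20% figure comes from the tests described above and is not reproduced here. As a sketch only, this is how one might check whether a loop body's byte range straddles a 64 B line and how much padding would move its start to the next boundary. The function names and addresses are hypothetical, and padding only guarantees no crossing when the loop body is at most 64 bytes long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u

/* True if the loop body [start, start + len) touches more than one
   64 B cache line. len must be non-zero. */
static bool crosses_line(uintptr_t start, size_t len)
{
    assert(len > 0);
    return start / LINE_BYTES != (start + len - 1) / LINE_BYTES;
}

/* Padding bytes needed to move the loop start to the next 64 B boundary
   (0 if it is already aligned). */
static size_t pad_to_line(uintptr_t start)
{
    return (size_t)((LINE_BYTES - start % LINE_BYTES) % LINE_BYTES);
}
```

For instance, a 32-byte loop starting at 0x1030 crosses into the next line; 16 bytes of padding move it to 0x1040, where it fits in a single line. Whether that padding is worth its size cost is exactly the size-versus-speed trade-off raised earlier in the thread.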
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel