> On 10/11/2019 at 11:17, Marģers . via fpc-devel wrote:
>>> Most processors have a fairly large uop cache (up to 2048 for the newest
>>> generations, iirc), so this would only be for the first iteration? Do you
>>> have a reference (Agner Fog's page or so) or more explanation that
>>> describes this?
>> I have to revoke my statement. I don't have evidence to back it up. The code
>> that led me to those conclusions has been discarded.
>> I have read most of what is published on Agner Fog's page. There is nothing
>> to pinpoint as a reference.
> No prob. Was just interested; I had to do some SSE/AVX code in the last
> few years and hadn't heard of this.
I did some research in the manual from Agner Fog's page, "The
microarchitecture of Intel, AMD and VIA CPUs", section 20.17 "Cache and
memory access":

  Level 1 code: 64 kB, 4-way, 256 sets, 64 B line size, per core.
  Latency 4 clocks.

I also created some performance tests and found that when a loop crossed a
64 B cache line boundary it lost 20% performance, while the measurement
error was 2%.

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
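The cache figures quoted from Agner Fog's manual are self-consistent: 64 kB / (4 ways x 64 B) = 256 sets. Below is a minimal sketch, assuming exactly that geometry, of the address arithmetic behind "a loop crossing a 64 B line": which set an address maps to, how many lines a loop body touches, and how many padding bytes would move the loop entry to the next line boundary. The helper names and the example address and loop length are hypothetical. The sketch only shows the arithmetic; it makes no claim about why the crossing cost 20% in the measurements above.

```python
# Hypothetical helpers for the L1 code cache geometry quoted above
# (64 kB, 4-way, 64 B lines). Not taken from FPC or any existing tool.

L1I_SIZE = 64 * 1024   # bytes, per core (assumed from the quoted figures)
WAYS = 4
LINE_SIZE = 64         # bytes per cache line

# Number of sets follows from size / (ways * line size): 256 here.
NUM_SETS = L1I_SIZE // (WAYS * LINE_SIZE)


def cache_set(address: int) -> int:
    """Set index of an address: drop the 6 offset bits, keep the next 8."""
    return (address // LINE_SIZE) % NUM_SETS


def lines_touched(start: int, length: int) -> int:
    """Number of 64 B lines occupied by the byte range [start, start + length)."""
    first = start // LINE_SIZE
    last = (start + length - 1) // LINE_SIZE
    return last - first + 1


def crosses_line(start: int, length: int) -> bool:
    """True if a loop body of `length` bytes starting at `start` spans a line boundary."""
    return lines_touched(start, length) > 1


def padding_to_align(start: int, alignment: int = LINE_SIZE) -> int:
    """Bytes of padding needed to move `start` up to the next `alignment` boundary."""
    return -start % alignment


if __name__ == "__main__":
    loop_start = 0x401038  # hypothetical loop entry: offset 56 within its line
    loop_len = 40          # hypothetical loop body size in bytes
    print(cache_set(loop_start))              # 64
    print(crosses_line(loop_start, loop_len)) # True: bytes 56..95 span two lines
    print(padding_to_align(loop_start))       # 8: next boundary is 0x401040
```

Padding the loop entry by `padding_to_align` bytes is the arithmetic an alignment directive performs. Whether that recovers the measured 20% on a given CPU would have to be confirmed with the same benchmark.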