Hi all,

In the latest Intel optimization manual, we can read in section "2.3.5.4 
Data Prefetching":

> Spatial Prefetcher: This prefetcher strives to complete every cache line 
> fetched to the L2 cache with the pair line that completes it to a 128-byte 
> aligned chunk.


I take this to mean that adjacent cache lines are brought together from 
memory (as often as possible). Indeed, there is code (e.g. JCTools, Folly) 
that assumes the false sharing range is 128 bytes and pads accordingly.
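To make the two padding choices concrete, here is a minimal C++ sketch. It is not the JCTools or Folly code; the names PaddedCounter, same_chunk, kCacheLine and kPairRange are made up for illustration, and the 64/128 values are assumptions rather than anything queried from the hardware.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed sizes: 64-byte cache lines, and a 128-byte range if the
// spatial prefetcher's line pairing is taken into account.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kPairRange = 128;

// Hypothetical wrapper: every instance starts on an Align-byte boundary
// and occupies a whole multiple of Align bytes, so array neighbours
// never share an Align-byte aligned chunk.
template <std::size_t Align>
struct alignas(Align) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter<kCacheLine>) == kCacheLine,
              "one counter per 64-byte line");
static_assert(sizeof(PaddedCounter<kPairRange>) == kPairRange,
              "one counter per 128-byte pair");

// True when both addresses fall into the same range-byte aligned chunk.
inline bool same_chunk(const void* a, const void* b, std::size_t range) {
    assert(range != 0);
    auto x = reinterpret_cast<std::uintptr_t>(a) / range;
    auto y = reinterpret_cast<std::uintptr_t>(b) / range;
    return x == y;
}
```

With 64-byte padding, two neighbouring counters sit in different cache lines but may still share one 128-byte aligned pair; with 128-byte padding they never do. Whether that second property matters is exactly the question.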

However, doing some more exegesis on the manual reveals there is no mention 
of prefetching in the context of false sharing, and save for the NetBurst 
microarchitectures, all advice seems to be to place variables in different 
cache lines:

> On Pentium M, Intel Core Solo, Intel Core Duo processors, and processors 
> based on Intel Core microarchitecture; a synchronization variable should be 
> placed alone and in separate cache line to avoid false-sharing. Software 
> must not allow a synchronization variable to span across page boundary.


Similarly, in the Linux kernel the false sharing range seems to be just a 
cache line (64 bytes).

I myself saw no difference between placing the values 1 or 2 cache lines 
apart when running tests to demonstrate false sharing 
(https://gist.github.com/duarten/b7ee60b4412596440a97498d87bf402e), but 
that's only relevant for the microarchitecture I'm on (Haswell-E).
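For anyone who wants to try this on another microarchitecture, the kind of test I mean can be sketched roughly as follows. This is not the code from the gist; Buffer, Result and run_pair are hypothetical names, the sketch does not pin threads to cores, and the measured time includes thread start-up, so treat the numbers as rough.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <new>
#include <thread>

using Counter = std::atomic<std::uint64_t>;

// 128-byte aligned storage, so offset 0 starts an aligned line pair.
struct alignas(128) Buffer {
    unsigned char bytes[512];
};

struct Result {
    double millis;          // wall-clock time, including thread start-up
    std::uint64_t first;    // final value of the counter at offset 0
    std::uint64_t second;   // final value of the counter at offset `distance`
};

// Two threads each increment their own counter `iters` times; the
// counters are placed `distance` bytes apart inside one Buffer.
inline Result run_pair(std::size_t distance, std::uint64_t iters) {
    assert(distance >= sizeof(Counter));
    assert(distance % alignof(Counter) == 0);
    assert(distance + sizeof(Counter) <= sizeof(Buffer));

    Buffer buf;
    auto* a = new (buf.bytes) Counter(0);
    auto* b = new (buf.bytes + distance) Counter(0);

    auto work = [iters](Counter* c) {
        for (std::uint64_t i = 0; i < iters; ++i)
            c->fetch_add(1, std::memory_order_relaxed);
    };

    auto start = std::chrono::steady_clock::now();
    std::thread t1(work, a);
    std::thread t2(work, b);
    t1.join();
    t2.join();
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {elapsed.count(), a->load(), b->load()};
}
```

Comparing run_pair(8, n), run_pair(64, n) and run_pair(128, n) for a large n covers the three cases: same line, adjacent line within one 128-byte pair, and separate pairs.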

Am I missing something?

Cheers,
Duarte
