On Friday, 23 November 2018 at 13:23:22 UTC, welkam wrote:
If we run these steps in different thread on the same core with SMT we could better use core`s resources. Reading file with kernel, decoding UTF-8 with vector instructions and lexing/parsing with scalar operations while all communication is done trough L1 and L2 cache.
You might save some pages from the data cache, but by doing more work at once, the code might stop fitting in the execution-related caches (code pages, microcode, branch prediction) instead.