Hi all,
I run the MARSSx86 full-system simulator to test TLB performance with a simple
OoO core, and I have a question about the dtlb_latency statistic (in the YAML
output file), which records the TLB walk latency when a TLB miss
occurs.
I found that most of the TLB misses have quite short latency. Taking the
429.mcf benchmark (size: ref) as an example, 99% of the dtlb_latency entries
are less than 23 cycles.
Since the page table in the 64-bit Linux kernel has a 4-level directory, each
TLB walk results in at most 4 memory accesses.
In my configuration, a memory access has a fixed latency of about 120 cycles,
which means a TLB walk would have at most 480 cycles of latency.
So, given the dtlb_latency output: does this mean that 99% of the TLB walks do
not result in any memory (DRAM) accesses (otherwise their latency would be
>= 120 cycles), and that they all hit in the (2-level) cache?
That seems impossible for 429.mcf, since the L2 cache has a poor hit rate
(about 36%).
In ReorderBufferEntry::tlbwalk() (in ptlsim/core/ooo-core/ooo-exec.cpp), the
following code confirms that each TLB walk accesses the L1
cache:
bool L1_hit = core.memoryHierarchy->access_cache(request);
if(L1_hit) {
    tlb_walk_level--;
} else {
    cycles_left = 0;
    changestate(thread.rob_cache_miss_list);
}
Thus I suspect that the TLB walks do not walk through the memory hierarchy
normally: they may only access the L1 cache, and never reach memory.
Am I right?
Could anyone explain how MARSSx86 processes a TLB walk, especially how a
TLB miss walks through the memory (cache) hierarchy?
Thanks in advance!
Best Regards!
Licheng Chen
ICT, CAS
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel