Hi all,

 

I am running the MARSSx86 full-system simulator to test TLB performance with a
simple OoO core, and I have a question about the dtlb_latency statistic (in
the YAML output file), which records the TLB walk latency when a TLB miss
happens.

I found that most TLB misses have quite a short latency. Taking the 429.mcf
benchmark (size: ref) as an example, 99% of the dtlb_latency samples are less
than 23 cycles.

Since the page table in the 64-bit Linux kernel has a 4-level directory, each
TLB walk results in at most 4 memory accesses.

In my configuration, a memory access has a fixed latency of about 120 cycles,
which means a TLB walk would have at most 480 cycles of latency.

 

So, given the dtlb_latency output: does this mean that 99% of the TLB walks do
not result in any memory (DRAM) accesses (otherwise their latency would be
>= 120 cycles), and that they all hit in the (2-level) cache?

This seems impossible for 429.mcf, since the L2 cache has a poor hit rate
(about 36%).

 

In ReorderBufferEntry::tlbwalk() (in ptlsim/core/ooo-core/ooo-exec.cpp), the
following code confirms that each TLB walk accesses the L1 cache:

 

    bool L1_hit = core.memoryHierarchy->access_cache(request);

    if(L1_hit) {
        tlb_walk_level--;
    } else {
        cycles_left = 0;
        changestate(thread.rob_cache_miss_list);
    }

Thus I suspect that the TLB walks do not go through the memory hierarchy
normally: they may only ever access the L1 cache, and never reach memory.
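For comparison, this is how I would expect a walk to accumulate latency (a hypothetical sketch, not MARSSx86's actual code; walk_one_level and the L1/L2 latencies are made up, only the 120-cycle DRAM figure is from my config):

```cpp
// Where a single page-table access is satisfied (hypothetical model).
enum Level { HIT_L1, HIT_L2, HIT_DRAM };

// Latency of one page-table access, depending on where it hits.
int walk_one_level(Level hit) {
    switch (hit) {
        case HIT_L1:   return 2;    // hypothetical L1 latency (cycles)
        case HIT_L2:   return 10;   // hypothetical L2 latency (cycles)
        case HIT_DRAM: return 120;  // fixed DRAM latency from my config
    }
    return 0;
}

// A 4-level walk should pay for every level's access, wherever it hits.
int tlb_walk_latency(const Level hits[4]) {
    int cycles = 0;
    for (int i = 0; i < 4; ++i)
        cycles += walk_one_level(hits[i]);
    return cycles;
}
```

Even a walk whose first three accesses hit in L1 but whose last one goes to DRAM would cost 2 + 2 + 2 + 120 = 126 cycles, which should show up in the dtlb_latency distribution.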

 

Am I right?

Could anyone explain how MARSSx86 processes a TLB walk, especially how a TLB
miss walks through the memory (cache) hierarchy?

 

Thanks in advance!

 

Best Regards!

Licheng Chen

ICT, CAS

 

_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
