[marss86-devel] Re: About dtlb_latency stats

chenlicheng Fri, 26 Apr 2013 02:55:14 -0700

Hi all,


         Another strange phenomenon I found is that all TLB walks are L1
cache misses.

 

         In ReorderBufferEntry::tlbwalk() (in
ptlsim\core\ooo-core\ooo-exec.cpp), I added stat objects to count the L1
hits and misses (the added lines are the ones carrying comments):

         

    bool L1_hit = core.memoryHierarchy->access_cache(request);

    if(L1_hit) {
        tlb_walk_level--;
        // count L1 hits of tlb walks
        thread.thread_stats.dcache.tlb_walk_cache_hits++;
    } else {
        cycles_left = 0;
        // count L1 misses of tlb walks
        thread.thread_stats.dcache.tlb_walk_cache_misses++;
        changestate(thread.rob_cache_miss_list);
    }

 

         I ran 10 SPEC CPU2006 applications, and every output shows that
the number of L1 hits for TLB walks is zero.

         I think this is a quite abnormal result.

         

         Thanks in advance!

 

Best regards!

Licheng Chen

ICT,CAS

 

 

From: [email protected]
[mailto:[email protected]] On Behalf Of chenlicheng
Sent: April 26, 2013 0:31
To: [email protected]
Subject: [marss86-devel] About dtlb_latency stats

 

Hi all,

 

I am running the MARSSx86 full-system simulator to test TLB performance with
the simple ooo core, and I have a question about the dtlb_latency stats (in
the yml output file), which record the TLB walk latency when a TLB miss
happens.

I found that most TLB misses have quite a short latency. Taking the 429.mcf
benchmark (size: ref) as an example, 99% of the dtlb_latency samples are
below 23 cycles.

 

Since the page table in the 64-bit Linux kernel has four levels, each TLB
walk results in at most 4 memory accesses.

In my configuration, a memory access has a fixed latency of about 120 cycles,
which means that a TLB walk has a latency of at most 480 cycles.
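The bound above can be written out as a small C++ sketch. The 4 levels and the 120-cycle memory latency come from this message; the function name walk_latency and the 2-cycle cache-hit latency used in the comments are assumptions chosen only for illustration, not values taken from MARSSx86.

```cpp
// Latency of one page walk in which `mem_reads` of the `levels`
// page-table reads go to DRAM and the remaining reads hit in a cache.
// Hypothetical helper for illustration; not part of MARSSx86.
constexpr int walk_latency(int levels, int mem_reads,
                           int mem_cycles, int cache_cycles) {
    return mem_reads * mem_cycles + (levels - mem_reads) * cache_cycles;
}

// With 4 levels and 120-cycle memory (from the text) and an ASSUMED
// 2-cycle cache hit:
//   walk_latency(4, 4, 120, 2) == 480   (every read goes to DRAM)
//   walk_latency(4, 1, 120, 2) == 126   (a single DRAM read)
//   walk_latency(4, 0, 120, 2) == 8     (every read hits in a cache)
```

Under these assumptions, a walk with even one DRAM read takes at least 120 cycles, so a walk that finishes in under 23 cycles cannot have reached memory.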

 

So, given the dtlb_latency output, does this mean that 99% of the TLB walks
do not result in any memory (DRAM) access (otherwise their latency would be
>= 120 cycles), and that they all hit in the two-level cache?

This seems impossible for 429.mcf, since its L2 cache hit rate is poor
(about 36%).
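A rough way to see why this looks implausible is sketched below in C++. It assumes that page-table reads reaching L2 hit there independently, at the same 36% rate as the overall access stream. That is a strong assumption, since page-table entries may well cache better than ordinary data, and the function name all_hit_probability is hypothetical.

```cpp
// Probability that `reads` independent L2 accesses all hit, given a
// per-access hit rate. ASSUMES independence and a uniform hit rate;
// this is an illustration only, not a model of MARSSx86.
constexpr double all_hit_probability(double hit_rate, int reads) {
    double p = 1.0;
    for (int i = 0; i < reads; ++i) {
        p *= hit_rate;
    }
    return p;
}

// all_hit_probability(0.36, 1) is 0.36, and the value shrinks with
// every additional read: four reads all hitting gives under 2%.
```

Under this simple model, nowhere near 99% of walks would avoid DRAM if their reads reached L2 at the overall hit rate.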

 

In ReorderBufferEntry::tlbwalk() (in ptlsim\core\ooo-core\ooo-exec.cpp), the
following code confirms that each TLB walk accesses the L1 cache:

 

    bool L1_hit = core.memoryHierarchy->access_cache(request);

    if(L1_hit) {
        tlb_walk_level--;
    } else {
        cycles_left = 0;
        changestate(thread.rob_cache_miss_list);
    }

 

Thus I suspect that TLB walks do not go through the memory hierarchy
normally.

They might access only the L1 cache and never access memory.

 

Am I right?

Could anyone explain how MARSSx86 processes a TLB walk, especially how a TLB
miss walks through the memory (cache) hierarchy?

 

Thanks in advance!

 

Best Regards!

Licheng Chen

ICT, CAS

_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
