Hi DRAM Ninjas, I agree with your explanation; that makes sense. But one thing I cannot understand: when we change the cache latency to a very big number, e.g. 200, in single-core mode (by editing cacheController.cpp) and run mcf, the IPC doesn't change much. Have you tried this kind of simulation, changing the latency to see the impact on IPC?

Best,
Zhenyu

Date: Thu, 5 May 2011 12:13:24 -0400
Subject: Re: [marss86-devel] How the latency of cache access is reflected on the ipc?
From: [email protected]
To: [email protected]; [email protected]

Also, you're right that the lines of code you should be looking for are the add_event() lines; those calls make up the call chains. This isn't always the case, though: sometimes the code will have an add_event() with delay=0, and sometimes it will just call the function directly, which makes it a little harder to read. When I was debugging the cache code I thought about making it uniform (i.e., always calling add_event() with delay 0 instead of calling the function directly), but I didn't get around to it. If there's interest, I can do it, but I feel like not enough people read this code for it to matter.
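To show the two styles concretely, here is a minimal, hypothetical sketch. It is plain C++, not the MARSS source: this add_event is a toy stand-in for MemoryHierarchy::add_event with only the delay-0 path filled in, and the this_cycle queue is invented for the sketch:

    #include <cstdio>
    #include <deque>
    #include <functional>
    #include <utility>

    // Delay-0 events collected for execution later in the same cycle.
    static std::deque<std::function<void()>> this_cycle;

    void add_event(std::function<void()> fn, int delay) {
        if (delay == 0) this_cycle.push_back(std::move(fn));
        // nonzero delays would go into the cycle-ordered event queue (omitted)
    }

    void cache_hit_cb() { std::printf("hit callback ran\n"); }

    int main() {
        cache_hit_cb();              // style 1: direct call, runs immediately
        add_event(cache_hit_cb, 0);  // style 2: uniform delay-0 event

        // Drain this cycle's pending events; style 2 runs here instead.
        while (!this_cycle.empty()) {
            auto fn = std::move(this_cycle.front());
            this_cycle.pop_front();
            fn();
        }
        return 0;
    }

Both styles finish in the same simulated cycle; the uniform style just makes every hand-off visible in one place, which is why it would be easier to trace.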
On Thu, May 5, 2011 at 12:05 PM, DRAM Ninjas <[email protected]> wrote:

On Thu, May 5, 2011 at 11:48 AM, zhenyu sun <[email protected]> wrote:

Hi DRAM Ninjas, thank you very much for your explanation; I understand your points. The problem I have now is figuring out where the "long chain of events" is in the code. Is it in the event_queue? I saw this line in the code, in cache_access_cb:

    memoryHierarchy_->add_event(signal, delay, (void*)queueEntry);

You should read this line as 'delay cycles from now, execute the function pointed to by signal, and pass it queueEntry as an argument'. signal and delay get set based on whether this is a cache miss or a hit. So let's say the access time for this cache is 5 cycles. After 5 cycles, the cache_hit_cb() function will be executed. That function will probably schedule the event for cache_access_completed_cb(), then that function will generate an event that calls wait_interconnect_cb(), which will then generate a call to P2PInterconnect::handle_interconnect_cb(), etc. (note: these might not be the actual call chains; the chains tend to be pretty long, so I don't remember them exactly). Each of these events might have some delay associated with it. In other words, cycles are ticking by while the pipeline is waiting for the cache request to resolve. I think the final terminating point for a cache request is in cpuController::finalize_request(), at which point the cache request is signaled as done in the pipeline and execution continues. I don't look at the ooocore/pipeline code very often, so I can't tell you where the actual IPC statistics are computed (probably somewhere in the writeback phase), but that's the overall scheme by which IPC is implicitly determined. I hope that was clear enough ...
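If it helps, here is a stripped-down sketch of that mechanism. The names (Signal, Event, add_event) echo the ones above, but this is a toy model, not the MARSS classes, and a real run also clocks the pipeline every cycle instead of jumping straight between events:

    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <vector>

    // Toy stand-in for MARSS's Signal: a callback taking an opaque argument.
    using Signal = std::function<void(void*)>;

    struct Event {
        unsigned long when;  // absolute cycle at which the event fires
        Signal fn;
        void* arg;
    };
    struct Later {  // orders the heap so the earliest cycle pops first
        bool operator()(const Event& a, const Event& b) const { return a.when > b.when; }
    };

    static unsigned long sim_cycle = 0;   // the global cycle counter
    static unsigned long committed = 0;   // retired instructions
    static std::priority_queue<Event, std::vector<Event>, Later> event_queue;

    // 'delay cycles from now, execute fn and pass it arg'
    void add_event(Signal fn, unsigned long delay, void* arg) {
        event_queue.push(Event{sim_cycle + delay, std::move(fn), arg});
    }

    // A toy three-step chain standing in for hit -> completed -> finalize.
    void finalize_request(void*) { committed++; }  // request done, instruction retires
    void access_completed_cb(void* a) { add_event(finalize_request, 1, a); }
    void cache_hit_cb(void* a) { add_event(access_completed_cb, 2, a); }

    int main() {
        add_event(cache_hit_cb, 5, nullptr);  // cache access latency: 5 cycles

        while (!event_queue.empty()) {
            Event e = event_queue.top();  // earliest pending event
            event_queue.pop();
            sim_cycle = e.when;           // cycles tick by while the request resolves
            e.fn(e.arg);
        }
        std::printf("retired %lu instruction(s) in %lu cycles\n", committed, sim_cycle);
        return 0;
    }

Notice that no event ever writes to sim_cycle directly; scheduling something 'delay' cycles out just means the clock has moved that far forward by the time the instruction can retire, and that is how the latency ends up in instructions/sim_cycle.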
Once this function is called, the definition of add_event does

    event->setup(signal, sim_cycle + delay, arg);

so does that mean the delay is finally added onto the sim_cycle used in the IPC calculation (number of instructions / sim_cycle)? In other words, does the add_event function, which is a member of class Event, influence the global variable? Thank you very much; I urgently need to know the answer.

Best,
Zhenyu

Date: Thu, 5 May 2011 11:16:18 -0400
Subject: Re: [marss86-devel] How the latency of cache access is reflected on the ipc?
From: [email protected]
To: [email protected]
CC: [email protected]

I think the answer to this question is sort of complicated. If you look down below, the delay variable is basically used to schedule either the cache-hit event delay cycles in the future or the cache-miss event delay cycles in the future. The code you pointed out is just a small step in the life of a cache request. Essentially, at each point in the simulation, the simulator decides which function to call and how far in the future. So a cache hit will execute in x cycles, but a cache miss might go out to memory and take x+100 cycles to complete. This *total* latency to service a request is the key. The pipeline will send a cache request out, wait for it to complete, and then retire it. That is really what determines the IPC. What you've pointed out in the cacheController is one step along a long chain of events that ends in the request being 'completed' and the instruction moving forward in the pipeline. Please correct me if I misunderstood your question.
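As a back-of-the-envelope illustration of why the total service latency is what shows up in IPC (made-up latencies, not MARSS code, and it ignores the overlap between in-flight requests that a real out-of-order core gets):

    #include <cstdio>

    const unsigned long kCacheAccessLatency = 5;    // 'x' cycles on a hit
    const unsigned long kMemoryRoundTrip    = 100;  // extra cycles on a miss

    // Total latency to service one request: what the pipeline ends up waiting on.
    unsigned long service_latency(bool hit) {
        return hit ? kCacheAccessLatency
                   : kCacheAccessLatency + kMemoryRoundTrip;
    }

    int main() {
        // Four dependent loads, fully serialized: hit, hit, miss, hit.
        const bool outcomes[] = { true, true, false, true };
        unsigned long sim_cycle = 0;
        unsigned long retired = 0;
        for (bool hit : outcomes) {
            sim_cycle += service_latency(hit);  // the clock absorbs the delay
            retired++;
        }
        std::printf("retired=%lu cycles=%lu IPC=%.3f\n",
                    retired, sim_cycle, (double)retired / (double)sim_cycle);
        return 0;
    }

Here a single miss among four accesses drops IPC from 4/15 to 4/120, which is also one reason a change to the hit latency alone can move IPC less than expected when a run is dominated by misses out to memory.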
On Fri, Apr 29, 2011 at 10:36 PM, zhenyu sun <[email protected]> wrote:

Hi everyone, I am working on a study of CPU performance under different cache hit latencies. In cacheController.cpp:

    if(hit) {
        delay = cacheAccessLatency_;

If the CPU encounters a stall caused by a write access (a dependency, or a full buffer), how does this 'delay' influence the IPC? In other words, in which part of the code is the delay added onto sim_cycle? Thanks a lot.

Zhenyu Sun

_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
