> On July 1, 2015, 3:07 p.m., Andreas Hansson wrote:
> > src/mem/DRAMCtrl.py, line 460
> > <http://reviews.gem5.org/r/2932/diff/1/?file=47349#file47349line460>
> >
> > Really? I was under the impression that the vault controllers were
> > simple closed page. See for example
> > http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7095154
>
> Erfan Azarkhish wrote:
>     In fact, vault controllers are simple closed page policy. But I faced a
>     problem with the simple closed policy in the DRAMCtrl class: If we issue a
>     packet longer than burst_length (for example a burst of 128Bytes or 256Bytes
>     which is supported by HMC), DRAMCtrl will open and close the row for every
>     subtransaction. While the close_adaptive keeps the row open until all
>     subtransactions complete.
>     Another alternative solution would be to set the page policy to simple
>     closed, but increase burst_length to maximum supported by HMC (128 or 256),
>     but then the problem will be that smaller transactions incur a large tBURST
>     latency.
>     I hope that this makes sense.
>
> Andreas Hansson wrote:
>     I see your point. In a real-life scenario I would hardly ever expect to
>     see any 128 or 256 byte transactions though, so I would prefer to stick to a
>     close page policy for now. We can always beef up the policy so that it is
>     less strict in the future.

I agree, but even with a 64B transaction the DRAM page will be opened and
closed twice: burst_length is 8 x 32 bits = 32 bytes, so 64 bytes needs two
accesses. I don't think this is acceptable. I think we should either modify
the closed policy to handle this case, or stick with the close_adaptive
policy. What's your opinion?


> On July 1, 2015, 3:07 p.m., Andreas Hansson wrote:
> > src/mem/DRAMCtrl.py, line 469
> > <http://reviews.gem5.org/r/2932/diff/1/?file=47349#file47349line469>
> >
> > I am somewhat surprised that we need values this high. With a 32 vault
> > cube this would amount to an astonishing number of queued transactions (and
> > there is probably no system out there that would even get close to even
> > using 2048 outstanding transactions).
>
> Erfan Azarkhish wrote:
>     I performed some accuracy comparisons between the gem5 model and our
>     cycle-accurate HMC model, and I realized that the 'buffer_size' parameter
>     in gem5 does not correspond directly to the number of flip-flops in the
>     system. This is because gem5's standard memory system does not work based on
>     flits, and it only approximates them. This is acceptable, but we should
>     remember that from the 'buffer_size' in gem5 we cannot draw conclusions about
>     the number of hardware flip-flops in that component.
>     Here is how I obtained this value:
>     I injected high pressure (identical) traffic to both models (gem5 and the
>     Cycle-accurate model) and tried to match their 'delivered bandwidth' and
>     'execution time' with a high accuracy (less than 5% difference).
>     I observed that if 'buffer_size' is not large enough, gem5 does not
>     deliver the intended bandwidth. So I increased this value and gem5's
>     bandwidth matched the CA simulation really well.
>     This issue also stems from the fact that in real hardware, most
>     components interact with request/grant or valid/stall handshaking FIFOs which
>     regulate bandwidth, but in gem5's standard memory system these concepts
>     cannot be directly modeled. For this reason the buffer_size in the bridge and
>     the DRAMCtrl components should be adjusted to achieve the desired bandwidth
>     and execution time.
>
> Andreas Hansson wrote:
>     gem5's buffer size is expressed in DRAM bursts, is that where the problem
>     lies? I am still not sure I understand your argument about request/grant and
>     valid/stall. I fully appreciate that you changed it to make things aligned,
>     but to me it sounds like there is something else we should be adjusting
>     rather.

There are three different points:

1. In gem5, when we specify buffer_size = 32 (either in the DRAMCtrl or in
the Bridge), it means 32 packets and not 32 flits. This is one important
source of discrepancy with real hardware. So one should carefully match the
delivered bandwidth and execution time of the gem5 simulation against real
hardware or a cycle-accurate simulator by adjusting these parameters
(changing buffer_size in the DRAMCtrl or Bridge can vary execution time and
bandwidth widely). I observed that these values have to be set high, and
that they do not match the actual number of flip-flops in the
cycle-accurate simulation.

2. In my experiments, I observed that to hide the DRAM access latency
(static_front_end + tRCD + tCL + tBURST + static_back_end), which is large
in HMC, and to obtain the desired bandwidth, I have to increase the
buffer_size of the DRAM controller. I applied high pressure traffic to the
DRAMCtrl and compared its delivered bandwidth with the cycle-accurate model.

3. Consider that we have a cache, 2 levels of crossbar, and a memory. If any
of the crossbars in between is busy, the request from the cache will be
retried from the cache port after a retry event is received.
This makes a long combinational path from the cache to the memory, and it
has a negative impact on the delivered bandwidth (the situation is the same
for any series of components without internal queues). The alternative is to
break the combinational path with request/grant FIFOs between the
interconnects. The only component with such a capability is the Bridge,
because it has internal queues. So placing bridges between the interconnects
can regulate bandwidth, but again we should be careful about point 1.


- Erfan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2932/#review6656
-----------------------------------------------------------


On July 1, 2015, 10:25 a.m., Erfan Azarkhish wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2932/
> -----------------------------------------------------------
>
> (Updated July 1, 2015, 10:25 a.m.)
>
>
> Review request for Default.
>
>
> Repository: gem5
>
>
> Description
> -------
>
> Minor fixes in the HMC vault model
> This patch and the subsequent chain of patches aim to model a simple Hybrid
> Memory Cube device in gem5
> Please apply the complete chain before running the simulation.
>
>
> Diffs
> -----
>
>   src/mem/DRAMCtrl.py 73d4798871a5
>
> Diff: http://reviews.gem5.org/r/2932/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Erfan Azarkhish
>
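
For reference, the burst arithmetic behind the page-policy discussion (8 x 32 bits = 32 bytes per burst, so a 64-byte transaction needs two accesses) can be sketched in a few lines of standalone Python. This is only an illustrative model, not gem5 code: the function names, the default values, and the policy labels "closed" and "close_adaptive" are assumptions taken from the wording in this thread, and the close_adaptive case assumes all sub-bursts of a packet hit the same row and are queued together.

```python
import math


def bursts_per_packet(packet_bytes, burst_length=8, bus_width_bits=32):
    """Number of DRAM bursts needed to serve one packet.

    Defaults follow the figures quoted in the thread (8 beats of 32 bits,
    i.e. 32 bytes per burst); they are illustrative, not read from gem5.
    """
    burst_bytes = burst_length * bus_width_bits // 8
    return math.ceil(packet_bytes / burst_bytes)


def row_activations(packet_bytes, page_policy, **burst_params):
    """Rough count of row open/close cycles for one packet.

    Simplified model (an assumption, not the DRAMCtrl implementation):
    - "closed": the row is opened and closed for every burst
    - "close_adaptive": the row stays open until all bursts of the
      packet have been served, so it is opened once
    """
    n_bursts = bursts_per_packet(packet_bytes, **burst_params)
    if page_policy == "closed":
        return n_bursts
    if page_policy == "close_adaptive":
        return 1
    raise ValueError("unknown page policy: %r" % page_policy)


if __name__ == "__main__":
    for size in (32, 64, 128, 256):
        print(size,
              bursts_per_packet(size),
              row_activations(size, "closed"),
              row_activations(size, "close_adaptive"))
```

Under these assumptions a 64-byte packet costs two row activations with the closed policy and one with close_adaptive, which is the gap the reply above is concerned about.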
