[gem5-users] Printing Statistics type variable using DPRINTF for debugging
Hello everyone, I am trying to monitor how a particular Stats::Formula variable (such as a cache miss rate) evolves at runtime. I have tried to display its value using DPRINTF, but to no avail. Can someone guide me on how to accomplish this? Thank you. Regards, Arun ___ gem5-users mailing list gem5-users@gem5.org http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
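For what it's worth: a Stats::Formula has no stored value — it is evaluated lazily from its operand statistics — so it has to be evaluated at the point of the DPRINTF. A rough sketch in pseudocode (the exact accessor names, e.g. total() and value(), are assumptions and should be checked against src/base/statistics.hh in your gem5 version):

```
// Inside the SimObject that owns the formula, with its debug flag enabled.
// Suppose regStats() built:  missRate = missCount / accessCount;
DPRINTF(Cache, "current miss rate: %f\n", missRate.total());

// If the formula type does not expose a convenient accessor, evaluate the
// operand stats directly instead:
DPRINTF(Cache, "current miss rate: %f\n",
        (double)missCount.value() / accessCount.value());
```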
[gem5-users] DRAM memory access latency
Hello Users, I am measuring the worst-case DRAM memory access latency (tRP + tRCD + tCL + tBURST) using a latency benchmark on arm_detailed (1 GHz) with a 1 MB shared L2 cache and an LPDDR3 x32 DRAM. The DRAM timing parameters are tRP = 15 ns, tRCD = 15 ns, tCL = 15 ns, and tBURST = 5 ns. The latency measured by the benchmark is 22 ns on a cache hit and 132 ns on a cache miss, which puts the DRAM memory access latency at ~110 ns. However, by calculation it should be tRP + tRCD + tCL + tBURST + static_backend_latency (10 ns) = 60 ns. The latency I observe is almost 50 ns higher than it is supposed to be. Is there anything I am missing? Does anyone know what else could add to the DRAM memory access latency? Thanks, Prathap
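For reference, the arithmetic in the question can be laid out explicitly (all values taken from the message above, in ns):

```python
# DRAM timing parameters from the question (ns)
tRP, tRCD, tCL, tBURST = 15, 15, 15, 5
static_backend = 10  # static backend latency of the controller model

# Expected worst-case static latency: precharge + activate + CAS + burst + backend
static_latency = tRP + tRCD + tCL + tBURST + static_backend
print(static_latency)  # 60 ns

# Latency observed by the pointer-chasing benchmark
miss_latency, hit_latency = 132, 22
observed_dram = miss_latency - hit_latency
print(observed_dram)  # 110 ns

# The unexplained gap the question asks about
print(observed_dram - static_latency)  # 50 ns
```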
Re: [gem5-users] DRAM memory access latency
Hi Prathap, tRP + tRCD + tCL + tBURST is only the static access latency of the DRAM. In the memory subsystem there is also a dynamic queueing delay, caused by memory controller scheduling (reordering) and resource availability (bank conflicts, refresh, and other timing constraints such as tFAW, tRRD, and tWTR). Therefore the average request latency is higher than the theoretical static latency. This is why there are many papers on request scheduling and refresh relaxation as possible optimizations. -Tao
Re: [gem5-users] DRAM memory access latency
Prathap, you are probably missing the DRAM queueing latency (the major reason) and other on-chip latencies (such as bus latency), if any. Thanks, Amin
[gem5-users] Prefetcher for Alpha in GEM5
Hi, I am new to gem5 and am trying to run benchmarks with prefetching enabled, using ALPHA. I tried changing the cache config files in the configs/common folder but still could not run it. Could anyone explain how to use a stride prefetcher with the o3 cpu_type in gem5? Thanks
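Not a complete recipe, but in the classic memory system the usual approach is to attach a prefetcher when the cache object is created in the config script. A sketch of a config fragment (not self-contained — it needs a gem5 build; the parameter names degree and prefetch_on_access are assumptions to verify against src/mem/cache/prefetch/Prefetcher.py and src/mem/cache/BaseCache.py for your gem5 version):

```python
from m5.objects import StridePrefetcher

# After the L2 cache has been instantiated (e.g. in configs/common/CacheConfig.py
# or in your own script), attach a stride prefetcher to it:
system.l2.prefetcher = StridePrefetcher(degree=8)
system.l2.prefetch_on_access = True
```

Alternatively, the prefetcher can be set directly on the L2Cache class in configs/common/Caches.py so every L2 instance gets it.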
[gem5-users] Error in VncServer::checkProtocolVersion() - len==12 failed
Hi, I am running a full-system simulation with the new ARM64 disk image and get the following error: gem5.opt: build/ARM/base/vnc/vncserver.cc:382: void VncServer::checkProtocolVersion(): Assertion `len == 12' failed. Can someone help me figure this out? Regards, Urmish
[gem5-users] McPAT with DVFS on ARM: De-couple chip-modelling and power computation phases
Hello, I am running ARM FS-mode simulations on gem5 with DVFS enabled; currently only the CPU frequency is scaled dynamically. If a simulation run observes N CPU frequencies, then invoking McPAT (for post-processing) N times, each time with the respective stats/XML, is technically wrong, since McPAT will re-optimize the circuit on every invocation. What I want is to decouple the circuit-initialization phase from the power-computation phase, so that McPAT creates the circuit model on the first invocation and I can then reuse the same circuit for the next N-1 invocations. I have been going through the McPAT source code and this does not look like a quick fix. I believe someone in the community has tried something similar. Is there an existing way to achieve this decoupling of the McPAT phases, or some McPAT version that I can leverage? Otherwise, could someone give some details about how to implement this? Thanks, Lokesh.
Re: [gem5-users] DRAM memory access latency
Hi Tao, Amin, According to the gem5 source, MemAccLat is the time difference between a packet entering and leaving the controller. I presume that this, added to the bus latency and the static backend latency, should match system.l2.ReadReq_avg_miss_latency; however, I see a difference of approximately 50 ns. As mentioned above, if MemAccLat is the time a packet spends in the memory controller, then it should already include the queueing latency. In that case the value of avgQLat looks suspicious. Is avgQLat part of avgMemAccLat? Thanks, Prathap
On Tue, Nov 4, 2014 at 3:11 PM, Tao Zhang tao.zhang.0...@gmail.com wrote: From the stats, I'd use system.mem_ctrls.avgMemAccLat as the overall average memory latency. It is 63.816 ns, which is very close to the 60 ns you calculated. I guess the extra 3.816 ns is due to the refresh penalty. -Tao
On Tue, Nov 4, 2014 at 12:10 PM, Prathap Kolakkampadath kvprat...@gmail.com wrote: Hi Tao, Amin, Thanks for your reply. To discard inter-bank interference and queueing delay, I have partitioned the banks so that the latency benchmark has exclusive access to a bank. The latency benchmark is a pointer-chasing benchmark, which generates a single read request at a time. stats.txt says this:
system.mem_ctrls.avgQLat 43816.35 # Average queueing delay per DRAM burst
system.mem_ctrls.avgBusLat 5000.00 # Average bus latency per DRAM burst
system.mem_ctrls.avgMemAccLat 63816.35 # Average memory access latency per DRAM burst
system.mem_ctrls.avgRdQLen 2.00 # Average read queue length when enqueuing
system.mem_ctrls.avgGap 136814.25 # Average gap between requests
system.l2.ReadReq_avg_miss_latency::switch_cpus0.data 114767.654811 # average ReadReq miss latency
The average gap between requests is equal to the L2 latency plus the DRAM latency for this test. avgRdQLen is 2 because the cache-line size is 64 B and the DRAM interface is x32. Is the final latency the sum of avgQLat + avgBusLat + avgMemAccLat? Also, when avgRdQLen is 2, I am not sure what amounts to the high queueing latency? Regards, Prathap
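On the last question, the stats themselves suggest the answer: avgQLat is already contained in avgMemAccLat, so summing avgQLat + avgBusLat + avgMemAccLat would double-count the queueing and bus delay. A quick check with the numbers from the stats.txt excerpt above (gem5 reports them in ticks, i.e. picoseconds):

```python
# Values from stats.txt above, in ticks (ps)
avgQLat      = 43816.35   # average queueing delay per DRAM burst
avgBusLat    = 5000.00    # average bus latency per DRAM burst
avgMemAccLat = 63816.35   # average memory access latency per DRAM burst

# If avgMemAccLat = avgQLat + avgBusLat + remainder, the remainder
# matches tCL (15 ns), i.e. the actual device access time.
remainder = avgMemAccLat - avgQLat - avgBusLat
print(round(remainder, 2))  # 15000.0 ps = 15 ns
```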
[gem5-users] extract Virtual address
I want to extract each virtual address reference and write it to a file. Can anybody help me? Thanks, Ranjan
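One low-effort approach (a sketch, not from the thread): run gem5 with an execution-trace debug flag, e.g. --debug-flags=Exec --debug-file=exec.trace (in SE mode the effective addresses printed are virtual), then post-process the trace. The exact A=0x... field format is an assumption about the Exec trace output and should be checked against your gem5 version:

```python
import re

# Matches the effective-address field of a gem5 Exec trace line, e.g.
# "0: system.cpu T0 : @main+24 : ldq r1,0(r2) : MemRead : A=0x11ff97b8"
ADDR_RE = re.compile(r"\bA=(0x[0-9a-fA-F]+)")

def extract_vaddrs(trace_lines):
    """Yield every virtual address that appears as an A=0x... field."""
    for line in trace_lines:
        m = ADDR_RE.search(line)
        if m:
            yield m.group(1)

# Usage: write the addresses, one per line, to a file.
# (Hypothetical sample line; in practice iterate over open("exec.trace").)
sample = ["0: system.cpu T0 : @main+24 : ldq r1,0(r2) : MemRead : A=0x11ff97b8"]
with open("vaddrs.txt", "w") as out:
    for addr in extract_vaddrs(sample):
        out.write(addr + "\n")
```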