Re: [gem5-users] DRAM memory access latency
Hi Prathap,

The crossbar has a given throughput and latency, but I think the default is 1 GHz and a 128-bit (16-byte) wide data path, with a single cycle of overhead (64 bytes every 5 ns). If that is indeed a limit, then you can always increase the crossbar clock or width. Note that if you have very bursty reads, for example, you can easily build up a backlog. If the crossbar is not the issue, then perhaps the master port that is the destination for the response is causing the problem?

Andreas

From: Prathap Kolakkampadath kvprat...@gmail.com
Date: Tuesday, 11 November 2014 03:41
To: Andreas Hansson andreas.hans...@arm.com
Cc: gem5 users mailing list gem5-users@gem5.org
Subject: Re: [gem5-users] DRAM memory access latency
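A back-of-envelope sketch of the default crossbar throughput Andreas describes: a 128-bit (16-byte) path at 1 GHz plus one cycle of overhead moves a 64-byte line in 5 ns, and widening the crossbar shrinks that. The function and parameter names here are illustrative, not gem5's actual attributes.

```python
# Check the claimed crossbar throughput: 16 bytes/cycle at 1 GHz,
# plus one cycle of fixed overhead, gives 64 bytes every 5 ns.
def xbar_transfer_ns(payload_bytes, width_bytes=16, clock_ghz=1.0, overhead_cycles=1):
    """Cycles to stream the payload plus fixed overhead, converted to ns."""
    data_cycles = -(-payload_bytes // width_bytes)  # ceiling division
    cycle_ns = 1.0 / clock_ghz
    return (data_cycles + overhead_cycles) * cycle_ns

print(xbar_transfer_ns(64))                  # 64-byte line: 5.0 ns
print(xbar_transfer_ns(64, width_bytes=32))  # doubled width: 3.0 ns
```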
Re: [gem5-users] DRAM memory access latency
Hello Andreas,

> waiting in the port until the crossbar can accept it

Is this because of bus contention? In that case, is there a way to reduce this latency by changing any parameters in gem5?

Thanks,
Prathap

On Thu, Nov 6, 2014 at 2:30 PM, Andreas Hansson andreas.hans...@arm.com wrote:
Re: [gem5-users] DRAM memory access latency
Hi Prathap,

The avgMemAccLat does indeed include any queueing latency. For the precise components included in the various latencies, I would suggest checking the source code. Note that the controller accounts not just for the static (and dynamic) DRAM latency, but also for the static controller pipeline latency (and dynamic queueing latency). The controller static latency comes from two parameters that by default add a few tens of nanoseconds. Let me know if you need more help breaking out the various components.

Andreas

From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users mailing list gem5-users@gem5.org
Date: Wednesday, 5 November 2014 05:36
To: Tao Zhang tao.zhang.0...@gmail.com, gem5 users mailing list gem5-users@gem5.org, Amin Farmahini amin...@gmail.com
Subject: Re: [gem5-users] DRAM memory access latency

___ gem5-users mailing list gem5-users@gem5.org http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
Re: [gem5-users] DRAM memory access latency
Hello Andreas,

Thanks for your reply. OK, I gather that the memory access latency indeed includes the queueing latency, and that a read/write request that misses the buffer incurs a static latency of static frontend latency + static backend latency.

To summarize, the test I run is a latency benchmark: a pointer-chasing test (only one outstanding request at a time) that generates reads to a specific DRAM bank (bank partitioned). The test runs on cpu0 of a 4-CPU arm_detailed system at 1 GHz with a 1 MB shared L2 cache and single-channel LPDDR3 x32 DRAM. The bank used by cpu0 is not shared with the other CPUs.

Test statistics:

system.mem_ctrls.avgQLat 43816.35 # Average queueing delay per DRAM burst
system.mem_ctrls.avgBusLat 5000.00 # Average bus latency per DRAM burst
system.mem_ctrls.avgMemAccLat 63816.35 # Average memory access latency per DRAM burst
system.mem_ctrls.avgRdQLen 2.00 # Average read queue length when enqueuing
system.mem_ctrls.avgGap 136814.25 # Average gap between requests
system.l2.ReadReq_avg_miss_latency::switch_cpus0.data 114767.654811 # average ReadReq miss latency

Based on the statistics above, avgMemAccLat is ~63 ns, which I presume is the sum of tRP (15 ns) + tRCD (15 ns) + tCL (15 ns) + static latency (20 ns). Is this breakup correct? However, l2.ReadReq_avg_miss_latency is ~114 ns, which is ~50 ns more than avgMemAccLat. I couldn't figure out the components contributing to this 50 ns latency. Your thoughts on this are much appreciated.

Regards,
Prathap

On Thu, Nov 6, 2014 at 3:03 AM, Andreas Hansson andreas.hans...@arm.com wrote:
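As a quick sanity check, the ~50 ns gap discussed in this thread can be recomputed from the statistics above (gem5 stats are in ticks; at the default 1 ps tick, 1000 ticks = 1 ns):

```python
# Arithmetic from the quoted stats: the part of the L2 miss latency
# that the memory controller's own accounting does not explain.
avg_mem_acc_lat = 63816.35        # system.mem_ctrls.avgMemAccLat (ticks)
l2_read_miss_lat = 114767.654811  # system.l2.ReadReq_avg_miss_latency (ticks)

unaccounted_ns = (l2_read_miss_lat - avg_mem_acc_lat) / 1000.0
print(unaccounted_ns)  # roughly 50.95 ns outside the controller
```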
Re: [gem5-users] DRAM memory access latency
Hi Prathap,

I suspect the answer to the mysterious 50 ns is that the responses are sent back using a so-called "queued port" in gem5. Thus, from the memory controller's point of view the packet is all done, but it is now waiting in the port until the crossbar can accept it. This queue can hold a number of packets if there has been a burst of responses that are trickling through the crossbar on their way back. You can always run with some debug flags to verify this (XBar, DRAM, PacketQueue etc.). Coincidentally, I have been working on a patch to remove this "invisible" queue and should hopefully have it on the review board shortly.

Andreas

From: Prathap Kolakkampadath kvprat...@gmail.com
Date: Thursday, November 6, 2014 at 5:47 PM
To: Andreas Hansson andreas.hans...@arm.com
Cc: gem5 users mailing list gem5-users@gem5.org
Subject: Re: [gem5-users] DRAM memory access latency
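The queued-port effect Andreas describes can be illustrated with a toy drain model (the timings here are illustrative, not gem5's actual mechanism): if the controller completes a burst of responses almost simultaneously but the crossbar accepts only one 64-byte response every 5 ns, later responses accumulate wait time in the port.

```python
# Toy model of the "invisible" queued-port delay: responses finished by
# the controller wait until the crossbar can accept them, one per slot.
XBAR_SLOT_NS = 5.0  # assumed crossbar service time per 64-byte response

def port_wait_times(completion_times_ns):
    """Extra time each response spends queued in the port."""
    waits, next_free = [], 0.0
    for t in sorted(completion_times_ns):
        start = max(t, next_free)  # wait until the crossbar is free
        waits.append(start - t)
        next_free = start + XBAR_SLOT_NS
    return waits

# Four responses completed by the controller in a tight burst:
print(port_wait_times([100.0, 100.0, 101.0, 102.0]))  # [0.0, 5.0, 9.0, 13.0]
```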
Re: [gem5-users] DRAM memory access latency
Thanks for your reply. I will try to verify this, and I will also get back to you with results once I run with your patch.

Regards,
Prathap

On Thu, Nov 6, 2014 at 2:30 PM, Andreas Hansson andreas.hans...@arm.com wrote:
[gem5-users] DRAM memory access latency
Hello Users,

I am measuring the DRAM worst-case memory access latency (tRP + tRCD + tCL + tBURST) using a latency benchmark on arm_detailed (1 GHz) with a 1 MB shared L2 cache and LPDDR3 x32 DRAM. According to the DRAM timing parameters, tRP = 15 ns, tRCD = 15 ns, tCL = 15 ns, tBURST = 5 ns. The latency measured by the benchmark is 22 ns on a cache hit and 132 ns on a cache miss, which means the DRAM memory access latency is ~110 ns. However, by calculation it should be tRP + tRCD + tCL + tBURST + static_backend_latency (10 ns) = 60 ns. The latency I observe is almost 50 ns higher than it is supposed to be. Is there anything I am missing? Does anyone know what else could add to the DRAM memory access latency?

Thanks,
Prathap
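The expected figure can be tallied directly from the numbers in the post:

```python
# Expected worst-case DRAM latency from the stated timing parameters,
# versus what the benchmark observes on a cache miss.
tRP, tRCD, tCL, tBURST = 15, 15, 15, 5  # ns
static_backend = 10                      # ns, controller backend latency

expected = tRP + tRCD + tCL + tBURST + static_backend
observed = 132 - 22  # miss latency minus hit latency, in ns
print(expected, observed, observed - expected)  # 60 110 50
```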
Re: [gem5-users] DRAM memory access latency
Hi Prathap,

The latency tRP+tRCD+tCL+tBURST is only the static access latency of the DRAM. In the memory subsystem there is also dynamic queueing delay, due to memory controller scheduling (reordering) and resource availability (bank conflicts, refresh, and other timing constraints such as tFAW, tRRD, tWTR). Therefore the average request latency is higher than the theoretical static latency. This is why there are many papers on request scheduling and refresh relaxation as possible optimizations.

-Tao

On Tue, Nov 4, 2014 at 11:28 AM, Prathap Kolakkampadath via gem5-users gem5-users@gem5.org wrote:
Re: [gem5-users] DRAM memory access latency
Prathap,

You are probably missing the DRAM queueing latency (the major reason) and other on-chip latencies (such as bus latency), if any.

Thanks,
Amin

On Tue, Nov 4, 2014 at 1:28 PM, Prathap Kolakkampadath via gem5-users gem5-users@gem5.org wrote:
Re: [gem5-users] DRAM memory access latency
Hi Tao, Amin,

According to the gem5 source, MemAccLat is the time difference between when a packet enters the controller and when it leaves the controller. I presume that this, added to the bus latency and the static backend latency, should match system.l2.ReadReq_avg_miss_latency. However, I see a difference of approximately 50 ns. As mentioned above, if MemAccLat is the time a packet spends in the memory controller, then it should include the queueing latency too. In that case the value of avgQLat looks suspicious. Is avgQLat part of avgMemAccLat?

Thanks,
Prathap

On Tue, Nov 4, 2014 at 3:11 PM, Tao Zhang tao.zhang.0...@gmail.com wrote:

From the stats, I'd like to use system.mem_ctrls.avgMemAccLat as the overall average memory latency. It is 63.816 ns, which is very close to the 60 ns you calculated. I guess the extra 3.816 ns is due to the refresh penalty.

-Tao

On Tue, Nov 4, 2014 at 12:10 PM, Prathap Kolakkampadath kvprat...@gmail.com wrote:

Hi Tao, Amin,

Thanks for your reply. To discard inter-bank interference and queueing delay, I have partitioned the banks so that the latency benchmark has exclusive access to a bank. Also, the latency benchmark is a pointer-chasing benchmark, which generates a single read request at a time. stats.txt says this:

system.mem_ctrls.avgQLat 43816.35 # Average queueing delay per DRAM burst
system.mem_ctrls.avgBusLat 5000.00 # Average bus latency per DRAM burst
system.mem_ctrls.avgMemAccLat 63816.35 # Average memory access latency per DRAM burst
system.mem_ctrls.avgRdQLen 2.00 # Average read queue length when enqueuing
system.mem_ctrls.avgGap 136814.25 # Average gap between requests
system.l2.ReadReq_avg_miss_latency::switch_cpus0.data 114767.654811 # average ReadReq miss latency

The average gap between requests is equal to the L2 latency plus the DRAM latency for this test. Also, avgRdQLen is 2 because the cache line size is 64 bytes and the DRAM interface is x32. Is the final latency the sum of avgQLat + avgBusLat + avgMemAccLat? Also, when avgRdQLen is 2, I am not sure what amounts to the high queueing latency?

Regards,
Prathap

On Tue, Nov 4, 2014 at 1:38 PM, Amin Farmahini amin...@gmail.com wrote:
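Incidentally, the statistics quoted in this thread decompose cleanly: avgQLat + avgBusLat + tCL reproduces avgMemAccLat exactly, which is consistent with Andreas's statement that avgMemAccLat includes the queueing latency. (This decomposition is inferred from the numbers in the stats, not taken from the gem5 source.)

```python
# Checking the inferred decomposition of avgMemAccLat from the quoted
# stats (values in ticks; 1 tick = 1 ps at gem5's default tick rate).
avg_q_lat = 43816.35       # system.mem_ctrls.avgQLat
avg_bus_lat = 5000.00      # system.mem_ctrls.avgBusLat
tCL = 15000                # 15 ns CAS latency in ticks
avg_mem_acc_lat = 63816.35 # system.mem_ctrls.avgMemAccLat

print(avg_q_lat + avg_bus_lat + tCL, avg_mem_acc_lat)  # the two values agree
```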