Re: [gem5-users] DRAM memory access latency

2014-11-11 Thread Andreas Hansson via gem5-users
Hi Prathap,

The crossbar has a given throughput and latency; I think the default is a 1 GHz 
clock and a 128-bit (16-byte) wide data path, with a single cycle of overhead 
(so 64 bytes every 5 ns). If that is indeed the limit, then you can always 
increase the crossbar clock or width. Note that if you have very bursty reads, 
for example, you can easily build up a backlog.
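
For reference, a minimal sketch of how these knobs could be overridden in a 
gem5 Python config. The object name (system.membus) and the SystemXBar class 
are assumptions that depend on your script and gem5 version (older trees call 
the class CoherentXBar):

    # Hypothetical excerpt from a gem5 configuration script: widen the
    # memory bus and clock it faster. width is in bytes per cycle.
    from m5.objects import System, SystemXBar, SrcClockDomain, VoltageDomain

    system = System()
    system.membus = SystemXBar()
    system.membus.width = 32  # default 16 bytes -> 64 bytes every 5 cycles
    system.membus.clk_domain = SrcClockDomain(clock='2GHz',
                                              voltage_domain=VoltageDomain())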

If the crossbar is not the issue, then perhaps the master port that is the 
destination for the response is causing the problem?

Andreas

Re: [gem5-users] DRAM memory access latency

2014-11-10 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

> waiting in the port until the crossbar can accept it

Is this because of bus contention? If so, is there a way to reduce this
latency by changing any parameters in gem5?

Thanks,
Prathap

Re: [gem5-users] DRAM memory access latency

2014-11-06 Thread Andreas Hansson via gem5-users
Hi Prathap,

The avgMemAccLat does indeed include any queueing latency. For the precise 
components included in the various latencies I would suggest checking the 
source code.

Note that the controller accounts not only for the static (and dynamic) DRAM 
latency, but also for the static controller pipeline latency (and the dynamic 
queueing latency). The static controller latency comes from two parameters that 
by default add a few tens of nanoseconds.
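
For reference, a sketch of the two parameters in question, assuming the classic 
DRAMCtrl model of the time (both default to 10 ns there; exact names may vary 
by gem5 version):

    # Hypothetical excerpt from a gem5 configuration script: the static
    # controller pipeline latencies, on top of the DRAM timing itself.
    from m5.objects import LPDDR3_1600_x32

    mem_ctrl = LPDDR3_1600_x32()
    mem_ctrl.static_frontend_latency = '10ns'  # request path, ahead of the queues
    mem_ctrl.static_backend_latency = '10ns'   # response path, after the DRAM access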

Let me know if you need more help breaking out the various components.

Andreas

Re: [gem5-users] DRAM memory access latency

2014-11-06 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

Thanks for your reply.


OK, I understand now that the memory access latency does indeed include the 
queueing latency, and that a read/write request that misses in the controller's 
buffers incurs a static latency of static frontend latency + static backend 
latency.

To summarize, the test I run is a latency benchmark: a pointer-chasing test 
(only one outstanding request at a time) that generates reads to a specific 
DRAM bank (bank-partitioned). It runs on cpu0 of a 4-CPU arm_detailed system at 
1 GHz with a 1 MB shared L2 cache and single-channel LPDDR3 x32 DRAM. The bank 
used by cpu0 is not shared with the other CPUs.

Test statistics:

system.mem_ctrls.avgQLat                         43816.35   # Average queueing delay per DRAM burst
system.mem_ctrls.avgBusLat                        5000.00   # Average bus latency per DRAM burst
system.mem_ctrls.avgMemAccLat                    63816.35   # Average memory access latency per DRAM burst
system.mem_ctrls.avgRdQLen                           2.00   # Average read queue length when enqueuing
system.mem_ctrls.avgGap                         136814.25   # Average gap between requests
system.l2.ReadReq_avg_miss_latency::switch_cpus0.data   114767.654811   # average ReadReq miss latency

Based on the above test statistics:

avgMemAccLat is ~63 ns, which I presume is the sum of tRP (15 ns) + tRCD (15 ns) 
+ tCL (15 ns) + static latency (20 ns). Is this breakdown correct?

However, l2.ReadReq_avg_miss_latency is ~114 ns, which is ~50 ns more than 
avgMemAccLat. I couldn't figure out the components contributing to this extra 
50 ns; your thoughts on this are much appreciated.
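
For reference, a quick reconciliation of the stats above (plain arithmetic on 
the printed values, which are in picoseconds):

    # Back-of-the-envelope check of the controller stats quoted above (ps).
    avg_q_lat   = 43816.35        # avgQLat
    avg_bus_lat = 5000.00         # avgBusLat
    avg_mem_acc = 63816.35        # avgMemAccLat
    l2_miss     = 114767.654811   # l2.ReadReq_avg_miss_latency
    tCL         = 15000.0

    # avgQLat + avgBusLat + tCL reproduces avgMemAccLat exactly, so the
    # queueing delay is indeed counted inside avgMemAccLat.
    assert abs((avg_q_lat + avg_bus_lat + tCL) - avg_mem_acc) < 1e-6

    # ...which leaves ~51 ns spent outside the controller's accounting.
    print(l2_miss - avg_mem_acc)  # ~50951 ps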

Regards,
Prathap




Re: [gem5-users] DRAM memory access latency

2014-11-06 Thread Andreas Hansson via gem5-users
Hi Prathap,

I suspect the answer to the mysterious 50 ns is that the responses are sent 
back using a so-called “queued port” in gem5. Thus, from the memory 
controller’s point of view the packet is all done, but it is now waiting in the 
port until the crossbar can accept it. This queue can hold a number of packets 
if there has been a burst of responses that are trickling through the crossbar 
on their way back.

You can always run with some debug flags to verify this (XBar, DRAM, 
PacketQueue, etc.).
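
For example (a hypothetical invocation; the exact binary, config script, and 
options depend on your build and setup):

    build/ARM/gem5.opt --debug-flags=XBar,DRAM,PacketQueue \
        configs/example/fs.py [your usual options]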

Coincidentally I have been working on a patch to remove this “invisible” queue 
and should hopefully have this on the review board shortly.

Andreas

Re: [gem5-users] DRAM memory access latency

2014-11-06 Thread Prathap Kolakkampadath via gem5-users
Thanks for your reply. I will try to verify this, and I will also get back to
you with results once I run with your patch.

Regards,
Prathap

[gem5-users] DRAM memory access latency

2014-11-04 Thread Prathap Kolakkampadath via gem5-users
Hello Users,

I am measuring the DRAM worst-case memory access latency (tRP + tRCD + tCL +
tBURST) using a latency benchmark on arm_detailed (1 GHz) with a 1 MB shared
L2 cache and LPDDR3 x32 DRAM.

According to the DRAM timing parameters, tRP = 15 ns, tRCD = 15 ns, tCL =
15 ns, and tBURST = 5 ns. The latency measured by the benchmark is 22 ns on a
cache hit and 132 ns on a cache miss, which puts the DRAM memory access latency
at ~110 ns. However, by calculation it should be
tRP + tRCD + tCL + tBURST + static_backend_latency (10 ns) = 60 ns.
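
As a quick check of those numbers:

    # Expected vs. observed DRAM access latency from the figures above (ns).
    expected = 15 + 15 + 15 + 5 + 10   # tRP + tRCD + tCL + tBURST + static backend
    observed = 132 - 22                # L2-miss latency minus L2-hit latency
    print(expected, observed)          # 60 vs. 110 -> ~50 ns unexplained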


The latency I observe is almost 50 ns higher than it is supposed to be. Is
there anything I am missing? Does anyone know what else could add to the DRAM
memory access latency?

Thanks,
Prathap

Re: [gem5-users] DRAM memory access latency

2014-11-04 Thread Tao Zhang via gem5-users
Hi Prathap,

The latency tRP + tRCD + tCL + tBURST is only the static access latency of the
DRAM. In the memory subsystem there is also a dynamic queueing delay due to
memory controller scheduling (reordering) and resource availability (bank
conflicts, refresh, and other timing constraints like tFAW, tRRD, and tWTR).

Therefore, the average request latency is higher than the theoretical static
latency. This is why there are so many papers on request scheduling and
refresh relaxation as avenues for optimization.

-Tao
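
For reference, a sketch of where these constraints live in the gem5 DRAM model 
(parameter names per the classic DRAMCtrl and may differ across versions; note 
that gem5 expresses the tFAW-style activation window as tXAW plus an 
activation_limit):

    # Hypothetical: print a few of the DRAM timing constraints mentioned above.
    from m5.objects import LPDDR3_1600_x32

    dram = LPDDR3_1600_x32()
    for name in ('tRCD', 'tCL', 'tRP', 'tBURST', 'tRFC', 'tREFI', 'tWTR', 'tXAW'):
        print(name, getattr(dram, name))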

Re: [gem5-users] DRAM memory access latency

2014-11-04 Thread Amin Farmahini via gem5-users
Prathap,

You are probably missing the DRAM queueing latency (the major reason) and
other on-chip latencies (such as bus latency), if any.

Thanks,
Amin

Re: [gem5-users] DRAM memory access latency

2014-11-04 Thread Prathap Kolakkampadath via gem5-users
Hi Tao, Amin,

According to the gem5 source, MemAccLat is the time difference between when a
packet enters the controller and when it leaves the controller. I presume that
this, added to the bus latency and the static backend latency, should match
system.l2.ReadReq_avg_miss_latency. However, I see a difference of
approximately 50 ns.

As mentioned above, if MemAccLat is the time a packet spends in the memory
controller, then it should include the queueing latency too. In that case the
value of avgQLat looks suspicious. Is avgQLat part of avgMemAccLat?

Thanks,
Prathap



On Tue, Nov 4, 2014 at 3:11 PM, Tao Zhang tao.zhang.0...@gmail.com wrote:

 From the stats, I'd like to use system.mem_ctrls.avgMemAccLat as the
 overall average memory latency. It is 63.816 ns, which is very close to the
 60 ns you calculated. I guess the extra 3.816 ns is due to the refresh penalty.

 -Tao

 On Tue, Nov 4, 2014 at 12:10 PM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

 Hi Tao, Amin,


 Thanks for your reply.

 To discard inter-bank interference and queueing delay, I have partitioned
 the banks so that the latency benchmark has exclusive access to a bank.
 Also, the latency benchmark is a pointer-chasing benchmark, which generates
 a single read request at a time.


 stats.txt says this:

 system.mem_ctrls.avgQLat                         43816.35   # Average queueing delay per DRAM burst
 system.mem_ctrls.avgBusLat                        5000.00   # Average bus latency per DRAM burst
 system.mem_ctrls.avgMemAccLat                    63816.35   # Average memory access latency per DRAM burst
 system.mem_ctrls.avgRdQLen                           2.00   # Average read queue length when enqueuing
 system.mem_ctrls.avgGap                         136814.25   # Average gap between requests
 system.l2.ReadReq_avg_miss_latency::switch_cpus0.data   114767.654811   # average ReadReq miss latency

 The average gap between requests is equal to the L2 latency + DRAM latency
 for this test. Also, avgRdQLen is 2 because the cache line size is 64 bytes
 and the DRAM interface is x32.

 Is the final latency the sum of avgQLat + avgBusLat + avgMemAccLat?
 Also, with avgRdQLen at only 2, I am not sure what accounts for the high
 queueing latency?

 Regards,
 Prathap
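
For reference, the avgRdQLen arithmetic above (a sketch; burst length 8 is the 
usual LPDDR3 setting and an assumption here):

    # Why avgRdQLen is ~2: a 64 B cache line needs two DRAM bursts on an
    # x32 interface with burst length 8 (4 bytes x 8 beats = 32 B per burst).
    line_bytes  = 64
    burst_bytes = (32 // 8) * 8        # bus width in bytes x burst length
    print(line_bytes // burst_bytes)   # -> 2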


