Re: [gem5-users] Questions on DRAM Controller model

2014-10-16 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

According to the gem5 documentation:

Load & store buffers (for read and write access) don’t impose any
restriction on the number of active memory accesses. Therefore, the maximum
number of outstanding CPU’s memory access requests is not limited by CPU
Memory Object but by underlying memory system model.

I assume this means that the number of outstanding memory accesses of a CPU
depends on the number of L1 MSHRs. Is this correct?


However, as quoted from the paper
http://web.eecs.umich.edu/~atgutier/papers/ispass_2014.pdf

gem5’s fetch engine only
allows a single outstanding I-cache access, whereas modern
OoO CPUs are fully pipelined allowing multiple parallel
accesses to instruction cache lines. This specification error in
the fetch stage contributes to the I-cache miss statistic error.

If this is the case, does it limit the number of outstanding read
requests? Is it possible to generate multiple parallel accesses by setting
the response_latency of the L1 cache to 0?
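
For reference, here is a minimal sketch of the knobs in question (the
parameter names mshrs, tgts_per_mshr, and response_latency come from the
classic BaseCache SimObject of gem5 versions from this era; 'l1d' is a
hypothetical L1 data cache instance, and setting response_latency to 0 is
the experiment proposed above, not a verified fix):

# Hypothetical config-script excerpt; adjust names to your script.
l1d.mshrs = 10              # outstanding misses the L1 can track
l1d.tgts_per_mshr = 20      # requests coalesced per outstanding miss
l1d.response_latency = 0    # response-path latency in cycles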


Kindly let me know what you think about this.

Thanks,
Prathap

Re: [gem5-users] Questions on DRAM Controller model

2014-10-16 Thread Andreas Hansson via gem5-users
Hi Prathap,

I’ll leave that question for someone with more insight into the various CPU 
models.

Andreas


Re: [gem5-users] Questions on DRAM Controller model

2014-10-15 Thread Prathap Kolakkampadath via gem5-users
Thanks Andreas.

Re: [gem5-users] Questions on DRAM Controller model

2014-10-14 Thread Andreas Hansson via gem5-users
Hi Prathap,

The O3 CPU only works with the memory system in timing mode, so I do not 
understand what two points you are comparing when you say the results are 
exactly the same.

The read queue is likely to never fill up unless all these transactions are 
generated at once. While the first one is being served by the memory controller 
you may have more coming in etc, but I do not understand why you think it would 
ever fill up.

For “debugging” make sure that the config.ini actually captures what you think 
you are simulating. Also, you have a lot of DRAM-related stats in the stats.txt 
output.
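
For instance, assuming the DRAMCtrl stat names of this gem5 version (verify
against your own stats.txt), the read-queue occupancy can be checked directly:

grep -E 'avgRdQLen|rdQLenPdf' m5out/stats.txt

avgRdQLen gives the average read queue length and rdQLenPdf its
distribution, which should show how full the queue actually gets.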

Andreas


Re: [gem5-users] Questions on DRAM Controller model

2014-10-14 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas

Whenever I switch to the O3 CPU from a checkpoint, I can see from config.ini
that the CPU is switched, but mem_mode is still set to atomic. However, when
booting with the O3 CPU itself (without restoring from a checkpoint),
mem_mode is set to timing. I am not sure why. In any case, I could run my
tests on the O3 CPU with mem_mode set to timing (as verified from config.ini).

I run a memory-intensive test, which generates a cache miss on every read,
in parallel with a pointer-chasing test (one outstanding request at a time),
and both CPUs share the same DRAM bank. In my setup, the number of L1 MSHRs
is 10, so the memory-intensive test can generate up to 10 outstanding
requests at a time. Since the CPU is much faster than the DRAM controller,
can keep generating outstanding requests, and all the requests target the
same bank, I expect the DRAM queue size to be 10 whenever a request from the
pointer-chasing test arrives. If this assumption is correct, I should see
interference in the model comparable to what I see on real platforms.

Don't you think the DRAM queue would fill up to the number of L1 MSHRs in
the above scenario? And if not, what would it take to fill the DRAM queue up
to the number of L1 MSHRs?

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas


Re: [gem5-users] Questions on DRAM Controller model

2014-10-14 Thread Andreas Hansson via gem5-users
Hello Prathap,

I do not dare say, but perhaps some interaction between your generated access 
sequence and the O3 model (parameters) restricts the number of outstanding L1 
misses? There are plenty of debug flags to help in drilling down on this issue. 
Have a look in src/cpu/o3/SConscript for the O3-related debug flags and 
src/mem/cache/SConscript for the cache flags.
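
For example, something like the following (the flag names Cache and
O3CPUAll are defined in those SConscript files in gem5 trees of this era,
but verify them in your checkout):

build/ARM/gem5.opt --debug-flags=Cache,O3CPUAll --debug-file=miss.trace configs/example/fs.py ...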

Andreas


Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Andreas Hansson via gem5-users
Hi Prathap,

I don’t dare say exactly what is going wrong in your setup, but I am confident 
that Ruby will not magically make things more representative (it will likely 
give you a whole lot more problems though). In the end it is all about 
configuring the building blocks to match the system you want to capture. The 
crossbars and caches in the classic memory system do make some simplifications, 
but I have not yet seen a case where they are not sufficiently accurate.

Have you looked at the various policy settings in the DRAM controller, e.g. the 
page policy and address mapping? If you’re trying to correlate with a real 
platform, also see Anthony’s ISPASS paper from last year for some sensible 
steps in simplifying the problem and dividing it into manageable chunks.

Good luck.

Andreas


Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Prathap Kolakkampadath via gem5-users
Thanks for your reply. The memory mode I used is atomic. I think I need to
run the tests in timing mode, which I believe shows interference and
queueing delay similar to real platforms.

Prathap

Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Andreas Hansson via gem5-users
Hi Prathap,

Indeed. The atomic mode is for fast-forwarding only. Once you actually want to 
get some representative performance numbers you have to run in timing mode with 
either the O3 or Minor CPU model.

Andreas


Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Prathap Kolakkampadath via gem5-users
Hi Andreas, users

I ran the test with the ARM O3 CPU (--cpu-type=detailed) and mem_mode=timing;
the results are exactly the same as with mem_mode=atomic.
I have partitioned the DRAM banks using software. The two benchmarks,
latency-sensitive and bandwidth-sensitive (both generate only reads), run
in parallel using the same DRAM bank.
From the stats file, I observe that the expected numbers of L2 misses and
DRAM requests are generated.
In my system, the number of L1 MSHRs is 10 and the number of L2 MSHRs is 32.
So I expect that when a request from the latency-sensitive benchmark arrives
at DRAM, the read queue size will be 10. However, what I observe is that
most of the time the queue is not filled, and hence there is less queueing
latency and interference.

I am using the classic memory system with the default DRAM controller,
DDR3_1600_x64. The address mapping is RoRaBaChCo, the page policy is
open_adaptive, and the scheduler is frfcfs.
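
For reference, here is a sketch of how these knobs appear in a config script
(parameter names and defaults taken from the DRAMCtrl/DDR3_1600_x64
SimObject of roughly this gem5 version; 'mem_ctrl' is a hypothetical name
for the controller instance, and the 32-entry read_buffer_size default is
what bounds the read queue discussed above):

# Hypothetical excerpt; verify the defaults in src/mem/DRAMCtrl.py.
mem_ctrl.addr_mapping = 'RoRaBaChCo'      # address interleaving
mem_ctrl.page_policy = 'open_adaptive'    # row-buffer management
mem_ctrl.mem_sched_policy = 'frfcfs'      # FR-FCFS scheduling
mem_ctrl.read_buffer_size = 32            # read queue entries
mem_ctrl.write_buffer_size = 64           # write queue entries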

Do you have any thoughts on this? How could I debug this further?

Appreciate your help.

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas


Re: [gem5-users] Questions on DRAM Controller model

2014-10-12 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

Even after configuring the model like the actual hardware, I am still not
seeing enough interference on the read request under consideration. I am
using the classic memory system model. Since it uses the atomic and
functional packet allocation protocol, I would like to switch to Ruby
(I think it resembles a real platform more closely).


I am running into the problem below when I use Ruby.

/build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches
--l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4
--mem-size=512MB
--kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
--disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
--machine-type=VExpress_EMM
--dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
--cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in main
    exec filecode in scope
  File "configs/example/fs.py", line 302, in <module>
    test_sys = build_test_system(np)
  File "configs/example/fs.py", line 138, in build_test_system
    Ruby.create_system(options, test_sys, test_sys.iobus, test_sys._dma_ports)
  File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825, in __getattr__
    raise AttributeError, err_string
AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
  (C++ object is not yet constructed, so wrapped C++ methods are unavailable.)

What could be the cause of this?

Thanks,
Prathap



On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  There are many possible reasons for the discrepancy, and obviously there
 are many ways of building a memory controller :-). Have you configured the
 model to look like the actual hardware? The most obvious differences would
 be in terms of buffer sizes, the page policy, arbitration policy, the
 threshold before closing a page, the read/write switching, actual timings
 etc. It is also worth checking if the controller hardware treats writes the
 same way the model does (early responses, minimise switching).

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, 9 September 2014 18:56
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

  Hello Andreas,

 Thanks for your reply. I read your ISPASS paper and got a fair
 understanding of the architecture.
 I am trying to reproduce, in the simulator environment, the results
 collected from running synthetic benchmarks (latency and bandwidth) on real
 hardware. However, I see variations in the results and am trying to
 understand the reasons.

 The experiment has latency (memory non-intensive, random access) as
 the primary task and bandwidth (memory intensive, sequential access) as
 the co-runner task.


  On real hardware
 case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
 case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s

  On simulator
  case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
 case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s


 In case 1, where the latency test runs alone (0 co-runner), the results
 match in both environments. However, in case 2, run with the bandwidth test
 (1 co-runner), the results differ a lot.
 Do you have any thoughts about this?
 Thanks,
 Prathap

 On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  Have you read our ISPASS paper from last year? It’s referenced in the
 header file, as well as on gem5.org.

1. Yes and no. Two different buffers are used in the model, but
they are random access, so you can treat the entries any way you want.
2. Yes and no. It’s a C++ model, so the scheduler executes in 0 time.
Thus, when looking at the various requests it effectively sees all the
banks.
3. Yes and no. See above.

 Remember that this is a model. The goal is not to be representative down
 to every last element of an RTL design. The goal is to be representative of
 a real design, and then be fast. Both of these goals are delivered upon by
 the model.

 I hope that explains it. If there is anything in the results you do not
 agree with, please do say so.

  Thanks,

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Monday, 8 September 2014 18:38
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Questions on DRAM Controller model

  Hello Everybody,

 I am using DDR3_1600_x64. I am trying to understand the memory controller
 design and have a few doubts about it.

 1) Do the memory

Re: [gem5-users] Questions on DRAM Controller model

2014-10-12 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas/Users,

I used to create a checkpoint up to Linux boot using the Atomic Simple CPU
and then restore from this checkpoint to the detailed O3 CPU before running
the test. I notice that the mem-mode is set to atomic and not timing. Could
that be the reason for the low contention on the memory bus I am observing?
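
For context, the restore invocation looks roughly like this (option names
from configs/common/Options.py of this era; treat the exact flags as
assumptions):

build/ARM/gem5.opt configs/example/fs.py --checkpoint-restore=1 --checkpoint-dir=m5out --cpu-type=detailed --caches --l2cache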

Thanks,
Prathap

On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath kvprat...@gmail.com
 wrote:

 Hello Andreas,

 Even after configuring the model like the actual hardware, i still not
 seeing enough interference to the read request under consideration. I am
 using the classic memory system model. Since it uses atomic and functional
 Packet allocation protocol, I would like to switch to Ruby( I think it
 more resembles with real platform).


 I am hitting in to below problem when i use ruby.

 ./build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py \
     --caches --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB \
     --num-cpus=4 --mem-size=512MB \
     --kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux \
     --disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img \
     --machine-type=VExpress_EMM \
     --dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb \
     --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

 Traceback (most recent call last):
   File "<string>", line 1, in <module>
   File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in main
     exec filecode in scope
   File "configs/example/fs.py", line 302, in <module>
     test_sys = build_test_system(np)
   File "configs/example/fs.py", line 138, in build_test_system
     Ruby.create_system(options, test_sys, test_sys.iobus,
         test_sys._dma_ports)
   File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825, in __getattr__
     raise AttributeError, err_string
 AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
   (C++ object is not yet constructed, so wrapped C++ methods are
   unavailable.)

 What could be the cause of this?
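
 For context on the mechanism behind the message: fs.py hands
 test_sys._dma_ports to Ruby.create_system, but nothing on this code path has
 assigned _dma_ports on the ARM system object, so SimObject's __getattr__
 fallback raises. A stripped-down illustration in plain Python (not gem5
 code):

     class SimObjectLike(object):
         # __getattr__ only runs when normal attribute lookup fails,
         # mirroring how gem5's SimObject reports never-assigned attributes
         def __getattr__(self, name):
             raise AttributeError("object has no attribute %r" % name)

     sys_obj = SimObjectLike()
     sys_obj.iobus = 'the iobus'    # assigned attributes resolve normally
     print(sys_obj.iobus)
     print(sys_obj._dma_ports)      # raises AttributeError, as in fs.py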

 Thanks,
 Prathap




Re: [gem5-users] Questions on DRAM Controller model

2014-09-09 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

Thanks for your reply. I read your ISPASS paper and got a fair understanding
of the architecture.
I am trying to reproduce, in the simulator environment, the results collected
from running synthetic benchmarks (latency and bandwidth) on real hardware.
However, I see variations in the results and am trying to understand the
reasons.

The experiment has latency (memory non-intensive, random access) as the
primary task and bandwidth (memory intensive, sequential access) as the
co-runner task.
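
To make the two access patterns concrete, here is an illustrative sketch in
plain Python (not the actual benchmarks, which would be written in C with
timers around the loops): the latency test chases a randomized pointer chain,
so each access depends on the previous one and only one miss is outstanding
at a time, while the bandwidth test streams through memory with independent
sequential accesses.

    import random

    def make_chain(n):
        # build a random cyclic permutation: nxt[i] is the index visited
        # after i, so every load depends on the previous one
        order = list(range(n))
        random.shuffle(order)
        nxt = [0] * n
        for i in range(n):
            nxt[order[i]] = order[(i + 1) % n]
        return nxt

    def latency_test(nxt, steps):
        i = 0
        for _ in range(steps):
            i = nxt[i]        # serialized random accesses (latency-bound)
        return i

    def bandwidth_test(buf):
        total = 0
        for x in buf:         # independent sequential accesses (bandwidth-bound)
            total += x
        return total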


On real hardware
case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s

On the simulator
case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s


In case 1, where the latency test runs alone (0 corunner), the results match
in both environments. In case 2, however, with the bandwidth co-runner, the
results differ a lot.
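
Putting numbers on the gap (simple arithmetic on the figures above):

    # co-runner-induced slowdown of the latency test
    real_slowdown = 225.95 / 74.88   # ~3.0x on hardware
    sim_slowdown = 93.69 / 76.08     # ~1.2x in gem5

    # sanity check: the reported b/w matches one 64-byte line per access,
    # e.g. 854.74 MB/s * 74.88 ns ~= 64 bytes
    bytes_per_access = 854.74e6 * 74.88e-9
    print(real_slowdown, sim_slowdown, bytes_per_access)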
Do you have any thoughts about this?
Thanks,
Prathap


Re: [gem5-users] Questions on DRAM Controller model

2014-09-09 Thread Andreas Hansson via gem5-users
Hi Prathap,

There are many possible reasons for the discrepancy, and obviously there are
many ways of building a memory controller :-). Have you configured the model
to look like the actual hardware? The most obvious differences would be in
terms of buffer sizes, the page policy, the arbitration policy, the threshold
before closing a page, the read/write switching, the actual timings, etc. It
is also worth checking whether the controller hardware treats writes the same
way the model does (early responses, minimise switching).

Andreas
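
As a concrete starting point for that configuration, a hedged sketch of those
knobs on the stock controller (parameter names follow the 2014-era DRAMCtrl
model; check src/mem/DRAMCtrl.py in your tree, since names and defaults move
between versions):

    from m5.objects import DDR3_1600_x64

    dram = DDR3_1600_x64()
    dram.read_buffer_size = 32           # read queue entries
    dram.write_buffer_size = 64          # write queue entries
    dram.page_policy = 'open_adaptive'   # or 'open', 'close', 'close_adaptive'
    dram.max_accesses_per_row = 16       # row-hit cap before forcing a precharge
    dram.write_high_thresh_perc = 85     # start draining writes above this fill %
    dram.write_low_thresh_perc = 50      # stop draining writes below this fill %
    dram.min_writes_per_switch = 16      # minimum writes per read-to-write switch
    dram.mem_sched_policy = 'frfcfs'     # FR-FCFS arbitration ('fcfs' also exists)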


[gem5-users] Questions on DRAM Controller model

2014-09-08 Thread Prathap Kolakkampadath via gem5-users
Hello Everybody,

I am using DDR3_1600_x64. I am trying to understand the memory controller
design and have a few doubts about it.

1) Does the memory controller have a separate bank request buffer (read and
write) for each bank, or just a global queue?
2) Is there a per-bank scheduler that arbitrates among its queued requests in
parallel with the other bank schedulers?
3) Is there a DRAM bus scheduler that arbitrates between requests from
different banks?

Thanks,
Prathap

Re: [gem5-users] Questions on DRAM Controller model

2014-09-08 Thread Andreas Hansson via gem5-users
Hi Prathap,

Have you read our ISPASS paper from last year? It’s referenced in the header 
file, as well as on gem5.org.

 1.  Yes and no. Two different buffers are used in the model (one for reads,
one for writes), but they are random access, so you can treat the entries any
way you want.
 2.  Yes and no. It's a C++ model, so the scheduler executes in 0 time. Thus,
when looking at the various requests it effectively sees all the banks (see
the sketch after this list).
 3.  Yes and no. See above.
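
As a sketch of what answer (2) means in practice (illustrative only, assuming
the default FR-FCFS policy; the real logic is in the DRAMCtrl sources): a
zero-time scheduler scans one queue spanning all banks, prefers the oldest
row hit, and otherwise falls back to the oldest request.

    def choose_next(queue, open_rows):
        # queue: oldest-first list of (bank, row) requests;
        # open_rows: currently open row per bank
        for req in queue:                   # oldest row hit wins
            bank, row = req
            if open_rows.get(bank) == row:
                return req
        return queue[0] if queue else None  # otherwise plain FCFS

    reqs = [(0, 5), (1, 7), (0, 3)]
    print(choose_next(reqs, {0: 3, 1: 7}))  # -> (1, 7), the oldest row hit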

Remember that this is a model. The goal is not to be representative down to 
every last element of an RTL design. The goal is to be representative of a real 
design, and then be fast. Both of these goals are delivered upon by the model.

I hope that explains it. If there is anything in the results you do not agree
with, please do say so.

Thanks,

Andreas
