Re: [gem5-users] Questions on DRAM Controller model

2014-10-16 Thread Andreas Hansson via gem5-users
Hi Prathap,

I’ll leave that question for someone with more insight into the various CPU 
models.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Thursday, October 16, 2014 at 8:59 PM
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hello Andreas,

According to the gem5 documentation:

"Load & store buffers (for read and write access) don’t impose any restriction 
on the number of active memory accesses. Therefore, the maximum number of 
outstanding CPU’s memory access requests is not limited by CPU Memory Object 
but by underlying memory system model."

I take this to mean that the number of outstanding memory accesses of a CPU 
depends on the number of L1 MSHRs. Is this correct?


However, as quoted from the paper 
http://web.eecs.umich.edu/~atgutier/papers/ispass_2014.pdf

"gem5’s fetch engine only
allows a single outstanding I-cache access, whereas modern
OoO CPUs are fully pipelined allowing multiple parallel
accesses to instruction cache lines. This specification error in
the fetch stage contributes to the I-cache miss statistic error."

If this is the case, does it limit the number of outstanding read requests? Is 
it possible to generate multiple parallel accesses by setting the 
response_latency of the L1 cache to 0?
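
For reference, both knobs live on the classic cache model; a rough sketch, 
with parameter names assumed from the stock BaseCache/configs/common/Caches.py 
of a 2014-era gem5 tree:

from m5.objects import BaseCache

class L1DCache(BaseCache):
    size = '32kB'
    assoc = 2
    hit_latency = 2
    response_latency = 0    # assumed 0-cycle response path for this experiment
    mshrs = 10              # outstanding misses the L1 can track
    tgts_per_mshr = 20      # targets (requests) coalesced per MSHR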


Kindly let me know what you think.

Thanks,
Prathap




On Tue, Oct 14, 2014 at 4:22 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:
Hello Prathap,

I do not dare say, but perhaps some interaction between your generated access 
sequence and the O3 model (parameters) restricts the number of outstanding L1 
misses? There are plenty of debug flags to help in drilling down on this issue. 
Have a look in src/cpu/o3/SConscript for the O3-related debug flags and 
src/mem/cache/SConscript for the cache flags.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Tuesday, October 14, 2014 at 9:21 PM

To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hello Andreas

Whenever I switch to the O3 CPU from a checkpoint, I can see from config.ini 
that the CPU is switched, but mem_mode is still set to atomic. However, when 
booting on the O3 CPU directly (without restoring from a checkpoint), mem_mode 
is set to timing. I am not sure why. In any case, I could run my tests on the 
O3 CPU with mem_mode set to timing (as verified from config.ini).
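
For reference, one way to pin this down is to force the mode on the test 
system before instantiation; a sketch only, assuming a 2014-era 
configs/example/fs.py where test_sys is the simulated System (the 
--restore-with-cpu option, if present in your tree, is the command-line 
equivalent of restoring directly onto the detailed CPU):

# Sketch: make the restored run come up in timing mode on the detailed CPU
# instead of switching over from an atomic restore CPU.
test_sys.mem_mode = 'timing'           # System.mem_mode ('atomic' or 'timing')
print 'mem_mode =', test_sys.mem_mode  # Python 2, as used by gem5 at the time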

I run a memory-intensive test, which generates a cache miss on every read, in 
parallel with a pointer-chasing test (one outstanding request at a time), and 
both CPUs share the same DRAM bank. In my setup the L1 has 10 MSHRs, so the 
memory-intensive test can generate up to 10 outstanding requests at a time. 
Since the CPU is much faster than the DRAM controller and all the requests 
target the same bank, I expect the DRAM read queue to hold close to 10 entries 
whenever a request from the pointer-chasing test arrives. If this assumption 
is correct, I should see interference in the model comparable to what I see on 
real platforms.

Don't you think the DRAM queue would fill up to the number of L1 MSHRs in the 
above scenario? And what would it take to fill the DRAM queue up to the number 
of L1 MSHRs?

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas

On Tue, Oct 14, 2014 at 2:30 AM, Andreas Hansson <andreas.hans...@arm.com> wrote:
Hi Prathap,

The O3 CPU only works with the memory system in timing mode, so I do not 
understand what two points you are comparing when you say the results are 
exactly the same.

The read queue is likely to never fill up unless all these transactions are 
generated at once. While the first one is being served by the memory controller 
you may have more coming in etc, but I do not understand why you think it would 
ever fill up.

For “debugging” make sure that the config.ini actually captures what you think 
you are simulating. Also, you have a lot of DRAM-related stats in the stats.txt 
output.
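
A quick way to do both checks is to pull the relevant lines out of m5out; a 
rough helper, with stat names (avgRdQLen, rdQLenPdf, readReqs, bytesReadDRAM) 
assumed from the classic DRAM controller of that era:

import re

# Memory mode actually recorded in the generated configuration
for line in open('m5out/config.ini'):
    if 'mem_mode' in line:
        print(line.strip())

# DRAM read-queue occupancy and request counts from the stats output
for line in open('m5out/stats.txt'):
    if re.search(r'(avgRdQLen|rdQLenPdf|readReqs|bytesReadDRAM)', line):
        print(line.strip())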

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Tuesday, 14 October 2014 04:33

To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hi Andreas, users

I ran the test with the ARM O3 CPU (--cpu-type=detailed) and mem_mode=timing; 
the results are exactly the same as with mem_mode=atomic.
I have partitioned the DRAM banks in software. The two benchmarks, 
latency-sensitive and bandwidth-sensitive (both generate only reads), run 
in parallel using 

Re: [gem5-users] Questions on DRAM Controller model

2014-10-16 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

According to the gem5 documentation:

"Load & store buffers (for read and write access) don’t impose any
restriction on the number of active memory accesses. Therefore, the maximum
number of outstanding CPU’s memory access requests is not limited by CPU
Memory Object but by underlying memory system model."

I take this to mean that the number of outstanding memory accesses of a CPU
depends on the number of L1 MSHRs. Is this correct?


However, as quoted from the paper
http://web.eecs.umich.edu/~atgutier/papers/ispass_2014.pdf

"gem5’s fetch engine only
allows a single outstanding I-cache access, whereas modern
OoO CPUs are fully pipelined allowing multiple parallel
accesses to instruction cache lines. This specification error in
the fetch stage contributes to the I-cache miss statistic error."

If this is the case, does it limit the number of outstanding read
requests? Is it possible to generate multiple parallel accesses by setting
the response_latency of the L1 cache to 0?


Kindly let me know what you think.

Thanks,
Prathap




On Tue, Oct 14, 2014 at 4:22 PM, Andreas Hansson 
wrote:

>  Hello Prathap,
>
>  I do not dare say, but perhaps some interaction between your generated
> access sequence and the O3 model (parameters) restrict the number of
> outstanding L1 misses? There are plenty debug flags to help in drilling
> down on this issue. Have a look in src/cpu/o3/Sconscript for the O3 related
> debug flags and src/mem/cache/Sconscript for the cache flags.
>
>  Andreas
>
>   From: Prathap Kolakkampadath 
> Date: Tuesday, October 14, 2014 at 9:21 PM
>
> To: Andreas Hansson 
> Cc: gem5 users mailing list 
> Subject: Re: [gem5-users] Questions on DRAM Controller model
>
>   Hello Andreas
>
>  Whenever i switch to O3 Cpu from a checkpoint, i could see from
> config.ini that CPU is getting switched but the mem_mode is still set to
> atomic. However when booting in O3 CPU itself(without restoring from a
> checkpoint) the mem_mode is set to timing. Not sure why. Anyhow i could run
> my tests on O3 CPU with mem_mode timing(as verified from config.ini)
>
>  When i run one memory-intensive tests, which generates cache miss on
> every read, in parallel with a pointer chasing test(one outstanding request
> at a time) and both the cpu's share the same bank of DRAM Controller. In my
> setup, as # of L1 MSHRs are 10, memory-intensive test can generate up to 10
> Outstanding requests at a time. Since CPU speed is much faster than DRAM
> controller, can generate outstanding requests and all the requests are
> targeted to same bank, i expect to see the DRAM queue size to be 10 all the
> time when there is a request coming from pointer chasing test. If this
> assumption is correct i could see a better interference in model as i could
> see in real platforms.
>
>  Don't you think DRAM queue size would get  filled up to the size of
> number of L1 MSHRs according to above scenario. And what could be the case
> in order to fill the DRAM up to the size of # of L1 MSHRs.
>
>  Thanks,
>  Prathap Kumar Valsan
>  Research Assistant
>  University of Kansas
>
> On Tue, Oct 14, 2014 at 2:30 AM, Andreas Hansson 
> wrote:
>
>>  Hi Prathap,
>>
>>  The O3 CPU only works with the memory system in timing mode, so I do
>> not understand what two points you are comparing when you say the results
>> are exactly the same.
>>
>>  The read queue is likely to never fill up unless all these transactions
>> are generated at once. While the first one is being served by the memory
>> controller you may have more coming in etc, but I do not understand why you
>> think it would ever fill up.
>>
>>  For “debugging” make sure that the config.ini actually captures what
>> you think you are simulating. Also, you have a lot of DRAM-related stats in
>> the stats.txt output.
>>
>>  Andreas
>>
>>   From: Prathap Kolakkampadath 
>> Date: Tuesday, 14 October 2014 04:33
>>
>> To: Andreas Hansson 
>> Cc: gem5 users mailing list 
>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>
>>Hi Andreas, users
>>
>>  I ran the test with ARM O3 cpu(--cpu-type=detailed) , mem_mode=timing,
>> the results are exactly the same compared to mem_mode=atomic.
>>  I have partitioned the DRAM banks using software. Both the benchmarks-
>> latency-sensitive and bandwidth -sensitive (both generates only reads)
>> running in parallel using the same DRAM bank.
>> From status file, i observe expected number L2 misses and DRAM requests
>> are getting generated.
>> In my system, the number of L1 MSHRs are 10 and number of L2 MSHR's are
>> 32. So i expect that when a request f

Re: [gem5-users] Questions on DRAM Controller model

2014-10-15 Thread Prathap Kolakkampadath via gem5-users
Thanks Andreas.


On Tue, Oct 14, 2014 at 4:22 PM, Andreas Hansson 
wrote:

>  Hello Prathap,
>
>  I do not dare say, but perhaps some interaction between your generated
> access sequence and the O3 model (parameters) restrict the number of
> outstanding L1 misses? There are plenty debug flags to help in drilling
> down on this issue. Have a look in src/cpu/o3/Sconscript for the O3 related
> debug flags and src/mem/cache/Sconscript for the cache flags.
>
>  Andreas
>
>   From: Prathap Kolakkampadath 
> Date: Tuesday, October 14, 2014 at 9:21 PM
>
> To: Andreas Hansson 
> Cc: gem5 users mailing list 
> Subject: Re: [gem5-users] Questions on DRAM Controller model
>
>   Hello Andreas
>
>  Whenever i switch to O3 Cpu from a checkpoint, i could see from
> config.ini that CPU is getting switched but the mem_mode is still set to
> atomic. However when booting in O3 CPU itself(without restoring from a
> checkpoint) the mem_mode is set to timing. Not sure why. Anyhow i could run
> my tests on O3 CPU with mem_mode timing(as verified from config.ini)
>
>  When i run one memory-intensive tests, which generates cache miss on
> every read, in parallel with a pointer chasing test(one outstanding request
> at a time) and both the cpu's share the same bank of DRAM Controller. In my
> setup, as # of L1 MSHRs are 10, memory-intensive test can generate up to 10
> Outstanding requests at a time. Since CPU speed is much faster than DRAM
> controller, can generate outstanding requests and all the requests are
> targeted to same bank, i expect to see the DRAM queue size to be 10 all the
> time when there is a request coming from pointer chasing test. If this
> assumption is correct i could see a better interference in model as i could
> see in real platforms.
>
>  Don't you think DRAM queue size would get  filled up to the size of
> number of L1 MSHRs according to above scenario. And what could be the case
> in order to fill the DRAM up to the size of # of L1 MSHRs.
>
>  Thanks,
>  Prathap Kumar Valsan
>  Research Assistant
>  University of Kansas
>
> On Tue, Oct 14, 2014 at 2:30 AM, Andreas Hansson 
> wrote:
>
>>  Hi Prathap,
>>
>>  The O3 CPU only works with the memory system in timing mode, so I do
>> not understand what two points you are comparing when you say the results
>> are exactly the same.
>>
>>  The read queue is likely to never fill up unless all these transactions
>> are generated at once. While the first one is being served by the memory
>> controller you may have more coming in etc, but I do not understand why you
>> think it would ever fill up.
>>
>>  For “debugging” make sure that the config.ini actually captures what
>> you think you are simulating. Also, you have a lot of DRAM-related stats in
>> the stats.txt output.
>>
>>  Andreas
>>
>>   From: Prathap Kolakkampadath 
>> Date: Tuesday, 14 October 2014 04:33
>>
>> To: Andreas Hansson 
>> Cc: gem5 users mailing list 
>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>
>>Hi Andreas, users
>>
>>  I ran the test with ARM O3 cpu(--cpu-type=detailed) , mem_mode=timing,
>> the results are exactly the same compared to mem_mode=atomic.
>>  I have partitioned the DRAM banks using software. Both the benchmarks-
>> latency-sensitive and bandwidth -sensitive (both generates only reads)
>> running in parallel using the same DRAM bank.
>> From status file, i observe expected number L2 misses and DRAM requests
>> are getting generated.
>> In my system, the number of L1 MSHRs are 10 and number of L2 MSHR's are
>> 32. So i expect that when a request from a latency-sensitive benchmark
>> comes to DRAM, the readQ size has to be 10. However what i am observing is
>> most of the time the Queue is not getting filled and hence there is less
>> queueing latency and interference.
>>
>>  I am using classic memory system with default DRAM
>> controller,DDR3_1600_x64. Addressing map is RoRaBaChCo, page
>> policy-open_adaptive, and frfcfs scheduler.
>>
>>  Do you have any thoughts on this? How could i debug this further?
>>
>>  Appreciate your help.
>>
>>  Thanks,
>>  Prathap Kumar Valsan
>>  Research Assistant
>>  University of Kansas
>>
>> On Mon, Oct 13, 2014 at 4:21 AM, Andreas Hansson > > wrote:
>>
>>>  Hi Prathap,
>>>
>>>  Indeed. The atomic mode is for fast-forwarding only. Once you actually
>>> want to get some representative performance numbers you have to run in
>>> timing mode with either the O3 or Minor CPU model.
>>>

Re: [gem5-users] Questions on DRAM Controller model

2014-10-14 Thread Andreas Hansson via gem5-users
Hello Prathap,

I do not dare say, but perhaps some interaction between your generated access 
sequence and the O3 model (parameters) restricts the number of outstanding L1 
misses? There are plenty of debug flags to help in drilling down on this issue. 
Have a look in src/cpu/o3/SConscript for the O3-related debug flags and 
src/mem/cache/SConscript for the cache flags.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Tuesday, October 14, 2014 at 9:21 PM
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hello Andreas

Whenever I switch to the O3 CPU from a checkpoint, I can see from config.ini 
that the CPU is switched, but mem_mode is still set to atomic. However, when 
booting on the O3 CPU directly (without restoring from a checkpoint), mem_mode 
is set to timing. I am not sure why. In any case, I could run my tests on the 
O3 CPU with mem_mode set to timing (as verified from config.ini).

I run a memory-intensive test, which generates a cache miss on every read, in 
parallel with a pointer-chasing test (one outstanding request at a time), and 
both CPUs share the same DRAM bank. In my setup the L1 has 10 MSHRs, so the 
memory-intensive test can generate up to 10 outstanding requests at a time. 
Since the CPU is much faster than the DRAM controller and all the requests 
target the same bank, I expect the DRAM read queue to hold close to 10 entries 
whenever a request from the pointer-chasing test arrives. If this assumption 
is correct, I should see interference in the model comparable to what I see on 
real platforms.

Don't you think the DRAM queue would fill up to the number of L1 MSHRs in the 
above scenario? And what would it take to fill the DRAM queue up to the number 
of L1 MSHRs?

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas

On Tue, Oct 14, 2014 at 2:30 AM, Andreas Hansson <andreas.hans...@arm.com> wrote:
Hi Prathap,

The O3 CPU only works with the memory system in timing mode, so I do not 
understand what two points you are comparing when you say the results are 
exactly the same.

The read queue is likely to never fill up unless all these transactions are 
generated at once. While the first one is being served by the memory controller 
you may have more coming in etc, but I do not understand why you think it would 
ever fill up.

For “debugging” make sure that the config.ini actually captures what you think 
you are simulating. Also, you have a lot of DRAM-related stats in the stats.txt 
output.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Tuesday, 14 October 2014 04:33

To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hi Andreas, users

I ran the test with the ARM O3 CPU (--cpu-type=detailed) and mem_mode=timing; 
the results are exactly the same as with mem_mode=atomic.
I have partitioned the DRAM banks in software. The two benchmarks, 
latency-sensitive and bandwidth-sensitive (both generate only reads), run in 
parallel using the same DRAM bank.
From the stats file, I observe that the expected numbers of L2 misses and DRAM 
requests are generated.
In my system, the number of L1 MSHRs is 10 and the number of L2 MSHRs is 32. 
So I expect that when a request from the latency-sensitive benchmark reaches 
DRAM, the read queue should hold about 10 entries. However, what I observe is 
that most of the time the queue does not fill up, and hence there is little 
queueing latency and interference.

I am using the classic memory system with the default DRAM controller, 
DDR3_1600_x64. The address mapping is RoRaBaChCo, the page policy is 
open_adaptive, and the scheduler is frfcfs.

Do you have any thoughts on this? How could I debug this further?

Appreciate your help.

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas

On Mon, Oct 13, 2014 at 4:21 AM, Andreas Hansson <andreas.hans...@arm.com> wrote:
Hi Prathap,

Indeed. The atomic mode is for fast-forwarding only. Once you actually want to 
get some representative performance numbers you have to run in timing mode with 
either the O3 or Minor CPU model.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Monday, 13 October 2014 10:19

To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model


Thanks for your reply. The memory mode I used is atomic. I think I need to run 
the tests in timing mode, which I believe will show interference and queueing 
delays similar to real platforms.

Prathap

On Oct 13, 2014 2:55 AM, "Andreas Hansson" <andreas.hans...@arm.com> wrote:

Re: [gem5-users] Questions on DRAM Controller model

2014-10-14 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas

Whenever I switch to the O3 CPU from a checkpoint, I can see from config.ini
that the CPU is switched, but mem_mode is still set to atomic. However, when
booting on the O3 CPU directly (without restoring from a checkpoint), mem_mode
is set to timing. I am not sure why. In any case, I could run my tests on the
O3 CPU with mem_mode set to timing (as verified from config.ini).

I run a memory-intensive test, which generates a cache miss on every read, in
parallel with a pointer-chasing test (one outstanding request at a time), and
both CPUs share the same DRAM bank. In my setup the L1 has 10 MSHRs, so the
memory-intensive test can generate up to 10 outstanding requests at a time.
Since the CPU is much faster than the DRAM controller and all the requests
target the same bank, I expect the DRAM read queue to hold close to 10 entries
whenever a request from the pointer-chasing test arrives. If this assumption
is correct, I should see interference in the model comparable to what I see on
real platforms.

Don't you think the DRAM queue would fill up to the number of L1 MSHRs in the
above scenario? And what would it take to fill the DRAM queue up to the number
of L1 MSHRs?

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas

On Tue, Oct 14, 2014 at 2:30 AM, Andreas Hansson 
wrote:

>  Hi Prathap,
>
>  The O3 CPU only works with the memory system in timing mode, so I do not
> understand what two points you are comparing when you say the results are
> exactly the same.
>
>  The read queue is likely to never fill up unless all these transactions
> are generated at once. While the first one is being served by the memory
> controller you may have more coming in etc, but I do not understand why you
> think it would ever fill up.
>
>  For “debugging” make sure that the config.ini actually captures what you
> think you are simulating. Also, you have a lot of DRAM-related stats in the
> stats.txt output.
>
>  Andreas
>
>   From: Prathap Kolakkampadath 
> Date: Tuesday, 14 October 2014 04:33
>
> To: Andreas Hansson 
> Cc: gem5 users mailing list 
> Subject: Re: [gem5-users] Questions on DRAM Controller model
>
>Hi Andreas, users
>
>  I ran the test with ARM O3 cpu(--cpu-type=detailed) , mem_mode=timing,
> the results are exactly the same compared to mem_mode=atomic.
>  I have partitioned the DRAM banks using software. Both the benchmarks-
> latency-sensitive and bandwidth -sensitive (both generates only reads)
> running in parallel using the same DRAM bank.
> From status file, i observe expected number L2 misses and DRAM requests
> are getting generated.
> In my system, the number of L1 MSHRs are 10 and number of L2 MSHR's are
> 32. So i expect that when a request from a latency-sensitive benchmark
> comes to DRAM, the readQ size has to be 10. However what i am observing is
> most of the time the Queue is not getting filled and hence there is less
> queueing latency and interference.
>
>  I am using classic memory system with default DRAM
> controller,DDR3_1600_x64. Addressing map is RoRaBaChCo, page
> policy-open_adaptive, and frfcfs scheduler.
>
>  Do you have any thoughts on this? How could i debug this further?
>
>  Appreciate your help.
>
>  Thanks,
>  Prathap Kumar Valsan
>  Research Assistant
>  University of Kansas
>
> On Mon, Oct 13, 2014 at 4:21 AM, Andreas Hansson 
> wrote:
>
>>  Hi Prathap,
>>
>>  Indeed. The atomic mode is for fast-forwarding only. Once you actually
>> want to get some representative performance numbers you have to run in
>> timing mode with either the O3 or Minor CPU model.
>>
>>  Andreas
>>
>>   From: Prathap Kolakkampadath 
>> Date: Monday, 13 October 2014 10:19
>>
>> To: Andreas Hansson 
>> Cc: gem5 users mailing list 
>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>
>>  Thanks for your reply. The memory mode which I used is atomic. I think,
>> I need to run the tests in timing More. I believe which shows up
>> interference and queueing delay similar to real platforms.
>>
>> Prathap
>> On Oct 13, 2014 2:55 AM, "Andreas Hansson" 
>> wrote:
>>
>>>  Hi Prathap,
>>>
>>>  I don’t dare say exactly what is going wrong in your setup, but I am
>>> confident that Ruby will not magically make things more representative (it
>>> will likely give you a whole lot more problems though). In the end it is
>>> all about configuring the building blocks to match the system you want to
>>> capture. The crossbars and caches in the classic memory system do make some
>>> simplifications, but I have not ye

Re: [gem5-users] Questions on DRAM Controller model

2014-10-14 Thread Andreas Hansson via gem5-users
Hi Prathap,

The O3 CPU only works with the memory system in timing mode, so I do not 
understand what two points you are comparing when you say the results are 
exactly the same.

The read queue is likely to never fill up unless all these transactions are 
generated at once. While the first one is being served by the memory controller 
you may have more coming in etc, but I do not understand why you think it would 
ever fill up.

For “debugging” make sure that the config.ini actually captures what you think 
you are simulating. Also, you have a lot of DRAM-related stats in the stats.txt 
output.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Tuesday, 14 October 2014 04:33
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hi Andreas, users

I ran the test with the ARM O3 CPU (--cpu-type=detailed) and mem_mode=timing; 
the results are exactly the same as with mem_mode=atomic.
I have partitioned the DRAM banks in software. The two benchmarks, 
latency-sensitive and bandwidth-sensitive (both generate only reads), run in 
parallel using the same DRAM bank.
From the stats file, I observe that the expected numbers of L2 misses and DRAM 
requests are generated.
In my system, the number of L1 MSHRs is 10 and the number of L2 MSHRs is 32. 
So I expect that when a request from the latency-sensitive benchmark reaches 
DRAM, the read queue should hold about 10 entries. However, what I observe is 
that most of the time the queue does not fill up, and hence there is little 
queueing latency and interference.

I am using the classic memory system with the default DRAM controller, 
DDR3_1600_x64. The address mapping is RoRaBaChCo, the page policy is 
open_adaptive, and the scheduler is frfcfs.

Do you have any thoughts on this? How could I debug this further?

Appreciate your help.

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas

On Mon, Oct 13, 2014 at 4:21 AM, Andreas Hansson <andreas.hans...@arm.com> wrote:
Hi Prathap,

Indeed. The atomic mode is for fast-forwarding only. Once you actually want to 
get some representative performance numbers you have to run in timing mode with 
either the O3 or Minor CPU model.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Monday, 13 October 2014 10:19

To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model


Thanks for your reply. The memory mode I used is atomic. I think I need to run 
the tests in timing mode, which I believe will show interference and queueing 
delays similar to real platforms.

Prathap

On Oct 13, 2014 2:55 AM, "Andreas Hansson" <andreas.hans...@arm.com> wrote:
Hi Prathap,

I don’t dare say exactly what is going wrong in your setup, but I am confident 
that Ruby will not magically make things more representative (it will likely 
give you a whole lot more problems though). In the end it is all about 
configuring the building blocks to match the system you want to capture. The 
crossbars and caches in the classic memory system do make some simplifications, 
but I have not yet seen a case when they are not sufficiently accurate.

Have you looked at the various policy settings in the DRAM controller, e.g. the 
page policy and address mapping? If you’re trying to correlate with a real 
platform, also see Anthony’s ISPASS paper from last year for some sensible 
steps in simplifying the problem and dividing it into manageable chunks.

Good luck.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Monday, 13 October 2014 00:29
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hello Andreas/Users,

My workflow is to create a checkpoint at the end of Linux boot using the 
Atomic Simple CPU and then restore from this checkpoint onto the detailed O3 
CPU before running the test. I notice that the mem_mode is set to atomic and 
not timing. Could that be the reason for the low memory-bus contention I am 
observing?

Thanks,
Prathap

On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath <kvprat...@gmail.com> wrote:
Hello Andreas,

Even after configuring the model to match the actual hardware, I am still not 
seeing enough interference on the read request under consideration. I am using 
the classic memory system model. Since it uses the atomic and functional 
packet allocation protocol, I would like to switch to Ruby (I think it more 
closely resembles a real platform).


I am hitting the problem below when I use Ruby.

/build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches 
--l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4 
--mem-size=512MB 
--

Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Prathap Kolakkampadath via gem5-users
Hi Andreas, users

I ran the test with the ARM O3 CPU (--cpu-type=detailed) and mem_mode=timing;
the results are exactly the same as with mem_mode=atomic.
I have partitioned the DRAM banks in software. The two benchmarks,
latency-sensitive and bandwidth-sensitive (both generate only reads), run in
parallel using the same DRAM bank.
From the stats file, I observe that the expected numbers of L2 misses and DRAM
requests are generated.
In my system, the number of L1 MSHRs is 10 and the number of L2 MSHRs is 32.
So I expect that when a request from the latency-sensitive benchmark reaches
DRAM, the read queue should hold about 10 entries. However, what I observe is
that most of the time the queue does not fill up, and hence there is little
queueing latency and interference.

I am using the classic memory system with the default DRAM controller,
DDR3_1600_x64. The address mapping is RoRaBaChCo, the page policy is
open_adaptive, and the scheduler is frfcfs.
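
For reference, the configuration being described corresponds roughly to the
following controller parameters; names follow the classic DRAMCtrl and
DDR3_1600_x64 SimObjects of a 2014-era gem5 (src/mem/DRAMCtrl.py) and may
differ in other versions:

from m5.objects import DDR3_1600_x64

ctrl = DDR3_1600_x64()
ctrl.addr_mapping = 'RoRaBaChCo'      # row / rank / bank / channel / column
ctrl.page_policy = 'open_adaptive'    # keep rows open, close adaptively
ctrl.mem_sched_policy = 'frfcfs'      # first-ready, first-come first-served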

Do you have any thoughts on this? How could I debug this further?

Appreciate your help.

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas

On Mon, Oct 13, 2014 at 4:21 AM, Andreas Hansson 
wrote:

>  Hi Prathap,
>
>  Indeed. The atomic mode is for fast-forwarding only. Once you actually
> want to get some representative performance numbers you have to run in
> timing mode with either the O3 or Minor CPU model.
>
>  Andreas
>
>   From: Prathap Kolakkampadath 
> Date: Monday, 13 October 2014 10:19
>
> To: Andreas Hansson 
> Cc: gem5 users mailing list 
> Subject: Re: [gem5-users] Questions on DRAM Controller model
>
>  Thanks for your reply. The memory mode which I used is atomic. I think,
> I need to run the tests in timing More. I believe which shows up
> interference and queueing delay similar to real platforms.
>
> Prathap
> On Oct 13, 2014 2:55 AM, "Andreas Hansson" 
> wrote:
>
>>  Hi Prathap,
>>
>>  I don’t dare say exactly what is going wrong in your setup, but I am
>> confident that Ruby will not magically make things more representative (it
>> will likely give you a whole lot more problems though). In the end it is
>> all about configuring the building blocks to match the system you want to
>> capture. The crossbars and caches in the classic memory system do make some
>> simplifications, but I have not yet seen a case when they are not
>> sufficiently accurate.
>>
>>  Have you looked at the various policy settings in the DRAM controller,
>> e.g. the page policy and address mapping? If you’re trying to correlate
>> with a real platform, also see Anthony’s ISPASS paper from last year for
>> some sensible steps in simplifying the problem and dividing it into
>> manageable chunks.
>>
>>  Good luck.
>>
>>  Andreas
>>
>>   From: Prathap Kolakkampadath 
>> Date: Monday, 13 October 2014 00:29
>> To: Andreas Hansson 
>> Cc: gem5 users mailing list 
>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>
>>   Hello Andreas/Users,
>>
>> I used to create a checkpoint until linux boot using Atomic Simple CPU
>> and then restore from this checkpoint to detailed O3 cpu before running the
>> test. I notice that the mem-mode is  set to atomic and not timing. Will
>> that be the reason for less contention in memory bus i am observing?
>>
>>  Thanks,
>>  Prathap
>>
>> On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath <
>> kvprat...@gmail.com> wrote:
>>
>>>  Hello Andreas,
>>>
>>>  Even after configuring the model like the actual hardware, i still not
>>> seeing enough interference to the read request under consideration. I am
>>> using the classic memory system model. Since it uses atomic and functional
>>> Packet allocation protocol, I would like to switch to Ruby( I think it
>>> more resembles with real platform).
>>>
>>>
>>>  I am hitting in to below problem when i use ruby.
>>>
>>> /build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py
>>> --caches --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB
>>> --num-cpus=4 --mem-size=512MB
>>> --kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
>>> --disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
>>> --machine-type=VExpress_EMM
>>> --dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
>>> --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64
>>>
>>> Traceback (most recent call last):
>>>   File "", line 1, in 
>>>   File "/home/prathap/WorkSpace/gem5/src/python/

Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Andreas Hansson via gem5-users
Hi Prathap,

Indeed. The atomic mode is for fast-forwarding only. Once you actually want to 
get some representative performance numbers you have to run in timing mode with 
either the O3 or Minor CPU model.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Monday, 13 October 2014 10:19
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model


Thanks for your reply. The memory mode I used is atomic. I think I need to run 
the tests in timing mode, which I believe will show interference and queueing 
delays similar to real platforms.

Prathap

On Oct 13, 2014 2:55 AM, "Andreas Hansson" <andreas.hans...@arm.com> wrote:
Hi Prathap,

I don’t dare say exactly what is going wrong in your setup, but I am confident 
that Ruby will not magically make things more representative (it will likely 
give you a whole lot more problems though). In the end it is all about 
configuring the building blocks to match the system you want to capture. The 
crossbars and caches in the classic memory system do make some simplifications, 
but I have not yet seen a case when they are not sufficiently accurate.

Have you looked at the various policy settings in the DRAM controller, e.g. the 
page policy and address mapping? If you’re trying to correlate with a real 
platform, also see Anthony’s ISPASS paper from last year for some sensible 
steps in simplifying the problem and dividing it into manageable chunks.

Good luck.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Monday, 13 October 2014 00:29
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hello Andreas/Users,

My workflow is to create a checkpoint at the end of Linux boot using the 
Atomic Simple CPU and then restore from this checkpoint onto the detailed O3 
CPU before running the test. I notice that the mem_mode is set to atomic and 
not timing. Could that be the reason for the low memory-bus contention I am 
observing?

Thanks,
Prathap

On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath <kvprat...@gmail.com> wrote:
Hello Andreas,

Even after configuring the model to match the actual hardware, I am still not 
seeing enough interference on the read request under consideration. I am using 
the classic memory system model. Since it uses the atomic and functional 
packet allocation protocol, I would like to switch to Ruby (I think it more 
closely resembles a real platform).


I am hitting the problem below when I use Ruby.

/build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches 
--l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4 
--mem-size=512MB 
--kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux 
--disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
 --machine-type=VExpress_EMM 
--dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
 --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

Traceback (most recent call last):
  File "", line 1, in 
  File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in main
exec filecode in scope
  File "configs/example/fs.py", line 302, in 
test_sys = build_test_system(np)
  File "configs/example/fs.py", line 138, in build_test_system
Ruby.create_system(options, test_sys, test_sys.iobus, test_sys._dma_ports)
  File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825, in 
__getattr__
raise AttributeError, err_string
AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
  (C++ object is not yet constructed, so wrapped C++ methods are unavailable.)

What could be the cause of this?

Thanks,
Prathap



On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:
Hi Prathap,

There are many possible reasons for the discrepancy, and obviously there are 
many ways of building a memory controller :-). Have you configured the model to 
look like the actual hardware? The most obvious differences would be in terms 
of buffer sizes, the page policy, arbitration policy, the threshold before 
closing a page, the read/write switching, actual timings etc. It is also worth 
checking if the controller hardware treats writes the same way the model does 
(early responses, minimise switching).

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Tuesday, 9 September 2014 18:56
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hello Andreas,

Thanks for your repl

Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Prathap Kolakkampadath via gem5-users
Thanks for your reply. The memory mode I used is atomic. I think I need to
run the tests in timing mode, which I believe will show interference and
queueing delays similar to real platforms.

Prathap
On Oct 13, 2014 2:55 AM, "Andreas Hansson"  wrote:

>  Hi Prathap,
>
>  I don’t dare say exactly what is going wrong in your setup, but I am
> confident that Ruby will not magically make things more representative (it
> will likely give you a whole lot more problems though). In the end it is
> all about configuring the building blocks to match the system you want to
> capture. The crossbars and caches in the classic memory system do make some
> simplifications, but I have not yet seen a case when they are not
> sufficiently accurate.
>
>  Have you looked at the various policy settings in the DRAM controller,
> e.g. the page policy and address mapping? If you’re trying to correlate
> with a real platform, also see Anthony’s ISPASS paper from last year for
> some sensible steps in simplifying the problem and dividing it into
> manageable chunks.
>
>  Good luck.
>
>  Andreas
>
>   From: Prathap Kolakkampadath 
> Date: Monday, 13 October 2014 00:29
> To: Andreas Hansson 
> Cc: gem5 users mailing list 
> Subject: Re: [gem5-users] Questions on DRAM Controller model
>
>   Hello Andreas/Users,
>
> I used to create a checkpoint until linux boot using Atomic Simple CPU and
> then restore from this checkpoint to detailed O3 cpu before running the
> test. I notice that the mem-mode is  set to atomic and not timing. Will
> that be the reason for less contention in memory bus i am observing?
>
>  Thanks,
>  Prathap
>
> On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath <
> kvprat...@gmail.com> wrote:
>
>>  Hello Andreas,
>>
>>  Even after configuring the model like the actual hardware, i still not
>> seeing enough interference to the read request under consideration. I am
>> using the classic memory system model. Since it uses atomic and functional
>> Packet allocation protocol, I would like to switch to Ruby( I think it
>> more resembles with real platform).
>>
>>
>>  I am hitting in to below problem when i use ruby.
>>
>> /build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches
>> --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4
>> --mem-size=512MB
>> --kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
>> --disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
>> --machine-type=VExpress_EMM
>> --dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
>> --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64
>>
>> Traceback (most recent call last):
>>   File "", line 1, in 
>>   File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in
>> main
>> exec filecode in scope
>>   File "configs/example/fs.py", line 302, in 
>> test_sys = build_test_system(np)
>>   File "configs/example/fs.py", line 138, in build_test_system
>> Ruby.create_system(options, test_sys, test_sys.iobus,
>> test_sys._dma_ports)
>>   File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line
>> 825, in __getattr__
>> raise AttributeError, err_string
>> AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
>>   (C++ object is not yet constructed, so wrapped C++ methods are
>> unavailable.)
>>
>>  What could be the cause of this?
>>
>>  Thanks,
>> Prathap
>>
>>
>>
>> On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson 
>> wrote:
>>
>>>  Hi Prathap,
>>>
>>>  There are many possible reasons for the discrepancy, and obviously
>>> there are many ways of building a memory controller :-). Have you
>>> configured the model to look like the actual hardware? The most obvious
>>> differences would be in terms of buffer sizes, the page policy, arbitration
>>> policy, the threshold before closing a page, the read/write switching,
>>> actual timings etc. It is also worth checking if the controller hardware
>>> treats writes the same way the model does (early responses, minimise
>>> switching).
>>>
>>>  Andreas
>>>
>>>   From: Prathap Kolakkampadath 
>>> Date: Tuesday, 9 September 2014 18:56
>>> To: Andreas Hansson 
>>> Cc: gem5 users mailing list 
>>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>>

Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Andreas Hansson via gem5-users
Hi Prathap,

I don’t dare say exactly what is going wrong in your setup, but I am confident 
that Ruby will not magically make things more representative (it will likely 
give you a whole lot more problems though). In the end it is all about 
configuring the building blocks to match the system you want to capture. The 
crossbars and caches in the classic memory system do make some simplifications, 
but I have not yet seen a case when they are not sufficiently accurate.

Have you looked at the various policy settings in the DRAM controller, e.g. the 
page policy and address mapping? If you’re trying to correlate with a real 
platform, also see Anthony’s ISPASS paper from last year for some sensible 
steps in simplifying the problem and dividing it into manageable chunks.

Good luck.

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Monday, 13 October 2014 00:29
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hello Andreas/Users,

My workflow is to create a checkpoint at the end of Linux boot using the 
Atomic Simple CPU and then restore from this checkpoint onto the detailed O3 
CPU before running the test. I notice that the mem_mode is set to atomic and 
not timing. Could that be the reason for the low memory-bus contention I am 
observing?

Thanks,
Prathap

On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath <kvprat...@gmail.com> wrote:
Hello Andreas,

Even after configuring the model to match the actual hardware, I am still not 
seeing enough interference on the read request under consideration. I am using 
the classic memory system model. Since it uses the atomic and functional 
packet allocation protocol, I would like to switch to Ruby (I think it more 
closely resembles a real platform).


I am hitting the problem below when I use Ruby.

/build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches 
--l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4 
--mem-size=512MB 
--kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux 
--disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
 --machine-type=VExpress_EMM 
--dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
 --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

Traceback (most recent call last):
  File "", line 1, in 
  File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in main
exec filecode in scope
  File "configs/example/fs.py", line 302, in 
test_sys = build_test_system(np)
  File "configs/example/fs.py", line 138, in build_test_system
Ruby.create_system(options, test_sys, test_sys.iobus, test_sys._dma_ports)
  File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825, in 
__getattr__
raise AttributeError, err_string
AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
  (C++ object is not yet constructed, so wrapped C++ methods are unavailable.)

What could be the cause of this?

Thanks,
Prathap



On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:
Hi Prathap,

There are many possible reasons for the discrepancy, and obviously there are 
many ways of building a memory controller :-). Have you configured the model to 
look like the actual hardware? The most obvious differences would be in terms 
of buffer sizes, the page policy, arbitration policy, the threshold before 
closing a page, the read/write switching, actual timings etc. It is also worth 
checking if the controller hardware treats writes the same way the model does 
(early responses, minimise switching).

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Tuesday, 9 September 2014 18:56
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hello Andreas,

Thanks for your reply. I read your ISPASS paper and got a fair understanding 
of the architecture.
I am trying to reproduce, in the simulator, results collected from running 
synthetic benchmarks (latency and bandwidth) on real hardware. However, I see 
variations in the results and am trying to understand the reasons.

The experiment has latency(memory non-intensive with random access) as the 
primary task and bandwidth(memory intensive with sequential access) as the 
co-runner task.


On real hardware
case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s

On simulator
case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s
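
As a rough cross-check of how these latency and bandwidth figures relate,
assuming the latency benchmark touches one 64-byte cache line per access:

# bandwidth ~= line_size / latency for a single outstanding miss at a time
line_bytes = 64.0
for latency_ns in (74.88, 225.95, 76.08, 93.69):
    print('%.2f ns -> %.1f MB/s' % (latency_ns, line_bytes / (latency_ns * 1e-9) / 1e6))
# 74.88 ns -> ~854.7 MB/s and 225.95 ns -> ~283.2 MB/s match the hardware
# figures above; the simulator bandwidths are in the same ballpark.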

Re: [gem5-users] Questions on DRAM Controller model

2014-10-12 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas/Users,

My workflow is to create a checkpoint at the end of Linux boot using the
Atomic Simple CPU and then restore from this checkpoint onto the detailed O3
CPU before running the test. I notice that the mem_mode is set to atomic and
not timing. Could that be the reason for the low memory-bus contention I am
observing?

Thanks,
Prathap

On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath  wrote:

> Hello Andreas,
>
> Even after configuring the model like the actual hardware, i still not
> seeing enough interference to the read request under consideration. I am
> using the classic memory system model. Since it uses atomic and functional
> Packet allocation protocol, I would like to switch to Ruby( I think it
> more resembles with real platform).
>
>
> I am hitting in to below problem when i use ruby.
>
> /build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches
> --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4
> --mem-size=512MB
> --kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
> --disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
> --machine-type=VExpress_EMM
> --dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
> --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64
>
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in
> main
> exec filecode in scope
>   File "configs/example/fs.py", line 302, in 
> test_sys = build_test_system(np)
>   File "configs/example/fs.py", line 138, in build_test_system
> Ruby.create_system(options, test_sys, test_sys.iobus,
> test_sys._dma_ports)
>   File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line
> 825, in __getattr__
> raise AttributeError, err_string
> AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
>   (C++ object is not yet constructed, so wrapped C++ methods are
> unavailable.)
>
> What could be the cause of this?
>
> Thanks,
> Prathap
>
>
>
> On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson 
> wrote:
>
>>  Hi Prathap,
>>
>>  There are many possible reasons for the discrepancy, and obviously
>> there are many ways of building a memory controller :-). Have you
>> configured the model to look like the actual hardware? The most obvious
>> differences would be in terms of buffer sizes, the page policy, arbitration
>> policy, the threshold before closing a page, the read/write switching,
>> actual timings etc. It is also worth checking if the controller hardware
>> treats writes the same way the model does (early responses, minimise
>> switching).
>>
>>  Andreas
>>
>>   From: Prathap Kolakkampadath 
>> Date: Tuesday, 9 September 2014 18:56
>> To: Andreas Hansson 
>> Cc: gem5 users mailing list 
>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>
>>  Hello Andreas,
>>
>>  Thanks for your reply. I read your ISPASS paper and got a fair
>> understanding about the architecture.
>> I am trying to reproduce the results, collected from running synthetic
>> benchmarks (latency and bandwidth) on real hardware, in Simulator
>> Environment.However, i could see variations in the results and i am trying
>> to understand the reasons.
>>
>>  The experiment has latency(memory non-intensive with random access) as
>> the primary task and bandwidth(memory intensive with sequential access) as
>> the co-runner task.
>>
>>
>>  On real hardware
>> case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
>> case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s
>>
>>  On simulator
>>  case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
>> case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s
>>
>>
>>  Case 1 where latency test run alone(0 corunner), the results matches on
>> both environment. However Case 2, when run with bandwidth(1 corunner), the
>> results varies a lot.
>> Do you have any thoughts about this?
>> Thanks,
>> Prathap
>>
>> On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson 
>> wrote:
>>
>>>  Hi Prathap,
>>>
>>>  Have you read our ISPASS paper from last year? It’s referenced in the
>>> header file, as well as on gem5.org.
>>>
>>>1. Yes and no. Two different buffers are used in the model are used,
>>>but they are random access,

Re: [gem5-users] Questions on DRAM Controller model

2014-10-12 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

Even after configuring the model to match the actual hardware, I am still not
seeing enough interference on the read request under consideration. I am
using the classic memory system model. Since it uses the atomic and functional
packet allocation protocol, I would like to switch to Ruby (I think it more
closely resembles a real platform).


I am hitting the problem below when I use Ruby.

/build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches
--l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4
--mem-size=512MB
--kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
--disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
--machine-type=VExpress_EMM
--dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
--cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

Traceback (most recent call last):
  File "", line 1, in 
  File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in
main
exec filecode in scope
  File "configs/example/fs.py", line 302, in 
test_sys = build_test_system(np)
  File "configs/example/fs.py", line 138, in build_test_system
Ruby.create_system(options, test_sys, test_sys.iobus,
test_sys._dma_ports)
  File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825,
in __getattr__
raise AttributeError, err_string
AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
  (C++ object is not yet constructed, so wrapped C++ methods are
unavailable.)

What could be the cause of this?

Thanks,
Prathap



On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson 
wrote:

>  Hi Prathap,
>
>  There are many possible reasons for the discrepancy, and obviously there
> are many ways of building a memory controller :-). Have you configured the
> model to look like the actual hardware? The most obvious differences would
> be in terms of buffer sizes, the page policy, arbitration policy, the
> threshold before closing a page, the read/write switching, actual timings
> etc. It is also worth checking if the controller hardware treats writes the
> same way the model does (early responses, minimise switching).
>
>  Andreas
>
>   From: Prathap Kolakkampadath 
> Date: Tuesday, 9 September 2014 18:56
> To: Andreas Hansson 
> Cc: gem5 users mailing list 
> Subject: Re: [gem5-users] Questions on DRAM Controller model
>
>  Hello Andreas,
>
>  Thanks for your reply. I read your ISPASS paper and got a fair
> understanding about the architecture.
> I am trying to reproduce the results, collected from running synthetic
> benchmarks (latency and bandwidth) on real hardware, in Simulator
> Environment.However, i could see variations in the results and i am trying
> to understand the reasons.
>
>  The experiment has latency(memory non-intensive with random access) as
> the primary task and bandwidth(memory intensive with sequential access) as
> the co-runner task.
>
>
>  On real hardware
> case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
> case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s
>
>  On simulator
>  case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
> case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s
>
>
>  Case 1 where latency test run alone(0 corunner), the results matches on
> both environment. However Case 2, when run with bandwidth(1 corunner), the
> results varies a lot.
> Do you have any thoughts about this?
> Thanks,
> Prathap
>
> On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson 
> wrote:
>
>>  Hi Prathap,
>>
>>  Have you read our ISPASS paper from last year? It’s referenced in the
>> header file, as well as on gem5.org.
>>
>>1. Yes and no. Two different buffers are used in the model are used,
>>but they are random access, so you can treat the entries any way you want.
>>2. Yes and no. It’s a C++ model, so the scheduler executes in 0 time.
>>Thus, when looking at the various requests it effectively sees all the
>>banks.
>>3. Yes and no. See above.
>>
>> Remember that this is a model. The goal is not to be representative down
>> to every last element of an RTL design. The goal is to be representative of
>> a real design, and then be fast. Both of these goals are delivered upon by
>> the model.
>>
>>  I hope that explains it. IF there is anything in the results you do not
>> agree with, please do say so.
>>
>>  Thanks,
>>
>>  Andreas
>>

Re: [gem5-users] Questions on DRAM Controller model

2014-09-09 Thread Andreas Hansson via gem5-users
Hi Prathap,

There are many possible reasons for the discrepancy, and obviously there are 
many ways of building a memory controller :-). Have you configured the model to 
look like the actual hardware? The most obvious differences would be in terms 
of buffer sizes, the page policy, arbitration policy, the threshold before 
closing a page, the read/write switching, actual timings etc. It is also worth 
checking if the controller hardware treats writes the same way the model does 
(early responses, minimise switching).
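
As a purely illustrative starting point (parameter names follow the gem5 DRAM
controller model of roughly this vintage, src/mem/DRAMCtrl.py or SimpleDRAM.py
depending on the version, and the values here are placeholders to be matched
against the hardware datasheet), the knobs above map to config parameters such as:

    from m5.objects import DDR3_1600_x64

    dram = DDR3_1600_x64()
    dram.read_buffer_size = 32            # read queue entries
    dram.write_buffer_size = 64           # write queue entries
    dram.page_policy = 'open_adaptive'    # or 'open', 'close', 'close_adaptive'
    dram.max_accesses_per_row = 16        # threshold before forcing a precharge
    dram.mem_sched_policy = 'frfcfs'      # arbitration policy
    dram.write_high_thresh_perc = 85      # read/write switching threshold
    # Device timings (tRCD, tCL, tRP, tRAS, tBURST, ...) can likewise be
    # overridden to match the actual part.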

Andreas

From: Prathap Kolakkampadath <kvprat...@gmail.com>
Date: Tuesday, 9 September 2014 18:56
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] Questions on DRAM Controller model

Hello Andreas,

Thanks for your reply. I read your ISPASS paper and got a fair understanding 
of the architecture.
I am trying to reproduce, in the simulator environment, results collected by 
running synthetic benchmarks (latency and bandwidth) on real hardware. However, 
I see variations in the results and am trying to understand the reasons.

The experiment has the latency test (memory non-intensive, random access) as the 
primary task and the bandwidth test (memory intensive, sequential access) as the 
co-runner task.


On real hardware
case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s

On simulator
case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s


In case 1, where the latency test runs alone (0 co-runners), the results match in 
both environments. However, in case 2, when it runs with the bandwidth test 
(1 co-runner), the results differ considerably.
Do you have any thoughts about this?
Thanks,
Prathap

On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson 
<andreas.hans...@arm.com> wrote:
Hi Prathap,

Have you read our ISPASS paper from last year? It’s referenced in the header 
file, as well as on gem5.org.

 1.  Yes and no. Two different buffers are used in the model (one for reads, one 
for writes), but they are random access, so you can treat the entries any way you want.
 2.  Yes and no. It’s a C++ model, so the scheduler executes in 0 time. Thus, 
when looking at the various requests it effectively sees all the banks.
 3.  Yes and no. See above.

Remember that this is a model. The goal is not to be representative down to 
every last element of an RTL design. The goal is to be representative of a real 
design, and then be fast. Both of these goals are delivered upon by the model.

I hope that explains it. If there is anything in the results you do not agree 
with, please do say so.

Thanks,

Andreas

From: Prathap Kolakkampadath via gem5-users 
<gem5-users@gem5.org>
Reply-To: Prathap Kolakkampadath 
<kvprat...@gmail.com>, gem5 users mailing list 
<gem5-users@gem5.org>
Date: Monday, 8 September 2014 18:38
To: gem5 users mailing list <gem5-users@gem5.org>
Subject: [gem5-users] Questions on DRAM Controller model

Hello Everybody,

I am using DDR3_1600_x64. I am trying to understand the memory controller 
design and have a few questions about it.

1) Does the memory controller have a separate bank request buffer (read and write 
buffers) per bank, or just a global queue?
2) Is there a scheduler per bank that arbitrates between the requests in its queue, 
in parallel with the other banks' schedulers?
3) Is there a DRAM bus scheduler that arbitrates between requests from different banks?

Thanks,
Prathap


Re: [gem5-users] Questions on DRAM Controller model

2014-09-09 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

Thanks for your reply. I read your ISPASS paper and got a fair
understanding of the architecture.
I am trying to reproduce, in the simulator environment, results collected by
running synthetic benchmarks (latency and bandwidth) on real hardware.
However, I see variations in the results and am trying to understand the
reasons.

The experiment has the latency test (memory non-intensive, random access) as the
primary task and the bandwidth test (memory intensive, sequential access) as the
co-runner task.


On real hardware
case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s

On simulator
case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s


In case 1, where the latency test runs alone (0 co-runners), the results match in
both environments. However, in case 2, when it runs with the bandwidth test
(1 co-runner), the results differ considerably.
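Putting the numbers above side by side: on the real hardware the latency test
slows down by roughly 225.95/74.88 = 3.0x and the co-runner's bandwidth drops to
about a third (283.24/854.74 = 0.33), whereas in the simulator the slowdown is
only 93.69/76.08 = 1.23x and the bandwidth only drops to about 81%
(651.57/802.25 = 0.81).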
Do you have any thoughts about this?
Thanks,
Prathap

On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson 
wrote:

>  Hi Prathap,
>
>  Have you read our ISPASS paper from last year? It’s referenced in the
> header file, as well as on gem5.org.
>
>1. Yes and no. Two different buffers are used in the model (one for reads,
>one for writes), but they are random access, so you can treat the entries
>any way you want.
>2. Yes and no. It’s a C++ model, so the scheduler executes in 0 time.
>Thus, when looking at the various requests it effectively sees all the
>banks.
>3. Yes and no. See above.
>
> Remember that this is a model. The goal is not to be representative down
> to every last element of an RTL design. The goal is to be representative of
> a real design, and then be fast. Both of these goals are delivered upon by
> the model.
>
>  I hope that explains it. If there is anything in the results you do not
> agree with, please do say so.
>
>  Thanks,
>
>  Andreas
>
>   From: Prathap Kolakkampadath via gem5-users 
> Reply-To: Prathap Kolakkampadath , gem5 users
> mailing list 
> Date: Monday, 8 September 2014 18:38
> To: gem5 users mailing list 
> Subject: [gem5-users] Questions on DRAM Controller model
>
>  Hello Everybody,
>
> I am using DDR3_1600_x64. I am trying to understand the memory controller
> design and have a few questions about it.
>
> 1) Does the memory controller have a separate bank request buffer (read and
> write buffers) per bank, or just a global queue?
> 2) Is there a scheduler per bank that arbitrates between the requests in its
> queue, in parallel with the other banks' schedulers?
> 3) Is there a DRAM bus scheduler that arbitrates between requests from
> different banks?
>
> Thanks,
> Prathap
>

Re: [gem5-users] Questions on DRAM Controller model

2014-09-08 Thread Andreas Hansson via gem5-users
Hi Prathap,

Have you read our ISPASS paper from last year? It’s referenced in the header 
file, as well as on gem5.org.

 1.  Yes and no. Two different buffers are used in the model (one for reads, one 
for writes), but they are random access, so you can treat the entries any way you want.
 2.  Yes and no. It’s a C++ model, so the scheduler executes in 0 time. Thus, 
when looking at the various requests it effectively sees all the banks (see the 
sketch below).
 3.  Yes and no. See above.
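
To make point 2 concrete, here is a rough illustration (Python-style pseudocode,
not the model's actual C++; the function and field names are invented) of the
first-ready, first-come-first-served style decision the default scheduler
approximates when it looks across everything that is queued:

    def pick_next(queue, open_rows):
        # queue: requests in arrival order; each has .bank and .row
        # open_rows: the currently open row in each bank
        for req in queue:                   # prefer the oldest row hit
            if open_rows.get(req.bank) == req.row:
                return req
        return queue[0] if queue else None  # otherwise the oldest request (FCFS)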

Remember that this is a model. The goal is not to be representative down to 
every last element of an RTL design. The goal is to be representative of a real 
design, and then be fast. Both of these goals are delivered upon by the model.

I hope that explains it. If there is anything in the results you do not agree 
with, please do say so.

Thanks,

Andreas

From: Prathap Kolakkampadath via gem5-users 
<gem5-users@gem5.org>
Reply-To: Prathap Kolakkampadath 
<kvprat...@gmail.com>, gem5 users mailing list 
<gem5-users@gem5.org>
Date: Monday, 8 September 2014 18:38
To: gem5 users mailing list <gem5-users@gem5.org>
Subject: [gem5-users] Questions on DRAM Controller model

Hello Everybody,

I am using DDR3_1600_x64. I am trying to understand the memory controller 
design and have a few questions about it.

1) Does the memory controller have a separate bank request buffer (read and write 
buffers) per bank, or just a global queue?
2) Is there a scheduler per bank that arbitrates between the requests in its queue, 
in parallel with the other banks' schedulers?
3) Is there a DRAM bus scheduler that arbitrates between requests from different banks?

Thanks,
Prathap
