Re: [gem5-users] Questions on DRAM Controller model

2014-10-12 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

Even after configuring the model like the actual hardware, I am still not
seeing enough interference on the read request under consideration. I am
using the classic memory system model. Since it uses the atomic and
functional packet allocation protocol, I would like to switch to Ruby (I
think it more closely resembles a real platform).

I am hitting the problem below when I use Ruby.

/build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches
--l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4
--mem-size=512MB
--kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
--disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
--machine-type=VExpress_EMM
--dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
--cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in main
    exec filecode in scope
  File "configs/example/fs.py", line 302, in <module>
    test_sys = build_test_system(np)
  File "configs/example/fs.py", line 138, in build_test_system
    Ruby.create_system(options, test_sys, test_sys.iobus,
                       test_sys._dma_ports)
  File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825, in __getattr__
    raise AttributeError, err_string
AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
  (C++ object is not yet constructed, so wrapped C++ methods are unavailable.)

What could be the cause of this?

Thanks,
Prathap



On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  There are many possible reasons for the discrepancy, and obviously there
 are many ways of building a memory controller :-). Have you configured the
 model to look like the actual hardware? The most obvious differences would
 be in terms of buffer sizes, the page policy, arbitration policy, the
 threshold before closing a page, the read/write switching, actual timings
 etc. It is also worth checking if the controller hardware treats writes the
 same way the model does (early responses, minimise switching).
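 As a rough sketch, most of these knobs are parameters on the controller
 itself. The names below assume gem5's DRAMCtrl / DDR3_1600_x64 from the
 current tree and that MemConfig has placed the controllers at
 system.mem_ctrls, so do check them against src/mem/DRAMCtrl.py before
 relying on them:

   # Illustrative only: align the model with the hardware under test.
   # Parameter names assumed from src/mem/DRAMCtrl.py; verify before use.
   for ctrl in system.mem_ctrls:
       ctrl.read_buffer_size = 32           # read queue entries
       ctrl.write_buffer_size = 64          # write queue entries
       ctrl.page_policy = 'open_adaptive'   # or 'open', 'close', 'close_adaptive'
       ctrl.max_accesses_per_row = 16       # row hits before forcing a precharge
       ctrl.mem_sched_policy = 'frfcfs'     # arbitration: 'fcfs' or 'frfcfs'
       ctrl.write_high_thresh_perc = 85     # write-buffer fill that starts a
       ctrl.write_low_thresh_perc = 50      #   write burst, and where it stops
       ctrl.min_writes_per_switch = 16      # writes drained per read/write switch
       ctrl.tRCD = '13.75ns'                # device timings (example values)
       ctrl.tCL = '13.75ns'
       ctrl.tRP = '13.75ns'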

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, 9 September 2014 18:56
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

  Hello Andreas,

  Thanks for your reply. I read your ISPASS paper and got a fair
 understanding of the architecture.
 I am trying to reproduce, in the simulator environment, the results
 collected from running synthetic benchmarks (latency and bandwidth) on
 real hardware. However, I see variations in the results and I am trying to
 understand the reasons.

  The experiment has latency (memory non-intensive, random access) as the
 primary task and bandwidth (memory intensive, sequential access) as the
 co-runner task.


  On real hardware
 case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
 case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s

  On simulator
  case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
 case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s


  In case 1, where the latency test runs alone (0 co-runner), the results
 match in both environments. However, in case 2, when it runs with the
 bandwidth test (1 co-runner), the results differ considerably.
 Do you have any thoughts about this?
 Thanks,
 Prathap

 On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  Have you read our ISPASS paper from last year? It’s referenced in the
 header file, as well as on gem5.org.

1. Yes and no. Two different buffers are used in the model, but they are
   random access, so you can treat the entries any way you want.
2. Yes and no. It's a C++ model, so the scheduler executes in 0 time.
   Thus, when looking at the various requests it effectively sees all the
   banks (see the sketch after this list).
3. Yes and no. See above.
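
 To make point 2 concrete, here is a toy sketch of the first-ready FCFS
 idea (plain Python, for illustration only; the real scheduler is the C++
 DRAM controller model): because it runs in zero simulated time it can scan
 the whole queue, prefer a request that hits an already-open row, and fall
 back to the oldest entry otherwise.

   from collections import namedtuple

   Request = namedtuple('Request', ['arrival', 'bank', 'row'])

   def choose_next(queue, open_rows):
       """queue is in arrival order; open_rows maps bank -> open row."""
       for req in queue:                           # oldest first
           if open_rows.get(req.bank) == req.row:  # first ready: row hit
               return req
       return queue[0] if queue else None          # otherwise plain FCFS

   # The younger row hit in bank 1 is served before the older miss in bank 0.
   q = [Request(0, bank=0, row=7), Request(1, bank=1, row=3)]
   print(choose_next(q, {1: 3}))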

 Remember that this is a model. The goal is not to be representative down
 to every last element of an RTL design. The goal is to be representative of
 a real design, and then be fast. Both of these goals are delivered upon by
 the model.

  I hope that explains it. If there is anything in the results you do not
 agree with, please do say so.

  Thanks,

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Monday, 8 September 2014 18:38
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Questions on DRAM Controller model

  Hello Everybody,

 I am using DDR3_1600_x64. I am trying to understand the memory controller
 design and have a few doubts about it.

 1) Do the memory 

Re: [gem5-users] Questions on DRAM Controller model

2014-10-12 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas/Users,

My workflow is to create a checkpoint after Linux boots using the Atomic
Simple CPU and then restore from that checkpoint with the detailed O3 CPU
before running the test. I notice that the mem-mode is set to atomic and
not timing. Could that be the reason for the low contention on the memory
bus that I am observing?
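
For completeness, the restore command I use looks roughly like this (the
checkpoint flags and directory are my reading of configs/common/Options.py
in this tree, so they may need checking); my expectation is that
--cpu-type=detailed makes fs.py select a timing mem-mode on restore:

  # Restore the boot checkpoint with the detailed O3 CPU (paths are placeholders).
  ./build/ARM/gem5.opt configs/example/fs.py \
      --cpu-type=detailed --caches --l2cache \
      --checkpoint-dir=m5out/cpts --checkpoint-restore=1 \
      ... (kernel, dtb, disk-image and memory options as in my earlier command)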

Thanks,
Prathap
