Re: [gem5-users] Questions on DRAM Controller model

Prathap Kolakkampadath via gem5-users Wed, 15 Oct 2014 11:23:48 -0700

Thanks Andreas.


On Tue, Oct 14, 2014 at 4:22 PM, Andreas Hansson <andreas.hans...@arm.com>
wrote:

>  Hello Prathap,
>
>  I do not dare say, but perhaps some interaction between your generated
> access sequence and the O3 model (parameters) restricts the number of
> outstanding L1 misses? There are plenty of debug flags to help in drilling
> down on this issue. Have a look in src/cpu/o3/SConscript for the O3-related
> debug flags and src/mem/cache/SConscript for the cache flags.
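
A minimal sketch of the "have a look in the SConscript" step: it pulls flag names out of DebugFlag('...') declarations in SConscript text. The helper name list_debug_flags and the sample text are illustrative assumptions, not copied from the gem5 tree; for real use, read the two SConscript files named above and pass their contents in.

```python
import re

# Matches declarations such as DebugFlag('Cache') or DebugFlag("Cache").
_FLAG_RE = re.compile(r"""\bDebugFlag\(\s*['"]([A-Za-z0-9_]+)['"]""")


def list_debug_flags(sconscript_text):
    """Return the debug flag names declared in SConscript text, in file order."""
    return _FLAG_RE.findall(sconscript_text)


# Illustrative sample only; the real files may declare different flags.
SAMPLE = """
DebugFlag('Cache')
DebugFlag('CachePort')
CompoundFlag('CacheAll', ['Cache', 'CachePort'])
"""

if __name__ == "__main__":
    for name in list_debug_flags(SAMPLE):
        print(name)
```

The names found this way are the ones that can be passed to gem5's --debug-flags option; compound flags are deliberately skipped here to keep the sketch short.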
>
>  Andreas
>
>   From: Prathap Kolakkampadath <kvprat...@gmail.com>
> Date: Tuesday, October 14, 2014 at 9:21 PM
>
> To: Andreas Hansson <andreas.hans...@arm.com>
> Cc: gem5 users mailing list <gem5-users@gem5.org>
> Subject: Re: [gem5-users] Questions on DRAM Controller model
>
>   Hello Andreas
>
>  Whenever I switch to the O3 CPU from a checkpoint, I can see from
> config.ini that the CPU is switched but mem_mode is still set to
> atomic. However, when booting on the O3 CPU itself (without restoring from a
> checkpoint), mem_mode is set to timing. I am not sure why. Anyhow, I could run
> my tests on the O3 CPU with mem_mode timing (as verified from config.ini).
>
>  I run one memory-intensive test, which generates a cache miss on
> every read, in parallel with a pointer-chasing test (one outstanding request
> at a time), and both CPUs share the same bank of the DRAM controller. In my
> setup the number of L1 MSHRs is 10, so the memory-intensive test can generate
> up to 10 outstanding requests at a time. Since the CPU is much faster than the
> DRAM controller and can keep generating outstanding requests, and all the
> requests target the same bank, I expect the DRAM queue size to be 10
> whenever a request arrives from the pointer-chasing test. If this
> assumption is correct, I should see more interference in the model, as I
> see on real platforms.
>
>  Don't you think the DRAM queue would fill up to the number of L1 MSHRs
> in the above scenario? If not, what conditions would be needed
> for the DRAM queue to fill up to the number of L1 MSHRs?
>
>  Thanks,
>  Prathap Kumar Valsan
>  Research Assistant
>  University of Kansas
>
> On Tue, Oct 14, 2014 at 2:30 AM, Andreas Hansson <andreas.hans...@arm.com>
> wrote:
>
>>  Hi Prathap,
>>
>>  The O3 CPU only works with the memory system in timing mode, so I do
>> not understand what two points you are comparing when you say the results
>> are exactly the same.
>>
>>  The read queue is likely to never fill up unless all these transactions
>> are generated at once. While the first one is being served by the memory
>> controller you may have more coming in etc, but I do not understand why you
>> think it would ever fill up.
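
A toy sketch of why the queue need not fill up, under loudly stated assumptions: a fixed number of requests circulate in a closed loop, each spends a fixed time outside the controller (caches, crossbar, CPU) and then a fixed service time at a single FIFO server. This is not the gem5 DRAM controller model, and the numbers are made up for illustration.

```python
import heapq
from collections import deque


def simulate(n_outstanding, service_ns, transit_ns, n_requests):
    """Return the queue occupancy (including the newcomer) seen by each arrival."""
    # All requests are issued at time 0, one per MSHR-like slot.
    arrivals = [(0, slot) for slot in range(n_outstanding)]
    heapq.heapify(arrivals)
    in_controller = deque()  # finish times of requests still queued, FIFO order
    server_free_at = 0
    seen = []
    for _ in range(n_requests):
        now, slot = heapq.heappop(arrivals)
        # Drop requests that have already completed by the time of this arrival.
        while in_controller and in_controller[0] <= now:
            in_controller.popleft()
        seen.append(len(in_controller) + 1)
        finish = max(now, server_free_at) + service_ns
        server_free_at = finish
        in_controller.append(finish)
        # The slot is reused after the response has travelled back and a new miss is issued.
        heapq.heappush(arrivals, (finish + transit_ns, slot))
    return seen


if __name__ == "__main__":
    occupancy = simulate(n_outstanding=10, service_ns=10, transit_ns=40, n_requests=100)
    print("initial burst:", occupancy[:10])
    print("steady state max:", max(occupancy[10:]))
```

With these illustrative numbers the queue reaches 10 only during the initial burst, when all requests arrive at once; afterwards each arrival finds 6 entries, because 4 of the 10 requests are always somewhere outside the controller. Setting transit_ns to 0 is the only case in this toy model where the queue stays at 10.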
>>
>>  For “debugging” make sure that the config.ini actually captures what
>> you think you are simulating. Also, you have a lot of DRAM-related stats in
>> the stats.txt output.
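
A minimal sketch of the config.ini check, assuming the memory mode is recorded as a mem_mode key in the [system] section of config.ini. The helper name read_mem_mode and the sample text are illustrative; for real use, pass in the contents of the config.ini from the output directory.

```python
import configparser


def read_mem_mode(config_ini_text, section="system"):
    """Return the mem_mode value from config.ini text, or None if it is absent."""
    # strict=False tolerates repeated keys; interpolation is disabled so that
    # any '%' characters in values are taken literally.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_ini_text)
    if not parser.has_section(section):
        return None
    return parser.get(section, "mem_mode", fallback=None)


# Illustrative sample only, not a real gem5 output file.
SAMPLE = """
[root]
type=Root
children=system

[system]
type=LinuxArmSystem
mem_mode=atomic
"""

if __name__ == "__main__":
    mode = read_mem_mode(SAMPLE)
    if mode != "timing":
        print("warning: mem_mode is", mode)
```

Running such a check after every checkpoint restore would catch the case where the CPU model has switched but the memory mode has not.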
>>
>>  Andreas
>>
>>   From: Prathap Kolakkampadath <kvprat...@gmail.com>
>> Date: Tuesday, 14 October 2014 04:33
>>
>> To: Andreas Hansson <andreas.hans...@arm.com>
>> Cc: gem5 users mailing list <gem5-users@gem5.org>
>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>
>>    Hi Andreas, users
>>
>>  I ran the test with the ARM O3 CPU (--cpu-type=detailed), mem_mode=timing;
>> the results are exactly the same as with mem_mode=atomic.
>>  I have partitioned the DRAM banks in software. Both benchmarks,
>> latency-sensitive and bandwidth-sensitive (both generate only reads),
>> run in parallel using the same DRAM bank.
>> From the stats file, I observe that the expected number of L2 misses and DRAM
>> requests are generated.
>> In my system, the number of L1 MSHRs is 10 and the number of L2 MSHRs is
>> 32. So I expect that when a request from the latency-sensitive benchmark
>> arrives at the DRAM, the readQ size should be 10. However, what I observe is
>> that most of the time the queue is not filled, and hence there is less
>> queueing latency and interference.
>>
>>  I am using the classic memory system with the default DRAM
>> controller, DDR3_1600_x64. The address mapping is RoRaBaChCo, the page
>> policy is open_adaptive, and the scheduler is frfcfs.
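
A rough sketch of how a RoRaBaChCo-style mapping splits a physical address, reading the name from most to least significant field: row, rank, bank, channel, column. It can help to confirm that two benchmarks really land in the same bank. The field sizes below (64-byte bursts, 128 bursts per row, 1 channel, 8 banks, 2 ranks) are assumptions for illustration, not values taken from this thread, and the function is not the controller's own decode routine.

```python
from collections import namedtuple

DramAddr = namedtuple("DramAddr", "row rank bank channel column")


def decode_ro_ra_ba_ch_co(addr, burst_bytes=64, bursts_per_row=128,
                          channels=1, banks=8, ranks=2):
    """Split a byte address into DRAM coordinates, least significant field first."""
    a = addr // burst_bytes        # drop the offset within one burst
    column = a % bursts_per_row    # Co: lowest field
    a //= bursts_per_row
    channel = a % channels         # Ch
    a //= channels
    bank = a % banks               # Ba
    a //= banks
    rank = a % ranks               # Ra
    row = a // ranks               # Ro: everything that is left
    return DramAddr(row, rank, bank, channel, column)


if __name__ == "__main__":
    for addr in (0x0, 0x2000, 0x10000, 0x20000):
        print(hex(addr), decode_ro_ra_ba_ch_co(addr))
```

Under these assumed sizes, consecutive 8 KiB blocks rotate through the banks, so a software partitioning scheme has to keep the bank (and rank) bits fixed for every page it hands to a benchmark.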
>>
>>  Do you have any thoughts on this? How could I debug this further?
>>
>>  Appreciate your help.
>>
>>  Thanks,
>>  Prathap Kumar Valsan
>>  Research Assistant
>>  University of Kansas
>>
>> On Mon, Oct 13, 2014 at 4:21 AM, Andreas Hansson <andreas.hans...@arm.com>
>> wrote:
>>
>>>  Hi Prathap,
>>>
>>>  Indeed. The atomic mode is for fast-forwarding only. Once you actually
>>> want to get some representative performance numbers you have to run in
>>> timing mode with either the O3 or Minor CPU model.
>>>
>>>  Andreas
>>>
>>>   From: Prathap Kolakkampadath <kvprat...@gmail.com>
>>> Date: Monday, 13 October 2014 10:19
>>>
>>> To: Andreas Hansson <andreas.hans...@arm.com>
>>> Cc: gem5 users mailing list <gem5-users@gem5.org>
>>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>>
>>>  Thanks for your reply. The memory mode I used is atomic. I
>>> think I need to run the tests in timing mode, which I believe will show
>>> interference and queueing delay similar to real platforms.
>>>
>>> Prathap
>>> On Oct 13, 2014 2:55 AM, "Andreas Hansson" <andreas.hans...@arm.com>
>>> wrote:
>>>
>>>>  Hi Prathap,
>>>>
>>>>  I don’t dare say exactly what is going wrong in your setup, but I am
>>>> confident that Ruby will not magically make things more representative (it
>>>> will likely give you a whole lot more problems though). In the end it is
>>>> all about configuring the building blocks to match the system you want to
>>>> capture. The crossbars and caches in the classic memory system do make some
>>>> simplifications, but I have not yet seen a case when they are not
>>>> sufficiently accurate.
>>>>
>>>>  Have you looked at the various policy settings in the DRAM
>>>> controller, e.g. the page policy and address mapping? If you’re trying to
>>>> correlate with a real platform, also see Anthony’s ISPASS paper from last
>>>> year for some sensible steps in simplifying the problem and dividing it
>>>> into manageable chunks.
>>>>
>>>>  Good luck.
>>>>
>>>>  Andreas
>>>>
>>>>   From: Prathap Kolakkampadath <kvprat...@gmail.com>
>>>> Date: Monday, 13 October 2014 00:29
>>>> To: Andreas Hansson <andreas.hans...@arm.com>
>>>> Cc: gem5 users mailing list <gem5-users@gem5.org>
>>>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>>>
>>>>   Hello Andreas/Users,
>>>>
>>>> I create a checkpoint after Linux boot using the Atomic Simple CPU
>>>> and then restore from this checkpoint to the detailed O3 CPU before running
>>>> the test. I notice that the mem-mode is set to atomic and not timing. Could
>>>> that be the reason for the low memory bus contention I am observing?
>>>>
>>>>  Thanks,
>>>>  Prathap
>>>>
>>>> On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath <
>>>> kvprat...@gmail.com> wrote:
>>>>
>>>>>  Hello Andreas,
>>>>>
>>>>>  Even after configuring the model like the actual hardware, I am still
>>>>> not seeing enough interference with the read request under consideration. I
>>>>> am using the classic memory system model. Since it uses atomic and
>>>>> functional Packet allocation protocol, I would like to switch to Ruby (I
>>>>> think it more closely resembles a real platform).
>>>>>
>>>>>
>>>>>  I am hitting the problem below when I use Ruby.
>>>>>
>>>>> /build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py
>>>>> --caches --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB
>>>>> --num-cpus=4 --mem-size=512MB
>>>>> --kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
>>>>> --disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
>>>>> --machine-type=VExpress_EMM
>>>>> --dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
>>>>> --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "<string>", line 1, in <module>
>>>>>   File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in main
>>>>>     exec filecode in scope
>>>>>   File "configs/example/fs.py", line 302, in <module>
>>>>>     test_sys = build_test_system(np)
>>>>>   File "configs/example/fs.py", line 138, in build_test_system
>>>>>     Ruby.create_system(options, test_sys, test_sys.iobus, test_sys._dma_ports)
>>>>>   File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825, in __getattr__
>>>>>     raise AttributeError, err_string
>>>>> AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
>>>>>   (C++ object is not yet constructed, so wrapped C++ methods are unavailable.)
>>>>>
>>>>>  What could be the cause of this?
>>>>>
>>>>>  Thanks,
>>>>> Prathap
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson <
>>>>> andreas.hans...@arm.com> wrote:
>>>>>
>>>>>>  Hi Prathap,
>>>>>>
>>>>>>  There are many possible reasons for the discrepancy, and obviously
>>>>>> there are many ways of building a memory controller :-). Have you
>>>>>> configured the model to look like the actual hardware? The most obvious
>>>>>> differences would be in terms of buffer sizes, the page policy,
>>>>>> arbitration policy, the threshold before closing a page, the read/write
>>>>>> switching, actual timings etc. It is also worth checking if the controller
>>>>>> hardware treats writes the same way the model does (early responses,
>>>>>> minimise switching).
>>>>>>
>>>>>>  Andreas
>>>>>>
>>>>>>   From: Prathap Kolakkampadath <kvprat...@gmail.com>
>>>>>> Date: Tuesday, 9 September 2014 18:56
>>>>>> To: Andreas Hansson <andreas.hans...@arm.com>
>>>>>> Cc: gem5 users mailing list <gem5-users@gem5.org>
>>>>>> Subject: Re: [gem5-users] Questions on DRAM Controller model
>>>>>>
>>>>>>  Hello Andreas,
>>>>>>
>>>>>>  Thanks for your reply. I read your ISPASS paper and got a fair
>>>>>> understanding of the architecture.
>>>>>> I am trying to reproduce, in the simulator environment, the results
>>>>>> collected from running synthetic benchmarks (latency and bandwidth) on
>>>>>> real hardware. However, I see variations in the results and I am trying
>>>>>> to understand the reasons.
>>>>>>
>>>>>>  The experiment has latency (memory non-intensive with random access)
>>>>>> as the primary task and bandwidth (memory intensive with sequential
>>>>>> access) as the co-runner task.
>>>>>>
>>>>>>
>>>>>>  On real hardware
>>>>>> case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
>>>>>> case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s
>>>>>>
>>>>>>  On simulator
>>>>>> case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
>>>>>> case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s
>>>>>>
>>>>>>
>>>>>>  In Case 1, where the latency test runs alone (0 corunner), the results
>>>>>> match in both environments. However, in Case 2, when it runs with bandwidth
>>>>>> (1 corunner), the results differ a lot.
>>>>>> Do you have any thoughts about this?
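
To put the gap in numbers, here is a small sketch that computes the co-runner slowdown from the four measurements quoted above. Only the figures come from the message; the dictionary layout and the function name slowdown are illustrative.

```python
# (latency in ns, bandwidth in MB/s) as reported in the message above.
RESULTS = {
    "real hardware": {"solo": (74.88, 854.74), "corun": (225.95, 283.24)},
    "simulator": {"solo": (76.08, 802.25), "corun": (93.69, 651.57)},
}


def slowdown(platform):
    """Return (latency ratio, bandwidth ratio) of the co-run case over the solo case."""
    solo_lat, solo_bw = RESULTS[platform]["solo"]
    co_lat, co_bw = RESULTS[platform]["corun"]
    return co_lat / solo_lat, co_bw / solo_bw


if __name__ == "__main__":
    for platform in RESULTS:
        lat_ratio, bw_ratio = slowdown(platform)
        print(f"{platform}: latency x{lat_ratio:.2f}, bandwidth x{bw_ratio:.2f}")
```

The latency grows by roughly 3.0x on the real hardware but only by about 1.2x in the simulator, which is the discrepancy the question is about.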
>>>>>> Thanks,
>>>>>> Prathap
>>>>>>
>>>>>> On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson <
>>>>>> andreas.hans...@arm.com> wrote:
>>>>>>
>>>>>>>  Hi Prathap,
>>>>>>>
>>>>>>>  Have you read our ISPASS paper from last year? It’s referenced in
>>>>>>> the header file, as well as on gem5.org.
>>>>>>>
>>>>>>>    1. Yes and no. Two different buffers are used in the model,
>>>>>>>    but they are random access, so you can treat the entries any way you
>>>>>>>    want.
>>>>>>>    2. Yes and no. It’s a C++ model, so the scheduler executes in 0
>>>>>>>    time. Thus, when looking at the various requests it effectively sees
>>>>>>>    all the banks.
>>>>>>>    3. Yes and no. See above.
>>>>>>>
>>>>>>> Remember that this is a model. The goal is not to be representative
>>>>>>> down to every last element of an RTL design. The goal is to be
>>>>>>> representative of a real design, and then be fast. Both of these goals
>>>>>>> are delivered upon by the model.
>>>>>>>
>>>>>>>  I hope that explains it. If there is anything in the results you
>>>>>>> do not agree with, please do say so.
>>>>>>>
>>>>>>>  Thanks,
>>>>>>>
>>>>>>>  Andreas
>>>>>>>
>>>>>>>   From: Prathap Kolakkampadath via gem5-users <gem5-users@gem5.org>
>>>>>>> Reply-To: Prathap Kolakkampadath <kvprat...@gmail.com>, gem5 users
>>>>>>> mailing list <gem5-users@gem5.org>
>>>>>>> Date: Monday, 8 September 2014 18:38
>>>>>>> To: gem5 users mailing list <gem5-users@gem5.org>
>>>>>>> Subject: [gem5-users] Questions on DRAM Controller model
>>>>>>>
>>>>>>>  Hello Everybody,
>>>>>>>
>>>>>>> I am using DDR3_1600_x64. I am trying to understand the memory
>>>>>>> controller design and have a few doubts about it.
>>>>>>>
>>>>>>> 1) Does the memory controller have a separate bank request buffer
>>>>>>> (read and write buffer) for each bank, or just a global queue?
>>>>>>> 2) Is there a scheduler per bank that arbitrates between different
>>>>>>> queued requests in parallel with the other bank schedulers?
>>>>>>> 3) Is there a DRAM bus scheduler that arbitrates between different
>>>>>>> bank requests?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Prathap
>>>>>>>
>>>>>>> -- IMPORTANT NOTICE: The contents of this email and any attachments
>>>>>>> are confidential and may also be privileged. If you are not the intended
>>>>>>> recipient, please notify the sender immediately and do not disclose the
>>>>>>> contents to any other person, use it for any purpose, or store or copy the
>>>>>>> information in any medium. Thank you.
>>>>>>>
>>>>>>> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>>>>>>> Registered in England & Wales, Company No: 2557590
>>>>>>> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1
>>>>>>> 9NJ, Registered in England & Wales, Company No: 2548782
>>>>>>>

_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
