That's a good point.

I'll coordinate with Nilay offline to get him the right image.

Nilay, in your previous optimization trials, where did you see most of
the simulation time being spent?
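
To make the idea from the original post concrete (not running
executeCycle on every memory cycle), here is a rough, self-contained
sketch. It is a toy model only: ToyMemoryControl, runPolling,
runEventDriven, and the fixed 4-cycle service time are all made-up
names and assumptions for illustration, not the actual Ruby
MemoryControl code. It compares calling the controller every cycle
against waking it only when a request arrives or the bank becomes free.

```cpp
// Toy model only: all names here are hypothetical, not Ruby's real interface.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

constexpr uint64_t kIdle = UINT64_MAX;   // "no wakeup needed"
constexpr uint64_t kServiceCycles = 4;   // assumed fixed service time

struct ToyMemoryControl {
    std::queue<uint64_t> pending;  // arrival cycles of waiting requests
    uint64_t busyUntil = 0;        // first cycle at which the bank is free
    uint64_t executeCalls = 0;     // how many times executeCycle() ran
    uint64_t served = 0;           // how many requests were started

    void enqueue(uint64_t arrival) { pending.push(arrival); }

    // One controller step at cycle `now`. Returns the next cycle at which
    // the controller has work to do, or kIdle if nothing is waiting.
    uint64_t executeCycle(uint64_t now) {
        ++executeCalls;
        if (!pending.empty() && now >= busyUntil) {
            pending.pop();
            ++served;
            busyUntil = now + kServiceCycles;
        }
        return pending.empty() ? kIdle : busyUntil;
    }
};

// Baseline: call executeCycle() on every cycle, whether or not work exists.
// `arrivals` must be sorted in ascending order.
ToyMemoryControl runPolling(const std::vector<uint64_t>& arrivals,
                            uint64_t endCycle) {
    ToyMemoryControl mc;
    std::size_t next = 0;
    for (uint64_t now = 0; now < endCycle; ++now) {
        while (next < arrivals.size() && arrivals[next] <= now)
            mc.enqueue(arrivals[next++]);
        mc.executeCycle(now);
    }
    return mc;
}

// Event-driven: jump straight to the next arrival or self-scheduled wakeup.
ToyMemoryControl runEventDriven(const std::vector<uint64_t>& arrivals,
                                uint64_t endCycle) {
    ToyMemoryControl mc;
    std::size_t next = 0;
    uint64_t wake = kIdle;  // wakeup requested by the controller itself
    while (true) {
        uint64_t nextArrival = next < arrivals.size() ? arrivals[next] : kIdle;
        uint64_t now = std::min(wake, nextArrival);
        if (now >= endCycle) break;  // also covers the fully idle case
        while (next < arrivals.size() && arrivals[next] <= now)
            mc.enqueue(arrivals[next++]);
        wake = mc.executeCycle(now);
        assert(wake > now);  // each wakeup lies in the future, so we progress
    }
    return mc;
}
```

With arrivals at cycles 0, 1, 2, and 100 and a 200-cycle run, the
polling loop calls executeCycle() 200 times while the event-driven loop
calls it 6 times, and both serve all 4 requests. Whether the real
controller can be restructured this way depends on state I haven't
looked at, so treat this as an illustration of the idea only.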

On Fri, Apr 1, 2011 at 4:48 PM, Ali Saidi <[email protected]> wrote:

> Probably none of those benchmarks push the memory system with multiple
> cores the way fft does. Why don't you give Nilay your fft benchmark?
>
> Ali
>
>
>
> On Fri, 1 Apr 2011 16:42:48 -0400, Korey Sewell <[email protected]> wrote:
>
>> Hi Nilay,
>> I think I've located the images for those benchmarks so I'll test a couple
>> of these over the weekend and give an update.
>>
>> On Wed, Mar 30, 2011 at 8:03 PM, Nilay Vaish <[email protected]> wrote:
>>
>>> Korey, I do not have the FftBase32 benchmark. Is it possible for you to
>>> run the simulation with one of the following benchmarks:
>>>
>>> IScsiInitiator, IScsiTarget, MutexTest, NetperfMaerts, NetperfStream,
>>> NetperfStreamNT, NetperfStreamUdp, NetperfUdpLocal, Nfs, NfsTcp,
>>> Nhfsstone, Ping, PovrayAutumn, PovrayBench, SurgeSpecweb, SurgeStandard,
>>> ValAccDelay, ValAccDelay2, ValCtxLat, ValMemLat, ValMemLat2MB,
>>> ValMemLat8MB, ValStream, ValStreamCopy, ValStreamScale, ValSysLat,
>>> ValTlbLat, Validation, bnAn
>>>
>>>
>>> Which of these do you think would most closely resemble FFT?
>>>
>>>
>>> Nilay
>>>
>>>
>>> On Wed, 30 Mar 2011, Korey Sewell wrote:
>>>
>>>> Hi all,
>>>>
>>>> I noticed that Ruby was running a little slower than the old M5
>>>> memory system and decided to run gprof on it to see if there was
>>>> anything obvious holding things up.
>>>>
>>>> For 2-, 4-, and 8-core ALPHA_FS_MOESI_CMP_directory SimpleCPU runs of
>>>> the Fft benchmark, it seems that MemoryControl::executeCycle
>>>> contributes nearly 30% of the runtime. Looking at the comments for
>>>> that code, I see this:
>>>> "// executeCycle:  This function is called once per memory clock cycle"
>>>>
>>>> I'm not familiar with this memory controller code, but it would seem
>>>> that some type of optimization that avoids running this every
>>>> memory cycle would speed things up a good bit. So if someone has the
>>>> time or the need to do some Ruby optimization work (I know Nilay
>>>> did some previously), then I think this would be a good place to
>>>> start...
>>>>
>>>> I've posted some of the gprof output below:
>>>> =====
>>>> 2 core
>>>> =====
>>>> time (%)   name
>>>> 29.17      MemoryControl::executeCycle()
>>>>  4.19      RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
>>>>  3.52      PerfectSwitch::wakeup()
>>>>  3.47      Set::Set(Set const&)
>>>>  3.46      RubyEventQueueNode::process()
>>>>
>>>> =====
>>>> 4 core
>>>> =====
>>>> time (%)   name
>>>> 27.49      MemoryControl::executeCycle()
>>>>  4.01      RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
>>>>  3.66      PerfectSwitch::wakeup()
>>>>  3.59      Set::Set(Set const&)
>>>>  3.50      RubyEventQueueNode::process()
>>>>
>>>> =====
>>>> 8 core
>>>> =====
>>>> time (%)   name
>>>> 26.09      MemoryControl::executeCycle()
>>>>  4.12      Set::Set(Set const&)
>>>>  3.91      PerfectSwitch::wakeup()
>>>>  3.88      RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
>>>>  3.41      RubyEventQueueNode::process()
>>>>
>>>> --
>>>> - Korey
>>>> _______________________________________________
>>>> m5-dev mailing list
>>>> [email protected]
>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>



-- 
- Korey