Yes, it's almost impossible to get completely identical behavior without
running a completely identical system.  Even making the cache larger will
make the program run faster in some phases, which will change where timer
interrupts happen with respect to program execution.

If you look at larger time windows and/or more samples, the mean behavior
should stabilize, but trying to correlate individual small samples like
you're doing is going to be extremely challenging.
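
As a rough illustration of that point, here is a minimal sketch that merges the per-window L2 access counts quoted further down in this thread into larger windows and compares the two cache configurations. The helper names (`merge_windows`, `max_abs_rel_diff`) are made up for this example and are not part of M5. Four samples per run is far too few to draw real conclusions, so treat this as a way of looking at the data rather than a result.

```python
# Per-window L2 access counts copied from the stats quoted in this thread
# (windows 2-5, each covering 100,000 instructions).
ACCESSES_1MB = [3231, 4656, 1078, 1575]
ACCESSES_2MB = [2936, 1496, 2290, 4554]


def merge_windows(samples, factor):
    """Sum consecutive groups of `factor` samples into larger windows."""
    return [sum(samples[i:i + factor]) for i in range(0, len(samples), factor)]


def max_abs_rel_diff(xs, ys):
    """Largest |y - x| / x over paired windows of the two runs."""
    return max(abs(y - x) / x for x, y in zip(xs, ys))


if __name__ == "__main__":
    # factor is the number of original 100,000-instruction windows merged.
    for factor in (1, 2, 4):
        a = merge_windows(ACCESSES_1MB, factor)
        b = merge_windows(ACCESSES_2MB, factor)
        print(f"{factor * 100_000:>7} insts/window: "
              f"max relative difference = {max_abs_rel_diff(a, b):.1%}")
```

With these eight numbers, individual 100,000-instruction windows differ by well over 100%, while the totals over all four windows (10540 vs. 11276 accesses) differ by about 7%.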

This paper focuses on these issues in multiprocessor systems, but most of
what it talks about is relevant to uniprocessor systems running a full OS
too:
http://pages.cs.wisc.edu/~alaa/papers/ieeemicro03_variability.pdf

Steve

On Mon, Jan 24, 2011 at 10:34 PM, Stevenson Jian <[email protected]>wrote:

> Yes, I am running in FS mode. Is it normal for the OS to make that much
> difference?
> These statistics are taken after the benchmarks have started.
> Thanks!
> Steve
>
> On Tue, Jan 25, 2011 at 12:00 AM, Steve Reinhardt <[email protected]>wrote:
>
>> OK, sorry for the confusion; since you were running a Parsec benchmark I
>> assumed the numbers were processor IDs.  Are you running in FS mode?  Are
>> these statistics taken from the beginning when Linux is booting, or are they
>> after the benchmark has started running?
>>
>> Steve
>>
>>
>> On Mon, Jan 24, 2011 at 5:51 PM, Stevenson Jian 
>> <[email protected]>wrote:
>>
>>> Thanks for replying, Steve. I only used a single processor in both
>>> simulations. What is shown is not the output from individual processors, but
>>> that of the same processor at the end of every 100,000 instructions (note
>>> that sim_insts increments by 100,000 each time).
>>>
>>>
>>> On Mon, Jan 24, 2011 at 7:14 PM, Steve Reinhardt <[email protected]>wrote:
>>>
>>>> With a multiprocessor, seemingly small changes in configuration can have
>>>> a significant impact if they change the order in which threads grab a lock,
>>>> or something like that. So in particular, for the stats you have below, it
>>>> seems likely that there's some serialized computation going on that
>>>> happened on processor 3 in the first case and on processor 5 in the
>>>> second case.
>>>>
>>>> Steve
>>>>
>>>> On Mon, Jan 24, 2011 at 1:30 PM, Stevenson Jian <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> How does the Timing CPU count the number of instructions? If it stalls
>>>>> on a cache miss, do the Nops count as instructions as well? The reason I
>>>>> ask is that by simply changing the size of the cache, the total number of
>>>>> instructions when the benchmark completes varies by about 0.01 - 0.1%.
>>>>>
>>>>> Another anomaly that I am observing is that, again by simply changing
>>>>> the size of the L2, the number of overall L2 accesses per, say, 100,000
>>>>> instructions can vary by over 100%.
>>>>>
>>>>> The following are 2 runs that I did on M5 with the Freqmine benchmark.
>>>>> The first simulation uses a 1MB 4-way L2 with a latency of 6ns, while the
>>>>> second simulation uses a 2MB 8-way L2 with a latency of 4.5ns. The overall
>>>>> accesses per 100,000 instructions are shown.
>>>>>
>>>>> ---------------------------------------------------------------------------------------------
>>>>> 1MB 4Way L2:
>>>>> 2:
>>>>> sim_insts                   100200001  # Number of instructions simulated
>>>>> sim_ticks                   196940000  # Number of ticks simulated
>>>>> system.l2.overall_accesses       3231  # number of overall (read+write) accesses
>>>>> system.l2.overall_hits           2515  # number of overall hits
>>>>>
>>>>> 3:
>>>>> sim_insts                   100300001  # Number of instructions simulated
>>>>> sim_ticks                   227453000  # Number of ticks simulated
>>>>> system.l2.overall_accesses       4656  # number of overall (read+write) accesses
>>>>> system.l2.overall_hits           3434  # number of overall hits
>>>>>
>>>>> 4:
>>>>> sim_insts                   100400001  # Number of instructions simulated
>>>>> sim_ticks                   154064000  # Number of ticks simulated
>>>>> system.l2.overall_accesses       1078  # number of overall (read+write) accesses
>>>>> system.l2.overall_hits            722  # number of overall hits
>>>>>
>>>>> 5:
>>>>> sim_insts                   100500001  # Number of instructions simulated
>>>>> sim_ticks                   155779000  # Number of ticks simulated
>>>>> system.l2.overall_accesses       1575  # number of overall (read+write) accesses
>>>>> system.l2.overall_hits           1154  # number of overall hits
>>>>>
>>>>> ....
>>>>>
>>>>> 2MB 8Way L2:
>>>>> 2:
>>>>> sim_insts                   100200001  # Number of instructions simulated
>>>>> sim_ticks                   234810000  # Number of ticks simulated
>>>>> system.l2.overall_accesses       2936  # number of overall (read+write) accesses
>>>>> system.l2.overall_hits           1163  # number of overall hits
>>>>>
>>>>> 3:
>>>>> sim_insts                   100300000  # Number of instructions simulated
>>>>> sim_ticks                   174173000  # Number of ticks simulated
>>>>> system.l2.overall_accesses       1496  # number of overall (read+write) accesses
>>>>> system.l2.overall_hits            803  # number of overall hits
>>>>>
>>>>> 4:
>>>>> sim_insts                   100400000  # Number of instructions simulated
>>>>> sim_ticks                   190135000  # Number of ticks simulated
>>>>> system.l2.overall_accesses       2290  # number of overall (read+write) accesses
>>>>> system.l2.overall_hits           1672  # number of overall hits
>>>>>
>>>>> 5:
>>>>> sim_insts                   100500000  # Number of instructions simulated
>>>>> sim_ticks                   213086000  # Number of ticks simulated
>>>>> system.l2.overall_accesses       4554  # number of overall (read+write) accesses
>>>>> system.l2.overall_hits           3871  # number of overall hits
>>>>> .....
>>>>>
>>>>> ----------------------------------------------------------------------------
>>>>> Even if Nops are counted as instructions, I don't see how that would
>>>>> make overall accesses per 100,000 instructions vary by as much as 200%.
>>>>> How does M5 count the number of instructions?
>>>>> Thanks,
>>>>> Steve
>>>>>
>>>>> _______________________________________________
>>>>> m5-users mailing list
>>>>> [email protected]
>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>>>>>