If I'm not mistaken, Freqmine (like any multithreaded benchmark) has both parallel and serial sections. Since programs typically run in distinct phases, it wouldn't necessarily be out of line for a particular 100,000-instruction interval to take especially long, particularly if it falls in a phase with a lot of cache misses.
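To check whether the slow intervals really line up with miss-heavy phases, something like the following rough Python sketch could help. It is only an illustration: the figures are copied by hand from the stats quoted further down in this thread, and the names RUNS, summarize, and report are made up for this example rather than anything M5 provides.

```python
# Per-interval figures copied from the stats dumps quoted in this thread:
# (interval label, sim_ticks, system.l2.overall_accesses, system.l2.overall_hits)
RUNS = {
    "1MB 4-way": [
        (2, 196940000, 3231, 2515),
        (3, 227453000, 4656, 3434),
        (4, 154064000, 1078, 722),
        (5, 155779000, 1575, 1154),
    ],
    "2MB 8-way": [
        (2, 234810000, 2936, 1163),
        (3, 174173000, 1496, 803),
        (4, 190135000, 2290, 1672),
        (5, 213086000, 4554, 3871),
    ],
}


def summarize(accesses, hits):
    """Return (misses, miss_rate) for one interval (hypothetical helper)."""
    misses = accesses - hits
    miss_rate = misses / accesses if accesses else 0.0
    return misses, miss_rate


def report(runs):
    """Flatten the runs into rows of
    (config, interval, ticks, accesses, misses, miss_rate)."""
    rows = []
    for config, intervals in runs.items():
        for label, ticks, accesses, hits in intervals:
            misses, miss_rate = summarize(accesses, hits)
            rows.append((config, label, ticks, accesses, misses, miss_rate))
    return rows


if __name__ == "__main__":
    for config, label, ticks, accesses, misses, rate in report(RUNS):
        print(f"{config:10s} interval {label}: ticks={ticks:>10d} "
              f"accesses={accesses:5d} misses={misses:5d} miss_rate={rate:.3f}")
```

In these eight samples, the interval with the most L2 misses in each run (interval 3 for the 1MB cache, interval 2 for the 2MB cache) is also the one with the most ticks, which fits the phase explanation, though four intervals per run is far too few to conclude much.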
Also, if you are running a parallel benchmark on only a single processor, you might expect a lot of cache thrashing as threads fight over their shared data as well as their own private data. It looks like you'll have to dig a bit into what the expected output is for Freqmine (or any benchmark) and then compare that against what M5 is simulating to be sure.

On Mon, Jan 24, 2011 at 8:51 PM, Stevenson Jian <[email protected]> wrote:

> Thanks for replying Steve. I only used a single processor in both
> simulations. What is shown is not the output from individual processors, but
> that of the same processor at the end of every 100,000 instructions (note
> that sim_insts increments by 100,000 each time).
>
> On Mon, Jan 24, 2011 at 7:14 PM, Steve Reinhardt <[email protected]> wrote:
>
>> With a multiprocessor, seemingly small changes in configuration can have a
>> significant impact if they change the order in which threads grab a lock, or
>> something like that. So in particular, for the stats you have below, it
>> seems likely that there's some serialized computation going on that happened
>> on processor 3 in the first case and on processor 5 in the second case.
>>
>> Steve
>>
>> On Mon, Jan 24, 2011 at 1:30 PM, Stevenson Jian
>> <[email protected]> wrote:
>>
>>> Hi,
>>> How does the Timing CPU count the number of instructions? If it stalls on
>>> a cache miss, do the Nops count as instructions as well? The reason I ask
>>> is that by simply changing the size of the cache, the total number of
>>> instructions when the benchmark completes varies by about 0.01 - 0.1%.
>>>
>>> Another anomaly that I am observing is that, again by simply changing the
>>> size of the L2, the number of overall L2 accesses per, let's say, 100,000
>>> instructions can vary by over 100%.
>>>
>>> The following are 2 runs that I did on M5 with the Freqmine benchmark.
>>> The first simulation uses a 1MB 4-way L2 with a latency of 6ns while the
>>> second simulation uses a 2MB 8-way L2 with a latency of 4.5ns. The overall
>>> accesses per 100,000 instructions are shown.
>>>
>>> ---------------------------------------------------------------------------------------------
>>> 1MB 4Way L2:
>>> 2:
>>> sim_insts                    100200001   # Number of instructions simulated
>>> sim_ticks                    196940000   # Number of ticks simulated
>>> system.l2.overall_accesses        3231   # number of overall (read+write) accesses
>>> system.l2.overall_hits            2515   # number of overall hits
>>>
>>> 3:
>>> sim_insts                    100300001   # Number of instructions simulated
>>> sim_ticks                    227453000   # Number of ticks simulated
>>> system.l2.overall_accesses        4656   # number of overall (read+write) accesses
>>> system.l2.overall_hits            3434   # number of overall hits
>>>
>>> 4:
>>> sim_insts                    100400001   # Number of instructions simulated
>>> sim_ticks                    154064000   # Number of ticks simulated
>>> system.l2.overall_accesses        1078   # number of overall (read+write) accesses
>>> system.l2.overall_hits             722   # number of overall hits
>>>
>>> 5:
>>> sim_insts                    100500001   # Number of instructions simulated
>>> sim_ticks                    155779000   # Number of ticks simulated
>>> system.l2.overall_accesses        1575   # number of overall (read+write) accesses
>>> system.l2.overall_hits            1154   # number of overall hits
>>>
>>> ....
>>>
>>> 2MB 8Way L2:
>>> 2:
>>> sim_insts                    100200001   # Number of instructions simulated
>>> sim_ticks                    234810000   # Number of ticks simulated
>>> system.l2.overall_accesses        2936   # number of overall (read+write) accesses
>>> system.l2.overall_hits            1163   # number of overall hits
>>>
>>> 3:
>>> sim_insts                    100300000   # Number of instructions simulated
>>> sim_ticks                    174173000   # Number of ticks simulated
>>> system.l2.overall_accesses        1496   # number of overall (read+write) accesses
>>> system.l2.overall_hits             803   # number of overall hits
>>>
>>> 4:
>>> sim_insts                    100400000   # Number of instructions simulated
>>> sim_ticks                    190135000   # Number of ticks simulated
>>> system.l2.overall_accesses        2290   # number of overall (read+write) accesses
>>> system.l2.overall_hits            1672   # number of overall hits
>>>
>>> 5:
>>> sim_insts                    100500000   # Number of instructions simulated
>>> sim_ticks                    213086000   # Number of ticks simulated
>>> system.l2.overall_accesses        4554   # number of overall (read+write) accesses
>>> system.l2.overall_hits            3871   # number of overall hits
>>> .....
>>>
>>> ----------------------------------------------------------------------------
>>> Even if Nops are counted as instructions, I don't see how that would
>>> make overall accesses per 100,000 instructions vary by as much as 200%.
>>> How does M5 count the number of instructions?
>>> Thanks,
>>> Steve
>>>
>>> _______________________________________________
>>> m5-users mailing list
>>> [email protected]
>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users

--
- Korey
