Hi Arthur,

Thanks for the reply! I checked all the metrics you mentioned in your last email.

-- IQ/ROB/LSQFullEvents: each of these structures is full for less than about 2% of all simulated cycles.

-- memOrderViolation and the load metrics: I'm not sure how to interpret these, but they are generally small compared to the total number of committed loads. memOrderViolation is negligible, and conflictingLoads and forwLoads are around 5% to 10%, depending on the benchmark and simpoint.

-- One thing I noticed is that the fetch IPC (the number of instructions fetched per cycle) is very low (from 0.2 to 1.1), even though I set an issue width of 3. Looking at the detailed statistics, the simulated CPU fetches no instructions at all for around 50% of all simulated cycles, and fetches 3 instructions per cycle for another ~40% of cycles.

Is this normal behavior? I think it might be the main reason for the low IPC values.

Thanks!
Tiansheng

On Wed, Apr 6, 2016 at 4:40 PM, Arthur Perais <[email protected]> wrote:

> Tiansheng,
>
> Here are my two cents. As far as I remember, the X86 implementation is
> fairly heavily microcoded (> 1 uop for many x86 instructions*), so while
> IPC might be low, the number of uops committed each cycle might be
> reasonable. That's one possible contributor to low IPC.
>
> Apart from this, you should generally check if one structure stalls the
> pipeline very often (see IQ/ROB/LSQFullEvents in the stats for Rename) and
> try to find out if this is expected or not. Low IPC can also come from bad
> branch prediction, bad memory dependency prediction (see memOrderViolation
> and conflictingLoads vs. forwardedLoads stats). Also check that your
> issueToExecute delay is a single cycle.
>
> Hope it helps.
>
> Arthur Perais.
>
> *if I remember correctly, a single branch is three uops, so if you fetch a
> branch, you're done for this cycle with a fetch width of 3.
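
On the microcode point above, one way to see how much of the gap comes from uops is to compare committed instructions with committed ops in stats.txt. The following is only a minimal sketch: the stat names (commit.committedInsts, commit.committedOps, numCycles) and the "system.cpu" prefix are assumptions that should be checked against your own stats.txt (a setup that switches CPUs may use a different prefix), and the sample numbers are made up for illustration.

```python
# Sketch: compare instruction IPC with micro-op throughput from gem5 stats.
# Stat names and the "system.cpu" prefix are assumptions; verify them
# against the names that actually appear in your stats.txt.

def parse_stats(text):
    """Parse 'name value # description' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip banner lines and non-numeric entries
    return stats


def uop_summary(stats, prefix="system.cpu"):
    """Return instruction IPC, uops per cycle, and uops per instruction."""
    insts = stats[prefix + ".commit.committedInsts"]
    ops = stats[prefix + ".commit.committedOps"]
    cycles = stats[prefix + ".numCycles"]
    return {
        "ipc": insts / cycles,
        "uops_per_cycle": ops / cycles,
        "uops_per_inst": ops / insts,
    }


# Made-up numbers, for illustration only.
SAMPLE = """
---------- Begin Simulation Statistics ----------
system.cpu.numCycles              200000000   # number of cpu cycles simulated
system.cpu.commit.committedInsts  100000000   # Number of instructions committed
system.cpu.commit.committedOps    180000000   # Number of ops (including micro ops) committed
"""

summary = uop_summary(parse_stats(SAMPLE))
print(summary)
```

If uops_per_inst is well above 1, part of the low instruction IPC is explained by microcode expansion rather than by stalls.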
>
> On 06/04/2016 18:59, Tiansheng Zhang wrote:
>
> > Hi,
> >
> > I've been running X86 full system simulation using gem5 and SPEC
> > benchmarks lately but observed very low IPC values. I used a 3-issue
> > architecture, whose detailed parameters are listed below:
> > ==============================================
> > Issue/Fetch/Rename/Decode/Commit Width | 3 / 3 / 3 / 3 / 3
> > ROB Entries                            | 84
> > LQ/SQ Entries                          | 32/32
> > Integer ALU/MultDiv                    | 3/1
> > FP ALU/MultDiv                         | 3/1
> > L1 I/D cache                           | 32KB each, 8 set assoc
> > L2 cache                               | 2MB, 8 set assoc
> > Memory                                 | DDR3 module
> > ==============================================
> >
> > I made the simpoints for SPEC 2006 benchmarks with training input set
> > using Gem5 and simulate these simpoints with 100M instructions. I then
> > calculated out the weighted IPC results for each benchmark according to
> > their simpoint weights but the IPC values seem very low to me. For example,
> > the average IPC is 0.43. Some other examples are listed below:
> >
> >                       bzip2   h264ref   hmmer   mcf
> > Weighted IPC values   0.62    0.34      0.79    0.3
> >
> > Since the architecture has issue/commit width of 3, I would expect much
> > higher IPC values than listed above. Does this result look normal? Is there
> > any possible performance bottlenecks within the architecture I simulated?
> >
> > Thanks,
> >
> > Tiansheng Zhang
>
> --
> Arthur Perais
> INRIA Bretagne Atlantique
> Bâtiment 12E, Bureau E303, Campus de Beaulieu
> 35042 Rennes, France
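
For reference, the weighted IPC calculation mentioned in the original message can be sketched as below. The thread does not say exactly how the weights were applied, so this shows two options. Because every simpoint interval covers the same number of instructions, total cycles are proportional to the weighted CPI, so the IPC of the whole run is the reciprocal of the weighted CPI; a direct weighted average of per-simpoint IPC values gives a higher number whenever the IPCs differ. Function names and the sample values are hypothetical.

```python
# Sketch: two ways to combine per-simpoint IPC values with simpoint weights.
# The IPC values and weights below are made up for illustration.

def weighted_ipc_arithmetic(ipcs, weights):
    """Weighted arithmetic mean of the per-simpoint IPC values."""
    total = sum(weights)
    return sum(w * ipc for ipc, w in zip(ipcs, weights)) / total


def weighted_ipc_via_cpi(ipcs, weights):
    """Reciprocal of the weighted mean CPI (weighted harmonic mean of IPC)."""
    total = sum(weights)
    weighted_cpi = sum(w / ipc for ipc, w in zip(ipcs, weights)) / total
    return 1.0 / weighted_cpi


ipcs = [0.2, 0.8, 1.0]      # hypothetical per-simpoint IPC
weights = [0.5, 0.3, 0.2]   # hypothetical simpoint weights

print("arithmetic:", weighted_ipc_arithmetic(ipcs, weights))
print("via CPI:   ", weighted_ipc_via_cpi(ipcs, weights))
```

The two results only agree when all simpoints have the same IPC, so it is worth checking which one a reported per-benchmark value corresponds to.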
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
