Tiansheng,

Here are my two cents. As far as I remember, the X86 implementation is fairly heavily microcoded (more than one uop for many x86 instructions*), so while IPC might be low, the number of uops committed each cycle might still be reasonable. That's one possible contributor to low IPC.
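To make the arithmetic concrete, here is a minimal sketch of how an instruction-level IPC relates to uop throughput. It assumes the pipeline widths are counted in uops, as the footnote suggests. The uops-per-instruction ratios used in the comments are hypothetical and would have to be derived from your own stats.

```python
def uops_per_cycle(ipc, uops_per_inst):
    """Convert instruction-level IPC into committed uops per cycle."""
    return ipc * uops_per_inst


def max_ipc(width, uops_per_inst):
    """Upper bound on IPC when a pipeline stage handles `width` uops per cycle."""
    return width / uops_per_inst


# Hypothetical ratio: if every instruction expanded into 2 uops,
# an IPC of 0.43 would correspond to uops_per_cycle(0.43, 2.0) uops per cycle.
# With 3 uops per instruction (the branch case from the footnote), a width of 3
# caps IPC at max_ipc(3, 3.0) instructions per cycle.
```

The real ratio is the committed uop count divided by the committed instruction count for the same simulation interval.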

Apart from this, you should check whether one structure stalls the pipeline very often (see the IQ/ROB/LSQFullEvents stats for Rename) and try to find out whether that is expected. Low IPC can also come from bad branch prediction or bad memory dependency prediction (see the memOrderViolation and conflictingLoads vs. forwardedLoads stats). Also check that your issueToExecute delay is a single cycle.
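A rough sketch of how one could pull those counters out of a stats.txt dump. It assumes the usual "name value # description" line layout. The keywords are substrings based on the stat names mentioned above, and the exact full names depend on the gem5 version, so treat them as guesses and adjust them to what your file contains.

```python
# Substrings to look for in stat names. These are assumptions based on the
# names mentioned in this message; "forwLoads" is included in case the
# forwarded-loads stat is spelled that way in your version.
KEYWORDS = ("FullEvents", "memOrderViolation", "conflictingLoads",
            "forwardedLoads", "forwLoads")


def find_stats(lines, keywords=KEYWORDS):
    """Return {stat_name: value} for stats whose name contains a keyword."""
    found = {}
    for line in lines:
        # Drop the trailing "# description" part, then split name and value.
        parts = line.split("#", 1)[0].split()
        if len(parts) < 2:
            continue
        name, value = parts[0], parts[1]
        if any(k.lower() in name.lower() for k in keywords):
            try:
                found[name] = float(value)
            except ValueError:
                pass  # not a plain numeric stat, skip it
    return found
```

Usage would be something like `find_stats(open("m5out/stats.txt"))`, where the path is only an example. If the file contains several stat dumps, later values overwrite earlier ones in this sketch.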

Hope it helps.

Arthur Perais.

*If I remember correctly, a single branch is three uops, so if you fetch a branch, you're done for that cycle with a fetch width of 3.

On 06/04/2016 18:59, Tiansheng Zhang wrote:
Hi,

I've been running X86 full system simulation using gem5 and SPEC benchmarks lately but observed very low IPC values. I used a 3-issue architecture, whose detailed parameters are listed below:
==============================================
Issue/Fetch/Rename/Decode/Commit Width | 3 / 3 / 3 / 3 / 3
ROB Entries                            | 84
LQ/SQ Entries                          | 32/32
Integer ALU/MultDiv                    | 3/1
FP ALU/MultDiv                         | 3/1
L1 I/D cache                           | 32KB each, 8 set assoc
L2 cache                               | 2MB, 8 set assoc
Memory                                 | DDR3 module
==============================================

I made the simpoints for the SPEC 2006 benchmarks with the training input set using gem5 and simulated each simpoint for 100M instructions. I then calculated the weighted IPC for each benchmark according to its simpoint weights, but the IPC values seem very low to me. For example, the average IPC is 0.43. Some other examples are listed below:

Benchmark    | bzip2 | h264ref | hmmer | mcf
Weighted IPC | 0.62  | 0.34    | 0.79  | 0.3
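For reference, a small sketch of two ways to combine per-simpoint results. The message does not say which one was used. Because each simpoint stands for an equal number of instructions, averaging CPI with the simpoint weights and inverting the result is generally the more faithful estimate, while a weighted arithmetic mean of IPC tends to come out higher. The function names are illustrative only.

```python
def weighted_mean_ipc(ipcs, weights):
    """Weighted arithmetic mean of per-simpoint IPC values."""
    total = sum(weights)
    return sum(w * ipc for w, ipc in zip(weights, ipcs)) / total


def ipc_from_weighted_cpi(ipcs, weights):
    """Weight CPI (1/IPC) per simpoint, then invert to get overall IPC."""
    total = sum(weights)
    cpi = sum(w / ipc for w, ipc in zip(weights, ipcs)) / total
    return 1.0 / cpi


# Hypothetical example: two simpoints with equal weights and IPCs 0.5 and 1.0.
# weighted_mean_ipc([0.5, 1.0], [0.5, 0.5]) gives 0.75, while
# ipc_from_weighted_cpi([0.5, 1.0], [0.5, 0.5]) gives 1 / 1.5.
```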

Since the architecture has an issue/commit width of 3, I would expect much higher IPC values than those listed above. Does this result look normal? Are there any possible performance bottlenecks within the architecture I simulated?

Thanks,

Tiansheng Zhang



_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users


--
Arthur Perais
INRIA Bretagne Atlantique
Bâtiment 12E, Bureau E303, Campus de Beaulieu
35042 Rennes, France
