------- Comment #4 from hjl dot tools at gmail dot com  2008-09-19 18:03 -------
(In reply to comment #1)
> Root cause is that instruction length of fused jcc is set to 16, which prevents
> the block from merging and copying. For some reason Core2 runs poorly with an
> unmerged branch block under certain circumstances.
> 

I looked at the generated code without TARGET_FUSE_CMP_AND_BRANCH.
In most cases, gcc doesn't put any instructions between TEST/CMP
and JCC and we get macro-fusion optimization automatically even
when TARGET_FUSE_CMP_AND_BRANCH is off.

Since TARGET_FUSE_CMP_AND_BRANCH generates patterns with an incorrect
instruction length, it prevents the block from being merged and copied,
which hurts performance.
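
To illustrate why an overstated length can block copying, here is a toy
model in C. It is not GCC source: the struct, the function names, the
8-byte budget and the 4-byte "real" length are all made up for the
example. Only the 16-byte figure comes from comment #1. It assumes a
pass that copies a block only when the summed insn lengths fit a size
budget.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not GCC internals: an insn with a length
   attribute in bytes. */
struct toy_insn {
  const char *name;
  int length;
};

/* Sum the length attributes of all insns in a block. */
static int
toy_block_length (const struct toy_insn *insns, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += insns[i].length;
  return total;
}

/* Assumed policy: a block may be copied only if its total length
   fits within max_bytes. */
static bool
toy_can_duplicate (const struct toy_insn *insns, size_t n, int max_bytes)
{
  return toy_block_length (insns, n) <= max_bytes;
}
```

With a budget of 8 bytes, a block made of a 2-byte insn plus a fused jcc
claimed to be 16 bytes is rejected, while the same block with the fused
jcc counted as 4 bytes is accepted.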

We have 2 choices:

1. Correct insn length for *jcc_fused_X patterns, which is what Joey's
patch does.
2. Remove *jcc_fused_X patterns and optimize macro-fusion in Core 2
scheduling.

I think we should do #2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37571
