http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56878
Bug #: 56878
Summary: Issue with candidate choice in vect_gen_niters_for_prolog_loop.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ysrum...@gmail.com

We found a 7% performance drop on 482.sphinx3 from SPEC2006 with -march=corei7 and -mavx, which appeared after the fix in r196872. The problem can be reproduced with the attached testcase.

After r196872, vect_gen_niters_for_prolog_loop() uses a non-invariant pointer (v1) to calculate the number of prolog iterations, whereas before the fix it used an invariant pointer (x), so all of these computations could be hoisted out of the outer loop.

Before the fix:

  <bb 6>:
  niters.3_17 = (unsigned int) len_7;
  vect_px.4_4 = x_24(D);
  _119 = (unsigned long) vect_px.4_4;
  _118 = _119 & 31;
  _117 = _118 >> 2;
  _116 = -_117;
  _115 = (unsigned int) _116;
  _114 = _115 & 7;
  prolog_loop_niters.5_52 = MIN_EXPR <niters.3_17, _114>;

Note that all of these assignments can be hoisted out of the loop.

After the fix:

  <bb 6>:
  niters.3_17 = (unsigned int) len_7;
  vect_pv1.4_4 = v1_16;
  _119 = (unsigned long) vect_pv1.4_4;

where v1 is not loop invariant. If the trip count of the outer loop is huge and the trip count of the inner loop is small, losing this code motion can hurt performance dramatically.

To reproduce, compile the attached test on x86 with the following options:

  -O3 -funroll-loops -ffast-math -march=corei7 -mavx
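For readers who do not want to trace the GIMPLE by hand, the prolog iteration count computed in the "before" dump can be written out in C roughly as follows. This is only a sketch of the arithmetic shown in the dump, not the attached testcase and not GCC's own code. The function name prolog_niters is hypothetical, and the interpretation of the constants (32-byte vector alignment, 4-byte elements, 8 elements per vector) is an assumption inferred from the masks 31 and 7 and the shift by 2.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the statements in <bb 6>.
   addr   : the address the vectorizer wants to align (x or v1 in the report)
   niters : the scalar trip count of the inner loop (niters.3_17)
   Assumed from the constants in the dump: 32-byte alignment, 4-byte elements. */
static unsigned prolog_niters(uintptr_t addr, unsigned niters)
{
    uintptr_t misalign_bytes = addr & 31;           /* _118 = _119 & 31 */
    uintptr_t misalign_elems = misalign_bytes >> 2; /* _117 = _118 >> 2 */
    /* _116 = -_117; _115 = (unsigned int) _116; _114 = _115 & 7
       Unsigned negation wraps, so this yields the number of elements
       left until the next 32-byte boundary (0 if already aligned). */
    unsigned peel = (unsigned)(-misalign_elems) & 7;
    /* prolog_loop_niters = MIN_EXPR <niters, peel> */
    return niters < peel ? niters : peel;
}
```

The result depends only on addr and niters. When addr comes from a pointer that does not change in the outer loop (x), the whole address-dependent part can be computed once before the outer loop; when it comes from v1, which changes on every outer iteration, it has to be recomputed each time the inner loop is entered.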