https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83253
--- Comment #4 from Bill Schmidt <wschmidt at gcc dot gnu.org> --- I think the issue may be in this code: /* For any other increment, if this is a multiply candidate, we must introduce a temporary T and initialize it with T_0 = stride * increment. When optimizing for speed, walk the candidate tree to calculate the best cost reduction along any path; if it offsets the fixed cost of inserting the initializer, replacing the increment is profitable. When optimizing for size, instead calculate the total cost reduction from replacing all candidates with this increment. */ else if (first_dep->kind == CAND_MULT) { int cost = mult_by_coeff_cost (incr, mode, speed); int repl_savings = mul_cost (speed, mode) - add_cost (speed, mode); if (speed) cost = lowest_cost_path (cost, repl_savings, first_dep, incr_vec[i].incr, COUNT_PHIS); else cost -= total_savings (repl_savings, first_dep, incr_vec[i].incr, COUNT_PHIS); incr_vec[i].cost = cost; } Note that the cost of the newly introduced multiply (by 3) is equal to the target's cost of multiplying by that specific constant. But the gain from replacing the instruction is the cost of a *generic* multiply less the cost of an add. This is only correct when the stride is unknown at compile time. When the stride is an integer, we should use mult_by_coeff_cost for that integer rather than mul_cost. For a power of 2, this should drive repl_savings to 0 for all targets and make the replacement by a non-power of 2 too expensive to contemplate. I'll throw together a patch for people to play with shortly.