https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83253

--- Comment #4 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
I think the issue may be in this code:

      /* For any other increment, if this is a multiply candidate, we           
         must introduce a temporary T and initialize it with                    
         T_0 = stride * increment.  When optimizing for speed, walk the         
         candidate tree to calculate the best cost reduction along any          
         path; if it offsets the fixed cost of inserting the initializer,       
         replacing the increment is profitable.  When optimizing for            
         size, instead calculate the total cost reduction from replacing        
         all candidates with this increment.  */
      else if (first_dep->kind == CAND_MULT)
        {
          int cost = mult_by_coeff_cost (incr, mode, speed);
          int repl_savings = mul_cost (speed, mode) - add_cost (speed, mode);
          if (speed)
            cost = lowest_cost_path (cost, repl_savings, first_dep,
                                     incr_vec[i].incr, COUNT_PHIS);
          else
            cost -= total_savings (repl_savings, first_dep, incr_vec[i].incr,
                                   COUNT_PHIS);

          incr_vec[i].cost = cost;
        }

Note that the cost of the newly introduced multiply (by 3) is equal to the
target's cost of multiplying by that specific constant.  But the gain from
replacing the instruction is the cost of a *generic* multiply less the cost of
an add.  This is only correct when the stride is unknown at compile time.  When
the stride is an integer, we should use mult_by_coeff_cost for that integer
rather than mul_cost.  For a power of 2, this should drive repl_savings to 0
for all targets and make the replacement by a non-power of 2 too expensive to
contemplate.

I'll throw together a patch for people to play with shortly.

Reply via email to