Re: [Patch ping] Strength reduction

Richard Guenther Thu, 21 Jun 2012 02:14:15 -0700

On Wed, Jun 20, 2012 at 6:22 PM, William J. Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> On Wed, 2012-06-20 at 13:11 +0200, Richard Guenther wrote:
>> On Thu, Jun 14, 2012 at 3:21 PM, William J. Schmidt
>> <wschm...@linux.vnet.ibm.com> wrote:
>> > Pro forma ping. :)
>>
>> ;)
>>
>> I notice (with all of these functions)
>>
>> +unsigned
>> +negate_cost (enum machine_mode mode, bool speed)
>> +{
>> +  static unsigned costs[NUM_MACHINE_MODES];
>> +  rtx seq;
>> +  unsigned cost;
>> +
>> +  if (costs[mode])
>> +    return costs[mode];
>> +
>> +  start_sequence ();
>> +  force_operand (gen_rtx_fmt_e (NEG, mode,
>> +                             gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
>> +              NULL_RTX);
>> +  seq = get_insns ();
>> +  end_sequence ();
>> +
>> +  cost = seq_cost (seq, speed);
>> +  if (!cost)
>> +    cost = 1;
>>
>> that the cost[] array is independent on the speed argument.  Thus whatever
>> comes first determines the cost.  Odd, and probably not good.  A fix
>> would be appreciated (even for the current code ...) - simply make the
>> array costs[NUM_MACHINE_MODES][2].
>>
>> As for the renaming - can you name the functions consistently?  Thus
>> the above would be negate_reg_cost?  And maybe rename the other
>> FIXME function, too?
>
> I agree with all this.  I'll prepare all the cost model changes as a
> separate preliminaries patch.
>
>>
>> Index: gcc/tree-ssa-strength-reduction.c
>> ===================================================================
>> --- gcc/tree-ssa-strength-reduction.c (revision 0)
>> +++ gcc/tree-ssa-strength-reduction.c (revision 0)
>> @@ -0,0 +1,1611 @@
>> +/* Straight-line strength reduction.
>> +   Copyright (C) 2012  Free Software Foundation, Inc.
>>
>> I know we have these 'tree-ssa-' names, but really this is gimple-ssa now ;)
>> So, please name it gimple-ssa-strength-reduction.c.
>
> Will do.  Vive la revolution? ;)
>
>>
>> +  /* Access to the statement for subsequent modification.  Cached to
>> +     save compile time.  */
>> +  gimple_stmt_iterator cand_gsi;
>>
>> this is a iterator for cand_stmt?  Then caching it is no longer necessary
>> as the iterator is the stmt itself after recent infrastructure changes.
>
> Oh yeah, I remember seeing that go by.  Nice.  Will change.
>
>>
>> +/* Hash table embodying a mapping from statements to candidates.  */
>> +static htab_t stmt_cand_map;
>> ...
>> +static hashval_t
>> +stmt_cand_hash (const void *p)
>> +{
>> +  return htab_hash_pointer (((const_slsr_cand_t) p)->cand_stmt);
>> +}
>>
>> use a pointer-map instead.
>>
>> +/* Callback to produce a hash value for a candidate chain header.  */
>> +
>> +static hashval_t
>> +base_cand_hash (const void *p)
>> +{
>> +  tree ssa_name = ((const_cand_chain_t) p)->base_name;
>> +
>> +  if (TREE_CODE (ssa_name) != SSA_NAME)
>> +    return (hashval_t) 0;
>> +
>> +  return (hashval_t) SSA_NAME_VERSION (ssa_name);
>> +}
>>
>> does it ever happen that ssa_name is not an SSA_NAME?
>
> Not in this patch, but when I introduce CAND_REF in a later patch it
> could happen since the base field of a CAND_REF is a MEM_REF.  It's a
> safety valve in case of misuse.  I'll think about this some more.


Hm.  But then you produce a gigantic hash-collision for them ...

>> I'm not sure
>> the memory savings over simply using a fixed-size (num_ssa_names)
>> array indexed by SSA_NAME_VERSION pointing to the chain is worth
>> using a hashtable for this?
>
> That's reasonable.  I'll do that.
>
>>
>> +  node = (cand_chain_t) pool_alloc (chain_pool);
>> +  node->base_name = c->base_name;
>>
>> If you never free pool entries it's more efficient to use an obstack.
>> alloc-pool
>> only pays off if you get freed item re-use.
>
> OK.  I'll change both cand_pool and chain_pool to obstacks.
>
>>
>> +  switch (gimple_assign_rhs_code (gs))
>> +    {
>> +    case MULT_EXPR:
>> +      rhs2 = gimple_assign_rhs2 (gs);
>> +
>> +      if (TREE_CODE (rhs2) == INTEGER_CST)
>> +     return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
>> +
>> +      if (TREE_CODE (rhs1) == INTEGER_CST)
>> +     return multiply_by_cost (TREE_INT_CST_LOW (rhs1), lhs_mode, speed);
>>
>> In theory all commutative statements should have constant operands only
>> at rhs2 ...
>
> I'm glad I'm not the only one who thought that was the theory. ;)  I
> wasn't sure, and I've seen violations of this come up in practice.
> Should I assert when that happens instead, and track down the offending
> optimizations?

Yes please.  Good enough if you file bugreports for the cases you hit
and in the end simply drop the assert (but do not optimize) in the patch.

>>
>> Also you do not verify that the constant fits in a host-wide-int - but maybe
>> you do not care?  Thus, I'd do
>>
>>    if (host_integerp (rhs2, 0))
>>      return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
>>
>> or make multiply_by[_const?]_cost take a double-int instead.  Likewise below
>> for add.
>
> Ok.  Name change looks good also, I'll include that in the cost model
> changes.
>
>>
>> +    case MODIFY_EXPR:
>> +      /* Be suspicious of assigning costs to copies that may well go away.  
>> */
>> +      return 0;
>>
>> MODIFY_EXPR is never a gimple_assign_rhs_code.  Simple copies have
>> a code of SSA_NAME for example.  But as you assert if you get to an
>> unhandled code I wonder why you needed the above ...
>
> I'll remove this, and document that we are deliberately not touching
> copies (which was my original intent).
>
>>
>> +static slsr_cand_t
>> +base_cand_from_table (tree base_in)
>> +{
>> +  slsr_cand mapping_key;
>> +
>> +  gimple def = SSA_NAME_DEF_STMT (base_in);
>> +  if (!def)
>> +    return (slsr_cand_t) NULL;
>> +
>> +  mapping_key.cand_stmt = def;
>> +  return (slsr_cand_t) htab_find (stmt_cand_map, &mapping_key);
>>
>> isn't that reachable via the base-name -> chain mapping for base_in?
>
> Maybe so.  I need to think about it some more (it got evicted from my
> mental cache).  It could be I created this before adding the chains
> stuff and never cleaned up.

Ok.

>>
>> +  if (TREE_CODE (rhs2) == SSA_NAME && operand_equal_p (rhs1, rhs2, 0))
>> +    return;
>>
>> SSA_NAMEs can be compared by pointer equality, thus the above is
>> equivalent to
>>
>>   if (TREE_CODE (rhs2) == SSA_NAME && rhs1 == rhs2)
>>
>> or even just
>>
>>   if (rhs1 == rhs2)
>>
>> applies elsewhere as well.
>>
>
> Ok.  I promise I'll remember this eventually...

;)

>> +/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
>> +   the multiple in *MULTIPLE.  Otherwise return FALSE and leave *MULTIPLE
>> +   unchanged.  */
>> +/* ??? - Should this be moved to double-int.c?  */
>>
>> I think so.
>
> Ok, I'll include this in the preliminaries patch.
>
>>
>> +static bool
>> +double_int_multiple_of (double_int product, double_int factor,
>> +                     bool unsigned_p, double_int *multiple)
>> +{
>> +  double_int remainder;
>> +  double_int quotient = double_int_divmod (product, factor, unsigned_p,
>> +                                        TRUNC_DIV_EXPR, &remainder);
>> +  if (double_int_zero_p (remainder))
>> +    {
>> +      *multiple = quotient;
>> +      return true;
>> +    }
>> +
>> +  return false;
>> +}
>>
>>
>> In general I find a lot of code of similar structure, factoring bits may make
>> it easier to read.  Obvious candidates include:
>>
>> +      /* Add the interpretation to the statement-candidate mapping.  */
>> +      slot = htab_find_slot (stmt_cand_map, c, INSERT);
>> +      gcc_assert (!*slot);
>> +      *slot = c;
>>
>> into a function add_cand_for_stmt ()
>>
>> +      if (TREE_CODE (rhs1) != SSA_NAME || TREE_CODE (rhs2) != INTEGER_CST)
>> +     return;
>>
>> doing some simple checks of this kind in the function walking all statements
>> and pass in operands and operation code.
>
> Sounds good.  You should have seen it before I already did one round of
> factoring... :)
>
>>
>> +/* Return TRUE if GS is a statement that defines an SSA name from
>> +   a NOP_EXPR and is legal for us to combine an add and multiply
>> +   across.  This is legitimate for casts from a signed type to
>> +   a signed or unsigned type of the same or larger size.  It is not
>> +   legitimate to convert any unsigned type to a signed type, or
>> +   to an unsigned type of a different size.
>> +
>> +   The reasoning here is that signed integer overflow is undefined,
>> +   so any program that was expecting overflow that no longer occurs
>> +   is not correct.  Unsigned integers, however, have wrap semantics,
>> +   and it is reasonable for programs to assume an overflowing add
>> +   will wrap.
>> +
>> +   With -fwrapv, signed integers also have wrap semantics, so widening
>> +   casts are not allowed then either.  */
>> +
>> +static bool
>> +legal_cast_p (gimple gs)
>> +{
>> +  tree lhs, rhs, lhs_type, rhs_type;
>> +  unsigned lhs_size, rhs_size, lhs_uns, rhs_uns;
>> +
>> +  if (!is_gimple_assign (gs)
>> +      || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
>>
>> That's always true if the code is NOP_EXPR
>>
>> +      || gimple_assign_rhs_code (gs) != NOP_EXPR)
>>
>> CONVERT_EXPR_CODE_P ()
>>
>> +    return false;
>> +
>> +  lhs = gimple_assign_lhs (gs);
>> +  rhs = gimple_assign_rhs1 (gs);
>> +
>> +  if (TREE_CODE (rhs) != SSA_NAME)
>> +    return false;
>>
>> Likewise.
>>
>> +  lhs_type = TREE_TYPE (SSA_NAME_VAR (lhs));
>> +  rhs_type = TREE_TYPE (SSA_NAME_VAR (rhs));
>>
>> Get the type from the SSA_NAMEs themselves, no need to lookup
>> SSA_NAME_VAR.  If it happens you ICE that way you are looking
>> at released SSA names ...
>>
>> +  lhs_size = TYPE_PRECISION (lhs_type);
>> +  rhs_size = TYPE_PRECISION (rhs_type);
>> +  lhs_uns = TYPE_UNSIGNED (lhs_type);
>> +  rhs_uns = TYPE_UNSIGNED (rhs_type);
>> +
>> +  if (lhs_size < rhs_size
>> +      || (rhs_uns && !lhs_uns)
>> +      || (rhs_uns && lhs_uns && rhs_size != lhs_size)
>> +      || (!rhs_uns && flag_wrapv && rhs_size != lhs_size))
>> +    return false;
>>
>> So we basically check whether the lhs can represent all values of the rhs?
>> So ...
>>
>>     if (lhs_size > rhs_size)
>>       return true;
>>
>> is missing?  Because you'd reject a widening of an unsigned int to an
>> unsigned long.
>
> That rejection is intentional.  Assuming 4-byte int and 8-byte long, the
> unsigned int would wrap at 32 bits, so allowing the widening could
> change semantics.  I've actually encountered problems like this all over
> libiberty and other GCC support libraries.
>
>>
>> As for your handling of flag_wrapv and the unsigned flag, please try
>> to use the TYPE_OVERFLOW_UNDEFINED/TYPE_OVERFLOW_WRAPS
>> type properties instead for the parts that care about overflow behavior
>> and not value-representation.
>>
>> +  return true;
>> +}
>>
>
> OK, I'll look into those and let you know if I have questions.
>
>> I have a deja-vu for seeing similar code elsewhere (widening shift/multiply
>> support in the vector pattern recognizer maybe).  While I may understand
>> what the textual description says including an example would explain
>> things better.  For example
>>
>> "Return TRUE if GS is a statement that defines an SSA name from
>>  a conversion and is legal for us to associate with an add or multiply.
>>  Thus transform name = (T) name'; name * X; to name' * X"
>>
>> But I suppose I got the semantics wrong (and thus the example).  Can you
>> clarify?
>
> OK, I'll try.  This was definitely the hardest part to get right.

Of course ;)  It's probably also important due to the all-present casts to
sizetype for address offset computations.

>>
>> Otherwise the pass looks quite good.  It looks though that it will optimize
>> cross-basic-block strength-reduction opportunities, eventually causing
>> cross-BB register livetime increases?  Or is this not an issue?
>
> This is true, and it's made somewhat worse by later optimizations.  I
> attempted to reduce the scope where possible by choosing the most
> closely dominating candidate as a basis, thinking this would help limit
> lifetimes.  However, later optimizations fold up the various adds so
> that all candidates in a chain end up directly depending on the
> "ultimate basis," resulting in one longer lifetime.
>
> It's something to keep an eye on.  If it appears to be an issue, we may
> need to look into some sort of distance heuristic to limit which
> candidates can be used as a basis.  It's not a big deal to keep a
> register live across a single conditional branch, but reusing a value
> from a control-equivalent block when there's an intervening loop is not
> so good.  (On the other hand, it's just another opportunity for
> intelligent register spill design. :)
>
> As I indicated, though, I don't think this is a problem limited to this
> pass, but is a general issue many places in the middle end.

Of course.  For this pass as you already have a 'cost' associated you
could simply pessimize candidates from a different basic-block slightly.

But I guess without a real testcase it would be just some magic ...

Richard.

>>
>> Thanks for working on this,
>> Richard.
>>
>
> And thanks very much for the review!
>
> Bill
>

Re: [Patch ping] Strength reduction

Reply via email to