On Wed, Jun 20, 2012 at 6:22 PM, William J. Schmidt <wschm...@linux.vnet.ibm.com> wrote: > On Wed, 2012-06-20 at 13:11 +0200, Richard Guenther wrote: >> On Thu, Jun 14, 2012 at 3:21 PM, William J. Schmidt >> <wschm...@linux.vnet.ibm.com> wrote: >> > Pro forma ping. :) >> >> ;) >> >> I notice (with all of these functions) >> >> +unsigned >> +negate_cost (enum machine_mode mode, bool speed) >> +{ >> + static unsigned costs[NUM_MACHINE_MODES]; >> + rtx seq; >> + unsigned cost; >> + >> + if (costs[mode]) >> + return costs[mode]; >> + >> + start_sequence (); >> + force_operand (gen_rtx_fmt_e (NEG, mode, >> + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)), >> + NULL_RTX); >> + seq = get_insns (); >> + end_sequence (); >> + >> + cost = seq_cost (seq, speed); >> + if (!cost) >> + cost = 1; >> >> that the cost[] array is independent on the speed argument. Thus whatever >> comes first determines the cost. Odd, and probably not good. A fix >> would be appreciated (even for the current code ...) - simply make the >> array costs[NUM_MACHINE_MODES][2]. >> >> As for the renaming - can you name the functions consistently? Thus >> the above would be negate_reg_cost? And maybe rename the other >> FIXME function, too? > > I agree with all this. I'll prepare all the cost model changes as a > separate preliminaries patch. > >> >> Index: gcc/tree-ssa-strength-reduction.c >> =================================================================== >> --- gcc/tree-ssa-strength-reduction.c (revision 0) >> +++ gcc/tree-ssa-strength-reduction.c (revision 0) >> @@ -0,0 +1,1611 @@ >> +/* Straight-line strength reduction. >> + Copyright (C) 2012 Free Software Foundation, Inc. >> >> I know we have these 'tree-ssa-' names, but really this is gimple-ssa now ;) >> So, please name it gimple-ssa-strength-reduction.c. > > Will do. Vive la revolution? ;) > >> >> + /* Access to the statement for subsequent modification. Cached to >> + save compile time. */ >> + gimple_stmt_iterator cand_gsi; >> >> this is a iterator for cand_stmt? Then caching it is no longer necessary >> as the iterator is the stmt itself after recent infrastructure changes. > > Oh yeah, I remember seeing that go by. Nice. Will change. > >> >> +/* Hash table embodying a mapping from statements to candidates. */ >> +static htab_t stmt_cand_map; >> ... >> +static hashval_t >> +stmt_cand_hash (const void *p) >> +{ >> + return htab_hash_pointer (((const_slsr_cand_t) p)->cand_stmt); >> +} >> >> use a pointer-map instead. >> >> +/* Callback to produce a hash value for a candidate chain header. */ >> + >> +static hashval_t >> +base_cand_hash (const void *p) >> +{ >> + tree ssa_name = ((const_cand_chain_t) p)->base_name; >> + >> + if (TREE_CODE (ssa_name) != SSA_NAME) >> + return (hashval_t) 0; >> + >> + return (hashval_t) SSA_NAME_VERSION (ssa_name); >> +} >> >> does it ever happen that ssa_name is not an SSA_NAME? > > Not in this patch, but when I introduce CAND_REF in a later patch it > could happen since the base field of a CAND_REF is a MEM_REF. It's a > safety valve in case of misuse. I'll think about this some more.
Hm. But then you produce a gigantic hash-collision for them ... >> I'm not sure >> the memory savings over simply using a fixed-size (num_ssa_names) >> array indexed by SSA_NAME_VERSION pointing to the chain is worth >> using a hashtable for this? > > That's reasonable. I'll do that. > >> >> + node = (cand_chain_t) pool_alloc (chain_pool); >> + node->base_name = c->base_name; >> >> If you never free pool entries it's more efficient to use an obstack. >> alloc-pool >> only pays off if you get freed item re-use. > > OK. I'll change both cand_pool and chain_pool to obstacks. > >> >> + switch (gimple_assign_rhs_code (gs)) >> + { >> + case MULT_EXPR: >> + rhs2 = gimple_assign_rhs2 (gs); >> + >> + if (TREE_CODE (rhs2) == INTEGER_CST) >> + return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed); >> + >> + if (TREE_CODE (rhs1) == INTEGER_CST) >> + return multiply_by_cost (TREE_INT_CST_LOW (rhs1), lhs_mode, speed); >> >> In theory all commutative statements should have constant operands only >> at rhs2 ... > > I'm glad I'm not the only one who thought that was the theory. ;) I > wasn't sure, and I've seen violations of this come up in practice. > Should I assert when that happens instead, and track down the offending > optimizations? Yes please. Good enough if you file bugreports for the cases you hit and in the end simply drop the assert (but do not optimize) in the patch. >> >> Also you do not verify that the constant fits in a host-wide-int - but maybe >> you do not care? Thus, I'd do >> >> if (host_integerp (rhs2, 0)) >> return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed); >> >> or make multiply_by[_const?]_cost take a double-int instead. Likewise below >> for add. > > Ok. Name change looks good also, I'll include that in the cost model > changes. > >> >> + case MODIFY_EXPR: >> + /* Be suspicious of assigning costs to copies that may well go away. >> */ >> + return 0; >> >> MODIFY_EXPR is never a gimple_assign_rhs_code. Simple copies have >> a code of SSA_NAME for example. But as you assert if you get to an >> unhandled code I wonder why you needed the above ... > > I'll remove this, and document that we are deliberately not touching > copies (which was my original intent). > >> >> +static slsr_cand_t >> +base_cand_from_table (tree base_in) >> +{ >> + slsr_cand mapping_key; >> + >> + gimple def = SSA_NAME_DEF_STMT (base_in); >> + if (!def) >> + return (slsr_cand_t) NULL; >> + >> + mapping_key.cand_stmt = def; >> + return (slsr_cand_t) htab_find (stmt_cand_map, &mapping_key); >> >> isn't that reachable via the base-name -> chain mapping for base_in? > > Maybe so. I need to think about it some more (it got evicted from my > mental cache). It could be I created this before adding the chains > stuff and never cleaned up. Ok. >> >> + if (TREE_CODE (rhs2) == SSA_NAME && operand_equal_p (rhs1, rhs2, 0)) >> + return; >> >> SSA_NAMEs can be compared by pointer equality, thus the above is >> equivalent to >> >> if (TREE_CODE (rhs2) == SSA_NAME && rhs1 == rhs2) >> >> or even just >> >> if (rhs1 == rhs2) >> >> applies elsewhere as well. >> > > Ok. I promise I'll remember this eventually... ;) >> +/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return >> + the multiple in *MULTIPLE. Otherwise return FALSE and leave *MULTIPLE >> + unchanged. */ >> +/* ??? - Should this be moved to double-int.c? */ >> >> I think so. > > Ok, I'll include this in the preliminaries patch. > >> >> +static bool >> +double_int_multiple_of (double_int product, double_int factor, >> + bool unsigned_p, double_int *multiple) >> +{ >> + double_int remainder; >> + double_int quotient = double_int_divmod (product, factor, unsigned_p, >> + TRUNC_DIV_EXPR, &remainder); >> + if (double_int_zero_p (remainder)) >> + { >> + *multiple = quotient; >> + return true; >> + } >> + >> + return false; >> +} >> >> >> In general I find a lot of code of similar structure, factoring bits may make >> it easier to read. Obvious candidates include: >> >> + /* Add the interpretation to the statement-candidate mapping. */ >> + slot = htab_find_slot (stmt_cand_map, c, INSERT); >> + gcc_assert (!*slot); >> + *slot = c; >> >> into a function add_cand_for_stmt () >> >> + if (TREE_CODE (rhs1) != SSA_NAME || TREE_CODE (rhs2) != INTEGER_CST) >> + return; >> >> doing some simple checks of this kind in the function walking all statements >> and pass in operands and operation code. > > Sounds good. You should have seen it before I already did one round of > factoring... :) > >> >> +/* Return TRUE if GS is a statement that defines an SSA name from >> + a NOP_EXPR and is legal for us to combine an add and multiply >> + across. This is legitimate for casts from a signed type to >> + a signed or unsigned type of the same or larger size. It is not >> + legitimate to convert any unsigned type to a signed type, or >> + to an unsigned type of a different size. >> + >> + The reasoning here is that signed integer overflow is undefined, >> + so any program that was expecting overflow that no longer occurs >> + is not correct. Unsigned integers, however, have wrap semantics, >> + and it is reasonable for programs to assume an overflowing add >> + will wrap. >> + >> + With -fwrapv, signed integers also have wrap semantics, so widening >> + casts are not allowed then either. */ >> + >> +static bool >> +legal_cast_p (gimple gs) >> +{ >> + tree lhs, rhs, lhs_type, rhs_type; >> + unsigned lhs_size, rhs_size, lhs_uns, rhs_uns; >> + >> + if (!is_gimple_assign (gs) >> + || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME >> >> That's always true if the code is NOP_EXPR >> >> + || gimple_assign_rhs_code (gs) != NOP_EXPR) >> >> CONVERT_EXPR_CODE_P () >> >> + return false; >> + >> + lhs = gimple_assign_lhs (gs); >> + rhs = gimple_assign_rhs1 (gs); >> + >> + if (TREE_CODE (rhs) != SSA_NAME) >> + return false; >> >> Likewise. >> >> + lhs_type = TREE_TYPE (SSA_NAME_VAR (lhs)); >> + rhs_type = TREE_TYPE (SSA_NAME_VAR (rhs)); >> >> Get the type from the SSA_NAMEs themselves, no need to lookup >> SSA_NAME_VAR. If it happens you ICE that way you are looking >> at released SSA names ... >> >> + lhs_size = TYPE_PRECISION (lhs_type); >> + rhs_size = TYPE_PRECISION (rhs_type); >> + lhs_uns = TYPE_UNSIGNED (lhs_type); >> + rhs_uns = TYPE_UNSIGNED (rhs_type); >> + >> + if (lhs_size < rhs_size >> + || (rhs_uns && !lhs_uns) >> + || (rhs_uns && lhs_uns && rhs_size != lhs_size) >> + || (!rhs_uns && flag_wrapv && rhs_size != lhs_size)) >> + return false; >> >> So we basically check whether the lhs can represent all values of the rhs? >> So ... >> >> if (lhs_size > rhs_size) >> return true; >> >> is missing? Because you'd reject a widening of an unsigned int to an >> unsigned long. > > That rejection is intentional. Assuming 4-byte int and 8-byte long, the > unsigned int would wrap at 32 bits, so allowing the widening could > change semantics. I've actually encountered problems like this all over > libiberty and other GCC support libraries. > >> >> As for your handling of flag_wrapv and the unsigned flag, please try >> to use the TYPE_OVERFLOW_UNDEFINED/TYPE_OVERFLOW_WRAPS >> type properties instead for the parts that care about overflow behavior >> and not value-representation. >> >> + return true; >> +} >> > > OK, I'll look into those and let you know if I have questions. > >> I have a deja-vu for seeing similar code elsewhere (widening shift/multiply >> support in the vector pattern recognizer maybe). While I may understand >> what the textual description says including an example would explain >> things better. For example >> >> "Return TRUE if GS is a statement that defines an SSA name from >> a conversion and is legal for us to associate with an add or multiply. >> Thus transform name = (T) name'; name * X; to name' * X" >> >> But I suppose I got the semantics wrong (and thus the example). Can you >> clarify? > > OK, I'll try. This was definitely the hardest part to get right. Of course ;) It's probably also important due to the all-present casts to sizetype for address offset computations. >> >> Otherwise the pass looks quite good. It looks though that it will optimize >> cross-basic-block strength-reduction opportunities, eventually causing >> cross-BB register livetime increases? Or is this not an issue? > > This is true, and it's made somewhat worse by later optimizations. I > attempted to reduce the scope where possible by choosing the most > closely dominating candidate as a basis, thinking this would help limit > lifetimes. However, later optimizations fold up the various adds so > that all candidates in a chain end up directly depending on the > "ultimate basis," resulting in one longer lifetime. > > It's something to keep an eye on. If it appears to be an issue, we may > need to look into some sort of distance heuristic to limit which > candidates can be used as a basis. It's not a big deal to keep a > register live across a single conditional branch, but reusing a value > from a control-equivalent block when there's an intervening loop is not > so good. (On the other hand, it's just another opportunity for > intelligent register spill design. :) > > As I indicated, though, I don't think this is a problem limited to this > pass, but is a general issue many places in the middle end. Of course. For this pass as you already have a 'cost' associated you could simply pessimize candidates from a different basic-block slightly. But I guess without a real testcase it would be just some magic ... Richard. >> >> Thanks for working on this, >> Richard. >> > > And thanks very much for the review! > > Bill >