On 4 August 2016 at 13:31, Richard Biener <rguent...@suse.de> wrote:
> On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:
>
>> Hi,
>> This is a prototype patch for propagating known/unknown bits
>> inter-procedurally for integral types, using the info obtained from
>> get_nonzero_bits ().
>>
>> The patch required making the following changes:
>> a) To make the info from get_nonzero_bits () available to IPA, I had to
>> remove the guard !nonzero_p in ccp_finalize.  However, that triggered
>> the following ICE in get_ptr_info () for default_none.f95 (and several
>> other Fortran tests) with options: -fopenacc -O2
>> ICE: http://pastebin.com/KjD7HMQi
>> I confirmed with Richard that this was a latent issue.
>
> Can you please bootstrap/test the fix for this separately?  (doesn't
> seem to be included in this patch btw)
Well, I don't have the fix available ;-)
>
>> b) I chose widest_int for representing value and mask in
>> ipcp_bits_lattice, and correspondingly changed the declarations of
>> bit_value_unop_1/bit_value_binop_1 to take precision and sign instead
>> of type (those are the only two fields that were used).  Both these
>> functions are exported by tree-ssa-ccp.h.
>> I hope that's OK?
>
> That's ok, but please change the functions to overloads of
> bit_value_binop / bit_value_unop to not export ugly _1 names.
>
> -  signop sgn = TYPE_SIGN (type);
> -  int width = TYPE_PRECISION (type);
> +  signop sgn = type_sgn;
> +  int width = (int) type_precision;
>
> please adjust the parameter names to get rid of those now unnecessary
> locals (and make the precision parameter an 'int').
>
>> c) Changed streamer_read_wi/streamer_write_wi to non-static.
>> Ah, I see Kugan has submitted a patch for this, so I will drop this hunk.
>
> But he streams wide_int, not widest_int.  I followed up on his
> patch.
Oops, I got confused, sorry about that.
>
>> d) We have the following in tree-ssa-ccp.c:get_default_value ():
>>
>>   if (flag_tree_bit_ccp)
>>     {
>>       wide_int nonzero_bits = get_nonzero_bits (var);
>>       if (nonzero_bits != -1)
>>         {
>>           val.lattice_val = CONSTANT;
>>           val.value = build_zero_cst (TREE_TYPE (var));
>>           val.mask = extend_mask (nonzero_bits);
>>         }
>>
>> extend_mask () sets all upper bits to 1 in nonzero_bits, i.e., varying
>> in terms of bit-CCP.
>> I suppose in tree-ccp we need to extend the mask if var is a parameter,
>> since we don't know in advance what values it will receive from
>> different callers, and mark all upper bits as 1 to be safe.
>
> Not sure, it seems to me that we can zero-extend for unsigned types
> and sign-extend for signed types (if the "sign"-bit of nonzero_bits
> is one it properly makes higher bits undefined).  Can you change
> the code accordingly?  (simply give extend_mask a sign-op and use
> that appropriately?)  Please split out this change so it can be
> tested separately.
>
>> However, I suppose with IPA we can determine exactly which bits of a
>> parameter are constant, so setting all upper bits to 1 will become
>> unnecessary?
>>
>> For example, consider the following artificial test-case:
>>
>>   int f(int x)
>>   {
>>     if (x > 300)
>>       return 1;
>>     else
>>       return 2;
>>   }
>>
>>   int main(int argc, char **argv)
>>   {
>>     return f(argc & 0xc) + f(argc & 0x3);
>>   }
>>
>> For x, the mask would be the meet of:
>>   <0, 0xc> meet <0, 0x3> == (0x3 | 0xc) | (0 ^ 0) == 0xf
>> and ipcp_update_bits () sets nonzero_bits for x to 0xf.
>> However, get_default_value then calls extend_mask (0xf), resulting in
>> all upper bits being set to 1, and consequently the condition
>> if (x > 300) doesn't get folded.
>
> But then why would the code trying to optimize the comparison look at
> bits that are outside of the precision?  (where do we try to use this
> info?  I see that VRP misses to use nonzero bits if no range info
> is present - I suppose set_nonzero_bits misses to eventually adjust
> the range.
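(As a side note, the meet computation quoted above can be sketched in a few lines. This is only an illustration, using uint64_t in place of widest_int; the struct and function names here are hypothetical, not GCC's:)

```cpp
#include <cassert>
#include <cstdint>

// A (value, mask) pair as in CCP's bit lattice.  A bit is known iff the
// corresponding mask bit is 0; where known, it equals the value bit.
struct bits_lattice
{
  uint64_t value; // known bit values (meaningful where mask bit is 0)
  uint64_t mask;  // 1 = bit unknown, 0 = bit known
};

// Meet of two lattice values: a bit stays known only if it is known and
// equal on both sides; everything else becomes unknown.
static bits_lattice
bits_meet (bits_lattice a, bits_lattice b)
{
  bits_lattice r;
  r.mask = a.mask | b.mask | (a.value ^ b.value);
  r.value = a.value & ~r.mask; // canonicalize: unknown bits read as 0
  return r;
}
```

With the two call sites above, meeting <0, 0xc> and <0, 0x3> gives mask (0x3 | 0xc) | (0 ^ 0) == 0xf, value 0, matching what ipcp_update_bits () is described to record.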
>
> That said, where is the folding code and why does it care for those
> "uninteresting" bits at all?
Well, there is the following in bit_value_binop_1 for case LT_EXPR / LE_EXPR:

  /* If the most significant bits are not known we know nothing.  */
  if (wi::neg_p (o1mask) || wi::neg_p (o2mask))
    break;
IIUC, extend_mask extends all upper bits to 1, so we hit the break and thus
don't perform the folding.  The ccp2 dump shows:

  Folding statement: if (x_2(D) > 300)
  which is likely CONSTANT
  Not folded

Instead, if we extend based on the signop, the condition gets folded
correctly:

  Folding statement: if (x_2(D) > 300)
  which is likely CONSTANT
  Folding predicate x_2(D) > 300 to 0
  gimple_simplified to if (0 != 0)
  Folded into: if (0 != 0)

I thought it was unsafe for CCP to extend based on the signop, so I guarded
that on DECL_SET_BY_IPA.  I will try to change extend_mask to extend the
mask based on the signop and get rid of the flag.

I will address your other comments in a follow-up patch.

Thanks,
Prathamesh
>
>> To resolve this, I added a new flag "set_by_ipa" to decl_common,
>> which is set to true if the mask of a parameter is determined by
>> IPA-CP, and the condition changes to:
>>
>>   if (SSA_NAME_VAR (var)
>>       && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
>>       && DECL_SET_BY_IPA (SSA_NAME_VAR (var)))
>>     val.mask
>>       = widest_int::from (nonzero_bits,
>>                           TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var))));
>>   else
>>     val.mask = extend_mask (nonzero_bits);
>>
>> I am not sure if adding a new flag to decl_common is a good idea.  How
>> do other IPA passes deal with this or similar issues?
>>
>> I suppose we would want to gate this on some flag, say -fipa-bit-cp?
>> I haven't yet gated it on the flag; I will do so in the next version
>> of the patch.
>> I have added some very simple test-cases; I will try to add more
>> meaningful ones.
>
> See above - we should avoid needing this.
>
>> The patch passes bootstrap+test on x86_64-unknown-linux-gnu and was
>> cross-tested on arm*-*-* and aarch64*-*-*, with the exception of some
>> Fortran tests failing due to the above ICE.
>>
>> As next steps, I am planning to extend it to handle alignment
>> propagation, and to do further testing (LTO bootstrap, Chromium).
>> I would be grateful for feedback on the current patch.
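(For reference, the signop-based extension Richard suggests could look roughly like this. Again only a model: uint64_t stands in for widest_int, 'prec' for the type's precision, and the signature is invented for illustration:)

```cpp
#include <cassert>
#include <cstdint>

// Model of the proposed extend_mask change.  Unsigned types: zero-extend
// nonzero_bits, so bits above the precision stay known-zero.  Signed
// types: sign-extend, so an unknown sign bit (bit prec-1 set in the
// mask) makes all higher bits unknown as well.
static uint64_t
extend_mask (uint64_t nonzero_bits, unsigned prec, bool is_signed)
{
  uint64_t sign_bit = uint64_t (1) << (prec - 1);
  if (is_signed && (nonzero_bits & sign_bit))
    return nonzero_bits | ~(sign_bit - 1); // sign-extend the mask
  return nonzero_bits; // zero-extend: upper bits remain known zero
}
```

For the example above, the mask 0xf of the 32-bit signed x has a known-zero sign bit, so it stays 0xf after extension and the upper bits remain known, which is exactly what lets x > 300 fold; the old behaviour corresponds to unconditionally setting all the upper mask bits.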
>
> I see you do
>
> @@ -895,7 +899,7 @@ do_dbg_cnt (void)
>      Return TRUE when something was optimized.  */
>
>  static bool
> -ccp_finalize (bool nonzero_p)
> +ccp_finalize (bool nonzero_p ATTRIBUTE_UNUSED)
>  {
>    bool something_changed;
>    unsigned i;
> @@ -913,10 +917,7 @@ ccp_finalize (bool nonzero_p)
>
>        if (!name
>            || (!POINTER_TYPE_P (TREE_TYPE (name))
> -              && (!INTEGRAL_TYPE_P (TREE_TYPE (name))
> -                  /* Don't record nonzero bits before IPA to avoid
> -                     using too much memory.  */
> -                  || !nonzero_p)))
> +              && (!INTEGRAL_TYPE_P (TREE_TYPE (name)))))
>          continue;
>
> can you instead adjust the caller to do sth like
>
>   if (ccp_finalize (nonzero_p || flag_ipa_cp))
>     {
>
> ?  What we miss to optimize memory usage in the early CCP case
> (it's run very early, before dead code elimination) is to
> avoid setting alignment / nonzero bits for the case of
> fully propagatable (and thus dead after substitute_and_fold)
> SSA names.
>
> So in ccp_finalize do sth like
>
>   val = get_value (name);
>   if (val->lattice_val != CONSTANT
>       || TREE_CODE (val->value) != INTEGER_CST
>       || val->mask == 0)
>     continue;
>
> That should cut down early CCP memory use in case of nonzero
> setting significantly.
>
> I didn't look at the propagation part, but eventually the IPA-CP
> lattice gets quite big.  Also, the alignment lattice is very
> similar to the bits lattice, so why not merge those two?  But
> in the end it's Martin's/Honza's call here.  Note there is
> trailing_wide_ints <> which could be used to improve memory usage
> based on the underlying type.
>
> Thanks,
> Richard.
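(The filter Richard sketches can be modelled outside GCC as follows; the enum, struct and predicate are simplified stand-ins, not the real tree-ssa-ccp.c types:)

```cpp
#include <cassert>
#include <cstdint>

// Model of the suggested ccp_finalize filter: record nonzero-bits info
// only for names whose lattice value is a partially known integer
// constant (mask != 0).  Fully known constants are propagated and die
// after substitute_and_fold, so storing bits for them wastes memory.
enum lattice { UNDEFINED, CONSTANT, VARYING };

struct prop_value
{
  lattice lattice_val;
  bool value_is_integer_cst; // stands in for TREE_CODE (value) == INTEGER_CST
  uint64_t mask;             // unknown bits; 0 = fully known constant
};

static bool
should_record_nonzero_bits (const prop_value &val)
{
  // The inverse of the quoted "continue" condition.
  return val.lattice_val == CONSTANT
         && val.value_is_integer_cst
         && val.mask != 0;
}
```

So a fully propagatable constant (mask == 0) and a varying name are both skipped, and only partially known constants, the only case where nonzero-bits info pays off, get recorded.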