On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote: > On 4 August 2016 at 13:31, Richard Biener <rguent...@suse.de> wrote: > > On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote: > > > >> Hi, > >> This is a prototype patch for propagating known/unknown bits > >> inter-procedurally. > >> for integral types which propagates info obtained from get_nonzero_bits (). > >> > >> Patch required making following changes: > >> a) To make info from get_nonzero_bits() available to ipa, I had to remove > >> guard !nonzero_p in ccp_finalize. However that triggered the following ICE > >> in get_ptr_info() for default_none.f95 (and several other fortran tests) > >> with options: -fopenacc -O2 > >> ICE: http://pastebin.com/KjD7HMQi > >> I confirmed with Richard that this was a latent issue. > > > > Can you plase bootstrap/test the fix for this separately? (doesn't > > seem to be included in this patch btw) > Well I don't have the fix available -;)
Oh, I thought it was obvious: Index: gcc/tree-inline.c =================================================================== --- gcc/tree-inline.c (revision 239117) +++ gcc/tree-inline.c (working copy) @@ -242,7 +242,8 @@ remap_ssa_name (tree name, copy_body_dat SSA_NAME_OCCURS_IN_ABNORMAL_PHI (new_tree) = SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name); /* At least IPA points-to info can be directly transferred. */ - if (id->src_cfun->gimple_df + if (POINTER_TYPE_P (TREE_TYPE (name)) + && id->src_cfun->gimple_df && id->src_cfun->gimple_df->ipa_pta && (pi = SSA_NAME_PTR_INFO (name)) && !pi->pt.anything) @@ -274,7 +275,8 @@ remap_ssa_name (tree name, copy_body_dat SSA_NAME_OCCURS_IN_ABNORMAL_PHI (new_tree) = SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name); /* At least IPA points-to info can be directly transferred. */ - if (id->src_cfun->gimple_df + if (POINTER_TYPE_P (TREE_TYPE (name)) + && id->src_cfun->gimple_df && id->src_cfun->gimple_df->ipa_pta && (pi = SSA_NAME_PTR_INFO (name)) && !pi->pt.anything) similarly range info could be transfered of course. > > > >> b) I chose widest_int for representing value, mask in ipcp_bits_lattice > >> and correspondingly changed declarations for > >> bit_value_unop_1/bit_value_binop_1 to take > >> precision and sign instead of type (those are the only two fields that > >> were used). Both these functions are exported by tree-ssa-ccp.h > >> I hope that's ok ? > > > > That's ok, but please change the functions to overloads of > > bit_value_binop / bit_value_unop to not export ugly _1 names. > > > > - signop sgn = TYPE_SIGN (type); > > - int width = TYPE_PRECISION (type); > > + signop sgn = type_sgn; > > + int width = (int) type_precision; > > > > please adjust parameter names to get rid of those now unnecessary > > locals (and make the precision parameter an 'int'). > > > >> c) Changed streamer_read_wi/streamer_write_wi to non-static. > >> Ah I see Kugan has submitted a patch for this, so I will drop this hunk. > > > > But he streams wide_int, not widest_int. I followed up on his > > patch. > Oops, I got confused, sorry about that. > > > >> d) We have following in tree-ssa-ccp.c:get_default_value (): > >> if (flag_tree_bit_ccp) > >> { > >> wide_int nonzero_bits = get_nonzero_bits (var); > >> if (nonzero_bits != -1) > >> { > >> val.lattice_val = CONSTANT; > >> val.value = build_zero_cst (TREE_TYPE (var)); > >> val.mask = extend_mask (nonzero_bits); > >> } > >> > >> extend_mask() sets all upper bits to 1 in nonzero_bits, ie, varying > >> in terms of bit-ccp. > >> I suppose in tree-ccp we need to extend mask if var is parameter since we > >> don't > >> know in advance what values it will receive from different callers and > >> mark all > >> upper bits as 1 to be safe. > > > > Not sure, it seems to me that we can zero-extend for unsigned types > > and sign-extend for signed types (if the "sign"-bit of nonzero_bits > > is one it properly makes higher bits undefined). Can you change > > the code accordingly? (simply give extend_mask a sign-op and use > > that appropriately?) Please split out this change so it can be > > tested separately. > > > >> However I suppose with ipa, we can determine exactly which bits of > >> parameter are constant and > >> setting all upper bits to 1 will become unnecessary ? > >> > >> For example, consider following artificial test-case: > >> int f(int x) > >> { > >> if (x > 300) > >> return 1; > >> else > >> return 2; > >> } > >> > >> int main(int argc, char **argv) > >> { > >> return f(argc & 0xc) + f (argc & 0x3); > >> } > >> > >> For x, the mask would be meet of: > >> <0, 0xc> meet <0, 0x3> == (0x3 | 0xc) | (0 ^ 0) == 0xf > >> and ipcp_update_bits() sets nonzero_bits for x to 0xf. > >> However get_default_value then calls extend_mask (0xf), resulting in > >> all upper bits > >> being set to 1 and consequently the condition if (x > 300) doesn't get > >> folded. > > > > But then why would the code trying to optimize the comparison look at > > bits that are outside of the precision? (where do we try to use this > > info? I see that VRP misses to use nonzero bits if no range info > > is present - I suppose set_nonzero_bits misses to eventually adjust > > the range. > > > > That said, where is the folding code and why does it care for those > > "uninteresting" bits at all? > Well there is following in bit_value_binop_1 for case LT_EXPR / LE_EXPR: > /* If the most significant bits are not known we know nothing. */ > if (wi::neg_p (o1mask) || wi::neg_p (o2mask)) > break; > > IIUC extend_mask extends all upper bits to 1, and we hit break and > thus not perform folding. Yeah, this should simply test _the_ most significant bit (that is, bit number TYPE_PRECISION (r1type)). But it should be fixed by properly extending the mask. > ccp2 dump shows: > Folding statement: if (x_2(D) > 300) > which is likely CONSTANT > Not folded > > Instead if we extend based on signop, then the condition gets folded > correctly: > Folding statement: if (x_2(D) > 300) > which is likely CONSTANT > Folding predicate x_2(D) > 300 to 0 > gimple_simplified to if (0 != 0) > Folded into: if (0 != 0) > > I thought it was unsafe for ccp to extend based on sign-op, > so I guarded that on DECL_SET_BY_IPA. > I will try to change extend_mask to extend the mask based on signop > and get rid of the flag. > > I will address your other comments in follow-up patch. Thanks, Richard. > Thanks, > Prathamesh > > > >> To resolve this, I added a new flag "set_by_ipa" to decl_common, > >> which is set to true if the mask of parameter is determined by ipa-cp, > >> and the condition changes to: > >> > >> if (SSA_NAME_VAR (var) > >> && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL > >> && DECL_SET_BY_IPA (SSA_NAME_VAR (var)) > >> val.mask = widest_int::from (nonzero_bits, > >> TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var))); > >> else > >> val.mask = extend_mask (nonzero_bits); > >> > >> I am not sure if adding a new flag to decl_common is a good idea. How > >> do other ipa passes deal with this/similar issue ? > >> > >> I suppose we would want to gate this on some flag, say -fipa-bit-cp ? > >> I haven't yet gated it on the flag, will do in next version of patch. > >> I have added some very simple test-cases, I will try to add more > >> meaningful ones. > > > > See above - we should avoid needing this. > > > >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu > >> and cross-tested on arm*-*-* and aarch64*-*-* with the exception > >> of some fortran tests failing due to above ICE. > >> > >> As next steps, I am planning to extend it to handle alignment propagation, > >> and do further testing (lto-bootstrap, chromium). > >> I would be grateful for feedback on the current patch. > > > > I see you do > > > > @@ -895,7 +899,7 @@ do_dbg_cnt (void) > > Return TRUE when something was optimized. */ > > > > static bool > > -ccp_finalize (bool nonzero_p) > > +ccp_finalize (bool nonzero_p ATTRIBUTE_UNUSED) > > { > > bool something_changed; > > unsigned i; > > @@ -913,10 +917,7 @@ ccp_finalize (bool nonzero_p) > > > > if (!name > > || (!POINTER_TYPE_P (TREE_TYPE (name)) > > - && (!INTEGRAL_TYPE_P (TREE_TYPE (name)) > > - /* Don't record nonzero bits before IPA to avoid > > - using too much memory. */ > > - || !nonzero_p))) > > + && (!INTEGRAL_TYPE_P (TREE_TYPE (name))))) > > continue; > > > > can you instead adjust the caller to do sth like > > > > if (ccp_finalize (nonzero_p || flag_ipa_cp)) > > { > > > > ? What we miss to optimize memory usage in the early CCP case > > (it's run very early, before dead code elimination) is to > > avoid setting alignment / nonzero bits for the case of > > fully propagatable (and thus dead after substitute_and_fold) > > SSA names. > > > > So in ccp_finalize do sth like > > > > val = get_value (name); > > if (val->lattice_val != CONSTANT > > || TREE_CODE (val->value) != INTEGER_CST > > || val->mask == 0) > > continue; > > > > That should cut down early CCP memory use in case of nonzero > > setting significantly. > > > > I didn't look at the propagation part but eventually the IPA-CP > > lattice gets quite big. Also the alignment lattice is very > > similar to the bits lattice so why not merge those two? But > > in the end it's Martins/Honzas call here. Note there is > > trailing_wide_ints <> which could be used to improve memory usage > > based on the underlying type. > > > > Thanks, > > Richard. > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)