On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:

> Hi,
> This is a prototype patch for propagating known/unknown bits
> inter-procedurally for integral types, using the info obtained from
> get_nonzero_bits ().
>
> The patch required making the following changes:
>
> a) To make the info from get_nonzero_bits () available to ipa, I had to
> remove the guard !nonzero_p in ccp_finalize.  However, that triggered the
> following ICE in get_ptr_info () for default_none.f95 (and several other
> fortran tests) with options: -fopenacc -O2
> ICE: http://pastebin.com/KjD7HMQi
> I confirmed with Richard that this was a latent issue.
Can you please bootstrap/test the fix for this separately?  (it doesn't
seem to be included in this patch, btw)

> b) I chose widest_int for representing value and mask in
> ipcp_bits_lattice and correspondingly changed the declarations of
> bit_value_unop_1/bit_value_binop_1 to take precision and sign instead of
> type (those are the only two fields that were used).  Both these
> functions are exported by tree-ssa-ccp.h.
> I hope that's ok?

That's ok, but please change the functions to overloads of
bit_value_binop / bit_value_unop to avoid exporting ugly _1 names.

-  signop sgn = TYPE_SIGN (type);
-  int width = TYPE_PRECISION (type);
+  signop sgn = type_sgn;
+  int width = (int) type_precision;

Please adjust the parameter names to get rid of those now unnecessary
locals (and make the precision parameter an 'int').

> c) Changed streamer_read_wi/streamer_write_wi to non-static.
> Ah, I see Kugan has submitted a patch for this, so I will drop this hunk.

But he streams wide_int, not widest_int.  I followed up on his patch.

> d) We have the following in tree-ssa-ccp.c:get_default_value ():
>
>   if (flag_tree_bit_ccp)
>     {
>       wide_int nonzero_bits = get_nonzero_bits (var);
>       if (nonzero_bits != -1)
>         {
>           val.lattice_val = CONSTANT;
>           val.value = build_zero_cst (TREE_TYPE (var));
>           val.mask = extend_mask (nonzero_bits);
>         }
>     }
>
> extend_mask () sets all upper bits of nonzero_bits to 1, i.e., varying in
> terms of bit-ccp.
> I suppose in tree-ccp we need to extend the mask if var is a parameter,
> since we don't know in advance what values it will receive from different
> callers, and mark all upper bits as 1 to be safe.

Not sure, it seems to me that we can zero-extend for unsigned types and
sign-extend for signed types (if the "sign" bit of nonzero_bits is one it
properly makes higher bits undefined).  Can you change the code
accordingly?  (simply give extend_mask a signop and use that
appropriately?)

Please split out this change so it can be tested separately.
> However, I suppose with ipa we can determine exactly which bits of a
> parameter are constant, and setting all upper bits to 1 will become
> unnecessary?
>
> For example, consider the following artificial test-case:
>
> int f(int x)
> {
>   if (x > 300)
>     return 1;
>   else
>     return 2;
> }
>
> int main(int argc, char **argv)
> {
>   return f(argc & 0xc) + f(argc & 0x3);
> }
>
> For x, the mask would be the meet of:
>   <0, 0xc> meet <0, 0x3> == (0x3 | 0xc) | (0 ^ 0) == 0xf
> and ipcp_update_bits () sets nonzero_bits for x to 0xf.
> However, get_default_value then calls extend_mask (0xf), resulting in
> all upper bits being set to 1, and consequently the condition
> if (x > 300) doesn't get folded.

But then why would the code trying to optimize the comparison look at
bits that are outside of the precision?  (where do we try to use this
info?)  I see that VRP misses using nonzero bits if no range info is
present - I suppose set_nonzero_bits misses to eventually adjust the
range.  That said, where is the folding code and why does it care for
those "uninteresting" bits at all?

> To resolve this, I added a new flag "set_by_ipa" to decl_common,
> which is set to true if the mask of the parameter is determined by
> ipa-cp, and the condition changes to:
>
>   if (SSA_NAME_VAR (var)
>       && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
>       && DECL_SET_BY_IPA (SSA_NAME_VAR (var)))
>     val.mask = widest_int::from (nonzero_bits,
>				   TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var))));
>   else
>     val.mask = extend_mask (nonzero_bits);
>
> I am not sure if adding a new flag to decl_common is a good idea.  How
> do other ipa passes deal with this or similar issues?
>
> I suppose we would want to gate this on some flag, say -fipa-bit-cp?
> I haven't yet gated it on the flag, will do in the next version of the
> patch.
> I have added some very simple test-cases, I will try to add more
> meaningful ones.

See above - we should avoid needing this.
> The patch passes bootstrap+test on x86_64-unknown-linux-gnu
> and was cross-tested on arm*-*-* and aarch64*-*-*, with the exception
> of some fortran tests failing due to the above ICE.
>
> As next steps, I am planning to extend it to handle alignment
> propagation, and to do further testing (lto-bootstrap, chromium).
> I would be grateful for feedback on the current patch.

I see you do

@@ -895,7 +899,7 @@ do_dbg_cnt (void)
    Return TRUE when something was optimized.  */

 static bool
-ccp_finalize (bool nonzero_p)
+ccp_finalize (bool nonzero_p ATTRIBUTE_UNUSED)
 {
   bool something_changed;
   unsigned i;
@@ -913,10 +917,7 @@ ccp_finalize (bool nonzero_p)

       if (!name
	  || (!POINTER_TYPE_P (TREE_TYPE (name))
-	      && (!INTEGRAL_TYPE_P (TREE_TYPE (name))
-		  /* Don't record nonzero bits before IPA to avoid
-		     using too much memory.  */
-		  || !nonzero_p)))
+	      && (!INTEGRAL_TYPE_P (TREE_TYPE (name)))))
	continue;

can you instead adjust the caller to do something like

  if (ccp_finalize (nonzero_p || flag_ipa_cp))
    {

?  What we miss to optimize memory usage in the early CCP case (it runs
very early, before dead code elimination) is to avoid setting alignment /
nonzero bits for fully propagatable (and thus dead after
substitute_and_fold) SSA names.  So in ccp_finalize do something like

  val = get_value (name);
  if (val->lattice_val != CONSTANT
      || TREE_CODE (val->value) != INTEGER_CST
      || val->mask == 0)
    continue;

That should cut down early CCP memory use in the nonzero-setting case
significantly.

I didn't look at the propagation part, but eventually the IPA-CP lattice
gets quite big.  Also, the alignment lattice is very similar to the bits
lattice, so why not merge those two?  But in the end it's Martin's/Honza's
call here.  Note there is trailing_wide_ints <> which could be used to
improve memory usage based on the underlying type.

Thanks,
Richard.