On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:

> Hi,
> This is a prototype patch for propagating known/unknown bits 
> inter-procedurally.
> for integral types which propagates info obtained from get_nonzero_bits ().
> 
> Patch required making following changes:
> a) To make info from get_nonzero_bits() available to ipa, I had to remove
> guard !nonzero_p in ccp_finalize. However that triggered the following ICE
> in get_ptr_info() for default_none.f95 (and several other fortran tests)
> with options: -fopenacc -O2
> ICE: http://pastebin.com/KjD7HMQi
> I confirmed with Richard that this was a latent issue.

Can you plase bootstrap/test the fix for this separately?  (doesn't
seem to be included in this patch btw)

> b) I chose widest_int for representing value, mask in ipcp_bits_lattice
> and correspondingly changed declarations for
> bit_value_unop_1/bit_value_binop_1 to take
> precision and sign instead of type (those are the only two fields that
> were used). Both these functions are exported by tree-ssa-ccp.h
> I hope that's ok ?

That's ok, but please change the functions to overloads of
bit_value_binop / bit_value_unop to not export ugly _1 names.

-  signop sgn = TYPE_SIGN (type);
-  int width = TYPE_PRECISION (type);
+  signop sgn = type_sgn;
+  int width = (int) type_precision;

please adjust parameter names to get rid of those now unnecessary
locals (and make the precision parameter an 'int').

> c) Changed streamer_read_wi/streamer_write_wi to non-static.
> Ah I see Kugan has submitted a patch for this, so I will drop this hunk.

But he streams wide_int, not widest_int.  I followed up on his
patch.

> d) We have following in tree-ssa-ccp.c:get_default_value ():
>           if (flag_tree_bit_ccp)
>             {
>               wide_int nonzero_bits = get_nonzero_bits (var);
>               if (nonzero_bits != -1)
>                 {
>                   val.lattice_val = CONSTANT;
>                   val.value = build_zero_cst (TREE_TYPE (var));
>                   val.mask = extend_mask (nonzero_bits);
>                 }
> 
> extend_mask() sets all upper bits to 1 in nonzero_bits, ie, varying
> in terms of bit-ccp.
> I suppose in tree-ccp we need to extend mask if var is parameter since we 
> don't
> know in advance what values it will receive from different callers and mark 
> all
> upper bits as 1 to be safe.

Not sure, it seems to me that we can zero-extend for unsigned types
and sign-extend for signed types (if the "sign"-bit of nonzero_bits
is one it properly makes higher bits undefined).  Can you change
the code accordingly?  (simply give extend_mask a sign-op and use
that appropriately?)  Please split out this change so it can be
tested separately.

> However I suppose with ipa, we can determine exactly which bits of
> parameter are constant and
> setting all upper bits to 1 will become unnecessary ?
> 
> For example, consider following artificial test-case:
> int f(int x)
> {
>   if (x > 300)
>     return 1;
>   else
>     return 2;
> }
> 
> int main(int argc, char **argv)
> {
>   return f(argc & 0xc) + f (argc & 0x3);
> }
> 
> For x, the mask would be meet of:
> <0, 0xc> meet <0, 0x3> == (0x3 | 0xc) | (0 ^ 0) == 0xf
> and ipcp_update_bits() sets nonzero_bits for x to 0xf.
> However get_default_value then calls extend_mask (0xf), resulting in
> all upper bits
> being set to 1 and consequently the condition if (x > 300) doesn't get folded.

But then why would the code trying to optimize the comparison look at
bits that are outside of the precision?  (where do we try to use this
info?  I see that VRP misses to use nonzero bits if no range info
is present - I suppose set_nonzero_bits misses to eventually adjust
the range.

That said, where is the folding code and why does it care for those
"uninteresting" bits at all?

> To resolve this, I added a new flag "set_by_ipa" to decl_common,
> which is set to true if the mask of parameter is determined by ipa-cp,
> and the condition changes to:
> 
> if (SSA_NAME_VAR (var)
>     && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
>     && DECL_SET_BY_IPA (SSA_NAME_VAR (var))
>   val.mask = widest_int::from (nonzero_bits,
>                           TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var)));
> else
>   val.mask = extend_mask (nonzero_bits);
> 
> I am not sure if adding a new flag to decl_common is a good idea. How
> do other ipa passes deal with this/similar issue ?
> 
> I suppose we would want to gate this on some flag, say -fipa-bit-cp ?
> I haven't yet gated it on the flag, will do in next version of patch.
> I have added some very simple test-cases, I will try to add more
> meaningful ones.

See above - we should avoid needing this.

> Patch passes bootstrap+test on x86_64-unknown-linux-gnu
> and cross-tested on arm*-*-* and aarch64*-*-* with the exception
> of some fortran tests failing due to above ICE.
> 
> As next steps, I am planning to extend it to handle alignment propagation,
> and do further testing (lto-bootstrap, chromium).
> I would be grateful for feedback on the current patch.

I see you do

@@ -895,7 +899,7 @@ do_dbg_cnt (void)
    Return TRUE when something was optimized.  */

 static bool
-ccp_finalize (bool nonzero_p)
+ccp_finalize (bool nonzero_p ATTRIBUTE_UNUSED)
 {
   bool something_changed;
   unsigned i;
@@ -913,10 +917,7 @@ ccp_finalize (bool nonzero_p)

       if (!name
          || (!POINTER_TYPE_P (TREE_TYPE (name))
-             && (!INTEGRAL_TYPE_P (TREE_TYPE (name))
-                 /* Don't record nonzero bits before IPA to avoid
-                    using too much memory.  */
-                 || !nonzero_p)))
+             && (!INTEGRAL_TYPE_P (TREE_TYPE (name)))))
        continue;

can you instead adjust the caller to do sth like

  if (ccp_finalize (nonzero_p || flag_ipa_cp))
    {

?  What we miss to optimize memory usage in the early CCP case
(it's run very early, before dead code elimination) is to
avoid setting alignment / nonzero bits for the case of
fully propagatable (and thus dead after substitute_and_fold)
SSA names.

So in ccp_finalize do sth like

      val = get_value (name);
      if (val->lattice_val != CONSTANT
          || TREE_CODE (val->value) != INTEGER_CST
          || val->mask == 0)
        continue;

That should cut down early CCP memory use in case of nonzero
setting significantly.

I didn't look at the propagation part but eventually the IPA-CP
lattice gets quite big.  Also the alignment lattice is very
similar to the bits lattice so why not merge those two?  But
in the end it's Martins/Honzas call here.  Note there is
trailing_wide_ints <> which could be used to improve memory usage
based on the underlying type.

Thanks,
Richard.

Reply via email to