Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]

Richard Biener Sat, 28 Sep 2024 03:16:28 -0700

On Fri, 27 Sep 2024, Jakub Jelinek wrote:

> On Fri, Sep 27, 2024 at 12:14:47PM +0200, Richard Biener wrote:
> > I can investigate a bit when there's a testcase showing the issue.
> 
> The testcase is pr78687.C with Marek's cp-gimplify.cc patch.


OK, I can reproduce.  The main issue I see is that SRA refuses to totally
scalarize any union type (or a type which recursively contains a union).
That's because for total scalarization it will attempt to create 
replacements according to the component types - for unions it would need
to arrange to block-copy it - certainly possible, esp. when none of
its field is accessed, but that will end up as aggregate copy chain
probably not solving things.  For the testcase the union involved is
also quite large (40 bytes).

There are also two variables in the chain disqualified for total
scalarization, one is because

MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;

"contains a VCE" according to contains_vce_or_bfcref_p, in this case
the MEM accesses D.9898 in a type that doesn't match its declaration.
Same for the other case:

MEM[(struct variant_ref *)&D.10030].inner_storage_ = 
ptr.D.9270.inner_storage_;

I'm not sure whether that changed with the patch - without the patch
it seems we avoid touching the union portion at all.  The cruical
difference indeed seems to be

-  MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
-  MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
-  MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
-  MEM[(struct _storage *)&D.9472]._which = 2;
+  D.9472 = {};
+  D.9472._storage._which = 2;

where the extra zeroing triggers the need for total scalarization of it - 
otherwise only _which is determined as used and the rest of the
objects ignored.

So maybe instead of teaching SRA about unions zero-init could be
special-cased as "re-materializable".  OTOH SRA is already full of
too many special-cases.

I'll note that the bob/eos CLOBBERs unfortunately have many of the
aggregates have overlapping lifetimes:

  D.10024 = {}; 
  D.10024._storage._which = 2;
  D.10026 = D.10024;

^^^ D.10024 should be dead here

  t = D.10026;
  MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
  MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
  t ={v} {CLOBBER(eos)};
  D.10026 ={v} {CLOBBER(eos)};
  D.10024 ={v} {CLOBBER(eos)};

^^^ but EOS here, even after D.10026.  For cases like in this testcase
doing BB-local aggregate liveness analysis, relying on bob/eos CLOBBERS
only to determine if an aggregate is BB-local could make for a relatively
cheap aggregate "re-use" pass, replacing D.10026 with D.10024 above.
Note bob CLOBBERS are missing for most vars.  This kind of analysis
could also help SRA code generation as it could generate only one set
of scalar replacements for a chain of aggregate copies.

> Or the
> struct S { union U { struct T { void *a, *b, *c, *d, *e; } t; struct V {} v; 
> } u; unsigned long w; };
> void bar (struct S *);
> 
> void
> foo ()
> {
>   struct S s = { .u = { .v = {} }, .w = 2 };
>   bar (&s);
> }
> I've mentioned shows the same behavior except for SRA (which
> is there of course prevented through the object escaping).
> Though, not really sure right now if this reduced testcase
> in C or C++ actually isn't supposed to clear the whole object rather than
> just the initialized fields and what exactly is CONSTRUCTOR_NO_CLEARING
> vs. !CONSTRUCTOR_NO_CLEARING supposed to mean.
> 
> On pr78687.C with Marek's patch, CONSTRUCTOR_NO_CLEARING is cleared with
>   /* The result of a constexpr function must be completely initialized.
> 
>      However, in C++20, a constexpr constructor doesn't necessarily have
>      to initialize all the fields, so we don't clear CONSTRUCTOR_NO_CLEARING
>      in order to detect reading an unitialized object in constexpr instead
>      of value-initializing it.  (reduced_constant_expression_p is expected to
>      take care of clearing the flag.)  */
>   if (TREE_CODE (result) == CONSTRUCTOR
>       && (cxx_dialect < cxx20
>           || !DECL_CONSTRUCTOR_P (fun)))
>     clear_no_implicit_zero (result);
> 
> generic.texi says:
> "Unrepresented fields will be cleared (zeroed), unless the
> CONSTRUCTOR_NO_CLEARING flag is set, in which case their value becomes
> undefined."
> Now, for RECORD_TYPE, I think the !CONSTRUCTOR_NO_CLEARING meaning is clear,
> if the flag isn't set, then if there is no constructor_elt for certain
> FIELD_DECL, that FIELD_DECL is implicitly zeroed.  The state of padding bits
> is fuzzy, we don't really have a flag whether the CONSTRUCTOR clears also
> padding bits (aka C++ zero initialization) or not.
> The problem above is with UNION_TYPE.  If the CONSTRUCTOR for it is empty,
> that should IMHO still imply zero initialization of the whole thing, we
> don't really know which union member is active.  But if the CONSTRUCTOR
> has one elt (it should never have more than one), shouldn't that mean
> (unless there is a new flag which says that padding bits are cleared too)
> that CONSTRUCTOR_NO_CLEARING and !CONSTRUCTOR_NO_CLEARING behave the same,
> in particular that the selected FIELD_DECL is initialized to whatever
> the initializer is but nothing else is?
> 
> The reason why the gimplifier clears the whole struct is because
> on (with Marek's patch on the pr78687.C testcase)
> D.10137 = 
> {._storage={.D.9542={.D.9123={._tail={.D.9181={._tail={.D.9240={._head={}}}}}}},
>  ._which=2}};
> or that
> s = {.u={.v={}}, .w=2};
> in the above testcase, categorize_ctor_elements yields
> valid_const_initializer = true
> num_nonzero_elements = 1
> num_unique_nonzero_elements = 1
> num_ctor_elements = 1
> complete_p = false
> - there is a single non-empty initializer in both CONSTRUCTORs,
> they aren't CONSTRUCTOR_NO_CLEARING and the reason complete_p is false
> is that categorize_ctor_elements_1 does:
>   if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
>                                                 num_fields, elt_type))
>     *p_complete = 0;
>   else if (*p_complete > 0
>            && type_has_padding_at_level_p (TREE_TYPE (ctor)))
>     *p_complete = -1;
> and for UNION_TYPE/QUAL_UNION_TYPE complete_ctor_at_level_p does:
>       if (num_elts == 0)
>         return false;
> ...
>       /* ??? We could look at each element of the union, and find the
>          largest element.  Which would avoid comparing the size of the
>          initialized element against any tail padding in the union.
>          Doesn't seem worth the effort...  */
>       return simple_cst_equal (TYPE_SIZE (type), TYPE_SIZE (last_type)) == 1;
> Now, given the documentation of categorize_ctor_elements:
>    * whether the constructor is complete -- in the sense that every
>      meaningful byte is explicitly given a value --
>      and place it in *P_COMPLETE:
>      -  0 if any field is missing
>      -  1 if all fields are initialized, and there's no padding
>      - -1 if all fields are initialized, but there's padding
> I'd argue this handling of UNION_TYPE/QUAL_UNION_TYPE is wrong

I agree (it's pessimistic).

> (though note that type_has_padding_at_level_p returns true if any of the
> union members is smaller than the whole, rather than checking whether
> the actually initialized one has the same size as whole), it will
> set *p_complete to 0 as if any field is missing, even when no field
> is missing, just the union has padding.
> 
> So, I think we should go with (but so far completely untested except
> for pr78687.C which is optimized with Marek's patch and the above testcase
> which doesn't have the clearing anymore) the following patch.
> 
> 2024-09-27  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR c++/116416
>       * expr.cc (categorize_ctor_elements_1): Fix up union handling of
>       *p_complete.  Clear it only if num_fields is 0 and the union has
>       at least one FIELD_DECL, set to -1 if either union has no fields
>       and non-zero size, or num_fields is 1 and complete_ctor_at_level_p
>       returned false.
>       * gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
>       handling, return also true for UNION_TYPE with no FIELD_DECLs
>       and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.
> 
> --- gcc/expr.cc.jj    2024-09-04 12:09:22.598808244 +0200
> +++ gcc/expr.cc       2024-09-27 15:34:20.929282525 +0200
> @@ -7218,7 +7218,36 @@ categorize_ctor_elements_1 (const_tree c
>  
>    if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
>                                               num_fields, elt_type))
> -    *p_complete = 0;
> +    {
> +      if (TREE_CODE (TREE_TYPE (ctor)) == UNION_TYPE
> +       || TREE_CODE (TREE_TYPE (ctor)) == QUAL_UNION_TYPE)
> +     {
> +       /* The union case is more complicated.  */
> +       if (num_fields == 0)
> +         {
> +           /* If the CONSTRUCTOR doesn't have any elts, it is
> +              incomplete if the union has at least one field.  */

Hmm, but an empty CTOR is zero-initialization?

You know this code more than anybody else, so I suppose the change
is OK if it works.

Thanks,
Richard.

> +           for (tree f = TYPE_FIELDS (TREE_TYPE (ctor));
> +                f; f = DECL_CHAIN (f))
> +             if (TREE_CODE (f) == FIELD_DECL)
> +               {
> +                 *p_complete = 0;
> +                 break;
> +               }
> +           /* Otherwise it has padding if the union has non-zero size.  */
> +           if (*p_complete > 0
> +               && !integer_zerop (TYPE_SIZE (TREE_TYPE (ctor))))
> +             *p_complete = -1;
> +         }
> +       /* Otherwise the CONSTRUCTOR should have exactly one element.
> +          It is then never incomplete, but if complete_ctor_at_level_p
> +          returned false, it has padding.  */
> +       else if (*p_complete > 0)
> +         *p_complete = -1;
> +     }
> +      else
> +     *p_complete = 0;
> +    }
>    else if (*p_complete > 0
>          && type_has_padding_at_level_p (TREE_TYPE (ctor)))
>      *p_complete = -1;
> --- gcc/gimple-fold.cc.jj     2024-09-09 11:25:43.197048840 +0200
> +++ gcc/gimple-fold.cc        2024-09-27 15:44:05.896355615 +0200
> @@ -4814,12 +4814,22 @@ type_has_padding_at_level_p (tree type)
>       return false;
>        }
>      case UNION_TYPE:
> +    case QUAL_UNION_TYPE:
> +      bool any_fields;
> +      any_fields = false;
>        /* If any of the fields is smaller than the whole, there is padding.  
> */
>        for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> -     if (TREE_CODE (f) == FIELD_DECL)
> -       if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
> -                             TREE_TYPE (type)) != 1)
> -         return true;
> +     if (TREE_CODE (f) != FIELD_DECL)
> +       continue;
> +     else if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
> +                                TYPE_SIZE (TREE_TYPE (type))) != 1)
> +       return true;
> +     else
> +       any_fields = true;
> +      /* If the union doesn't have any fields and still has non-zero size,
> +      all of it is padding.  */
> +      if (!any_fields && !integer_zerop (TYPE_SIZE (type)))
> +     return true;
>        return false;
>      case ARRAY_TYPE:
>      case COMPLEX_TYPE:
> 
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]

Reply via email to