On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao <qing.z...@oracle.com> wrote:
>
> Hi, Richard,
>
> Thanks a lot for taking a look at this issue (and Sorry that I haven’t fixed 
> this one yet, I was distracted by other tasks then just forgot this one….)
>
> > On Apr 13, 2022, at 3:41 AM, Richard Biener <richard.guent...@gmail.com> 
> > wrote:
> >
> > On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >>
> >>
> >>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> >>>
> >>> Hi!
> >>>
> >>> For IBM double double I've added in PR95450 and PR99648 verification that
> >>> when we at the tree/GIMPLE or RTL level interpret target bytes as a 
> >>> REAL_CST
> >>> or CONST_DOUBLE constant, we try to encode it back to target bytes and
> >>> verify it is the same.
> >>> This is because our real.c support isn't able to represent all valid 
> >>> values
> >>> of IBM double double which has variable precision.
> >>> In PR104522, it has been noted that we have similar problem with the
> >>> Intel/Motorola extended XFmode formats, our internal representation isn't
> >>> able to record pseudo denormals, pseudo infinities, pseudo NaNs and 
> >>> unnormal
> >>> values.
> >>> So, the following patch is an attempt to extend that verification to all
> >>> floats.
> >>> Unfortunately, it wasn't that straightforward, because the
> >>> __builtin_clear_padding code exactly for the XFmode long doubles needs to
> >>> discover what bits are padding and does that by interpreting memory of
> >>> all 1s.  That is actually a valid supported value, a qNaN with negative
> >>> sign with all mantissa bits set, but the verification includes also the
> >>> padding bits (exactly what __builtin_clear_padding wants to figure out)
> >>> and so fails the comparison check and so we ICE.
> >>> The patch fixes that case by moving that verification from
> >>> native_interpret_real to its caller, so that clear_padding_type can
> >>> call native_interpret_real and avoid that extra check.
> >>>
> >>> With this, the only thing that regresses in the testsuite is
> >>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times 
> >>> long\\t-16843010 5
> >>> because it decides to use a pattern that has non-zero bits in the padding
> >>> bits of the long double, so the simplify-rtx.cc change prevents folding
> >>> a SUBREG into a constant.  We emit (the testcase is -O0 but we emit worse
> >>> code at all opt levels) something like:
> >>>       movabsq $-72340172838076674, %rax
> >>>       movabsq $-72340172838076674, %rdx
> >>>       movq    %rax, -48(%rbp)
> >>>       movq    %rdx, -40(%rbp)
> >>>       fldt    -48(%rbp)
> >>>       fstpt   -32(%rbp)
> >>> instead of
> >>>       fldt    .LC2(%rip)
> >>>       fstpt   -32(%rbp)
> >>> ...
> >>> .LC2:
> >>>       .long   -16843010
> >>>       .long   -16843010
> >>>       .long   65278
> >>>       .long   0
> >>> Note, neither of those sequences actually stores the padding bits, fstpt
> >>> simply doesn't touch them.
> >>> For vars with clear_padding_real_needs_padding_p types that are allocated
> >>> to memory at expansion time, I'd say much better would be to do the stores
> >>> using integral modes rather than XFmode, so do that:
> >>>       movabsq $-72340172838076674, %rax
> >>>      movq    %rax, -32(%rbp)
> >>>      movq    %rax, -24(%rbp)
> >>> directly.  That is the only way to ensure the padding bits are initialized
> >>> (or expand __builtin_clear_padding, but then you initialize separately the
> >>> value bits and padding bits).
> >>>
> >>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as mentioned
> >>> above, the gcc.target/i386/auto-init-4.c case is unresolved.
> >>
> >> Thanks, I will try to fix this testing case in a later patch.
> >
> > I've looked at this FAIL now and really wonder whether "pattern init" as
> > implemented makes any sense for non-integral types.
> > We end up with
> > initializing a register (SSA name) with
> >
> >  VIEW_CONVERT_EXPR<long double>(0xfefefefefefefefefefefefefefefefe)
> >
> > as we go building a TImode constant (we verified we have a TImode SET!)
> > but then
> >
> >          /* Pun the LHS to make sure its type has constant size
> >             unless it is an SSA name where that's already known.  */
> >          if (TREE_CODE (lhs) != SSA_NAME)
> >            lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
> >          else
> >            init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
> > ...
> >      expand_assignment (lhs, init, false);
> >
> > and generally registers do not have any padding.  This weird expansion
> > then causes us to spill the TImode constant and reload the XFmode value,
> > which is definitely not optimal here.
> >
> > One approach to avoid the worse code generation would be to use mode
> > specific patterns for registers (like using a NaN or a target specific
> > value that
> > can be loaded cheaply),
>
> You mean that using “mode specific patterns” ONLY for registers?
> Can we use “mode specific patterns” consistently for both registers and 
> memory?

The difference is that registers don't have padding while memory
possibly does, also
for aggregates using different patterns is too expensive IMHO.  For
registers the extra
complication with generic patterns is that efficient initialization
(without going through memory)
should be a priority IMHO.

And for stack memory I still think that initializing the full
allocated frame (including padding
between variables!) is the best approach.

> LLVM use different patterns for different types (Integral, Float, pointer 
> type, etc…) in order to
> Catch bugs easily for different types.
>
> The beginning versions of this patch tried to do similar as LLVM, but later 
> we gave up this and
> Use the same pattern for all different types.
>
> If now, we want to use “mode specific patterns” for registers, I am wondering 
> whether it’s
> Better to just use “mode specific patterns” consistently for both registers 
> and memory?
>
> > another would be to simply fall back to zero
> > initialization
> > when we fail to constant fold the initializer like with
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 8b1733e20c4..a4b995f71e4 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -3125,7 +3125,11 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> >          if (TREE_CODE (lhs) != SSA_NAME)
> >            lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
> >          else
> > -           init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
> > +           {
> > +             init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
> > +             if (!CONSTANT_CLASS_P (init))
> > +               init = build_zero_cst (TREE_TYPE (lhs));
> > +           }
> >        }
> >       else
> >        /* Use zero-init also for variable-length sizes.  */
> >
> > note that still does not address the issue noted by Jakub that we do not
> > initialize padding this way - but of course that's because we expand a
> > special assignment from .DEFERRED_INIT and are not initializing
> > the storage allocated for the variable on the stack (at -O0) by RTL
> > expansion.  Ideally .DEFERRED_INIT "expansion" for stack variables
> > would take place there, simply filling the allocated frame with the
> > pattern or zero.  That would probably mean that RTL expansion
> > should scoop up .DEFERRED_INITs for variables it decides not to
> > expand to pseudos and not leave that to stmt expansion.
>
> Need to study this a little bit more...
>
> >
> > I'm going to simply XFAIL this testcase for GCC 12 and opened
> > PR105259 so we can revisit the above for GCC 13.
>
> I can further study this bug and try to come up with a good solution in gcc13.
>
> Thank you!
>
> Qing
> >
> > Richard.
> >
> >> Qing
> >>>
> >>
>

Reply via email to