Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]

Qing Zhao via Gcc-patches Wed, 20 Apr 2022 09:03:16 -0700


> On Apr 20, 2022, at 5:38 AM, Richard Biener <richard.guent...@gmail.com> 
> wrote:
> 
> On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> 
>> 
>>> On Apr 14, 2022, at 1:53 AM, Richard Biener <richard.guent...@gmail.com> 
>>> wrote:
>>> 
>>> On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao <qing.z...@oracle.com> wrote:
>>>> 
>>>> Hi, Richard,
>>>> 
>>>> Thanks a lot for taking a look at this issue (and Sorry that I haven’t 
>>>> fixed this one yet, I was distracted by other tasks then just forgot this 
>>>> one….)
>>>> 
>>>>> On Apr 13, 2022, at 3:41 AM, Richard Biener <richard.guent...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches
>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek <ja...@redhat.com> wrote:
>>>>>>> 
>>>>>>> Hi!
>>>>>>> 
>>>>>>> For IBM double double I've added in PR95450 and PR99648 verification 
>>>>>>> that
>>>>>>> when we at the tree/GIMPLE or RTL level interpret target bytes as a 
>>>>>>> REAL_CST
>>>>>>> or CONST_DOUBLE constant, we try to encode it back to target bytes and
>>>>>>> verify it is the same.
>>>>>>> This is because our real.c support isn't able to represent all valid 
>>>>>>> values
>>>>>>> of IBM double double which has variable precision.
>>>>>>> In PR104522, it has been noted that we have similar problem with the
>>>>>>> Intel/Motorola extended XFmode formats, our internal representation 
>>>>>>> isn't
>>>>>>> able to record pseudo denormals, pseudo infinities, pseudo NaNs and 
>>>>>>> unnormal
>>>>>>> values.
>>>>>>> So, the following patch is an attempt to extend that verification to all
>>>>>>> floats.
>>>>>>> Unfortunately, it wasn't that straightforward, because the
>>>>>>> __builtin_clear_padding code exactly for the XFmode long doubles needs 
>>>>>>> to
>>>>>>> discover what bits are padding and does that by interpreting memory of
>>>>>>> all 1s.  That is actually a valid supported value, a qNaN with negative
>>>>>>> sign with all mantissa bits set, but the verification includes also the
>>>>>>> padding bits (exactly what __builtin_clear_padding wants to figure out)
>>>>>>> and so fails the comparison check and so we ICE.
>>>>>>> The patch fixes that case by moving that verification from
>>>>>>> native_interpret_real to its caller, so that clear_padding_type can
>>>>>>> call native_interpret_real and avoid that extra check.
>>>>>>> 
>>>>>>> With this, the only thing that regresses in the testsuite is
>>>>>>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times 
>>>>>>> long\\t-16843010 5
>>>>>>> because it decides to use a pattern that has non-zero bits in the 
>>>>>>> padding
>>>>>>> bits of the long double, so the simplify-rtx.cc change prevents folding
>>>>>>> a SUBREG into a constant.  We emit (the testcase is -O0 but we emit 
>>>>>>> worse
>>>>>>> code at all opt levels) something like:
>>>>>>>     movabsq $-72340172838076674, %rax
>>>>>>>     movabsq $-72340172838076674, %rdx
>>>>>>>     movq    %rax, -48(%rbp)
>>>>>>>     movq    %rdx, -40(%rbp)
>>>>>>>     fldt    -48(%rbp)
>>>>>>>     fstpt   -32(%rbp)
>>>>>>> instead of
>>>>>>>     fldt    .LC2(%rip)
>>>>>>>     fstpt   -32(%rbp)
>>>>>>> ...
>>>>>>> .LC2:
>>>>>>>     .long   -16843010
>>>>>>>     .long   -16843010
>>>>>>>     .long   65278
>>>>>>>     .long   0
>>>>>>> Note, neither of those sequences actually stores the padding bits, fstpt
>>>>>>> simply doesn't touch them.
>>>>>>> For vars with clear_padding_real_needs_padding_p types that are 
>>>>>>> allocated
>>>>>>> to memory at expansion time, I'd say much better would be to do the 
>>>>>>> stores
>>>>>>> using integral modes rather than XFmode, so do that:
>>>>>>>     movabsq $-72340172838076674, %rax
>>>>>>>    movq    %rax, -32(%rbp)
>>>>>>>    movq    %rax, -24(%rbp)
>>>>>>> directly.  That is the only way to ensure the padding bits are 
>>>>>>> initialized
>>>>>>> (or expand __builtin_clear_padding, but then you initialize separately 
>>>>>>> the
>>>>>>> value bits and padding bits).
>>>>>>> 
>>>>>>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as 
>>>>>>> mentioned
>>>>>>> above, the gcc.target/i386/auto-init-4.c case is unresolved.
>>>>>> 
>>>>>> Thanks, I will try to fix this testing case in a later patch.
>>>>> 
>>>>> I've looked at this FAIL now and really wonder whether "pattern init" as
>>>>> implemented makes any sense for non-integral types.
>>>>> We end up with
>>>>> initializing a register (SSA name) with
>>>>> 
>>>>> VIEW_CONVERT_EXPR<long double>(0xfefefefefefefefefefefefefefefefe)
>>>>> 
>>>>> as we go building a TImode constant (we verified we have a TImode SET!)
>>>>> but then
>>>>> 
>>>>>        /* Pun the LHS to make sure its type has constant size
>>>>>           unless it is an SSA name where that's already known.  */
>>>>>        if (TREE_CODE (lhs) != SSA_NAME)
>>>>>          lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
>>>>>        else
>>>>>          init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
>>>>> ...
>>>>>    expand_assignment (lhs, init, false);
>>>>> 
>>>>> and generally registers do not have any padding.  This weird expansion
>>>>> then causes us to spill the TImode constant and reload the XFmode value,
>>>>> which is definitely not optimal here.
>>>>> 
>>>>> One approach to avoid the worse code generation would be to use mode
>>>>> specific patterns for registers (like using a NaN or a target specific
>>>>> value that
>>>>> can be loaded cheaply),
>>>> 
>>>> You mean that using “mode specific patterns” ONLY for registers?
>>>> Can we use “mode specific patterns” consistently for both registers and 
>>>> memory?
>>> 
>>> The difference is that registers don't have padding while memory
>>> possibly does, also
>>> for aggregates using different patterns is too expensive IMHO.  For
>>> registers the extra
>>> complication with generic patterns is that efficient initialization
>>> (without going through memory)
>>> should be a priority IMHO.
>>> 
>>> And for stack memory I still think that initializing the full
>>> allocated frame (including padding
>>> between variables!) is the best approach.
>>> 
>>>> LLVM use different patterns for different types (Integral, Float, pointer 
>>>> type, etc…) in order to
>>>> Catch bugs easily for different types.
>>>> 
>>>> The beginning versions of this patch tried to do similar as LLVM, but 
>>>> later we gave up this and
>>>> Use the same pattern for all different types.
>>>> 
>>>> If now, we want to use “mode specific patterns” for registers, I am 
>>>> wondering whether it’s
>>>> Better to just use “mode specific patterns” consistently for both 
>>>> registers and memory?
>>>> 
>>>>> another would be to simply fall back to zero
>>>>> initialization
>>>>> when we fail to constant fold the initializer like with
>>>>> 
>>>>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>>>>> index 8b1733e20c4..a4b995f71e4 100644
>>>>> --- a/gcc/internal-fn.cc
>>>>> +++ b/gcc/internal-fn.cc
>>>>> @@ -3125,7 +3125,11 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>>>>        if (TREE_CODE (lhs) != SSA_NAME)
>>>>>          lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
>>>>>        else
>>>>> -           init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
>>>>> +           {
>>>>> +             init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), 
>>>>> init);
>>>>> +             if (!CONSTANT_CLASS_P (init))
>>>>> +               init = build_zero_cst (TREE_TYPE (lhs));
>>>>> +           }
>>>>>      }
>>>>>     else
>>>>>      /* Use zero-init also for variable-length sizes.  */
>>>>> 
>>>>> note that still does not address the issue noted by Jakub that we do not
>>>>> initialize padding this way - but of course that's because we expand a
>>>>> special assignment from .DEFERRED_INIT and are not initializing
>>>>> the storage allocated for the variable on the stack (at -O0) by RTL
>>>>> expansion.  Ideally .DEFERRED_INIT "expansion" for stack variables
>>>>> would take place there, simply filling the allocated frame with the
>>>>> pattern or zero.  That would probably mean that RTL expansion
>>>>> should scoop up .DEFERRED_INITs for variables it decides not to
>>>>> expand to pseudos and not leave that to stmt expansion.
>>>> 
>>>> Need to study this a little bit more...
>> 
>> 
>> So, Is what you mean in the above a complete rewrite of the current 
>> “expand_DEFERRED_INIT”:
>> 
>> Instead of simply using “expand_builtin_memset” for “variables that are not 
>> in register” and  “expand_assignment”
>> for “variables that are in register”,  RTL expansion should directly expand 
>> this call in a lower level:
>> 
>> i.e,
>> 
>>  tree lhs = gimple_call_lhs (stmt);
>>  …
>> 
>>  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>> 
>>  If (MEM_P (target))   // the variable is allocated on stack
>>  {
>>     // filling the allocated frame with pattern or zero.   How to do it??
>>  }
>>  else                     // the variable is in pseudo register.
>>  {
>>    rtx init_rtx = …;
>>    emit_move_insn (target, init_rtx)
>>  }
>> 
>> 
>> Is the above a correct understanding?
> 
> No, I would simply expand the automatic in-memory var .DEFERRED_INIT
> calls to nothing
> but instead process their effect at the time we do
> expand_one_stack_var, which means at
> CFG expansion time scan for the .DEFERRED_INITs and record whether and how to
> initialize the variables.  We can then group the variables accordingly
> and block-initialize
> the stack space also covering padding inbetween variables.


So, the above only covers the variables that will be allocated in stack?
For the variables that are in pseudo registers, we still need to simply expand 
the initialization to a move insn?

And for the variables in pseudo registers, later after register allocation, 
some of them will be spilled into stack, too.
Specially for long double/complex double variables that might have padding, 
shall we specially handle them during that time?

Qing
> 
> Richard.
> 
>> Thanks.
>> 
>> Qing
>> 
>>>> 
>>>>> 
>>>>> I'm going to simply XFAIL this testcase for GCC 12 and opened
>>>>> PR105259 so we can revisit the above for GCC 13.
>>>> 
>>>> I can further study this bug and try to come up with a good solution in 
>>>> gcc13.
>>>> 
>>>> Thank you!
>>>> 
>>>> Qing
>>>>> 
>>>>> Richard.
>>>>> 
>>>>>> Qing

Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]

Reply via email to