On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao <qing.z...@oracle.com> wrote: > > > > > On Apr 14, 2022, at 1:53 AM, Richard Biener <richard.guent...@gmail.com> > > wrote: > > > > On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao <qing.z...@oracle.com> wrote: > >> > >> Hi, Richard, > >> > >> Thanks a lot for taking a look at this issue (and Sorry that I haven’t > >> fixed this one yet, I was distracted by other tasks then just forgot this > >> one….) > >> > >>> On Apr 13, 2022, at 3:41 AM, Richard Biener <richard.guent...@gmail.com> > >>> wrote: > >>> > >>> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches > >>> <gcc-patches@gcc.gnu.org> wrote: > >>>> > >>>> > >>>> > >>>>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek <ja...@redhat.com> wrote: > >>>>> > >>>>> Hi! > >>>>> > >>>>> For IBM double double I've added in PR95450 and PR99648 verification > >>>>> that > >>>>> when we at the tree/GIMPLE or RTL level interpret target bytes as a > >>>>> REAL_CST > >>>>> or CONST_DOUBLE constant, we try to encode it back to target bytes and > >>>>> verify it is the same. > >>>>> This is because our real.c support isn't able to represent all valid > >>>>> values > >>>>> of IBM double double which has variable precision. > >>>>> In PR104522, it has been noted that we have similar problem with the > >>>>> Intel/Motorola extended XFmode formats, our internal representation > >>>>> isn't > >>>>> able to record pseudo denormals, pseudo infinities, pseudo NaNs and > >>>>> unnormal > >>>>> values. > >>>>> So, the following patch is an attempt to extend that verification to all > >>>>> floats. > >>>>> Unfortunately, it wasn't that straightforward, because the > >>>>> __builtin_clear_padding code exactly for the XFmode long doubles needs > >>>>> to > >>>>> discover what bits are padding and does that by interpreting memory of > >>>>> all 1s. That is actually a valid supported value, a qNaN with negative > >>>>> sign with all mantissa bits set, but the verification includes also the > >>>>> padding bits (exactly what __builtin_clear_padding wants to figure out) > >>>>> and so fails the comparison check and so we ICE. > >>>>> The patch fixes that case by moving that verification from > >>>>> native_interpret_real to its caller, so that clear_padding_type can > >>>>> call native_interpret_real and avoid that extra check. > >>>>> > >>>>> With this, the only thing that regresses in the testsuite is > >>>>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times > >>>>> long\\t-16843010 5 > >>>>> because it decides to use a pattern that has non-zero bits in the > >>>>> padding > >>>>> bits of the long double, so the simplify-rtx.cc change prevents folding > >>>>> a SUBREG into a constant. We emit (the testcase is -O0 but we emit > >>>>> worse > >>>>> code at all opt levels) something like: > >>>>> movabsq $-72340172838076674, %rax > >>>>> movabsq $-72340172838076674, %rdx > >>>>> movq %rax, -48(%rbp) > >>>>> movq %rdx, -40(%rbp) > >>>>> fldt -48(%rbp) > >>>>> fstpt -32(%rbp) > >>>>> instead of > >>>>> fldt .LC2(%rip) > >>>>> fstpt -32(%rbp) > >>>>> ... > >>>>> .LC2: > >>>>> .long -16843010 > >>>>> .long -16843010 > >>>>> .long 65278 > >>>>> .long 0 > >>>>> Note, neither of those sequences actually stores the padding bits, fstpt > >>>>> simply doesn't touch them. > >>>>> For vars with clear_padding_real_needs_padding_p types that are > >>>>> allocated > >>>>> to memory at expansion time, I'd say much better would be to do the > >>>>> stores > >>>>> using integral modes rather than XFmode, so do that: > >>>>> movabsq $-72340172838076674, %rax > >>>>> movq %rax, -32(%rbp) > >>>>> movq %rax, -24(%rbp) > >>>>> directly. That is the only way to ensure the padding bits are > >>>>> initialized > >>>>> (or expand __builtin_clear_padding, but then you initialize separately > >>>>> the > >>>>> value bits and padding bits). > >>>>> > >>>>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as > >>>>> mentioned > >>>>> above, the gcc.target/i386/auto-init-4.c case is unresolved. > >>>> > >>>> Thanks, I will try to fix this testing case in a later patch. > >>> > >>> I've looked at this FAIL now and really wonder whether "pattern init" as > >>> implemented makes any sense for non-integral types. > >>> We end up with > >>> initializing a register (SSA name) with > >>> > >>> VIEW_CONVERT_EXPR<long double>(0xfefefefefefefefefefefefefefefefe) > >>> > >>> as we go building a TImode constant (we verified we have a TImode SET!) > >>> but then > >>> > >>> /* Pun the LHS to make sure its type has constant size > >>> unless it is an SSA name where that's already known. */ > >>> if (TREE_CODE (lhs) != SSA_NAME) > >>> lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs); > >>> else > >>> init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init); > >>> ... > >>> expand_assignment (lhs, init, false); > >>> > >>> and generally registers do not have any padding. This weird expansion > >>> then causes us to spill the TImode constant and reload the XFmode value, > >>> which is definitely not optimal here. > >>> > >>> One approach to avoid the worse code generation would be to use mode > >>> specific patterns for registers (like using a NaN or a target specific > >>> value that > >>> can be loaded cheaply), > >> > >> You mean that using “mode specific patterns” ONLY for registers? > >> Can we use “mode specific patterns” consistently for both registers and > >> memory? > > > > The difference is that registers don't have padding while memory > > possibly does, also > > for aggregates using different patterns is too expensive IMHO. For > > registers the extra > > complication with generic patterns is that efficient initialization > > (without going through memory) > > should be a priority IMHO. > > > > And for stack memory I still think that initializing the full > > allocated frame (including padding > > between variables!) is the best approach. > > > >> LLVM use different patterns for different types (Integral, Float, pointer > >> type, etc…) in order to > >> Catch bugs easily for different types. > >> > >> The beginning versions of this patch tried to do similar as LLVM, but > >> later we gave up this and > >> Use the same pattern for all different types. > >> > >> If now, we want to use “mode specific patterns” for registers, I am > >> wondering whether it’s > >> Better to just use “mode specific patterns” consistently for both > >> registers and memory? > >> > >>> another would be to simply fall back to zero > >>> initialization > >>> when we fail to constant fold the initializer like with > >>> > >>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > >>> index 8b1733e20c4..a4b995f71e4 100644 > >>> --- a/gcc/internal-fn.cc > >>> +++ b/gcc/internal-fn.cc > >>> @@ -3125,7 +3125,11 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt) > >>> if (TREE_CODE (lhs) != SSA_NAME) > >>> lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs); > >>> else > >>> - init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init); > >>> + { > >>> + init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), > >>> init); > >>> + if (!CONSTANT_CLASS_P (init)) > >>> + init = build_zero_cst (TREE_TYPE (lhs)); > >>> + } > >>> } > >>> else > >>> /* Use zero-init also for variable-length sizes. */ > >>> > >>> note that still does not address the issue noted by Jakub that we do not > >>> initialize padding this way - but of course that's because we expand a > >>> special assignment from .DEFERRED_INIT and are not initializing > >>> the storage allocated for the variable on the stack (at -O0) by RTL > >>> expansion. Ideally .DEFERRED_INIT "expansion" for stack variables > >>> would take place there, simply filling the allocated frame with the > >>> pattern or zero. That would probably mean that RTL expansion > >>> should scoop up .DEFERRED_INITs for variables it decides not to > >>> expand to pseudos and not leave that to stmt expansion. > >> > >> Need to study this a little bit more... > > > So, Is what you mean in the above a complete rewrite of the current > “expand_DEFERRED_INIT”: > > Instead of simply using “expand_builtin_memset” for “variables that are not > in register” and “expand_assignment” > for “variables that are in register”, RTL expansion should directly expand > this call in a lower level: > > i.e, > > tree lhs = gimple_call_lhs (stmt); > … > > rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > > If (MEM_P (target)) // the variable is allocated on stack > { > // filling the allocated frame with pattern or zero. How to do it?? > } > else // the variable is in pseudo register. > { > rtx init_rtx = …; > emit_move_insn (target, init_rtx) > } > > > Is the above a correct understanding?
No, I would simply expand the automatic in-memory var .DEFERRED_INIT calls to nothing but instead process their effect at the time we do expand_one_stack_var, which means at CFG expansion time scan for the .DEFERRED_INITs and record whether and how to initialize the variables. We can then group the variables accordingly and block-initialize the stack space also covering padding inbetween variables. Richard. > Thanks. > > Qing > > >> > >>> > >>> I'm going to simply XFAIL this testcase for GCC 12 and opened > >>> PR105259 so we can revisit the above for GCC 13. > >> > >> I can further study this bug and try to come up with a good solution in > >> gcc13. > >> > >> Thank you! > >> > >> Qing > >>> > >>> Richard. > >>> > >>>> Qing > >>>>> > >>>> > >> >