On Tue, 20 Feb 2024, Jakub Jelinek wrote:

> On Tue, Feb 20, 2024 at 12:12:11AM +0000, Jason Merrill wrote:
> > On 2/19/24 02:55, Jakub Jelinek wrote:
> > > On Fri, Feb 16, 2024 at 01:51:54PM +0000, Jonathan Wakely wrote:
> > > > Ah, although __atomic_compare_exchange only takes pointers, the
> > > > compiler replaces that with a call to __atomic_compare_exchange_n
> > > > which takes the newval by value, which presumably uses an 80-bit FP
> > > > register and so the padding bits become indeterminate again.
> > > 
> > > The problem is that __atomic_{,compare_}exchange lowering, if it has
> > > a supported atomic 1/2/4/8/16 byte size, emits code like:
> > >    _3 = *p2;
> > >    _4 = VIEW_CONVERT_EXPR<I_type> (_3);
> > 
> > Hmm, I find that surprising; I thought of VIEW_CONVERT_EXPR as working on
> > lvalues, not rvalues, based on the documentation describing it as roughly
> > *(type2 *)&X.
> 
> It works on both: if it is on the lhs, it obviously needs to be an lvalue
> and is something like
> VIEW_CONVERT_EXPR<I_type> (mem) = something;
> and if it is on rhs, it can be on an rvalue, just reinterpret bits of
> something as something else, so more like
> ((union { typeof (val) x; I_type y; }) (val)).y
> 
> > Now I see that gimplify_expr does what you describe, and I wonder what the
> > advantage of that is.  That seems to go back to richi's r206420 for PR59471.
> > r270579 for PR limited it to cases where the caller wants an rvalue; perhaps
> > it should also be avoided when the operand is an INDIRECT_REF?
> 
> Strangely we don't even try to optimize it, even at -O2 that
>   _3 = *p2_2(D);
>   _4 = VIEW_CONVERT_EXPR<I_type> (_3);
> stays around until optimized.  I believe VIEW_CONVERT_EXPR<I_type> is valid
> around a memory reference, so it could be either
>   _4 = VIEW_CONVERT_EXPR<I_type> (*p2_2(D));
> or the MEM_REF with aliasing info from original pointer and type from VCE.
> For optimizations, I guess it is a matter of writing some match.pd rule, but
> we can't rely on it for the atomics.

I'm not sure those would be really equivalent (MEM_REF vs. V_C_E
as well as combined vs. split).  It really depends on how RTL expansion
handles this (as you can see, padding can be fun here).

So I'd be nervous about a match.pd rule here (also, we can't match
memory defs).

As for your patch, I'd go with a MEM_REF unconditionally; I don't
think we want different behavior depending on whether there's padding or not ...

> Doing it in the gimplifier is possible, but not sure how to do that easily,
> given that the VIEW_CONVERT_EXPR argument can be either lvalue or rvalue
> and we need to gimplify it first.
> The first part is exactly what forces it into a separate SSA_NAME for the
> load vs. VIEW_CONVERT_EXPR around it:
> 
>         case VIEW_CONVERT_EXPR:
>           if ((fallback & fb_rvalue)
>               && is_gimple_reg_type (TREE_TYPE (*expr_p))
>               && is_gimple_reg_type (TREE_TYPE (TREE_OPERAND (*expr_p, 0))))
>             {
>               ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>                                    post_p, is_gimple_val, fb_rvalue);
>               recalculate_side_effects (*expr_p);
>               break;
>             }

that was done to help optimize some cases (IIRC with vector / integer 
punning).

>           /* Fallthru.  */
> 
>         case ARRAY_REF:
>         case ARRAY_RANGE_REF:
>         case REALPART_EXPR:
>         case IMAGPART_EXPR:
>         case COMPONENT_REF:
>           ret = gimplify_compound_lval (expr_p, pre_p, post_p,
>                                         fallback ? fallback : fb_rvalue);
>           break;
> 
> but if we do the gimplify_compound_lval, we'd actually force it to be
> addressable (with possible errors if that isn't possible) etc.
> Having just a special case of VIEW_CONVERT_EXPR of INDIRECT_REF and
> doing gimplify_compound_lval in that case might be wrong, I think: if
> e.g. the INDIRECT_REF operand is an ADDR_EXPR, shouldn't we cancel the *&
> in that case and still not force it into memory?
> 
> The INDIRECT_REF: case is:
>           {
>             bool volatilep = TREE_THIS_VOLATILE (*expr_p);
>             bool notrap = TREE_THIS_NOTRAP (*expr_p);
>             tree saved_ptr_type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));
>       
>             *expr_p = fold_indirect_ref_loc (input_location, *expr_p);
>             if (*expr_p != save_expr)
>               {
>                 ret = GS_OK;
>                 break;
>               }
>         
>             ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
>                                  is_gimple_reg, fb_rvalue);
>             if (ret == GS_ERROR)
>               break;
>         
>             recalculate_side_effects (*expr_p);
>             *expr_p = fold_build2_loc (input_location, MEM_REF,
>                                        TREE_TYPE (*expr_p),
>                                        TREE_OPERAND (*expr_p, 0),
>                                        build_int_cst (saved_ptr_type, 0));
>             TREE_THIS_VOLATILE (*expr_p) = volatilep;
>             TREE_THIS_NOTRAP (*expr_p) = notrap;
>             ret = GS_OK;
>             break;
>           }
> so I think if we want to special case VIEW_CONVERT_EXPR on INDIRECT_REF,
> we should basically copy and adjust that to the start of the case
> VIEW_CONVERT_EXPR:.  In particular, if fold_indirect_ref_loc did something
> and it isn't a different INDIRECT_REF or something addressable, do what it
> does now (i.e. possibly treat as lvalue), otherwise gimplify the INDIRECT_REF
> operand and build a MEM_REF, but with the type of the VIEW_CONVERT_EXPR but
> still saved_ptr_type of the INDIRECT_REF.
> 
> Though, that all still feels like an optimization rather than guarantee
> which is what we need for the atomics.

Yes, and I think the frontend does it wrong: if it wants to do a load
of l_type it should do that, not reinterpret the result.  I suppose
the docs

@defbuiltin{void __atomic_exchange (@var{type} *@var{ptr}, @var{type} 
*@var{val}, @var{type} *@var{ret}, int @var{memorder})}
This is the generic version of an atomic exchange.  It stores the
contents of @code{*@var{val}} into @code{*@var{ptr}}. The original value
of @code{*@var{ptr}} is copied into @code{*@var{ret}}.

contain too little standards legalese to constrain 'type'; they just
appear to use lvalues of *ptr, *val and *ret here, which for floats
with padding possibly isn't well-defined.  "contents" isn't a
standard term (it doesn't say bit representation or something similar),
and it also just says "stores" and "copied into".

That said, if the FE determines in l_type a type suitable for copying,
then the access should use a MEM_REF; no further "conversion" is required.

Richard.
