On Tue, 20 Feb 2024, Jakub Jelinek wrote:

> On Tue, Feb 20, 2024 at 12:12:11AM +0000, Jason Merrill wrote:
> > On 2/19/24 02:55, Jakub Jelinek wrote:
> > > On Fri, Feb 16, 2024 at 01:51:54PM +0000, Jonathan Wakely wrote:
> > > > Ah, although __atomic_compare_exchange only takes pointers, the
> > > > compiler replaces that with a call to __atomic_compare_exchange_n
> > > > which takes the newval by value, which presumably uses an 80-bit FP
> > > > register and so the padding bits become indeterminate again.
> > >
> > > The problem is that __atomic_{,compare_}exchange lowering, if it has
> > > a supported atomic 1/2/4/8/16 size, emits code like:
> > >   _3 = *p2;
> > >   _4 = VIEW_CONVERT_EXPR<I_type> (_3);
> >
> > Hmm, I find that surprising; I thought of VIEW_CONVERT_EXPR as working on
> > lvalues, not rvalues, based on the documentation describing it as roughly
> > *(type2 *)&X.
>
> It works on both.  If it is on the lhs, it obviously needs to be on an
> lvalue and is something like
>   VIEW_CONVERT_EXPR<I_type> (mem) = something;
> and if it is on the rhs, it can be on an rvalue, just reinterpreting the
> bits of something as something else, so more like
>   ((union { typeof (val) x; I_type y; }) (val)).y
>
> > Now I see that gimplify_expr does what you describe, and I wonder what the
> > advantage of that is.  That seems to go back to richi's r206420 for PR59471.
> > r270579 for PR limited it to cases where the caller wants an rvalue; perhaps
> > it should also be avoided when the operand is an INDIRECT_REF?
>
> Strangely, we don't even try to optimize it; even at -O2
>   _3 = *p2_2(D);
>   _4 = VIEW_CONVERT_EXPR<I_type> (_3);
> stays around until optimized.  I believe VIEW_CONVERT_EXPR<I_type> is valid
> around a memory reference, so it could be either
>   _4 = VIEW_CONVERT_EXPR<I_type> (*p2_2(D));
> or a MEM_REF with the aliasing info from the original pointer and the type
> from the VCE.  For optimizations, I guess it is a matter of writing some
> match.pd rule, but we can't rely on that for the atomics.
I'm not sure those would be really equivalent (MEM_REF vs. V_C_E as well as
combined vs. split).  It really depends on how RTL expansion handles this
(as you can see, padding can be fun here).  So I'd be nervous about a
match.pd rule here (also, we can't match memory defs).

As for your patch, I'd go with a MEM_REF unconditionally; I don't think we
want different behavior depending on whether there's padding or not ...

> Doing it in the gimplifier is possible, but I'm not sure how to do that
> easily, given that the VIEW_CONVERT_EXPR argument can be either an lvalue
> or an rvalue and we need to gimplify it first.
> The first part is exactly what forces it into a separate SSA_NAME for the
> load vs. the VIEW_CONVERT_EXPR around it:
>
>     case VIEW_CONVERT_EXPR:
>       if ((fallback & fb_rvalue)
>           && is_gimple_reg_type (TREE_TYPE (*expr_p))
>           && is_gimple_reg_type (TREE_TYPE (TREE_OPERAND (*expr_p, 0))))
>         {
>           ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>                                post_p, is_gimple_val, fb_rvalue);
>           recalculate_side_effects (*expr_p);
>           break;
>         }

That was done to help optimizing some cases (IIRC with vector / int
punning).

>       /* Fallthru.  */
>
>     case ARRAY_REF:
>     case ARRAY_RANGE_REF:
>     case REALPART_EXPR:
>     case IMAGPART_EXPR:
>     case COMPONENT_REF:
>       ret = gimplify_compound_lval (expr_p, pre_p, post_p,
>                                     fallback ? fallback : fb_rvalue);
>       break;
>
> but if we do the gimplify_compound_lval, we'd actually force it to be
> addressable (with possible errors if that isn't possible), etc.
> Having just a special case of VIEW_CONVERT_EXPR of INDIRECT_REF and
> doing gimplify_compound_lval in that case might be wrong, I think; if
> e.g. the INDIRECT_REF operand is an ADDR_EXPR, shouldn't we cancel the *&
> in that case and still not force it into memory?
> The INDIRECT_REF: case is:
>
>         {
>           bool volatilep = TREE_THIS_VOLATILE (*expr_p);
>           bool notrap = TREE_THIS_NOTRAP (*expr_p);
>           tree saved_ptr_type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));
>
>           *expr_p = fold_indirect_ref_loc (input_location, *expr_p);
>           if (*expr_p != save_expr)
>             {
>               ret = GS_OK;
>               break;
>             }
>
>           ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
>                                is_gimple_reg, fb_rvalue);
>           if (ret == GS_ERROR)
>             break;
>
>           recalculate_side_effects (*expr_p);
>           *expr_p = fold_build2_loc (input_location, MEM_REF,
>                                      TREE_TYPE (*expr_p),
>                                      TREE_OPERAND (*expr_p, 0),
>                                      build_int_cst (saved_ptr_type, 0));
>           TREE_THIS_VOLATILE (*expr_p) = volatilep;
>           TREE_THIS_NOTRAP (*expr_p) = notrap;
>           ret = GS_OK;
>           break;
>         }
>
> so I think if we want to special-case VIEW_CONVERT_EXPR on INDIRECT_REF,
> we should basically copy and adjust that at the start of the
> case VIEW_CONVERT_EXPR:.  In particular, if fold_indirect_ref_loc did
> something and it isn't a different INDIRECT_REF or something addressable,
> do what it does now (i.e. possibly treat it as an lvalue); otherwise
> gimplify the INDIRECT_REF operand and build a MEM_REF, but with the type
> of the VIEW_CONVERT_EXPR and still the saved_ptr_type of the INDIRECT_REF.
>
> Though that all still feels like an optimization rather than a guarantee,
> which is what we need for the atomics.

Yes, and I think the frontend does it wrong - if it wants to do a load of
l_type it should do that, not re-interpret the result.  I suppose the docs

@defbuiltin{void __atomic_exchange (@var{type} *@var{ptr}, @var{type} *@var{val}, @var{type} *@var{ret}, int @var{memorder})}
This is the generic version of an atomic exchange.  It stores the
contents of @code{*@var{val}} into @code{*@var{ptr}}.  The original value
of @code{*@var{ptr}} is copied into @code{*@var{ret}}.
contain too little standards legalese to constrain 'type'; it just appears
to use lvalues of *ptr, *val and *ret here, which for floats with padding
possibly isn't well-defined, and "contents" isn't a standard term (it
doesn't say bit representation or something similar); it also says
"stores" and "copied into".

That said, if the FE determines in l_type a type suitable for copying,
then the access should use a MEM_REF, no further "conversion" required.

Richard.