On Wed, 14 Aug 2019, Bernd Edlinger wrote:

> On 8/14/19 2:00 PM, Richard Biener wrote:
> > On Thu, 8 Aug 2019, Bernd Edlinger wrote:
> > 
> >> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
> >>> On 8/2/19 3:11 PM, Richard Biener wrote:
> >>>> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> >>>> 
> >>>>> I have no test coverage for the movmisalign optab though, so I
> >>>>> rely on your code review for that part.
> >>>> 
> >>>> It looks OK.  I tried to make it trigger on the following on
> >>>> i?86 with -msse2:
> >>>> 
> >>>> typedef int v4si __attribute__((vector_size (16)));
> >>>> 
> >>>> struct S { v4si v; } __attribute__((packed));
> >>>> 
> >>>> v4si foo (struct S s)
> >>>> {
> >>>>   return s.v;
> >>>> }
> >>>> 
> >>> 
> >>> Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
> >>> So the test case could be made to trigger it this way:
> >>> 
> >>> typedef int v4si __attribute__((vector_size (16)));
> >>> 
> >>> struct S { v4si v; } __attribute__((packed));
> >>> 
> >>> int t;
> >>> v4si foo (struct S a, struct S b, struct S c, struct S d,
> >>>           struct S e, struct S f, struct S g, struct S h,
> >>>           int i, int j, int k, int l, int m, int n,
> >>>           int o, struct S s)
> >>> {
> >>>   t = o;
> >>>   return s.v;
> >>> }
> >>> 
> >> 
> >> Ah, I realized that there are already a couple of very similar
> >> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
> >> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
> >> which also manage to execute the movmisalign code with the latest
> >> patch version.  So I thought that it is not necessary to add
> >> another one.
> >> 
> >>> However the code path is still not reached, since
> >>> targetm.slow_unaligned_access is always FALSE, which is probably a
> >>> flaw in my patch.
> >>> 
> >>> So I think,
> >>> 
> >>> +      else if (MEM_P (data->entry_parm)
> >>> +               && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>> +                  > MEM_ALIGN (data->entry_parm)
> >>> +               && targetm.slow_unaligned_access (promoted_nominal_mode,
> >>> +                                                 MEM_ALIGN (data->entry_parm)))
> >>> 
> >>> should probably better be
> >>> 
> >>> +      else if (MEM_P (data->entry_parm)
> >>> +               && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>> +                  > MEM_ALIGN (data->entry_parm)
> >>> +               && (((icode = optab_handler (movmisalign_optab,
> >>> +                                            promoted_nominal_mode))
> >>> +                    != CODE_FOR_nothing)
> >>> +                   || targetm.slow_unaligned_access (promoted_nominal_mode,
> >>> +                                                     MEM_ALIGN (data->entry_parm))))
> >>> 
> >>> Right?
> >>> 
> >>> Then the modified test case would use the movmisalign optab.
> >>> However nothing changes in the end, since the i386 back-end is used
> >>> to work around the middle end not using the movmisalign optab when
> >>> it should do so.
> >>> 
> >> 
> >> I prefer the second form of the check, as it offers more test coverage,
> >> and is probably more correct than the former.
> >> 
> >> Note there are more variations of this misalign check in expr.c;
> >> some are somehow odd, like the expansion of MEM_REF and
> >> VIEW_CONVERT_EXPR:
> >> 
> >>             && mode != BLKmode
> >>             && align < GET_MODE_ALIGNMENT (mode))
> >>           {
> >>             if ((icode = optab_handler (movmisalign_optab, mode))
> >>                 != CODE_FOR_nothing)
> >>               [...]
> >>             else if (targetm.slow_unaligned_access (mode, align))
> >>               temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
> >>                                         0, TYPE_UNSIGNED (TREE_TYPE (exp)),
> >>                                         (modifier == EXPAND_STACK_PARM
> >>                                          ? NULL_RTX : target),
> >>                                         mode, mode, false, alt_rtl);
> >> 
> >> I wonder if they are correct this way; why shouldn't we use the
> >> movmisalign optab if it exists, regardless of
> >> TARGET_SLOW_UNALIGNED_ACCESS?
> > 
> > Doesn't the code do exactly this?  Prefer movmisalign over
> > extract_bit_field?
> > 
> 
> Ah, yes.  How could I miss that.
> 
> >> 
> >>> I wonder if I should try to add a gcc_checking_assert to the
> >>> mov<mode> expand patterns that the memory is properly aligned?
> >>> 
> >> 
> >> Wow, that was a really exciting bug-hunt with those assertions around...
> > 
> > :)
> > 
> >>>> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> >>>> 
> >>>>           did_conversion = true;
> >>>>         }
> >>>> +      else if (MEM_P (data->entry_parm)
> >>>> +               && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>>> +                  > MEM_ALIGN (data->entry_parm)
> >>>> 
> >>>> we arrive here by-passing
> >>>> 
> >>>>   else if (need_conversion)
> >>>>     {
> >>>>       /* We did not have an insn to convert directly, or the sequence
> >>>>          generated appeared unsafe.  We must first copy the parm to a
> >>>>          pseudo reg, and save the conversion until after all
> >>>>          parameters have been moved.  */
> >>>> 
> >>>>       int save_tree_used;
> >>>>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> >>>> 
> >>>>       emit_move_insn (tempreg, validated_mem);
> >>>> 
> >>>> but this move instruction is invalid in the same way as the case
> >>>> you fix, no?  So wouldn't it be better to do
> >>>> 
> >>> 
> >>> We could do that, but I supposed that there must be a reason why
> >>> assign_parm_setup_stack gets away with the same:
> >>> 
> >>>   if (data->promoted_mode != data->nominal_mode)
> >>>     {
> >>>       /* Conversion is required.  */
> >>>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> >>> 
> >>>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
> >>> 
> >>> So either some back-ends are too permissive with us,
> >>> or there is a reason why promoted_mode != nominal_mode
> >>> does not happen together with an unaligned entry_parm.
> >>> In a way that would be a rather unusual ABI.
> >>> 
> >> 
> >> To find out if that ever happens I added a couple of checking
> >> assertions in the arm mov<mode> expand patterns.
> >> 
> >> So far the assertions did (almost) always hold, so it is likely not
> >> necessary to fiddle with all those naive move instructions here.
> >> 
> >> So my gut feeling is, leave those places alone until there is a reason
> >> for changing them.
> > 
> > Works for me - I wonder if we should add those asserts to generic
> > code (guarded with flag_checking) though.
> > 
> 
> Well, yes, but I was scared away by the complexity of emit_move_insn_1.
> 
> It could be done, but at the moment I would be happy to have these
> checks on one major strict-alignment target.  ARM is a good candidate,
> since most instructions work even if they are accidentally
> using unaligned arguments, so middle-end errors are not always
> visible in ordinary tests.  Nevertheless it is a blatant violation of the
> contract between middle-end and back-end, which should be avoided.
Fair enough.

> >> However the assertion in movsi triggered a couple of times in the
> >> ada testsuite due to expand_builtin_init_descriptor using a
> >> BLKmode MEM rtx, which is only 8-bit aligned.  So, I set the
> >> ptr_mode alignment there explicitly.
> > 
> > Looks good given we emit ptr_mode moves into it.  Thus OK independently.
> > 
> 
> Thanks, committed as r274487.
> 
> >> Several struct-layout-1.dg testcases tripped over misaligned
> >> complex_cst constants, fixed by varasm.c (align_variable).
> >> This is likely a wrong-code bug, because misaligned complex
> >> constants are expanded to misaligned MEM_REFs, but the
> >> expansion cannot handle misaligned constants, only packed
> >> structure fields.
> > 
> > Hmm.  So your patch overrides user alignment here.  Wouldn't it
> > be better to do that more consciously by
> > 
> >   if (! DECL_USER_ALIGN (decl)
> >       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
> >           && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
> > 
> > ?  And why is the movmisalign optab support missing here?
> > 
> 
> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:
> 
>   /* If we can't trust the parm stack slot to be aligned enough for its
>      ultimate type, don't use that slot after entry.  We'll make another
>      stack slot, if we need one.  */
>   if (stack_parm
>       && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
>            && targetm.slow_unaligned_access (data->nominal_mode,
>                                              MEM_ALIGN (stack_parm)))
> 
> which also makes a variable more aligned than it is declared.
> But maybe both should also check the movmisalign optab in
> addition to slow_unaligned_access?

Quite possible.

> > IMHO whatever code later fails to properly use unaligned loads
> > should be fixed instead rather than ignoring user-requested alignment.
> > 
> > Can you quote a short testcase that explains what exactly goes wrong?
> > The struct-layout ones are awkward to look at...
> > 
> 
> Sure,
> 
> $ cat test.c
> _Complex float __attribute__((aligned(1))) cf;
> 
> void foo (void)
> {
>   cf = 1.0i;
> }
> 
> $ arm-linux-gnueabihf-gcc -S test.c
> during RTL pass: expand
> test.c: In function 'foo':
> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
>     5 |   cf = 1.0i;
>       |   ~~~^~~~~~
> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/config/arm/arm.md:7003
> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>         ../../gcc-trunk/gcc/recog.h:318
> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3695
> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3791
> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3490
> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3791
> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>         ../../gcc-trunk/gcc/expr.c:5855
> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>         ../../gcc-trunk/gcc/expr.c:5441

Huh, so why didn't it trigger

  /* Handle misaligned stores.  */
  mode = TYPE_MODE (TREE_TYPE (to));
  if ((TREE_CODE (to) == MEM_REF
       || TREE_CODE (to) == TARGET_MEM_REF)
      && mode != BLKmode
      && !mem_ref_refers_to_non_mem_p (to)
      && ((align = get_object_alignment (to))
          < GET_MODE_ALIGNMENT (mode))
      && (((icode = optab_handler (movmisalign_optab, mode))
           != CODE_FOR_nothing)
          || targetm.slow_unaligned_access (mode, align)))
    {

?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
var is

(mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8])

SCmode is 32bit aligned.  Ah, 'to' is a plain DECL here so the above
handling is incomplete.  IIRC component refs like __real cf = 0.f
should be handled fine again(?).  So, does adding || DECL_P (to) fix
the case as well?
> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>         ../../gcc-trunk/gcc/expr.c:4983
> 0x93396f expand_gimple_stmt_1
>         ../../gcc-trunk/gcc/cfgexpand.c:3777
> 0x93396f expand_gimple_stmt
>         ../../gcc-trunk/gcc/cfgexpand.c:3875
> 0x9392e1 expand_gimple_basic_block
>         ../../gcc-trunk/gcc/cfgexpand.c:5915
> 0x93b046 execute
>         ../../gcc-trunk/gcc/cfgexpand.c:6538
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
> 
> Without the hunk in varasm.c of course.
> 
> What happens is that expand_expr_real_2 returns an unaligned mem_ref here:
> 
>     case COMPLEX_CST:
>       /* Handle evaluating a complex constant in a CONCAT target.  */
>       if (original_target && GET_CODE (original_target) == CONCAT)
>         {
>           [... this path not taken ...]
>         }
> 
>       /* fall through */
> 
>     case STRING_CST:
>       temp = expand_expr_constant (exp, 1, modifier);
> 
>       /* temp contains a constant address.
>          On RISC machines where a constant address isn't valid,
>          make some insns to get that address into a register.  */
>       if (modifier != EXPAND_CONST_ADDRESS
>           && modifier != EXPAND_INITIALIZER
>           && modifier != EXPAND_SUM
>           && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
>                                             MEM_ADDR_SPACE (temp)))
>         return replace_equiv_address (temp,
>                                       copy_rtx (XEXP (temp, 0)));
>       return temp;
> 
> The result of expand_expr_real (..., EXPAND_NORMAL) ought to be usable
> by emit_move_insn; that is expected just *everywhere* and can't be changed.
> 
> This could probably be fixed in an ugly way in the COMPLEX_CST handler,
> but OTOH I don't see any reason why this constant has to be misaligned
> when it can be easily aligned, which avoids the need for a misaligned
> access.

If the COMPLEX_CST happens to end up in unaligned memory then that's
of course a bug (unless the target requests that for all COMPLEX_CSTs).
That is, if the unalignment is triggered because the store is to an
unaligned decl.
But I think the issue is the above one?

> >> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the
> >> change in expr.c (expand_expr_real_1).  Certainly it is invalid
> >> to read memory at a function address, but it should not ICE.
> >> The problem here is that the MEM_REF has no valid MEM_ALIGN; it looks
> >> like A32, so the misaligned code path is not taken, but it is
> >> set to A8 below, and then we hit an ICE if the result is used:
> > 
> > So the user accessed it as A32.
> > 
> >>       /* Don't set memory attributes if the base expression is
> >>          SSA_NAME that got expanded as a MEM.  In that case, we should
> >>          just honor its original memory attributes.  */
> >>       if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0))
> >>         set_mem_attributes (op0, exp, 0);
> > 
> > Huh, I don't understand this.  'tem' should never be SSA_NAME.
> 
> tem is the result of get_inner_reference; why can't that be an SSA_NAME?

We can't subset an SSA_NAME.  I have really no idea what this intended
to do...

> > But set_mem_attributes_minus_bitpos uses get_object_alignment_1
> > and that has special treatment for FUNCTION_DECLs that is not
> > covered by
> > 
> >      /* When EXP is an actual memory reference then we can use
> >         TYPE_ALIGN of a pointer indirection to derive alignment.
> >         Do so only if get_pointer_alignment_1 did not reveal absolute
> >         alignment knowledge and if using that alignment would
> >         improve the situation.  */
> >      unsigned int talign;
> >      if (!addr_p && !known_alignment
> >          && (talign = min_align_of_type (TREE_TYPE (exp)) * BITS_PER_UNIT)
> >          && talign > align)
> >        align = talign;
> > 
> > which could be moved out of the if-cascade.
> > 
> > That said, setting A8 should eventually result in appropriate
> > unaligned expansion, so it seems odd this triggers the assert...
> > 
> 
> The function pointer is really 32-bit aligned in ARM mode to start
> with...
> 
> The problem is that the code that handles this misaligned access
> is skipped because the mem_rtx initially has no MEM_ATTRS and therefore
> MEM_ALIGN == 32, and therefore the code that handles the unaligned
> access is not taken.  BUT before the mem_rtx is returned it is
> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion,
> because the result from expand_expr_real (..., EXPAND_NORMAL) ought
> to be usable with emit_move_insn.

Yes, as said the _access_ determines the address should be aligned,
so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according to
the access type/mode.  But we can't trust DECL_ALIGN of FUNCTION_DECLs,
though we _can_ trust users writing *(int *)fn
(maybe for actual accesses we _can_ trust DECL_ALIGN, it's just we
may not compute nonzero bits for the actual address because of
function pointer mangling)
(for accessing function code I'd say this would be premature
optimization, but ...)

> >> 
> >> Finally gcc.dg/torture/pr48493.c required the change
> >> in assign_parm_setup_stack.  This is just not using the
> >> correct MEM_ALIGN attribute value, while the memory is
> >> actually aligned.
> > 
> > But doesn't
> > 
> >          int align = STACK_SLOT_ALIGNMENT (data->passed_type,
> >                                            GET_MODE (data->entry_parm),
> >                                            TYPE_ALIGN (data->passed_type));
> > +        if (align < (int) GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
> > +            && targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
> > +                                              align))
> > +          align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
> > 
> > hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target?
> > That is, the target says, for natural alignment 64 the stack slot
> > alignment can only be guaranteed 32.  You can't then simply up it
> > but have to use unaligned accesses (or the target/middle-end needs
> > to do dynamic stack alignment).
> > 
> 
> Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as
> well, and none of them have a problem, probably because they use
> expand_expr, but here we use emit_move_insn:
> 
>       if (MEM_P (src))
>         {
>           [...]
>         }
>       else
>         {
>           if (!REG_P (src))
>             src = force_reg (GET_MODE (src), src);
>           emit_move_insn (dest, src);
>         }
> 
> So I could restrict that to
> 
>         if (!MEM_P (data->entry_parm)
>             && align < (int) GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
>             && ((optab_handler (movmisalign_optab,
>                                 GET_MODE (data->entry_parm))
>                  != CODE_FOR_nothing)
>                 || targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
>                                                   align)))
>           align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
> 
> But OTOH even for arguments arriving in unaligned stack slots where
> emit_block_move could handle it, that would just work against the
> intention of assign_parm_adjust_stack_rtl.
> 
> Of course there are limits how much alignment assign_stack_local
> can handle, and that would result in an assertion in emit_move_insn.
> But in the end if that happens it is just an impossible target
> configuration.

Still I think you can't simply override STACK_SLOT_ALIGNMENT just
because of the mode of an entry param, can you?  If you can assume
a bigger alignment then STACK_SLOT_ALIGNMENT should return it.

> > 
> >> Note that set_mem_attributes does not
> >> always preserve the MEM_ALIGN of the ref, since:
> > 
> > set_mem_attributes sets _all_ attributes from an expression or type.
> > 
> 
> Not really:
> 
>   refattrs = MEM_ATTRS (ref);
>   if (refattrs)
>     {
>       /* ??? Can this ever happen?  Calling this routine on a MEM that
>          already carries memory attributes should probably be invalid.  */
>       [...]
>       attrs.align = refattrs->align;
>     }
>   else
>     [...]
> 
>   if (objectp || TREE_CODE (t) == INDIRECT_REF)
>     attrs.align = MAX (attrs.align, TYPE_ALIGN (type));
> 
> >>   /* Default values from pre-existing memory attributes if present.  */
> >>   refattrs = MEM_ATTRS (ref);
> >>   if (refattrs)
> >>     {
> >>       /* ???
> >>          Can this ever happen?  Calling this routine on a MEM that
> >>          already carries memory attributes should probably be invalid.  */
> >>       attrs.expr = refattrs->expr;
> >>       attrs.offset_known_p = refattrs->offset_known_p;
> >>       attrs.offset = refattrs->offset;
> >>       attrs.size_known_p = refattrs->size_known_p;
> >>       attrs.size = refattrs->size;
> >>       attrs.align = refattrs->align;
> >>     }
> >> 
> >> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT
> >> the MEM_ATTRS are zero, and a smaller alignment may result.
> > 
> > Not sure what you are saying here.  That
> > 
> >   set_mem_align (MEM:SI A32, 32)
> > 
> > produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting
> > the A32 but eventually computing sth lower?  Yeah, that's probably
> > an interesting "hole" here.  I'm quite sure that if we'd do
> > 
> >   refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref)
> >                              : mem_mode_attrs[(int) GET_MODE (ref)];
> > 
> > we run into issues exactly on strict-align targets ...
> > 
> 
> Yeah, that's scary...
> 
> >> Well with those checks in place it should now be a lot harder to
> >> generate invalid code on STRICT_ALIGNMENT targets without running
> >> into an ICE.
> >> 
> >> Attached is the latest version of my arm alignment patch.
> >> 
> >> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and
> >> arm-linux-gnueabihf.
> >> Is it OK for trunk?
> > > > @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all > > > > did_conversion = true; > > } > > + else if (MEM_P (data->entry_parm) > > + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > > + > MEM_ALIGN (data->entry_parm) > > + && (((icode = optab_handler (movmisalign_optab, > > + promoted_nominal_mode)) > > + != CODE_FOR_nothing) > > + || targetm.slow_unaligned_access (promoted_nominal_mode, > > + MEM_ALIGN > > (data->entry_parm)))) > > + { > > + if (icode != CODE_FOR_nothing) > > + emit_insn (GEN_FCN (icode) (parmreg, validated_mem)); > > + else > > + rtl = parmreg = extract_bit_field (validated_mem, > > + GET_MODE_BITSIZE (promoted_nominal_mode), 0, > > + unsignedp, parmreg, > > + promoted_nominal_mode, VOIDmode, false, NULL); > > + } > > else > > emit_move_insn (parmreg, validated_mem); > > > > This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) / > > GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm) > > and promoted_nominal_mode. > > > > Yes, the idea is just to save some cycles, since > > parmreg = gen_reg_rtx (promoted_nominal_mode); > we know that parmreg will also have that mode, plus > emit_move_insn (parmreg, validated_mem) which would be called here > asserts that: > > gcc_assert (mode != BLKmode > && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode)); > > so GET_MODE(validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode > > I still like the current version with promoted_nominal_mode slighhtly > better both because of performance, and the 80-column restriction. :) So if you say they are 1:1 equivalent then go for it (for this hunk, approved as "obvious"). Richard.