subject:"Re\: \[PATCHv4\] Fix not 8\-byte aligned ldrd\/strd on ARMv5 \(PR 89544\)"

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Richard Biener

On August 15, 2019 4:52:24 PM GMT+02:00, Bernd Edlinger 
 wrote:
>On 8/15/19 2:54 PM, Richard Biener wrote:
>> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
>> 
>>
>> Hmm.  So your patch overrides user-alignment here.  Woudln't it
>> be better to do that more conciously by
>>
>>   if (! DECL_USER_ALIGN (decl)
>>   || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>   && targetm.slow_unaligned_access (DECL_MODE (decl),
>align)))
>>
>>>
>>> ? I don't know why that would be better?
>>> If the value is underaligned no matter why, pretend it was declared
>as
>>> naturally aligned if that causes wrong code otherwise.
>>> That was the idea here.
>> 
>> It would be better because then we ignore it and use what we'd use
>> by default rather than inventing sth new.  And your patch suggests
>> it might be needed to up align even w/o DECL_USER_ALIGN.
>> 
>
>Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set?
>But it inherits the alignment from the destination variable,
>apparently. 

Yes. I think it shouldn't inherit the alignment unless we are assembling a 
static initializer. 

>
>did you mean
>if (! DECL_USER_ALIGN (decl)
>&& align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>&& ...
>?
>
>I can give it a try.

No, I meant || thus ignore DECL_USER_ALIGN if it is sth we have to satisfy with 
unaligned loads. 
>
>> IMHO whatever code later fails to properly use unaligned loads
>> should be fixed instead rather than ignoring user requested
>alignment.
>>
>> Can you quote a short testcase that explains what exactly goes
>wrong?
>> The struct-layout ones are awkward to look at...
>>
>
> Sure,
>
> $ cat test.c
> _Complex float __attribute__((aligned(1))) cf;
>
> void foo (void)
> {
>   cf = 1.0i;
> }
>
> $ arm-linux-gnueabihf-gcc -S test.c 
> during RTL pass: expand
> test.c: In function 'foo':
> test.c:5:6: internal compiler error: in gen_movsf, at
>config/arm/arm.md:7003
> 5 |   cf = 1.0i;
>   |   ~~~^~
> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/config/arm/arm.md:7003
> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>   ../../gcc-trunk/gcc/recog.h:318
> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/expr.c:3695
> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/expr.c:3791
> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/expr.c:3490
> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/expr.c:3791
> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>   ../../gcc-trunk/gcc/expr.c:5855
> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>   ../../gcc-trunk/gcc/expr.c:5441

 Huh, so why didn't it trigger

   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
   if ((TREE_CODE (to) == MEM_REF
|| TREE_CODE (to) == TARGET_MEM_REF)
   && mode != BLKmode
   && !mem_ref_refers_to_non_mem_p (to)
   && ((align = get_object_alignment (to))
   < GET_MODE_ALIGNMENT (mode))
   && (((icode = optab_handler (movmisalign_optab, mode))
!= CODE_FOR_nothing)
   || targetm.slow_unaligned_access (mode, align)))
 {

 ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
 var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] >>> 0x2aad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.

 Ah, 'to' is a plain DECL here so the above handling is incomplete.
 IIRC component refs like __real cf = 0.f should be handled fine
 again(?).  So, does adding || DECL_P (to) fix the case as well?

>>>
>>> So I tried this instead of the varasm.c change:
>>>
>>> Index: expr.c
>>> ===
>>> --- expr.c  (revision 274487)
>>> +++ expr.c  (working copy)
>>> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool
>nontem
>>>/* Handle misaligned stores.  */
>>>mode = TYPE_MODE (TREE_TYPE (to));
>>>if ((TREE_CODE (to) == MEM_REF
>>> -   || TREE_CODE (to) == TARGET_MEM_REF)
>>> +   || TREE_CODE (to) == TARGET_MEM_REF
>>> +   || DECL_P (to))
>>>&& mode != BLKmode
>>> -  && !mem_ref_refers_to_non_mem_p (to)
>>> +  && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>>>&& ((align = get_object_alignment (to))
>>>   < GET_MODE_ALIGNMENT (mode))
>>>&& (((icode = optab_handler (movmisalign_optab, mode))
>>>
>>> Result, yes, it fixes this test case
>>> but then I run all struct-layout-1.exp there are sill cases. where
>we have problems:
>>>
>>> In file included from
>/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M
>>>

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Bernd Edlinger

On 8/15/19 2:54 PM, Richard Biener wrote:
> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
> 
>
> Hmm.  So your patch overrides user-alignment here.  Woudln't it
> be better to do that more conciously by
>
>   if (! DECL_USER_ALIGN (decl)
>   || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>   && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
>
>>
>> ? I don't know why that would be better?
>> If the value is underaligned no matter why, pretend it was declared as
>> naturally aligned if that causes wrong code otherwise.
>> That was the idea here.
> 
> It would be better because then we ignore it and use what we'd use
> by default rather than inventing sth new.  And your patch suggests
> it might be needed to up align even w/o DECL_USER_ALIGN.
> 

Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set?
But it inherits the alignment from the destination variable, apparently.

did you mean
if (! DECL_USER_ALIGN (decl)
&& align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
&& ...
?

I can give it a try.

> IMHO whatever code later fails to properly use unaligned loads
> should be fixed instead rather than ignoring user requested alignment.
>
> Can you quote a short testcase that explains what exactly goes wrong?
> The struct-layout ones are awkward to look at...
>

 Sure,

 $ cat test.c
 _Complex float __attribute__((aligned(1))) cf;

 void foo (void)
 {
   cf = 1.0i;
 }

 $ arm-linux-gnueabihf-gcc -S test.c 
 during RTL pass: expand
 test.c: In function 'foo':
 test.c:5:6: internal compiler error: in gen_movsf, at 
 config/arm/arm.md:7003
 5 |   cf = 1.0i;
   |   ~~~^~
 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/config/arm/arm.md:7003
 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
../../gcc-trunk/gcc/recog.h:318
 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/expr.c:3695
 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/expr.c:3791
 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/expr.c:3490
 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/expr.c:3791
 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
../../gcc-trunk/gcc/expr.c:5855
 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
../../gcc-trunk/gcc/expr.c:5441
>>>
>>> Huh, so why didn't it trigger
>>>
>>>   /* Handle misaligned stores.  */
>>>   mode = TYPE_MODE (TREE_TYPE (to));
>>>   if ((TREE_CODE (to) == MEM_REF
>>>|| TREE_CODE (to) == TARGET_MEM_REF)
>>>   && mode != BLKmode
>>>   && !mem_ref_refers_to_non_mem_p (to)
>>>   && ((align = get_object_alignment (to))
>>>   < GET_MODE_ALIGNMENT (mode))
>>>   && (((icode = optab_handler (movmisalign_optab, mode))
>>>!= CODE_FOR_nothing)
>>>   || targetm.slow_unaligned_access (mode, align)))
>>> {
>>>
>>> ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
>>> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] >> 0x2aad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
>>>
>>> Ah, 'to' is a plain DECL here so the above handling is incomplete.
>>> IIRC component refs like __real cf = 0.f should be handled fine
>>> again(?).  So, does adding || DECL_P (to) fix the case as well?
>>>
>>
>> So I tried this instead of the varasm.c change:
>>
>> Index: expr.c
>> ===
>> --- expr.c   (revision 274487)
>> +++ expr.c   (working copy)
>> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
>>/* Handle misaligned stores.  */
>>mode = TYPE_MODE (TREE_TYPE (to));
>>if ((TREE_CODE (to) == MEM_REF
>> -   || TREE_CODE (to) == TARGET_MEM_REF)
>> +   || TREE_CODE (to) == TARGET_MEM_REF
>> +   || DECL_P (to))
>>&& mode != BLKmode
>> -  && !mem_ref_refers_to_non_mem_p (to)
>> +  && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>>&& ((align = get_object_alignment (to))
>>< GET_MODE_ALIGNMENT (mode))
>>&& (((icode = optab_handler (movmisalign_optab, mode))
>>
>> Result, yes, it fixes this test case
>> but then I run all struct-layout-1.exp there are sill cases. where we have 
>> problems:
>>
>> In file included from 
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:
>>  In function 'test2112':^M
>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10:
>>  internal compiler error: in gen_movdf, at config/arm/arm.md:7107^M
>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3:
>>  note: in definition of

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Richard Biener

On Thu, 15 Aug 2019, Richard Biener wrote:

> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
> > > 
> > > We can't subset an SSA_NAME.  I have really no idea what this intended
> > > to do...
> > > 
> > 
> > Nice, so would you do a patch to change that to a
> > gcc_checking_assert (TREE_CODE (tem) != SSA_NAME) ?
> > maybe with a small explanation?
> 
> I'll try.

So actually we can via BIT_FIELD_REF<_1, ...> and that _1 can end
up being expanded in memory.  See r233656 which brought this in.

Richard.

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Richard Biener

On Thu, 15 Aug 2019, Bernd Edlinger wrote:

> On 8/15/19 10:55 AM, Richard Biener wrote:
> > On Wed, 14 Aug 2019, Bernd Edlinger wrote:
> > 
> >> On 8/14/19 2:00 PM, Richard Biener wrote:
> >>
> >> Well, yes, but I was scared away by the complexity of emit_move_insn_1.
> >>
> >> It could be done, but in the moment I would be happy to have these
> >> checks of one major strict alignment target, ARM is a good candidate
> >> since most instructions work even if they are accidentally
> >> using unaligned arguments.  So middle-end errors do not always
> >> visible by ordinary tests.  Nevertheless it is a blatant violation of the
> >> contract between middle-end and back-end, which should be avoided.
> > 
> > Fair enough.
> > 
>  Several struct-layout-1.dg testcase tripped over misaligned
>  complex_cst constants, fixed by varasm.c (align_variable).
>  This is likely a wrong code bug, because misaligned complex
>  constants, are expanded to misaligned MEM_REF, but the
>  expansion cannot handle misaligned constants, only packed
>  structure fields.
> >>>
> >>> Hmm.  So your patch overrides user-alignment here.  Woudln't it
> >>> be better to do that more conciously by
> >>>
> >>>   if (! DECL_USER_ALIGN (decl)
> >>>   || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
> >>>   && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
> >>>
> 
> ? I don't know why that would be better?
> If the value is underaligned no matter why, pretend it was declared as
> naturally aligned if that causes wrong code otherwise.
> That was the idea here.

It would be better because then we ignore it and use what we'd use
by default rather than inventing sth new.  And your patch suggests
it might be needed to up align even w/o DECL_USER_ALIGN.

> >>> ?  And why is the movmisalign optab support missing here?
> >>>
> >>
> >> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:
> >>
> >>   /* If we can't trust the parm stack slot to be aligned enough for its
> >>  ultimate type, don't use that slot after entry.  We'll make another
> >>  stack slot, if we need one.  */
> >>   if (stack_parm
> >>   && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> >>&& targetm.slow_unaligned_access (data->nominal_mode,
> >>  MEM_ALIGN (stack_parm)))
> >>
> >> which also makes a variable more aligned than it is declared.
> >> But maybe both should also check the movmisalign optab in
> >> addition to slow_unaligned_access ?
> > 
> > Quite possible.
> > 
> 
> Will do, see attached new version of the patch.
> 
> >>> IMHO whatever code later fails to properly use unaligned loads
> >>> should be fixed instead rather than ignoring user requested alignment.
> >>>
> >>> Can you quote a short testcase that explains what exactly goes wrong?
> >>> The struct-layout ones are awkward to look at...
> >>>
> >>
> >> Sure,
> >>
> >> $ cat test.c
> >> _Complex float __attribute__((aligned(1))) cf;
> >>
> >> void foo (void)
> >> {
> >>   cf = 1.0i;
> >> }
> >>
> >> $ arm-linux-gnueabihf-gcc -S test.c 
> >> during RTL pass: expand
> >> test.c: In function 'foo':
> >> test.c:5:6: internal compiler error: in gen_movsf, at 
> >> config/arm/arm.md:7003
> >> 5 |   cf = 1.0i;
> >>   |   ~~~^~
> >> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/config/arm/arm.md:7003
> >> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
> >>../../gcc-trunk/gcc/recog.h:318
> >> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/expr.c:3695
> >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/expr.c:3791
> >> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/expr.c:3490
> >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/expr.c:3791
> >> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
> >>../../gcc-trunk/gcc/expr.c:5855
> >> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
> >>../../gcc-trunk/gcc/expr.c:5441
> > 
> > Huh, so why didn't it trigger
> > 
> >   /* Handle misaligned stores.  */
> >   mode = TYPE_MODE (TREE_TYPE (to));
> >   if ((TREE_CODE (to) == MEM_REF
> >|| TREE_CODE (to) == TARGET_MEM_REF)
> >   && mode != BLKmode
> >   && !mem_ref_refers_to_non_mem_p (to)
> >   && ((align = get_object_alignment (to))
> >   < GET_MODE_ALIGNMENT (mode))
> >   && (((icode = optab_handler (movmisalign_optab, mode))
> >!= CODE_FOR_nothing)
> >   || targetm.slow_unaligned_access (mode, align)))
> > {
> > 
> > ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
> > var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2]  > 0x2aad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
> > 
> > Ah, 'to' is a plain DECL here so the above handling is incomplete.
> > IIRC component refs like __real cf = 0.f should be handled

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Bernd Edlinger

On 8/15/19 10:55 AM, Richard Biener wrote:
> On Wed, 14 Aug 2019, Bernd Edlinger wrote:
> 
>> On 8/14/19 2:00 PM, Richard Biener wrote:
>>
>> Well, yes, but I was scared away by the complexity of emit_move_insn_1.
>>
>> It could be done, but in the moment I would be happy to have these
>> checks of one major strict alignment target, ARM is a good candidate
>> since most instructions work even if they are accidentally
>> using unaligned arguments.  So middle-end errors do not always
>> visible by ordinary tests.  Nevertheless it is a blatant violation of the
>> contract between middle-end and back-end, which should be avoided.
> 
> Fair enough.
> 
 Several struct-layout-1.dg testcase tripped over misaligned
 complex_cst constants, fixed by varasm.c (align_variable).
 This is likely a wrong code bug, because misaligned complex
 constants, are expanded to misaligned MEM_REF, but the
 expansion cannot handle misaligned constants, only packed
 structure fields.
>>>
>>> Hmm.  So your patch overrides user-alignment here.  Woudln't it
>>> be better to do that more conciously by
>>>
>>>   if (! DECL_USER_ALIGN (decl)
>>>   || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>>   && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
>>>

? I don't know why that would be better?
If the value is underaligned no matter why, pretend it was declared as
naturally aligned if that causes wrong code otherwise.
That was the idea here.

>>> ?  And why is the movmisalign optab support missing here?
>>>
>>
>> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:
>>
>>   /* If we can't trust the parm stack slot to be aligned enough for its
>>  ultimate type, don't use that slot after entry.  We'll make another
>>  stack slot, if we need one.  */
>>   if (stack_parm
>>   && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
>>&& targetm.slow_unaligned_access (data->nominal_mode,
>>  MEM_ALIGN (stack_parm)))
>>
>> which also makes a variable more aligned than it is declared.
>> But maybe both should also check the movmisalign optab in
>> addition to slow_unaligned_access ?
> 
> Quite possible.
> 

Will do, see attached new version of the patch.

>>> IMHO whatever code later fails to properly use unaligned loads
>>> should be fixed instead rather than ignoring user requested alignment.
>>>
>>> Can you quote a short testcase that explains what exactly goes wrong?
>>> The struct-layout ones are awkward to look at...
>>>
>>
>> Sure,
>>
>> $ cat test.c
>> _Complex float __attribute__((aligned(1))) cf;
>>
>> void foo (void)
>> {
>>   cf = 1.0i;
>> }
>>
>> $ arm-linux-gnueabihf-gcc -S test.c 
>> during RTL pass: expand
>> test.c: In function 'foo':
>> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
>> 5 |   cf = 1.0i;
>>   |   ~~~^~
>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/config/arm/arm.md:7003
>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>  ../../gcc-trunk/gcc/recog.h:318
>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/expr.c:3695
>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/expr.c:3791
>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/expr.c:3490
>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/expr.c:3791
>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>>  ../../gcc-trunk/gcc/expr.c:5855
>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>  ../../gcc-trunk/gcc/expr.c:5441
> 
> Huh, so why didn't it trigger
> 
>   /* Handle misaligned stores.  */
>   mode = TYPE_MODE (TREE_TYPE (to));
>   if ((TREE_CODE (to) == MEM_REF
>|| TREE_CODE (to) == TARGET_MEM_REF)
>   && mode != BLKmode
>   && !mem_ref_refers_to_non_mem_p (to)
>   && ((align = get_object_alignment (to))
>   < GET_MODE_ALIGNMENT (mode))
>   && (((icode = optab_handler (movmisalign_optab, mode))
>!= CODE_FOR_nothing)
>   || targetm.slow_unaligned_access (mode, align)))
> {
> 
> ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2]  0x2aad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
> 
> Ah, 'to' is a plain DECL here so the above handling is incomplete.
> IIRC component refs like __real cf = 0.f should be handled fine
> again(?).  So, does adding || DECL_P (to) fix the case as well?
> 

So I tried this instead of the varasm.c change:

Index: expr.c
===
--- expr.c  (revision 274487)
+++ expr.c  (working copy)
@@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
   if ((TREE_CODE (to) == MEM_REF
-   ||

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Richard Biener

On Wed, 14 Aug 2019, Bernd Edlinger wrote:

> On 8/14/19 2:00 PM, Richard Biener wrote:
> > On Thu, 8 Aug 2019, Bernd Edlinger wrote:
> > 
> >> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
> >>> On 8/2/19 3:11 PM, Richard Biener wrote:
>  On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> 
> >
> > I have no test coverage for the movmisalign optab though, so I
> > rely on your code review for that part.
> 
>  It looks OK.  I tried to make it trigger on the following on
>  i?86 with -msse2:
> 
>  typedef int v4si __attribute__((vector_size (16)));
> 
>  struct S { v4si v; } __attribute__((packed));
> 
>  v4si foo (struct S s)
>  {
>    return s.v;
>  }
> 
> >>>
> >>> Hmm, the entry_parm need to be a MEM_P and an unaligned one.
> >>> So the test case could be made to trigger it this way:
> >>>
> >>> typedef int v4si __attribute__((vector_size (16)));
> >>>
> >>> struct S { v4si v; } __attribute__((packed));
> >>>
> >>> int t;
> >>> v4si foo (struct S a, struct S b, struct S c, struct S d,
> >>>   struct S e, struct S f, struct S g, struct S h,
> >>>   int i, int j, int k, int l, int m, int n,
> >>>   int o, struct S s)
> >>> {
> >>>   t = o;
> >>>   return s.v;
> >>> }
> >>>
> >>
> >> Ah, I realized that there are already a couple of very similar
> >> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
> >> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
> >> which also manage to execute the movmisalign code with the latest patch
> >> version.  So I thought that it is not necessary to add another one.
> >>
> >>> However the code path is still not reached, since 
> >>> targetm.slow_ualigned_access
> >>> is always FALSE, which is probably a flaw in my patch.
> >>>
> >>> So I think,
> >>>
> >>> +  else if (MEM_P (data->entry_parm)
> >>> +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>> + > MEM_ALIGN (data->entry_parm)
> >>> +  && targetm.slow_unaligned_access (promoted_nominal_mode,
> >>> +MEM_ALIGN 
> >>> (data->entry_parm)))
> >>>
> >>> should probably better be
> >>>
> >>> +  else if (MEM_P (data->entry_parm)
> >>> +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>> + > MEM_ALIGN (data->entry_parm)
> >>> +&& (((icode = optab_handler (movmisalign_optab, 
> >>> promoted_nominal_mode))
> >>> + != CODE_FOR_nothing)
> >>> +|| targetm.slow_unaligned_access (promoted_nominal_mode,
> >>> +  MEM_ALIGN 
> >>> (data->entry_parm
> >>>
> >>> Right?
> >>>
> >>> Then the modified test case would use the movmisalign optab.
> >>> However nothing changes in the end, since the i386 back-end is used to 
> >>> work
> >>> around the middle end not using movmisalign optab when it should do so.
> >>>
> >>
> >> I prefer the second form of the check, as it offers more test coverage,
> >> and is probably more correct than the former.
> >>
> >> Note there are more variations of this misalign check in expr.c,
> >> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR:
> >>
> >> && mode != BLKmode
> >> && align < GET_MODE_ALIGNMENT (mode))
> >>   {
> >> if ((icode = optab_handler (movmisalign_optab, mode))
> >> != CODE_FOR_nothing)
> >>   [...]
> >> else if (targetm.slow_unaligned_access (mode, align))
> >>   temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
> >> 0, TYPE_UNSIGNED (TREE_TYPE (exp)),
> >> (modifier == EXPAND_STACK_PARM
> >>  ? NULL_RTX : target),
> >> mode, mode, false, alt_rtl);
> >>
> >> I wonder if they are correct this way, why shouldn't we use the movmisalign
> >> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESSS ?
> > 
> > Doesn't the code do exactly this?  Prefer movmisalign over 
> > extrct_bit_field?
> > 
> 
> Ah, yes.  How could I miss that.
> 
> >>
> >>> I wonder if I should try to add a gcc_checking_assert to the mov 
> >>> expand
> >>> patterns that the memory is properly aligned ?
> >>>
> >>
> >> Wow, that was a really exciting bug-hunt with those assertions around...
> > 
> > :)
> > 
>  @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> 
> did_conversion = true;
>   }
>  +  else if (MEM_P (data->entry_parm)
>  +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>  + > MEM_ALIGN (data->entry_parm)
> 
>  we arrive here by-passing
> 
>    else if (need_conversion)
>  {
>    /* We did not have an insn to convert directly, or the sequence
>   generated appeared unsafe.  We must first copy the parm to a
>

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-14 Thread Bernd Edlinger

On 8/14/19 2:00 PM, Richard Biener wrote:
> On Thu, 8 Aug 2019, Bernd Edlinger wrote:
> 
>> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
>>> On 8/2/19 3:11 PM, Richard Biener wrote:
 On Tue, 30 Jul 2019, Bernd Edlinger wrote:

>
> I have no test coverage for the movmisalign optab though, so I
> rely on your code review for that part.

 It looks OK.  I tried to make it trigger on the following on
 i?86 with -msse2:

 typedef int v4si __attribute__((vector_size (16)));

 struct S { v4si v; } __attribute__((packed));

 v4si foo (struct S s)
 {
   return s.v;
 }

>>>
>>> Hmm, the entry_parm need to be a MEM_P and an unaligned one.
>>> So the test case could be made to trigger it this way:
>>>
>>> typedef int v4si __attribute__((vector_size (16)));
>>>
>>> struct S { v4si v; } __attribute__((packed));
>>>
>>> int t;
>>> v4si foo (struct S a, struct S b, struct S c, struct S d,
>>>   struct S e, struct S f, struct S g, struct S h,
>>>   int i, int j, int k, int l, int m, int n,
>>>   int o, struct S s)
>>> {
>>>   t = o;
>>>   return s.v;
>>> }
>>>
>>
>> Ah, I realized that there are already a couple of very similar
>> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
>> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
>> which also manage to execute the movmisalign code with the latest patch
>> version.  So I thought that it is not necessary to add another one.
>>
>>> However the code path is still not reached, since 
>>> targetm.slow_ualigned_access
>>> is always FALSE, which is probably a flaw in my patch.
>>>
>>> So I think,
>>>
>>> +  else if (MEM_P (data->entry_parm)
>>> +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>>> + > MEM_ALIGN (data->entry_parm)
>>> +  && targetm.slow_unaligned_access (promoted_nominal_mode,
>>> +MEM_ALIGN (data->entry_parm)))
>>>
>>> should probably better be
>>>
>>> +  else if (MEM_P (data->entry_parm)
>>> +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>>> + > MEM_ALIGN (data->entry_parm)
>>> +&& (((icode = optab_handler (movmisalign_optab, 
>>> promoted_nominal_mode))
>>> + != CODE_FOR_nothing)
>>> +|| targetm.slow_unaligned_access (promoted_nominal_mode,
>>> +  MEM_ALIGN 
>>> (data->entry_parm
>>>
>>> Right?
>>>
>>> Then the modified test case would use the movmisalign optab.
>>> However nothing changes in the end, since the i386 back-end is used to work
>>> around the middle end not using movmisalign optab when it should do so.
>>>
>>
>> I prefer the second form of the check, as it offers more test coverage,
>> and is probably more correct than the former.
>>
>> Note there are more variations of this misalign check in expr.c,
>> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR:
>>
>> && mode != BLKmode
>> && align < GET_MODE_ALIGNMENT (mode))
>>   {
>> if ((icode = optab_handler (movmisalign_optab, mode))
>> != CODE_FOR_nothing)
>>   [...]
>> else if (targetm.slow_unaligned_access (mode, align))
>>   temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
>> 0, TYPE_UNSIGNED (TREE_TYPE (exp)),
>> (modifier == EXPAND_STACK_PARM
>>  ? NULL_RTX : target),
>> mode, mode, false, alt_rtl);
>>
>> I wonder if they are correct this way, why shouldn't we use the movmisalign
>> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESSS ?
> 
> Doesn't the code do exactly this?  Prefer movmisalign over 
> extrct_bit_field?
> 

Ah, yes.  How could I miss that.

>>
>>> I wonder if I should try to add a gcc_checking_assert to the mov 
>>> expand
>>> patterns that the memory is properly aligned ?
>>>
>>
>> Wow, that was a really exciting bug-hunt with those assertions around...
> 
> :)
> 
 @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all

did_conversion = true;
  }
 +  else if (MEM_P (data->entry_parm)
 +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
 + > MEM_ALIGN (data->entry_parm)

 we arrive here by-passing

   else if (need_conversion)
 {
   /* We did not have an insn to convert directly, or the sequence
  generated appeared unsafe.  We must first copy the parm to a
  pseudo reg, and save the conversion until after all
  parameters have been moved.  */

   int save_tree_used;
   rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));

   emit_move_insn (tempreg, validated_mem);

 but this move instruction is invalid in the

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-14 Thread Richard Biener

On Thu, 8 Aug 2019, Bernd Edlinger wrote:

> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
> > On 8/2/19 3:11 PM, Richard Biener wrote:
> >> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> >>
> >>>
> >>> I have no test coverage for the movmisalign optab though, so I
> >>> rely on your code review for that part.
> >>
> >> It looks OK.  I tried to make it trigger on the following on
> >> i?86 with -msse2:
> >>
> >> typedef int v4si __attribute__((vector_size (16)));
> >>
> >> struct S { v4si v; } __attribute__((packed));
> >>
> >> v4si foo (struct S s)
> >> {
> >>   return s.v;
> >> }
> >>
> > 
> > Hmm, the entry_parm need to be a MEM_P and an unaligned one.
> > So the test case could be made to trigger it this way:
> > 
> > typedef int v4si __attribute__((vector_size (16)));
> > 
> > struct S { v4si v; } __attribute__((packed));
> > 
> > int t;
> > v4si foo (struct S a, struct S b, struct S c, struct S d,
> >   struct S e, struct S f, struct S g, struct S h,
> >   int i, int j, int k, int l, int m, int n,
> >   int o, struct S s)
> > {
> >   t = o;
> >   return s.v;
> > }
> > 
> 
> Ah, I realized that there are already a couple of very similar
> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
> which also manage to execute the movmisalign code with the latest patch
> version.  So I thought that it is not necessary to add another one.
> 
> > However the code path is still not reached, since 
> > targetm.slow_ualigned_access
> > is always FALSE, which is probably a flaw in my patch.
> > 
> > So I think,
> > 
> > +  else if (MEM_P (data->entry_parm)
> > +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > + > MEM_ALIGN (data->entry_parm)
> > +  && targetm.slow_unaligned_access (promoted_nominal_mode,
> > +MEM_ALIGN (data->entry_parm)))
> > 
> > should probably better be
> > 
> > +  else if (MEM_P (data->entry_parm)
> > +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > + > MEM_ALIGN (data->entry_parm)
> > +&& (((icode = optab_handler (movmisalign_optab, 
> > promoted_nominal_mode))
> > + != CODE_FOR_nothing)
> > +|| targetm.slow_unaligned_access (promoted_nominal_mode,
> > +  MEM_ALIGN 
> > (data->entry_parm
> > 
> > Right?
> > 
> > Then the modified test case would use the movmisalign optab.
> > However nothing changes in the end, since the i386 back-end is used to work
> > around the middle end not using movmisalign optab when it should do so.
> > 
> 
> I prefer the second form of the check, as it offers more test coverage,
> and is probably more correct than the former.
> 
> Note there are more variations of this misalign check in expr.c,
> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR:
> 
> && mode != BLKmode
> && align < GET_MODE_ALIGNMENT (mode))
>   {
> if ((icode = optab_handler (movmisalign_optab, mode))
> != CODE_FOR_nothing)
>   [...]
> else if (targetm.slow_unaligned_access (mode, align))
>   temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
> 0, TYPE_UNSIGNED (TREE_TYPE (exp)),
> (modifier == EXPAND_STACK_PARM
>  ? NULL_RTX : target),
> mode, mode, false, alt_rtl);
> 
> I wonder if they are correct this way, why shouldn't we use the movmisalign
> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESSS ?

Doesn't the code do exactly this?  Prefer movmisalign over 
extrct_bit_field?

> 
> > I wonder if I should try to add a gcc_checking_assert to the mov 
> > expand
> > patterns that the memory is properly aligned ?
> >
> 
> Wow, that was a really exciting bug-hunt with those assertions around...

:)

> >> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> >>
> >>did_conversion = true;
> >>  }
> >> +  else if (MEM_P (data->entry_parm)
> >> +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >> + > MEM_ALIGN (data->entry_parm)
> >>
> >> we arrive here by-passing
> >>
> >>   else if (need_conversion)
> >> {
> >>   /* We did not have an insn to convert directly, or the sequence
> >>  generated appeared unsafe.  We must first copy the parm to a
> >>  pseudo reg, and save the conversion until after all
> >>  parameters have been moved.  */
> >>
> >>   int save_tree_used;
> >>   rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> >>
> >>   emit_move_insn (tempreg, validated_mem);
> >>
> >> but this move instruction is invalid in the same way as the case
> >> you fix, no?  So wouldn't it be better to do
> >>
> > 
> > We could do that, but I

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

8 matches

Site Navigation

Mail list logo

Footer information