Re: movmem pattern and missed alignment

2018-10-08 Thread Richard Biener
On Mon, Oct 8, 2018 at 3:57 PM Paul Koning  wrote:
>
> I have a movmem pattern in my target that pays attention to the alignment 
> argument.
>
> GCC isn't passing in the expected alignment part of the time.  I have this 
> test case:
>
> extern int *i, *j;
> extern int iv[40], jv[40];
>
> void f1(void)
> {
> __builtin_memcpy (i, j, 32);
> }
>
> void f2(void)
> {
> __builtin_memcpy (iv, jv, 32);
> }
>
> When the movmem pattern is called for f1, alignment is 1.  In f2, it is 2 
> (int is 2 bytes in pdp11) as expected.
>
> The compiler clearly knows that int* points to aligned data, since it 
> generates instructions that assume alignment (this is a strict-alignment 
> target) when I dereference the pointer.  But somehow it gets it wrong for 
> block move.
>
> I also see this for the individual move operations that are generated for 
> very short memcpy operations; if the count is 4, I get four move byte 
> operations for f1, but two move word operations for f2.
>
> This seems like a bug.  Am I missing something?

Yes, memcpy doesn't require anything bigger than byte alignment, and GCC
infers alignment only from actual memory references or from declarations
(like iv / jv).  For i and j there are no dereferences and thus you get
alignment of 1.

Richard.
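Paul's observation about the 4-byte case can be pictured with a small model. This is a hypothetical sketch, not GCC code: plan_block_move, move_plan and WORD_SIZE are made-up names. It assumes a strict-alignment target with 2-byte words (as on pdp11) whose block-move expander may emit word moves only when the alignment it is given is at least the word size.

```c
#include <stddef.h>

/* Hypothetical model, not GCC code: a strict-alignment target with
   2-byte words (as on pdp11) may only emit word moves when the
   alignment passed to its block-move expander is at least WORD_SIZE. */
enum { WORD_SIZE = 2 };

typedef struct {
  size_t word_moves;  /* number of word-sized moves emitted */
  size_t byte_moves;  /* number of byte-sized moves emitted */
} move_plan;

static move_plan plan_block_move(size_t count, size_t known_align)
{
  move_plan p = { 0, 0 };
  if (known_align >= WORD_SIZE) {
    /* Both operands are known to be word aligned. */
    p.word_moves = count / WORD_SIZE;
    p.byte_moves = count % WORD_SIZE;
  } else {
    /* Alignment 1, as for the bare pointers i and j: only byte
       moves are safe on a strict-alignment target. */
    p.byte_moves = count;
  }
  return p;
}
```

With this model, plan_block_move(4, 1) gives four byte moves (the f1 case) and plan_block_move(4, 2) gives two word moves (the f2 case). The alignment argument is the only input that distinguishes the two, so an underestimate costs code quality but not correctness.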



Re: movmem pattern and missed alignment

2018-10-08 Thread Paul Koning



> On Oct 8, 2018, at 11:09 AM, Richard Biener wrote:
> 
> On Mon, Oct 8, 2018 at 3:57 PM Paul Koning  wrote:
>> 
>> I have a movmem pattern in my target that pays attention to the alignment 
>> argument.
>> 
>> GCC isn't passing in the expected alignment part of the time.  I have this 
>> test case:
>> 
>> extern int *i, *j;
>> extern int iv[40], jv[40];
>> 
>> void f1(void)
>> {
>> __builtin_memcpy (i, j, 32);
>> }
>> 
>> void f2(void)
>> {
>> __builtin_memcpy (iv, jv, 32);
>> }
>> 
>> When the movmem pattern is called for f1, alignment is 1.  In f2, it is 2 
>> (int is 2 bytes in pdp11) as expected.
>> 
>> The compiler clearly knows that int* points to aligned data, since it 
>> generates instructions that assume alignment (this is a strict-alignment 
>> target) when I dereference the pointer.  But somehow it gets it wrong for 
>> block move.
>> 
>> I also see this for the individual move operations that are generated for 
>> very short memcpy operations; if the count is 4, I get four move byte 
>> operations for f1, but two move word operations for f2.
>> 
>> This seems like a bug.  Am I missing something?
> 
> Yes, memcpy doesn't require anything bigger than byte alignment, and GCC
> infers alignment only from actual memory references or from declarations
> (like iv / jv).  For i and j there are no dereferences and thus you get
> alignment of 1.
> 
> Richard.

Ok, but why is that not a bug?  The whole point of passing alignment to the 
movmem pattern is to let it generate code that takes advantage of the 
alignment.  So we get a missed optimization.

paul



Re: movmem pattern and missed alignment

2018-10-08 Thread Michael Matz
Hi,

On Mon, 8 Oct 2018, Paul Koning wrote:

> >> extern int *i, *j;
> >> extern int iv[40], jv[40];
> >> 
> >> void f1(void)
> >> {
> >> __builtin_memcpy (i, j, 32);
> >> }
> >> 
> >> void f2(void)
> >> {
> >> __builtin_memcpy (iv, jv, 32);
> >> }
> > 
> > Yes, memcpy doesn't require anything bigger than byte alignment, and GCC
> > infers alignment only from actual memory references or from declarations
> > (like iv / jv).  For i and j there are no dereferences and thus you get
> > alignment of 1.
> > 
> > Richard.
> 
> Ok, but why is that not a bug?  The whole point of passing alignment to 
> the movmem pattern is to let it generate code that takes advantage of 
> the alignment.  So we get a missed optimization.

Only if you somewhere visibly add accesses to *i and *j.  Without them you 
only have the "accesses" via memcpy, and as Richi says, those don't imply 
any alignment requirements.  The i and j pointers might validly be char* 
pointers in disguise and hence be in fact only 1-aligned.  I.e. there's 
nothing in your small example program from which GCC can infer that those 
two global pointers are in fact 2-aligned.


Ciao,
Michael.


Re: movmem pattern and missed alignment

2018-10-08 Thread Andrew Haley
On 10/08/2018 06:20 PM, Michael Matz wrote:
> Only if you somewhere visibly add accesses to *i and *j.  Without them you 
> only have the "accesses" via memcpy, and as Richi says, those don't imply 
> any alignment requirements.  The i and j pointers might validly be char* 
> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's 
> nothing in your small example program from which GCC can infer that those 
> two global pointers are in fact 2-aligned.

So all you'd actually have to say is

void f1(void)
{
*i; *j;
__builtin_memcpy (i, j, 32);
}

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


Re: movmem pattern and missed alignment

2018-10-08 Thread Alexander Monakov
On Mon, 8 Oct 2018, Michael Matz wrote:
> > Ok, but why is that not a bug?  The whole point of passing alignment to 
> > the movmem pattern is to let it generate code that takes advantage of 
> > the alignment.  So we get a missed optimization.
> 
> Only if you somewhere visibly add accesses to *i and *j.  Without them you 
> only have the "accesses" via memcpy, and as Richi says, those don't imply 
> any alignment requirements.  The i and j pointers might validly be char* 
> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's 
> nothing in your small example program from which GCC can infer that those 
> two global pointers are in fact 2-aligned.

Well, it's not that simple. C11 6.3.2.3 p7 makes it undefined to form an
'int *' value that is not suitably aligned:

  A pointer to an object type may be converted to a pointer to a different
  object type. If the resulting pointer is not correctly aligned for the
  referenced type, the behavior is undefined.

So in addition to what you said, we should probably say that GCC decides
not to exploit this UB in order to allow code to round-trip pointer values
via arbitrary pointer types?


To put Michael's explanation in different words:

This is not obviously a bug, because static pointer type does not imply the
dynamic pointed-to type. The caller of 'f1' could look like

void call_f1(void)
{
  short ibuf[20] = {0}, jbuf[20] = {0};
  i = (void *) ibuf;
  j = (void *) jbuf;
  f1();
}

and it's valid to memcpy from jbuf to ibuf: memcpy does not "see" the
static pointer type, and works as if by dereferencing 'char *' pointers
(although, as mentioned above, the assignments to i and j are more
subtly invalid).
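The "as if by dereferencing 'char *' pointers" semantics can be written out directly. A minimal sketch assuming nothing beyond ISO C: byte_copy is an illustrative name, not a library routine, and it models what memcpy is required to do rather than how GCC expands it. Each access touches one unsigned char, so neither pointer needs more than byte alignment, whatever type the caller's pointers had before conversion to void *.

```c
#include <stddef.h>

/* Models memcpy's required behaviour: every access is a single
   unsigned char, so the only alignment assumed for dst and src is 1,
   regardless of the static pointer types the caller started with. */
static void *byte_copy(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}
```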

If 'f1' dereferences 'i', GCC may deduce that dynamic type of '*i' is 'int' and
therefore 'i' must be suitably aligned. But in absence of dereferences GCC
does not make assumptions about dynamic type and alignment.

Alexander


Re: movmem pattern and missed alignment

2018-10-08 Thread Michael Matz
Hi,

On Mon, 8 Oct 2018, Alexander Monakov wrote:

> > Only if you somewhere visibly add accesses to *i and *j.  Without them 
> > you only have the "accesses" via memcpy, and as Richi says, those 
> > don't imply any alignment requirements.  The i and j pointers might 
> > validly be char* pointers in disguise and hence be in fact only 
> > 1-aligned.  I.e. there's nothing in your small example program from 
> > which GCC can infer that those two global pointers are in fact 
> > 2-aligned.
> 
> Well, it's not that simple. C11 6.3.2.3 p7 makes it undefined to form an 
> 'int *' value that is not suitably aligned:
> 
> So in addition to what you said, we should probably say that GCC decides
> not to exploit this UB in order to allow code to round-trip pointer values
> via arbitrary pointer types?

That's correct, I was explaining from the middle-end perspective.  There 
we are consciously more lenient as we have to support the real world and 
other languages than C.  This is one of the cases.


Ciao,
Michael.


Re: movmem pattern and missed alignment

2018-10-08 Thread Paul Koning



> On Oct 8, 2018, at 1:29 PM, Andrew Haley  wrote:
> 
> On 10/08/2018 06:20 PM, Michael Matz wrote:
>> Only if you somewhere visibly add accesses to *i and *j.  Without them you 
>> only have the "accesses" via memcpy, and as Richi says, those don't imply 
>> any alignment requirements.  The i and j pointers might validly be char* 
>> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's 
>> nothing in your small example program from which GCC can infer that those 
>> two global pointers are in fact 2-aligned.
> 
> So all you'd actually have to say is
> 
> void f1(void)
> {
> *i; *j;
> __builtin_memcpy (i, j, 32);
> }

No, that doesn't help.  Not even if I make it:

void f1(void)
{
k = *i + *j;
__builtin_memcpy (i, j, 4);
}

The first line does word aligned references to *i and *j, but the memcpy 
stubbornly remains a byte move.

paul



Re: movmem pattern and missed alignment

2018-10-08 Thread Michael Matz
Hi,

On Mon, 8 Oct 2018, Paul Koning wrote:

> > So all you'd actually have to say is
> > 
> > void f1(void)
> > {
> > *i; *j;
> > __builtin_memcpy (i, j, 32);
> > }
> 
> No, that doesn't help.  Not even if I make it:
> 
> void f1(void)
> {
> k = *i + *j;
> __builtin_memcpy (i, j, 4);
> }
> 
> The first line does word aligned references to *i and *j, but the memcpy 
> stubbornly remains a byte move.

k is a global, so the loads from i/j can't be optimized away?  If so, now 
you have a missed optimization bug ;-)  Might be non-trivial to fix for 
general situations (basically the natural alignment can only be inferred 
in regions that are dominated by such accesses, but not e.g. for:
   if (cond()) k = *i+*j;
   memcpy(i,j,4);
as cond() might be always false).


Ciao,
Michael.


Re: movmem pattern and missed alignment

2018-10-08 Thread Eric Botcazou
> That's correct, I was explaining from the middle-end perspective.  There
> we are consciously more lenient as we have to support the real world and
> other languages than C.  This is one of the cases.

This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK 
for every language on strict-alignment platforms.  This was changed only 
because of SSE on x86.

-- 
Eric Botcazou


Re: movmem pattern and missed alignment

2018-10-08 Thread Paul Koning



> On Oct 8, 2018, at 5:43 PM, Eric Botcazou  wrote:
> 
>> That's correct, I was explaining from the middle-end perspective.  There
>> we are consciously more lenient as we have to support the real world and
>> other languages than C.  This is one of the cases.
> 
> This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK 
> for every language on strict-alignment platforms.  This was changed only 
> because of SSE on x86.

So does that mean this should be a target-specific behavior, but it isn't at 
the moment?

paul



Re: movmem pattern and missed alignment

2018-10-08 Thread Richard Biener
On October 8, 2018 11:43:00 PM GMT+02:00, Eric Botcazou wrote:
>> That's correct, I was explaining from the middle-end perspective.  There
>> we are consciously more lenient as we have to support the real world and
>> other languages than C.  This is one of the cases.
>
>This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
>for every language on strict-alignment platforms.  This was changed only
>because of SSE on x86.

And because we ended up ignoring all pointer casts. 

Richard. 



Re: movmem pattern and missed alignment

2018-10-08 Thread Alexander Monakov
On Tue, 9 Oct 2018, Richard Biener wrote:
> >This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
> >for every language on strict-alignment platforms.  This was changed only
> >because of SSE on x86.
> 
> And because we ended up ignoring all pointer casts. 

It's not quite obvious what SSE has to do with this - any hint please?

(according to my quick check this changed between gcc-4.5 and gcc-4.6)

Alexander


Re: movmem pattern and missed alignment

2018-10-08 Thread Eric Botcazou
> It's not quite obvious what SSE has to do with this - any hint please?

SSE introduced alignment constraints into the non-strict-alignment target x86 
so people didn't really want to play by the rules of strict-alignment targets.

> (according to my quick check this changed between gcc-4.5 and gcc-4.6)

Possibly indeed, I remembered GCC 4.5 as being the turning point.

-- 
Eric Botcazou


Re: movmem pattern and missed alignment

2018-10-09 Thread Andrew Haley
On 10/08/2018 07:38 PM, Paul Koning wrote:
> 
> 
>> On Oct 8, 2018, at 1:29 PM, Andrew Haley  wrote:
>>
>> On 10/08/2018 06:20 PM, Michael Matz wrote:
>>> Only if you somewhere visibly add accesses to *i and *j.  Without them you 
>>> only have the "accesses" via memcpy, and as Richi says, those don't imply 
>>> any alignment requirements.  The i and j pointers might validly be char* 
>>> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's 
>>> nothing in your small example program from which GCC can infer that those 
>>> two global pointers are in fact 2-aligned.
>>
>> So all you'd actually have to say is
>>
>> void f1(void)
>> {
>> *i; *j;
>> __builtin_memcpy (i, j, 32);
>> }
> 
> No, that doesn't help. 

It could do.

> Not even if I make it:
> 
> void f1(void)
> {
> k = *i + *j;
> __builtin_memcpy (i, j, 4);
> }
> 
> The first line does word aligned references to *i and *j, but the memcpy 
> stubbornly remains a byte move.

Right, so that is a missed optimization.

-- 
Andrew Haley


Re: movmem pattern and missed alignment

2018-10-09 Thread Richard Biener
On Tue, Oct 9, 2018 at 8:41 AM Eric Botcazou  wrote:
>
> > It's not quite obvious what SSE has to do with this - any hint please?
>
> SSE introduced alignment constraints into the non-strict-alignment target x86
> so people didn't really want to play by the rules of strict-alignment targets.

Yeah.  We've walked back and forth for that very issue though.  We now require
all targets to play by the same rules -- if you have a *(double *) access then
that has to be aligned according to double.
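Since memcpy only ever requires byte alignment, it is also the usual portable way to read a value from an address of unknown alignment, instead of a *(double *) access that must be aligned for double. A minimal sketch: load_double is an illustrative name, and the note that GCC and Clang normally turn such a fixed-size memcpy into a single load where the target permits it is a general observation about optimizing compilers, not something stated in this thread.

```c
#include <string.h>

/* Reads a double from an address with no alignment guarantee.
   *(const double *)p would require p to be aligned for double;
   memcpy only requires byte alignment. */
static double load_double(const void *p)
{
  double d;
  memcpy(&d, p, sizeof d);
  return d;
}
```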

We couldn't realistically walk back and rely on alignment of addresses based
on their type (like C would allow us to do) because we've thrown away types
on addresses.  See also the thread about string-length warning stuff where
we've posted testcases that show you can get arbitrarily typed addresses
into your strlen() calls for example by means of CSE.  The middle-end is
simply not prepared to preserve that information.

It was repeatedly suggested that we _could_ derive alignment info from
function parameter types since we rely on precise typing there for example
for points-to analysis (albeit only for restrict qualification processing and
for DECL_BY_REFERENCE "pointers").  That would fix the simple testcase
that was presented here.
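For reference, this is the simple testcase with the pointers arriving as parameters. It is a hypothetical variant (f1_param is not from the thread): under the suggested rule, which GCC was not applying at the time of this discussion, the compiler could take dst and src to be at least _Alignof(int)-aligned and pass that alignment on to the movmem pattern. The run-time behaviour is identical either way; only the code the expander may choose would differ.

```c
/* Hypothetical parameter-based variant of Paul's f1.  If parameter
   types were trusted for alignment, dst and src could be assumed to
   be int-aligned here; today the copy is still treated as align 1. */
void f1_param(int *dst, const int *src)
{
  __builtin_memcpy(dst, src, 32);
}
```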

> > (according to my quick check this changed between gcc-4.5 and gcc-4.6)
>
> Possibly indeed, I remembered GCC 4.5 as being the turning point.

It was really changing over several releases, but yes.

Richard.



Re: movmem pattern and missed alignment

2018-10-09 Thread Richard Biener
On Tue, Oct 9, 2018 at 10:02 AM Andrew Haley  wrote:
>
> On 10/08/2018 07:38 PM, Paul Koning wrote:
> >
> >
> >> On Oct 8, 2018, at 1:29 PM, Andrew Haley  wrote:
> >>
> >> On 10/08/2018 06:20 PM, Michael Matz wrote:
> >>> Only if you somewhere visibly add accesses to *i and *j.  Without them you
> >>> only have the "accesses" via memcpy, and as Richi says, those don't imply
> >>> any alignment requirements.  The i and j pointers might validly be char*
> >>> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's
> >>> nothing in your small example program from which GCC can infer that those
> >>> two global pointers are in fact 2-aligned.
> >>
> >> So all you'd actually have to say is
> >>
> >> void f1(void)
> >> {
> >> *i; *j;
> >> __builtin_memcpy (i, j, 32);
> >> }
> >
> > No, that doesn't help.
>
> It could do.
>
> > Not even if I make it:
> >
> > void f1(void)
> > {
> > k = *i + *j;
> > __builtin_memcpy (i, j, 4);
> > }
> >
> > The first line does word aligned references to *i and *j, but the memcpy 
> > stubbornly remains a byte move.
>
> Right, so that is a missed optimization.

Yes.  Note that on GIMPLE the alignment info for pointers is carried as
side-info on SSA names, which makes the above cases difficult to deal
with, since the dereference and the call argument use the same SSA
names.  So if you consider

  if ((i_1 & 7) == 0)
    {
      k = *i_1;
      __builtin_memcpy (i_1, j, 4);
    }

then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
affect the alignment test which we'd then optimize away.  We'd need to introduce
a SSA copy to get a new SSA name but that would be optimized away quickly.
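Richard's fragment can be rendered in C to show why a flow-insensitive fact is the problem. In this sketch copy_if_aligned is a hypothetical name, the mask is _Alignof(int) - 1 rather than 7 so that the guard tests exactly the alignment the dereference implies, and p is a void * so a misaligned address can be passed without forming a misaligned int * (which C11 6.3.2.3p7, quoted earlier in the thread, makes undefined). Because GCC records pointer alignment on the SSA name itself, "p is int-aligned", learned from the dereference inside the guard, would also hold at the guard, which could then be folded to true and deleted.

```c
#include <stdint.h>
#include <string.h>

int k;  /* a global, as in Paul's test case */

/* The guard is a genuine run-time question.  The dereference below
   only happens because the guard passed, so the alignment it implies
   must not be used to decide the guard itself. */
int copy_if_aligned(void *p, const void *q)
{
  if (((uintptr_t) p & (_Alignof(int) - 1)) == 0) {
    k = *(int *) p;              /* reached only when p is int-aligned */
    memcpy(p, q, sizeof(int));
    return 1;
  }
  return 0;
}
```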

So the option would be to change the representation of __builtin_memcpy
either by making it an aggregate assignment or by using a builtin with
explicit alignment or compute alignment at RTL expansion time.

Note the pass that "computes" alignment is currently SSA based (it's
the CCP pass).

Richard.



Re: movmem pattern and missed alignment

2018-10-09 Thread Eric Botcazou
> It was repeatedly suggested that we _could_ derive alignment info from
> function parameter types since we rely on precise typing there for example
> for points-to analysis (albeit only for restrict qualification processing
> and for DECL_BY_REFERENCE "pointers").  That would fix the simple testcase
> that was presented here.

OK, I keep forgetting it and that would be a good compromise indeed.

-- 
Eric Botcazou


Re: movmem pattern and missed alignment

2018-10-09 Thread Alexander Monakov
On Tue, 9 Oct 2018, Richard Biener wrote:
> 
> then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
> affect the alignment test which we'd then optimize away.  We'd need to introduce
> a SSA copy to get a new SSA name but that would be optimized away quickly.

We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
work to emit it just before the memcpy

  i_2 = __builtin_assume_aligned(i_1, 4);
  __builtin_memcpy(j, i_2, 32);

in theory?

Alexander
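At the source level, a programmer can state the same fact with __builtin_assume_aligned, a documented GCC builtin (also accepted by Clang). The sketch is hypothetical: f1_assume_aligned, buf_i and buf_j are made-up names, and i and j are given definitions so it links on its own, whereas Paul's test case declares them extern. Whether the promised alignment actually reaches the target's movmem expander is exactly what Richard questions in his reply, so treat this as something to try rather than a guaranteed fix.

```c
/* Hypothetical definitions so the sketch links on its own; in Paul's
   test case i and j are extern. */
static int buf_i[40], buf_j[40];
int *i = buf_i, *j = buf_j;

void f1_assume_aligned(void)
{
  /* Promise the compiler that both pointers are int-aligned. */
  int *dst = __builtin_assume_aligned(i, _Alignof(int));
  const int *src = __builtin_assume_aligned(j, _Alignof(int));
  __builtin_memcpy(dst, src, 32);
}
```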


Re: movmem pattern and missed alignment

2018-10-09 Thread Richard Biener
On Tue, Oct 9, 2018 at 11:00 AM Alexander Monakov  wrote:
>
> On Tue, 9 Oct 2018, Richard Biener wrote:
> >
> > then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
> > affect the alignment test which we'd then optimize away.  We'd need to introduce
> > a SSA copy to get a new SSA name but that would be optimized away quickly.
>
> We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
> work to emit it just before the memcpy
>
>   i_2 = __builtin_assume_aligned(i_1, 4);
>   __builtin_memcpy(j, i_2, 32);
>
> in theory?

That's still before RTL expansion so I'm not sure that is enough.

Richard.



Re: movmem pattern and missed alignment

2018-10-09 Thread Jakub Jelinek
On Tue, Oct 09, 2018 at 11:08:44AM +0200, Richard Biener wrote:
> On Tue, Oct 9, 2018 at 11:00 AM Alexander Monakov  wrote:
> >
> > On Tue, 9 Oct 2018, Richard Biener wrote:
> > >
> > > then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
> > > affect the alignment test which we'd then optimize away.  We'd need to introduce
> > > a SSA copy to get a new SSA name but that would be optimized away quickly.
> >
> > We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
> > work to emit it just before the memcpy
> >
> >   i_2 = __builtin_assume_aligned(i_1, 4);
> >   __builtin_memcpy(j, i_2, 32);
> >
> > in theory?
> 
> That's still before RTL expansion so I'm not sure that is enough.

But we likely won't invalidate the computed SSA_NAME_INFO afterwards.

Jakub


Re: movmem pattern and missed alignment

2018-10-09 Thread Richard Biener
On Tue, Oct 9, 2018 at 11:23 AM Jakub Jelinek  wrote:
>
> On Tue, Oct 09, 2018 at 11:08:44AM +0200, Richard Biener wrote:
> > On Tue, Oct 9, 2018 at 11:00 AM Alexander Monakov wrote:
> > >
> > > On Tue, 9 Oct 2018, Richard Biener wrote:
> > > >
> > > > then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
> > > > affect the alignment test which we'd then optimize away.  We'd need to introduce
> > > > a SSA copy to get a new SSA name but that would be optimized away quickly.
> > >
> > > We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
> > > work to emit it just before the memcpy
> > >
> > >   i_2 = __builtin_assume_aligned(i_1, 4);
> > >   __builtin_memcpy(j, i_2, 32);
> > >
> > > in theory?
> >
> > That's still before RTL expansion so I'm not sure that is enough.
>
> But we likely won't invalidate the computed SSA_NAME_INFO afterwards.

But we've propagated out the i_2 = i_1 copy, no?

Richard.



Re: movmem pattern and missed alignment

2018-10-09 Thread Joseph Myers
On Tue, 9 Oct 2018, Richard Biener wrote:

> It was repeatedly suggested that we _could_ derive alignment info from
> function parameter types since we rely on precise typing there for example
> for points-to analysis (albeit only for restrict qualification processing and
> for DECL_BY_REFERENCE "pointers").  That would fix the simple testcase
> that was presented here.

Even in that case you mustn't assume alignment for pointer comparisons, 
only for dereferences.  Assuming it for comparisons breaks e.g. glibc's

# define LC_GLOBAL_LOCALE   ((locale_t) -1L)

(locale_t is a pointer-to-pointer-aligned-struct) and other similar 
constructs involving magic constants (not dereferenced) of pointer type; 
comparisons of a locale_t value against LC_GLOBAL_LOCALE need to work.
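The pattern Joseph describes looks roughly like this. The names are hypothetical stand-ins (my_locale_t for locale_t, MY_GLOBAL_LOCALE for LC_GLOBAL_LOCALE), and the claim that (my_locale_t) -1L yields an all-ones address relies on GCC's documented implementation-defined behaviour for integer-to-pointer casts. That address is not aligned for struct my_locale, so a compiler that assumed every my_locale_t value is suitably aligned could fold the comparison in is_global_locale to false.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for glibc's locale_t and LC_GLOBAL_LOCALE:
   a pointer to a pointer-aligned struct plus a magic sentinel value
   that is compared against but never dereferenced. */
struct my_locale { void *data; };
typedef struct my_locale *my_locale_t;
#define MY_GLOBAL_LOCALE ((my_locale_t) -1L)

/* Must keep working even though MY_GLOBAL_LOCALE is misaligned for
   struct my_locale: only a dereference, not a comparison, may imply
   alignment. */
static bool is_global_locale(my_locale_t loc)
{
  return loc == MY_GLOBAL_LOCALE;
}
```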

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: movmem pattern and missed alignment

2018-10-09 Thread Richard Biener
On Tue, Oct 9, 2018 at 1:53 PM Joseph Myers  wrote:
>
> On Tue, 9 Oct 2018, Richard Biener wrote:
>
> > It was repeatedly suggested that we _could_ derive alignment info from
> > function parameter types since we rely on precise typing there for example
> > for points-to analysis (albeit only for restrict qualification processing and
> > for DECL_BY_REFERENCE "pointers").  That would fix the simple testcase
> > that was presented here.
>
> Even in that case you mustn't assume alignment for pointer comparisons,
> only for dereferences.  Assuming it for comparisons breaks e.g. glibc's
>
> # define LC_GLOBAL_LOCALE   ((locale_t) -1L)
>
> (locale_t is a pointer-to-pointer-aligned-struct) and other similar
> constructs involving magic constants (not dereferenced) of pointer type;
> comparisons of a locale_t value against LC_GLOBAL_LOCALE need to work.

Heh!  That's non-conforming!

But yes, looks like it won't fly after all.

Richard.
