Re: [PATCH 3/4] Also propagate SRA accesses from LHS to RHS (PR 92706)

2020-01-14 Thread Richard Biener
On Mon, 13 Jan 2020, Martin Jambor wrote:

> Hi,
> 
> sorry for taking so long to reply...
> 
> On Wed, Dec 18 2019, Richard Biener wrote:
> > On December 17, 2019 1:43:15 PM GMT+01:00, Martin Jambor  
> > wrote:
> >>Hi,
> >>
> >>the previous patch unfortunately does not fix the first testcase in PR
> >>92706 and since I am afraid it might be the important one, I also
> >>focused on that.  The issue here is again total scalarization accesses
> >>clashing with those representing accesses in the IL - on another
> >>aggregate but here the sides are switched.  Whereas in the previous
> >>case the total scalarization accesses prevented propagation along
> >>assignments, here we have the user accesses on the LHS, so even though
> >>we do not create anything there, we totally scalarize the RHS and
> >>again end up with assignments with different scalarizations leading to
> >>bad code.
> >>
> >>So we clearly need to propagate information about accesses from RHSs
> >>to LHSs too, which the patch below does.  Because the intent is only
> >>preventing bad total scalarization - which the last patch now performs
> >>late enough - and do not care about grp_write flag and so forth, the
> >>propagation is a bit simpler and so I did not try to unify all of the
> >>code for both directions.
> >
> >  But can we really propagate the directions independently? Lacc to racc 
> > propagation would induce accesses to different racc to Lacc branches of the 
> > access tree of the parent, no? So in full generality the access links Form 
> > an undirected graph where you perform propagation in both directions of 
> > edges (and you'd have to consider cycles). 'linked parts' of the graph then 
> > need to have the same (or at least a compatible) scalarization, and three 
> > would be the possibility to compute the optimal 'conflict border' where to 
> > fix the conflict we'd keep one node in the graph unscalarized. 
> >
> > The way you did it might be sufficient in practice of course and we should 
> > probably go with that for now?
> 
> I think it should be - at least I think I would not be able to come up
> with an implementation quickly enough for GCC 10 - I assumed that was
> the target.  And also because there is that one important difference
> between the propagation, the RHS->LHS also propagates
> "actually-contains-data" whereas the other direction is just to prevent
> total scalarization to create conflicts - and it is sufficient to do
> that and I suppose in any search for optimal scalarization we'd give
> total scalarization the smallest priority anyway.

OK, sounds reasonable.

The patch is OK then, but I'll wait for your updated series.

Thanks,
Richard.

> Thanks,
> 
> Martin
> 
> >
> > Richard. 
> >
> >>I still think that even with this patch the total scalarization has to
> >>follow the declared type of the aggregate and cannot be done using
> >>integers of the biggest suitable power, at least in early SRA, because
> >>these propagations of course do not work interprocedurally but
> >>inlining can and does eventually bring accesses from two functions
> >>together which could (and IMHO would) lead to same problems.
> >>
> >>Bootstrapped and LTO-bootstrapped and tested on an x86_64-linux and
> >>bootstrapped and tested it on aarch64 and i686 (except that on i686
> >>the testcase will need to be skipped because __int128_t is not
> >>available there).  I expect that review will lead to requests to
> >>change things but as far as I am concerned, this is ready for trunk
> >>too.
> >>
> >>Thanks,
> >>
> >>Martin
> >>
> >>2019-12-11  Martin Jambor  
> >>
> >>PR tree-optimization/92706
> >>* tree-sra.c (struct access): Fields first_link, last_link,
> >>next_queued and grp_queued renamed to first_rhs_link, last_rhs_link,
> >>next_rhs_queued and grp_rhs_queued respectively, new fields
> >>first_lhs_link, last_lhs_link, next_lhs_queued and grp_lhs_queued.
> >>(struct assign_link): Field next renamed to next_rhs, new field
> >>next_lhs.  Updated comment.
> >>(work_queue_head): Renamed to rhs_work_queue_head.
> >>(lhs_work_queue_head): New variable.
> >>(add_link_to_lhs): New function.
> >>(relink_to_new_repr): Also relink LHS lists.
> >>(add_access_to_work_queue): Renamed to add_access_to_rhs_work_queue.
> >>(add_access_to_lhs_work_queue): New function.
> >>(pop_access_from_work_queue): Renamed to
> >>pop_access_from_rhs_work_queue.
> >>(pop_access_from_lhs_work_queue): New function.
> >>(build_accesses_from_assign): Also add links to LHS lists and to LHS
> >>work_queue.
> >>(child_would_conflict_in_lacc): Renamed to
> >>child_would_conflict_in_acc.  Adjusted parameter names.
> >>(create_artificial_child_access): New parameter set_grp_read, use it.
> >>(subtree_mark_written_and_enqueue): Renamed to
> >>subtree_mark_written_and_rhs_enqueue.
> >>(propagate_subaccesses_across_link): Renamed to
> >>propagate_subaccesses_from_rhs.
> >>   

Re: [PATCH 3/4] Also propagate SRA accesses from LHS to RHS (PR 92706)

2020-01-13 Thread Martin Jambor
Hi,

sorry for taking so long to reply...

On Wed, Dec 18 2019, Richard Biener wrote:
> On December 17, 2019 1:43:15 PM GMT+01:00, Martin Jambor  
> wrote:
>>Hi,
>>
>>the previous patch unfortunately does not fix the first testcase in PR
>>92706 and since I am afraid it might be the important one, I also
>>focused on that.  The issue here is again total scalarization accesses
>>clashing with those representing accesses in the IL - on another
>>aggregate but here the sides are switched.  Whereas in the previous
>>case the total scalarization accesses prevented propagation along
>>assignments, here we have the user accesses on the LHS, so even though
>>we do not create anything there, we totally scalarize the RHS and
>>again end up with assignments with different scalarizations leading to
>>bad code.
>>
>>So we clearly need to propagate information about accesses from RHSs
>>to LHSs too, which the patch below does.  Because the intent is only
>>preventing bad total scalarization - which the last patch now performs
>>late enough - and do not care about grp_write flag and so forth, the
>>propagation is a bit simpler and so I did not try to unify all of the
>>code for both directions.
>
>  But can we really propagate the directions independently? Lacc to racc 
> propagation would induce accesses to different racc to Lacc branches of the 
> access tree of the parent, no? So in full generality the access links Form an 
> undirected graph where you perform propagation in both directions of edges 
> (and you'd have to consider cycles). 'linked parts' of the graph then need to 
> have the same (or at least a compatible) scalarization, and three would be 
> the possibility to compute the optimal 'conflict border' where to fix the 
> conflict we'd keep one node in the graph unscalarized. 
>
> The way you did it might be sufficient in practice of course and we should 
> probably go with that for now?

I think it should be - at least I think I would not be able to come up
with an implementation quickly enough for GCC 10 - I assumed that was
the target.  And also because there is that one important difference
between the propagation, the RHS->LHS also propagates
"actually-contains-data" whereas the other direction is just to prevent
total scalarization to create conflicts - and it is sufficient to do
that and I suppose in any search for optimal scalarization we'd give
total scalarization the smallest priority anyway.

Thanks,

Martin

>
> Richard. 
>
>>I still think that even with this patch the total scalarization has to
>>follow the declared type of the aggregate and cannot be done using
>>integers of the biggest suitable power, at least in early SRA, because
>>these propagations of course do not work interprocedurally but
>>inlining can and does eventually bring accesses from two functions
>>together which could (and IMHO would) lead to same problems.
>>
>>Bootstrapped and LTO-bootstrapped and tested on an x86_64-linux and
>>bootstrapped and tested it on aarch64 and i686 (except that on i686
>>the testcase will need to be skipped because __int128_t is not
>>available there).  I expect that review will lead to requests to
>>change things but as far as I am concerned, this is ready for trunk
>>too.
>>
>>Thanks,
>>
>>Martin
>>
>>2019-12-11  Martin Jambor  
>>
>>  PR tree-optimization/92706
>>  * tree-sra.c (struct access): Fields first_link, last_link,
>>  next_queued and grp_queued renamed to first_rhs_link, last_rhs_link,
>>  next_rhs_queued and grp_rhs_queued respectively, new fields
>>  first_lhs_link, last_lhs_link, next_lhs_queued and grp_lhs_queued.
>>  (struct assign_link): Field next renamed to next_rhs, new field
>>  next_lhs.  Updated comment.
>>  (work_queue_head): Renamed to rhs_work_queue_head.
>>  (lhs_work_queue_head): New variable.
>>  (add_link_to_lhs): New function.
>>  (relink_to_new_repr): Also relink LHS lists.
>>  (add_access_to_work_queue): Renamed to add_access_to_rhs_work_queue.
>>  (add_access_to_lhs_work_queue): New function.
>>  (pop_access_from_work_queue): Renamed to
>>  pop_access_from_rhs_work_queue.
>>  (pop_access_from_lhs_work_queue): New function.
>>  (build_accesses_from_assign): Also add links to LHS lists and to LHS
>>  work_queue.
>>  (child_would_conflict_in_lacc): Renamed to
>>  child_would_conflict_in_acc.  Adjusted parameter names.
>>  (create_artificial_child_access): New parameter set_grp_read, use it.
>>  (subtree_mark_written_and_enqueue): Renamed to
>>  subtree_mark_written_and_rhs_enqueue.
>>  (propagate_subaccesses_across_link): Renamed to
>>  propagate_subaccesses_from_rhs.
>>  (propagate_subaccesses_from_lhs): New function.
>>  (propagate_all_subaccesses): Also propagate subaccesses from LHSs to
>>  RHSs.
>>
>>  testsuite/
>>  * gcc.dg/tree-ssa/pr92706-1.c: New test.


Re: [PATCH 3/4] Also propagate SRA accesses from LHS to RHS (PR 92706)

2019-12-18 Thread Richard Biener
On December 17, 2019 1:43:15 PM GMT+01:00, Martin Jambor  
wrote:
>Hi,
>
>the previous patch unfortunately does not fix the first testcase in PR
>92706 and since I am afraid it might be the important one, I also
>focused on that.  The issue here is again total scalarization accesses
>clashing with those representing accesses in the IL - on another
>aggregate but here the sides are switched.  Whereas in the previous
>case the total scalarization accesses prevented propagation along
>assignments, here we have the user accesses on the LHS, so even though
>we do not create anything there, we totally scalarize the RHS and
>again end up with assignments with different scalarizations leading to
>bad code.
>
>So we clearly need to propagate information about accesses from RHSs
>to LHSs too, which the patch below does.  Because the intent is only
>preventing bad total scalarization - which the last patch now performs
>late enough - and do not care about grp_write flag and so forth, the
>propagation is a bit simpler and so I did not try to unify all of the
>code for both directions.

 But can we really propagate the directions independently? Lacc to racc 
propagation would induce accesses to different racc to Lacc branches of the 
access tree of the parent, no? So in full generality the access links Form an 
undirected graph where you perform propagation in both directions of edges (and 
you'd have to consider cycles). 'linked parts' of the graph then need to have 
the same (or at least a compatible) scalarization, and three would be the 
possibility to compute the optimal 'conflict border' where to fix the conflict 
we'd keep one node in the graph unscalarized. 

The way you did it might be sufficient in practice of course and we should 
probably go with that for now?

Richard. 

>I still think that even with this patch the total scalarization has to
>follow the declared type of the aggregate and cannot be done using
>integers of the biggest suitable power, at least in early SRA, because
>these propagations of course do not work interprocedurally but
>inlining can and does eventually bring accesses from two functions
>together which could (and IMHO would) lead to same problems.
>
>Bootstrapped and LTO-bootstrapped and tested on an x86_64-linux and
>bootstrapped and tested it on aarch64 and i686 (except that on i686
>the testcase will need to be skipped because __int128_t is not
>available there).  I expect that review will lead to requests to
>change things but as far as I am concerned, this is ready for trunk
>too.
>
>Thanks,
>
>Martin
>
>2019-12-11  Martin Jambor  
>
>   PR tree-optimization/92706
>   * tree-sra.c (struct access): Fields first_link, last_link,
>   next_queued and grp_queued renamed to first_rhs_link, last_rhs_link,
>   next_rhs_queued and grp_rhs_queued respectively, new fields
>   first_lhs_link, last_lhs_link, next_lhs_queued and grp_lhs_queued.
>   (struct assign_link): Field next renamed to next_rhs, new field
>   next_lhs.  Updated comment.
>   (work_queue_head): Renamed to rhs_work_queue_head.
>   (lhs_work_queue_head): New variable.
>   (add_link_to_lhs): New function.
>   (relink_to_new_repr): Also relink LHS lists.
>   (add_access_to_work_queue): Renamed to add_access_to_rhs_work_queue.
>   (add_access_to_lhs_work_queue): New function.
>   (pop_access_from_work_queue): Renamed to
>   pop_access_from_rhs_work_queue.
>   (pop_access_from_lhs_work_queue): New function.
>   (build_accesses_from_assign): Also add links to LHS lists and to LHS
>   work_queue.
>   (child_would_conflict_in_lacc): Renamed to
>   child_would_conflict_in_acc.  Adjusted parameter names.
>   (create_artificial_child_access): New parameter set_grp_read, use it.
>   (subtree_mark_written_and_enqueue): Renamed to
>   subtree_mark_written_and_rhs_enqueue.
>   (propagate_subaccesses_across_link): Renamed to
>   propagate_subaccesses_from_rhs.
>   (propagate_subaccesses_from_lhs): New function.
>   (propagate_all_subaccesses): Also propagate subaccesses from LHSs to
>   RHSs.
>
>   testsuite/
>   * gcc.dg/tree-ssa/pr92706-1.c: New test.
>---
> gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c |  17 ++
> gcc/tree-sra.c| 316 --
> 2 files changed, 253 insertions(+), 80 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c
>
>diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c
>b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c
>new file mode 100644
>index 000..c36d103798e
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c
>@@ -0,0 +1,17 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-esra-details" } */
>+
>+struct S { int i[4]; } __attribute__((aligned(128)));
>+typedef __int128_t my_int128 __attribute__((may_alias));
>+__int128_t load (void *p)
>+{
>+  struct S v;
>+  __builtin_memcpy (&v, p, sizeof (struct S));
>

[PATCH 3/4] Also propagate SRA accesses from LHS to RHS (PR 92706)

2019-12-17 Thread Martin Jambor
Hi,

the previous patch unfortunately does not fix the first testcase in PR
92706 and since I am afraid it might be the important one, I also
focused on that.  The issue here is again total scalarization accesses
clashing with those representing accesses in the IL - on another
aggregate but here the sides are switched.  Whereas in the previous
case the total scalarization accesses prevented propagation along
assignments, here we have the user accesses on the LHS, so even though
we do not create anything there, we totally scalarize the RHS and
again end up with assignments with different scalarizations leading to
bad code.

So we clearly need to propagate information about accesses from RHSs
to LHSs too, which the patch below does.  Because the intent is only
preventing bad total scalarization - which the last patch now performs
late enough - and do not care about grp_write flag and so forth, the
propagation is a bit simpler and so I did not try to unify all of the
code for both directions.

I still think that even with this patch the total scalarization has to
follow the declared type of the aggregate and cannot be done using
integers of the biggest suitable power, at least in early SRA, because
these propagations of course do not work interprocedurally but
inlining can and does eventually bring accesses from two functions
together which could (and IMHO would) lead to same problems.

Bootstrapped and LTO-bootstrapped and tested on an x86_64-linux and
bootstrapped and tested it on aarch64 and i686 (except that on i686
the testcase will need to be skipped because __int128_t is not
available there).  I expect that review will lead to requests to
change things but as far as I am concerned, this is ready for trunk
too.

Thanks,

Martin

2019-12-11  Martin Jambor  

PR tree-optimization/92706
* tree-sra.c (struct access): Fields first_link, last_link,
next_queued and grp_queued renamed to first_rhs_link, last_rhs_link,
next_rhs_queued and grp_rhs_queued respectively, new fields
first_lhs_link, last_lhs_link, next_lhs_queued and grp_lhs_queued.
(struct assign_link): Field next renamed to next_rhs, new field
next_lhs.  Updated comment.
(work_queue_head): Renamed to rhs_work_queue_head.
(lhs_work_queue_head): New variable.
(add_link_to_lhs): New function.
(relink_to_new_repr): Also relink LHS lists.
(add_access_to_work_queue): Renamed to add_access_to_rhs_work_queue.
(add_access_to_lhs_work_queue): New function.
(pop_access_from_work_queue): Renamed to
pop_access_from_rhs_work_queue.
(pop_access_from_lhs_work_queue): New function.
(build_accesses_from_assign): Also add links to LHS lists and to LHS
work_queue.
(child_would_conflict_in_lacc): Renamed to
child_would_conflict_in_acc.  Adjusted parameter names.
(create_artificial_child_access): New parameter set_grp_read, use it.
(subtree_mark_written_and_enqueue): Renamed to
subtree_mark_written_and_rhs_enqueue.
(propagate_subaccesses_across_link): Renamed to
propagate_subaccesses_from_rhs.
(propagate_subaccesses_from_lhs): New function.
(propagate_all_subaccesses): Also propagate subaccesses from LHSs to
RHSs.

testsuite/
* gcc.dg/tree-ssa/pr92706-1.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c |  17 ++
 gcc/tree-sra.c| 316 --
 2 files changed, 253 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c
new file mode 100644
index 000..c36d103798e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-esra-details" } */
+
+struct S { int i[4]; } __attribute__((aligned(128)));
+typedef __int128_t my_int128 __attribute__((may_alias));
+__int128_t load (void *p)
+{
+  struct S v;
+  __builtin_memcpy (&v, p, sizeof (struct S));
+  struct S u;
+  u = v;
+  struct S w;
+  w = u;
+  return *(my_int128 *)&w;
+}
+
+/* { dg-final { scan-tree-dump-not "Created a replacement for u offset: 
\[^0\]" "esra" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 0f28157456c..9f087e5c27a 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -167,11 +167,15 @@ struct access
   struct access *next_sibling;
 
   /* Pointers to the first and last element in the linked list of assign
- links.  */
-  struct assign_link *first_link, *last_link;
+ links for propagation from LHS to RHS.  */
+  struct assign_link *first_rhs_link, *last_rhs_link;
 
-  /* Pointer to the next access in the work queue.  */
-  struct access *next_queued;
+  /* Pointers to the first and last element in the linked list of assign
+ links for propagation from LHS to RHS.  */
+  struct a