[PATCH] Fix store merging (PR tree-optimization/84503)

2018-02-22  Jakub Jelinek
Hi!

The long comment above the new check_no_overlap function below should
hopefully explain the problem.  coalesce_immediate_stores is splitting the
stores from the same base into groups that we are going to optimize
individually; for each successfully merged group we emit new stores at the
location of the last store from that group.  And coalesce_immediate_stores
processes stores ordered primarily by increasing bit position and only
secondarily by the original ordering in the basic block.
On the following testcases, we have the unfortunate ordering of:
 MEM[(long long int *)p_28] = 0;
 MEM[(long long int *)p_28 + 8B] = 0;
 MEM[(long long int *)p_28 + 16B] = 0;
 MEM[(long long int *)p_28 + 24B] = 0;
 _129 = (int) _130;
 MEM[(int *)p_28 + 8B] = _129;
 MEM[(int *)p_28].a = -1;
We already have
 MEM[(long long int *)p_28] = 0;
 MEM[(int *)p_28].a = -1;
in the first group (which is fine) and the following 2 stores to consider
are
 MEM[(long long int *)p_28 + 8B] = 0;
and
 MEM[(int *)p_28 + 8B] = _129;
We add the first store to the group, which has the effect of emitting the
merged stores at the location of the former MEM[(int *)p_28].a = -1;
store.  Then we process MEM[(int *)p_28 + 8B] = _129; (which was handled
just as a potential bswap possibility; as it overlaps with the previous
store and can't be merged with the next one either, it will be its own
group and thus reuse the original stmt).  The net effect is that we've
reordered the MEM[(long long int *)p_28 + 8B] = 0; store from before the
MEM[(int *)p_28 + 8B] = _129; store to after it, but those two do alias.

Instead of detecting this situation late, where we could only throw away
the whole group (which would mean we couldn't merge even the
MEM[(long long int *)p_28] = 0; and MEM[(int *)p_28].a = -1; stores),
this patch checks before calling merge_into whether the result would
overlap later stores that we won't be able to emit in the same group,
and if it would, it terminates the group right away.

I also had to fix the merged_store_group::merge_into width computation:
it was mistakenly just summing the sizes of the stores added to the
group, but we really need the distance between the first bit in the
group and the last bit, including any gaps in between, but excluding
gaps before the start and after the end (still within the bitregion).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

For 7.x I think we'll need something along the lines of PR82916 instead;
while the same testcase fails there, we only handle stores whose lhs is
a constant, so the problem is that we aren't terminating the chain when
we should on the variable store.

2018-02-21  Jakub Jelinek  

PR tree-optimization/84503
* gimple-ssa-store-merging.c (merged_store_group::merge_into): Compute
width as info->bitpos + info->bitsize - start.
(merged_store_group::merge_overlapping): Simplify width computation.
(check_no_overlap): New function.
(imm_store_chain_info::try_coalesce_bswap): Compute expected
start + width and last_order of the group, fail if check_no_overlap
fails.
(imm_store_chain_info::coalesce_immediate_stores): Don't merge info
to group if check_no_overlap fails.

* gcc.dg/pr84503-1.c: New test.
* gcc.dg/pr84503-2.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj   2018-01-16 09:52:26.230235131 +0100
+++ gcc/gimple-ssa-store-merging.c  2018-02-21 20:13:45.239129599 +0100
@@ -1891,12 +1891,11 @@ merged_store_group::do_merge (store_imme
 void
 merged_store_group::merge_into (store_immediate_info *info)
 {
-  unsigned HOST_WIDE_INT wid = info->bitsize;
   /* Make sure we're inserting in the position we think we're inserting.  */
   gcc_assert (info->bitpos >= start + width
  && info->bitregion_start <= bitregion_end);
 
-  width += wid;
+  width = info->bitpos + info->bitsize - start;
   do_merge (info);
 }
 
@@ -1909,7 +1908,7 @@ merged_store_group::merge_overlapping (s
 {
   /* If the store extends the size of the group, extend the width.  */
   if (info->bitpos + info->bitsize > start + width)
-width += info->bitpos + info->bitsize - (start + width);
+width = info->bitpos + info->bitsize - start;
 
   do_merge (info);
 }
@@ -2304,6 +2303,55 @@ gather_bswap_load_refs (vec *refs,
 }
 }
 
+/* Check if there are any stores in M_STORE_INFO after index I
+   (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap
+   a potential group ending with END that have their order
+   smaller than LAST_ORDER.  RHS_CODE is the kind of store in the
+   group.  Return true if there are no such stores.
+   Consider:
+ MEM[(long long int *)p_28] = 0;
+ MEM[(long long int *)p_28 + 8B] = 0;
+ MEM[(long long int *)p_28 + 16B] = 0;
+ MEM[(long long int *)p_28 + 24B] = 0;
+ _129 = (int) _130;
+ MEM[(int *)p_28 + 8B] = _129;
+ MEM[(int *)p_28].a = -1;
+   We already have
+

Re: [PATCH] Fix store merging (PR tree-optimization/84503)

2018-02-21  Richard Biener
On February 21, 2018 11:28:36 PM GMT+01:00, Jakub Jelinek wrote:
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.