[Bug rtl-optimization/34011] Memory load is not eliminated from tight vectorized loop

dorit at gcc dot gnu dot org Wed, 07 Nov 2007 10:06:20 -0800


------- Comment #1 from dorit at gcc dot gnu dot org  2007-11-07 18:06 -------
(In reply to comment #0)
> Following testcase exposes optimization problem with current SVN gcc:
...
> the same address
> is accessed with unaligned access (3) as well as aligned access.


This is a missed optimization in the vectorizer: we use loop versioning to
deal with the fact that we don't yet support misaligned stores, so the
vectorized version of the loop is guarded by a runtime test that checks that
the address of the store is aligned. However, we don't use the information that
there's a load from the same address, which is therefore also guaranteed to be
aligned.
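
As a rough illustration of what versioning for alignment amounts to, here is a
hand-written C sketch. It is not the testcase from comment #0 and not what the
vectorizer emits: the function name, the element type, the 16-byte vector size
and the return value are all assumptions, and both versions stay scalar so only
the guard structure is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16 /* assumed vector size in bytes */

/* Hypothetical kernel; name, signature and element type are assumptions.
   Returns 1 if the guarded ("vectorized") version ran, 0 for the fallback. */
int shift_or_versioned(unsigned *dst, const unsigned *src,
                       size_t n, unsigned srcshift)
{
    assert(srcshift < sizeof(unsigned) * 8); /* larger shifts are undefined */

    if ((uintptr_t)dst % VEC_BYTES == 0) {
        /* Guarded version: the store to dst[i] is known to be aligned.
           The load of dst[i] uses the same address, so it is aligned too;
           only the load of src[i] may still be misaligned. */
        for (size_t i = 0; i < n; i++)
            dst[i] = (src[i] << srcshift) | dst[i];
        return 1;
    }

    /* Fallback version: nothing is known about alignment. */
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] << srcshift) | dst[i];
    return 0;
}
```

The point of the PR is the comment in the guarded branch: once the guard has
passed, the load of dst[i] could be emitted as an aligned access as well.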

We actually have this information (we detect data references (DRs) that have
the same alignment and collect them in STMT_VINFO_SAME_ALIGN_REFS), but we
don't use it when we do the versioning. We *do* use this information when,
instead of versioning the loop, we peel the loop to make the store aligned. In
that case we also mark the relevant SAME_ALIGN_REFS as aligned and generate
aligned accesses for them.
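
For comparison, peeling for alignment can be pictured with the following C
sketch. Again this is only an illustration under assumptions (hypothetical
function name and signature, 4-byte unsigned elements, 16-byte vectors, scalar
loops standing in for vector code); the returned peel count is there only to
make the behaviour observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16 /* assumed vector size in bytes */

/* Hypothetical kernel; returns the number of peeled scalar iterations. */
size_t shift_or_peeled(unsigned *dst, const unsigned *src,
                       size_t n, unsigned srcshift)
{
    assert(srcshift < sizeof(unsigned) * 8); /* larger shifts are undefined */

    size_t i = 0;

    /* Prologue: peel scalar iterations until &dst[i] is aligned. */
    while (i < n && (uintptr_t)(dst + i) % VEC_BYTES != 0) {
        dst[i] = (src[i] << srcshift) | dst[i];
        i++;
    }
    size_t peeled = i;

    /* Main loop: starts at an aligned &dst[i], so the store to dst[i] and
       the load of dst[i] (a same-alignment reference) are both aligned.
       The load of src[i] may still be misaligned. */
    for (; i < n; i++)
        dst[i] = (src[i] << srcshift) | dst[i];

    return peeled;
}
```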

(By the way, the reason we decide to use loop versioning and not loop peeling
is that we can't determine at compile time whether the pointers overlap. So
we have to use runtime dependence testing (i.e. versioning for aliasing), and
since we currently don't support both versioning and peeling together, this
dictates that we will use runtime alignment testing instead of peeling.)
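
Conceptually, the guard of the versioned loop then combines both runtime tests.
The sketch below is a simplified, hand-written approximation, not the condition
GCC actually generates; the helper names, the element type and the 16-byte
vector size are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16 /* assumed vector size in bytes */

/* Hypothetical helper: returns 1 when [a, a+n) and [b, b+n) do not overlap.
   Address wrap-around at the top of the address space is ignored. */
int ranges_disjoint(const unsigned *a, const unsigned *b, size_t n)
{
    assert(n <= SIZE_MAX / sizeof(unsigned)); /* byte count must not overflow */

    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(unsigned));

    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Hypothetical guard: take the vectorized version only if the accesses
   cannot alias AND the store address is aligned. */
int use_vector_version(const unsigned *dst, const unsigned *src, size_t n)
{
    return ranges_disjoint(dst, src, n)
           && (uintptr_t)dst % VEC_BYTES == 0;
}
```

If either test fails, the scalar version of the loop runs instead.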

Here is how it looks in the vectorizer dump file:

"
pr34011.c:14: note: === vect_analyze_dependences ===
pr34011.c:14: note: dependence distance  = 0.
pr34011.c:14: note: accesses have the same alignment.
pr34011.c:14: note: dependence distance modulo vf == 0 between *D.1529_9 and
*D.1529_9
pr34011.c:14: note: versioning for alias required: can't determine dependence
between *D.1531_14 and *D.1529_9
pr34011.c:14: note: mark for run-time aliasing test between *D.1531_14 and
*D.1529_9
...
pr34011.c:14: note: === vect_enhance_data_refs_alignment ===
pr34011.c:14: note: Unknown misalignment, is_packed = 0
pr34011.c:14: note: Alignment of access forced using versioning.
pr34011.c:14: note: Versioning for alignment will be applied.
pr34011.c:14: note: Vectorizing an unaligned access.
pr34011.c:14: note: Vectorizing an unaligned access.
"

Instead, if I add __restrict__ qualifiers to the pointer arguments, we get
this:

"
pr34011b.c:14: note: === vect_analyze_dependences ===
pr34011b.c:14: note: dependence distance  = 0.
pr34011b.c:14: note: accesses have the same alignment.
pr34011b.c:14: note: dependence distance modulo vf == 0 between *D.1529_9 and
*D.1529_9
...
pr34011b.c:14: note: === vect_enhance_data_refs_alignment ===
pr34011b.c:14: note: Unknown misalignment, is_packed = 0
...
pr34011b.c:14: note: Alignment of access forced using peeling.
pr34011b.c:14: note: Peeling for alignment will be applied.
pr34011b.c:14: note: Vectorizing an unaligned access.
"

i.e. we don't need to use runtime dependence testing and version the loop, so
we can use peeling to align the store along with anything that has the same
alignment as the store:

<bb 6>:
  MEM[base: D.1676, index: ivtmp.142] = M*(vect_p.111 +
ivtmp.142){misalignment: 0} << srcshift | MEM[base: D.1676, index: ivtmp.142];

...
> Missing IV elimination could be attributed to tree loop optimizations, but
> others are IMO RTL optimization problems, 

(except for the misaligned access, which the vectorizer can avoid).

> because we enter RTL generation with:
> bad:
> <bb 4>:
>   MEM[index: ivtmp.127] = M*(vector int *) ivtmp.130{misalignment: 0} <<
> srcshift.3 | M*(vector int *) ivtmp.127{misalignment: 0};


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34011
