------- Comment #1 from dorit at gcc dot gnu dot org 2007-11-07 18:06 -------
(In reply to comment #0)
> Following testcase exposes optimization problem with current SVN gcc:
...
> the same address
> is accessed with unaligned access (3) as well as aligned access.

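For readers following along, the two strategies discussed in this reply (guarding the vectorized loop with runtime checks, or peeling scalar iterations until the store is aligned) can be modelled in scalar C. This is only a sketch under assumptions: the loop body is inferred from the IR quoted in this comment rather than taken from the testcase in comment #0, and VEC_BYTES, the 16-byte vector size and all function names are hypothetical. It is not the vectorizer's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16u /* assumed vector size, e.g. 4 x 32-bit elements */

/* Scalar form of the loop: dst[i] = (src[i] << shift) | dst[i]. */
static void or_shift_scalar(unsigned *dst, const unsigned *src,
                            size_t n, unsigned shift)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] << shift) | dst[i];
}

static int is_aligned(const void *p)
{
    return ((uintptr_t)p % VEC_BYTES) == 0;
}

/* Runtime alias test: the two n-element ranges do not overlap. */
static int no_overlap(const unsigned *a, const unsigned *b, size_t n)
{
    uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;
    size_t bytes = n * sizeof(unsigned);
    return ua + bytes <= ub || ub + bytes <= ua;
}

/* Versioning: a runtime guard selects between two copies of the loop.
 * Returns 1 if the guarded copy ran, 0 if the fallback ran. */
static int or_shift_versioned(unsigned *dst, const unsigned *src,
                              size_t n, unsigned shift)
{
    if (is_aligned(dst) && no_overlap(dst, src, n)) {
        /* Stand-in for the vectorized copy. The guard proved dst aligned,
         * so the load from dst[i] is aligned too: it is the same address
         * as the store. */
        or_shift_scalar(dst, src, n, shift);
        return 1;
    }
    or_shift_scalar(dst, src, n, shift); /* unguarded scalar fallback */
    return 0;
}

/* Peeling: run scalar iterations until the store address is aligned.
 * restrict plays the role of __restrict__, so no alias test is needed.
 * Returns the number of peeled iterations. */
static size_t or_shift_peeled(unsigned *restrict dst,
                              const unsigned *restrict src,
                              size_t n, unsigned shift)
{
    size_t i = 0;
    while (i < n && !is_aligned(dst + i)) {
        dst[i] = (src[i] << shift) | dst[i];
        i++;
    }
    size_t peeled = i;
    /* From here on, the store to dst[i] and the load from dst[i] both
     * start at an aligned address. */
    assert(i == n || is_aligned(dst + i));
    for (; i < n; i++) /* stand-in for the aligned vector loop */
        dst[i] = (src[i] << shift) | dst[i];
    return peeled;
}
```

Assuming a 4-byte unsigned and a 16-byte-aligned buffer b, or_shift_peeled(b + 1, ...) peels three elements before the store address becomes aligned, whereas or_shift_versioned(b + 1, ...) fails the alignment guard and runs the fallback copy.
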
This is a missed optimization in the vectorizer. We use loop versioning to deal with the fact that we don't yet support misaligned stores, so the vectorized version of the loop is guarded by a runtime test that checks that the address of the store is aligned. However, we don't use the information that there's a load from the same address, which is therefore also guaranteed to be aligned. We actually have this information (we detect data references (DRs) that have the same alignment and collect them in STMT_VINFO_SAME_ALIGN_REFS), but we don't use it when we do the versioning.

We *do* use this information when, instead of versioning the loop, we peel the loop to make the store aligned. In this case we also mark the relevant SAME_ALIGN_REFS as aligned and generate aligned accesses for them.

(By the way, the reason we decide to use loop versioning and not loop peeling is that we can't determine at compile time whether the pointers overlap. So we have to use runtime dependence testing (i.e. versioning for aliasing), and since we currently don't support both versioning and peeling together, this dictates that we will use runtime alignment testing instead of peeling.)

Here is what it looks like in the vectorizer dump file:
"
pr34011.c:14: note: === vect_analyze_dependences ===
pr34011.c:14: note: dependence distance = 0.
pr34011.c:14: note: accesses have the same alignment.
pr34011.c:14: note: dependence distance modulo vf == 0 between *D.1529_9 and *D.1529_9
pr34011.c:14: note: versioning for alias required: can't determine dependence between *D.1531_14 and *D.1529_9
pr34011.c:14: note: mark for run-time aliasing test between *D.1531_14 and *D.1529_9
...
pr34011.c:14: note: === vect_enhance_data_refs_alignment ===
pr34011.c:14: note: Unknown misalignment, is_packed = 0
pr34011.c:14: note: Alignment of access forced using versioning.
pr34011.c:14: note: Versioning for alignment will be applied.
pr34011.c:14: note: Vectorizing an unaligned access.
pr34011.c:14: note: Vectorizing an unaligned access.
"

Instead, if I add __restrict__ qualifiers to the pointer arguments, we get this:
"
pr34011b.c:14: note: === vect_analyze_dependences ===
pr34011b.c:14: note: dependence distance = 0.
pr34011b.c:14: note: accesses have the same alignment.
pr34011b.c:14: note: dependence distance modulo vf == 0 between *D.1529_9 and *D.1529_9
...
pr34011b.c:14: note: === vect_enhance_data_refs_alignment ===
pr34011b.c:14: note: Unknown misalignment, is_packed = 0
...
pr34011b.c:14: note: Alignment of access forced using peeling.
pr34011b.c:14: note: Peeling for alignment will be applied.
pr34011b.c:14: note: Vectorizing an unaligned access.
"

i.e. we don't need to use runtime dependence testing and version the loop, so we can use peeling to align the store along with anything that has the same alignment as the store:

<bb 6>:
  MEM[base: D.1676, index: ivtmp.142] = M*(vect_p.111 + ivtmp.142){misalignment: 0} << srcshift | MEM[base: D.1676, index: ivtmp.142];

...

> Missing IV elimination could be attributed to tree loop optimizations, but
> others are IMO RTL optimization problems,

(except for the misaligned access, which the vectorizer can avoid).

> because we enter RTL generation with:
> bad:
> <bb 4>:
> MEM[index: ivtmp.127] = M*(vector int *) ivtmp.130{misalignment: 0} <<
> srcshift.3 | M*(vector int *) ivtmp.127{misalignment: 0};

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34011