[Bug tree-optimization/59594] [4.9 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59594

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Jakub Jelinek ---
Fixed.
[Bug tree-optimization/59594] [4.9 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59594

--- Comment #7 from Jakub Jelinek ---
Author: jakub
Date: Wed Jan 29 09:27:43 2014
New Revision: 207225

URL: http://gcc.gnu.org/viewcvs?rev=207225&root=gcc&view=rev
Log:
    PR tree-optimization/59594
    * tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Sort
    a copy of the datarefs vector rather than the vector itself.

    * gcc.dg/vect/no-vfa-vect-depend-2.c: New test.
    * gcc.dg/vect/no-vfa-vect-depend-3.c: New test.
    * gcc.dg/vect/pr59594.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
    trunk/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
    trunk/gcc/testsuite/gcc.dg/vect/pr59594.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
[Bug tree-optimization/59594] [4.9 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59594

--- Comment #6 from Jakub Jelinek ---
On:

#define N 1024
int ia[N + 1];
int ib[N + 1];

void
f1 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      ia[i + 1] = 1;
      ib[i] = ia[i];
    }
}

void
f2 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      ia[i] = 1;
      ib[i] = ia[i + 1];
    }
}

void
f3 (void)
{
  int i;
  for (i = N - 1; i >= 0; i--)
    {
      ia[i + 1] = 1;
      ib[i] = ia[i];
    }
}

void
f4 (void)
{
  int i;
  for (i = N - 1; i >= 0; i--)
    {
      ia[i] = 1;
      ib[i] = ia[i + 1];
    }
}

we properly vectorize f2 and f3 where the write/read DDR is DDR_REVERSED_P
and not f1/f4.

On

#define N 1024
int ia[N + 1];
int ib[N + 1];

void
f1 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      ia[i + 1] = 1;
      ia[i] = 2;
    }
}

void
f2 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      ia[i] = 1;
      ia[i + 1] = 2;
    }
}

void
f3 (void)
{
  int i;
  for (i = N - 1; i >= 0; i--)
    {
      ia[i + 1] = 1;
      ia[i] = 2;
    }
}

void
f4 (void)
{
  int i;
  for (i = N - 1; i >= 0; i--)
    {
      ia[i] = 1;
      ia[i + 1] = 2;
    }
}

we don't vectorize f1 and f2, in both cases for the write/write DDR
DDR_REVERSED_P is false, and vectorize f3/f4, where DDR_REVERSED_P is true
in both cases.  f2 and f3 shouldn't be vectorizable (at least not as is,
when we'd be trying to vectorize the two stores just by putting a vector
store at that position), f1 and f4 can.

So, this leads me to believe that for write/write we don't have a way to
differentiate between the bad and good cases using the
dist > 0 && DDR_REVERSED_P test.  In that case, I'd think best would be to
not ignore dist > 0 && DDR_REVERSED_P (ddr) ddrs
if (!DR_IS_READ (dra) && !DR_IS_READ (drb)).
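The claim in comment #6 that, of the four write/write functions, only f1 and f4 survive "putting a vector store at that position" can be checked with a small simulation. The sketch below is not GCC code; it assumes a vectorization factor of 4 (the macro VF) and models each vector store as a plain loop over VF elements. The helper names vstore, run_scalar, run_vector and vector_matches_scalar are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

#define N 1024   /* same trip count as the testcases in comment #6 */
#define VF 4     /* assumed vectorization factor; N is a multiple of it */

/* Store VAL into VF consecutive elements starting at P, standing in
   for a single vector store.  */
static void
vstore (int *p, int val)
{
  for (int k = 0; k < VF; k++)
    p[k] = val;
}

/* Run write/write function f<KIND> (1..4) one iteration at a time.  */
static void
run_scalar (int kind, int *ia)
{
  int up = kind <= 2;                        /* f1/f2 count up, f3/f4 down */
  int plus1_first = kind == 1 || kind == 3;  /* ia[i + 1] = 1 comes first */
  for (int n = 0; n < N; n++)
    {
      int i = up ? n : N - 1 - n;
      if (plus1_first)
        { ia[i + 1] = 1; ia[i] = 2; }
      else
        { ia[i] = 1; ia[i + 1] = 2; }
    }
}

/* Same function, but each store statement handles VF iterations at once,
   keeping the two statements in their original order.  */
static void
run_vector (int kind, int *ia)
{
  int up = kind <= 2;
  int plus1_first = kind == 1 || kind == 3;
  for (int n = 0; n < N; n += VF)
    {
      /* Lowest index i covered by this group of VF iterations.  */
      int lo = up ? n : N - VF - n;
      if (plus1_first)
        { vstore (ia + lo + 1, 1); vstore (ia + lo, 2); }
      else
        { vstore (ia + lo, 1); vstore (ia + lo + 1, 2); }
    }
}

/* Return 1 if the simulated vector version leaves ia[] identical to the
   scalar version, 0 otherwise.  */
int
vector_matches_scalar (int kind)
{
  int s[N + 1], v[N + 1];
  assert (kind >= 1 && kind <= 4);
  memset (s, 0, sizeof s);
  memset (v, 0, sizeof v);
  run_scalar (kind, s);
  run_vector (kind, v);
  return memcmp (s, v, sizeof s) == 0;
}
```

Under these assumptions vector_matches_scalar returns 1 for f1 and f4 and 0 for f2 and f3, which agrees with the comment: the loop direction, and with it DDR_REVERSED_P, does not separate the safe write/write cases from the unsafe ones.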
[Bug tree-optimization/59594] [4.9 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59594

--- Comment #5 from Jakub Jelinek ---
Created attachment 31919
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31919&action=edit
gcc49-pr59594.patch

Untested patch for discussion.

The reason why we (incorrectly) vectorize the testcase is that we ignore the
data dependency, on the testcase both the b[a] read vs. b[a+1] store and
b[a] store vs. b[a+1] store DDRs have dist 1 and DDR_REVERSED_P set and we
ignore those.

Now on say:

int printf (const char *, ...);
int a;
static int b[1024];

int
main ()
{
  for (a = 0; a <= 512; a++)
    {
      b[a - 1] = b[a];
      b[a] = 1;
    }
  printf ("%d\n", b[1]);
  return 0;
}

only the b[a] read vs. b[a-1] store is dist 1 DDR_REVERSED_P and b[a] store
vs. b[a-1] store is dist 1 !DDR_REVERSED_P, thus we don't vectorize it
(correctly).

Unfortunately not ignoring dist > 0 && DDR_REVERSED_P ddrs for negative step
regresses the testcase I've attached, where there is a write after read ddr
and it works properly with the current check.

While the attached patch keeps that testcase (no-vfa-vect-depend*.c) working
and fixes the test (pr59594.c), the conditions are piled completely randomly,
I'm afraid I don't know why it is so, if for the DDR_REVERSED_P continue it
matters whether step is positive or negative, or if that is irrelevant and
all the write after write DDR_REVERSED_P ddrs need to be checked normally
(abs (dist) >= *max_vf), or if say only write after read should be treated as
the code treats it right now and even read after write is problematic.

The DDR_REVERSED_P stuff has been added in 2007 for PR32377, see e.g.
http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01067.html

Richard, any ideas?
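To see why refusing to vectorize the "Now on say" loop from comment #5 is the correct outcome, the following sketch compares the scalar loop with a naive statement-by-statement vectorization. It is only an illustration, not what GCC generates: the vectorization factor of 4 (VF) is an assumption, the loop starts at a = 1 instead of 0 so that b[a - 1] stays inside the array (my change, not part of the comment), and the function names scalar_b1 and naive_vector_b1 are invented here.

```c
#include <string.h>

#define LAST 512  /* last value of a, as in the loop in comment #5 */
#define VF 4      /* assumed vectorization factor; LAST is a multiple of it */

/* The loop as written in the comment, except that a starts at 1
   (an assumption made here to keep every index in bounds).  */
int
scalar_b1 (void)
{
  int b[1024] = { 0 };
  for (int a = 1; a <= LAST; a++)
    {
      b[a - 1] = b[a];
      b[a] = 1;
    }
  return b[1];
}

/* Naive vectorization: each statement is executed for VF iterations
   before the next statement starts.  */
int
naive_vector_b1 (void)
{
  int b[1024] = { 0 };
  for (int a = 1; a <= LAST; a += VF)
    {
      int tmp[VF];
      memcpy (tmp, &b[a], sizeof tmp);      /* vector load  b[a .. a+VF-1] */
      memcpy (&b[a - 1], tmp, sizeof tmp);  /* vector store b[a-1 .. a+VF-2] */
      for (int k = 0; k < VF; k++)          /* vector store b[a .. a+VF-1] = 1 */
        b[a + k] = 1;
    }
  return b[1];
}
```

In the scalar loop the store b[a] = 1 of one iteration is overwritten by the b[a - 1] = b[a] store of the next, so b[1] ends up 0. In the naive vector version all VF copies happen before the VF stores of 1, so b[1] keeps the value 1. The write/write dependence therefore cannot be ignored here.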
[Bug tree-optimization/59594] [4.9 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59594

--- Comment #4 from Jakub Jelinek ---
You're right, r206180 fixed this (i.e. that it FAILs at runtime even for
-mtune=generic).

In any case, sounds like this is a problem in determination of the aliasing,
we should have refused to vectorize this (of course unless we are smart
enough to find out all the b[a] = 1; stores but the b[0] = 1; is redundant,
but I suppose the vectorizer isn't the right place to do that optimization).
[Bug tree-optimization/59594] [4.9 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59594

--- Comment #3 from H.J. Lu ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to H.J. Lu from comment #1)
> > It is caused by r204062.
>
> That is only a part of the story, this testcase seems to be interesting.
>
> Starting with r204062 until r204560 this has been broken because of an ldist
> bug.
> r204561 fixed that again and the testcase worked until r206147.
> r206148 (aka negative step vectorization support) started to ICE on this.
> With r206178 it stopped ICEing and works again with -O3 -mtune=generic, not
> sure if that has been intentional to change the generic tuning with that
> patch.  H.J.?

There was a typo in r206178 and was fixed by r206180:

http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ff385162ee38caf3100ba4a0f682241c3b0d681d

Can you try r206180 on this?

> Anyway, with -O3 -mtune=core-avx2 e.g. r206178 still ICEs and r206179 (which
> fixed the ICE) starts the miscompilation.
[Bug tree-optimization/59594] [4.9 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59594

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |meibf at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek ---
(In reply to H.J. Lu from comment #1)
> It is caused by r204062.

That is only a part of the story, this testcase seems to be interesting.

Starting with r204062 until r204560 this has been broken because of an ldist
bug.
r204561 fixed that again and the testcase worked until r206147.
r206148 (aka negative step vectorization support) started to ICE on this.
With r206178 it stopped ICEing and works again with -O3 -mtune=generic, not
sure if that has been intentional to change the generic tuning with that
patch.  H.J.?

Anyway, with -O3 -mtune=core-avx2 e.g. r206178 still ICEs and r206179 (which
fixed the ICE) starts the miscompilation.
[Bug tree-optimization/59594] [4.9 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59594

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de

--- Comment #1 from H.J. Lu ---
It is caused by r204062.
[Bug tree-optimization/59594] [4.9 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59594

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-12-24
   Target Milestone|---                         |4.9.0
            Summary|wrong code (by tree         |[4.9 Regression] wrong code
                   |vectorizer) at -O3 on       |(by tree vectorizer) at -O3
                   |x86_64-linux-gnu            |on x86_64-linux-gnu
     Ever confirmed|0                           |1