[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
                 CC|                            |jakub at gcc dot gnu.org
         Resolution|---                         |FIXED
   Target Milestone|6.5                         |7.3

--- Comment #9 from Jakub Jelinek ---
GCC 6 branch is being closed, fixed in 7.x.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #8 from rguenther at suse dot de ---
On Thu, 24 May 2018, jason.vas.dias at gmail dot com wrote:

> --- Comment #7 from Jason Vas Dias ---
> [...]
> I will try rebuilding 6.3.1 with '--with-arch64=x86-64
> --with-cpu64=haswell' and retest. Thanks!

The testsuite is mostly "tuned" to the defaults, that is, -march=x86-64 and
-mtune=generic, so you likely won't have luck with the above choice either.
The testcase could be improved to handle the situation more gracefully, but
really there's no point in doing that on the old GCC 6 branch.
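For readers following the -march/-mtune question: -march selects which instruction sets the compiler may emit, while -mtune only influences scheduling and cost decisions. One way to see what a given build enables by default is to inspect the macros GCC predefines for enabled extensions. The following is a minimal sketch, not part of the bug report; the function name `enabled_vector_isas` is made up for illustration.

```cpp
#include <string>

// Hypothetical helper (not from the bug report): lists the x86 vector
// extensions this translation unit was compiled with, based on the macros
// GCC predefines when the corresponding ISA is enabled.
std::string enabled_vector_isas() {
    std::string out;
#ifdef __SSE2__
    out += "sse2 ";
#endif
#ifdef __AVX__
    out += "avx ";
#endif
#ifdef __AVX2__
    out += "avx2 ";
#endif
#ifdef __FMA__
    out += "fma ";
#endif
    if (out.empty()) {
        return "none";
    }
    out.pop_back();  // drop the trailing space
    return out;
}
```

Compiling this with no -m options and printing the result should show whether the configured default resembles plain x86-64 (typically only sse2 from this list) or a Haswell-level default (which would be expected to add avx, avx2 and fma).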
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #7 from Jason Vas Dias ---
Aha!

Yes, I was experimenting with the new '-march=haswell' and '-mtune=intel'
options (which seem to me to be the wrong way round: shouldn't 'haswell' be an
'-mtune' option and 'intel' be an '-march' option? But this is not the case,
according to the documentation).

GCC 6.4.1 was configured with:

  ./configure \
    --prefix=/usr/local --libdir=/usr/local/lib64 --enable-languages=all \
    --enable-targets=all --enable-multilib --enable-threads=posix --enable-lto \
    --with-cpu-64=intel --with-cpu-32=generic \
    --with-arch-64=haswell --with-tune-64=intel --with-arch-32=i686 \
    --with-fp=sse+387 --with-tune-32=generic --enable-shared \
    --with-pic --with-gmp=/usr/local --with-isl=/usr/local \
    --with-cloog=/usr/local --with-mpc=/usr/local --with-isl=/usr/local \
    --with-system-zlib --with-gnu-ld --with-gnu-as --enable-serial-configure \
    --host=x86_64-linux-gnu --build=x86_64-linux-gnu --target=x86_64-linux-gnu

What I am trying to achieve is that the DEFAULT 64-bit platform for the
compiler (the target the compiler builds for without any '-m=yyy' options)
should be '-march=haswell -mtune=intel', which I think should be equivalent
to the older options '-march=x86-64 -mtune=haswell', and to '-mtune=native'
on this platform. Please let me know if this is not the case.

The 5.5.0 and 7.3.1 compilers were built with
'--with-arch64=x86-64 --with-cpu64=haswell', but re-reading the updated 6.4.1
'-mtune'/'-march' documentation led me to believe that the new
'--with-arch-64=haswell --with-tune-64=intel' options were more appropriate.
I guess not?
(The 5.5.0 and 7.3.1 builds are 6 months and 2 months old, before the
'-march=haswell' support.)

I will try rebuilding 6.3.1 with '--with-arch64=x86-64 --with-cpu64=haswell'
and retest. Thanks!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #6 from Richard Biener ---
The log file shows the loop was already vectorized by loop vectorization. How
did you configure gcc? It might be you configured a default -march/tune that
doesn't match the testcase expectation (and the testcase could probably use
-ftree-slp-vectorize instead of -ftree-vectorize).
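To illustrate the distinction drawn above between loop vectorization and SLP (basic-block) vectorization, here is a minimal sketch. It is a hypothetical stand-in and not the actual testsuite source; the names `Block8`, `fill_loop` and `fill_unrolled` are invented for this example.

```cpp
// Hypothetical illustration, not the real slp-pr56812.cc testcase.
struct Block8 {
    float data[8];

    // Loop form: a candidate for the loop vectorizer, or for SLP
    // if the compiler fully unrolls the loop first.
    void fill_loop(float x) {
        for (int i = 0; i < 8; ++i) {
            data[i] = x;
        }
    }

    // Straight-line form: eight adjacent stores in one basic block,
    // the kind of pattern the SLP vectorizer looks for.
    void fill_unrolled(float x) {
        data[0] = x; data[1] = x; data[2] = x; data[3] = x;
        data[4] = x; data[5] = x; data[6] = x; data[7] = x;
    }
};
```

Both functions produce the same result. Which pass ends up handling the loop form presumably depends on the GCC version and on the vector width that the default -march makes available; if the loop vectorizer gets there first, the SLP dump has nothing left to report as "basic block vectorized".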
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #5 from Jason Vas Dias ---
Could it be an issue to do with running on different hardware?
The CPU on the machine is a rather old 4-core (8 with HyperThreading) Haswell:

  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 60
  model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  stepping        : 3
  microcode       : 0x22
  cpu MHz         : 3400.000
  cache size      : 8192 KB
  physical id     : 0
  siblings        : 8
  core id         : 0
  cpu cores       : 4
  apicid          : 0
  initial apicid  : 0
  fpu             : yes
  fpu_exception   : yes
  cpuid level     : 13
  wp              : yes
  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                    cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
                    pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon
                    pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf
                    eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
                    tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
                    x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
                    rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept
                    vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
                    xsaveopt dtherm ida arat pln pts
  bogomips        : 6784.22
  clflush size    : 64
  cache_alignment : 64
  address sizes   : 39 bits physical, 48 bits virtual
  power management:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #4 from Jason Vas Dias ---
The same commands run with GCC 5.5.0 or GCC 7.3.1 succeed:

  $ g++5 slp-pr56812.cc -nostdinc++ -std=c++98 -O2 -ftree-vectorize \
      -fno-vect-cost-model -msse2 -fdump-tree-slp-details=gcc5.out -O3 \
      -funroll-loops -fvect-cost-model=dynamic -S -o slp-pr56812.gcc5.s
  $ grep 'basic block vectorized' gcc5.out
  slp-pr56812.cc:17:16: note: basic block vectorized

  $ gcc_7_3_env
  $ g++7 slp-pr56812.cc -nostdinc++ -std=c++14 -O2 -ftree-vectorize \
      -fno-vect-cost-model -msse2 -fdump-tree-slp-details=gcc7.out -O3 \
      -funroll-loops -fvect-cost-model=dynamic -S -o slp-pr56812.gcc7.s
  $ grep 'basic block vectorized' gcc7.out
  slp-pr56812.cc:18:1: note: basic block vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #3 from Jason Vas Dias ---
Created attachment 44174
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44174&action=edit
slp1 log file

Here is the slp1 log file produced by the command:

  $ /home/devel/OS/gcc-6-branch/host-x86_64-linux-gnu/gcc/testsuite/g++/../../xg++ \
      -B/home/devel/OS/gcc-6-branch/host-x86_64-linux-gnu/gcc/testsuite/g++/../../ \
      /home/devel/OS/gcc-6-branch/gcc/testsuite/g++.dg/vect/slp-pr56812.cc \
      -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++ \
      -I/home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libstdc++-v3/include/x86_64-linux-gnu \
      -I/home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libstdc++-v3/include \
      -I/home/devel/OS/gcc-6-branch/libstdc++-v3/libsupc++ \
      -I/home/devel/OS/gcc-6-branch/libstdc++-v3/include/backward \
      -I/home/devel/OS/gcc-6-branch/libstdc++-v3/testsuite/util \
      -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model \
      -msse2 -fdump-tree-slp-details -O3 -funroll-loops \
      -fvect-cost-model=dynamic -S -o slp-pr56812.s

It does not contain the string 'basic block vectorized', so the test fails.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #2 from Jason Vas Dias ---
Created attachment 44173
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44173&action=edit
log file produced by "make check-g++ 'RUNTESTFLAGS=vect.exp=slp-pr56812*'"

Log file showing test failures as requested
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |WAITING
            Version|unknown                     |6.3.1
           Keywords|                            |missed-optimization,
                   |                            |needs-bisection
   Last reconfirmed|                            |2018-05-24
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
            Summary|[6.4.1 regression] Simple   |[6 Regression] Simple loop
                   |loop is not SLP-vectorized  |is not SLP-vectorized after
                   |after r196872               |r196872
   Target Milestone|---                         |6.5

--- Comment #1 from Richard Biener ---
The test works fine for me on x86_64 Linux (openSUSE Leap 42.2) on the GCC 6
branch (r260441). I don't see anything host specific in it. Please cut from
the testsuite log the compiler commands and attach the slp1 dump file.