http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57037
Bug #: 57037
Summary: GCC does not generate non-temporal stores on i386 with SSE2+
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: anl...@gmx.de

Hello,

it appears that GCC does not generate the non-temporal stores that are available on i386, at least with SSE2. This is an important optimization for some memory-bandwidth-limited codes.

Example: for the stream triad kernel,

    subroutine stream_kernel_triad (a, b, c, n, s)
      integer, intent(in) :: n
      double precision :: a(*), b(*), c(*)
      double precision, intent(in) :: s
      integer :: j
      do j = 1,n
         a(j) = b(j) + s*c(j)
      end do
    end subroutine stream_kernel_triad

the Intel compiler generates vectorized code with a throughput that is 25% higher on my Core2 than when the generation of non-temporal stores is disabled (i.e. compiling with "-opt-streaming-stores never"). gfortran (using -Ofast -fprefetch-loop-arrays) exactly reproduces the performance of the Intel compiler without non-temporal stores.

It appears that this is an important optimization.

Harald
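For illustration, the triad kernel above can be written by hand in C with SSE2 non-temporal stores. This is only a sketch of the kind of code being requested, not the output of GCC or of the Intel compiler. The function name triad_stream and its argument order are hypothetical. It assumes an x86 target with SSE2 enabled (for example -msse2 on 32-bit i386) and falls back to a plain scalar loop otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_stream_pd, _mm_loadu_pd, ... */
#endif

/* Hypothetical helper: computes a[j] = b[j] + s*c[j] for 0 <= j < n. */
void triad_stream(double *a, const double *b, const double *c,
                  size_t n, double s)
{
    size_t j = 0;
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));

#if defined(__SSE2__)
    /* Scalar peel until a + j is 16-byte aligned; _mm_stream_pd
       requires an aligned destination. */
    while (j < n && ((uintptr_t)(a + j) & 15u) != 0) {
        a[j] = b[j] + s * c[j];
        j++;
    }

    const __m128d vs = _mm_set1_pd(s);
    for (; j + 2 <= n; j += 2) {
        __m128d vb = _mm_loadu_pd(b + j);   /* b and c may be unaligned */
        __m128d vc = _mm_loadu_pd(c + j);
        /* Non-temporal store: writes a[j], a[j+1] without first
           reading the destination cache line. */
        _mm_stream_pd(a + j, _mm_add_pd(vb, _mm_mul_pd(vs, vc)));
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#endif

    /* Scalar tail (or the whole loop when SSE2 is unavailable). */
    for (; j < n; j++) {
        a[j] = b[j] + s * c[j];
    }
}
```

With ordinary stores, a write-allocate cache first reads each line of a before overwriting it, so the loop moves four streams through memory (read b, read c, read a, write a) instead of three. Avoiding that extra read is where a bandwidth-bound kernel like this one can gain from non-temporal stores.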