2008/8/18 VandeVondele Joost <[EMAIL PROTECTED]>:
>
> The attached testcase yields (on a core2 duo, gcc trunk):
>
>> gfortran -O3 -ftree-vectorize -ffast-math -march=native test.f90
>> time ./a.out
>
> real    0m3.414s
>
>> ifort -xT -O3 test.f90
>> time ./a.out
>
> real    0m1.556s
>
> The assembly contains:
>
>         ifort   gfortran
> mulpd   140     0
> mulsd   0       280
>
> so the reason seems that ifort vectorizes the following code (full testcase
> attached):
>
> SUBROUTINE collocate_core_6(res,coef_xyz,pol_x,pol_y,pol_z,cmax,kg,jg)
>
> IMPLICIT NONE
> INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
> integer, PARAMETER :: lp=6
> real(wp), INTENT(OUT) :: res
> integer, INTENT(IN) :: cmax,kg,jg
> real(wp), INTENT(IN) :: pol_x(0:lp,-cmax:cmax)
> real(wp), INTENT(IN) :: pol_y(1:2,0:lp,-cmax:0)
> real(wp), INTENT(IN) :: pol_z(1:2,0:lp,-cmax:0)
> real(wp), INTENT(IN) :: coef_xyz(((lp+1)*(lp+2)*(lp+3))/6)
> real(wp) :: coef_xy(2,(lp+1)*(lp+2)/2)
> real(wp) :: coef_x(4,0:lp)
>
> [...]
> coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,12)*pol_y(1,1,jg)
> coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,12)*pol_y(2,1,jg)
> coef_x(1:2,5)=coef_x(1:2,5)+coef_xy(1:2,13)*pol_y(1,1,jg)
> coef_x(3:4,5)=coef_x(3:4,5)+coef_xy(1:2,13)*pol_y(2,1,jg)
> coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,14)*pol_y(1,2,jg)
> coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,14)*pol_y(2,2,jg)
> coef_x(1:2,1)=coef_x(1:2,1)+coef_xy(1:2,15)*pol_y(1,2,jg)
> coef_x(3:4,1)=coef_x(3:4,1)+coef_xy(1:2,15)*pol_y(2,2,jg)
> coef_x(1:2,2)=coef_x(1:2,2)+coef_xy(1:2,16)*pol_y(1,2,jg)
> coef_x(3:4,2)=coef_x(3:4,2)+coef_xy(1:2,16)*pol_y(2,2,jg)
> coef_x(1:2,3)=coef_x(1:2,3)+coef_xy(1:2,17)*pol_y(1,2,jg)
> coef_x(3:4,3)=coef_x(3:4,3)+coef_xy(1:2,17)*pol_y(2,2,jg)
> coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,18)*pol_y(1,2,jg)
> coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,18)*pol_y(2,2,jg)
> coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,19)*pol_y(1,3,jg)
> coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,19)*pol_y(2,3,jg)
> [...]
>
> either it is able to interpret the short vectors as such, or it realizes
> that these very short implicit loops are nevertheless favourable for
> vectorization.
>
> Is there a trick to get gcc vectorize these loops, or is there some
> technology missing for this ?
>
> Should I file a PR for this (this is somewhat similar to PR31079 and
> PR31021)?

It would be nice to have a stand-alone testcase for this, so please file a
bug report.

Thanks,
Richard.