https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #17 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
What an inline packing would (approximately) produce is this:

    subroutine processBPP(X, BPP, N)
        integer,                    intent(in)      ::  N
        real,   dimension(N,3),     intent(out)     ::  X
        real,   dimension(N,10),    intent(in)      ::  BPP

        integer                                     ::  i
        real :: tmp1(3)
        real :: tmp2(6)
        integer :: k1, k2

        do concurrent (i = 1:N)
           k1 = 0
           do
              if (.not. k1 < 3) exit
              tmp1(k1+1) = BPP(i,k1+1)
              k1 = k1 + 1
           end do

           k2 = 0
           do
              if (.not. k2 < 6) exit
              tmp2(k2+1) = BPP(i,k2+5)
              k2 = k2 + 1
           end do

           X(i,:) = fpdbacksolve(tmp1, tmp2)
        end do

    end subroutine processBPP

I see no timing difference for gfortran between this and the (:) version.
Chris, can you confirm this?

And is flang still faster by a factor of two if you use this version?
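
For anyone reproducing this, here is a minimal sketch that checks the hand-written packing loops fill tmp1 and tmp2 with the same values the array sections would. It assumes the (:) version passes BPP(i,1:3) and BPP(i,5:10) to fpdbacksolve, which is inferred from the loop bounds above rather than quoted from the test case. The program name pack_check and the fill values are made up for illustration, and fpdbacksolve itself is not needed.

```fortran
program pack_check
    implicit none
    integer, parameter :: N = 4          ! small hypothetical size, for the check only
    real    :: BPP(N,10), tmp1(3), tmp2(6)
    integer :: i, j, k1, k2

    ! Fill BPP with distinct values so any indexing slip would show up.
    do j = 1, 10
        do i = 1, N
            BPP(i,j) = real(10*i + j)
        end do
    end do

    do i = 1, N
        ! Hand-written packing, same index arithmetic as in processBPP above.
        do k1 = 0, 2
            tmp1(k1+1) = BPP(i,k1+1)
        end do
        do k2 = 0, 5
            tmp2(k2+1) = BPP(i,k2+5)
        end do

        ! Assumed array-section equivalents of the two temporaries.
        if (any(tmp1 /= BPP(i,1:3)) .or. any(tmp2 /= BPP(i,5:10))) then
            error stop "packed temporaries differ from array sections"
        end if
    end do

    print *, "packed temporaries match BPP(i,1:3) and BPP(i,5:10)"
end program pack_check
```

This only confirms that the two forms hand the same data to the callee. It says nothing about timing, which still has to be measured on the real test case.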
