https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279

--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Created attachment 54183
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54183&action=edit
Example patch with Michael S's code just pasted over the libgcc implementation,
for a test

A benchmark: just pasting over the code from the GitHub
repo speeds up gfortran's matmul by almost a factor of two (about 1.7x in
the run below), so significant speedups are possible:

module tick
interface
function rdtsc() bind(C,name="rdtsc")
use iso_c_binding
integer(kind=c_long) :: rdtsc
end function rdtsc
end interface
end module tick

program main
use tick
use iso_c_binding
implicit none
integer, parameter :: wp = selected_real_kind(30)
! integer, parameter :: n=5000, p=4000, m=3666
integer, parameter :: n = 1000, p = 1000, m = 1000
real (kind=wp) :: c(n,p), a(n,m), b(m, p)
character(len=80) :: line
integer(c_long) :: t1, t2, t3
real (kind=wp) :: fl = 2.d0*n*m*p
integer :: i,j

print *,wp

line = '10 10'
call random_number(a)
call random_number(b)
t1 = rdtsc()
t2 = rdtsc()
t3 = t2-t1
print *,t3
t1 = rdtsc()
c = matmul(a,b)
t2 = rdtsc()
print *,1/(fl/(t2-t1-t3)),"Cycles per operation"
read (unit=line,fmt=*) i,j
write (unit=line,fmt=*) c(i,j)
end program main

showed

tkoenig@gcc188:~> ./original
16
32
^C
tkoenig@gcc188:~> time ./original
16
32
90.5696151959999999999999999999999997 Cycles per operation

real 1m2,148s
user 1m2,123s
sys 0m0,008s
tkoenig@gcc188:~> time ./modified
16
32
52.8148391719999999999999999999999957 Cycles per operation

real 0m36,296s
user 0m36,278s
sys 0m0,008s 

where "original" is the current libgcc soft-float implementation, and
"modified" is the same with the code from the repo pasted in.

The pasted code does not handle exceptions, so it causes a few regressions,
but it certainly shows the potential.
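
The benchmark above binds to a C function named rdtsc that is not shown in
this comment. A minimal sketch of such a helper follows. It assumes an x86
target and GCC's __rdtsc intrinsic from <x86intrin.h>; the actual helper used
for the measurements may differ. The return type is long so that it matches
integer(kind=c_long) in the Fortran interface.

```c
/* Hypothetical helper for the Fortran benchmark; not taken from the
   original comment.  Assumes an x86 target and GCC's <x86intrin.h>.  */
#include <x86intrin.h>

/* Return the current value of the CPU time-stamp counter.  The name and
   return type match the bind(C,name="rdtsc") interface in module tick.  */
long rdtsc(void)
{
    return (long) __rdtsc();
}
```

Compiled together with the Fortran source (for example
"gfortran bench.f90 rdtsc.c", where both file names are placeholders), this
resolves the rdtsc reference at link time.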
