https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #9 from Quanhua Liu <quanhua.liu at noaa dot gov> ---
Hi Richard,

It seems that I cannot add a comment online to the ticket.
I tried
    gfortran -o z -O3 -march=native test_matrixCal.f90 -fexternal-blas -lblas -fdump-tree-optimized
    time a.out 1
and
    time a.out 2
Both are very slow (6 s, compared with the previous 0.8 s using method 2).
I don't know which BLAS is installed on my machine.

On your machine, could you test
    BB = transpose(B)
    C = matmul(A,BB)
using
    gfortran -O3 test_matrixCal.f90
    time a.out 2
against
    C = matmul(A, transpose(B))
using any options or BLAS, and compare the timings?

The timing depends on the machine. It would be very helpful if you could
provide the timings for the two methods on your machine.

Thank you!

Quanhua Liu
On 8/9/2022 1:53 PM, sgk at troutmask dot apl.washington.edu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
>
> --- Comment #7 from Steve Kargl <sgk at troutmask dot apl.washington.edu> ---
> On Tue, Aug 09, 2022 at 05:17:57PM +0000, quanhua.liu at noaa dot gov wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
>>
>> --- Comment #5 from Quanhua Liu <quanhua.liu at noaa dot gov> ---
>> Hi Richard,
>>
>> Using -fexternal-blas for gfortran v10.3.0 is much slower than
>> the method 2:
>>     BB = transpose(B)
>>     C = matmul(A, BB)
>>
>> How about on your machine?
>>
>>> If you are doing a problem of this size or larger, you want to use the
>>> -fexternal-blas option and link in OpenBLAS.
>
> I wrote "and link in OpenBLAS".
>
>>> I added timing code and replicated the loop to both in one go.
>>>
>>> % gfcx -o z -O3 -march=native a.f90 && ./z
>>>      1.16500998       1615.08594
>>>      5.32258606       1615.08020
>
>>> % gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas && ./z
>>>      2.44668889       1615.08301
>>>      1.99379802       1615.08301
> Method 1 is faster with OpenBLAS.
>
