MKL_VERBOSE=1 ./ex1

matrix nonzeros = 100, allocated nonzeros = 100
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Vector Neural Network Instructions enabled processors, Lnx 2.50GHz lp64 gnu_thread
MKL_VERBOSE ZGEMV(N,10,10,0x7ffd9d7078f0,0x187eb20,10,0x187f7c0,1,0x7ffd9d707900,0x187ff70,1) 167.34ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x7ffd9d7078c0,-1,0) 77.19ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x1894490,10,0) 83.97ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZSYTRS(L,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 44.94ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 20.72us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZSYTRS(L,10,2,0x1894b50,10,0x1893df0,0x187d2a0,10,0) 4.22us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGEMM(N,N,10,2,10,0x7ffd9d707790,0x187eb20,10,0x187d2a0,10,0x7ffd9d7077a0,0x1896a70,10) 1.41ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZAXPY(20,0x7ffd9d7078a0,0x1896a70,1,0x187b650,1) 381ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x7ffd9d707840,-1,0) 742ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x18951a0,10,0) 4.20us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZSYTRS(L,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 2.94us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 292ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGEMV(N,10,10,0x7ffd9d7078f0,0x187eb20,10,0x187f7c0,1,0x7ffd9d707900,0x187ff70,1) 1.17us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGETRF(10,10,0x1894b50,10,0x1893df0,0) 202.48ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGETRS(N,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 20.78ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 954ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGETRS(N,10,2,0x1894b50,10,0x1893df0,0x187d2a0,10,0) 30.74ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGEMM(N,N,10,2,10,0x7ffd9d707790,0x187eb20,10,0x187d2a0,10,0x7ffd9d7077a0,0x18969c0,10) 3.95us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZAXPY(20,0x7ffd9d7078a0,0x18969c0,1,0x187b650,1) 995ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGETRF(10,10,0x1894b50,10,0x1893df0,0) 4.09us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGETRS(N,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 3.92us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 274ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGEMV(N,15,10,0x7ffd9d7078f0,0x187ec70,15,0x187fc30,1,0x7ffd9d707900,0x1880400,1) 1.59us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x1894550,0x7ffd9d707900,-1,0) 47.07us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x1894550,0x1895cb0,10,0) 26.62us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x1894550,0x1895b00,15,0x7ffd9d7078b0,-1,0) 35.32us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x1894550,0x1895b00,15,0x1895cb0,10,0) 42.33ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZTRTRS(U,N,N,10,1,0x1894b40,15,0x1895b00,15,0) 16.11us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187fc30,1,0x1880c70,1) 395ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGEMM(N,N,15,2,10,0x7ffd9d707790,0x187ec70,15,0x187d310,10,0x7ffd9d7077a0,0x187b5b0,15) 3.22us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZUNMQR(L,C,15,2,10,0x1894b40,15,0x1894550,0x1897760,15,0x7ffd9d7078c0,-1,0) 730ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZUNMQR(L,C,15,2,10,0x1894b40,15,0x1894550,0x1897760,15,0x1895cb0,10,0) 4.42us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZTRTRS(U,N,N,10,2,0x1894b40,15,0x1897760,15,0) 5.96us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZAXPY(20,0x7ffd9d7078a0,0x187d310,1,0x1897610,1) 222ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x18954b0,0x7ffd9d707820,-1,0) 685ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x18954b0,0x1895d60,10,0) 6.11us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x18954b0,0x1895bb0,15,0x7ffd9d7078b0,-1,0) 390ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x18954b0,0x1895bb0,15,0x1895d60,10,0) 3.09us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZTRTRS(U,N,N,10,1,0x1894b40,15,0x1895bb0,15,0) 1.05us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187fc30,1,0x1880c70,1) 257ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1

Yes, for the PETSc example there are MKL outputs, but not for my own program. All I did was change the matrix type from MATAIJ to MATAIJMKL to get optimized SpMV performance from MKL. Should I expect to see any MKL output in this case?
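
For reference, the change on my side is essentially just the matrix type, roughly like this (a minimal sketch, not my actual code: n, the communicator, and the assembly calls are placeholders, and error checking is omitted as in the snippet later in this thread):

                // Sketch: use the MKL-backed AIJ format instead of plain AIJ
                // (assumes a PETSc build configured with MKL sparse support)
                Mat A;
                MatCreate(PETSC_COMM_SELF, &A);
                MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
                MatSetType(A, MATAIJMKL);   // previously MATAIJ
                MatSetUp(A);
                // ... MatSetValues(), MatAssemblyBegin(), MatAssemblyEnd() as before ...

I believe the same switch can also be made at run time with -mat_type aijmkl when MatSetFromOptions() is called, but I set the type in code.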

Thanks,
Yongzhong

From: Junchao Zhang <junchao.zh...@gmail.com>
Date: Saturday, June 22, 2024 at 9:40 AM
To: Yongzhong Li <yongzhong...@mail.utoronto.ca>
Cc: Pierre Jolivet <pie...@joliv.et>, petsc-users@mcs.anl.gov 
<petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve 
Performance Issue
No, you don't. It is strange. Perhaps you can run a PETSc example first and see if MKL is really used:
$ cd src/mat/tests
$ make ex1
$ MKL_VERBOSE=1 ./ex1

--Junchao Zhang


On Fri, Jun 21, 2024 at 4:03 PM Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>> wrote:
I am using

export MKL_VERBOSE=1
./xx

in the bash script; do I have to use -ksp_converged_reason as well?

Thanks,
Yongzhong

From: Pierre Jolivet <pie...@joliv.et<mailto:pie...@joliv.et>>
Date: Friday, June 21, 2024 at 1:47 PM
To: Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>>
Cc: Junchao Zhang <junchao.zh...@gmail.com<mailto:junchao.zh...@gmail.com>>, 
petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
Subject: Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve 
Performance Issue
How do you set the variable?

$ MKL_VERBOSE=1 ./ex1 -ksp_converged_reason
MKL_VERBOSE oneMKL 2024.0 Update 1 Product build 20240215 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.80GHz lp64 intel_thread
MKL_VERBOSE DDOT(10,0x22127c0,1,0x22127c0,1) 2.02ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE DSCAL(10,0x7ffc9fb4ff08,0x22127c0,1) 12.67us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE DDOT(10,0x22127c0,1,0x2212840,1) 1.52us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE DDOT(10,0x2212840,1,0x2212840,1) 167ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
[...]

On 21 Jun 2024, at 7:37 PM, Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>> wrote:

Hello all,

I set MKL_VERBOSE=1, but observed no output specific to the use of MKL. Does PETSc enable this verbose output?

Best,
Yongzhong

From: Pierre Jolivet <pie...@joliv.et<mailto:pie...@joliv.et>>
Date: Friday, June 21, 2024 at 1:36 AM
To: Junchao Zhang <junchao.zh...@gmail.com<mailto:junchao.zh...@gmail.com>>
Cc: Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>>, 
petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
Subject: Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve 
Performance Issue


On 21 Jun 2024, at 6:42 AM, Junchao Zhang 
<junchao.zh...@gmail.com<mailto:junchao.zh...@gmail.com>> wrote:

I remember there are some MKL environment variables to print the MKL routines that are called.

The environment variable is MKL_VERBOSE

Thanks,
Pierre

Maybe we can try it to see which MKL routines are really used, and then we can understand why some PETSc functions did not speed up.

--Junchao Zhang


On Thu, Jun 20, 2024 at 10:39 PM Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>> wrote:

Hi Barry, sorry about my last results. I didn’t fully understand the stage profiling and logging in PETSc; now I only record the KSPSolve() stage of my program. A sample of the code is as follows:

                // Static variable to keep track of the stage counter
                static int stageCounter = 1;

                // Generate a unique stage name, e.g. "Stage 1 of Code" (needs <sstream> and <string>)
                std::ostringstream oss;
                oss << "Stage " << stageCounter << " of Code";
                std::string stageName = oss.str();

                // Register the stage and push it so that only the KSPSolve() below is logged under it
                PetscLogStage stagenum;

                PetscLogStageRegister(stageName.c_str(), &stagenum);
                PetscLogStagePush(stagenum);

                KSPSolve(*ksp_ptr, b, x);

                PetscLogStagePop();
                stageCounter++;

I have attached my new logging results; there is one main stage and four other stages, each corresponding to a KSPSolve() call.

To provide some additional background, if you recall, I have been trying to get an efficient iterative solution using multithreading. I found that by compiling PETSc with the Intel MKL library instead of OpenBLAS I am able to perform the sparse matrix-vector multiplication faster; I am using MATSEQAIJMKL. This makes the shell matrix-vector product in each iteration scale well with the number of threads. However, I found that the total GMRES solve time (~KSPSolve() time) does not scale well with the number of threads.

From the logging results I learned that when performing KSPSolve(), there are some CPU overheads in PCApply() and KSPGMRESOrthog(). I ran my program with different numbers of threads and plotted the time spent in PCApply() and KSPGMRESOrthog() against the number of threads. These two operations are not scaling with the threads at all! My results are attached as a PDF to give you a clear view.

My question is:

From my understanding, MatSolve() is involved in PCApply(), and KSPGMRESOrthog() involves many vector operations, so why don't these two parts scale well with the number of threads when the Intel MKL library is linked?
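
For what it is worth, here is the kind of check I can add right before the solve to confirm how many threads MKL and OpenMP believe they may use (a sketch of a diagnostic, not something from PETSc itself; I am assuming mkl_get_max_threads() and omp_get_max_threads() are available since PETSc is linked against threaded MKL):

                // Diagnostic sketch: print the thread counts reported by MKL and OpenMP
                // before the solve, to rule out an accidentally single-threaded BLAS.
                // Requires <mkl.h> (or <mkl_service.h>) and <omp.h>.
                PetscPrintf(PETSC_COMM_SELF, "MKL max threads: %d, OMP max threads: %d\n",
                            mkl_get_max_threads(), omp_get_max_threads());
                KSPSolve(*ksp_ptr, b, x);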

Thank you,
Yongzhong

From: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>
Date: Friday, June 14, 2024 at 11:36 AM
To: Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>>
Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>, 
petsc-ma...@mcs.anl.gov<mailto:petsc-ma...@mcs.anl.gov> 
<petsc-ma...@mcs.anl.gov<mailto:petsc-ma...@mcs.anl.gov>>, Piero Triverio 
<piero.trive...@utoronto.ca<mailto:piero.trive...@utoronto.ca>>
Subject: Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance 
Issue

   I am a bit confused. Without the initial guess computation, there are still 
a bunch of events I don't understand

MatTranspose          79 1.0 4.0598e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMatMultSym        110 1.0 1.7419e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatMatMultNum         90 1.0 1.2640e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatMatMatMultSym      20 1.0 1.3049e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatRARtSym            25 1.0 1.2492e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatMatTrnMultSym      25 1.0 8.8265e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMatTrnMultNum      25 1.0 2.4820e+02 1.0 6.83e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   275
MatTrnMatMultSym      10 1.0 7.2984e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatTrnMatMultNum      10 1.0 9.3128e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

In addition, there are many more VecMAXPY than VecMDot calls (in GMRES they are each done the same number of times)

VecMDot             5588 1.0 1.7183e+03 1.0 2.06e+13 1.0 0.0e+00 0.0e+00 0.0e+00  8 10  0  0  0   8 10  0  0  0 12016
VecMAXPY           22412 1.0 8.4898e+03 1.0 4.17e+13 1.0 0.0e+00 0.0e+00 0.0e+00 39 20  0  0  0  39 20  0  0  0  4913

Finally there are a huge number of

MatMultAdd        258048 1.0 1.4178e+03 1.0 6.10e+13 1.0 0.0e+00 0.0e+00 0.0e+00  7 29  0  0  0   7 29  0  0  0 43025

Are you making calls to all these routines? Are you doing this inside your 
MatMult() or before you call KSPSolve?

The reason I wanted you to make a simpler run without the initial guess code is that your events are far more complicated than would be produced by GMRES alone, so it is not possible to understand the behavior you are seeing without fully understanding all of the events happening in the code.

  Barry


On Jun 14, 2024, at 1:19 AM, Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>> wrote:

Thanks, I have attached the results without using any KSPGuess. At low frequency, the iteration counts are quite close to those with KSPGuess, specifically

  KSPGuess Object: 1 MPI process
    type: fischer
    Model 1, size 200

However, I found that at higher frequency the number of iteration steps is significantly higher than with KSPGuess; I have attached both results for your reference.
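
For completeness, this is roughly how the guess is enabled in my code (a sketch only; ksp stands in for my solver object, and the function and type names are the ones I believe correspond to the KSPGuess view above, please correct me if they are not the right calls). In the "no KSPGuess" runs these lines are simply removed:

                // Sketch: enable the Fischer initial-guess model (Model 1, size 200,
                // matching the KSPGuess object shown above)
                KSPGuess guess;
                KSPGetGuess(ksp, &guess);
                KSPGuessSetType(guess, KSPGUESSFISCHER);
                KSPGuessFischerSetModel(guess, 1, 200);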

Moreover, could I ask why the run without the KSPGuess options can be used as a baseline comparison? What are we comparing here? How does it relate to the performance issue/bottleneck I found, namely that “the time taken by KSPSolve is almost two times greater than the CPU time for the matrix-vector product multiplied by the number of iterations”?

Thank you!
Yongzhong

From: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>
Date: Thursday, June 13, 2024 at 2:14 PM
To: Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>>
Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>, 
petsc-ma...@mcs.anl.gov<mailto:petsc-ma...@mcs.anl.gov> 
<petsc-ma...@mcs.anl.gov<mailto:petsc-ma...@mcs.anl.gov>>, Piero Triverio 
<piero.trive...@utoronto.ca<mailto:piero.trive...@utoronto.ca>>
Subject: Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance 
Issue

  Can you please run the same thing without the  KSPGuess option(s) for a 
baseline comparison?

   Thanks

   Barry

On Jun 13, 2024, at 1:27 PM, Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>> wrote:

Hi Matt,

I have rerun the program with the options you provided. The system output during the KSP solve and the final PETSc log output are stored in the attached .txt file for your reference.

Thanks!
Yongzhong

From: Matthew Knepley <knep...@gmail.com<mailto:knep...@gmail.com>>
Date: Wednesday, June 12, 2024 at 6:46 PM
To: Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>>
Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>, 
petsc-ma...@mcs.anl.gov<mailto:petsc-ma...@mcs.anl.gov> 
<petsc-ma...@mcs.anl.gov<mailto:petsc-ma...@mcs.anl.gov>>, Piero Triverio 
<piero.trive...@utoronto.ca<mailto:piero.trive...@utoronto.ca>>
Subject: Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance 
Issue
On Wed, Jun 12, 2024 at 6:36 PM Yongzhong Li 
<yongzhong...@mail.utoronto.ca<mailto:yongzhong...@mail.utoronto.ca>> wrote:
Dear PETSc’s developers,
I hope this email finds you well.
I am currently working on a project using PETSc and have encountered a performance issue with the KSPSolve function. Specifically, I have noticed that the time taken by KSPSolve is almost two times greater than the CPU time for the matrix-vector product multiplied by the number of iteration steps. I use C++ chrono to record the CPU time.
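
Concretely, the timing around the solve looks roughly like this (a sketch of my instrumentation; ksp, b, and x stand in for my actual solver and vectors, and the per-iteration matrix-vector time is measured the same way inside my shell MatMult routine):

                // Sketch: wall-clock timing of KSPSolve with std::chrono (requires <chrono>)
                auto t0 = std::chrono::steady_clock::now();
                KSPSolve(ksp, b, x);
                auto t1 = std::chrono::steady_clock::now();
                double solveTime = std::chrono::duration<double>(t1 - t0).count();
                // Observed: solveTime is roughly 2x (time per matrix-vector product) x (number of iterations)
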
For context, I am using a shell system matrix A. Despite my efforts to parallelize the matrix-vector product (Ax), the overall solve time remains higher than the per-iteration matrix-vector product time would indicate when multiple threads are used. Here are a few details of my setup:

  *   Matrix Type: Shell system matrix (a rough sketch of the setup follows below)
  *   Preconditioner: Shell PC
  *   Parallel Environment: Intel MKL as PETSc’s BLAS/LAPACK library, with multithreading enabled
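
The shell matrix and shell PC are created roughly as follows (a minimal sketch, not my full code: MyMatMult, MyPCApply, ctx, ksp, nRows, and nCols are placeholders for my actual routines and sizes):

                // Sketch of the shell matrix setup
                Mat A;
                MatCreateShell(PETSC_COMM_SELF, nRows, nCols, nRows, nCols, (void *)&ctx, &A);
                MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MyMatMult);  // y = A*x, multithreaded internally

                // Sketch of the shell preconditioner setup
                PC pc;
                KSPGetPC(ksp, &pc);
                PCSetType(pc, PCSHELL);
                PCShellSetApply(pc, MyPCApply);
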
I have considered several potential reasons, such as preconditioner setup, 
additional solver operations, and the inherent overhead of using a shell system 
matrix. However, since KSPSolve is a high-level API, I have been unable to 
pinpoint the exact cause of the increased solve time.
Have you observed the same issue? Could you please provide some experience on 
how to diagnose and address this performance discrepancy? Any insights or 
recommendations you could offer would be greatly appreciated.

For any performance question like this, we need to see the output of your code 
run with

  -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view

  Thanks,

     Matt

Thank you for your time and assistance.
Best regards,
Yongzhong
-----------------------------------------------------------
Yongzhong Li
PhD student | Electromagnetics Group
Department of Electrical & Computer Engineering
University of Toronto
http://www.modelics.org



--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
<ksp_petsc_log.txt>

<ksp_petsc_log.txt><ksp_petsc_log_noguess.txt>
