On Sat, Jun 22, 2024 at 5:03 PM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
> MKL_VERBOSE=1 ./ex1
>
> matrix nonzeros = 100, allocated nonzeros = 100
> MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Vector Neural Network Instructions enabled processors, Lnx 2.50GHz lp64 gnu_thread
> MKL_VERBOSE ZGEMV(N,10,10,0x7ffd9d7078f0,0x187eb20,10,0x187f7c0,1,0x7ffd9d707900,0x187ff70,1) 167.34ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x7ffd9d7078c0,-1,0) 77.19ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x1894490,10,0) 83.97ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZSYTRS(L,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 44.94ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 20.72us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZSYTRS(L,10,2,0x1894b50,10,0x1893df0,0x187d2a0,10,0) 4.22us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGEMM(N,N,10,2,10,0x7ffd9d707790,0x187eb20,10,0x187d2a0,10,0x7ffd9d7077a0,0x1896a70,10) 1.41ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZAXPY(20,0x7ffd9d7078a0,0x1896a70,1,0x187b650,1) 381ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x7ffd9d707840,-1,0) 742ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x18951a0,10,0) 4.20us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZSYTRS(L,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 2.94us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 292ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGEMV(N,10,10,0x7ffd9d7078f0,0x187eb20,10,0x187f7c0,1,0x7ffd9d707900,0x187ff70,1) 1.17us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGETRF(10,10,0x1894b50,10,0x1893df0,0) 202.48ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGETRS(N,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 20.78ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 954ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGETRS(N,10,2,0x1894b50,10,0x1893df0,0x187d2a0,10,0) 30.74ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGEMM(N,N,10,2,10,0x7ffd9d707790,0x187eb20,10,0x187d2a0,10,0x7ffd9d7077a0,0x18969c0,10) 3.95us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZAXPY(20,0x7ffd9d7078a0,0x18969c0,1,0x187b650,1) 995ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGETRF(10,10,0x1894b50,10,0x1893df0,0) 4.09us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGETRS(N,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 3.92us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 274ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGEMV(N,15,10,0x7ffd9d7078f0,0x187ec70,15,0x187fc30,1,0x7ffd9d707900,0x1880400,1) 1.59us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x1894550,0x7ffd9d707900,-1,0) 47.07us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x1894550,0x1895cb0,10,0) 26.62us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x1894550,0x1895b00,15,0x7ffd9d7078b0,-1,0) 35.32us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x1894550,0x1895b00,15,0x1895cb0,10,0) 42.33ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZTRTRS(U,N,N,10,1,0x1894b40,15,0x1895b00,15,0) 16.11us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187fc30,1,0x1880c70,1) 395ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGEMM(N,N,15,2,10,0x7ffd9d707790,0x187ec70,15,0x187d310,10,0x7ffd9d7077a0,0x187b5b0,15) 3.22us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZUNMQR(L,C,15,2,10,0x1894b40,15,0x1894550,0x1897760,15,0x7ffd9d7078c0,-1,0) 730ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZUNMQR(L,C,15,2,10,0x1894b40,15,0x1894550,0x1897760,15,0x1895cb0,10,0) 4.42us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZTRTRS(U,N,N,10,2,0x1894b40,15,0x1897760,15,0) 5.96us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZAXPY(20,0x7ffd9d7078a0,0x187d310,1,0x1897610,1) 222ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x18954b0,0x7ffd9d707820,-1,0) 685ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x18954b0,0x1895d60,10,0) 6.11us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x18954b0,0x1895bb0,15,0x7ffd9d7078b0,-1,0) 390ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x18954b0,0x1895bb0,15,0x1895d60,10,0) 3.09us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZTRTRS(U,N,N,10,1,0x1894b40,15,0x1895bb0,15,0) 1.05us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187fc30,1,0x1880c70,1) 257ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>
> Yes, for the PETSc example there are MKL outputs, but not for my own program. All I did was change the matrix type from MATAIJ to MATAIJMKL to get optimized SpMV performance from MKL. Should I expect to see any MKL outputs in this case?

Are you sure that the type changed? You can MatView() the matrix with format ascii_info to see.

  Thanks,

     Matt
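If it helps, a minimal sketch of that check, assuming A is the already-assembled matrix (these are standard PETSc calls; where to place them in your program is up to you):

    MatType type;
    MatGetType(A, &type);   /* runtime type string, e.g. seqaijmkl if the conversion took effect */
    PetscPrintf(PETSC_COMM_WORLD, "matrix type: %s\n", type);
    PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_ASCII_INFO);
    MatView(A, PETSC_VIEWER_STDOUT_WORLD);  /* ascii_info summary: type, sizes, nonzeros */
    PetscViewerPopFormat(PETSC_VIEWER_STDOUT_WORLD);

The same summary is available without code changes via the command-line option -mat_view ::ascii_info.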
> Thanks,
> Yongzhong
>
> From: Junchao Zhang <junchao.zh...@gmail.com>
> Date: Saturday, June 22, 2024 at 9:40 AM
> To: Yongzhong Li <yongzhong...@mail.utoronto.ca>
> Cc: Pierre Jolivet <pie...@joliv.et>, petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
> Subject: Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>
> No, you don't. It is strange. Perhaps you can run a PETSc example first and see if MKL is really used:
>
> $ cd src/mat/tests
> $ make ex1
> $ MKL_VERBOSE=1 ./ex1
>
> --Junchao Zhang
>
> On Fri, Jun 21, 2024 at 4:03 PM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>
> I am using
>
> export MKL_VERBOSE=1
> ./xx
>
> in the bash file. Do I have to use -ksp_converged_reason?
>
> Thanks,
> Yongzhong
> From: Pierre Jolivet <pie...@joliv.et>
> Date: Friday, June 21, 2024 at 1:47 PM
> To: Yongzhong Li <yongzhong...@mail.utoronto.ca>
> Cc: Junchao Zhang <junchao.zh...@gmail.com>, petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
> Subject: Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>
> How do you set the variable?
>
> $ MKL_VERBOSE=1 ./ex1 -ksp_converged_reason
> MKL_VERBOSE oneMKL 2024.0 Update 1 Product build 20240215 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.80GHz lp64 intel_thread
> MKL_VERBOSE DDOT(10,0x22127c0,1,0x22127c0,1) 2.02ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE DSCAL(10,0x7ffc9fb4ff08,0x22127c0,1) 12.67us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE DDOT(10,0x22127c0,1,0x2212840,1) 1.52us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> MKL_VERBOSE DDOT(10,0x2212840,1,0x2212840,1) 167ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
> [...]
>
> On 21 Jun 2024, at 7:37 PM, Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>
> Hello all,
>
> I set MKL_VERBOSE = 1, but observed no print output specific to the use of MKL. Does PETSc enable this verbose output?
>
> Best,
> Yongzhong
>
> From: Pierre Jolivet <pie...@joliv.et>
> Date: Friday, June 21, 2024 at 1:36 AM
> To: Junchao Zhang <junchao.zh...@gmail.com>
> Cc: Yongzhong Li <yongzhong...@mail.utoronto.ca>, petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
> Subject: Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>
> On 21 Jun 2024, at 6:42 AM, Junchao Zhang <junchao.zh...@gmail.com> wrote:
>
> I remember there are some MKL env vars to print the MKL routines called.
>
> The environment variable is MKL_VERBOSE.
>
> Thanks,
> Pierre
>
> Maybe we can try it to see what MKL routines are really used, and then we can understand why some PETSc functions did not speed up.
>
> --Junchao Zhang
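If shell environment propagation is in doubt, MKL's verbose mode can also be turned on from inside the program. A minimal sketch, assuming the MKL headers are on the include path (mkl_verbose() is MKL's service routine; check your MKL version's documentation for its availability):

    #include <mkl.h>

    /* Equivalent to running with MKL_VERBOSE=1; returns the previous mode.
       Call it before the first MKL routine executes. */
    mkl_verbose(1);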
> On Thu, Jun 20, 2024 at 10:39 PM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>
> Hi Barry, sorry for my last results. I didn't fully understand the stage profiling and logging in PETSc; now I only record the KSPSolve() stage of my program. Sample code follows:
>
> // Static variable to keep track of the stage counter
> static int stageCounter = 1;
>
> // Generate a unique stage name
> std::ostringstream oss;
> oss << "Stage " << stageCounter << " of Code";
> std::string stageName = oss.str();
>
> // Register the stage and push it so the following solve is logged separately
> PetscLogStage stagenum;
> PetscLogStageRegister(stageName.c_str(), &stagenum);
> PetscLogStagePush(stagenum);
>
> KSPSolve(*ksp_ptr, b, x);
>
> PetscLogStagePop();
> stageCounter++;
>
> I have attached my new logging results; there is one main stage and four other stages, each of which is a KSPSolve() call.
>
> To provide some additional background, if you recall, I have been trying to get an efficient iterative solution using multithreading. I found that by compiling PETSc with the Intel MKL library instead of OpenBLAS, I can perform sparse matrix-vector multiplication faster; I am using MATSEQAIJMKL. This makes the shell matrix-vector product in each iteration scale well with the number of threads. However, I found that the total GMRES solve time (~KSPSolve() time) does not scale well with the number of threads.
>
> From the logging results I learned that when performing KSPSolve(), there are CPU overheads in PCApply() and KSPGMRESOrthog(). I ran my program with different numbers of threads and plotted the time spent in PCApply() and KSPGMRESOrthog() against the thread count. These two operations are not scaling with the threads at all! My results are attached as a PDF to give you a clear view.
>
> My question is: from my understanding, MatSolve() is involved in PCApply(), and KSPGMRESOrthog() consists largely of vector operations, so why do these two parts not scale with the number of threads when the Intel MKL library is linked?
>
> Thank you,
> Yongzhong
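For a scaling study like this, one quick sanity check is to confirm how many threads PETSc's BLAS/LAPACK (MKL here) is actually given at run time. A sketch using PetscBLASGetNumThreads()/PetscBLASSetNumThreads(), which exist in recent PETSc releases (treat their availability as an assumption for older versions):

    PetscInt nthreads;
    PetscBLASGetNumThreads(&nthreads);   /* threads handed to MKL/BLAS */
    PetscPrintf(PETSC_COMM_SELF, "BLAS/LAPACK threads: %d\n", (int)nthreads);
    /* PetscBLASSetNumThreads(8); */     /* or pin the count explicitly for a run */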
> From: Barry Smith <bsm...@petsc.dev>
> Date: Friday, June 14, 2024 at 11:36 AM
> To: Yongzhong Li <yongzhong...@mail.utoronto.ca>
> Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>, petsc-ma...@mcs.anl.gov <petsc-ma...@mcs.anl.gov>, Piero Triverio <piero.trive...@utoronto.ca>
> Subject: Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>
> I am a bit confused. Without the initial guess computation, there are still a bunch of events I don't understand:
>
> MatTranspose          79 1.0 4.0598e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatMatMultSym        110 1.0 1.7419e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatMatMultNum         90 1.0 1.2640e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatMatMatMultSym      20 1.0 1.3049e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatRARtSym            25 1.0 1.2492e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatMatTrnMultSym      25 1.0 8.8265e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatMatTrnMultNum      25 1.0 2.4820e+02 1.0 6.83e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   275
> MatTrnMatMultSym      10 1.0 7.2984e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatTrnMatMultNum      10 1.0 9.3128e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> In addition, there are many more VecMAXPY than VecMDot (in GMRES they are each done the same number of times):
>
> VecMDot             5588 1.0 1.7183e+03 1.0 2.06e+13 1.0 0.0e+00 0.0e+00 0.0e+00  8 10  0  0  0   8 10  0  0  0 12016
> VecMAXPY           22412 1.0 8.4898e+03 1.0 4.17e+13 1.0 0.0e+00 0.0e+00 0.0e+00 39 20  0  0  0  39 20  0  0  0  4913
>
> Finally, there are a huge number of
>
> MatMultAdd        258048 1.0 1.4178e+03 1.0 6.10e+13 1.0 0.0e+00 0.0e+00 0.0e+00  7 29  0  0  0   7 29  0  0  0 43025
>
> Are you making calls to all these routines? Are you doing this inside your MatMult() or before you call KSPSolve?
>
> The reason I wanted you to make a simpler run without the initial guess code is that your events are far more complicated than would be produced by GMRES alone, so it is not possible to understand the behavior you are seeing without fully understanding all the events happening in the code.
>
> Barry
>
> On Jun 14, 2024, at 1:19 AM, Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>
> Thanks, I have attached the results without using any KSPGuess. At low frequency, the iteration counts are quite close to those with KSPGuess, specifically
>
> KSPGuess Object: 1 MPI process
>   type: fischer
>   Model 1, size 200
>
> However, at higher frequency the number of iteration steps is significantly higher than with KSPGuess; I have attached both results for your reference.
>
> Moreover, could I ask why the run without the KSPGuess options can be used for a baseline comparison? What are we comparing here? How does it relate to the performance issue/bottleneck I found ("I have noticed that the time taken by KSPSolve is almost two times greater than the CPU time for the matrix-vector product multiplied by the number of iterations")?
>
> Thank you!
> Yongzhong
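For reference, a Fischer guess like the one in that output can be configured roughly as follows. This is a sketch only: ksp stands for the user's solver, model 1 and size 200 are read off the log above, and the exact option spellings should be checked against your PETSc version's -help output:

    KSPGuess guess;
    KSPGetGuess(ksp, &guess);                 /* the guess object attached to the KSP */
    KSPGuessSetType(guess, KSPGUESSFISCHER);  /* Fischer initial-guess strategy */
    KSPGuessFischerSetModel(guess, 1, 200);   /* model 1, subspace size 200 */
    /* command-line equivalent: -ksp_guess_type fischer -ksp_guess_fischer_model 1,200 */

Leaving all of this out gives the zero-initial-guess baseline requested below.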
> From: Barry Smith <bsm...@petsc.dev>
> Date: Thursday, June 13, 2024 at 2:14 PM
> To: Yongzhong Li <yongzhong...@mail.utoronto.ca>
> Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>, petsc-ma...@mcs.anl.gov <petsc-ma...@mcs.anl.gov>, Piero Triverio <piero.trive...@utoronto.ca>
> Subject: Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>
> Can you please run the same thing without the KSPGuess option(s) for a baseline comparison?
>
> Thanks
>
> Barry
>
> On Jun 13, 2024, at 1:27 PM, Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>
> Hi Matt,
>
> I have rerun the program with the options you provided. The system output when performing the KSP solve and the final PETSc log output are stored in the attached .txt file for your reference.
>
> Thanks!
> Yongzhong
>
> From: Matthew Knepley <knep...@gmail.com>
> Date: Wednesday, June 12, 2024 at 6:46 PM
> To: Yongzhong Li <yongzhong...@mail.utoronto.ca>
> Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>, petsc-ma...@mcs.anl.gov <petsc-ma...@mcs.anl.gov>, Piero Triverio <piero.trive...@utoronto.ca>
> Subject: Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>
> On Wed, Jun 12, 2024 at 6:36 PM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>
> Dear PETSc developers,
>
> I hope this email finds you well.
>
> I am currently working on a project using PETSc and have encountered a performance issue with the KSPSolve function. Specifically, I have noticed that the time taken by KSPSolve is almost two times greater than the CPU time for the matrix-vector product multiplied by the number of iteration steps. I use C++ chrono to record CPU time.
>
> For context, I am using a shell system matrix A. Despite my efforts to parallelize the matrix-vector product (Ax), the overall solve time remains higher than the per-iteration matrix-vector product indicates when multiple threads are used. Here are a few details of my setup:
>
> - Matrix type: shell system matrix
> - Preconditioner: shell PC
> - Parallel environment: Intel MKL as PETSc's BLAS/LAPACK library, multithreading enabled
>
> I have considered several potential causes, such as preconditioner setup, additional solver operations, and the inherent overhead of using a shell system matrix. However, since KSPSolve is a high-level API, I have been unable to pinpoint the exact cause of the increased solve time.
>
> Have you observed the same issue? Could you please share some experience on how to diagnose and address this performance discrepancy? Any insights or recommendations you could offer would be greatly appreciated.
>
> For any performance question like this, we need to see the output of your code run with
>
> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view
>
> Thanks,
>
> Matt
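For readers following the thread, the shell setup described in that message looks roughly like the sketch below; MyMatMult, MyPCApply, ctx, and the sizes are illustrative placeholders, not the original code:

    /* User callbacks implementing the operator action and the preconditioner. */
    extern PetscErrorCode MyMatMult(Mat A, Vec x, Vec y);  /* computes y = A*x */
    extern PetscErrorCode MyPCApply(PC pc, Vec r, Vec z);  /* applies the shell PC */

    Mat A;
    KSP ksp;
    PC  pc;
    MatCreateShell(PETSC_COMM_WORLD, n, n, PETSC_DETERMINE, PETSC_DETERMINE, ctx, &A);
    MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MyMatMult);
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCSHELL);
    PCShellSetApply(pc, MyPCApply);
    KSPSetFromOptions(ksp);  /* picks up -ksp_view, -log_view, etc. */
    KSPSolve(ksp, b, x);

With -log_view, the time spent in these callbacks is attributed to the MatMult and PCApply events, which is what makes the per-event comparisons earlier in the thread possible.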
> Thank you for your time and assistance.
>
> Best regards,
> Yongzhong
>
> -----------------------------------------------------------
> Yongzhong Li
> PhD student | Electromagnetics Group
> Department of Electrical & Computer Engineering
> University of Toronto
> http://www.modelics.org
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
> <ksp_petsc_log.txt>
>
> <ksp_petsc_log.txt><ksp_petsc_log_noguess.txt>

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/