Hi, that's really good news for us, thanks! I will plot the memory scaling again using these new options and let you know, hopefully next week.
Before that, I just need to clarify the situation. Throughout our discussions, we mentioned a number of options concerning scalability:

-matptap_via scalable
-inner_diag_matmatmult_via scalable
-inner_offdiag_matmatmult_via scalable
-mat_freeintermediatedatastructures
-matptap_via allatonce
-matptap_via allatonce_merged

Which of them are compatible? Should I use all of them at the same time? Is there redundancy?

Thanks,

Myriam

On 04/25/19 at 21:47, Zhang, Hong wrote:
> Myriam:
> Checking MatPtAP() in petsc-3.6.4, I realized that it uses a different
> algorithm than petsc-3.10 and later versions. petsc-3.6 uses an outer
> product for C = P^T * A * P, while petsc-3.10 uses a local transpose of
> P. petsc-3.10 accelerates data access, but doubles the memory of P.
>
> Fande added two new implementations of MatPtAP() to petsc-master which
> use much smaller and scalable amounts of memory, at a slightly higher
> computing time (still faster than hypre). You may use these new
> implementations if you are concerned about memory scalability. The
> options for these new implementations are:
> -matptap_via allatonce
> -matptap_via allatonce_merged
>
> Hong
>
> On Mon, Apr 15, 2019 at 12:10 PM hzh...@mcs.anl.gov wrote:
>
> Myriam:
> Thank you very much for providing these results!
> I have put effort into accelerating the execution time and avoiding the
> use of global sizes in PtAP, for which the algorithm of transposing
> P_local and P_other likely doubles the memory usage. I'll try to
> investigate why it becomes unscalable.
> Hong
>
> Hi,
>
> you'll find the new scaling attached (green line). I used
> version 3.11 and the four scalability options:
> -matptap_via scalable
> -inner_diag_matmatmult_via scalable
> -inner_offdiag_matmatmult_via scalable
> -mat_freeintermediatedatastructures
>
> The scaling is much better! The code even uses less memory for
> the smallest cases.
> There is still an increase for the larger one.
>
> With regard to the time scaling, I used KSPView and LogView on the two
> previous scalings (blue and yellow lines) but not on the last one
> (green line), so we can't really compare them, am I right? However, we
> can see that the new time scaling looks quite good. It increases
> slightly, from ~8s to ~27s.
>
> Unfortunately, the computations are expensive, so I would like to avoid
> re-running them if possible. How relevant would a proper time scaling
> be for you?
>
> Myriam
>
> On 04/12/19 at 18:18, Zhang, Hong wrote:
>> Myriam:
>> Thanks for your effort. It will help us improve PETSc.
>> Hong
>>
>> Hi all,
>>
>> I used the wrong script, that's why it diverged... Sorry about that.
>> I tried again with the right script applied to a tiny problem (~200
>> elements). I can see a small difference in memory usage (a gain of
>> ~1 MB) when adding the -mat_freeintermediatedatastructures option. I
>> still have to run larger cases to plot the scaling. The supercomputer
>> I usually run my jobs on is really busy at the moment, so it takes a
>> while. I hope I'll send you the results on Monday.
>>
>> Thanks everyone,
>>
>> Myriam
>>
>> On 04/11/19 at 06:01, Jed Brown wrote:
>> > "Zhang, Hong" <hzh...@mcs.anl.gov> writes:
>> >
>> >> Jed:
>> >>>> Myriam,
>> >>>> Thanks for the plot. '-mat_freeintermediatedatastructures' should
>> >>>> not affect the solution. It releases almost half of the memory in
>> >>>> C=PtAP if C is not reused.
>> >>> And yet if turning it on causes divergence, that would imply a bug.
>> >>> Hong, are you able to reproduce the experiment to see the memory
>> >>> scaling?
>> >> I'd like to test her code on an ALCF machine, but my hands are full
>> >> now. I'll try it as soon as I find time, hopefully next week.
>> > I have now compiled and run her code locally.
>> >
>> > Myriam, thanks for your last mail adding the configuration and
>> > removing the MemManager.h dependency. I ran with and without
>> > -mat_freeintermediatedatastructures and don't see a difference in
>> > convergence. What commands did you run to observe that difference?
>>
>> --
>> Myriam Peyrounette
>> CNRS/IDRIS - HLST
>> --

--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
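For reference, here is how the two option sets discussed in this thread would look on the command line. This is only a sketch: `mpiexec -n 128 ./my_app` is a placeholder invocation, not something from the thread, and whether the two sets can be mixed is exactly the open question above. Note that -matptap_via is a single option, so only one of its values (scalable, allatonce, allatonce_merged) can take effect in a given run.

```shell
# Set 1 (petsc-3.10/3.11): scalable PtAP, with intermediate structures freed
mpiexec -n 128 ./my_app \
  -matptap_via scalable \
  -inner_diag_matmatmult_via scalable \
  -inner_offdiag_matmatmult_via scalable \
  -mat_freeintermediatedatastructures

# Set 2 (petsc-master): Fande's new all-at-once implementations,
# selected as alternative values of the same -matptap_via option
mpiexec -n 128 ./my_app -matptap_via allatonce
# or
mpiexec -n 128 ./my_app -matptap_via allatonce_merged
```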