I think the slowdown in the second test (interpolate 0) is coming from the fact that Plex has to compute the dual graph on the fly; see https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plexpartition.c#L469.
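For concreteness, here is a minimal sketch (not code from the thread; it assumes the 2020-era signatures of DMPlexCreateFromFile and DMPlexDistribute, and the helper name is made up) of the two code paths being timed below. With interpolate = PETSC_FALSE the mesh still contains only cells and vertices when DMPlexDistribute runs, so, per the explanation above, the cell-to-cell dual graph has to be assembled at that point.

#include <petscdmplex.h>

/* Hypothetical helper mirroring the two runs timed below (PETSc ~3.13 API assumed). */
static PetscErrorCode LoadAndDistribute(MPI_Comm comm, const char *filename, PetscBool interpolate)
{
  DM             dm, dmParallel = NULL;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* interpolate == PETSC_TRUE: faces and edges are created here (the DMPlexInterp event in the logs below).
     interpolate == PETSC_FALSE: the DM keeps only cells and vertices. */
  ierr = DMPlexCreateFromFile(comm, filename, interpolate, &dm);CHKERRQ(ierr);
  /* For the non-interpolated mesh, the cell-cell dual graph needed by the
     partitioner is computed on the fly inside this call (plexpartition.c above). */
  ierr = DMPlexDistribute(dm, 0, NULL, &dmParallel);CHKERRQ(ierr);
  if (dmParallel) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmParallel;}
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}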
Are you having a performance regression with DMPLEX in this second case, or is it just the first time you noticed it?

On Tue, Apr 28, 2020 at 1:57 PM Matthew Knepley <knep...@gmail.com> wrote:

> On Tue, Apr 28, 2020 at 5:19 AM Pierre Jolivet <pierre.joli...@enseeiht.fr> wrote:
>
>> On 14 Apr 2020, at 2:36 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>
>> On Tue, Apr 14, 2020 at 6:36 AM Pierre Jolivet <pierre.joli...@enseeiht.fr> wrote:
>>
>>> Hello,
>>> I’d like to call DMPlexInterpolate after DMPlexDistribute and not the other way around, for performance reasons (please stop me here if the two are equivalent).
>>> When there is no overlap in DMPlexDistribute, it goes through fine.
>>> If there is overlap, I run into an error.
>>> Is this the expected behavior?
>>
>> Sorry for taking so long to get back to this.
>> I thought I had everything set up, but when looking at the performance, I’m quite surprised.
>> I went back to src/dm/impls/plex/tutorials/ex2.c and just added the DMPlexDistribute step. I guess this is equivalent to your steps #1 and #2.
>>
>> diff --git a/src/dm/impls/plex/tutorials/ex2.c b/src/dm/impls/plex/tutorials/ex2.c
>> index a069d922b2..5e5c4fb584 100644
>> --- a/src/dm/impls/plex/tutorials/ex2.c
>> +++ b/src/dm/impls/plex/tutorials/ex2.c
>> @@ -65,2 +65,5 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>>    ierr = DMPlexCreateFromFile(comm, user->filename, user->interpolate, dm);CHKERRQ(ierr);
>> +  DM dmParallel;
>> +  ierr = DMPlexDistribute(*dm, 0, NULL, &dmParallel);CHKERRQ(ierr);
>> +  ierr = DMDestroy(&dmParallel);CHKERRQ(ierr);
>>  }
>>
>> With a cube of 741,000 nodes and 4,574,068 elements:
>> $ cat untitled.geo && ~/gmsh-4.5.4-Linux64/bin/gmsh untitled.geo -bin -3
>> //+
>> SetFactory("OpenCASCADE");
>> Box(1) = {-0.5, -0.5, 0, 1, 1, 1};
>> Characteristic Length {:} = 0.01;
>>
>> I get the following timings (optimized build, scalar-type=real, 64-bit-indices=false, SKL cluster).
>> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate true
>> DMPlexInterp       1 1.0 1.0444e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 73  0  0  0  5  73  0  0  0  6     0
>> DMPlexDistribute   1 1.0 2.8793e+01 1.0 0.00e+00 0.0 3.6e+03 6.2e+05 3.3e+01 21  0100100 60  21  0100100 69     0
>> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate false
>> DMPlexDistribute   1 1.0 7.0265e+02 1.0 0.00e+00 0.0 3.1e+03 2.0e+05 2.6e+01 99  0100100 58  99  0100100 68     0
>>
>> Do you think I messed up something else?
>> Our legacy implementation, on the same mesh, takes around 20 seconds (accounting for the ParMETIS call) to partition and distribute the mesh and to generate the underlying structures for our FEM kernels (kd-tree, distributed numbering, and so on).
>> Of course, DMPlex is much more versatile and generic, so I’d understand if it were a little slower, but right now it is quite a lot slower.
>> Are there any options I could try playing with to speed things up?
>
> Yes, something is messed up. Vaclav's latest paper has almost the exact setup for the cube benchmark (https://arxiv.org/pdf/2004.08729.pdf), and you can see that the interpolation time is an order of magnitude smaller for 128 procs and also scales linearly. What you show above in ex2 looks serial, in that the mesh is loaded on 1 proc and then interpolation is also done on 1 proc, which corresponds to the time he shows as "Serial Startup".
> I cannot explain the poor performance of your second run; however, the interpolation time in the first can be greatly reduced by interpolating in parallel. Thus, the right answer is to load in parallel, interpolate, and redistribute. That is what we do in that paper.
>
> Short answer: if you want scalable load and interpolation, I think you should be using the parallel stuff that Vaclav just put in. I think that means first converting your mesh to the HDF5 format using CreateFromFile and then DMView to HDF5. Then you can load it in parallel. Vaclav, has everything been pushed for this?
>
>   Thanks,
>
>     Matt
>
>> Thanks in advance for your help,
>> Pierre
>>
>> Yes. What we want you to do is:
>>
>> 1) Load/generate mesh
>>
>> 2) Distribute (this can be done at the same time as load with parallel load)
>>
>> 3) Interpolate (this is also an option from parallel load)
>>
>> 4) If necessary, redistribute for load balance
>>
>> 5) Construct overlap
>>
>> When you pass '1' below to DMPlexDistribute(), it distributes as normal and then calls
>>
>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexDistributeOverlap.html
>>
>> at the end. So you just postpone calling that until you have interpolated.
>>
>>   Thanks,
>>
>>     Matt
>>
>>> Here is a MWE.
>>> $ patch -p1 < patch.txt
>>> $ cd src/dm/impls/plex/tests/
>>> $ make ex18
>>> $ mpirun -n 2 ./ex18 -distribute -interpolate after_distribute
>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> [0]PETSC ERROR: Petsc has generated inconsistent data
>>> [0]PETSC ERROR: Point SF contains 1 which is a cell
>>>
>>> Thanks,
>>> Pierre
>>>
>>> diff --git a/src/dm/impls/plex/tests/ex18.c b/src/dm/impls/plex/tests/ex18.c
>>> index 07421b3522..dd62be58e5 100644
>>> --- a/src/dm/impls/plex/tests/ex18.c
>>> +++ b/src/dm/impls/plex/tests/ex18.c
>>> @@ -806 +806 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>>> -  ierr = DMPlexDistribute(*dm, 0, NULL, &pdm);CHKERRQ(ierr);
>>> +  ierr = DMPlexDistribute(*dm, 1, NULL, &pdm);CHKERRQ(ierr);
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/

--
Stefano
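To make the suggested ordering concrete, here is a minimal sketch (not code from the thread; 2020-era PETSc signatures are assumed and the helper/variable names are placeholders) of steps 1)-5) above, with the overlap postponed until after interpolation: DMPlexDistribute with overlap 0, then DMPlexInterpolate, then DMPlexDistributeOverlap.

#include <petscdmplex.h>

/* Sketch of: load -> distribute (no overlap) -> interpolate in parallel -> add overlap. */
static PetscErrorCode LoadDistributeInterpolateOverlap(MPI_Comm comm, const char *filename, DM *dmOut)
{
  DM             dm, dmDist = NULL, dmInterp = NULL, dmOverlap = NULL;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* 1) Load the mesh without interpolation (cells and vertices only). */
  ierr = DMPlexCreateFromFile(comm, filename, PETSC_FALSE, &dm);CHKERRQ(ierr);
  /* 2) Distribute with overlap = 0; the overlap is added later, after interpolation. */
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;}
  /* 3) Interpolate the already-distributed mesh, so each rank only builds
        faces/edges for its local piece. */
  ierr = DMPlexInterpolate(dm, &dmInterp);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  dm   = dmInterp;
  /* 4) If necessary, redistribute here for load balance (another DMPlexDistribute). */
  /* 5) Construct the overlap explicitly, instead of passing overlap = 1 to
        DMPlexDistribute on the non-interpolated mesh (the ex18 MWE failure above). */
  ierr = DMPlexDistributeOverlap(dm, 1, NULL, &dmOverlap);CHKERRQ(ierr);
  if (dmOverlap) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmOverlap;}
  *dmOut = dm;
  PetscFunctionReturn(0);
}

Passing 0 to DMPlexDistribute and calling DMPlexDistributeOverlap separately is exactly the postponement described above: the overlap is built on the parallel, already-interpolated mesh, which is what the overlap = 1 change in the ex18 MWE was tripping over.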