So, I tried a distributed input file with our legacy implementation, and the time went down to 10 seconds. I'll try to figure out the CreateFromFile + DMView approach. Stefano, I can't say whether it's a regression or not; I'm just starting to interface DMPlex with our code to see if it's a viable alternative.
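For the record, here is the kind of one-time conversion I understand that to mean (a serial DMPlexCreateFromFile of the Gmsh file, then DMView through an HDF5 viewer so a later run can load the mesh in parallel). The filenames are placeholders and this is only a sketch, assuming PETSc is configured with HDF5:

#include <petscdmplex.h>
#include <petscviewerhdf5.h>

int main(int argc, char **argv)
{
  DM             dm;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  /* one-time serial load of the Gmsh file, no interpolation */
  ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "untitled.msh", PETSC_FALSE, &dm);CHKERRQ(ierr);
  /* write the mesh in PETSc's HDF5 format for later parallel loading */
  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "untitled.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = DMView(dm, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}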
Thanks,
Pierre

> On 28 Apr 2020, at 1:03 PM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>
> I think the slowdown in the second test (interpolate 0) comes from the fact that Plex has to compute the dual graph on the fly, see here: https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plexpartition.c#L469
>
> Are you having a performance regression with DMPLEX on this second case, or is it just the first time you noticed it?
>
> On Tue, Apr 28, 2020 at 13:57 Matthew Knepley <knep...@gmail.com> wrote:
> On Tue, Apr 28, 2020 at 5:19 AM Pierre Jolivet <pierre.joli...@enseeiht.fr> wrote:
>> On 14 Apr 2020, at 2:36 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>
>> On Tue, Apr 14, 2020 at 6:36 AM Pierre Jolivet <pierre.joli...@enseeiht.fr> wrote:
>> Hello,
>> I'd like to call DMPlexInterpolate after DMPlexDistribute, and not the other way around, for performance reasons (please stop me here if this is equivalent).
>> When there is no overlap in DMPlexDistribute, it goes through fine.
>> If there is overlap, I run into an error.
>> Is this the expected behavior?
>
> Sorry for taking so long to get back to this.
> I thought I had everything set up, but when looking at the performance, I'm quite surprised.
> I rewound and looked at src/dm/impls/plex/tutorials/ex2.c, just adding the DMPlexDistribute step. I guess this is equivalent to your steps #1 and #2.
>
> diff --git a/src/dm/impls/plex/tutorials/ex2.c b/src/dm/impls/plex/tutorials/ex2.c
> index a069d922b2..5e5c4fb584 100644
> --- a/src/dm/impls/plex/tutorials/ex2.c
> +++ b/src/dm/impls/plex/tutorials/ex2.c
> @@ -65,2 +65,5 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>    ierr = DMPlexCreateFromFile(comm, user->filename, user->interpolate, dm);CHKERRQ(ierr);
> +  DM dmParallel;
> +  ierr = DMPlexDistribute(*dm, 0, NULL, &dmParallel);CHKERRQ(ierr);
> +  ierr = DMDestroy(&dmParallel);CHKERRQ(ierr);
>  }
>
> The mesh is a cube with 741000 nodes and 4574068 elements:
> $ cat untitled.geo && ~/gmsh-4.5.4-Linux64/bin/gmsh untitled.geo -bin -3
> //+
> SetFactory("OpenCASCADE");
> Box(1) = {-0.5, -0.5, 0, 1, 1, 1};
> Characteristic Length {:} = 0.01;
>
> I get the following timings (optimized build, scalar-type=real, 64-bit-indices=false, SKL cluster):
> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate true
> DMPlexInterp        1 1.0 1.0444e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 73  0   0   0  5  73  0   0   0  6    0
> DMPlexDistribute    1 1.0 2.8793e+01 1.0 0.00e+00 0.0 3.6e+03 6.2e+05 3.3e+01 21  0 100 100 60  21  0 100 100 69    0
> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate false
> DMPlexDistribute    1 1.0 7.0265e+02 1.0 0.00e+00 0.0 3.1e+03 2.0e+05 2.6e+01 99  0 100 100 58  99  0 100 100 68    0
>
> Do you think I messed up something else?
> Our legacy implementation, on the same mesh, takes around 20 seconds (including the ParMETIS call) to partition and distribute the mesh and generate the underlying structures for our FEM kernels (kd-tree, distributed numbering, and so on).
> Of course, DMPlex is much more versatile and generic, so I'd understand if it were a little slower, but right now it is quite a lot slower.
> Are there any options I could play with to speed things up?
>
> Yes, something is messed up.
> Vaclav's latest paper has almost exactly this setup for the cube benchmark (https://arxiv.org/pdf/2004.08729.pdf), and you can see that the interpolation time is an order of magnitude smaller for 128 procs and also scales linearly. What you show above in ex2 looks serial, in that the mesh is loaded on 1 proc and then interpolation is also done on 1 proc, which corresponds to the time he shows as "Serial Startup". I cannot explain the poor performance of your second run; however, the interpolation time in the first can be greatly reduced by interpolating in parallel. Thus, the right answer is to load in parallel, interpolate, and redistribute. That is what we do in that paper.
>
> Short answer: if you want scalable load and interpolation, I think you should be using the parallel stuff that Vaclav just put in. I think that means first converting your mesh to the HDF5 format using CreateFromFile and then DMView to HDF5. Then you can load it in parallel. Vaclav, has everything been pushed for this?
>
>   Thanks,
>
>      Matt
>
> Thanks in advance for your help,
> Pierre
>
>> Yes. What we want you to do is:
>>
>> 1) Load/generate the mesh
>>
>> 2) Distribute (this can be done at the same time as the load with parallel load)
>>
>> 3) Interpolate (this is also an option with parallel load)
>>
>> 4) If necessary, redistribute for load balance
>>
>> 5) Construct the overlap
>>
>> When you pass '1' below to DMPlexDistribute(), it distributes as normal and then calls
>>
>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexDistributeOverlap.html
>>
>> at the end. So you just postpone calling that until you have interpolated.
>>
>>   Thanks,
>>
>>     Matt
>>
>> Here is an MWE.
>> $ patch -p1 < patch.txt
>> $ cd src/dm/impls/plex/tests/
>> $ make ex18
>> $ mpirun -n 2 ./ex18 -distribute -interpolate after_distribute
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: Petsc has generated inconsistent data
>> [0]PETSC ERROR: Point SF contains 1 which is a cell
>>
>> Thanks,
>> Pierre
>>
>> diff --git a/src/dm/impls/plex/tests/ex18.c b/src/dm/impls/plex/tests/ex18.c
>> index 07421b3522..dd62be58e5 100644
>> --- a/src/dm/impls/plex/tests/ex18.c
>> +++ b/src/dm/impls/plex/tests/ex18.c
>> @@ -806 +806 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>> -  ierr = DMPlexDistribute(*dm, 0, NULL, &pdm);CHKERRQ(ierr);
>> +  ierr = DMPlexDistribute(*dm, 1, NULL, &pdm);CHKERRQ(ierr);
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
> --
> Stefano
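For reference, a minimal sketch of the ordering Matt describes above (distribute with zero overlap, interpolate, then construct the overlap with DMPlexDistributeOverlap). The function name is made up, step 4 (redistribution for load balance) is omitted, and this only illustrates the suggested sequence rather than tested code:

#include <petscdmplex.h>

static PetscErrorCode LoadDistributeInterpolateOverlap(MPI_Comm comm, const char filename[], DM *dm)
{
  DM             dmDist = NULL, dmInterp = NULL, dmOverlap = NULL;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* 1) load the mesh uninterpolated */
  ierr = DMPlexCreateFromFile(comm, filename, PETSC_FALSE, dm);CHKERRQ(ierr);
  /* 2) distribute with zero overlap */
  ierr = DMPlexDistribute(*dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) {ierr = DMDestroy(dm);CHKERRQ(ierr); *dm = dmDist;}
  /* 3) interpolate the distributed, overlap-free mesh */
  ierr = DMPlexInterpolate(*dm, &dmInterp);CHKERRQ(ierr);
  ierr = DMDestroy(dm);CHKERRQ(ierr);
  *dm  = dmInterp;
  /* 5) only now construct a one-cell overlap */
  ierr = DMPlexDistributeOverlap(*dm, 1, NULL, &dmOverlap);CHKERRQ(ierr);
  if (dmOverlap) {ierr = DMDestroy(dm);CHKERRQ(ierr); *dm = dmOverlap;}
  PetscFunctionReturn(0);
}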