So, I tried a distributed input file with our legacy implementation; that brings it 
down to 10 seconds.
I’ll try to figure out the CreateFromFile + DMView stuff.
Stefano, I can’t say whether it’s a regression or not; I’m just starting to 
interface DMPlex in our code to see if it’s a viable alternative.

Thanks,
Pierre

> On 28 Apr 2020, at 1:03 PM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
> 
> I think the slowdown in the second test (interpolate 0) comes from the fact 
> that Plex has to compute the dual graph on the fly; see 
> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plexpartition.c#L469
> 
> Are you having a performance regression with DMPLEX on this second case, or 
> is it just the first time you noticed it?
> 
> On Tue, Apr 28, 2020 at 1:57 PM Matthew Knepley <knep...@gmail.com> wrote:
> On Tue, Apr 28, 2020 at 5:19 AM Pierre Jolivet <pierre.joli...@enseeiht.fr> wrote:
>> On 14 Apr 2020, at 2:36 PM, Matthew Knepley <knep...@gmail.com> wrote:
>> 
>> On Tue, Apr 14, 2020 at 6:36 AM Pierre Jolivet <pierre.joli...@enseeiht.fr> wrote:
>> Hello,
>> I’d like to call DMPlexInterpolate after DMPlexDistribute and not the other 
>> way around for performance reasons (please stop me here if this is 
>> equivalent).
>> When there is no overlap in DMPlexDistribute, it goes through fine.
>> If there is overlap, I run into an error.
>> Is this the expected behavior?
> 
> Sorry for taking so long to get back to this.
> I thought I had everything set up, but when looking at the performance, I’m 
> quite surprised.
> I went back to src/dm/impls/plex/tutorials/ex2.c and just added the 
> DMPlexDistribute step. I guess this is equivalent to your steps #1 and #2.
> diff --git a/src/dm/impls/plex/tutorials/ex2.c b/src/dm/impls/plex/tutorials/ex2.c
> index a069d922b2..5e5c4fb584 100644
> --- a/src/dm/impls/plex/tutorials/ex2.c
> +++ b/src/dm/impls/plex/tutorials/ex2.c
> @@ -65,2 +65,5 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>      ierr = DMPlexCreateFromFile(comm, user->filename, user->interpolate, dm);CHKERRQ(ierr);
> +    DM dmParallel;
> +    ierr = DMPlexDistribute(*dm, 0, NULL, &dmParallel);CHKERRQ(ierr);
> +    ierr = DMDestroy(&dmParallel);CHKERRQ(ierr);
>    }
> 
> With a cube of 741,000 nodes and 4,574,068 elements:
> $ cat untitled.geo && ~/gmsh-4.5.4-Linux64/bin/gmsh untitled.geo -bin -3
> //+
> SetFactory("OpenCASCADE");
> Box(1) = {-0.5, -0.5, 0, 1, 1, 1};
> Characteristic Length {:} = 0.01;
> 
> I get the following timings (optimized build, scalar-type=real, 
> 64-bit-indices=false, SKL cluster).
> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate true
> DMPlexInterp           1 1.0 1.0444e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 73  0  0  0  5  73  0  0  0  6     0
> DMPlexDistribute       1 1.0 2.8793e+01 1.0 0.00e+00 0.0 3.6e+03 6.2e+05 3.3e+01 21  0100100 60  21  0100100 69     0
> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate false
> DMPlexDistribute       1 1.0 7.0265e+02 1.0 0.00e+00 0.0 3.1e+03 2.0e+05 2.6e+01 99  0100100 58  99  0100100 68     0
> 
> Do you think I messed up something else?
> Our legacy implementation, on the same mesh, takes around 20 seconds 
> (accounting for the ParMETIS call) to partition and distribute the mesh and 
> generate the underlying structures for our FEM kernels (kd tree, distributed 
> numbering, so on and so forth).
> Of course, DMPlex is much more versatile and generic, so I’d understand if 
> it’s a little bit slower, but that’s quite a lot slower right now.
> Are there any options I could play with to speed things up?
> 
> Yes, something is messed up. Vaclav's latest paper has almost the exact setup 
> for the cube benchmark (https://arxiv.org/pdf/2004.08729.pdf), and you can see 
> that the interpolation time is an order of magnitude smaller for 128 procs and 
> also scales linearly. What you show above in ex2 looks serial, in that the mesh 
> is loaded on 1 proc and interpolation is also done on 1 proc, which corresponds 
> to the time he shows as "Serial Startup". I cannot explain the poor performance 
> of your second run; however, the interpolation time in the first can be greatly 
> reduced by interpolating in parallel. Thus, the right answer is to load in 
> parallel, interpolate, and redistribute. That is what we do in that paper.
> 
> Short answer: If you want scalable load and interpolation, I think you should 
> be using the parallel stuff that Vaclav just put in. I think that means first 
> converting your mesh to the HDF5 format using CreateFromFile and then DMView 
> to hdf5. Then you can load it in parallel. Vaclav, has everything been pushed 
> for this?
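> 
> A rough sketch of that conversion and the subsequent parallel load, assuming 
> the 3.13-style DMPlexCreateFromFile(comm, filename, interpolate, &dm) signature 
> and the PETSC_VIEWER_HDF5_PETSC format (the exact viewer format needed for the 
> parallel topology load may depend on which of Vaclav's branches have landed; 
> file names are placeholders):
> 
>   /* one-time conversion: Gmsh -> PETSc-native HDF5 (can be run on a single rank) */
>   DM             dm;
>   PetscViewer    viewer;
>   PetscErrorCode ierr;
>   ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "untitled.msh", PETSC_FALSE, &dm);CHKERRQ(ierr);
>   ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "untitled.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
>   ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_PETSC);CHKERRQ(ierr);
>   ierr = DMView(dm, viewer);CHKERRQ(ierr);
>   ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr);
>   ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
>   ierr = DMDestroy(&dm);CHKERRQ(ierr);
> 
>   /* later, on all ranks: load the HDF5 mesh, then interpolate and distribute in parallel */
>   ierr = DMCreate(PETSC_COMM_WORLD, &dm);CHKERRQ(ierr);
>   ierr = DMSetType(dm, DMPLEX);CHKERRQ(ierr);
>   ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "untitled.h5", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
>   ierr = DMLoad(dm, viewer);CHKERRQ(ierr);
>   ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);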
> 
>   Thanks,
> 
>       Matt
> 
> Thanks in advance for your help,
> Pierre
> 
>> Yes. What we want you to do is:
>> 
>> 1) Load/generate mesh
>> 
>> 2) Distribute (this can be done at the same time as load with parallel load)
>> 
>> 3) Interpolate (this is also an option from parallel load)
>> 
>> 4) If necessary, redistribute for load balance
>> 
>> 5) Construct overlap
>> 
>>     When you pass '1' below to DMPlexDistribute(), it distributes as normal 
>> and then calls
>> 
>>       https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexDistributeOverlap.html
>> 
>>     at the end. So you just postpone calling that until you have 
>> interpolated.
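>> 
>>     Roughly, the postponed sequence might look like this sketch (overlap of 1 
>> as in your MWE, step 4 skipped, file name just a placeholder, error checking 
>> as in the examples):
>> 
>>       DM             dm, pdm, idm, odm;
>>       PetscSF        sf;
>>       PetscErrorCode ierr;
>>       /* 1) load the mesh, uninterpolated */
>>       ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_FALSE, &dm);CHKERRQ(ierr);
>>       /* 2) distribute with overlap 0 (pdm is NULL on a single rank) */
>>       ierr = DMPlexDistribute(dm, 0, NULL, &pdm);CHKERRQ(ierr);
>>       /* 3) interpolate the already-distributed mesh */
>>       ierr = DMPlexInterpolate(pdm, &idm);CHKERRQ(ierr);
>>       /* 5) only now build the overlap (step 4, load balancing, omitted here) */
>>       ierr = DMPlexDistributeOverlap(idm, 1, &sf, &odm);CHKERRQ(ierr);
>>       ierr = PetscSFDestroy(&sf);CHKERRQ(ierr);
>>       ierr = DMDestroy(&idm);CHKERRQ(ierr);
>>       ierr = DMDestroy(&pdm);CHKERRQ(ierr);
>>       ierr = DMDestroy(&dm);CHKERRQ(ierr);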
>> 
>>   Thanks,
>> 
>>     Matt
>>  
>> Here is a MWE.
>> $ patch -p1 < patch.txt
>> $ cd src/dm/impls/plex/tests/
>> $ make ex18
>> $ mpirun -n 2 ./ex18 -distribute -interpolate after_distribute
>> [0]PETSC ERROR: --------------------- Error Message 
>> --------------------------------------------------------------
>> [0]PETSC ERROR: Petsc has generated inconsistent data
>> [0]PETSC ERROR: Point SF contains 1 which is a cell
>> 
>> Thanks,
>> Pierre
>> 
>> diff --git a/src/dm/impls/plex/tests/ex18.c b/src/dm/impls/plex/tests/ex18.c
>> index 07421b3522..dd62be58e5 100644
>> --- a/src/dm/impls/plex/tests/ex18.c
>> +++ b/src/dm/impls/plex/tests/ex18.c
>> @@ -806 +806 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>> -    ierr = DMPlexDistribute(*dm, 0, NULL, &pdm);CHKERRQ(ierr);
>> +    ierr = DMPlexDistribute(*dm, 1, NULL, &pdm);CHKERRQ(ierr);
>> 
>> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/
> 
> 
> -- 
> Stefano
