I think the slowdown in the second test (-interpolate false) comes from the
fact that Plex has to compute the dual graph on the fly; see
https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plexpartition.c#L469

Are you having a performance regression with DMPLEX on this second case, or
is it just the first time you noticed it?

On Tue, Apr 28, 2020 at 1:57 PM Matthew Knepley <knep...@gmail.com> wrote:

> On Tue, Apr 28, 2020 at 5:19 AM Pierre Jolivet <pierre.joli...@enseeiht.fr>
> wrote:
>
>> On 14 Apr 2020, at 2:36 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>
>> On Tue, Apr 14, 2020 at 6:36 AM Pierre Jolivet <
>> pierre.joli...@enseeiht.fr> wrote:
>>
>>> Hello,
>>> I’d like to call DMPlexInterpolate after DMPlexDistribute and not the
>>> other way around for performance reasons (please stop me here if this is
>>> equivalent).
>>> When there is no overlap in DMPlexDistribute, it goes through fine.
>>> If there is overlap, I run into an error.
>>> Is this the expected behavior?
>>
>>
>> Sorry for taking so long to get back to this.
>> I thought I had everything set up, but when looking at the performance,
>> I’m quite surprised.
>> I rewound and looked at src/dm/impls/plex/tutorials/ex2.c by just adding
>> the DMPlexDistribute step. I guess this is equivalent to your steps #1 and
>> #2.
>> diff --git a/src/dm/impls/plex/tutorials/ex2.c b/src/dm/impls/plex/tutorials/ex2.c
>> index a069d922b2..5e5c4fb584 100644
>> --- a/src/dm/impls/plex/tutorials/ex2.c
>> +++ b/src/dm/impls/plex/tutorials/ex2.c
>> @@ -65,2 +65,5 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>>      ierr = DMPlexCreateFromFile(comm, user->filename, user->interpolate, dm);CHKERRQ(ierr);
>> +    DM dmParallel;
>> +    ierr = DMPlexDistribute(*dm, 0, NULL, &dmParallel);CHKERRQ(ierr);
>> +    ierr = DMDestroy(&dmParallel);CHKERRQ(ierr);
>>    }
>>
>> With a cube of 741,000 nodes and 4,574,068 elements.
>> $ cat untitled.geo && ~/gmsh-4.5.4-Linux64/bin/gmsh untitled.geo -bin -3
>> //+
>> SetFactory("OpenCASCADE");
>> Box(1) = {-0.5, -0.5, 0, 1, 1, 1};
>> Characteristic Length {:} = 0.01;
>>
>> I get the following timings (optimized build, scalar-type=real,
>> 64-bit-indices=false, SKL cluster).
>> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate true
>> DMPlexInterp           1 1.0 1.0444e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 73  0  0  0  5  73  0  0  0  6     0
>> DMPlexDistribute       1 1.0 2.8793e+01 1.0 0.00e+00 0.0 3.6e+03 6.2e+05 3.3e+01 21  0100100 60  21  0100100 69     0
>> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate false
>> DMPlexDistribute       1 1.0 7.0265e+02 1.0 0.00e+00 0.0 3.1e+03 2.0e+05 2.6e+01 99  0100100 58  99  0100100 68     0
>>
>> Do you think I messed up something else?
>> Our legacy implementation, on the same mesh, takes around 20 seconds
>> (accounting for the ParMETIS call) to partition and distribute the mesh and
>> generate the underlying structures for our FEM kernels (kd tree,
>> distributed numbering, so on and so forth).
>> Of course, DMPlex is much more versatile and generic, so I’d understand
>> if it’s a little bit slower, but that’s quite a lot slower right now.
>> Are there any options I could play with to speed things up?
>>
>
> Yes, something is messed up. Vaclav's latest paper has almost the exact
> setup for the cube benchmark (https://arxiv.org/pdf/2004.08729.pdf), and
> you can see that the interpolation time is an order of magnitude smaller
> for 128 procs and also scales linearly. What you show above in ex2 looks
> serial, in that the mesh is loaded on 1 proc and then interpolation is
> also done on 1 proc, which corresponds to the time he shows as "Serial
> Startup". I cannot explain the poor performance of your second run;
> however, the interpolation time in the first can be greatly reduced by
> interpolating in parallel. Thus, the right answer is to load in parallel,
> interpolate, and redistribute. That is what we do in that paper.
>
> Short answer: If you want scalable load and interpolation, I think you
> should be using the parallel stuff that Vaclav just put in. I think that
> means first converting your mesh to the HDF5 format using CreateFromFile
> and then DMView to hdf5. Then you can load it in parallel. Vaclav, has
> everything been pushed for this?
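
For what it's worth, here is my rough (and untested) reading of the HDF5
route Matt describes: convert the Gmsh file once with an HDF5 viewer, then
have the actual runs load that file, interpolate, and redistribute. The
PETSC_VIEWER_HDF5_XDMF format and the parallel behavior of DMLoad are my
assumptions about what Vaclav's work expects, so please treat this as a
sketch rather than a recipe:

  /* One-time conversion (assumes the usual PetscErrorCode ierr and MPI_Comm comm):
     read the Gmsh file and write it back out as HDF5. */
  DM          dm;
  PetscViewer viewer;
  ierr = DMPlexCreateFromFile(comm, "untitled.msh", PETSC_FALSE, &dm);CHKERRQ(ierr);
  ierr = PetscViewerHDF5Open(comm, "untitled.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_XDMF);CHKERRQ(ierr); /* assumed format */
  ierr = DMView(dm, viewer);CHKERRQ(ierr);
  ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);

  /* Actual run: load the HDF5 mesh, interpolate, then redistribute for load balance. */
  ierr = DMCreate(comm, &dm);CHKERRQ(ierr);
  ierr = DMSetType(dm, DMPLEX);CHKERRQ(ierr);
  ierr = PetscViewerHDF5Open(comm, "untitled.h5", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_XDMF);CHKERRQ(ierr);
  ierr = DMLoad(dm, viewer);CHKERRQ(ierr);
  ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  {
    DM idm, rdm;
    ierr = DMPlexInterpolate(dm, &idm);CHKERRQ(ierr);
    ierr = DMDestroy(&dm);CHKERRQ(ierr);
    dm   = idm;
    ierr = DMPlexDistribute(dm, 0, NULL, &rdm);CHKERRQ(ierr);
    if (rdm) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = rdm;}
  }

Whether DMLoad actually does the load in parallel here depends on what has
already been merged, which is exactly the question to Vaclav above.
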
>
>   Thanks,
>
>       Matt
>
>> Thanks in advance for your help,
>> Pierre
>>
>> Yes. What we want you to do is:
>>
>> 1) Load/generate mesh
>>
>> 2) Distribute (this can be done at the same time as load with parallel
>> load)
>>
>> 3) Interpolate (this is also an option from parallel load)
>>
>> 4) If necessary, redistribute for load balance
>>
>> 5) Construct overlap
>>
>>     When you pass '1' below to DMPlexDistribute(), it distributes as
>> normal and then calls
>>
>>
>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexDistributeOverlap.html
>>
>>     at the end. So you just postpone calling that until you have
>> interpolated.
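
If I read that correctly, the sequence would look roughly like the sketch
below. It assumes the usual PetscErrorCode ierr and a DM dm that has already
been created (serially or via a parallel load); the variable names are just
placeholders:

  DM pdm, idm, odm;
  /* Distribute without overlap first. */
  ierr = DMPlexDistribute(dm, 0, NULL, &pdm);CHKERRQ(ierr);
  if (pdm) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = pdm;}
  /* Interpolate the distributed, non-overlapped mesh in parallel. */
  ierr = DMPlexInterpolate(dm, &idm);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  dm   = idm;
  /* Only now build the overlap that passing '1' to DMPlexDistribute would
     otherwise have created before the interpolation. */
  ierr = DMPlexDistributeOverlap(dm, 1, NULL, &odm);CHKERRQ(ierr);
  if (odm) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = odm;}

That also seems consistent with the error in the MWE further down, which
only shows up once the overlap is requested before interpolating.
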
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>> Here is a MWE.
>>> $ patch -p1 < patch.txt
>>> $ cd src/dm/impls/plex/tests/
>>> $ make ex18
>>> $ mpirun -n 2 ./ex18 -distribute -interpolate after_distribute
>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> [0]PETSC ERROR: Petsc has generated inconsistent data
>>> [0]PETSC ERROR: Point SF contains 1 which is a cell
>>>
>>> Thanks,
>>> Pierre
>>>
>>> diff --git a/src/dm/impls/plex/tests/ex18.c b/src/dm/impls/plex/tests/ex18.c
>>> index 07421b3522..dd62be58e5 100644
>>> --- a/src/dm/impls/plex/tests/ex18.c
>>> +++ b/src/dm/impls/plex/tests/ex18.c
>>> @@ -806 +806 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>>> -    ierr = DMPlexDistribute(*dm, 0, NULL, &pdm);CHKERRQ(ierr);
>>> +    ierr = DMPlexDistribute(*dm, 1, NULL, &pdm);CHKERRQ(ierr);
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>


-- 
Stefano
