On 18 Dec 2018, at 12:16, Hapla Vaclav via petsc-dev
<petsc-dev@mcs.anl.gov> wrote:



On 17 Dec 2018, at 18:11, Lawrence Mitchell <we...@gmx.li> wrote:


On 17 Dec 2018, at 11:56, Hapla Vaclav <vaclav.ha...@erdw.ethz.ch> wrote:

Matt, great that you brought this email up again. I actually completely missed it at the time.

On 14 Dec 2018, at 19:54, Matthew Knepley via petsc-dev <petsc-dev@mcs.anl.gov> wrote:

[...]

I would like:

- To be able to dump the DMPlex, and fields, on N processes

I think the current HDF5 does what you want.

- To be able to load the DMPlex, and fields, on P processes.  In the first 
instance, to get things going, I am happy if P=1.

I think this also works with arbitrary P, although the testing can be described 
as extremely thin.

I think we need to be much more precise here. First off, there are now three HDF5 formats:
1) PETSC_VIEWER_HDF5_PETSC - stores the Plex graph serialization
2) PETSC_VIEWER_HDF5_XDMF - stores an XDMF-compatible representation of vertices and cells
3) PETSC_VIEWER_HDF5_VIZ - slightly extends 2) with some extra data for visualization; you perhaps understand that part better than I do

PETSC_VIEWER_DEFAULT/PETSC_VIEWER_NATIVE means storing all three of the above. I think what Lawrence calls native should be 1).
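
As a concrete illustration (not from the original discussion; the file name "dump.h5" and the dm variable are placeholders, and error checking is omitted), pushing one of these formats on an HDF5 viewer before DMView() selects what gets stored:

  PetscViewer viewer;

  PetscViewerHDF5Open(PetscObjectComm((PetscObject)dm), "dump.h5", FILE_MODE_WRITE, &viewer);
  PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_PETSC);  /* or PETSC_VIEWER_HDF5_XDMF / PETSC_VIEWER_HDF5_VIZ */
  DMView(dm, viewer);                                      /* with PETSC_VIEWER_DEFAULT, all three are written */
  PetscViewerPopFormat(viewer);
  PetscViewerDestroy(&viewer);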

Format 1) is currently written in parallel but loaded sequentially:
https://bitbucket.org/petsc/petsc/src/fbb1886742ac2bbe3b4d1df09bff9724d3fee060/src/dm/impls/plex/plexhdf5.c#lines-834

I don't understand how it can work correctly for a distributed mesh when the point SF (connecting the partitions) is not stored, as far as I can see. I think there is not even a PetscSFView_HDF5(). I will check it more deeply soon.

Format 2) is the one for which I implemented parallel DMLoad().
Unfortunately, I can't declare it bulletproof until we declare parallel DMPlexInterpolate() 100% working. I did quite some work towards that in
https://bitbucket.org/petsc/petsc/pull-requests/1227/dmplexintepolate-fix-orientation-of-faces/
but, as stated in the PR summary, there are still some examples failing because of a wrong point SF, which is partly fixed in knepley/fix-plex-interpolate-sf but does not seem to be finished yet. Matt, is there any chance you could look at it at some point in the near future?

I think for Lawrence's purposes, 2) can be used to read the initial mesh file, but for checkpointing 1) seems better at the moment because it dumps everything, including interpolated edges & faces, labels, and perhaps some additional information.

OK, so I guess there are three different things going on here:

1. Store the data you need to reconstruct a DMPlex

2. Store the data you need to have a DMPlex viewable via XDMF.

3. Store the data you need to reconstruct a DMPlex AND have it viewable via 
XDMF.

For checkpointing purposes alone, I only really need 1; for viz purposes, one only needs 2; ideally, one would not separate the viz and checkpoint files if there is sufficient overlap of data (I think there is), which requires 3.

Yes. Coordinates can be shared, but topology storage is completely different, so with 3) the topology is just stored redundantly in two different ways.


Note that I am trying to explain that the line is not that sharp: the XDMF format itself is not only for viewing. The format is in principle capable of storing the whole DMPlex representation. As I said, if you create an HDF5 viewer and push the PETSC_VIEWER_HDF5_XDMF format, DMLoad() loads _just_ the XDMF-related data in parallel (using DMPlexCreateFromCellListParallel()). For example, src/dm/impls/plex/examples/tutorials/ex5.c shows that.
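
A minimal sketch of that load path (loosely following ex5.c; "mesh.h5" is just a placeholder file name and error checking is omitted):

  #include <petscdmplex.h>
  #include <petscviewerhdf5.h>

  int main(int argc, char **argv)
  {
    DM          dm;
    PetscViewer viewer;

    PetscInitialize(&argc, &argv, NULL, NULL);
    PetscViewerHDF5Open(PETSC_COMM_WORLD, "mesh.h5", FILE_MODE_READ, &viewer);
    PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_XDMF);
    DMCreate(PETSC_COMM_WORLD, &dm);
    DMSetType(dm, DMPLEX);
    DMLoad(dm, viewer);          /* loads just the vertices and cells, in parallel */
    PetscViewerPopFormat(viewer);
    PetscViewerDestroy(&viewer);
    DMDestroy(&dm);
    PetscFinalize();
    return 0;
  }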

The current interface is limited in that it only stores/loads vertices and cells, and edges/faces must be made on the fly. However, XDMF itself allows storing these as well, so the need to call DMPlexInterpolate() could be eliminated.

Sorry, this is not entirely correct. Edges/faces are not explicitly stored, but they can be referred to using cell-local numbering, so labels or fields can still refer to them. However, we can't avoid DMPlexInterpolate() after each load. For me that's not such a problem, because parallel DMPlexInterpolate() scales almost perfectly (it's just not entirely correct at the moment, as I keep reminding).
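
For reference, that reconstruction after a load is just the usual pattern (a sketch; dm is the loaded DM and error checking is omitted), since DMPlexInterpolate() creates a new DM:

  DM dmInt;

  DMPlexInterpolate(dm, &dmInt);  /* builds edges and faces on the fly */
  DMDestroy(&dm);
  dm = dmInt;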

Vaclav


There are also means to store labels using XDMF attributes/sets, and I'm 
planning to implement this in DMPlex.

So my overall idea (which I also presented at this year's User Meeting in London, and nobody has objected yet) is that some FE codes could potentially use only this for both checkpointing and viewing. Advantages would include removing the redundancy in storage (reducing file size) and increasing interoperability with third-party tools.


Anyway, since XDMF refers to HDF5 heavy data, one is free to store any XDMF-unrelated data there as well, which is what happens with PETSC_VIEWER_DEFAULT: the "PETSc-native" topology is then stored too. _At the moment_ that is a good idea because of the limitations of the _current_ XDMF-HDF5 I/O implementation in PETSc.

Of course, XDMF plus the corresponding HDF5 part can never be as general as DMPlex itself and the "PETSc-native" storage, if only because it supports "only" the 3 dimensions of the normal world :-) whereas DMPlex supports virtually any spatial dimension (a good deal of functionality relies on spatial dimension <= 3, though).


Does anyone know what, e.g., FEniCS uses for checkpointing and/or the final viewable data file?


Thanks

Vaclav


I will nevertheless keep working to improve 2) so that it can store edges, faces, and labels in an XDMF-compatible way.


For dumping, I think I can do DMView(dm) in PETSc "native" format, and that 
will write out the topology in a global numbering.

I would use HDF5.

For the field coefficients, I can just VecView(vec).  But there does not appear 
to be any way of saving the Section so that I can actually attach those 
coefficients to points in the mesh.

Hmm, I will check this right now. If it does not exist, I will write it.

No, it certainly doesn't exist. There is only an ASCII view implemented.
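
That is, at the moment only something like the following works (a sketch; "section" is a placeholder for your PetscSection):

  PetscSectionView(section, PETSC_VIEWER_STDOUT_WORLD);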


I can do PetscSectionCreateGlobalSection(section), so that I have the global numbering for offsets, but presumably for the point numbering I need to convert the local chart into global point numbers using DMPlexCreatePointNumbering?

No, all Sections use local points. We do not use global point numbers anywhere 
in Plex.

True. DMPlex is partition-wise sequential; the only thing that connects the submeshes is the point SF.
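
To illustrate both points with a sketch (not from the thread; "section" is an assumed local PetscSection and error checking is omitted): Sections are indexed by local Plex points, and the point SF is what ties the per-rank pieces together.

  PetscSF  pointSF;
  PetscInt pStart, pEnd, p, dof, off;

  DMGetPointSF(dm, &pointSF);          /* shared points between ranks */
  DMPlexGetChart(dm, &pStart, &pEnd);  /* local point numbering */
  for (p = pStart; p < pEnd; ++p) {
    PetscSectionGetDof(section, p, &dof);     /* dofs attached to local point p */
    PetscSectionGetOffset(section, p, &off);
  }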

OK, so I think I misunderstood what the dump format looks like, then. For a parallel store/load cycle where I go from N to P processes, what must I do?

If I understand correctly, the dump on N processes contains:

For each process, in process-local numbering:

- The DMPlex topology on that process

Now, given that the only thing that connects these local pieces of the DM 
together is the point SF, as Vaclav says, it must be the case that a reloadable 
dump file contains that information.


OK, so to dump a field so that we can reload it we must need:

- topology (in local numbering)
- point SF (connecting the local pieces of the topology together)
- Vector (dofs), presumably in local layout to make things easier
- Section describing the vector layout (local numbering)

So to load, I do:

1. Load and distribute the topology, and construct the new point SF (this presumably gives me a "migration SF" that maps from old points to new points).

2. Broadcast the Section over the migration SF so that we know how many dofs belong to each point in the new topology.

3. Broadcast the Vec over the migration SF to get the dofs to the right place.
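
For steps 2 and 3, something along these lines might work (a sketch under the assumption that dm is the reloaded DM and that migrationSF, oldSection, and oldVec are the loaded data; DMPlexDistributeField() pushes both the Section layout and the dofs over the SF; error checking omitted):

  PetscSection newSection;
  Vec          newVec;

  PetscSectionCreate(PetscObjectComm((PetscObject)dm), &newSection);
  VecCreate(PetscObjectComm((PetscObject)dm), &newVec);
  DMPlexDistributeField(dm, migrationSF, oldSection, oldVec, newSection, newVec);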

Whenever I think of this on paper it seems "easy", but then I occasionally try 
and sit down and do it and immediately get lost, so I am normally missing 
something.

What am I missing this time?

Lawrence
