On Dec 4, 2013, at 6:58 PM, Garth N. Wells wrote:

> On 2013-12-04 17:15, Anders Logg wrote:
>> On Tue, Dec 03, 2013 at 10:12:22AM +0100, Mikael Mortensen wrote:
>>> On Dec 2, 2013, at 9:17 PM, Anders Logg wrote:
>>> > I don't see a simple way around this (other than inventing a new
>>> > algorithm - the only way I can think of is to create a third data
>>> > structure with a regular grid predictively distributed onto the
>>> > different processors and then communicate via that grid) so I don't
>>> > mind this being added.
>>> Great!
>>> I could probably create a very inefficient parallel interpolation routine
>>> by looping over all partitions of the distributed mesh1, one by one,
>>> collectively (i.e., broadcast mesh1's partition on process 0 to all, then
>>> run collectively over it, allowing us to call u0.eval or evaluate_dofs
>>> collectively; when finished, move on to the partition on process 1). That
>>> could work for any space, I guess?
>> I meant communicating via a third, independent grid constructed in such a
>> way that each cell only needs to communicate with a subset of that grid.
> 
> I would create a bounding box (or boxes) of the mesh on each process, and then
> build a search tree for the boxes. Then do
> 

That's a good idea, if I understand correctly: use all_gathered bounding boxes
for the mesh partitions to find the process that owns the interpolation point,
and only call u0.eval once that process has been found. I can then reuse
in_boundary_box++ from PeriodicBoundaryComputation, right? I'll have a closer
look.
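
Roughly what I picture (an untested mpi4py sketch, just to illustrate the idea;
the helper names are made up):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def gather_partition_boxes(mesh):
    # Local bounding box of this process' part of the mesh
    coords = mesh.coordinates()
    box = np.hstack([coords.min(0), coords.max(0)])
    # All-gather so every process knows every partition's box
    return comm.allgather(box)

def candidate_owners(point, boxes, tol=1e-12):
    # Ranks whose partition bounding box contains the point; only these need
    # to be asked to call u0.eval for this interpolation point
    d = len(point)
    return [r for r, b in enumerate(boxes)
            if np.all(point >= b[:d] - tol) and np.all(point <= b[d:] + tol)]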

Are you sure this isn't handled automatically by the bounding box trees created
the first time u0.eval is called?

>  for (CellIterator cell(*_mesh); !cell.end(); ++cell)
>  {
>    // If interpolation point is on this process, do as we do now
> 
>    // If interpolation point is *not* on this process, insert
>    // into cache to be fetched
>  }
> 
>  // Call function to retrieve non-local data. This function should use
>  // the search tree.

I'm not quite sure I follow. AFAIK it is still not possible to proceed as
before, since restrict_as_ufc_function calls element.evaluate_dofs, and as such
the entire cell must be available on both processes, not just the interpolation
point. I would follow if the loop above were an iteration over interpolation
points, though, like it already is in my code.

For Lagrange elements there is really no reason to do the cell loop at all. I
forgot to mention in the previous mail that the nonmatching routine attached to
the issue is 4-5 (!!!) times faster in 3D than the regular interpolate for P1
elements. That is because the loop is over vertices and not over cells, so
there are far fewer callbacks to eval: just one per dof, which is optimal.
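
For reference, in serial the Lagrange case boils down to something like this (a
sketch only, assuming the Python layer exposes tabulate_all_coordinates like the
C++ dofmap does; the parallel search is what interpolate_nonmatching_mesh adds
on top):

import numpy as np
from dolfin import *

mesh0 = UnitSquareMesh(100, 100)
mesh1 = UnitSquareMesh(20, 20)
V0 = FunctionSpace(mesh0, "CG", 1)
V1 = FunctionSpace(mesh1, "CG", 1)
u0 = interpolate(Expression("x[0]*x[1]"), V0)

u = Function(V1)
gdim = mesh1.geometry().dim()
# One coordinate per dof, so each dof triggers exactly one call to eval
x = V1.dofmap().tabulate_all_coordinates(mesh1).reshape(-1, gdim)
u.vector()[:] = np.array([u0(xi) for xi in x])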

Mikael


> 
> The constrained/periodic bc code uses bounding boxes to locate constrained 
> vertices/edges/facets that are off-process. It doesn't use a tree for 
> searching, but it should. Some abstract function for finding the 
> off-process owner of a point or mesh entity would be very helpful.
> 
> Garth
> 
> 
> 
>> --
>> Anders
>>> > Since not all elements are supported, I think this function should
>>> > have a separate name for now, perhaps
>>> >
>>> >   interpolate_lagrange()
>>> >
>>> > My suggestion would be that the documentation explains that this
>>> > function works in parallel for non-matching meshes (in contrast to the
>>> > regular interpolate) but only supports Lagrange spaces.
>>> Sounds good to me.
>>> Mikael
>>> >
>>> >> Hi,
>>> >>
>>> >> This is in reference to Issue 162:
>>> >> https://bitbucket.org/fenics-project/dolfin/issue/162/interpolation-nonmatching-mesh-in-parallel
>>> >>
>>> >> Currently it is not possible in parallel to interpolate a dolfin Function u0
>>> >> defined on one mesh to a Function with the same FunctionSpace defined on a
>>> >> different mesh. I would like to be able to do this
>>> >>
>>> >> mesh0 = UnitSquareMesh(100, 100)
>>> >> mesh1 = UnitSquareMesh(20, 20)
>>> >> V0 = FunctionSpace(mesh0, "CG", 1)
>>> >> V1 = FunctionSpace(mesh1, "CG", 1)
>>> >> u0 = interpolate(Expression("x[0]*x[1]"), V0)
>>> >> u0.update()
>>> >> u = interpolate_nonmatching_mesh(u0, V1)
>>> >>
>>> >> or probably overloaded like this
>>> >>
>>> >> u = interpolate(u0, V1)
>>> >>
>>> >> Now there are probably many different ways this can be implemented. The most
>>> >> obvious approach that comes to mind is simply to overload `Function::eval`
>>> >> with a parallel version. There are many reasons why this doesn't quite seem
>>> >> to work. Consider the current `interpolate` implementation
>>> >>
>>> >> FunctionSpace::interpolate(GenericVector& expansion_coefficients,
>>> >>                                const GenericFunction& v)
>>> >> …
>>> >> // Iterate over mesh and interpolate on each cell
>>> >>  UFCCell ufc_cell(*_mesh);
>>> >>  for (CellIterator cell(*_mesh); !cell.end(); ++cell)
>>> >>  {
>>> >>    // Update to current cell
>>> >>    ufc_cell.update(*cell);
>>> >>
>>> >>    // Restrict function to cell
>>> >>    v.restrict(&cell_coefficients[0], *_element, *cell, ufc_cell);
>>> >>
>>> >>    // Tabulate dofs
>>> >>    const std::vector<dolfin::la_index>& cell_dofs
>>> >>      = _dofmap->cell_dofs(cell->index());
>>> >>
>>> >>    // Copy dofs to vector
>>> >>    expansion_coefficients.set(&cell_coefficients[0],
>>> >>                               _dofmap->cell_dimension(cell->index()),
>>> >>                               &cell_dofs[0]);
>>> >>  }
>>> >>
>>> >> The cell loop runs on each processor over the part of the V1 mesh that lives
>>> >> on that process. `v.restrict` calls `restrict_as_ufc_function` since V0 and
>>> >> V1 have different meshes, and `restrict_as_ufc_function` calls
>>> >> `element.evaluate_dofs`, which in turn calls `u0.eval` for each dof on the
>>> >> cell. The major problem that makes this difficult to extend to parallel is
>>> >> that neither the cell iteration nor `eval` is collective. Since the
>>> >> coordinates of the dofs on a cell of the V1 mesh could live on any other CPU
>>> >> for the V0 mesh, `eval` would need to be called collectively, with some
>>> >> try/catch clause and a lot of broadcasting (of coordinates and results), to
>>> >> obtain what is required. Alternatively, the non-collective `eval` calls that
>>> >> fail must be captured and handled with MPI later.
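>>> >>
>>> >> To make that concrete, a collective eval would have to look roughly like this
>>> >> (an illustrative mpi4py sketch only, assuming eval raises when the point is
>>> >> not on the calling process):
>>> >>
>>> >> from mpi4py import MPI
>>> >>
>>> >> def collective_eval(u0, point, comm=MPI.COMM_WORLD):
>>> >>     # Every process tries; only the one(s) owning the point succeed
>>> >>     try:
>>> >>         value, found = u0(point), True
>>> >>     except RuntimeError:
>>> >>         value, found = None, False
>>> >>     # Everyone learns which rank found the point ...
>>> >>     owner = comm.allgather(found).index(True)
>>> >>     # ... and the result is broadcast from that rank
>>> >>     return comm.bcast(value, root=owner)
>>> >>
>>> >> Doing this once per dof means a lot of collective traffic, which is why I
>>> >> have tried to avoid it.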
>>> >>
>>> >> Faced with these problems, my solution has instead been to create a
>>> >> stand-alone routine for parallel interpolation on nonmatching meshes, not
>>> >> trying to overload `eval` or to fit it into the current `interpolate`
>>> >> framework. The code that performs the interpolation is attached to issue 162.
>>> >> It is fairly well documented and should, hopefully, be easy to follow. The
>>> >> algorithm goes basically like this:
>>> >>
>>> >> 1) On each process, tabulate the coordinates of all dofs in the function
>>> >> space we're interpolating to, i.e., V1 in the code above.
>>> >>
>>> >> x = V1.dofmap()->tabulate_all_coordinates(mesh1)
>>> >>
>>> >> 2) Create a map from dof number in V1 to component number in the mixed space.
>>> >>
>>> >> This is necessary to handle mixed spaces. I do not think I have found the
>>> >> best solution to this problem, so any help here would be much appreciated. I
>>> >> just realized that the current solution does not work for special spaces like
>>> >> BDM or RT.
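>>> >>
>>> >> What I have in mind is roughly the following (an illustrative sketch only,
>>> >> assuming each sub-dofmap exposes its dofs through dofs(); this is also where
>>> >> spaces like BDM/RT break down, since their dofs are not plain point values):
>>> >>
>>> >> def dof_to_component_map(V):
>>> >>     # Map each dof of a (possibly mixed) space V to the index of the scalar
>>> >>     # component it belongs to, so the eval results can be split up correctly
>>> >>     comp_of_dof = {}
>>> >>     comp = 0
>>> >>     subspaces = [V.sub(i) for i in range(V.num_sub_spaces())] or [V]
>>> >>     for Vi in subspaces:
>>> >>         # Flatten vector-valued sub-spaces one scalar component at a time
>>> >>         scalars = [Vi.sub(j) for j in range(Vi.num_sub_spaces())] or [Vi]
>>> >>         for Vij in scalars:
>>> >>             for dof in Vij.dofmap().dofs():
>>> >>                 comp_of_dof[dof] = comp
>>> >>             comp += 1
>>> >>     return comp_of_dof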
>>> >>
>>> >> 3) Compute `u0.eval` for all points x computed in 1).
>>> >>
>>> >> The problem here is that u0 and u have different meshes, and as such a point
>>> >> in u will not necessarily, or even likely, be found on the same processor for
>>> >> u0. Hence the dof coordinates must be passed around and searched on all ranks
>>> >> until found. Once found, the result is returned to the process that owns the
>>> >> dof. (Alternatively, but more slowly, `u0.eval` could simply be called
>>> >> collectively.) Important for performance is that all the points not found on
>>> >> one process are collected in a vector and passed on to the next process
>>> >> through one single MPI::send_recv: all points not yet found are sent to the
>>> >> process with one higher rank, and the points not found by the process with
>>> >> one lower rank are received.
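>>> >>
>>> >> In pseudo-Python, the pass-around looks roughly like this (an mpi4py sketch
>>> >> of the idea, not the actual C++ code attached to the issue; again it assumes
>>> >> eval raises when the point is not on the calling process):
>>> >>
>>> >> from mpi4py import MPI
>>> >>
>>> >> comm = MPI.COMM_WORLD
>>> >> rank, nprocs = comm.Get_rank(), comm.Get_size()
>>> >>
>>> >> def eval_with_ring_pass(u0, points):
>>> >>     # points: dof coordinates owned by this process; returns {index: value}
>>> >>     values, missing = {}, []
>>> >>     for i, x in enumerate(points):
>>> >>         try:
>>> >>             values[i] = u0(x)
>>> >>         except RuntimeError:
>>> >>             missing.append((rank, i, x))  # remember which rank owns the dof
>>> >>     found_elsewhere = []
>>> >>     # Pass unresolved points around the ring: send to rank+1, receive from
>>> >>     # rank-1, evaluate what we can, and keep the rest circulating
>>> >>     for _ in range(nprocs - 1):
>>> >>         missing = comm.sendrecv(missing, dest=(rank + 1) % nprocs,
>>> >>                                 source=(rank - 1) % nprocs)
>>> >>         still_missing = []
>>> >>         for owner, i, x in missing:
>>> >>             try:
>>> >>                 found_elsewhere.append((owner, i, u0(x)))
>>> >>             except RuntimeError:
>>> >>                 still_missing.append((owner, i, x))
>>> >>         missing = still_missing
>>> >>     # Finally, return each result to the rank that owns the dof
>>> >>     per_rank = [[] for _ in range(nprocs)]
>>> >>     for owner, i, val in found_elsewhere:
>>> >>         per_rank[owner].append((i, val))
>>> >>     for incoming in comm.alltoall(per_rank):
>>> >>         values.update(incoming)
>>> >>     return values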
>>> >>
>>> >> 4) Finally, set all values in u using the dof to component map.
>>> >>
>>> >>
>>> >> The problem is inherently unscalable, since the efficiency depends on how the
>>> >> two different meshes are distributed. If their partitions happen to overlap
>>> >> well, the routine may be faster with 2 CPUs than with one, but in the end
>>> >> that depends on the mesh partitioning, which you cannot really control, at
>>> >> least not with the default mesh partitioner. Consider running a problem on
>>> >> 2 CPUs. Each dof must then be searched on average 1.5 times (half are found
>>> >> on this CPU and the rest on the other). Since each process's mesh is half the
>>> >> size of the original, the total cost is approximately 1.5*0.5 = 0.75. So, not
>>> >> counting the cost of MPI communication, you should expect the computational
>>> >> time to drop to 75% of the original. However, the MPI communication is
>>> >> extensive (half the mesh is sent back and forth), so you should honestly not
>>> >> expect any speedup at all. The corresponding estimate for N CPUs is
>>> >> (N+1)/(2N), plus MPI communication.
>>> >>
>>> >> If you have any comments on the implementation or, even better, suggestions
>>> >> on how to improve it, then please let me know by following up on this mail.
>>> >> If you like the current solution and would like to see it implemented soon in
>>> >> dolfin, then please vote for issue 162 on bitbucket :-)
>>> >>
>>> >> Meanwhile, the `interpolate_nonmatching_mesh` function is available at
>>> >> https://github.com/mikaem/fenicstools
>>> >>
>>> >> Best regards
>>> >>
>>> >> Mikael
>>> >>
>>> >>
>>> >
