Hi,
 
This is in reference to Issue 162 
https://bitbucket.org/fenics-project/dolfin/issue/162/interpolation-nonmatching-mesh-in-parallel
 
Currently it is not possible, in parallel, to interpolate a dolfin Function u0 
defined on one mesh to a Function in an equivalent FunctionSpace defined on a 
different mesh. I would like to be able to do this:
 
from dolfin import *

mesh0 = UnitSquareMesh(100, 100)
mesh1 = UnitSquareMesh(20, 20)
V0 = FunctionSpace(mesh0, "CG", 1)
V1 = FunctionSpace(mesh1, "CG", 1)
u0 = interpolate(Expression("x[0]*x[1]"), V0)
u0.update()  # gather off-process dof values
u = interpolate_nonmatching_mesh(u0, V1)
 
or, perhaps more naturally, overloaded like this:
 
u = interpolate(u0, V1)
 
Now there are probably many different ways this could be implemented. The most 
obvious approach that comes to mind is simply to overload `Function::eval` with 
a parallel version. There are several reasons why this does not quite work. 
Consider the current `interpolate` implementation:
 
void FunctionSpace::interpolate(GenericVector& expansion_coefficients,
                                const GenericFunction& v)
{
  …
  // Iterate over mesh and interpolate on each cell
  UFCCell ufc_cell(*_mesh);
  for (CellIterator cell(*_mesh); !cell.end(); ++cell)
  {
    // Update to current cell
    ufc_cell.update(*cell);

    // Restrict function to cell
    v.restrict(&cell_coefficients[0], *_element, *cell, ufc_cell);

    // Tabulate dofs
    const std::vector<dolfin::la_index>& cell_dofs
      = _dofmap->cell_dofs(cell->index());

    // Copy dofs to vector
    expansion_coefficients.set(&cell_coefficients[0],
                               _dofmap->cell_dimension(cell->index()),
                               &cell_dofs[0]);
  }
}
 
On each process, the cell loop runs over the part of the V1 mesh that lives on 
that process. `v.restrict` calls `restrict_as_ufc_function` since V0 and V1 
have different meshes; `restrict_as_ufc_function` calls `element.evaluate_dofs`, 
which in turn calls `u0.eval` for each dof on the cell. The major problem that 
makes this difficult to extend to parallel is that neither the cell iteration 
nor `eval` is collective. Since the coordinates of the dofs on a cell of the 
V1 mesh could live on any other CPU for the V0 mesh, `eval` would need to be 
called collectively, with some try/catch clause and a lot of broadcasting (of 
coordinates and results) to obtain what is required. Alternatively, each 
non-collective `eval` that fails must be caught and the point handled with 
additional MPI communication later.
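
Just to illustrate how much machinery even the simplest collective `eval` 
needs, here is a rough Python sketch using mpi4py. This is not from the 
attached code; the name collective_eval and the error handling are mine, and a 
scalar u0 is assumed:

from dolfin import *
from mpi4py import MPI as pyMPI
import numpy

comm = pyMPI.COMM_WORLD

def collective_eval(u0, x):
    # Evaluate scalar u0 at point x; must be called on all ranks
    values = numpy.zeros(1)
    owner = -1
    try:
        u0.eval(values, x)   # succeeds only if x lies in the local
        owner = comm.rank    # part of u0's mesh
    except RuntimeError:
        pass
    # Agree on (one of) the rank(s) that found the point ...
    owner = comm.allreduce(owner, op=pyMPI.MAX)
    # ... and broadcast the value from that rank (owner == -1, i.e.
    # the point found nowhere, is not handled in this sketch)
    return comm.bcast(values if comm.rank == owner else None, root=owner)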
 
Faced with these problems, my solution has instead been to create a stand-alone 
version of parallel interpolation on nonmatching meshes, rather than trying to 
overload `eval` or fit it into the current `interpolate` framework. The code 
that performs the interpolation is attached to issue 162. The code is fairly 
well documented and should, hopefully, be possible to follow. The algorithm is 
basically as follows (a simplified Python sketch is given after the steps):
 
1) On each process, tabulate the coordinates of all dofs in the function space 
we're interpolating to, i.e., V1 in the code above:

x = V1.dofmap()->tabulate_all_coordinates(mesh1)
 
2) Create a map from dof number in V1 to component number in the mixed space.

This is necessary to be able to handle mixed spaces. I do not think I have 
found the best solution to this problem, so any help here would be much 
appreciated. I have just realized that the current solution does not work for 
special spaces like BDM or RT.
    
3) Compute `u0.eval` for all the points x computed in 1).

The problem here is that u0 and u have different meshes, so a point belonging 
to u will not necessarily, or even likely, be found on the same processor for 
u0. Hence the dof coordinates must be passed around and searched on all ranks 
until found; when found, the result is returned to the process that owns the 
dof. (Alternatively, but more slowly, `u0.eval` could simply be called 
collectively.) A major point for performance is that all points not found on 
one process are collected in a vector and passed on to the next process 
through one single MPI::send_recv per round: each process sends the points it 
has not found and receives the points not found by the process with one lower 
rank.
 
4) Finally, set all values in u using the dof-to-component map from 2).
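
To make the flow of steps 1), 3) and 4) concrete, here is a much-simplified, 
pure-Python sketch of the ring idea using mpi4py. It covers scalar CG spaces 
only, so the component map from 2) is omitted; it assumes every point is 
eventually found on some rank; and apart from the dolfin calls, the names are 
mine. The attached code is C++, only forwards the points that were not yet 
found, and is altogether more careful:

from dolfin import *
from mpi4py import MPI as pyMPI
import numpy

comm = pyMPI.COMM_WORLD

def interpolate_nonmatching_sketch(u0, V1):
    # 1) Coordinates of the dofs owned by this rank (flat array of
    #    length num_dofs * gdim in old dolfin)
    mesh1 = V1.mesh()
    gdim = mesh1.geometry().dim()
    x = V1.dofmap().tabulate_all_coordinates(mesh1).reshape((-1, gdim))

    # 3) Each work item is [local dof, point, value]. Circulate the
    #    items around the ring of processes; after comm.size hops each
    #    item is back on the rank that owns the dof.
    items = [[i, x[i], None] for i in range(len(x))]
    for hop in range(comm.size):
        for item in items:
            if item[2] is None:
                values = numpy.zeros(1)
                try:
                    u0.eval(values, item[1])
                    item[2] = values[0]
                except RuntimeError:
                    pass  # point not in this rank's part of u0's mesh
        # One single send/receive per round, as in step 3) above
        items = comm.sendrecv(items, dest=(comm.rank + 1) % comm.size,
                              source=(comm.rank - 1) % comm.size)

    # 4) Fill the local part of the new function's vector
    vals = numpy.zeros(len(items))
    for i, point, value in items:
        vals[i] = value
    u = Function(V1)
    u.vector().set_local(vals)
    u.vector().apply("insert")
    return u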
 

The problem is inherently unscalable, since the efficiency depends on how the 
two different meshes are distributed. If the partitions happen to overlap 
well, the routine may be faster on 2 CPUs than on one, but in the end that 
depends on the mesh partitioning, which you cannot really control, at least 
not with the default mesh partitioner. Consider running a problem on 2 CPUs. 
Each dof must then be searched on average 1.5 times (half are found on the 
local CPU and the rest on the other). Since each process' mesh is half the 
size of the original, the total search cost is approximately 1.5 * 0.5 = 0.75. 
So, not counting any cost of MPI communication, you could expect to reduce the 
computational time to 75% of the serial time. However, the MPI communication 
is extensive (half the mesh is sent back and forth), so you should honestly 
not expect any speedup at all. The corresponding estimate for N CPUs is 
(N + 1) / (2 N) plus communication, which approaches 1/2 from above as N 
grows, so even in the ideal case the search can never be more than twice as 
fast as in serial.
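
As a quick sanity check of that estimate (pure arithmetic, nothing 
dolfin-specific):

# Relative search cost, ignoring communication: on average (N+1)/2
# searches per dof, each over a mesh of size 1/N of the original
for N in (1, 2, 4, 8, 16):
    print("N = %2d, estimated cost = %.3f" % (N, (N + 1.0) / (2 * N)))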

If you have any comments on the implementation or, even better, suggestions on 
how to improve it, then please let me know by following up on this mail. If 
you like the current solution and would like to see it implemented in dolfin 
soon, then please vote for issue 162 on bitbucket :-)
 
Meanwhile, the `interpolate_nonmatching_mesh` function is available at 
https://github.com/mikaem/fenicstools

Best regards

Mikael

