On Tue, Dec 03, 2013 at 10:12:22AM +0100, Mikael Mortensen wrote:
>
> On Dec 2, 2013, at 9:17 PM, Anders Logg wrote:
>
> > I don't see a simple way around this (other than inventing a new algorithm - the only way I can think of is to create a third data structure with a regular grid predictively distributed onto the different processors and then communicate via that grid), so I don't mind this being added.
>
> Great.
>
> I could probably create a very inefficient parallel interpolation routine by looping over all partitions of the distributed mesh1, one by one, collectively: i.e., broadcast mesh1's partition on process 0 to all ranks, run over it collectively so that u0.eval or evaluate_dofs can be called collectively, and when finished move on to the partition on process 1. That could work for any space, I guess?
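For what it's worth, here is roughly how I picture that partition-by-partition loop, as a very rough and untested Python/mpi4py sketch. To keep it simple it broadcasts only the dof coordinates of each V1 partition rather than the mesh partition itself, so unlike your evaluate_dofs variant it is still limited to spaces whose dofs are point evaluations; the function name and the scalar-CG assumption are mine:

  # Untested sketch: assumes a scalar Lagrange space and that evaluating u0 at a
  # point outside the local partition of mesh0 raises a RuntimeError.
  import numpy as np
  from mpi4py import MPI
  from dolfin import Function

  comm = MPI.COMM_WORLD

  def interpolate_partitionwise(u0, V1):
      # Interpolate u0 (living on mesh0) onto V1 (living on mesh1), one V1 partition at a time
      mesh1 = V1.mesh()
      gdim = mesh1.geometry().dim()
      # Coordinates of the dofs held by this rank, reshaped to (ndofs, gdim)
      xloc = V1.dofmap().tabulate_all_coordinates(mesh1).reshape((-1, gdim))
      values = np.zeros(len(xloc))
      for p in range(comm.size):
          # Broadcast rank p's dof coordinates so that every rank can try to evaluate
          x_p = comm.bcast(xloc if comm.rank == p else None, root=p)
          local = np.zeros(len(x_p))
          found = np.zeros(len(x_p))
          for i, x in enumerate(x_p):
              try:
                  local[i] = u0(*x)  # succeeds only where x lies in the local part of mesh0
                  found[i] = 1.0
              except RuntimeError:
                  pass
          # Sum the attempts back onto rank p; average points found on several ranks
          vals = np.zeros(len(x_p))
          hits = np.zeros(len(x_p))
          comm.Reduce(local, vals, op=MPI.SUM, root=p)
          comm.Reduce(found, hits, op=MPI.SUM, root=p)
          if comm.rank == p:
              values[:] = vals / np.maximum(hits, 1.0)
      u1 = Function(V1)
      u1.vector().set_local(values)
      u1.vector().apply("insert")
      return u1

Broadcasting the actual mesh partition and calling evaluate_dofs, as you describe, would lift the point-evaluation restriction, but I don't see an off-the-shelf way to ship a mesh partition around. Either way, every rank does the full amount of work for every partition, so it will indeed be slow.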
I meant communicating via a third independent grid, constructed in such a way that each cell only needs to communicate with a subset of that third grid.

--
Anders

> > Since not all elements are supported, I think this function should have a separate name for now, perhaps
> >
> >   interpolate_lagrange()
> >
> > My suggestion would be that the documentation explains that this function works in parallel for non-matching meshes (in contrast to the regular interpolate) but only supports Lagrange spaces.
>
> Sounds good to me.
>
> Mikael
>
>
> >> Hi,
> >>
> >> This is in reference to Issue 162:
> >> https://bitbucket.org/fenics-project/dolfin/issue/162/interpolation-nonmatching-mesh-in-parallel
> >>
> >> Currently it is not possible in parallel to interpolate a dolfin Function u0, defined on one mesh, to a Function with the same FunctionSpace defined on a different mesh. I would like to be able to do this:
> >>
> >>   mesh0 = UnitSquareMesh(100, 100)
> >>   mesh1 = UnitSquareMesh(20, 20)
> >>   V0 = FunctionSpace(mesh0, "CG", 1)
> >>   V1 = FunctionSpace(mesh1, "CG", 1)
> >>   u0 = interpolate(Expression("x[0]*x[1]"), V0)
> >>   u0.update()
> >>   u = interpolate_nonmatching_mesh(u0, V1)
> >>
> >> or, probably, overloaded like this:
> >>
> >>   u = interpolate(u0, V1)
> >>
> >> Now, there are probably many different ways this can be implemented. The most obvious approach that comes to mind is simply to overload `Function::eval` with a parallel version. There are many reasons why this doesn't quite work. Consider the current `interpolate` implementation:
> >>
> >>   FunctionSpace::interpolate(GenericVector& expansion_coefficients,
> >>                              const GenericFunction& v)
> >>   ...
> >>   // Iterate over mesh and interpolate on each cell
> >>   UFCCell ufc_cell(*_mesh);
> >>   for (CellIterator cell(*_mesh); !cell.end(); ++cell)
> >>   {
> >>     // Update to current cell
> >>     ufc_cell.update(*cell);
> >>
> >>     // Restrict function to cell
> >>     v.restrict(&cell_coefficients[0], *_element, *cell, ufc_cell);
> >>
> >>     // Tabulate dofs
> >>     const std::vector<dolfin::la_index>& cell_dofs
> >>       = _dofmap->cell_dofs(cell->index());
> >>
> >>     // Copy dofs to vector
> >>     expansion_coefficients.set(&cell_coefficients[0],
> >>                                _dofmap->cell_dimension(cell->index()),
> >>                                &cell_dofs[0]);
> >>   }
> >>
> >> The cell loop runs on every process, over the part of the V1 mesh that lives on that process. `v.restrict` calls `restrict_as_ufc_function`, since V0 and V1 have different meshes; `restrict_as_ufc_function` calls `element.evaluate_dofs`, which in turn calls `u0.eval` for each dof on the cell. The major problem that makes this difficult to extend to parallel is that the cell iteration is not collective and `eval` is not called collectively. Since the coordinates of the dofs on a cell in the V1 mesh could live on any other CPU for the V0 mesh, `eval` would have to be called collectively, with some try/catch clause and a lot of broadcasting (of coordinates and results), to obtain what is required. Alternatively, the non-collective `eval` that fails must be caught and handled with MPI afterwards.
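Just to spell out what calling `eval` collectively would mean per point, here is a rough mpi4py illustration (this is not existing DOLFIN API; it assumes a failed local eval raises a RuntimeError and that at least one rank finds the point):

  # Untested sketch of a per-point collective eval; not part of DOLFIN.
  from mpi4py import MPI

  comm = MPI.COMM_WORLD

  def collective_eval(u0, x):
      # Every rank tries the local eval; only the rank(s) owning the point succeed
      value = None
      try:
          value = u0(*x)
      except RuntimeError:
          pass
      # Share the attempts so that every rank ends up with the value
      values = comm.allgather(value)
      found = [v for v in values if v is not None]
      if not found:
          raise RuntimeError("Point not found on any process")
      return found[0]

Doing this for every single dof means one allgather per dof, which is exactly the kind of chatter that batching the points, as in the approach described next, avoids.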
> >> Faced with these problems, my solution has instead been to create a stand-alone version for parallel interpolation on non-matching meshes, rather than trying to overload `eval` or fit it into the current `interpolate` framework. The code that performs the interpolation is attached to issue 162. It is fairly well documented and should, hopefully, be possible to follow. The algorithm is basically as follows:
> >>
> >> 1) On each process, tabulate the coordinates of all dofs in the function space we're interpolating to, i.e., V1 in the code above:
> >>
> >>   x = V1.dofmap()->tabulate_all_coordinates(mesh1)
> >>
> >> 2) Create a map from dof number in V1 to component number in the mixed space. This is necessary to be able to handle mixed spaces. I do not think I have found the best solution to this problem, so any help here would be much appreciated. I just realized that the current solution does not work for special spaces like BDM or RT.
> >>
> >> 3) Compute `u0.eval` for all points x computed in 1). The problem here is that u0 and u have different meshes, so a point in u will not necessarily, or even likely, be found on the same processor for u0. Hence the dof coordinates must be passed around and searched on all ranks until found; when found, the result is returned to the process that owns the dof. (Alternatively, but more slowly, `u0.eval` could simply be called collectively.) A major issue for performance is that all points not found on one process are collected in a vector and passed on to the next process through just one single MPI::send_recv: all points not found locally are sent, and the points not found by the process with one lower rank are received (a sketch of this ring exchange follows after step 4).
> >>
> >> 4) Finally, set all values in u using the dof-to-component map.
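The ring exchange in step 3 I picture roughly like the following stripped-down mpi4py sketch (my own illustration, not the code attached to the issue; it assumes a scalar space, that a failed local eval raises a RuntimeError, and that every point is eventually found on some rank):

  # Untested sketch of the ring exchange; not the fenicstools implementation.
  import numpy as np
  from mpi4py import MPI

  comm = MPI.COMM_WORLD
  rank, size = comm.rank, comm.size

  def evaluate_with_ring_search(u0, points):
      # Evaluate u0 at the dof coordinates 'points' owned by this rank, forwarding
      # the points that are not found locally around the ring of processes
      todo = [(rank, i, x) for i, x in enumerate(points)]  # (owner, index on owner, coords)
      results = {}                                         # (owner, index) -> value
      dest, src = (rank + 1) % size, (rank - 1) % size
      for _ in range(size):
          not_found = []
          for owner, i, x in todo:
              try:
                  results[(owner, i)] = u0(*x)  # succeeds only if x is inside the local mesh
              except RuntimeError:
                  not_found.append((owner, i, x))
          # One single send/receive per round: pass what we could not find to the next
          # rank and receive what the previous rank could not find
          todo = comm.sendrecv(not_found, dest=dest, source=src)
      # Return each value to the rank that owns the dof
      outgoing = [[] for _ in range(size)]
      for (owner, i), v in results.items():
          outgoing[owner].append((i, v))
      values = np.zeros(len(points))
      for batch in comm.alltoall(outgoing):
          for i, v in batch:
              values[i] = v
      return values

On average each point is tried on (N+1)/2 of the N partitions before it is found, which is where the estimate below comes from.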
> >> The problem is inherently unscalable, since the efficiency depends on how the two different meshes are distributed. If they happen to overlap a lot, the routine may be faster on 2 CPUs than on one, but in the end that depends on the mesh partitioning, which you cannot really control, at least not with the default mesh partitioner. Consider running a problem on 2 CPUs. Here it should be quite obvious that each dof must be searched for 1.5 times on average (half are found on this CPU and the rest on the other). Since each process' mesh is half the size of the original, the total cost is approximately 1.5*0.5 = 0.75. So, not counting any cost of MPI communication, you should expect to reduce the computational time to 75% of the original. However, the MPI communication is extensive (half the mesh is sent back and forth), so you should honestly not expect any speedup at all. The estimate for N CPUs is similarly (N+1)/(2N) + MPI communication.
> >>
> >> If you have any comments on the implementation or, even better, suggestions on how to improve it, then please let me know by following up on this mail. If you like the current solution and would like to see it implemented in dolfin soon, then please vote for issue 162 on bitbucket :-)
> >>
> >> Meanwhile, the `interpolate_nonmatching_mesh` function is available at https://github.com/mikaem/fenicstools
> >>
> >> Best regards
> >>
> >> Mikael

_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
