On 2013-12-05 08:57, Mikael Mortensen wrote:
On Dec 4, 2013, at 6:58 PM, Garth N. Wells wrote:
On 2013-12-04 17:15, Anders Logg wrote:
On Tue, Dec 03, 2013 at 10:12:22AM +0100, Mikael Mortensen wrote:
On Dec 2, 2013, at 9:17 PM, Anders Logg wrote:
> I don't see a simple way around this (other than inventing a new
> algorithm - the only way I can think of is to create a third data
> structure with a regular grid predictively distributed onto the
> different processors and then communicate via that grid) so I don't
> mind this being added.
Great
I could probably create a very inefficient parallel interpolation
routine by looping over all partitions of the distributed mesh1, one
by one, collectively (i.e., broadcast mesh1's partition on process 0
to all, run collectively over it so that u0.eval or evaluate_dofs can
be called collectively, and when finished move on to the partition on
process 1). That could work for any space, I guess?
I meant communicating via a third independent grid constructed in a
way such that each cell only needs to communicate with a subset of
that third grid.
I would create a bounding box(es) of the mesh on each process, and
then build a search tree for the boxes. Then do
That's a good idea if I understand correctly. Use all_gathered
bounding boxes for the mesh partitions to find the process that owns
the interpolation point
The bounding boxes can be used to determine which processes *might* own
a particular point. In the simple case, each process could store a copy
of all bounding boxes. For really large problems, we might want a more
elaborate distributed tree design.
and then only call u0.eval after found.
The function call u0.eval(p) should ask the process that might own the
point 'p' to do the eval, and to send back the result.
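Roughly along these lines, perhaps (a sketch only; it assumes
Mesh::coordinates() returns the flattened local vertex coordinates, uses plain
MPI_Allgather rather than the dolfin::MPI wrappers, and the function names are
made up):

#include <dolfin.h>
#include <mpi.h>
#include <algorithm>
#include <limits>
#include <vector>

// Gather the local bounding box of every rank so that each process can
// guess which ranks *might* own a given point.
std::vector<double> gather_bounding_boxes(const dolfin::Mesh& mesh)
{
  const std::size_t gdim = mesh.geometry().dim();
  const std::vector<double>& x = mesh.coordinates();

  // Local box stored as [min_0, ..., min_{gdim-1}, max_0, ..., max_{gdim-1}]
  std::vector<double> box(2*gdim);
  for (std::size_t j = 0; j < gdim; ++j)
  {
    box[j]        =  std::numeric_limits<double>::max();
    box[gdim + j] = -std::numeric_limits<double>::max();
  }
  for (std::size_t i = 0; i < x.size()/gdim; ++i)
    for (std::size_t j = 0; j < gdim; ++j)
    {
      box[j]        = std::min(box[j],        x[i*gdim + j]);
      box[gdim + j] = std::max(box[gdim + j], x[i*gdim + j]);
    }

  // Every rank ends up with all num_processes boxes
  int num_processes = 0;
  MPI_Comm_size(MPI_COMM_WORLD, &num_processes);
  std::vector<double> all_boxes(2*gdim*num_processes);
  MPI_Allgather(&box[0], static_cast<int>(2*gdim), MPI_DOUBLE,
                &all_boxes[0], static_cast<int>(2*gdim), MPI_DOUBLE,
                MPI_COMM_WORLD);
  return all_boxes;
}

// Ranks whose bounding box contains p -- candidates, not guaranteed owners
std::vector<int> candidate_owners(const dolfin::Point& p, std::size_t gdim,
                                  const std::vector<double>& all_boxes)
{
  std::vector<int> candidates;
  for (std::size_t r = 0; r < all_boxes.size()/(2*gdim); ++r)
  {
    bool inside = true;
    for (std::size_t j = 0; j < gdim; ++j)
      inside = inside && p[j] >= all_boxes[2*gdim*r + j]
                      && p[j] <= all_boxes[2*gdim*r + gdim + j];
    if (inside)
      candidates.push_back(r);
  }
  return candidates;
}

The eval request for the point would then only be sent to the candidate ranks
rather than to everybody.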
I can
then reuse in_boundary_box++ from PeriodicBoundaryComputation, right?
I'll have a closer look.
Maybe. Most likely it should be spun out as a separate class. I think
you could register this as a task issue on Bitbucket.
You sure this isn't handled automatically by bounding box trees
created the first time u0.eval is called?
for (CellIterator cell(*_mesh); !cell.end(); ++cell)
{
  // If interpolation point is on this process, do as we do now
  // If interpolation point is *not* on this process, insert
  // into cache to be fetched
}
// Call function to retrieve non-local data. This function should use
// the search tree.
Not quite sure I follow. AFAIK it is still not possible to do it as
before, since restrict_as_ufc_function calls element.evaluate_dofs and
thus the entire cell must be on both processes, not just the
interpolation point. I would follow if the above were an iteration over
interpolation points, though, like it is in my code already.
Yes, I'm thinking in terms of iteration over interpolation points.
For Lagrange elements there is really no reason to do the cell loop.
For P1 yes, for higher order it may be necessary. It could be made
efficient by going over all mesh entities.
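For a scalar Lagrange space, the point loop could look roughly like this (a
sketch only; it assumes tabulate_all_coordinates returns the coordinates of
the locally owned dofs flattened as [x0, y0, (z0), x1, ...], and it simply
ignores the points that are not found on this process):

#include <dolfin.h>

using namespace dolfin;

// One eval per owned dof instead of one eval per dof per cell.
// Sketch for a *scalar* Lagrange space; vector/mixed spaces would also
// need a map from dof number to component.
void interpolate_lagrange_points(const Function& u0, Function& u,
                                 const FunctionSpace& V1, const Mesh& mesh1)
{
  const std::size_t gdim = mesh1.geometry().dim();

  // Coordinates of the dofs owned by this process, flattened
  std::vector<double> x = V1.dofmap()->tabulate_all_coordinates(mesh1);
  std::vector<double> local_values(x.size()/gdim);

  Array<double> values(1);
  for (std::size_t i = 0; i < x.size()/gdim; ++i)
  {
    Array<double> point(gdim, &x[i*gdim]);

    // Only works if the point lies inside the local part of u0's mesh;
    // otherwise it has to go into a "not found" list and be shipped to
    // another process.
    u0.eval(values, point);
    local_values[i] = values[0];
  }

  u.vector()->set_local(local_values);
  u.vector()->apply("insert");
}

The points that are not found locally are of course the whole difficulty, and
that is where the bounding box search and the communication come in.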
Garth
I forgot to mention in the previous mail that the nonmatching routine
attached to the issue is 4-5 (!!!) times faster in 3D than the regular
interpolate for P1 elements. That is because the loop is over vertices
and not over cells, so there are far fewer callbacks to eval: just one
per dof, which is optimal.
Mikael
The constrained/periodic bc code uses bounding boxes to locate
constrained vertices/edges/facets that are off-process. It doesn't use
a tree for searching, but it should. Some abstract function for
finding the off-process owner of a point or mesh entity would be very
helpful.
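Just to make that concrete, such a helper might look something like this (the
names are invented here; nothing like this exists in DOLFIN yet):

// Hypothetical interface only, for discussion
class OffProcessPointOwner
{
public:

  // Gather per-process bounding boxes (or build a distributed tree)
  explicit OffProcessPointOwner(const Mesh& mesh);

  // Ranks whose bounding box contains p (candidates, not guaranteed owners)
  std::vector<unsigned int> candidate_owners(const Point& p) const;

  // Collective call: resolve the rank that actually contains each point,
  // or -1 if the point lies outside the global mesh
  std::vector<int> find_owners(const std::vector<Point>& points) const;
};

Both the periodic/constrained bc code and the nonmatching interpolation could
then share it.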
Garth
--
Anders
> Since not all elements are supported, I think this function should
> have a separate name for now, perhaps
>
> interpolate_lagrange()
>
> My suggestion would be that the documentation explains that this
> function works in parallel for non-matching meshes (in contrast to the
> regular interpolate) but only supports Lagrange spaces.
Sounds good to me.
Mikael
>
>> Hi,
>>
>> This is in reference to Issue 162:
>> https://bitbucket.org/fenics-project/dolfin/issue/162/interpolation-nonmatching-mesh-in-parallel
>>
>> Currently it is not possible in parallel to interpolate a dolfin Function
>> u0 defined on one mesh to a Function in a FunctionSpace of the same type
>> (same element family and degree) defined on a different mesh. I would like
>> to be able to do this:
>>
>> mesh0 = UnitSquareMesh(100, 100)
>> mesh1 = UnitSquareMesh(20, 20)
>> V0 = FunctionSpace(mesh0, "CG", 1)
>> V1 = FunctionSpace(mesh1, "CG", 1)
>> u0 = interpolate(Expression("x[0]*x[1]"), V0)
>> u0.update()
>> u = interpolate_nonmatching_mesh(u0, V1)
>>
>> or probably overloaded like this
>>
>> u = interpolate(u0, V1)
>>
>> Now there are probably many different ways this can be implemented. The most
>> obvious approach that comes to mind is simply to overload `Function::eval`
>> with a parallel version. There are many reasons why this doesn’t quite seem
>> to work. Consider the current `interpolate` implementation
>>
>> FunctionSpace::interpolate(GenericVector& expansion_coefficients,
>>                            const GenericFunction& v)
>> …
>> // Iterate over mesh and interpolate on each cell
>> UFCCell ufc_cell(*_mesh);
>> for (CellIterator cell(*_mesh); !cell.end(); ++cell)
>> {
>>   // Update to current cell
>>   ufc_cell.update(*cell);
>>
>>   // Restrict function to cell
>>   v.restrict(&cell_coefficients[0], *_element, *cell, ufc_cell);
>>
>>   // Tabulate dofs
>>   const std::vector<dolfin::la_index>& cell_dofs
>>     = _dofmap->cell_dofs(cell->index());
>>
>>   // Copy dofs to vector
>>   expansion_coefficients.set(&cell_coefficients[0],
>>                              _dofmap->cell_dimension(cell->index()),
>>                              &cell_dofs[0]);
>> }
>>
>> The cell loop runs on all processors for the part of the mesh of V1 that
>> lives on that process. `v.restrict` calls `restrict_as_ufc_function` since
>> V0 and V1 have different meshes, and `restrict_as_ufc_function` calls
>> `element.evaluate_dofs`, which in turn calls `u0.eval` for each dof on the
>> cell. Now the major problem here, the one that makes it difficult to extend
>> this to parallel, is that the cell iteration is not collective and `eval`
>> is not called collectively. Since the coordinates of dofs on a cell in the
>> V1-mesh could live on any other cpu for the V0-mesh, `eval` needs to be
>> called collectively with some try/catch clause and with a lot of
>> broadcasting (of coordinates and result) to obtain what is required.
>> Alternatively, the non-collective `eval` that fails must be captured and
>> handled using MPI later.
>>
>> Faced with these problems, my solution has instead been to create a
>> stand-alone version for parallel interpolation on nonmatching meshes, not
>> trying to overload `eval` or fit it into the current `interpolate`
>> framework. The code that performs the interpolation is attached to issue
>> 162. The code is fairly well documented and should, hopefully, be possible
>> to follow. The algorithm is basically like this:
>>
>> 1) On each process, tabulate all coordinates for the dofs in the function
>> space we’re interpolating to, i.e., V1 in the above code.
>>
>> x = V1.dofmap()->tabulate_all_coordinates(mesh1)
>>
>> 2) Create a map from dof number in V1 to component number in Mixed Space.
>>
>> This is necessary to be able to handle mixed spaces. I do not think I have
>> found the best solution to this problem, so any help here would be much
>> appreciated. I just realized that the current solution does not work
>> for special spaces like BDM or RT.
>>
>> 3) Compute `u0.eval` for all points x computed in 1).
>>
>> The problem here is that u0 and u have different meshes, so a point in u
>> will not necessarily, or even likely, be found on the same processor for
>> u0. Hence the dof coordinates must be passed around and searched on all
>> ranks until found. When found, the result is returned to the process that
>> owns the dof. (Alternatively, but slower, `u0.eval` could simply be called
>> collectively.) A major issue for performance is that all points not found
>> on one process are collected in a vector and passed on to the next process
>> through just one single MPI::send_recv: all points not found locally are
>> sent, and the points not found by the process with one lower rank are
>> received. (A rough sketch of this ring pass follows after this list.)
>>
>> 4) Finally, set all values in u using the dof to component map.
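The ring pass in step 3 could be sketched roughly as below (schematic only,
written with plain MPI calls; the attached code differs in the details, and
ring_pass and try_local_eval are made-up names, with the return of the found
values to the owning rank left out):

#include <mpi.h>
#include <vector>

// Hypothetical helper: evaluate u0 at whichever of the given points lie on
// the local mesh partition (stashing the results for later return to the
// owning rank) and return the coordinates that were not found.
std::vector<double> try_local_eval(const std::vector<double>& points);

// Step 3, schematically: pass the locally unfound points one rank to the
// "right", size-1 times, so every point visits every process at most once.
void ring_pass(std::vector<double>& not_found)
{
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  const int dest   = (rank + 1) % size;         // send unfound points here
  const int source = (rank - 1 + size) % size;  // receive from one rank lower

  for (int round = 0; round < size - 1; ++round)
  {
    // Exchange sizes first so the receive buffer can be allocated
    const int send_count = static_cast<int>(not_found.size());
    int recv_count = 0;
    MPI_Sendrecv(&send_count, 1, MPI_INT, dest, 0,
                 &recv_count, 1, MPI_INT, source, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    std::vector<double> incoming(recv_count);
    MPI_Sendrecv(not_found.data(), send_count, MPI_DOUBLE, dest, 1,
                 incoming.data(), recv_count, MPI_DOUBLE, source, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // Whatever this rank cannot evaluate either is handed on next round
    not_found = try_local_eval(incoming);
  }
}

Returning the found values to the rank that owns the dof, and inserting them
at the right dof indices, is the bookkeeping that is left out here.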
>>
>>
>> The problem is inherently unscalable since the efficiency depends on the
>> distribution of the two different meshes. If they happen to overlap a lot,
>> then the routine may be faster with 2 CPUs than with one, but in the end
>> that depends on the mesh partitioning, which you cannot really control, at
>> least not just using the default mesh partitioner. Consider running a
>> problem with 2 CPUs. Here it should be quite obvious that each dof must be
>> searched for on average 1.5 times (half are found on this CPU and the rest
>> on the other). Since each process's mesh is half the size of the original,
>> the total cost is then approximately 1.5*0.5 = 0.75. So, not counting any
>> cost of MPI communication, you should expect to reduce the computational
>> time to 75% of the original. However, the MPI communications are extensive
>> (half the mesh is sent back and forth), and you should honestly not expect
>> any speedup at all. The estimate for N CPUs is similarly (N+1)/(2N) plus
>> MPI communications.
>>
>> If you have any comments on the implementation or, even better, suggestions
>> on how to improve it, then please let me know by following up on this mail.
>> If you like the current solution and you’d like to see this implemented
>> soon in dolfin, then please vote for issue 162 on Bitbucket :-)
>>
>> Meanwhile, the `interpolate_nonmatching_mesh` function is available on
>> https://github.com/mikaem/fenicstools
>>
>> Best regards
>>
>> Mikael
>>
>>
>