Hi,

I have now pushed a branch on Bitbucket for interpolation on non-matching 
meshes: 
https://bitbucket.org/fenics-project/dolfin/branch/mikael/lagrange-interpolation

Since the interpolation only works for Lagrange elements, it didn't feel right 
to add the routines to the existing Function class, so I have created a new 
class *LagrangeInterpolator* for performing this interpolation. The class can 
interpolate both Expressions and Functions defined on a different mesh.
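
In short, the usage pattern is the same as in the script at the bottom of 
this mail (here u0 is a Function or Expression, and u1 a Function in the 
target space V1, possibly on another mesh):

    interpolator = LagrangeInterpolator()
    u1 = Function(V1)
    interpolator.interpolate(u0, u1)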

The absolutely fabulous thing about this class is its performance. Take a look 
at these numbers comparing the regular interpolate with LagrangeInterpolator 
for interpolating a Python Expression that has an overloaded eval (see script 
below):

[mikael@ubuntu test]$ mpirun -np 2 python test_lagrange.py
Number of global vertices: 343
Number of global cells: 1296
Number of global vertices: 1331
Number of global cells: 6000

Testing interpolation on UnitCubeMesh

FunctionSpace:

    Regular interpolate:
     5184 calls to eval using a total of 0.8027 seconds 
    LagrangeInterpolator:
      343 calls to eval using a total of 0.0592 seconds 

VectorFunctionSpace:

    Regular interpolate:
    15552 calls to eval using a total of 2.4001 seconds 
    LagrangeInterpolator:
      343 calls to eval using a total of 0.0582 seconds 

TensorFunctionSpace:

    Regular interpolate:
    46656 calls to eval using a total of 7.1846 seconds 
    LagrangeInterpolator:
      343 calls to eval using a total of 0.0728 seconds 

MixedSpace:

    Regular interpolate:
    20736 calls to eval using a total of 3.2043 seconds 
    LagrangeInterpolator:
      384 calls to eval using a total of 0.0679 seconds 


There is a factor 100 speedup for interpolating an Expression into a 
TensorFunctionSpace :-) The major reason is the number of callbacks to the 
Python eval: the regular interpolate calls eval once for every dof of every 
cell (e.g. 1296 cells x 4 vertex dofs = 5184 calls for the scalar CG1 space), 
whereas LagrangeInterpolator only calls it once per interpolation point, i.e. 
once per vertex (343 calls).

Please have a look at the branch and let me know what you think. If there are 
no major objections I'll make a pull request soon.

Best regards

Mikael


Here is the script used to generate the results:

from dolfin import *
from numpy import arange
import time

# Test LagrangeInterpolator for some FunctionSpaces

count = 0  # global counter for the number of calls to eval

class MySlowExp(Expression):
    """Expression with N constant components and a deliberately slow eval."""

    def __init__(self, N, **kwargs):
        self.N = N

    def eval(self, values, x):
        global count
        count += 1
        time.sleep(0.0001)  # emulate an expensive eval
        values[:] = arange(self.N) + 1

mesh = UnitCubeMesh(6, 6, 6)
mesh1 = UnitCubeMesh(10, 10, 10)
Q = FunctionSpace(mesh, 'CG', 1)
V = VectorFunctionSpace(mesh, 'CG', 1)
S = TensorFunctionSpace(mesh, 'CG', 1)
Q1 = FunctionSpace(mesh1, 'CG', 1)
V1 = VectorFunctionSpace(mesh1, 'CG', 1)
S1 = TensorFunctionSpace(mesh1, 'CG', 1)
W = V * Q
W1 = V1 * Q1

mpi_comm = mpi_comm_world()
my_rank = MPI.rank(mpi_comm)

interpolator = LagrangeInterpolator()

if my_rank == 0:
    print
    print "Testing interpolation on UnitCubeMesh"
    print
    print "FunctionSpace:"
    print

t0 = time.time()
v0 = interpolate(MySlowExp(1, element=Q.ufl_element()), Q)
count = MPI.sum(mpi_comm, count)
if my_rank == 0:
    print "    Regular interpolate:"
    print "    {:5d} calls to eval using a total of {:2.4f} seconds 
".format(count, time.time()-t0)

count = 0    
t0 = time.time()
v1 = Function(Q)
interpolator.interpolate(MySlowExp(1, element=Q.ufl_element()), v1)
count = MPI.sum(mpi_comm, count)
if my_rank == 0:
    print "    LagrangeInterpolator:"
    print "    {:5d} calls to eval using a total of {:2.4f} seconds 
".format(count, time.time()-t0)

v2 = Function(Q1)
interpolator.interpolate(v0, v2)
vn0 = assemble(v0*v0*dx)
vn1 = assemble(v2*v2*dx)
assert(abs(vn0-vn1) < 1e-10)

if my_rank == 0:
    print
    print "VectorFunctionSpace:"
    print
count = 0    
t0 = time.time()
v0 = interpolate(MySlowExp(3, element=V.ufl_element()), V)
count = MPI.sum(mpi_comm, count)
if my_rank == 0:
    print "    Regular interpolate:"
    print "    {:5d} calls to eval using a total of {:2.4f} seconds 
".format(count, time.time()-t0)

count = 0    
t0 = time.time()
v1 = Function(V)
interpolator.interpolate(MySlowExp(3, element=V.ufl_element()), v1)
count = MPI.sum(mpi_comm, count)
if my_rank == 0:
    print "    LagrangeInterpolator:"
    print "    {:5d} calls to eval using a total of {:2.4f} seconds 
".format(count, time.time()-t0)

v2 = Function(V1)
interpolator.interpolate(v0, v2)
vn0 = assemble(dot(v1, v1)*dx)
vn1 = assemble(dot(v2, v2)*dx)
assert(abs(vn0-vn1) < 1e-10)

if my_rank == 0:
    print
    print "TensorFunctionSpace:"
    print

count = 0    
t0 = time.time()
v0 = interpolate(MySlowExp(9, element=S.ufl_element()), S)
count = MPI.sum(mpi_comm, count)
if my_rank == 0:
    print "    Regular interpolate:"
    print "    {:5d} calls to eval using a total of {:2.4f} seconds 
".format(count, time.time()-t0)

count = 0    
t0 = time.time()
v1 = Function(S)
interpolator.interpolate(MySlowExp(9, element=S.ufl_element()), v1)
count = MPI.sum(mpi_comm, count)
if my_rank == 0:
    print "    LagrangeInterpolator:"
    print "    {:5d} calls to eval using a total of {:2.4f} seconds 
".format(count, time.time()-t0)

v2 = Function(S1)
interpolator.interpolate(v0, v2)
vn0 = assemble(inner(v0, v0)*dx)
vn1 = assemble(inner(v2, v2)*dx)
assert(abs(vn0-vn1) < 1e-10)

if my_rank == 0:
    print
    print "MixedSpace:"
    print

count = 0    
t0 = time.time()
v0 = interpolate(MySlowExp(4, element=W.ufl_element()), W)
count = MPI.sum(mpi_comm, count)
if my_rank == 0:
    print "    Regular interpolate:"
    print "    {:5d} calls to eval using a total of {:2.4f} seconds 
".format(count, time.time()-t0)

count = 0    
t0 = time.time()
v1 = Function(W)
interpolator.interpolate(MySlowExp(4, element=W.ufl_element()), v1)
count = MPI.sum(mpi_comm, count)
if my_rank == 0:
    print "    LagrangeInterpolator:"
    print "    {:5d} calls to eval using a total of {:2.4f} seconds 
".format(count, time.time()-t0)

v2 = Function(W1)
interpolator.interpolate(v0, v2)
vn0 = assemble(inner(v0, v0)*dx)
vn1 = assemble(inner(v2, v2)*dx)
assert(abs(vn0-vn1) < 1e-10)

On 12 Dec 2013, at 16:36, Mikael Mortensen <[email protected]> wrote:

> 
> On Dec 6, 2013, at 6:28 PM, Garth N. Wells wrote:
> 
>> On 2013-12-05 08:57, Mikael Mortensen wrote:
>>> On Dec 4, 2013, at 6:58 PM, Garth N. Wells wrote:
>>>> On 2013-12-04 17:15, Anders Logg wrote:
>>>>> On Tue, Dec 03, 2013 at 10:12:22AM +0100, Mikael Mortensen wrote:
>>>>>> On Dec 2, 2013, at 9:17 PM, Anders Logg wrote:
>>>>>>> I don't see a simple way around this (other than inventing a new
>>>>>>> algorithm - the only way I can think of is to create a third data
>>>>>>> structure with a regular grid predictively distributed onto the
>>>>>>> different processors and then communicate via that grid) so I don't
>>>>>>> mind this being added.
>>>>>> Great
>>>>>> I could probably create a very inefficient parallel interpolation
>>>>>> routine by looping over all partitions of the distributed mesh1, one
>>>>>> by one, collectively (i.e., broadcast mesh1's partition on process 0
>>>>>> to all, then run collectively over it, allowing us to call u0.eval
>>>>>> or evaluate_dofs collectively; when finished, move on to the
>>>>>> partition on process 1). That could work for any space, I guess?
>>>>> I meant communicating via a third independent grid constructed in a
>>>>> way such that each cell only needs to communicate with a subset of
>>>>> that third grid.
>>>> I would create a bounding box(es) of the mesh on each process, and then 
>>>> build a search tree for the boxes. Then do
>>> That's a good idea if I understand correctly. Use all_gathered
>>> bounding boxes for the mesh partitions to find the process that owns
>>> the interpolation point
>> 
>> The bounding boxes can be used to determine which processes *might* own a 
>> particular point. In the simple case, each process could store a copy of all 
>> bounding boxes. For really large problems, we might want a more elaborate 
>> distributed tree design.
> 
> I have implemented this now. There is a significant speed up when using a 
> large number of processors (I've tried up to 32).
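> 
> The gist of the scheme, in a rough mpi4py/numpy sketch (just the idea, not 
> the actual C++ code; the coordinates below are a stand-in for the local 
> mesh partition):
> 
> from mpi4py import MPI
> import numpy as np
> 
> comm = MPI.COMM_WORLD
> 
> # Stand-in for the vertex coordinates of this process' mesh partition
> coords = np.random.rand(100, 3)
> 
> # Axis-aligned bounding box of the local partition
> local_box = (coords.min(axis=0), coords.max(axis=0))
> 
> # Every process keeps a copy of all bounding boxes
> all_boxes = comm.allgather(local_box)
> 
> def candidate_owners(p):
>     """Ranks whose bounding box contains p and hence *might* own the point."""
>     return [rank for rank, (lo, hi) in enumerate(all_boxes)
>             if np.all(p >= lo) and np.all(p <= hi)]
> 
> # Points not found in any local cell are sent only to their candidate
> # owners, which evaluate u0 there and return the values.
> print comm.Get_rank(), candidate_owners(np.array([0.5, 0.5, 0.5]))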
> 
>> 
>>> and then only call u0.eval after found.
>> 
>> The function call u0.eval(p) should ask the process that might own the point 
>> 'p' to do the eval, and to send back the result.
>> 
>>> I can
>>> then reuse in_bounding_box etc. from PeriodicBoundaryComputation, right?
>>> I'll have a closer look.
>> 
>> Maybe. Most likely it should be spun out as a separate class. I think you 
>> could register this as a task/issue on Bitbucket.
> 
> In the current implementation I use both the lt_coordinate struct and 
> in_bounding_box from PeriodicBoundaryComputation. Not sure how this should be 
> organized in a separate class though. 
> 
> There is a considerable speed-up for mixed spaces if I create a map from 
> interpolation points to the dofs living at each point (e.g., a 
> VectorFunctionSpace has three dofs at each point in 3D). This reduces the 
> calls to eval to one per interpolation point and lowers the MPI traffic. 
> However, the std::map is slower than a regular std::vector for single 
> spaces. I'm not sure yet what the best way to combine the two is. Any 
> suggestions are most welcome.
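> 
> Roughly, the map amounts to this (a plain Python sketch of the data 
> structure only; the real code uses an std::map with the lt_coordinate 
> struct mentioned above, and the coordinates below are made up):
> 
> # In 3D, a vector CG1 space has three dofs sitting at each vertex
> dof_coordinates = [(0.0, 0.0, 0.0)] * 3 + [(1.0, 0.0, 0.0)] * 3
> 
> points_to_dofs = {}
> for dof, x in enumerate(dof_coordinates):
>     key = tuple(round(xi, 12) for xi in x)  # rounded so equal points compare equal
>     points_to_dofs.setdefault(key, []).append(dof)
> 
> # eval is then called once per key, and the returned components are
> # scattered to the dofs stored under that key
> print points_to_dofs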
> 
> I need to clean up the code a bit, but the current version is attached to 
> issue 162.
> 
> Mikael
> 
>> 
>>> You sure this isn't handled automatically by bounding box trees
>>> created the first time u0.eval is called?
>>>> for (CellIterator cell(*_mesh); !cell.end(); ++cell)
>>>> {
>>>>   // If interpolation point is on this process, do as we do now
>>>>   // If interpolation point is *not* on this process, insert
>>>>   // into cache to be fetched
>>>> }
>>>> // Call function to retrieve non-local data. This function should use
>>>> // the search tree.
>>> Not quite sure I follow. AFAIK it is still not possible to do as
>>> before since restrict_as_ufc_function calls element.evaluate_dofs and
>>> as such the entire cell must be on both processes, not just the
>>> interpolation point. I would follow if the above was an iteration over
>>> interpolation points, though, like it is in my code already.
>> 
>> 
>> Yes, I'm thinking in terms of iteration over interpolation points.
>> 
>>> For Lagrange elements there is really no reason to do the cell loop.
>> 
>> For P1 yes, for higher order it may be necessary. It could be made efficient 
>> by going over all mesh entities.
>> 
>> Garth
>> 
