[FEniCS] Improvement to assembly performance with a minor regression

Martin Sandve Alnæs Fri, 09 May 2014 06:27:42 -0700

Hi all,
I've implemented selective local evaluation of coefficient functions in the
assembler, so that each integral only evaluates the coefficients it actually
depends on. It's currently in branches called
martinal/topic-add-enabled-coefficients-per-integral
in ufl, ffc and dolfin (they must be used together).
Note that this changes the ufc interface, so everything must be recompiled.
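
As a rough illustration of the idea (not the actual ufc/dolfin code), here is
a toy sketch in plain Python. The classes Coefficient and Integral, the
per-integral "enabled" list and the function assemble_scalar are all
hypothetical names made up for this example; the only point is that a
coefficient is restricted to a cell only if the integral being executed has
it enabled.

```python
# Toy model only: these classes are hypothetical and are NOT the ufc/dolfin API.

class Coefficient(object):
    """Stand-in for a coefficient function; counts how often it is restricted."""
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.restrict_calls = 0

    def restrict(self, cell):
        # A real assembler would gather the local expansion coefficients on
        # the cell here; this toy version just returns a constant.
        self.restrict_calls += 1
        return self.value


class Integral(object):
    """Stand-in for a generated integral with one enabled flag per coefficient."""
    def __init__(self, subdomain, enabled, kernel):
        self.subdomain = subdomain  # None means "everywhere"
        self.enabled = enabled      # list of bools, one per form coefficient
        self.kernel = kernel        # callable(w) -> local contribution


def assemble_scalar(cells, coefficients, integrals):
    """Sum local contributions, restricting only the enabled coefficients."""
    total = 0.0
    for cell, marker in cells:
        for integral in integrals:
            if integral.subdomain is not None and integral.subdomain != marker:
                continue  # this integral does not live on this cell
            w = [c.restrict(cell) if on else None
                 for c, on in zip(coefficients, integral.enabled)]
            total += integral.kernel(w)
    return total


f = Coefficient("f", 2.0)
g = Coefficient("g", 5.0)
cells = [(i, 0) for i in range(4)]  # four cells, all with subdomain marker 0

# Mimics a form like f*dx() + g*dx(1): no cell has marker 1, so the second
# integral never runs and g is never restricted.
integrals = [Integral(None, [True, False], lambda w: w[0]),
             Integral(1, [False, True], lambda w: w[1])]
result = assemble_scalar(cells, [f, g], integrals)
```

In this toy run f is restricted once per cell while g is never touched, which
is the saving the branches aim for in the real assembler.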


To show the performance improvement, here's a simple benchmark script that
assembles two forms, a and b. Form a depends on one coefficient (f) and form
b on two (f and g), but both yield exactly the same integral and assembly
result when assembled without any subdomains (the dx(1) term in form b is
never executed). Each form is assembled twice for semi-robust timing, and I
ran the script once beforehand to keep the jit out of the picture.
(Performance numbers are below the code.)


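from __future__ import print_function  # same output on Python 2 and 3
from dolfin import *
import time

# P1 function space on a 60 x 60 x 60 unit cube mesh
n = 60
mesh = UnitCubeMesh(n, n, n)
V = FunctionSpace(mesh, "Lagrange", 1)
f = Function(V)
g = Function(V)

# Form a uses only f. Form b also has a g term on subdomain 1, which is
# never executed because no subdomains are marked.
a = f*dx()
b = f*dx() + g*dx(1)

# Assemble form a twice and time each call
t1 = time.time()
A1 = assemble(a)
t2 = time.time()
A2 = assemble(a)
t3 = time.time()

print("A1:", (t2-t1))
print("A2:", (t3-t2))

# Assemble form b twice and time each call
t1 = time.time()
B1 = assemble(b)
t2 = time.time()
B2 = assemble(b)
t3 = time.time()

print("B1:", (t2-t1))
print("B2:", (t3-t2))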
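

The repeated timing pattern in the script can also be factored into a small
helper. This is only a sketch: time_calls is a hypothetical name, not part of
dolfin, and the workload below is a plain Python stand-in so that the snippet
runs without dolfin installed.

```python
import time


def time_calls(func, repeats=2):
    """Call func() `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.time()
        func()
        durations.append(time.time() - start)
    return durations


# Stand-in workload; with dolfin one would pass e.g. lambda: assemble(a)
timings = time_calls(lambda: sum(range(100000)), repeats=3)
best = min(timings)  # the smallest value is the least disturbed by noise
```

Reporting every duration, as the script does, makes it easy to spot a slow
first call caused by caching or jit compilation.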


Resulting time to assemble with current master:

A1: 0.467525005341
A2: 0.465034008026
B1: 0.882906198502
B2: 0.830652952194

Note how the additional coefficient in form b nearly doubles the assembly
time for this simple functional (about 0.83 to 0.88 s versus 0.47 s), even
though it's never used in the computations.

The time to assemble with the new branches:

A1: 0.531542062759
A2: 0.530611991882
B1: 0.540424108505
B2: 0.535769939423

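Note two things:
- The performance is a bit lower for the simple case, form a (about 0.53 s
versus 0.47 s, roughly 14% slower). It might be possible to optimize this.
- The performance is now the same for both cases. Form b is significantly
faster than on master (about 0.54 s versus 0.83 to 0.88 s) because the
function g is never restricted.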
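
The relative changes can be checked directly from the timings quoted above.
A minimal sketch; the dictionary layout and the choice to average the two
runs per form are my own assumptions for the illustration:

```python
# Timings in seconds, copied from the two result listings in this mail
master = {"a": [0.467525005341, 0.465034008026],
          "b": [0.882906198502, 0.830652952194]}
branch = {"a": [0.531542062759, 0.530611991882],
          "b": [0.540424108505, 0.535769939423]}


def mean(values):
    """Arithmetic mean of a list of timings."""
    return sum(values) / float(len(values))


# Ratio of new to old time per form: above 1 is a regression, below 1 a speedup
ratios = dict((form, mean(branch[form]) / mean(master[form]))
              for form in ("a", "b"))

# Overhead of the unused coefficient on master: time for b relative to a
master_overhead = mean(master["b"]) / mean(master["a"])
```

With these numbers the ratio for form a comes out a little above 1 (the
regression) and the ratio for form b well below 1 (the improvement).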


The cases that will benefit from this feature performance-wise are forms
with two or more integrals involving different coefficients.

The cases that will have a small regression performance-wise are forms with
only one integral, with no coefficients, or where all integrals use the
same coefficients. The relative performance regression is most noticeable
for simple forms such as mass and stiffness matrices.

There are multiple future features that depend on this functionality:
- it allows for functions that cannot be evaluated everywhere to be called
only in their valid domain (examples are functions only living on
subdomains, a partially overlapping mesh, or the boundary).
- possible refactoring of preprocessing in ufl to reduce the amount of
symbolic processing done for forms that are already in the jit cache.

The functionality is obviously highly beneficial, so is it ok if I push it
now even with the performance regression for simple forms?

Martin

