Re: [FEniCS] Improvement to assembly performance with a minor regression

Anders Logg Fri, 09 May 2014 12:56:44 -0700

On Fri, May 09, 2014 at 03:27:20PM +0200, Martin Sandve Alnæs wrote:
> Hi all,
> I've implemented selective local evaluation of coefficient functions in the
> assembler, depending on which functions each integral depends on. It's
> currently in branches called
> martinal/topic-add-enabled-coefficients-per-integral
> in ufl, ffc and dolfin (must be used together).
> Note that this changes the ufc interface, so everything must be recompiled.
>
> To show the performance improvement, here's a simple benchmark script
> assembling two forms, a and b. Form a depends on one coefficient (f) and form
> b on two (f and g), but both yield the exact same integral and assembly result
> when assembled without any subdomains (the dx(1) term in form b is never
> executed). Each form is assembled twice for semi-robust timing, and I ran the
> script once beforehand to keep the jit out of the picture. (Performance
> numbers are below the code.)
>
>
> from dolfin import *
> import time
>
> n = 60
> mesh = UnitCubeMesh(n, n, n)
> V = FunctionSpace(mesh, "Lagrange", 1)
> f = Function(V)
> g = Function(V)
>
> a = f*dx()
> b = f*dx() + g*dx(1)
>
> t1 = time.time()
> A1 = assemble(a)
> t2 = time.time()
> A2 = assemble(a)
> t3 = time.time()
>
> print "A1:", (t2-t1)
> print "A2:", (t3-t2)
>
> t1 = time.time()
> B1 = assemble(b)
> t2 = time.time()
> B2 = assemble(b)
> t3 = time.time()
>
> print "B1:", (t2-t1)
> print "B2:", (t3-t2)
>
>
> Resulting time to assemble with current master:
>
> A1: 0.467525005341
> A2: 0.465034008026
> B1: 0.882906198502
> B2: 0.830652952194
>
> Note how the additional coefficient in form b nearly doubles the assembly
> time for this simple functional (about 0.83 to 0.88 s versus 0.47 s), even
> though it's never used in the computations.
>
> The time to assemble with the new branches:
>
> A1: 0.531542062759
> A2: 0.530611991882
> B1: 0.540424108505
> B2: 0.535769939423
>
> Note two things:
> - The performance is a bit lower for the simple case (about 0.53 s versus
> 0.47 s for form a, roughly 14% slower). It might be possible to optimize this.
> - The performance is now the same for both cases, and significantly faster
> for form b (about 0.54 s versus 0.83 s), because the function g is never
> restricted.
>
>
> The cases that will benefit from this feature performance-wise are forms with
> two or more integrals involving different coefficients.
>
> The cases that will see a small performance regression are forms with only
> one integral, with no coefficients, or where all integrals use the same
> coefficients. The relative regression is most noticeable for simple forms
> such as mass and stiffness matrices.
>
> There are multiple future features that depend on this functionality:
> - it allows for functions that cannot be evaluated everywhere to be called
> only in their valid domain (examples are functions only living on subdomains,
> a partially overlapping mesh, or the boundary).
> - possible refactoring of preprocessing in ufl to reduce the amount of
> symbolic processing done for forms that are already in the jit cache.
>
> The functionality is obviously highly beneficial, so is it ok if I push it now
> even with the performance regression for simple forms?
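
For readers following the thread, the idea of per-integral enabled coefficients can be illustrated with a small toy model. This is only a sketch in plain Python 3: the names `Coefficient`, `Integral`, `enabled` and `assemble_functional` are hypothetical and are not the actual ufc or dolfin interface. It counts how often each coefficient is restricted to a cell, with and without the selective behaviour described above.

```python
# Toy model only: these classes are hypothetical, not the ufc/dolfin API.

class Coefficient:
    """Stand-in for a coefficient function; counts restrictions to cells."""
    def __init__(self, value):
        self.value = value
        self.restrict_calls = 0

    def restrict(self, cell):
        self.restrict_calls += 1
        return self.value


class Integral:
    """A cell integral with one enabled flag per form coefficient."""
    def __init__(self, enabled, kernel, subdomain=None):
        self.enabled = enabled      # enabled[i] is True if coefficient i is used
        self.kernel = kernel        # kernel(w) returns the local contribution
        self.subdomain = subdomain  # None means the integral applies everywhere


def assemble_functional(cells, markers, integrals, coefficients, selective):
    """Sum the integral contributions over all cells.

    With selective=False every coefficient is restricted on every cell.
    With selective=True a coefficient is restricted only if some integral
    that is active on the cell has it enabled.
    """
    total = 0.0
    for cell in cells:
        active = [I for I in integrals
                  if I.subdomain is None or markers.get(cell) == I.subdomain]
        w = [None] * len(coefficients)
        for i, c in enumerate(coefficients):
            if not selective or any(I.enabled[i] for I in active):
                w[i] = c.restrict(cell)
        for I in active:
            total += I.kernel(w)
    return total


if __name__ == "__main__":
    # Mimics form b: f on the whole domain, g only on subdomain 1,
    # with no cells marked as subdomain 1.
    f, g = Coefficient(2.0), Coefficient(3.0)
    form_b = [Integral([True, False], lambda w: w[0]),
              Integral([False, True], lambda w: w[1], subdomain=1)]
    assemble_functional(range(1000), {}, form_b, [f, g], selective=True)
    print("f restricted:", f.restrict_calls, "g restricted:", g.restrict_calls)
```

In this toy setting the assembled value is the same in both modes; only the number of restrictions of g differs, which mirrors why form b no longer pays for the unused coefficient.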


Could you first check what the performance regression is (if any) for
assembling a standard right-hand side vector f*dx and Poisson
stiffness matrix?

Perhaps this is only noticeable for functionals.
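
One way such a check could be scripted is sketched below, assuming Python 3. The helpers `time_calls` and `relative_change` are hypothetical names introduced here for illustration; the commented usage lines assume the legacy DOLFIN Python interface and the `V` and `f` from the benchmark script, and are not run here.

```python
import time


def time_calls(fn, repeats=2):
    """Call fn() `repeats` times and return the wall-clock time of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def relative_change(before, after):
    """Relative change of the best `after` timing against the best `before`."""
    return (min(after) - min(before)) / min(before)


# Assumed usage with legacy DOLFIN (run once per branch, after a warm-up
# call so that jit compilation is excluded):
#   u, v = TrialFunction(V), TestFunction(V)
#   rhs_times = time_calls(lambda: assemble(f*v*dx))
#   stiffness_times = time_calls(lambda: assemble(inner(grad(u), grad(v))*dx))

if __name__ == "__main__":
    # Timings quoted in the message above for form a (master, then branch).
    master_a = [0.467525005341, 0.465034008026]
    branch_a = [0.531542062759, 0.530611991882]
    print("form a: %+.1f%%" % (100 * relative_change(master_a, branch_a)))
```

Applied to the functional timings quoted above, this gives roughly a 14% slowdown for form a and roughly a 35% speedup for form b; the same comparison for the vector and matrix cases would show whether the regression is specific to functionals.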

--
Anders
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
