Re: [FEniCS] Improvement to assembly performance with a minor regression

Martin Sandve Alnæs Mon, 12 May 2014 02:07:22 -0700

The functional here is the same as in the previous email, just run on a
newer, faster laptop. On this computer I don't see any slowdown for the
functional either.


Functional (a=f*dx, b=f*dx+g*dx(1)):
Before:
A: 0.261495113373
B: 0.431048870087
After:
A: 0.251796007156
B: 0.249910116196

Linear form (a=f*v*dx, b=f*v*dx+g*v*dx(1)):
Before:
A: 0.302114009857
B: 0.478947877884
After:
A: 0.292839050293
B: 0.29376912117

Bilinear form (a=f*v*u*dx, b=f*v*u*dx+g*v*u*dx(1)):
Before:
A: 0.665670156479
B: 0.849056959152
After:
A: 0.648552894592
B: 0.685650110245
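
For a quick read of the numbers above, the before/after ratio can be computed per form. This is only a small sketch: the timings are copied from the listing above, while the dictionary layout and the name `speedup` are made up for illustration.

```python
# Timings in seconds, copied from the results above as (before, after).
timings = {
    "functional": {"A": (0.261495113373, 0.251796007156),
                   "B": (0.431048870087, 0.249910116196)},
    "linear":     {"A": (0.302114009857, 0.292839050293),
                   "B": (0.478947877884, 0.29376912117)},
    "bilinear":   {"A": (0.665670156479, 0.648552894592),
                   "B": (0.849056959152, 0.685650110245)},
}

# A ratio above 1 means the new branches are faster than before.
speedup = {
    form: {case: before / after for case, (before, after) in cases.items()}
    for form, cases in timings.items()
}

for form in ("functional", "linear", "bilinear"):
    print("%-10s A: %.2fx  B: %.2fx"
          % (form, speedup[form]["A"], speedup[form]["B"]))
```

On these numbers, form A is marginally faster after the change in all three cases, and form B gains between roughly 1.2x (bilinear) and 1.7x (functional).
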

I'll go ahead with merging then.

Martin



On 12 May 2014 10:27, Martin Sandve Alnæs <[email protected]> wrote:

> I'll check. It's just really painful to rebuild with ufc changes... Is it
> really necessary to rebuild all of dolfin after ufc changes? The dolfin
> build system is not really doing its job in this situation.
>
> Martin
>
>
> On 9 May 2014 21:55, Anders Logg <[email protected]> wrote:
>
>> On Fri, May 09, 2014 at 03:27:20PM +0200, Martin Sandve Alnæs wrote:
>> > Hi all,
>> > I've implemented selective local evaluation of coefficient functions in the
>> > assembler, depending on which functions each integral depends on. It's
>> > currently in branches called
>> > martinal/topic-add-enabled-coefficients-per-integral
>> > in ufl, ffc and dolfin (must be used together).
>> > Note that this changes the ufc interface, so everything must be recompiled.
>> >
>> > To show the performance improvement, here's a simple benchmark script
>> > assembling two forms, a and b. Form a depends on one coefficient (f) and
>> > form b on two (f and g), but both yield the exact same integral and assembly
>> > result when assembled without any subdomains (the dx(1) term in form b is
>> > never executed). Each form is assembled twice for semi-robust timing, and I
>> > ran the script once beforehand to keep the jit out of the picture.
>> > (Performance numbers are below the code.)
>> >
>> >
>> > from dolfin import *
>> > import time
>> >
>> > n = 60
>> > mesh = UnitCubeMesh(n, n, n)
>> > V = FunctionSpace(mesh, "Lagrange", 1)
>> > f = Function(V)
>> > g = Function(V)
>> >
>> > a = f*dx()
>> > b = f*dx() + g*dx(1)
>> >
>> > t1 = time.time()
>> > A1 = assemble(a)
>> > t2 = time.time()
>> > A2 = assemble(a)
>> > t3 = time.time()
>> >
>> > print "A1:", (t2-t1)
>> > print "A2:", (t3-t2)
>> >
>> > t1 = time.time()
>> > B1 = assemble(b)
>> > t2 = time.time()
>> > B2 = assemble(b)
>> > t3 = time.time()
>> >
>> > print "B1:", (t2-t1)
>> > print "B2:", (t3-t2)
>> >
>> >
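
The script above times each form by hand, twice. The same pattern can be wrapped in a small helper; the following is only a sketch in Python 3, and the name `best_time` and its `repeats` parameter are hypothetical, not part of dolfin.

```python
import time


def best_time(func, repeats=3):
    """Return the smallest wall-clock time over `repeats` calls of func()."""
    func()  # warm-up call, so one-off costs (such as JIT compilation) are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    # The minimum is the sample least disturbed by other activity on the machine.
    return min(samples)
```

With the forms defined as in the script, it would be used as `best_time(lambda: assemble(a))` and `best_time(lambda: assemble(b))`.
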
>> > Resulting time to assemble with current master:
>> >
>> > A1: 0.467525005341
>> > A2: 0.465034008026
>> > B1: 0.882906198502
>> > B2: 0.830652952194
>> >
>> > Note how the additional coefficient in form b gives very significant
>> > overhead for this simple functional even though it's never used in the
>> > computations.
>> >
>> > The time to assemble with the new branches:
>> >
>> > A1: 0.531542062759
>> > A2: 0.530611991882
>> > B1: 0.540424108505
>> > B2: 0.535769939423
>> >
>> > Note two things:
>> > - The performance is a bit lower for the simple case. It might be possible
>> > to optimize this.
>> > - The performance is the same for both cases, and significantly faster for
>> > form b because the function g is never restricted.
>> >
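
To illustrate why g is never restricted, here is a toy model of the idea in plain Python. It is not the dolfin or ufc code: `assemble_functional`, `EVERYWHERE`, the cell representation and the per-integral `enabled` flags are all hypothetical stand-ins, and "restricting" a coefficient is modelled as simply calling it on a cell.

```python
EVERYWHERE = "everywhere"  # hypothetical key for an integral over all cells


def assemble_functional(cells, integrals, coefficients, selective=True):
    """Sum integral contributions over cells and count coefficient restrictions.

    cells:        list of (cell_value, subdomain_id) pairs
    integrals:    dict mapping EVERYWHERE or a subdomain id to (kernel, enabled),
                  where `enabled` holds one bool per coefficient
    coefficients: list of callables, cell_value -> local coefficient value
    """
    total = 0.0
    restrictions = 0
    for x, subdomain in cells:
        # Run the integral over all cells, plus the one for this subdomain if any.
        for key in (EVERYWHERE, subdomain):
            if key not in integrals:
                continue
            kernel, enabled = integrals[key]
            local = []
            for coeff, used in zip(coefficients, enabled):
                if used or not selective:
                    local.append(coeff(x))  # "restrict" the coefficient to the cell
                    restrictions += 1
                else:
                    local.append(None)  # skipped: this integral never reads it
            total += kernel(local)
    return total, restrictions


f = lambda x: 2.0 * x
g = lambda x: 100.0
cells = [(0.0, 0), (1.0, 0), (2.0, 0)]  # no cell is marked as subdomain 1

form_b = {
    EVERYWHERE: (lambda w: w[0], (True, False)),  # models f*dx()
    1: (lambda w: w[1], (False, True)),           # models g*dx(1), never executed
}

old = assemble_functional(cells, form_b, [f, g], selective=False)
new = assemble_functional(cells, form_b, [f, g], selective=True)
print("old:", old, "new:", new)
```

Both calls give the same total, but the selective variant restricts only f, so it performs half as many restrictions on this input.
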
>> >
>> > The cases that will benefit from this feature performance-wise are forms
>> > with two or more integrals involving different coefficients.
>> >
>> > The cases that will have a small performance regression are forms with
>> > only one integral, with no coefficients, or where all integrals use the same
>> > coefficients. The relative regression is most noticeable for simple forms
>> > such as mass and stiffness matrices.
>> >
>> > There are multiple future features that depend on this functionality:
>> > - it allows for functions that cannot be evaluated everywhere to be called
>> > only in their valid domain (examples are functions only living on
>> > subdomains, a partially overlapping mesh, or the boundary).
>> > - possible refactoring of preprocessing in ufl to reduce the amount of
>> > symbolic processing done for forms that are already in the jit cache.
>> >
>> > The functionality is obviously highly beneficial, so is it ok if I push it
>> > now even with the performance regression for simple forms?
>>
>> Could you first check what the performance regression is (if any) for
>> assembling a standard right-hand side vector f*dx and Poisson
>> stiffness matrix?
>>
>> Perhaps this is only noticeable for functionals.
>>
>> --
>> Anders
>>
>
>

_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
