Hi all, I've implemented selective local evaluation of coefficient functions in the assembler, based on which coefficients each integral depends on. It's currently in branches called martinal/topic-add-enabled-coefficients-per-integral in ufl, ffc and dolfin (they must be used together). Note that this changes the ufc interface, so everything must be recompiled.
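To make the idea concrete, here is a minimal pure-Python sketch, not the actual ufl/ffc/dolfin code. It assumes each integral records which of the form's coefficients appear in its integrand, stored as one boolean flag per form coefficient. The names `Integral` and `enabled_coefficients` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Integral:
    """Hypothetical stand-in for one integral of a form."""
    subdomain_id: int              # 0 plays the role of the default dx()
    coefficients: FrozenSet[str]   # coefficients appearing in this integrand


def enabled_coefficients(integral: Integral, form_coefficients: List[str]) -> List[bool]:
    """One flag per form coefficient: True only if this integral uses it."""
    return [c in integral.coefficients for c in form_coefficients]


# Form b = f*dx() + g*dx(1): the form has coefficients [f, g],
# but each of its integrals needs only one of them.
form_coefficients = ["f", "g"]
integrals = [Integral(0, frozenset({"f"})), Integral(1, frozenset({"g"}))]
for itg in integrals:
    print(itg.subdomain_id, enabled_coefficients(itg, form_coefficients))
```

With these flags, an assembler only needs to restrict the coefficients marked True for the integral it is about to evaluate.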
To show the performance improvement, here's a simple benchmark script. It assembles two forms, a and b, that depend on one and two coefficients respectively (f, and f and g), but yield exactly the same integral and assembly result when assembled without any subdomains (the dx(1) term in form b is never executed). Each form is assembled twice for semi-robust timing, and I ran the script once beforehand to keep JIT compilation out of the picture. (Performance numbers below the code.)

from dolfin import *
import time

n = 60
mesh = UnitCubeMesh(n, n, n)
V = FunctionSpace(mesh, "Lagrange", 1)
f = Function(V)
g = Function(V)

# Same value for both forms: without subdomain markers, dx(1) is never integrated
a = f*dx()
b = f*dx() + g*dx(1)

t1 = time.time()
A1 = assemble(a)
t2 = time.time()
A2 = assemble(a)
t3 = time.time()
print "A1:", (t2-t1)
print "A2:", (t3-t2)

t1 = time.time()
B1 = assemble(b)
t2 = time.time()
B2 = assemble(b)
t3 = time.time()
print "B1:", (t2-t1)
print "B2:", (t3-t2)

Resulting time (in seconds) to assemble with current master:

A1: 0.467525005341
A2: 0.465034008026
B1: 0.882906198502
B2: 0.830652952194

Note how the additional coefficient in form b gives a very significant overhead for this simple functional, even though it's never used in the computations.

The time (in seconds) to assemble with the new branches:

A1: 0.531542062759
A2: 0.530611991882
B1: 0.540424108505
B2: 0.535769939423

Note two things:
- The performance is a bit lower for the simple case. It might be possible to optimize this.
- The performance is the same for both forms, which makes form b significantly faster than on master, because the function g is never restricted.

The cases that will benefit performance-wise from this feature are forms with two or more integrals involving different coefficients. The cases that will see a small performance regression are forms with only one integral, with no coefficients, or where all integrals use the same coefficients. The relative regression is most noticeable for simple forms such as mass and stiffness matrices.
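The timings can be read as a count of coefficient restrictions per cell. The sketch below is a toy model under stated assumptions, not DOLFIN's assembler: it assumes no subdomain markers (every cell is in the default domain 0, so only the dx() integral runs), it assumes master restricts every coefficient of the form on each cell, and it assumes the new branches restrict only the coefficients enabled for the integral that runs. The function `count_restrictions` and its arguments are hypothetical.

```python
from typing import Dict, FrozenSet, List


def count_restrictions(
    cell_markers: List[int],
    integrals: Dict[int, FrozenSet[str]],
    form_coefficients: List[str],
    selective: bool,
) -> int:
    """Count coefficient restrictions over a toy mesh (hypothetical model).

    cell_markers: subdomain id of each cell.
    integrals: subdomain id -> coefficients used by the integral on that id.
    selective: False models restricting all form coefficients (assumed master),
               True models restricting only the enabled ones (assumed new branches).
    """
    total = 0
    for marker in cell_markers:
        if marker not in integrals:
            continue  # no integral defined on this cell's subdomain
        if selective:
            total += len(integrals[marker])   # only coefficients this integral uses
        else:
            total += len(form_coefficients)   # every coefficient of the form
    return total


markers = [0] * 1000  # no subdomain markers: every cell in the default domain
form_a = {0: frozenset({"f"})}
form_b = {0: frozenset({"f"}), 1: frozenset({"g"})}

print(count_restrictions(markers, form_a, ["f"], selective=False))       # 1000
print(count_restrictions(markers, form_b, ["f", "g"], selective=False))  # 2000
print(count_restrictions(markers, form_b, ["f", "g"], selective=True))   # 1000
```

In this model form b costs twice as many restrictions as form a when all coefficients are restricted, and the same number when only enabled coefficients are restricted, which matches the pattern in the measured timings.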
There are multiple future features that depend on this functionality:
- It allows functions that cannot be evaluated everywhere to be called only in their valid domain (examples are functions living only on subdomains, on a partially overlapping mesh, or on the boundary).
- It makes possible a refactoring of preprocessing in ufl to reduce the amount of symbolic processing done for forms that are already in the JIT cache.

The functionality is obviously highly beneficial, so is it ok if I push it now, even with the performance regression for simple forms?

Martin
_______________________________________________
fenics mailing list
http://fenicsproject.org/mailman/listinfo/fenics
