On 16 September 2013 13:59, Johan Hake <[email protected]> wrote:
> Interesting!
>
> What do you mean with dof blocking?
>

Making sure that dofs at a node are grouped together, e.g. (ux, uy and
uz) in elasticity. The block size (which must be constant) is passed
through to the PETSc matrix. This allows PETSc to make some
optimisations for sparse matrix-vector products, and it dramatically
improves some preconditioners (e.g. AMG) because they work with a much
smaller sparsity graph and can incorporate the extra information in
the preconditioner (e.g., in constructing aggregates).
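
As a rough illustration of the smaller sparsity graph, here is a minimal
pure-Python sketch. It is not DOLFIN or PETSc code: the four-node chain
and the helper names (blocked_dof, scalar_graph, count_entries) are made
up for the example. With a block size of 3, the node-level graph has 9
times fewer entries than the dof-level one.

```python
# Hypothetical 1D chain of 4 nodes; each node couples to itself and
# to its immediate neighbours.
BLOCK_SIZE = 3  # e.g. (ux, uy, uz) at every node

node_graph = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}


def blocked_dof(node, component, bs=BLOCK_SIZE):
    """Dof index when all components at a node are numbered contiguously."""
    return bs * node + component


def scalar_graph(node_graph, bs=BLOCK_SIZE):
    """Expand the node graph into the dof-level (scalar) sparsity graph."""
    graph = {}
    for node, neighbours in node_graph.items():
        for c in range(bs):
            row = blocked_dof(node, c, bs)
            graph[row] = sorted(
                blocked_dof(n, d, bs) for n in neighbours for d in range(bs)
            )
    return graph


def count_entries(graph):
    """Total number of (row, column) entries in a sparsity graph."""
    return sum(len(cols) for cols in graph.values())


node_entries = count_entries(node_graph)               # 10 block entries
dof_entries = count_entries(scalar_graph(node_graph))  # 90 scalar entries
print(node_entries, dof_entries)
```

A blocked matrix format only needs to store and traverse the node-level
graph, with one dense 3x3 block per entry.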

> So it looks like the SCOTCH reordering is faster to build but performs
> similarly to the old Boost reordering?
>

Yes. Importantly, both re-ordering schemes appear to scale linearly
with the number of dofs.
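
For anyone unfamiliar with what a bandwidth-reducing re-ordering does,
here is a minimal pure-Python sketch. It assumes a simplified reverse
Cuthill-McKee (lowest-degree start node, no pseudo-peripheral node
search) and is not the Boost or SCOTCH implementation; the five-node
path graph with scrambled numbering is made up for the example.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return `order`, where order[k] is the old index placed at position k."""
    visited = set()
    order = []
    # Handle each connected component, starting from a lowest-degree node.
    for start in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit unvisited neighbours in order of increasing degree.
            for w in sorted(adj[v], key=lambda u: (len(adj[u]), u)):
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
    order.reverse()  # the "reverse" in reverse Cuthill-McKee
    return order


def bandwidth(adj, order):
    """Largest distance between the new indices of two connected nodes."""
    position = {old: new for new, old in enumerate(order)}
    return max(
        (abs(position[v] - position[w]) for v in adj for w in adj[v]),
        default=0,
    )


# Hypothetical path graph 0-3-1-4-2, i.e. a chain with scrambled numbering.
adj = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}

order = reverse_cuthill_mckee(adj)
print(bandwidth(adj, sorted(adj)), bandwidth(adj, order))  # prints "3 1"
```

A smaller bandwidth means that the entries of a matrix row refer to
vector entries that sit close together in memory, which is where the
improved cache behaviour in matrix-vector products comes from.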

Garth

> Johan
>
>
> On Mon, Sep 16, 2013 at 2:32 PM, Garth N. Wells <[email protected]> wrote:
>>
>> Here are some quick numbers for anyone interested in the impact of dof
>> re-ordering, and why DOLFIN renumbers the UFC dofmap by default. For a
>> vector-valued reaction-diffusion problem using P2 Lagrange elements in
>> 3D (6.44M dofs) and PETSc as the backend:
>>
>> Ordering method             | None  | Random* | A*    | A     | B*    | B
>> ----------------------------|-------|---------|-------|-------|-------|-------
>> Call graph ordering         | 0.0s  | -       | 1.2s  | 9.2s  | 2.4s  | 17.87s
>> Assembly                    | 27.9s | 34.4s   | 29.3s | 29.4s | 30.2s | 32s
>> 100 CG + Jacobi iterations  | 221s  | 1177s   | 142s  | 176s  | 140s  | 160s
>>
>>
>> A: New SCOTCH reordering (Gibbs-Poole-Stockmeyer)
>> B: Boost reordering (reverse Cuthill-McKee)
>> *: With dof blocking
>>
>> The Krylov solver speed-up with re-ordering and dof blocking is
>> dramatic. Speed-ups are also observed when using direct solvers, but
>> these are less dramatic. The small drop in assembly speed when the dofs are
>> re-ordered is due to reduced memory locality in the linear algebra
>> objects with respect to the order in which the mesh cells are iterated
>> over. This can be (and soon will be) fixed by re-ordering the mesh
>> indices. It has been observed that some solvers (e.g. BoomerAMG) fail
>> to converge for this problem without re-ordering, and converge in <10
>> iterations with re-ordering and dof blocking.
>>
>> The DOLFIN UnitFooMesh ordering is quite cache-friendly. Meshes
>> created by 3rd-party libraries are unlikely to be as cache-friendly
>> and could tend towards the random ordering timing.
>>
>> Garth
>> _______________________________________________
>> fenics mailing list
>> [email protected]
>> http://fenicsproject.org/mailman/listinfo/fenics
>
>