...

>> > But we also need to remember that
>> >
>> > 1. Solve time may dominate assembly anyway, so that's where we
>> > should optimize.
>> >
>> > 2. Assembling the action instead of the operator removes the A.add()
>> > bottleneck.
>> >
>> > As mentioned before, we are experimenting with iterating locally over
>> > cells sharing common dofs and combining batches of element tensors
>> > before inserting into the global sparse matrix row by row. Let's see
>> > how it goes.
>> >
>> > Some other comments:
>> >
>> > Murtazo: It's interesting that femLego is that much faster. It would
>> > be interesting to see exactly why it is faster. Can you take a simple
>> > example (maybe even Poisson) and profile both femLego and DOLFIN on
>> > the same mesh and get detailed results for tabulate_tensor,
>> > tabulate_dofs and A.add()? If femLego is faster on A.add(), then what
>> > linear algebra backend is it using?
>>
>> Yes, the test we did is a simple 2D Poisson problem on a unit square
>> mesh, and assembly in femLego is 3 times faster, because A.add() is
>> done in the way I described in previous mails. The linear algebra
>> package is AZTEC. Then again, DOLFIN should be much faster than
>> femLego if A.add() is done the same way, since FFC is much faster
>> than quadrature.
>
> I thought AZTEC was just solvers. I mean what sparse matrix format is
> used? And what does the interface look like for communicating the
> exact position for insertion?
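The batching idea mentioned above (combine element tensors from a batch of cells sharing dofs, then insert row by row) could look roughly like the following sketch. It is written in Python only for brevity, and all names (assemble_batch, add_row, the flat row-major element tensors) are hypothetical, not DOLFIN's actual interface: add_row stands in for a single row-insertion call to the linear algebra backend.

```python
def assemble_batch(cells, element_tensors, add_row):
    """Combine element tensors from a batch of cells per global row,
    then call the backend's row insertion once per global row instead
    of once per cell. All names here are illustrative only.

    cells           -- list of dof tuples, one per cell
    element_tensors -- list of flat row-major n*n element tensors
    add_row         -- callback: add_row(global_row, columns, values)
    """
    rows = {}  # global row -> {global column: accumulated value}
    for dofs, Ae in zip(cells, element_tensors):
        n = len(dofs)
        for i, gi in enumerate(dofs):
            row = rows.setdefault(gi, {})
            for j, gj in enumerate(dofs):
                row[gj] = row.get(gj, 0.0) + Ae[i * n + j]
    # One insertion per combined row, not one per cell
    for gi, row in rows.items():
        columns = sorted(row)
        add_row(gi, columns, [row[c] for c in columns])
```

With two 1D "cells" (0,1) and (1,2) and stiffness-like element tensors, the shared middle row (dofs of cell 0 and cell 1 overlap at vertex 1) is combined in memory and inserted once.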
The sparse matrix is just double* atw; Attached I send you the subroutine which does this A.add(): idxatw(el,li,lj) is index of global matrix for cell = ell, row = li, col = lj. > >> > Murtazo: It seems you suggest we should basically store some kind of >> > index/pointer for where to write directly to memory when adding the >> > entries. This has two problems: (i) we need to store quite a few of >> > these indices (n^2*num_cells where n is the local space dimension), >> > and (ii) it requires cooperation from the linear algebra backend. >> > We would need to ask PETSc to tell us where in its private data >> > structures it inserted the entries. >> >> Yes, maybe there is a better way to do it. If we store the global >> indices >> of the A it will be totally A.nz()*numVertices*num_components, but still >> we will have a speedup which is more important in some case. > > That doesn't look correct to me. I think we would need n^2*num_cells. > We iterate over the cells and for each cell we have n^2 entries and we > need to know where to insert each one of those. > I have contributed and have experience with femLego (i did my exjob with that). This way works well in parallel also. The problem may not be problem for a very large mesh, since anyway one need to use parallel processors. murtazo _______________________________________________ DOLFIN-dev mailing list [email protected] http://www.fenics.org/mailman/listinfo/dolfin-dev
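For readers following the thread: the idxatw(el, li, lj) scheme discussed above can be sketched as follows. This is only an illustration of the idea, not femLego's actual code (which is Fortran operating on a flat double* array); the names csr_index, precompute_insertion_indices and assemble are invented, and a CSR-like layout is assumed. The index table costs n^2 * num_cells integers, as noted in the quoted discussion, but the assembly loop then writes straight into the value array with no per-entry search.

```python
def csr_index(rowptr, cols, i, j):
    """Position of global entry (i, j) in the flat CSR value array."""
    for k in range(rowptr[i], rowptr[i + 1]):
        if cols[k] == j:
            return k
    raise KeyError((i, j))

def precompute_insertion_indices(cells, rowptr, cols):
    """One precomputed index per (cell, local row, local col) triple,
    i.e. n^2 * num_cells integers -- the role of idxatw(el, li, lj)."""
    return [[csr_index(rowptr, cols, gi, gj) for gi in cell for gj in cell]
            for cell in cells]

def assemble(values, indices, element_tensors):
    """The fast A.add(): direct writes into the value array,
    no index lookup inside the assembly loop."""
    for idx, Ae in zip(indices, element_tensors):
        for k, a in zip(idx, Ae):
            values[k] += a
```

The point (ii) from the quoted mail still applies: a backend like PETSc would have to expose where in its private data it stores each nonzero for this to work outside a hand-rolled matrix format.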
