...

>> > But we also need to remember that
>> >
>> > 1. Solve time may dominate assembly anyway, so that's where we
>> > should optimize.
>> >
>> > 2. Assembling the action instead of the operator removes the A.add()
>> > bottleneck.
>> >
>> > As mentioned before, we are experimenting with iterating locally over
>> > cells sharing common dofs and combining batches of element tensors
>> > before inserting into the global sparse matrix row by row. Let's see
>> > how it goes.
>> >
>> > Some other comments:
>> >
>> > Murtazo: It's interesting that femLego is that much faster. It would
>> > be interesting to see exactly why it is faster. Can you take a simple
>> > example (maybe even Poisson) and profile both femLego and DOLFIN on
>> > the same mesh and get detailed results for tabulate_tensor,
>> > tabulate_dofs and A.add()? If femLego is faster on A.add(), then what
>> > linear algebra backend is it using?
>>
>> Yes, the test we did is a simple 2D Poisson problem on a unit square
>> mesh, and assembly in femLego is 3 times faster, because A.add() is
>> done in the way I described in previous mails. The linear algebra
>> package is AZTEC. Then again, DOLFIN should be much faster than
>> femLego if A.add() is done the same way, since FFC is much faster
>> than quadrature.
>
> I thought AZTEC was just solvers. I mean what sparse matrix format is
> used? And what does the interface look like for communicating the
> exact position for insertion?
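The batching idea mentioned above (combine element tensors from a batch of cells sharing dofs, then insert row by row) could look roughly like the following sketch. It is written in Python only for brevity, and all names (assemble_batch, add_row, the flat row-major element tensors) are hypothetical, not DOLFIN's actual interface: add_row stands in for a single row-insertion call to the linear algebra backend.

```python
def assemble_batch(cells, element_tensors, add_row):
    """Combine element tensors from a batch of cells per global row,
    then call the backend's row insertion once per global row instead
    of once per cell. All names here are illustrative only.

    cells           -- list of dof tuples, one per cell
    element_tensors -- list of flat row-major n*n element tensors
    add_row         -- callback: add_row(global_row, columns, values)
    """
    rows = {}  # global row -> {global column: accumulated value}
    for dofs, Ae in zip(cells, element_tensors):
        n = len(dofs)
        for i, gi in enumerate(dofs):
            row = rows.setdefault(gi, {})
            for j, gj in enumerate(dofs):
                row[gj] = row.get(gj, 0.0) + Ae[i * n + j]
    # One insertion per combined row, not one per cell
    for gi, row in rows.items():
        columns = sorted(row)
        add_row(gi, columns, [row[c] for c in columns])
```

With two 1D "cells" (0,1) and (1,2) and stiffness-like element tensors, the shared middle row (dofs of cell 0 and cell 1 overlap at vertex 1) is combined in memory and inserted once.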
The sparse matrix is just double* atw; Attached I send you the subroutine which does this A.add(): idxatw(el,li,lj) is index of global matrix for cell = ell, row = li, col = lj. > >> > Murtazo: It seems you suggest we should basically store some kind of >> > index/pointer for where to write directly to memory when adding the >> > entries. This has two problems: (i) we need to store quite a few of >> > these indices (n^2*num_cells where n is the local space dimension), >> > and (ii) it requires cooperation from the linear algebra backend. >> > We would need to ask PETSc to tell us where in its private data >> > structures it inserted the entries. >> >> Yes, maybe there is a better way to do it. If we store the global >> indices >> of the A it will be totally A.nz()*numVertices*num_components, but still >> we will have a speedup which is more important in some case. > > That doesn't look correct to me. I think we would need n^2*num_cells. > We iterate over the cells and for each cell we have n^2 entries and we > need to know where to insert each one of those. > I have contributed and have experience with femLego (i did my exjob with that). This way works well in parallel also. The problem may not be problem for a very large mesh, since anyway one need to use parallel processors. murtazo _______________________________________________ DOLFIN-dev mailing list [email protected] http://www.fenics.org/mailman/listinfo/dolfin-dev
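For readers following the thread: the idxatw(el, li, lj) scheme discussed above can be sketched as follows. This is only an illustration of the idea, not femLego's actual code (which is Fortran operating on a flat double* array); the names csr_index, precompute_insertion_indices and assemble are invented, and a CSR-like layout is assumed. The index table costs n^2 * num_cells integers, as noted in the quoted discussion, but the assembly loop then writes straight into the value array with no per-entry search.

```python
def csr_index(rowptr, cols, i, j):
    """Position of global entry (i, j) in the flat CSR value array."""
    for k in range(rowptr[i], rowptr[i + 1]):
        if cols[k] == j:
            return k
    raise KeyError((i, j))

def precompute_insertion_indices(cells, rowptr, cols):
    """One precomputed index per (cell, local row, local col) triple,
    i.e. n^2 * num_cells integers -- the role of idxatw(el, li, lj)."""
    return [[csr_index(rowptr, cols, gi, gj) for gi in cell for gj in cell]
            for cell in cells]

def assemble(values, indices, element_tensors):
    """The fast A.add(): direct writes into the value array,
    no index lookup inside the assembly loop."""
    for idx, Ae in zip(indices, element_tensors):
        for k, a in zip(idx, Ae):
            values[k] += a
```

The point (ii) from the quoted mail still applies: a backend like PETSc would have to expose where in its private data it stores each nonzero for this to work outside a hand-rolled matrix format.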
