On Sun, May 18, 2008 at 10:55:10PM +0200, Johan Hoffman wrote:
> > On Sun 2008-05-18 21:54, Johan Hoffman wrote:
> >> > On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote:
> >> >
> >> > 1. Solve time may dominate assemble anyway so that's where we should
> >> > optimize.
> >>
> >> Yes, there may be such cases, in particular for simple forms (Laplace
> >> equation etc.). For more complex forms with more terms and coefficients,
> >> assembly typically dominates, from what I have seen. This is the case for
> >> the flow problems of Murtazo for example.
> >
> > This probably depends on whether you are using a projection method. If you
> > are solving the saddle point problem, you can forget about assembly time.
>
> Well, this is not what we see. I agree that this is what you would like,
> but this is not the case now. That is why we are now focusing on the
> assembly bottleneck.
>
> > But optimizing the solve is all about constructing a good preconditioner.
> > If the operator is elliptic then AMG should work well and you don't have
> > to think, but if it is indefinite all bets are off. I think we can build
> > saddle point preconditioners now by writing some funny-looking mixed form
> > files, but that could be made easier.
>
> We use a splitting approach with GMRES for the momentum equation and AMG
> for the continuity equations. This appears to work fairly well. As I said,
> the assembly of the momentum equation is dominating.
>
> >> > 2. Assembling the action instead of the operator removes the A.add()
> >> > bottleneck.
> >>
> >> True. But it may be worthwhile to put some effort into optimizing also
> >> the matrix assembly.
> >
> > In any case, you have to form something to precondition with.
> >
> >> > As mentioned before, we are experimenting with iterating locally over
> >> > cells sharing common dofs and combining batches of element tensors
> >> > before inserting into the global sparse matrix row by row. Let's see
> >> > how it goes.
> >>
> >> Yes, this is interesting. Would be very interesting to hear about the
> >> progress.
> >>
> >> It is also interesting to understand what would optimize the insertion
> >> for different linear algebra backends, in particular Jed seems to have
> >> a good knowledge of PETSc. We could then build backend optimization
> >> into the local dof-orderings etc.
> >
> > I just press M-. when I'm curious :-)
> >
> > I can't imagine it pays to optimize for a particular backend (it's not
> > PETSc anyway, rather whichever format is used by the preconditioner).
> > The CSR data structure is pretty common, but it will always be fastest
> > to insert an entire row at once. If using an intermediate hashed
> > structure makes this convenient, then it would help. The paper I posted
> > assembles the entire matrix in hashed format and then converts it to
> > CSR. I'll guess that a hashed cache for the assembly (flushed every few
> > MiB, for instance) would work at least as well as assembling the entire
> > thing in hashed format.
>
> Yes, it seems that some form of hashed structure is a good possibility to
> optimize. What Murtazo is referring to would be similar to hashing the
> whole matrix as in the paper you posted,

The way I interpret it, they are very different. The hash would store a
mapping from (i, j) to values while Murtazo suggests storing a mapping
from (element, i, j) to values.

--
Anders

> and at Simula work on the row hashed
> structures appears to be under way.
>
> It will be interesting to see the results.
>
> /Johan
>
> > Jed

_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
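
To make the difference between the two mappings concrete, here is a small sketch in plain Python. It is only one reading of the discussion, not the code from the paper or from Murtazo's experiments. All names (assemble_hashed, assemble_per_element, to_csr, laplace_1d) are made up for illustration, and the toy mesh is an assumption. The first function accumulates into a hash keyed by (i, j), the second keeps each contribution separate under (element, i, j), and the conversion to CSR emits one whole row at a time.

```python
from collections import defaultdict


def assemble_hashed(cells, element_tensor):
    """Accumulate element tensors into a hash keyed by global (i, j)."""
    A = defaultdict(float)
    for dofs in cells:
        Ae = element_tensor(dofs)
        for a, i in enumerate(dofs):
            for b, j in enumerate(dofs):
                A[(i, j)] += Ae[a][b]  # shared dofs are summed on insertion
    return A


def assemble_per_element(cells, element_tensor):
    """Store every contribution separately, keyed by (element, i, j)."""
    E = {}
    for e, dofs in enumerate(cells):
        Ae = element_tensor(dofs)
        for a, i in enumerate(dofs):
            for b, j in enumerate(dofs):
                E[(e, i, j)] = Ae[a][b]  # nothing is summed yet
    return E


def to_csr(A, n_rows):
    """Convert an (i, j) -> value hash to CSR arrays, row by row."""
    rows = [[] for _ in range(n_rows)]
    for (i, j), v in A.items():
        rows[i].append((j, v))
    indptr, indices, data = [0], [], []
    for row in rows:
        row.sort()  # column indices are unique within a row
        for j, v in row:
            indices.append(j)
            data.append(v)
        indptr.append(len(indices))  # row i is complete at this point
    return indptr, indices, data


# Hypothetical toy problem: three nodes, two linear elements in 1D,
# with the unscaled element stiffness matrix of -u''.
cells = [(0, 1), (1, 2)]


def laplace_1d(dofs):
    return [[1.0, -1.0], [-1.0, 1.0]]


A = assemble_hashed(cells, laplace_1d)
E = assemble_per_element(cells, laplace_1d)
indptr, indices, data = to_csr(A, 3)
```

With this toy mesh the (i, j) hash holds 7 entries, because the two contributions to the shared dof are merged into A[(1, 1)] = 2.0, while the (element, i, j) mapping holds all 8 contributions and still has to be summed before it can become a matrix row.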
