On Mon, May 19, 2008 at 03:44:16PM +0200, Murtazo Nazarov wrote: > Anders Logg wrote: > > On Sun, May 18, 2008 at 10:55:10PM +0200, Johan Hoffman wrote: > > > >>> On Sun 2008-05-18 21:54, Johan Hoffman wrote: > >>> > >>>>> On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote: > >>>>> > >>>>> 1. Solve time may dominate assemble anyway so that's where we should > >>>>> optimize. > >>>>> > >>>> Yes, there may be such cases, in particular for simple forms (Laplace > >>>> equation etc.). For more complex forms with more terms and coefficients, > >>>> assembly typically dominates, from what I have seen. This is the case > >>>> for > >>>> the flow problems of Murtazo for example. > >>>> > >>> This probably depends if you use are using a projection method. If you > >>> are > >>> solving the saddle point problem, you can forget about assembly time. > >>> > >> Well, this is not what we see. I agree that this is what you would like, > >> but this is not the case now. That is why we are now focusing on the > >> assembly bottleneck. > >> > >> But > >> > >>> optimizing the solve is all about constructing a good preconditioner. If > >>> the > >>> operator is elliptic then AMG should work well and you don't have to > >>> think, but > >>> if it is indefinite all bets are off. I think we can build saddle point > >>> preconditioners now by writing some funny-looking mixed form files, but > >>> that > >>> could be made easier. > >>> > >> We use a splitting approach with GMRES for the momentum equation and AMG > >> for the continuity equations. This appears to work faitly well. As I said, > >> the assembly of the momentum equation is dominating. > >> > >> > >>>>> 2. Assembling the action instead of the operator removes the A.add() > >>>>> bottleneck. > >>>>> > >>>> True. But it may be worthwhile to put some effort into optimizing also > >>>> the > >>>> matrix assembly. > >>>> > >>> In any case, you have to form something to precondition with. > >>> > >>> > >>>>> As mentioned before, we are experimenting with iterating locally over > >>>>> cells sharing common dofs and combining batches of element tensors > >>>>> before inserting into the global sparse matrix row by row. Let's see > >>>>> how it goes. > >>>>> > >>>> Yes, this is interesting. Would be very interesting to hear about the > >>>> progress. > >>>> > >>>> It is also interesting to understand what would optimize the insertion > >>>> for > >>>> different linear algebra backends, in particular Jed seems to have a > >>>> good > >>>> knowledge on petsc. We could then build backend optimimization into the > >>>> local dof-orderings etc. > >>>> > >>> I just press M-. when I'm curious :-) > >>> > >>> I can't imagine it pays to optimize for a particular backend (it's not > >>> PETSc > >>> anyway, rather whichever format is used by the preconditioner). The CSR > >>> data > >>> structure is pretty common, but it will always be fastest to insert an > >>> entire > >>> row at once. If using an intermediate hashed structure makes this > >>> convenient, > >>> then it would help. The paper I posted assembles the entire matrix in > >>> hashed > >>> format and then converts it to CSR. I'll guess that a hashed cache for > >>> the > >>> assembly (flushed every few MiB, for instance) would work at least as well > >>> as > >>> assembling the entire thing in hashed format. > >>> > >> Yes, it seems that some form of hashed structure is a good possibility to > >> optimize. What Murtazo is referring to would be similar to hash the whole > >> matrix as in the paper you posted, > >> > > > > The way I interpret it, they are very different. The hash would store > > a mapping from (i, j) to values while Murtazo suggest storing a > > mapping from (element, i, j) to values. > > > > > Sorry, if i, j is already in global, than (element, i,j) is equivalent > to just (i,j), it means we can just do mapping of (i,j) to values. The > reason I included the element is that, i and j a local before the global > mapping.
How is that possible? When we assemble, we need to know for each element where to insert each of the local n^2 entries. Then we need a mapping from (element, i, j) to a position in the global matrix. -- Anders _______________________________________________ DOLFIN-dev mailing list [email protected] http://www.fenics.org/mailman/listinfo/dolfin-dev
