Anders Logg wrote:
> On Mon, May 19, 2008 at 03:44:16PM +0200, Murtazo Nazarov wrote:
>
>> Anders Logg wrote:
>>
>>> On Sun, May 18, 2008 at 10:55:10PM +0200, Johan Hoffman wrote:
>>>
>>>>> On Sun 2008-05-18 21:54, Johan Hoffman wrote:
>>>>>
>>>>>>> On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote:
>>>>>>>
>>>>>>> 1. Solve time may dominate assembly anyway, so that's where we
>>>>>>> should optimize.
>>>>>>>
>>>>>> Yes, there may be such cases, in particular for simple forms
>>>>>> (Laplace equation etc.). For more complex forms with more terms
>>>>>> and coefficients, assembly typically dominates, from what I have
>>>>>> seen. This is the case for the flow problems of Murtazo, for
>>>>>> example.
>>>>>>
>>>>> This probably depends on whether you are using a projection
>>>>> method. If you are solving the saddle point problem, you can
>>>>> forget about assembly time.
>>>>>
>>>> Well, this is not what we see. I agree that this is what you would
>>>> like, but it is not the case now. That is why we are now focusing
>>>> on the assembly bottleneck.
>>>>
>>>> But
>>>>
>>>>> optimizing the solve is all about constructing a good
>>>>> preconditioner. If the operator is elliptic then AMG should work
>>>>> well and you don't have to think, but if it is indefinite all
>>>>> bets are off. I think we can build saddle point preconditioners
>>>>> now by writing some funny-looking mixed form files, but that
>>>>> could be made easier.
>>>>>
>>>> We use a splitting approach with GMRES for the momentum equation
>>>> and AMG for the continuity equations. This appears to work fairly
>>>> well. As I said, the assembly of the momentum equation is
>>>> dominating.
>>>>
>>>>>>> 2. Assembling the action instead of the operator removes the
>>>>>>> A.add() bottleneck.
>>>>>>>
>>>>>> True. But it may be worthwhile to put some effort into
>>>>>> optimizing the matrix assembly as well.
>>>>> In any case, you have to form something to precondition with.
>>>>>
>>>>>>> As mentioned before, we are experimenting with iterating
>>>>>>> locally over cells sharing common dofs and combining batches of
>>>>>>> element tensors before inserting into the global sparse matrix
>>>>>>> row by row. Let's see how it goes.
>>>>>>>
>>>>>> Yes, this is interesting. It would be very interesting to hear
>>>>>> about the progress.
>>>>>>
>>>>>> It is also interesting to understand what would optimize the
>>>>>> insertion for different linear algebra backends; in particular,
>>>>>> Jed seems to have good knowledge of PETSc. We could then build
>>>>>> backend optimization into the local dof orderings etc.
>>>>>>
>>>>> I just press M-. when I'm curious :-)
>>>>>
>>>>> I can't imagine it pays to optimize for a particular backend
>>>>> (it's not PETSc anyway, rather whichever format is used by the
>>>>> preconditioner). The CSR data structure is pretty common, but it
>>>>> will always be fastest to insert an entire row at once. If using
>>>>> an intermediate hashed structure makes this convenient, then it
>>>>> would help. The paper I posted assembles the entire matrix in
>>>>> hashed format and then converts it to CSR. I'll guess that a
>>>>> hashed cache for the assembly (flushed every few MiB, for
>>>>> instance) would work at least as well as assembling the entire
>>>>> thing in hashed format.
>>>>>
>>>> Yes, it seems that some form of hashed structure is a good way to
>>>> optimize. What Murtazo is referring to would be similar to hashing
>>>> the whole matrix as in the paper you posted,
>>>>
>>> The way I interpret it, they are very different. The hash would
>>> store a mapping from (i, j) to values, while Murtazo suggests
>>> storing a mapping from (element, i, j) to values.
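For concreteness, the hashed-cache scheme Jed sketches above could look
roughly like this (a minimal Python sketch, not DOLFIN or PETSc code;
the class and all names are made up for illustration):

```python
# Hypothetical sketch of a hashed assembly cache: element
# contributions accumulate in a hash keyed by the global (i, j)
# pair, and the cache is flushed into the matrix row by row once
# it grows past a threshold (standing in for "every few MiB").
from collections import defaultdict

class HashedCache:
    def __init__(self, flush_limit=1_000_000):
        self.cache = defaultdict(float)  # (i, j) -> accumulated value
        self.flush_limit = flush_limit   # max cached entries
        self.rows = defaultdict(dict)    # stand-in for the CSR matrix

    def add(self, global_dofs, element_tensor):
        # element_tensor is an n x n nested list; global_dofs maps
        # local indices to global rows/columns.
        for a, i in enumerate(global_dofs):
            for b, j in enumerate(global_dofs):
                self.cache[(i, j)] += element_tensor[a][b]
        if len(self.cache) >= self.flush_limit:
            self.flush()

    def flush(self):
        # Group cached entries by row and insert whole rows at a
        # time, the cheapest access pattern for CSR-like storage.
        by_row = defaultdict(dict)
        for (i, j), v in self.cache.items():
            by_row[i][j] = v
        for i, cols in by_row.items():
            for j, v in cols.items():
                self.rows[i][j] = self.rows[i].get(j, 0.0) + v
        self.cache.clear()
```

With a small flush limit this degenerates toward per-element insertion;
with an unbounded one it is exactly "assemble the whole thing in hashed
format and convert at the end."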
>> Sorry, if i and j are already global, then (element, i, j) is
>> equivalent to just (i, j); it means we can just map (i, j) to
>> values. The reason I included the element is that i and j are local
>> before the global mapping.
>>
> How is that possible?
>
> When we assemble, we need to know for each element where to insert
> each of the local n^2 entries. Then we need a mapping from
> (element, i, j) to a position in the global matrix.
>
True. Maybe I am confusing the notation. Right, the cost will be
n^2 * num_cells.
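To spell out that count: the dof map takes a local index on a given
element to a global row/column, so each element contributes n^2 adds
and assembly costs n^2 * num_cells insertions in total. A minimal
sketch (hypothetical names, not the DOLFIN interface):

```python
# Sketch of the (element, i, j) -> global position mapping: for
# each cell, dofmap gives local index -> global dof, so the pair
# of local indices (i, j) lands at global entry (dofs[i], dofs[j]).
def assemble(num_cells, dofmap, element_tensors, add):
    adds = 0
    for cell in range(num_cells):
        dofs = dofmap(cell)  # local index -> global dof number
        n = len(dofs)
        for i in range(n):
            for j in range(n):
                # (cell, i, j) determines the global entry (I, J)
                add(dofs[i], dofs[j], element_tensors[cell][i][j])
                adds += 1
    return adds  # n^2 * num_cells for a fixed element size n
```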
murtazo

_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
