Anders Logg wrote:
> On Sun, May 18, 2008 at 10:55:10PM +0200, Johan Hoffman wrote:
>   
>>> On Sun 2008-05-18 21:54, Johan Hoffman wrote:
>>>       
>>>>> On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote:
>>>>>
>>>>> 1. Solve time may dominate assemble anyway so that's where we should
>>>>> optimize.
>>>>>           
>>>> Yes, there may be such cases, in particular for simple forms (Laplace
>>>> equation etc.). For more complex forms with more terms and coefficients,
>>>> assembly typically dominates, from what I have seen. This is the case
>>>> for the flow problems of Murtazo, for example.
>>>>
>>> This probably depends on whether you are using a projection method. If
>>> you are solving the saddle point problem, you can forget about assembly
>>> time.
>>>
>> Well, this is not what we see. I agree that this is what you would like,
>> but this is not the case now. That is why we are now focusing on the
>> assembly bottleneck.
>>
>>> But optimizing the solve is all about constructing a good
>>> preconditioner. If the operator is elliptic then AMG should work well
>>> and you don't have to think, but if it is indefinite all bets are off.
>>> I think we can build saddle point preconditioners now by writing some
>>> funny-looking mixed form files, but that could be made easier.
>>>
>> We use a splitting approach with GMRES for the momentum equation and AMG
>> for the continuity equation. This appears to work fairly well. As I said,
>> the assembly of the momentum equation is what dominates.
>>
>>     
>>>>> 2. Assembling the action instead of the operator removes the A.add()
>>>>> bottleneck.
>>>>>           
>>>> True. But it may be worthwhile to put some effort into optimizing the
>>>> matrix assembly as well.
>>>>
>>> In any case, you have to form something to precondition with.
>>>
>>>       
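As an aside, assembling the action could look roughly like the toy
Python sketch below. This is a standalone illustration for a 1D P1
Laplacian on a uniform mesh, not DOLFIN's actual interface; the name
apply_action is made up.

import numpy as np

def apply_action(x, n):
    """Matrix-free action y = A*x for the 1D P1 Laplacian on a
    uniform mesh with n cells (boundary conditions ignored).
    The global sparse matrix is never formed, so there is no
    A.add() at all: each element tensor is applied locally and
    the result scattered back into y."""
    h = 1.0 / n
    Ae = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])  # element tensor
    y = np.zeros(n + 1)
    for cell in range(n):
        dofs = [cell, cell + 1]      # local-to-global dof map
        y[dofs] += Ae @ x[dofs]      # apply element tensor locally
    return y

x = np.linspace(0.0, 1.0, 11)
print(apply_action(x, 10))  # interior entries ~0 since x is linear

A Krylov solver only needs this matrix-vector product, so a function
like this is all that has to be handed to the solver.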
>>>>> As mentioned before, we are experimenting with iterating locally over
>>>>> cells sharing common dofs and combining batches of element tensors
>>>>> before inserting into the global sparse matrix row by row. Let's see
>>>>> how it goes.
>>>>>           
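A rough sketch of that local iteration, again as a standalone Python
toy for a 1D P1 Laplacian: precompute which cells touch each global
row, combine their element-tensor rows, and insert each complete row
exactly once (the print stands in for one backend insertion per row).

import numpy as np

n = 10                                    # number of cells (toy mesh)
h = 1.0 / n
Ae = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
dofmap = [[c, c + 1] for c in range(n)]   # local -> global dofs

# global row -> list of (cell, local row) pairs sharing that dof
row_cells = {}
for cell, dofs in enumerate(dofmap):
    for i, gi in enumerate(dofs):
        row_cells.setdefault(gi, []).append((cell, i))

for gi, batch in sorted(row_cells.items()):
    row = {}                              # combined global row gi
    for cell, i in batch:
        for j, gj in enumerate(dofmap[cell]):
            row[gj] = row.get(gj, 0.0) + Ae[i, j]
    # one insertion per row; a real backend call would go here
    print(gi, sorted(row.items()))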
>>>> Yes, this is interesting. It would be very interesting to hear about
>>>> the progress.
>>>>
>>>> It would also be interesting to understand what would optimize the
>>>> insertion for different linear algebra backends; in particular, Jed
>>>> seems to have good knowledge of PETSc. We could then build backend
>>>> optimization into the local dof-orderings etc.
>>>>
>>> I just press M-. when I'm curious :-)
>>>
>>> I can't imagine it pays to optimize for a particular backend (it's not
>>> PETSc anyway, rather whichever format is used by the preconditioner).
>>> The CSR data structure is pretty common, but it will always be fastest
>>> to insert an entire row at once.  If using an intermediate hashed
>>> structure makes this convenient, then it would help.  The paper I posted
>>> assembles the entire matrix in hashed format and then converts it to
>>> CSR.  I'll guess that a hashed cache for the assembly (flushed every few
>>> MiB, for instance) would work at least as well as assembling the entire
>>> thing in hashed format.
>>>
>> Yes, it seems that some form of hashed structure is a good way to
>> optimize. What Murtazo is referring to would be similar to hashing the
>> whole matrix, as in the paper you posted.
>>
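To make this concrete, such a cache might look like the Python sketch
below. HashedAssemblyCache and insert_row are invented names; insert_row
stands in for a backend call that adds one row at a time (e.g. PETSc's
MatSetValues in ADD mode), and flush_limit plays the role of the "every
few MiB" threshold.

from collections import defaultdict

class HashedAssemblyCache:
    """Accumulate element contributions hashed by global row, then
    flush whole rows at once, since inserting an entire row into a
    CSR-backed matrix is much cheaper than scattered A.add() calls.
    The same row may be flushed more than once, so the backend
    insertion must add to existing entries, not overwrite them."""

    def __init__(self, insert_row, flush_limit=100000):
        self.insert_row = insert_row    # callback: (row, cols, vals)
        self.flush_limit = flush_limit  # stand-in for "every few MiB"
        self.rows = defaultdict(dict)   # global row -> {global col: value}
        self.nentries = 0

    def add(self, dofs, Ae):
        """Add a dense element tensor Ae over the global dofs."""
        for i, gi in enumerate(dofs):
            row = self.rows[gi]
            for j, gj in enumerate(dofs):
                row[gj] = row.get(gj, 0.0) + Ae[i][j]
        self.nentries += len(dofs) ** 2
        if self.nentries >= self.flush_limit:
            self.flush()

    def flush(self):
        for gi in sorted(self.rows):
            cols = sorted(self.rows[gi])
            self.insert_row(gi, cols, [self.rows[gi][c] for c in cols])
        self.rows.clear()
        self.nentries = 0

# Usage with a dummy backend callback:
#   cache = HashedAssemblyCache(lambda r, c, v: print(r, c, v), flush_limit=8)
#   cache.add([0, 1], [[1.0, -1.0], [-1.0, 1.0]])
#   cache.flush()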
>
> The way I interpret it, they are very different. The hash would store
> a mapping from (i, j) to values while Murtazo suggests storing a
> mapping from (element, i, j) to values.
>
>   
Sorry, if (i, j) are already global indices, then (element, i, j) is
equivalent to just (i, j), so we can simply map (i, j) to values. The
reason I included the element is that i and j are local indices before
the global mapping.
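To illustrate: once local indices have passed through the dofmap,
contributions from different elements to the same global entry simply
accumulate, so hashing on (i, j) alone is enough. A toy Python example
with two 1D elements sharing a dof (the numbers are purely illustrative):

# Two elements sharing global dof 1, each with the same 2x2
# element tensor; both contributions land on the shared entry.
Ae = [[1.0, -1.0], [-1.0, 1.0]]
dofmaps = [[0, 1], [1, 2]]    # local -> global dofs, per element

A = {}                        # global (i, j) -> accumulated value
for dofs in dofmaps:
    for i, gi in enumerate(dofs):
        for j, gj in enumerate(dofs):
            A[(gi, gj)] = A.get((gi, gj), 0.0) + Ae[i][j]

print(A[(1, 1)])  # 2.0: one contribution from each element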
