Anders Logg wrote:
> On Mon, May 19, 2008 at 03:44:16PM +0200, Murtazo Nazarov wrote:
>
>> Anders Logg wrote:
>>
>>> On Sun, May 18, 2008 at 10:55:10PM +0200, Johan Hoffman wrote:
>>>
>>>>> On Sun 2008-05-18 21:54, Johan Hoffman wrote:
>>>>>
>>>>>>> On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote:
>>>>>>>
>>>>>>> 1. Solve time may dominate assembly anyway, so that's where we
>>>>>>> should optimize.
>>>>>>>
>>>>>> Yes, there may be such cases, in particular for simple forms
>>>>>> (Laplace equation etc.). For more complex forms with more terms
>>>>>> and coefficients, assembly typically dominates, from what I have
>>>>>> seen. This is the case for the flow problems of Murtazo, for
>>>>>> example.
>>>>>>
>>>>> This probably depends on whether you are using a projection
>>>>> method. If you are solving the saddle point problem, you can
>>>>> forget about assembly time.
>>>>>
>>>> Well, this is not what we see. I agree that this is what you would
>>>> like, but it is not the case now. That is why we are now focusing
>>>> on the assembly bottleneck.
>>>>
>>>> But
>>>>
>>>>> optimizing the solve is all about constructing a good
>>>>> preconditioner. If the operator is elliptic then AMG should work
>>>>> well and you don't have to think, but if it is indefinite all
>>>>> bets are off. I think we can build saddle point preconditioners
>>>>> now by writing some funny-looking mixed form files, but that
>>>>> could be made easier.
>>>>>
>>>> We use a splitting approach with GMRES for the momentum equation
>>>> and AMG for the continuity equations. This appears to work fairly
>>>> well. As I said, the assembly of the momentum equation is
>>>> dominating.
>>>>
>>>>>>> 2. Assembling the action instead of the operator removes the
>>>>>>> A.add() bottleneck.
>>>>>>>
>>>>>> True. But it may be worthwhile to put some effort into
>>>>>> optimizing the matrix assembly as well.
>>>>> In any case, you have to form something to precondition with.
>>>>>
>>>>>>> As mentioned before, we are experimenting with iterating
>>>>>>> locally over cells sharing common dofs and combining batches of
>>>>>>> element tensors before inserting into the global sparse matrix
>>>>>>> row by row. Let's see how it goes.
>>>>>>>
>>>>>> Yes, this is interesting. It would be very interesting to hear
>>>>>> about the progress.
>>>>>>
>>>>>> It is also interesting to understand what would optimize the
>>>>>> insertion for different linear algebra backends; in particular,
>>>>>> Jed seems to have good knowledge of PETSc. We could then build
>>>>>> backend optimization into the local dof orderings etc.
>>>>>>
>>>>> I just press M-. when I'm curious :-)
>>>>>
>>>>> I can't imagine it pays to optimize for a particular backend
>>>>> (it's not PETSc anyway, rather whichever format is used by the
>>>>> preconditioner). The CSR data structure is pretty common, but it
>>>>> will always be fastest to insert an entire row at once. If using
>>>>> an intermediate hashed structure makes this convenient, then it
>>>>> would help. The paper I posted assembles the entire matrix in
>>>>> hashed format and then converts it to CSR. I'll guess that a
>>>>> hashed cache for the assembly (flushed every few MiB, for
>>>>> instance) would work at least as well as assembling the entire
>>>>> thing in hashed format.
>>>>>
>>>> Yes, it seems that some form of hashed structure is a good way to
>>>> optimize. What Murtazo is referring to would be similar to hashing
>>>> the whole matrix as in the paper you posted,
>>>>
>>> The way I interpret it, they are very different. The hash would
>>> store a mapping from (i, j) to values, while Murtazo suggests
>>> storing a mapping from (element, i, j) to values.
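For concreteness, the hashed-cache scheme Jed sketches above could look
roughly like this (a minimal Python sketch, not DOLFIN or PETSc code;
the class and all names are made up for illustration):

```python
# Hypothetical sketch of a hashed assembly cache: element
# contributions accumulate in a hash keyed by the global (i, j)
# pair, and the cache is flushed into the matrix row by row once
# it grows past a threshold (standing in for "every few MiB").
from collections import defaultdict

class HashedCache:
    def __init__(self, flush_limit=1_000_000):
        self.cache = defaultdict(float)  # (i, j) -> accumulated value
        self.flush_limit = flush_limit   # max cached entries
        self.rows = defaultdict(dict)    # stand-in for the CSR matrix

    def add(self, global_dofs, element_tensor):
        # element_tensor is an n x n nested list; global_dofs maps
        # local indices to global rows/columns.
        for a, i in enumerate(global_dofs):
            for b, j in enumerate(global_dofs):
                self.cache[(i, j)] += element_tensor[a][b]
        if len(self.cache) >= self.flush_limit:
            self.flush()

    def flush(self):
        # Group cached entries by row and insert whole rows at a
        # time, the cheapest access pattern for CSR-like storage.
        by_row = defaultdict(dict)
        for (i, j), v in self.cache.items():
            by_row[i][j] = v
        for i, cols in by_row.items():
            for j, v in cols.items():
                self.rows[i][j] = self.rows[i].get(j, 0.0) + v
        self.cache.clear()
```

With a small flush limit this degenerates toward per-element insertion;
with an unbounded one it is exactly "assemble the whole thing in hashed
format and convert at the end."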
>> Sorry, if i and j are already global, then (element, i, j) is
>> equivalent to just (i, j); it means we can just map (i, j) to
>> values. The reason I included the element is that i and j are local
>> before the global mapping.
>>
> How is that possible?
>
> When we assemble, we need to know for each element where to insert
> each of the local n^2 entries. Then we need a mapping from
> (element, i, j) to a position in the global matrix.
>
True. Maybe I am confusing the notation. Right, the cost will be
n^2 * num_cells.
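To spell out that count: the dof map takes a local index on a given
element to a global row/column, so each element contributes n^2 adds
and assembly costs n^2 * num_cells insertions in total. A minimal
sketch (hypothetical names, not the DOLFIN interface):

```python
# Sketch of the (element, i, j) -> global position mapping: for
# each cell, dofmap gives local index -> global dof, so the pair
# of local indices (i, j) lands at global entry (dofs[i], dofs[j]).
def assemble(num_cells, dofmap, element_tensors, add):
    adds = 0
    for cell in range(num_cells):
        dofs = dofmap(cell)  # local index -> global dof number
        n = len(dofs)
        for i in range(n):
            for j in range(n):
                # (cell, i, j) determines the global entry (I, J)
                add(dofs[i], dofs[j], element_tensors[cell][i][j])
                adds += 1
    return adds  # n^2 * num_cells for a fixed element size n
```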
murtazo

_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
