On 04/03/11 16:38, Johan Hake wrote:
> On Friday March 4 2011 03:29:32 Garth N. Wells wrote:
>> On 03/03/11 19:48, Johan Hake wrote:
>>> On Thursday March 3 2011 11:20:03 Marie E. Rognes wrote:
>>>> On 03/03/2011 08:03 PM, Johan Hake wrote:
>>>>> Hello!
>>>>>
>>>>> I am using mixed spaces with Reals quite a lot. It turns out that
>>>>> assembling forms with functions from MixedFunctionSpaces containing
>>>>> Real spaces is dead slow. The time spent also increases with the
>>>>> number of included Real spaces, even if none of them appear in the
>>>>> form being assembled.
>>>>>
>>>>> The attached test script illustrates this.
>>>>
>>>> By replacing "CG", 1 by "R", 0 or?
>>>
>>> OMG!! Yes, *flush*
>>>
>>> That explains the memory usage :P
>>>
>>>>> The test script also reveals that a disproportionate amount of time
>>>>> is spent in FFC generating the code. This time also increases with
>>>>> the number of Real spaces included. Turning off FErari helped a bit
>>>>> on this point.
>>>>
>>>> I can take a look on the FFC side, but not today.
>>>
>>> Nice!
>>>
>>> With the updated correction from Marie, the numbers now look like:
>>>
>>> With PETSc backend:
>>>
>>> Tensor without Mixed space  |  0.11211   0.11211   1
>>> With 1 global dofs          |  1.9482    1.9482    1
>>> With 2 global dofs          |  2.8725    2.8725    1
>>> With 4 global dofs          |  5.1959    5.1959    1
>>> With 8 global dofs          |  10.524    10.524    1
>>> With 16 global dofs         |  25.574    25.574    1
>>>
>>> With Epetra backend:
>>>
>>> Tensor without Mixed space  |  0.87544   0.87544   1
>>> With 1 global dofs          |  1.7089    1.7089    1
>>> With 2 global dofs          |  2.6868    2.6868    1
>>> With 4 global dofs          |  4.28      4.28      1
>>> With 8 global dofs          |  8.123     8.123     1
>>> With 16 global dofs         |  17.394    17.394    1
>>>
>>> Still a pretty big increase in time for just adding 16 scalar dofs to
>>> a system of 274625 dofs in the first place.
>>
>> I have seen this big slowdown for large problems.
>> The first issue, which was the computation of the sparsity pattern,
>> has been 'resolved' by using boost::unordered_set. This comes at the
>> expense of a small slowdown for regular problems.
>>
>> I also noticed that Epetra performs much better for these problems
>> than PETSc does. We need to check the matrix initialisation, but it
>> could ultimately be a limitation of the backends. Each matrix row
>> corresponding to a global dof is full, and it may be that backends
>> designed for large sparse matrices do not handle this well.
>
> How could inserting into the matrix be the bottleneck? In the test
> script I attached, I do not assemble any global dofs.
>
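[A small pure-Python sketch, illustrative only and not DOLFIN code, of the situation discussed here: a sparsity pattern built with one hash set of column indices per row (the analogue of the boost::unordered_set change mentioned above), in which a single global ("Real") dof that appears on every cell turns its row and column dense. All names are hypothetical.]

```python
class SparsityPattern:
    """Toy sparsity pattern: one hash set of column indices per row."""

    def __init__(self, n):
        self.rows = [set() for _ in range(n)]

    def insert(self, dofs):
        # Couple every pair of dofs that appear together on one cell.
        for i in dofs:
            self.rows[i].update(dofs)

    def num_nonzeros(self):
        return sum(len(r) for r in self.rows)


# Two 'cells' sharing dof 1, plus a global dof (3) seen by one cell's
# dofmap: the global dof couples to everything it meets, so its row
# fills up as more cells are visited.
p = SparsityPattern(4)
p.insert([0, 1])
p.insert([1, 2])
p.insert([0, 1, 2, 3])
print(p.num_nonzeros())  # 16: every row is full in this tiny example
```

With N mesh dofs, each of k global dofs contributes a full row of N entries to such a pattern, which is consistent with the roughly linear growth in time and memory reported in the thread.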
I think that you'll find that it is. It will be assembling zeroes in the
global dof positions.

>> The best approach is probably to add the entire row at once for
>> global dofs. This would require a modified assembler.
>>
>> There is a UFC blueprint to identify global dofs:
>>
>> https://blueprints.launchpad.net/ufc/+spec/global-dofs
>>
>> If we can identify global dofs, we have a better chance of dealing
>> with the problem properly. This includes running in parallel with
>> global dofs.
>
> Do you envision integrals over global dofs being separated into their
> own tabulate_tensor function? Then in DOLFIN we could assemble the
> whole row/column in one loop and insert it into the matrix in one go.
>

No - I had in mind adding only cell-based dofs to the matrix, and
accumulating the global rows in a global vector, which is then inserted
at the end (as one row) into the matrix. I'm not advocating a change to
the tabulate_foo interface at this stage.

> Do you think we also need to recognise global dofs in UFL to properly
> flesh out these integrals?
>

Yes. For one, to handle them properly in parallel, since global dofs do
not reside at a mesh entity and the domains are partitioned mesh-wise.

Garth

> Johan
>
>> Garth
>>
>>> Johan
>>>
>>>> --
>>>> Marie
>>>>
>>>>> I have not profiled any of this; I just throw it out there. I do
>>>>> not see any difference between, for example, the Epetra and PETSc
>>>>> backends, as suggested in the fixed bug on building the sparsity
>>>>> pattern with global dofs.
>>>>>
>>>>> My test has been done on DOLFIN 0.9.9+. I haven't profiled it yet.
>>>>>
>>>>> Output from summary:
>>>>> Tensor without Mixed space  |  0.11401   0.11401   1
>>>>> With 1 global dofs          |  0.40725   0.40725   1
>>>>> With 2 global dofs          |  0.94694   0.94694   1
>>>>> With 4 global dofs          |  2.763     2.763     1
>>>>> With 8 global dofs          |  9.6149    9.6149    1
>>>>>
>>>>> Also the amount of memory used to build the sparsity pattern seems
>>>>> to double for each step.
>>>>> The memory footprint for a 32x32x32 unit cube with 16 global dofs
>>>>> was 1.6 GB (!?).
>>>>>
>>>>> Johan
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dolfin
>>>>> Post to     : [email protected]
>>>>> Unsubscribe : https://launchpad.net/~dolfin
>>>>> More help   : https://help.launchpad.net/ListHelp
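[The strategy Garth outlines above - add only cell-based dofs to the matrix during the cell loop, buffer each global-dof row separately, and insert each buffered row into the matrix once at the end - might be sketched as below. This is a toy illustration with purely hypothetical names, using a dict as a stand-in for a sparse matrix; it is not the DOLFIN assembler.]

```python
def assemble(cell_dofmaps, element_tensors, n_dofs, global_dofs):
    """Toy assembler: rows owned by global dofs are accumulated in a
    dense buffer and inserted into the matrix in one pass at the end,
    instead of once per cell."""
    matrix = {}  # (row, col) -> value; stand-in for a sparse matrix
    global_rows = {g: [0.0] * n_dofs for g in global_dofs}

    for dofs, A_cell in zip(cell_dofmaps, element_tensors):
        for li, i in enumerate(dofs):
            for lj, j in enumerate(dofs):
                v = A_cell[li][lj]
                if i in global_rows:
                    global_rows[i][j] += v      # buffer the dense row
                else:
                    matrix[(i, j)] = matrix.get((i, j), 0.0) + v

    # Single insertion per global row, as suggested in the thread.
    for g, row in global_rows.items():
        for j, v in enumerate(row):
            if v != 0.0:
                matrix[(g, j)] = matrix.get((g, j), 0.0) + v
    return matrix


# Two cells; dof 2 plays the role of a global ('Real') dof.
ones = [[1.0, 1.0], [1.0, 1.0]]
A = assemble([[0, 1], [1, 2]], [ones, ones], n_dofs=3, global_dofs=[2])
print(A[(1, 1)], A[(2, 1)])  # 2.0 1.0
```

The point of the design is that the sparse backend never sees per-cell insertions into a dense row; it receives each global row exactly once, which sidesteps whatever penalty PETSc or Epetra pays for incrementally filling a full row.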

