On Friday March 4 2011 09:23:58 Garth N. Wells wrote:
> On 04/03/11 17:11, Johan Hake wrote:
> > On Friday March 4 2011 08:48:14 Garth N. Wells wrote:
> >> On 04/03/11 16:38, Johan Hake wrote:
> >>> On Friday March 4 2011 03:29:32 Garth N. Wells wrote:
> >>>> On 03/03/11 19:48, Johan Hake wrote:
> >>>>> On Thursday March 3 2011 11:20:03 Marie E. Rognes wrote:
> >>>>>> On 03/03/2011 08:03 PM, Johan Hake wrote:
> >>>>>>> Hello!
> >>>>>>>
> >>>>>>> I am using Mixed spaces with Reals quite a lot. It turns out that
> >>>>>>> assembling forms with functions from MixedFunctionSpaces containing
> >>>>>>> Real spaces is dead slow. The time spent also increases with the
> >>>>>>> number of included Real spaces, even if none of them appear in the
> >>>>>>> form which is assembled.
> >>>>>>>
> >>>>>>> The attached test script illustrates this.
> >>>>>>
> >>>>>> By replacing "CG", 1 by "R", 0, or?
> >>>>>
> >>>>> OMG!! Yes, *flush*
> >>>>>
> >>>>> That explains the memory usage :P
> >>>>>
> >>>>>>> The test script also reveals that a disproportionate amount of time
> >>>>>>> is spent in FFC generating the code. This time also increases with
> >>>>>>> the number of Real spaces included. Turning off FErari helped a bit
> >>>>>>> with this point.
> >>>>>>
> >>>>>> I can take a look on the FFC side, but not today.
> >>>>>
> >>>>> Nice!
> >>>>>
> >>>>> With the correction from Marie the numbers now look like:
> >>>>>
> >>>>> With PETSc backend
> >>>>>
> >>>>> Tensor without Mixed space | 0.11211 0.11211 1
> >>>>> With 1 global dofs | 1.9482 1.9482 1
> >>>>> With 2 global dofs | 2.8725 2.8725 1
> >>>>> With 4 global dofs | 5.1959 5.1959 1
> >>>>> With 8 global dofs | 10.524 10.524 1
> >>>>> With 16 global dofs | 25.574 25.574 1
> >>>>>
> >>>>> With Epetra backend
> >>>>>
> >>>>> Tensor without Mixed space | 0.87544 0.87544 1
> >>>>> With 1 global dofs | 1.7089 1.7089 1
> >>>>> With 2 global dofs | 2.6868 2.6868 1
> >>>>> With 4 global dofs | 4.28 4.28 1
> >>>>> With 8 global dofs | 8.123 8.123 1
> >>>>> With 16 global dofs | 17.394 17.394 1
> >>>>>
> >>>>> Still a pretty big increase in time for just adding 16 scalar dofs to
> >>>>> a system of 274625 dofs in the first place.
> >>>>
> >>>> I have seen this big slowdown for large problems. The first issue,
> >>>> which was the computation of the sparsity pattern, has been 'resolved'
> >>>> by using boost::unordered_set. This comes at the expense of a small
> >>>> slowdown for regular problems.
> >>>>
> >>>> I also noticed that Epetra performs much better for these problems
> >>>> than PETSc does. We need to check the matrix initialisation, but it
> >>>> could ultimately be a limitation of the backends. Each matrix row
> >>>> corresponding to a global dof is full, and it may be that backends
> >>>> designed for large sparse matrices do not handle this well.
> >>>
> >>> How could inserting into the matrix be the bottleneck? In the test
> >>> script I attached I do not assemble any global dofs.
> >>
> >> I think that you'll find that it is. It will be assembling zeroes in the
> >> global dof positions.
> >
> > I guess you are right. Are the sparsity pattern and the tabulated
> > tensor based only on the MixedSpace formulation, and not on the actual
> > integral?
>
> The sparsity pattern is based on the dof map, which depends on the
> function spaces.
>
> > Is this a bug or feature?
>
> I would say just the natural approach. There is/was a Blueprint to avoid
> computing and assembling the zeroes in problems like Stokes, but I'm not
> sure that this would be worthwhile, since it would involve assembling
> non-matrix values, and most backends want to assemble dense local
> matrices into sparse global matrices.
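The dense coupling described above can be sketched in plain Python. This is a toy model, not DOLFIN's actual data structures or API: per-cell dof lists stand in for the dof map, and a set of (row, col) pairs stands in for the sparsity pattern. Because a Real-space dof appears in every cell's dof list, it couples with every other dof, adding a full row and column per global dof.

```python
def build_sparsity_pattern(cell_dofs):
    """Collect the set of (row, col) couplings from per-cell dof lists."""
    pattern = set()
    for dofs in cell_dofs:
        for i in dofs:
            for j in dofs:
                pattern.add((i, j))
    return pattern

n_cells = 100
# Toy 1D "mesh": cell k holds the two CG1 dofs k and k+1.
cg_dofs = [[k, k + 1] for k in range(n_cells)]
n_mesh_dofs = n_cells + 1

base = len(build_sparsity_pattern(cg_dofs))

# Add k "global" (Real-space) dofs: each one appears in every cell's
# dof list, so it couples with every other dof in the mesh.
extra = {}
for k in (1, 2, 4):
    global_ids = [n_mesh_dofs + g for g in range(k)]
    mixed_dofs = [dofs + global_ids for dofs in cg_dofs]
    extra[k] = len(build_sparsity_pattern(mixed_dofs)) - base

# Extra couplings grow like 2*k*n_mesh_dofs + k**2: a full row and a
# full column per global dof, plus the dense global-global block.
print(extra)
```

This also matches the memory numbers reported in the thread: each added global dof contributes another dense row and column to the pattern, so memory grows with the number of Real spaces even when no global dof appears in the assembled integral.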
Makes sense. I guess the bottleneck is then the insertion into the global matrix, where your suggested approach might improve the performance. Sounds like a pretty hard fix, though...

Johan

> Garth
>
> >>>> The best approach is probably to add the entire row at once for global
> >>>> dofs. This would require a modified assembler.
> >>>>
> >>>> There is a UFC Blueprint to identify global dofs:
> >>>> https://blueprints.launchpad.net/ufc/+spec/global-dofs
> >>>>
> >>>> If we can identify global dofs, we have a better chance of dealing
> >>>> with the problem properly. This includes running in parallel with
> >>>> global dofs.
> >>>
> >>> Do you envision having the integrals over global dofs separated into
> >>> their own tabulate_tensor function? Then in DOLFIN we can assemble
> >>> the whole row/column in one loop and insert it into the matrix in one
> >>> go.
> >>
> >> No - I had in mind possibly adding only cell-based dofs to the matrix,
> >> and adding the global rows into a global vector, which is then inserted
> >> at the end (as one row) into the matrix. I'm not advocating at this
> >> stage a change to the tabulate_foo interface.
> >
> > Ok, but you need a separate tabulate_tensor function for the global dofs?
> >
> >>> Do you think we also need to recognise global dofs in UFL to properly
> >>> flesh out these integrals?
> >>
> >> Yes. For one, to handle them properly in parallel, since global dofs do
> >> not reside at a mesh entity, and the domains are parallelised mesh-wise.
> >
> > Ok
> >
> > Johan
> >
> >> Garth
> >>
> >>> Johan
> >>>
> >>>> Garth
> >>>>
> >>>>> Johan
> >>>>>
> >>>>>> --
> >>>>>> Marie
> >>>>>>
> >>>>>>> I have not profiled any of this, but I just throw it out there. I
> >>>>>>> do not see any difference between, for example, the Epetra and
> >>>>>>> PETSc backends, as suggested in the fixed bug for building the
> >>>>>>> sparsity pattern with global dofs.
> >>>>>>>
> >>>>>>> My test has been done on DOLFIN 0.9.9+.
> >>>>>>> I haven't profiled it yet.
> >>>>>>>
> >>>>>>> Output from summary:
> >>>>>>> Tensor without Mixed space | 0.11401 0.11401 1
> >>>>>>> With 1 global dofs | 0.40725 0.40725 1
> >>>>>>> With 2 global dofs | 0.94694 0.94694 1
> >>>>>>> With 4 global dofs | 2.763 2.763 1
> >>>>>>> With 8 global dofs | 9.6149 9.6149 1
> >>>>>>>
> >>>>>>> Also the amount of memory used to build the sparsity pattern seems
> >>>>>>> to double for each step. The memory footprint for a 32x32x32 unit
> >>>>>>> cube with 16 global dofs was 1.6 GB(!?).
> >>>>>>>
> >>>>>>> Johan
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Mailing list: https://launchpad.net/~dolfin
> >>>>>>> Post to : [email protected]
> >>>>>>> Unsubscribe : https://launchpad.net/~dolfin
> >>>>>>> More help : https://help.launchpad.net/ListHelp
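The row-wise strategy discussed in the thread (add cell-based dofs to the matrix per cell, buffer contributions to rows owned by global dofs, and insert each global row in one go at the end) might look roughly like the following in plain Python. All names and data structures here are illustrative; a dict stands in for the sparse backend matrix, and this is not DOLFIN, UFC, or any backend API.

```python
def assemble(cell_dofs, cell_tensors, global_dofs, n_dofs):
    """Toy assembler: per-cell insertion for mesh dofs, row-at-once
    insertion for global (Real-space) dofs."""
    A = {}  # sparse global matrix as a dict: (i, j) -> value
    # One dense row buffer per global dof.
    global_rows = {g: [0.0] * n_dofs for g in global_dofs}

    for dofs, tensor in zip(cell_dofs, cell_tensors):
        for li, i in enumerate(dofs):
            for lj, j in enumerate(dofs):
                v = tensor[li][lj]
                if i in global_rows:
                    global_rows[i][j] += v  # buffer instead of inserting
                else:
                    A[(i, j)] = A.get((i, j), 0.0) + v

    # Insert each buffered global row into the matrix in one go,
    # instead of hitting the full row once per cell.
    for g, row in global_rows.items():
        for j, v in enumerate(row):
            if v != 0.0:
                A[(g, j)] = v
    return A

# Tiny example: two cells sharing dof 1; dof 3 is the global one.
cell_dofs = [[0, 1, 3], [1, 2, 3]]
ones = [[1.0] * 3 for _ in range(3)]  # dummy 3x3 cell tensors
A = assemble(cell_dofs, [ones, ones], global_dofs={3}, n_dofs=4)
print(A[(3, 1)])  # dof 1 appears in both cells -> 2.0
```

The point of the buffering is that a backend optimised for sparse rows never sees per-cell insertions into the dense global rows; whether this actually helps would depend on how PETSc and Epetra handle full-row insertion.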

