On 04/03/11 16:38, Johan Hake wrote:
> On Friday March 4 2011 03:29:32 Garth N. Wells wrote:
>> On 03/03/11 19:48, Johan Hake wrote:
>>> On Thursday March 3 2011 11:20:03 Marie E. Rognes wrote:
>>>> On 03/03/2011 08:03 PM, Johan Hake wrote:
>>>>> Hello!
>>>>>
>>>>> I am using mixed spaces with Reals quite a lot. It turns out that
>>>>> assembling forms with functions from MixedFunctionSpaces containing
>>>>> Real spaces is dead slow. The time spent also increases with the
>>>>> number of included Real spaces, even if none of them appear in the
>>>>> form being assembled.
>>>>>
>>>>> The attached test script illustrates this.
>>>>
>>>> By replacing "CG", 1 by "R", 0 or?
>>>
>>> OMG!! Yes, *flush*
>>>
>>> That explains the memory usage :P
>>>
>>>>> The test script also reveals that a disproportionate amount of time
>>>>> is spent in FFC generating the code. This time also increases with
>>>>> the number of Real spaces included. Turning off FErari helped a bit
>>>>> on this point.
>>>>
>>>> I can take a look on the FFC side, but not today.
>>>
>>> Nice!
>>>
>>> With the updated correction from Marie, the numbers now look like:
>>>
>>> With PETSc backend:
>>>
>>> Tensor without Mixed space  |  0.11211   0.11211   1
>>> With 1 global dofs          |  1.9482    1.9482    1
>>> With 2 global dofs          |  2.8725    2.8725    1
>>> With 4 global dofs          |  5.1959    5.1959    1
>>> With 8 global dofs          |  10.524    10.524    1
>>> With 16 global dofs         |  25.574    25.574    1
>>>
>>> With Epetra backend:
>>>
>>> Tensor without Mixed space  |  0.87544   0.87544   1
>>> With 1 global dofs          |  1.7089    1.7089    1
>>> With 2 global dofs          |  2.6868    2.6868    1
>>> With 4 global dofs          |  4.28      4.28      1
>>> With 8 global dofs          |  8.123     8.123     1
>>> With 16 global dofs         |  17.394    17.394    1
>>>
>>> Still a pretty big increase in time for just adding 16 scalar dofs to
>>> a system of 274625 dofs in the first place.
>>
>> I have seen this big slowdown for large problems.
>> The first issue, which was the computation of the sparsity pattern,
>> has been 'resolved' by using boost::unordered_set. This comes at the
>> expense of a small slowdown for regular problems.
>>
>> I also noticed that Epetra performs much better for these problems
>> than PETSc does. We need to check the matrix initialisation, but it
>> could ultimately be a limitation of the backends. Each matrix row
>> corresponding to a global dof is full, and it may be that backends
>> designed for large sparse matrices do not handle this well.
>
> How could inserting into the matrix be the bottleneck? In the test
> script I attached, I do not assemble any global dofs.
>
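[A small pure-Python sketch, illustrative only and not DOLFIN code, of the situation discussed here: a sparsity pattern built with one hash set of column indices per row (the analogue of the boost::unordered_set change mentioned above), in which a single global ("Real") dof that appears on every cell turns its row and column dense. All names are hypothetical.]

```python
class SparsityPattern:
    """Toy sparsity pattern: one hash set of column indices per row."""

    def __init__(self, n):
        self.rows = [set() for _ in range(n)]

    def insert(self, dofs):
        # Couple every pair of dofs that appear together on one cell.
        for i in dofs:
            self.rows[i].update(dofs)

    def num_nonzeros(self):
        return sum(len(r) for r in self.rows)


# Two 'cells' sharing dof 1, plus a global dof (3) seen by one cell's
# dofmap: the global dof couples to everything it meets, so its row
# fills up as more cells are visited.
p = SparsityPattern(4)
p.insert([0, 1])
p.insert([1, 2])
p.insert([0, 1, 2, 3])
print(p.num_nonzeros())  # 16: every row is full in this tiny example
```

With N mesh dofs, each of k global dofs contributes a full row of N entries to such a pattern, which is consistent with the roughly linear growth in time and memory reported in the thread.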
I think that you'll find that it is. It will be assembling zeroes in the
global dof positions.

>> The best approach is probably to add the entire row at once for
>> global dofs. This would require a modified assembler.
>>
>> There is a UFC blueprint to identify global dofs:
>>
>> https://blueprints.launchpad.net/ufc/+spec/global-dofs
>>
>> If we can identify global dofs, we have a better chance of dealing
>> with the problem properly. This includes running in parallel with
>> global dofs.
>
> Do you envision integrals over global dofs being separated into their
> own tabulate_tensor function? Then in DOLFIN we could assemble the
> whole row/column in one loop and insert it into the matrix in one go.
>

No - I had in mind adding only cell-based dofs to the matrix, and
accumulating the global rows in a global vector, which is then inserted
at the end (as one row) into the matrix. I'm not advocating a change to
the tabulate_foo interface at this stage.

> Do you think we also need to recognise global dofs in UFL to properly
> flesh out these integrals?
>

Yes. For one, to handle them properly in parallel, since global dofs do
not reside at a mesh entity and the domains are partitioned mesh-wise.

Garth

> Johan
>
>> Garth
>>
>>> Johan
>>>
>>>> --
>>>> Marie
>>>>
>>>>> I have not profiled any of this; I just throw it out there. I do
>>>>> not see any difference between, for example, the Epetra and PETSc
>>>>> backends, as suggested in the fixed bug on building the sparsity
>>>>> pattern with global dofs.
>>>>>
>>>>> My test has been done on DOLFIN 0.9.9+. I haven't profiled it yet.
>>>>>
>>>>> Output from summary:
>>>>> Tensor without Mixed space  |  0.11401   0.11401   1
>>>>> With 1 global dofs          |  0.40725   0.40725   1
>>>>> With 2 global dofs          |  0.94694   0.94694   1
>>>>> With 4 global dofs          |  2.763     2.763     1
>>>>> With 8 global dofs          |  9.6149    9.6149    1
>>>>>
>>>>> Also the amount of memory used to build the sparsity pattern seems
>>>>> to double for each step.
>>>>> The memory footprint for a 32x32x32 unit cube with 16 global dofs
>>>>> was 1.6 GB (!?).
>>>>>
>>>>> Johan
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dolfin
>>>>> Post to     : [email protected]
>>>>> Unsubscribe : https://launchpad.net/~dolfin
>>>>> More help   : https://help.launchpad.net/ListHelp
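[The strategy Garth outlines above - add only cell-based dofs to the matrix during the cell loop, buffer each global-dof row separately, and insert each buffered row into the matrix once at the end - might be sketched as below. This is a toy illustration with purely hypothetical names, using a dict as a stand-in for a sparse matrix; it is not the DOLFIN assembler.]

```python
def assemble(cell_dofmaps, element_tensors, n_dofs, global_dofs):
    """Toy assembler: rows owned by global dofs are accumulated in a
    dense buffer and inserted into the matrix in one pass at the end,
    instead of once per cell."""
    matrix = {}  # (row, col) -> value; stand-in for a sparse matrix
    global_rows = {g: [0.0] * n_dofs for g in global_dofs}

    for dofs, A_cell in zip(cell_dofmaps, element_tensors):
        for li, i in enumerate(dofs):
            for lj, j in enumerate(dofs):
                v = A_cell[li][lj]
                if i in global_rows:
                    global_rows[i][j] += v      # buffer the dense row
                else:
                    matrix[(i, j)] = matrix.get((i, j), 0.0) + v

    # Single insertion per global row, as suggested in the thread.
    for g, row in global_rows.items():
        for j, v in enumerate(row):
            if v != 0.0:
                matrix[(g, j)] = matrix.get((g, j), 0.0) + v
    return matrix


# Two cells; dof 2 plays the role of a global ('Real') dof.
ones = [[1.0, 1.0], [1.0, 1.0]]
A = assemble([[0, 1], [1, 2]], [ones, ones], n_dofs=3, global_dofs=[2])
print(A[(1, 1)], A[(2, 1)])  # 2.0 1.0
```

The point of the design is that the sparse backend never sees per-cell insertions into a dense row; it receives each global row exactly once, which sidesteps whatever penalty PETSc or Epetra pays for incrementally filling a full row.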

