On Sat 2008-05-17 16:47, Johan Hoffman wrote:

> > Yes, I got the same numbers with PETSc. I checked and it is the same
> > problem with uBlas; I am pretty sure that searching for the elements
> > in the assembly takes a very long time. Is it possible to change the
> > element matrix A(indx) directly in uBlas? If that is possible, the
> > speedup could be substantial.
I really doubt this. Indexing the matrix this way is slow; it would be
similar to calling MatSetValue (no -s). Perhaps the trouble is that you
set the same value many times. Is matrix assembly really significant
compared to solving?

> > There is a MatSetOption in PETSc, MAT_USE_HASH_TABLE, to do exactly
> > what I would like to have. But that option does not work with the
> > AIJ format we are using in DOLFIN.
> > Ok. Good.

Why does this not work? With what matrix formats does it work? According
to the source, only the BAIJ formats. Assuming you never eliminate parts
of a vector when enforcing boundary conditions (though you usually
should), BAIJ would probably be better. However, when I asked about this
recently, Barry said it is unlikely to make a big difference as long as
inodes are being used in the AIJ matrix.

> >> Still, as I vary the size of the mesh I get this performance metric
> >> virtually constant:
> >> Assembled 7.3e+05 non-zero matrix elements per second (first pass)
> >> Assembled 1.4e+06 non-zero matrix elements per second (re-assemble).

This number seems a little low: in pure PETSc codes built without
debugging, I can see 10^7/sec. That is, a particular FD matrix with 2e7
nonzeros takes 2 seconds to assemble. Of course, in that setting we set
all the elements of a row at once and never come back to it. In the
finite element setting, assembly will always take longer, but for bigger
systems it should still be much cheaper than solving.

In FEM, you update some of the same values multiple times. A vertex node
is shared by about 6 triangular elements, or around 20 tetrahedral ones.
With a low-order discretization, most or all of your degrees of freedom
are shared by multiple elements, so you add contributions several times
per row. There is no way to `fix' this in the finite element framework.

How long does it take to solve the system?
Have you compared with a PETSc built without debugging? (It can make a
big difference!) What does running with -log_summary give? Is the
preallocation correct on the first pass? (Run with -info and look at the
number of times malloc was called during the first assembly.)

Also note Andy Terrel's post regarding preallocation. This is the same
bug I mentioned earlier in this thread; moving MatSetFromOptions() up
fixes it. The trouble is that MatSetUp() (which MatSetFromOptions()
calls) must be called before preallocation so that the preallocation
information does not get discarded. When you call
MatSeqAIJSetPreallocation(), it preallocates space for any matrix type
which inherits from SeqAIJ. You can also call
MatMPIAIJSetPreallocation() at the same point and it will work for all
matrix types which inherit from MPIAIJ. This covers all the direct
solvers people are likely to use (MUMPS, UMFPACK, Spooles, SuperLU). If
block matrices are used, you can call the BAIJ versions here as well.

Before the preallocation fix:

  [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 25140 X 25140; storage space: 261180 unneeded,730140 used
  [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 57708
  [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 59

After the fix:

  [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 25140 X 25140; storage space: 0 unneeded,730140 used
  [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
  [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 59

The time difference is around 3 orders of magnitude.

Jed
_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
