Dear Dustin,


    How is your computational setup, i.e., how many nonzero entries do
    you have in your matrix?

    I'm not sure if I understand what you mean. Do you mean the number of
    nonzero entries in /SparseMatrix/ or in the /BlockSparsityPattern/ or
    in the dynamic one? How can I get this information?
You can call DynamicSparsityPattern::n_nonzero_elements() to get the number of nonzero entries in the dynamic sparsity pattern. This method also exists in BlockSparsityPattern (and in all sparsity patterns that inherit from BlockSparsityPatternBase):
https://dealii.org/developer/doxygen/deal.II/classBlockSparsityPatternBase.html

What I'm trying to understand here is what kind of properties your problem has - whether there are many nonzero entries per row, or other special features that could explain your problems.

I just checked the 3D case of step-22 for the performance of BlockSparsityPattern::copy_from(BlockDynamicSparsityPattern), and the performance is where I would expect it to be. It takes 1.19 s to copy the sparsity pattern for a case with 1.6 million DoFs (I have some modifications to the mesh compared to what you find online) on my laptop. Given that there are 275 million nonzero entries in that matrix, I need to touch around 4.4 GB of memory here (= 4 x 275 million x 4 bytes per unsigned int: once for clearing the data in the pattern, once for reading the dynamic pattern, once for writing into the fixed pattern, plus once for the write-allocate on that last operation). That means I reach 26% of the theoretically possible bandwidth on this machine (~14 GB/s memory transfer per core). While I would know how to reach more than 80% of peak memory bandwidth here, this function is nowhere near relevant for the global run time in any of my performance profiles. And I'm likely the deal.II person with the most affinity for performance numbers.

Thus my interest in what is particular about your setup.

    Have you checked that you do not run out of memory and see a large
    swap time?

    I'm quite sure that this is not the problem, since I used one of our
    compute servers with 64 GB memory. Moreover, at the moment the program
    runs with an additional global refinement, i.e., about 16 million DoFs,
    and only 33% of the memory is used. Swap isn't used at all.
That's good to know, so we can exclude the memory issue. Does your program use multithreading? It probably does unless you did something special when configuring deal.II; the copy operation is not parallelized by threads, but neither are almost all other initialization functions, so it should not account for such a disproportionate share of the run time. 10 hours for 2.5 million DoFs looks insane. I would expect something between 0.5 and 10 seconds, depending on the number of nonzeros in those blocks.

Is there anything else special about your configuration or problem compared to the cases presented in the tutorial? What deal.II version are you using, and what is the finite element? Any special constraints on those systems?

    Unfortunately this cannot be done that easily. I have to reorganize
    things and remove a lot of superfluous code. Besides that, I have a
    lot of other work to do. Maybe I can provide you with an example file
    at the end of next week.
Let us know when you have a test case. I'm really curious what could cause this huge run time.

Best,
Martin

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
