Hi Edgar,

Including your previous results, the ex4 patch alone still has the
fastest "Active" time on average:

# previous results
upstream: AVG= 1.51775
ex4 patch: AVG= 1.29828

# current results
virtual function patch: AVG= 1.52382
virtual function + ex4 patches: AVG= 1.6248

But since you rebuilt libmesh between then and now, I'm not sure we
should really compare the two.

Another thing to mention is that running the test in parallel is
probably counterproductive to our goals since:

1.) It reduces the overall active time, and the shorter the duration of
    the thing you are trying to time, the more it is affected by the
    timing code itself.
2.) It introduces more variability in the results.

Currently, the coefficient of variation (stddev divided by mean) for
these results is on the order of 15-18%, a fair bit larger than the
differences in the times themselves.

If you are interested in investigating this further, I would suggest
that you re-check the previous results on your current libmesh build,
but I'd also run everything in serial to try to reduce the variation to
the point where one of the four possible versions is statistically
faster than all the others...

--
John

On Sun, Jun 27, 2021 at 10:03 PM edgar <edgar...@cryptolab.net> wrote:

> On 2021-06-18 21:45, John Peterson wrote:
> > Your compiler flags are definitely far more advanced/aggressive than
> > mine,
>
> I cannot take credit for that, really. I only modified the -O2 to -O3,
> made sure that -funroll_loops was there and customised to my processor
> (amdfam10). All the other flags come directly from the Makefile provided
> by libMesh.
>
> > which are just on the default of -O2. However, I think what we should
> > conclude from your results is that there is something slower than it
> > needs to be with DenseMatrix::resize(), not that we should move the
> > DenseMatrix creation/destruction inside the loop over elements.
> > What I tried (see attached patch or the
> > "dense_matrix_resize_no_virtual" branch in my fork) is avoiding the
> > virtual function call to DenseMatrix::zero() which is currently made
> > from DenseMatrix::resize(). In my testing, this change did not seem
> > to make much of a difference but I'm curious about what you would
> > get with your compiler args, this patch, and the unpatched ex4.
>
> There _is_ something consistently different for sure. I only ran the
> case with `mpirun -np 4' and `-n 40'. The difference of the sums of
> times is in the order of 1 second. For five tests of this size and my
> rather limited system, I would say that your change yields marginally
> faster computation, and should be used. In which case, my modifications
> should be avoided.
>
> In the interest of completeness, I need to say that I had to rebuild
> libMesh, because of compilation errors. I don't quite remember what
> version it is right now, but it is not the updated master branch (due to
> some issues that I am having with my Internet connection). Although this
> may not affect the comparison, it should be noted.
>
> The results are shown below and in examples/introduction/sums.org
>
> #+name: tbl-results
> #+caption: The first two columns correspond to the (patched) original
> code. The last pair are the results with my modification (also with
> patch). In each case, the first of the columns is alive time, and the
> second one is active time. Data was copied from the .bz2 files.
> | 3.65205 | 1.292   | 3.63248 | 1.31057 |
> | 4.82533 | 1.76303 | 5.31107 | 1.95794 |
> | 5.05955 | 1.84457 | 5.26696 | 1.964   |
> | 3.86126 | 1.40952 | 3.53834 | 1.29313 |
> | 3.58892 | 1.30998 | 4.369   | 1.59834 |
>
> #+caption: calculate the sums of each column
> #+begin_src python :var data=tbl-results
> ex4_alive = sum((I[0] for I in data))
> ex4_active = sum((I[1] for I in data))
> ex4_mod_alive = sum((I[2] for I in data))
> ex4_mod_active = sum((I[3] for I in data))
> return [["ex4_alive", "ex4_active", "ex4_mod_alive",
>          "ex4_mod_active"],
>         None,
>         [ex4_alive, ex4_active, ex4_mod_alive, ex4_mod_active]]
> #+end_src
>
> #+RESULTS:
> | ex4_alive | ex4_active         | ex4_mod_alive      | ex4_mod_active |
> |-----------+--------------------+--------------------+----------------|
> |  20.98711 | 7.6190999999999995 | 22.117849999999997 |        8.12398 |

--
John

_______________________________________________
Libmesh-users mailing list
Libmesh-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-users
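
For anyone who wants to reproduce the "current results" averages and the
coefficient of variation discussed in this message, here is a minimal
Python sketch based on the two "Active" columns of the quoted table. It
assumes the population standard deviation (statistics.pstdev), which is
only a guess at how the 15-18% figure was obtained; the list and function
names are made up for illustration and are not part of libMesh or of the
original Org file.

```python
from statistics import mean, pstdev

# "Active" times copied from the quoted table (five runs each).
virtual_patch = [1.292, 1.76303, 1.84457, 1.40952, 1.30998]
virtual_plus_ex4 = [1.31057, 1.95794, 1.964, 1.29313, 1.59834]


def coefficient_of_variation(samples):
    """Return stddev / mean, using the population stddev (an assumption)."""
    return pstdev(samples) / mean(samples)


for label, times in [
    ("virtual function patch", virtual_patch),
    ("virtual function + ex4 patches", virtual_plus_ex4),
]:
    # Print the average and the relative spread of each set of runs.
    print(f"{label}: AVG= {mean(times):.5f}  "
          f"CV= {coefficient_of_variation(times):.1%}")
```

If the relative spread of each set of runs is larger than the relative gap
between the two averages, five runs are probably not enough to rank the
variants, which is the motivation for repeating the measurements in serial.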