On Thu, Jun 10, 2021 at 12:05 PM edgar <edgar...@cryptolab.net> wrote:
> On 2021-06-10 16:00, John Peterson wrote:
> > On Wed, Jun 9, 2021 at 9:11 PM edgar <edgar...@cryptolab.net> wrote:
> >
> >> Hi,
> >>
> >> I am humbly sharing something which I think would improve the
> >> documentation and the logic of examples 3 and 4 a bit. I think that
> >> this would apply to other examples as well. (I was planning to keep
> >> learning from the examples, and have a more substantial contribution
> >> at the end, but it has been about a month since I last touched
> >> libMesh, and it seems that I am going to be very busy in the next
> >> couple of months.)
> >>
> >> Thanks.
> >
> > Hi Edgar,
> >
> > I agree with the updates to the code comments in both files, so thanks
> > for those. In the ex4 diff, it looks like you move the Ke, Fe
> > declarations from outside the for-loop over elements to inside? This
> > is not likely to be an optimization, though, since creating that
> > storage once and "resizing" it many times in the loop avoids dynamic
> > memory allocations... the resizes are no-ops if the same Elem type is
> > used at each iteration of the for-loop. If you have some performance
> > profiling for this example that suggests otherwise, I'd be happy to
> > take a look.
>
> In all honesty, John, I did run a performance log on them, and the
> modification was faster, but I don't have it anymore. As I implied, my
> intention was to implement the changes in most examples, but I just
> haven't had the time. I can reproduce the logs, but I don't know when I
> will have the time for that :S (sorry :( ). I guess that the reduced
> time comes from the compiler recognising the variable as short-lived
> within the loop and avoiding the resizing of the matrices for each loop.

I recorded the "Active time" for the "Matrix Assembly Performance" PerfLog
in introduction_ex4, running "./example-opt -d 3 -n 40" for both the
original codepath and your proposed change, averaging the results over 5
runs.
The results were:

Original code, "./example-opt -d 3 -n 40":

  import numpy as np
  np.mean([3.91801, 3.93206, 3.94358, 3.97729, 3.90512]) = 3.93

Patch, "./example-opt -d 3 -n 40":

  np.mean([4.10462, 4.06232, 3.95176, 3.92786, 3.97992]) = 4.00

So I'd say the original code path is marginally (but still statistically
significantly) faster, although keep in mind that matrix assembly is only
about 21% of the total time for this example, while the solve is about 71%.

--
John

_______________________________________________
Libmesh-users mailing list
Libmesh-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-users