On Thu, Jun 10, 2021 at 12:05 PM edgar <edgar...@cryptolab.net> wrote:

> On 2021-06-10 16:00, John Peterson wrote:
> > On Wed, Jun 9, 2021 at 9:11 PM edgar <edgar...@cryptolab.net> wrote:
> >
> >> Hi,
> >>
> >> I am humbly sharing something which I think would improve the
> >> documentation and the logic of examples 3 and 4 a bit. I think that
> >> this
> >> would apply to other examples as well. (I was planning to keep
> >> learning
> >> from the examples, and have a more substantial contribution at the
> >> end,
> >> but it has been like a month since I last touched libMesh, and it
> >> seems that I am going to be very busy in the next couple of months).
> >>
> >> Thanks.
> >
> >
> > Hi Edgar,
> >
> > I agree with the updates to the code comments in both files, so thanks
> > for
> > those. In the ex4 diff, it looks like you move the Ke, Fe declarations
> > from
> > outside the for-loop over elements to inside? This is not likely to be
> > an
> > optimization, though, since creating that storage once and "resizing"
> > it
> > many times in the loop avoids dynamic memory allocations... the resizes
> > are
> > no-ops if the same Elem type is used at each iteration of the for-loop.
> > If
> > you have some performance profiling for this example that suggests
> > otherwise, I'd be happy to take a look.
>
> In all honesty, John, I did run a performance log on them, and the
> modification was faster, but I don't have it anymore. As I implied, my
> intention was to implement the changes in most examples, but I just
> haven't had the time. I can reproduce the logs, but I don't know when I
> will have the time for that :S (sorry :( ). I guess that the reduced
> time comes from the compiler recognising the variable as short-lived
> within the loop and avoiding the resizing of the matrices for each loop.
>

I recorded the "Active time" for the "Matrix Assembly Performance" PerfLog
in introduction_ex4 running "./example-opt -d 3 -n 40" for both the
original codepath and your proposed change, averaging the results over 5
runs. The results were:

Original code, "./example-opt -d 3 -n 40"
import numpy as np
np.mean([3.91801, 3.93206, 3.94358, 3.97729, 3.90512]) = 3.935

Patch, "./example-opt -d 3 -n 40"
np.mean([4.10462, 4.06232, 3.95176, 3.92786, 3.97992]) = 4.005

so I'd say the original code path is marginally (but still statistically
significantly) faster, although keep in mind that matrix assembly is only
about 21% of the total time for this example while the solve is about 71%.

-- 
John

_______________________________________________
Libmesh-users mailing list
Libmesh-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-users
