On Friday, 4 December 2020 at 14:48:32 UTC, jmh530 wrote:
It looks like all the `sweep_XXX` functions are only defined
for contiguous slices, as that would be the default if you define a
Slice!(T, N).
How the functions access the data is a big difference. If you
compare the `sweep_field` version with the `sweep_naive`
version, the `sweep_field` function is able to access through
one index, whereas the `sweep_naive` function has to use two in
the 2d version and 3 in the 3d version.
Also, the main difference in the NDSlice version is that it
uses *built-in* MIR functionality, like how `sweep_ndslice`
uses the `each` function from MIR, whereas `sweep_field` uses a
for loop. I think this is partially to show that the built-in
MIR functionality is as fast as if you tried to do it with a
for loop yourself.
I see. Looking at some of the code, the field case is literally
doing the indexing calculation right there. I guess ndslice is
doing the same thing, just with "Mir magic" in the background?
Still, ndslice gets a consistently higher rate of flops than the
field case, which is interesting. One thing I discovered about
these kinds of plots is that putting one or both axes on a log
scale, particularly for timing comparisons, can make the
differences between methods that look close much clearer. A log
plot might show a consistent difference between the timings of
ndslice and the field case. Underneath they should be doing
essentially the same thing, so teasing out what causes the
difference would be interesting. Is Mir doing some more efficient
form of the indexing calculation than the naked field
calculation?
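
To make the indexing question concrete, here is a minimal sketch of
the two access patterns I have in mind. It is not the benchmark code
and it does not use Mir at all: `sumTwoIndex` and `sumFlat` are
hypothetical names, and I am assuming plain row-major storage in a
`double[]`.

```d
import std.stdio : writeln;

// Walk a rows x cols row-major matrix with two indices,
// recomputing the linear offset for every element.
double sumTwoIndex(const double[] data, size_t rows, size_t cols)
{
    double total = 0;
    foreach (i; 0 .. rows)
        foreach (j; 0 .. cols)
            total += data[i * cols + j];
    return total;
}

// Walk the same contiguous storage with a single running index.
double sumFlat(const double[] data)
{
    double total = 0;
    foreach (k; 0 .. data.length)
        total += data[k];
    return total;
}

void main()
{
    enum rows = 3, cols = 4;
    auto data = new double[rows * cols];
    foreach (k, ref x; data)
        x = k; // fill with 0, 1, 2, ...

    // Both traversals visit the same elements in the same order.
    assert(sumTwoIndex(data, rows, cols) == sumFlat(data));
    writeln(sumFlat(data));
}
```

Both functions touch the same memory in the same order, so any
timing gap between loops like these would have to come from how
well the compiler optimises the offset arithmetic, not from the
data layout.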
I'm still not sure why slice is so slow. Doesn't that rely
completely on the opSlice implementation, i.e. the choice of
indexing method and underlying data structure? Isn't it just a
symbolic interface behind which you can write whatever you want?
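
For reference, this is roughly what I mean by "symbolic interface".
It is only a sketch of D's slicing operator overloads on a
hypothetical `View` struct, not how Mir or the benchmark implements
slicing.

```d
struct View
{
    double[] data;

    // v[i]: plain element access.
    ref double opIndex(size_t i) { return data[i]; }

    // The compiler lowers the `a .. b` inside v[a .. b]
    // to a call to opSlice!0(a, b).
    size_t[2] opSlice(size_t dim)(size_t a, size_t b) if (dim == 0)
    {
        return [a, b];
    }

    // v[a .. b]: receives the bounds from opSlice and builds
    // a new View over the same memory (no copy of the elements).
    View opIndex(size_t[2] bounds)
    {
        return View(data[bounds[0] .. bounds[1]]);
    }
}

void main()
{
    auto v = View([0.0, 1.0, 2.0, 3.0, 4.0]);
    auto w = v[1 .. 4]; // v.opIndex(v.opSlice!0(1, 4))
    w[0] = 10.0;        // writes through to v's storage
    assert(v[1] == 10.0);
    assert(w.data.length == 3);
}
```

What happens inside those functions is entirely up to the author,
so the cost of a slice expression depends on what the
implementation builds and returns each time it is called.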