On Saturday, 5 December 2020 at 07:44:33 UTC, 9il wrote:

sweep_ndslice uses (2*N - 1) arrays to index U, which allows LDC to unroll the loop.

For example, in the 2D case, withNeighboursSum [2] stores a pointer to the result row and pointers to the rows above and below it.

matrix:
--------------
------a------- above iterator
------r------- the result
------b------- below iterator
--------------

Also, for AVX-512 targets it allows vectorizing the loop [1]. The benchmark was run on an AVX2 CPU.
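
The three-row layout described above can be sketched in plain C. This is only an illustration of the idea, not Mir's implementation: the function name neighbours_sum_2d, the row-major layout, and the choice to skip border cells are all assumptions made for the example. Each interior row is walked with three row pointers (above, result row, below), so the inner loop uses a single column index and fixed offsets, which is the kind of loop a compiler may be able to unroll or vectorize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Mir: for every interior cell of a
   rows x cols row-major matrix `u`, store in `out` the sum of its four
   direct neighbours. Border cells of `out` are left untouched. */
static void neighbours_sum_2d(const double *u, double *out,
                              size_t rows, size_t cols)
{
    assert(u != NULL && out != NULL);
    if (rows < 3 || cols < 3)
        return; /* no interior cells */

    for (size_t i = 1; i + 1 < rows; ++i) {
        const double *a = u + (i - 1) * cols; /* row above  */
        const double *r = u + i * cols;       /* result row */
        const double *b = u + (i + 1) * cols; /* row below  */
        double *o = out + i * cols;

        /* One column index, three row pointers: no 2D index arithmetic
           is left inside the inner loop. */
        for (size_t j = 1; j + 1 < cols; ++j)
            o[j] = a[j] + b[j] + r[j - 1] + r[j + 1];
    }
}
```

For N = 2 this uses 2*N - 1 = 3 row pointers into the input, matching the a / r / b picture in the diagram.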

[1] https://github.com/typohnebild/numpy-vs-mir/issues/4
[2] http://mir-algorithm.libmir.org/mir_ndslice_topology.html#.withNeighboursSum

Very interesting, thank you for the explanations. Are there journal, book, or other implementation references for these approaches to implementing tensor-like multidimensional arrays? Data structures for tensor-like multidimensional arrays are among the worst-covered topics in research and conventional literature compared to almost anything else, which can be rather frustrating.
