(NB all: the thread title seems to interchange the acronyms for the Tensor Contraction Library (TCL) and the High-Performance Tensor Transpose (HPTT) packages. I'm not fixing it so as not to break threading.)
On Wed, Aug 16, 2017 at 11:40 AM Paul Springer <pav...@gmx.de> wrote:

> If you want to get it into Numpy, it would be worth checking if the
> existing functions can be improved before adding new ones.
>
> Note that Numpy's transposition method just rearranges the indices, so
> the advantage of actual transposition is to have better cache performance
> or allow direct use of CBLAS. I assume TCL uses some tricks to do
> transposition in a way that is more cache friendly?
>
> HPTT is a sophisticated library for tensor transpositions; as such, it
> blocks the tensors such that (1) spatial locality can be exploited.
> Moreover, (2) it uses explicit vectorization to take advantage of the
> CPU's vector units.

I think this library provides functionality that isn't readily accessible
from within numpy at the moment. The only functions I know of to rearrange
the memory layout of data are things like ascontiguousarray and
asfortranarray, as well as assignment (e.g. a[...] = b). The general
strategy within numpy is to assume that all functions work equally well on
arrays with arbitrary memory layouts, so users often don't even know the
memory layouts of their data. The striding functionality means data
usually doesn't actually get transposed until absolutely necessary.

Of course, few if any numpy functions work equally well on different
memory layouts; unary ufuncs contain code that tries to carry out their
iteration in the fastest way, but it's not clear how well that works or
whether they have the freedom to choose the layouts of their output
arrays.

If you wanted to integrate HPTT into numpy, I think the best approach
might be to wire it into the assignment machinery, so that when users do
things like

    a[::2,:] = b[:,::3].T

HPTT springs into action behind the scenes and makes this assignment as
efficient as possible (how well does it handle arrays with gaps between
elements?). Then ascontiguousarray and asfortranarray and the like could
simply use assignment to an appropriately-ordered destination when they
actually needed to do anything.

> TCL uses the Transpose-Transpose-GEMM-Transpose approach, where all
> tensors are flattened into matrices (via HPTT) and then contracted via
> GEMM; the final result is eventually folded (via HPTT) into the desired
> output tensor.

This is a pretty direct replacement for einsum, but I think einsum may
well already do pretty much this, apart from not using HPTT to do the
transposes. So the way to get this functionality would be to make the
matrix-rearrangement primitives use HPTT, as above.

> Would it be possible to expose HPTT and TCL as optional packages within
> NumPy? This way I don't have to redo the work that I've already put into
> those libraries.

I think numpy should be regarded as a basically complete package for
manipulating strided in-memory data, to which we are reluctant to add new
user-visible functionality. Tools that can act under the hood to make
existing code faster, or to reduce the work users must do to make their
code run fast enough, are valuable.

> > Might check the license if your work uses code from a publication.
>
> As far as licenses are concerned, that should not be a problem, since I
> wrote the code myself and it doesn't use code from publications other
> than mine.

Would some of your techniques help numpy to more rapidly evaluate things
like C[...] = A+B, when A, B, and C are arbitrarily strided and there are
no ordering constraints on the result? Or just A+B, where numpy is free to
choose the memory layout for the result?

Anne
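
P.S. To make the "wire it into assignment" idea a bit more concrete, here
is a rough, NumPy-only sketch of the kind of strided copy such a hook
would accelerate (purely illustrative; the shapes are made up and this is
not an actual HPTT integration):

    import numpy as np

    b = np.arange(60.0).reshape(4, 15)

    # A strided, transposed assignment: today this goes through numpy's
    # generic copy loops, but it is exactly the kind of operation a
    # transpose library could take over behind the scenes.
    a = np.empty((10, 4))
    a[::2, :] = b[:, ::3].T

    # ascontiguousarray is conceptually just assignment into a C-ordered
    # destination, so it could reuse the same fast path.
    dst = np.empty(b.T.shape, dtype=b.dtype, order='C')
    dst[...] = b.T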
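
P.P.S. For anyone unfamiliar with Transpose-Transpose-GEMM-Transpose, the
sketch below shows the idea in plain NumPy for a made-up contraction
C[i,j,k] = sum_l A[i,l,j] * B[l,k] (TCL would do the reorderings with HPTT
rather than numpy transposes):

    import numpy as np

    A = np.random.rand(4, 5, 6)   # indices i, l, j
    B = np.random.rand(5, 7)      # indices l, k

    # 1. Transpose/flatten A so the contracted index l comes last.
    A_mat = A.transpose(0, 2, 1).reshape(4 * 6, 5)   # (i*j, l)
    # 2. Contract the flattened operands with a single GEMM.
    C_mat = A_mat @ B                                # (i*j, k)
    # 3. Fold the result back into the desired output tensor.
    C = C_mat.reshape(4, 6, 7)                       # C[i, j, k]

    # Same contraction via einsum, for comparison.
    assert np.allclose(C, np.einsum('ilj,lk->ijk', A, B))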