Hello Nathaniel,

> I'd also be interested in hearing more about the memory requirements of this
> approach. How does the temporary memory required typically scale with the
> size of the input arrays? Is there an upper bound on the temporary memory
> required?

Currently the algorithm will not create an intermediate array larger than the largest input or output array (maximum_array_size). This gives an upper bound on temporary memory of (number_of_terms/2 + 1) * maximum_array_size. In practice the total rarely goes beyond maximum_array_size, since building large outer products is not usually helpful. The views are also dereferenced after they are used, which I believe should free the intermediate arrays correctly; however, this is one thing I am not sure is being handled in the best way and could use further testing. Computing cumulative memory should also be possible for the brute-force path algorithm, but I am not sure it is possible for the faster greedy path algorithm without large changes.
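For concreteness, a minimal sketch of that worst-case bound (the helper below and its name are purely illustrative, not part of opt_einsum's API):

    import numpy as np

    def worst_case_temp_memory(operands, output_shape):
        """Illustrative worst-case temporary-memory estimate (in elements).

        No intermediate may exceed the largest input or output array, and at
        most (number_of_terms/2 + 1) intermediates are built, so the bound is
        simply the product of the two quantities.
        """
        maximum_array_size = max(
            [op.size for op in operands] + [int(np.prod(output_shape))]
        )
        number_of_terms = len(operands)
        return (number_of_terms / 2 + 1) * maximum_array_size

    # Example: three rank-2 operands contracting to a (10, 10) output.
    ops = [np.empty((10, 10)) for _ in range(3)]
    print(worst_case_temp_memory(ops, (10, 10)))  # elements, not bytes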
Overall this sounds great. If anyone has a suggestion of where this should go, I can start working on a PR and we can work out the remaining issues there?

-Daniel Smith

> On Jan 3, 2015, at 10:57 AM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On 3 Jan 2015 02:46, "Daniel Smith" <dgasm...@icloud.com
> <mailto:dgasm...@icloud.com>> wrote:
> >
> > Hello everyone,
> >
> > I have been working on a chunk of code that basically sets out to provide a
> > single function that can take an arbitrary einsum expression and compute
> > it in the most optimal way. While np.einsum can compute arbitrary
> > expressions, there are two drawbacks to using pure einsum: einsum does not
> > consider building intermediate arrays for possible reductions in overall
> > rank, and it is not currently capable of using a vendor BLAS. I have been
> > working on a project that aims to solve both issues simultaneously:
> >
> > https://github.com/dgasmith/opt_einsum
> > <https://github.com/dgasmith/opt_einsum>
> >
> > This program first builds the optimal way to contract the tensors together,
> > or, using my own nomenclature, a "path." This path is then iterated over and
> > uses tensordot when possible and einsum for everything else. In test cases
> > the worst-case scenario adds a 20 microsecond overhead penalty and, in the
> > best-case scenario, it can reduce the overall rank of the tensor. The
> > primary (if somewhat exaggerated) example is a 5-term N^8 index
> > transformation that can be reduced to N^5; even when N is very small (N=10)
> > there is a 2,000-fold speed increase over pure einsum or, if using
> > tensordot, a 2,400-fold speed increase. This is somewhat similar to the new
> > np.linalg.multi_dot function.
>
> This sounds super awesome. Who *wouldn't* want a free 2,000x speed increase?
> And I especially like your test suite.
>
> I'd also be interested in hearing more about the memory requirements of this
> approach. How does the temporary memory required typically scale with the
> size of the input arrays? Is there an upper bound on the temporary memory
> required?
>
> > As such, I am very interested in implementing this into numpy. While I
> > think opt_einsum is in a pretty good place, there is still quite a bit to
> > do (see outstanding issues in the README). Even if this is not something
> > that would fit into numpy, I would still be very interested in your comments.
>
> We would definitely be interested in integrating this functionality into
> numpy. After all, half the point of having an interface like einsum is that
> it provides a clean boundary where we can swap in complicated, sophisticated
> machinery without users having to care. No one wants to curate their own pile
> of custom optimized libraries. :-)
>
> -n
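To make the quoted 5-term N^8 index transformation concrete, here is a small sketch of what that contraction looks like with pure einsum versus opt_einsum. The contract() call and the subscript string follow the examples in the project's README, but treat the details as illustrative rather than authoritative:

    import numpy as np
    from opt_einsum import contract  # https://github.com/dgasmith/opt_einsum

    N = 10
    I = np.random.rand(N, N, N, N)   # 4-index tensor to be transformed
    C = np.random.rand(N, N)         # transformation matrix

    # Pure einsum: one 5-term contraction that scales as N^8.
    naive = np.einsum('ea,fb,abcd,gc,hd->efgh', C, C, I, C, C)

    # opt_einsum builds a contraction path first, reducing the scaling to
    # N^5 and dispatching pairwise contractions to tensordot (BLAS) where
    # possible.
    fast = contract('ea,fb,abcd,gc,hd->efgh', C, C, I, C, C)

    print(np.allclose(naive, fast))  # True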
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion