On Mon, Nov 23, 2015 at 2:09 PM, Sebastian Berg <sebast...@sipsolutions.net> wrote:
> On Mo, 2015-11-23 at 13:31 -0800, Eli Bendersky wrote: > > Hello, > > > > > > I'm trying to understand the buffering done by the Numpy iterator > > interface (the new post 1.6-one) when running ufuncs on arrays that > > require broadcasting. Consider this simple case: > > > > In [35]: m = np.arange(16).reshape(4,4) > > In [37]: n = np.arange(4) > > In [39]: m + n > > Out[39]: > > array([[ 0, 2, 4, 6], > > [ 4, 6, 8, 10], > > [ 8, 10, 12, 14], > > [12, 14, 16, 18]]) > > > > > > > On first sight this seems true. However, there is one other point to > consider here. The inner ufunc loop can only handle a single stride. The > contiguous array `n` has to be iterated as if it had the strides > `(0, 8)`, which is not the strides of the contiguous array `m` which can > be "unrolled" to 1-D. Those effective strides are thus not contiguous > for the inner ufunc loop and cannot be unrolled into a single ufunc > call! > > The optimization (which might kick in a bit more broadly maybe), is thus > that the number of inner loop calls is minimized, whether that is worth > it, I am not sure, it may well be that there is some worthy optimization > possible here. > Note however, that this does not occur for large inner loop sizes > (though I think you can find some "bad" sizes): > > ``` > In [18]: n = np.arange(40000) > > In [19]: m = np.arange(160000).reshape(4,40000) > > In [20]: o = m + n > Iterator: Checking casting for operand 0 > op: dtype('int64'), iter: dtype('int64') > Iterator: Checking casting for operand 1 > op: dtype('int64'), iter: dtype('int64') > Iterator: Checking casting for operand 2 > op: <null>, iter: dtype('int64') > Iterator: Setting allocated stride 1 for iterator dimension 0 to 8 > Iterator: Setting allocated stride 0 for iterator dimension 1 to 320000 > Iterator: Copying inputs to buffers > Iterator: Expanding inner loop size from 8192 to 40000 since buffering > wasn't needed > Any buffering needed: 0 > Iterator: Finished copying inputs to buffers (buffered size is 40000) > ``` > The heuristic in the code says that if we can use a single stride and that's larger than the buffer size (which I assume is the default buffer size, and can change) then it's "is_onestride" and no buffering is done. So this led me to explore around this threshold (8192 items by default on my machine), and indeed we can notice funny behavior: In [51]: %%timeit n = arange(8192); m = np.arange(8192*40).reshape(40,8192) o = m + n ....: 1000 loops, best of 3: 274 µs per loop In [52]: %%timeit n = arange(8292); m = np.arange(8292*40).reshape(40,8292) o = m + n ....: 1000 loops, best of 3: 229 µs per loop So, given this, it's not very clear why the "optimization" kicks in. Buffering for small sizes seems like a mistake. Eli > > Anyway, feel free to have a look ;). The code is not the most read one > in NumPy, and it would not surprise me a lot if you can find something > to tweak. > > - Sebastian > > > > > > If I instrument Numpy (setting NPY_IT_DBG_TRACING and such), I see > > that when the add() ufunc is called, 'n' is copied into a temporary > > buffer by the iterator. The ufunc then gets the buffer as its data. > > > > > > My question is: why is this buffering needed? It seems wasteful, since > > no casting is required here, no special alignment problems and also > > 'n' is contiguously laid out in memory. It seems that it would be more > > efficient to just use 'n' in the ufunc instead of passing in the > > buffer. What am I missing? > > > > > > Thanks in advance, > > > > Eli > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion