Eric,

Great point. The multi-dimensional slicing and sequence return type is definitely strange. I was thinking about that last night. I’m a little new to the __array__ methods. Are you saying that the sequence behaviour would stay the same (i.e. __iter__, __reversed__, __contains__), but np.asarray(np.ndrange((3, 3))) would return something like an array of tuples? I’m not sure this is something that anybody can’t already do with meshgrid + stack.
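For reference, a minimal sketch of the meshgrid + stack approach mentioned above. The helper name `index_grid` is hypothetical and only for illustration, and the result is just one possible interpretation of what an "array of indices" could look like (an integer array of shape `shape + (N,)`), not what a proposed ndrange would necessarily return.

```python
import numpy as np


def index_grid(shape):
    """Return an integer array of shape `shape + (len(shape),)`.

    Entry [i, j, ...] holds the index (i, j, ...) itself.
    Hypothetical helper, not part of NumPy.
    """
    axes = [np.arange(n) for n in shape]
    # indexing="ij" gives matrix-style (row-major) ordering,
    # the same order in which np.ndindex yields indices.
    grids = np.meshgrid(*axes, indexing="ij")
    return np.stack(grids, axis=-1)


grid = index_grid((3, 3))
print(grid.shape)   # (3, 3, 2)
print(grid[1, 2])   # [1 2]
```

Flattening the leading axes with `grid.reshape(-1, 2)` visits the indices in the same order as `np.ndindex(3, 3)`, but as a fully materialized array rather than a lazy sequence.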
> and only implement methods already present in numpy.

I’m not sure what this means. I’ll note that in Python 3 <https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range>, range is its own thing. It is still a sequence type but it doesn’t support addition. I’m kinda OK with ndrange/ndindex being a sequence type, supporting ND slicing, but not being an array ;)

I’m kinda warming up to the idea of expanding ndindex.

1. The additional start and step can be omitted from ndindex for a while (indefinitely?). Slicing is way more convenient anyway.
2. Warnings can help people move from ndindex(1, 2, 3) to ndindex((1, 2, 3)).
3. ndindex can return a separate iterator, but the ndindex object would hold a reference to it. Calls to ndindex.__next__ would simply return next(of_that_object).
   Note: this would break introspection since the iterator is no longer of ndindex type. I’m kinda OK with this though, but breaking code is never nice :(
4. Benchmarking can help motivate the choice of iterator used for step=(1,) * N, start=(0,) * N.
5. Wait until 2019, because I don’t want to deal with performance regressions of potentially using range in Python 2, and I don’t want this to motivate any implementation details.

Mark

On Wed, Oct 10, 2018 at 12:36 AM Eric Wieser <wieser.eric+nu...@gmail.com> wrote:

> One thing that worries me here - in python, range(...) in essence
> generates a lazy list - so I’d expect ndrange to generate a lazy ndarray.
> In practice, that means it would be a duck-type defining an __array__
> method to evaluate it, and only implement methods already present in numpy.
>
> It’s not clear to me what the datatype of such an array-like would be.
> Candidates I can think of are:
>
> 1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a
> little awkward
> 2. (intp, (N,)) - which collapses into a shape + (3,) array
> 3. object_.
> 4. 
Some new np.tuple_ dtype, a heterogeneous tuple, which is like the
> structured np.void but without field names. I’m not sure how
> vectorized element indexing would be spelt though.
>
> Eric
>
>
> On Tue, 9 Oct 2018 at 21:59 Stephan Hoyer <sho...@gmail.com> wrote:
>
>> The speed difference is interesting but really a different question than
>> the public API.
>>
>> I'm coming around to ndrange(). I can see how it could be useful for
>> symbolic manipulation of arrays and indexing operations, similar to what we
>> do in dask and xarray.
>>
>> On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche <mark.harfou...@gmail.com>
>> wrote:
>>
>>> since ndrange is a superset of the features of ndindex, we can implement
>>> ndindex with ndrange or keep it as is.
>>> ndindex is now a glorified `nditer` object anyway. So it isn't so much
>>> of a maintenance burden.
>>> As for how ndindex is implemented, I'm a little worried about python 2
>>> performance seeing as range is a list.
>>> I would wait on changing the way ndindex is implemented for now.
>>>
>>> I agree with Stephan that ndindex should be kept in. Many want backward
>>> compatible code. It would be hard for me to justify why a dependency should
>>> be bumped up to bleeding edge numpy just for a convenience iterator.
>>>
>>> Honestly, I was really surprised to see such a speed difference, I
>>> thought it would have been closer.
>>>
>>> Allan, I decided to run a few more benchmarks, the nditer just seems
>>> slow for single array access for some reason. Maybe a bug?
>>>
>>> ```
>>> import numpy as np
>>> import itertools
>>> a = np.ones((1000, 1000))
>>>
>>> b = {}
>>> for i in np.ndindex(a.shape):
>>>     b[i] = i
>>>
>>> %%timeit
>>> # op_flags=['readonly'] doesn't change performance
>>> for a_value in np.nditer(a):
>>>     pass
>>> 109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>
>>> %%timeit
>>> for i in itertools.product(range(1000), range(1000)):
>>>     a_value = a[i]
>>> 113 ms ± 1.72 ms per loop (mean ± std. 
dev. of 7 runs, 10 loops each)
>>>
>>> %%timeit
>>> for i in itertools.product(range(1000), range(1000)):
>>>     c = b[i]
>>> 193 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>>
>>> %%timeit
>>> for a_value in a.flat:
>>>     pass
>>> 25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>
>>> %%timeit
>>> for k, v in b.items():
>>>     pass
>>> 19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>
>>> %%timeit
>>> for i in itertools.product(range(1000), range(1000)):
>>>     pass
>>> 28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> ```
>>>
>>> On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer <sho...@gmail.com> wrote:
>>>
>>>> I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e.,
>>>> discouraging its use in our docs, but not actually deprecating it).
>>>> Certainly ndrange seems like a small but meaningful improvement in the
>>>> interface.
>>>>
>>>> That said, I'm not convinced this is really worth the trouble. I think
>>>> the nested loop is still pretty readable/clear, and there are few times
>>>> when I've actually found ndindex() to be useful.
>>>>
>>>> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane <allanhald...@gmail.com>
>>>> wrote:
>>>>
>>>>> On 10/8/18 12:21 PM, Mark Harfouche wrote:
>>>>> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like
>>>>> > `range`, is not an iterator. Changing this behaviour would likely lead
>>>>> > to breaking code that uses that assumption. For example anybody using
>>>>> > introspection or code like:
>>>>> >
>>>>> > ```
>>>>> > indx = np.ndindex(5, 5)
>>>>> > next(indx)  # Don't look at the (0, 0) coordinate
>>>>> > for i in indx:
>>>>> >     print(i)
>>>>> > ```
>>>>> > would break if `ndindex` becomes "not an iterator"
>>>>>
>>>>> OK, I see now. Just like python3 has separate range and range_iterator
>>>>> types, where range is sliceable, we would have separate ndrange and
>>>>> ndindex types, where ndrange is sliceable. 
You're just copying the
>>>>> python3 api. That justifies it pretty well for me.
>>>>>
>>>>> I still think we shouldn't have two functions which do nearly the same
>>>>> thing. We should only have one, and get rid of the other. I see two
>>>>> ways forward:
>>>>>
>>>>> * replace ndindex by your ndrange code, so it is no longer an iter.
>>>>>   This would require some deprecation cycles for the cases that break.
>>>>> * deprecate ndindex in favor of a new function ndrange. We would keep
>>>>>   ndindex around for back-compatibility, with a dep warning to use
>>>>>   ndrange instead.
>>>>>
>>>>> Doing a code search on github, I can see that a lot of people's code
>>>>> would break if ndindex no longer was an iter. I also like the name
>>>>> ndrange for its allusion to python3's range behavior. That makes me
>>>>> lean towards the second option of a separate ndrange, with possible
>>>>> deprecation of ndindex.
>>>>>
>>>>> > itertools.product + range seems to be much faster than the current
>>>>> > implementation of ndindex
>>>>> >
>>>>> > (python 3.6)
>>>>> > ```
>>>>> > %%timeit
>>>>> >
>>>>> > for i in np.ndindex(100, 100):
>>>>> >     pass
>>>>> > 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>>>> >
>>>>> > %%timeit
>>>>> > import itertools
>>>>> > for i in itertools.product(range(100), range(100)):
>>>>> >     pass
>>>>> > 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>>> > ```
>>>>>
>>>>> If the new code ends up faster than the old code, that's great, and
>>>>> further justification for using ndrange instead of ndindex. I had
>>>>> thought using nditer in the old code was fastest.
>>>>>
>>>>> So as far as I am concerned, I say go ahead with the PR the way you are
>>>>> doing it.
>>>>>
>>>>> Allan
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion@python.org
>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion