One thing that worries me here - in python, range(...) in essence generates a lazy list - so I’d expect ndrange to generate a lazy ndarray. In practice, that means it would be a duck type that defines an __array__ method to evaluate it and only implements methods already present in numpy.
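Here is a minimal sketch of that lazy duck type. It assumes a hypothetical `LazyNDRange` class, which is not part of NumPy or of the PR under discussion. The class stores one Python `range` per axis and iterates lazily with `itertools.product`. It only builds an index array when NumPy calls `__array__`. The materialized array uses the `(intp, (N,))` layout, i.e. an intp array of shape `shape + (N,)`.

```python
import itertools

import numpy as np


class LazyNDRange:
    """Hypothetical lazy N-d analogue of range (illustration only, not a NumPy API).

    Only one range per axis is stored; no index array exists until
    NumPy asks for one through __array__.
    """

    def __init__(self, *axes):
        if not axes:
            raise ValueError("this sketch needs at least one axis")
        # Accept either an int n (meaning range(n)) or a range object per axis.
        self.ranges = tuple(a if isinstance(a, range) else range(a) for a in axes)

    @property
    def shape(self):
        return tuple(len(r) for r in self.ranges)

    def __iter__(self):
        # Lazy iteration: index tuples in C order, produced one at a time.
        return itertools.product(*self.ranges)

    def __array__(self, dtype=None, copy=None):
        # `copy` is accepted for compatibility with NumPy 2's __array__
        # protocol and ignored here, since a new array is always built.
        # Materialize on demand: one arange per axis, expanded into an
        # 'ij' grid, then stacked on a trailing axis of length ndim.
        axes = [np.arange(r.start, r.stop, r.step, dtype=np.intp) for r in self.ranges]
        grids = np.meshgrid(*axes, indexing="ij")
        out = np.stack(grids, axis=-1)
        return out if dtype is None else out.astype(dtype)


r = LazyNDRange(2, 3)
arr = np.asarray(r)          # __array__ is only called here
print(arr.shape)             # (2, 3, 2)
print(arr[1, 2].tolist())    # [1, 2]
print(list(r)[:3])           # [(0, 0), (0, 1), (0, 2)]
```

Because `__array__` is the only bridge to NumPy, `np.asarray` sees an ordinary intp array. Anything else, such as lazy slicing, would have to be written on the class itself.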
It’s not clear to me what the datatype of such an array-like would be. Candidates I can think of are:

1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a little awkward
2. (intp, (N,)) - which collapses into a shape + (N,) array
3. object_.
4. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the structured np.void but without field names. I’m not sure how vectorized element indexing would be spelt though.

Eric

On Tue, 9 Oct 2018 at 21:59 Stephan Hoyer <sho...@gmail.com> wrote:

> The speed difference is interesting but really a different question than
> the public API.
>
> I'm coming around to ndrange(). I can see how it could be useful for
> symbolic manipulation of arrays and indexing operations, similar to what we
> do in dask and xarray.
>
> On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche <mark.harfou...@gmail.com>
> wrote:
>
>> since ndrange is a superset of the features of ndindex, we can implement
>> ndindex with ndrange or keep it as is.
>> ndindex is now a glorified `nditer` object anyway. So it isn't so much of
>> a maintenance burden.
>> As for how ndindex is implemented, I'm a little worried about python 2
>> performance seeing as range is a list.
>> I would wait on changing the way ndindex is implemented for now.
>>
>> I agree with Stephan that ndindex should be kept in. Many want backward
>> compatible code. It would be hard for me to justify why a dependency should
>> be bumped up to bleeding edge numpy just for a convenience iterator.
>>
>> Honestly, I was really surprised to see such a speed difference, I
>> thought it would have been closer.
>>
>> Allan, I decided to run a few more benchmarks, the nditer just seems slow
>> for single array access for some reason. Maybe a bug?
>>
>> ```
>> import numpy as np
>> import itertools
>> a = np.ones((1000, 1000))
>>
>> b = {}
>> for i in np.ndindex(a.shape):
>>     b[i] = i
>>
>> %%timeit
>> # op_flags=('readonly',) doesn't change performance
>> for a_value in np.nditer(a):
>>     pass
>> 109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for i in itertools.product(range(1000), range(1000)):
>>     a_value = a[i]
>> 113 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for i in itertools.product(range(1000), range(1000)):
>>     c = b[i]
>> 193 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>
>> %%timeit
>> for a_value in a.flat:
>>     pass
>> 25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for k, v in b.items():
>>     pass
>> 19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for i in itertools.product(range(1000), range(1000)):
>>     pass
>> 28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>> ```
>>
>> On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer <sho...@gmail.com> wrote:
>>
>>> I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e.,
>>> discouraging its use in our docs, but not actually deprecating it).
>>> Certainly ndrange seems like a small but meaningful improvement in the
>>> interface.
>>>
>>> That said, I'm not convinced this is really worth the trouble. I think
>>> the nested loop is still pretty readable/clear, and there are few times
>>> when I've actually found ndindex() to be useful.
>>>
>>> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane <allanhald...@gmail.com>
>>> wrote:
>>>
>>>> On 10/8/18 12:21 PM, Mark Harfouche wrote:
>>>> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like
>>>> > `range`, is not an iterator. Changing this behaviour would likely lead
>>>> > to breaking code that uses that assumption. For example anybody using
>>>> > introspection or code like:
>>>> >
>>>> > ```
>>>> > indx = np.ndindex(5, 5)
>>>> > next(indx)  # Don't look at the (0, 0) coordinate
>>>> > for i in indx:
>>>> >     print(i)
>>>> > ```
>>>> > would break if `ndindex` becomes "not an iterator"
>>>>
>>>> OK, I see now. Just like python3 has separate range and range_iterator
>>>> types, where range is sliceable, we would have separate ndrange and
>>>> ndindex types, where ndrange is sliceable. You're just copying the
>>>> python3 api. That justifies it pretty well for me.
>>>>
>>>> I still think we shouldn't have two functions which do nearly the same
>>>> thing. We should only have one, and get rid of the other. I see two ways
>>>> forward:
>>>>
>>>> * replace ndindex by your ndrange code, so it is no longer an iter.
>>>>   This would require some deprecation cycles for the cases that break.
>>>> * deprecate ndindex in favor of a new function ndrange. We would keep
>>>>   ndindex around for back-compatibility, with a dep warning to use
>>>>   ndrange instead.
>>>>
>>>> Doing a code search on github, I can see that a lot of people's code
>>>> would break if ndindex no longer was an iter. I also like the name
>>>> ndrange for its allusion to python3's range behavior. That makes me lean
>>>> towards the second option of a separate ndrange, with possible
>>>> deprecation of ndindex.
>>>>
>>>> > itertools.product + range seems to be much faster than the current
>>>> > implementation of ndindex
>>>> >
>>>> > (python 3.6)
>>>> > ```
>>>> > %%timeit
>>>> >
>>>> > for i in np.ndindex(100, 100):
>>>> >     pass
>>>> > 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>>> >
>>>> > %%timeit
>>>> > import itertools
>>>> > for i in itertools.product(range(100), range(100)):
>>>> >     pass
>>>> > 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>> > ```
>>>>
>>>> If the new code ends up faster than the old code, that's great, and
>>>> further justification for using ndrange instead of ndindex. I had
>>>> thought using nditer in the old code was fastest.
>>>>
>>>> So as far as I am concerned, I say go ahead with the PR the way you are
>>>> doing it.
>>>>
>>>> Allan
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion@python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
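The range/range_iterator split Allan describes can be sketched in a few lines. This sketch assumes a hypothetical `NDRange` class, which is not the code from Mark's PR. It has three properties:

* It is an iterable rather than an iterator, so every `iter()` call starts a fresh pass.
* Slicing returns another lazy `NDRange`, because slicing a Python `range` returns a `range`.
* Iteration delegates to `itertools.product`, the approach benchmarked in the thread.

`np.ndindex`, by contrast, is its own iterator and is exhausted after one pass.

```python
import itertools

import numpy as np


class NDRange:
    """Hypothetical re-iterable, sliceable N-d range (illustration only)."""

    def __init__(self, *axes):
        # Each axis is an int n (meaning range(n)) or a range object.
        self.ranges = tuple(a if isinstance(a, range) else range(a) for a in axes)

    @property
    def shape(self):
        return tuple(len(r) for r in self.ranges)

    def __iter__(self):
        # A fresh iterator on every call, like iter(range(n)).
        return itertools.product(*self.ranges)

    def __getitem__(self, key):
        # Per-axis slicing only: range(5)[1:3] == range(1, 3), so the
        # result stays lazy. Integer indexing is left out of this sketch.
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) > len(self.ranges) or not all(isinstance(k, slice) for k in key):
            raise TypeError("this sketch only supports one slice per axis")
        sliced = [r[k] for r, k in zip(self.ranges, key)]
        return NDRange(*sliced, *self.ranges[len(key):])


r = NDRange(5, 5)
assert list(r) == list(r)           # re-iterable: a second pass sees everything

it = iter(r)
next(it)                            # skip (0, 0), as in the ndindex idiom quoted above
print(next(it))                     # (0, 1)

sub = r[1:3, ::2]
print(sub.shape, list(sub)[:2])     # (2, 3) [(1, 0), (1, 2)]

idx = np.ndindex(5, 5)              # ndindex is its own iterator...
print(len(list(idx)), len(list(idx)))  # 25 0  (...so a second pass is empty)
```

Keeping the iterable and the iterator as separate types is what lets an ndrange be sliced and re-iterated without breaking existing code that calls next() on np.ndindex.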