On Sun, 2016-02-14 at 23:41 -0800, Antony Lee wrote:
> I wonder whether numpy is using the "old" iteration protocol
> (repeatedly calling x[i] for increasing i until IndexError is
> raised?) A quick timing shows that it is indeed slower.
> ... actually it's not even clear to me what qualifies as a sequence
> for `np.array`:
>
>     class C:
>         def __iter__(self):
>             return iter(range(10))  # [0 ... 9] under the new iteration protocol
>         def __getitem__(self, i):
>             raise IndexError  # [] under the old iteration protocol
>
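The two protocols Antony contrasts can be demonstrated in plain Python. This is a minimal sketch; `OldStyle` and `NewStyle` are illustrative names, not anything in numpy:

```python
class OldStyle:
    # Pre-iterator "sequence" protocol: iter() falls back to calling
    # __getitem__ with 0, 1, 2, ... until IndexError is raised.
    def __getitem__(self, i):
        if i < 3:
            return i * 10
        raise IndexError(i)

class NewStyle:
    # When __iter__ is defined, it always wins over __getitem__.
    def __iter__(self):
        return iter(range(3))
    def __getitem__(self, i):
        raise IndexError(i)

print(list(OldStyle()))  # [0, 10, 20]
print(list(NewStyle()))  # [0, 1, 2] -- __getitem__ is never consulted
```

Antony's class C is the `NewStyle` case, which is why it iterates fine under the new protocol but yields nothing under the old one.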
Numpy currently uses PySequence_Fast, but it has to do a two-pass
algorithm (find dtype + dims), and the range is converted to a list
twice by this call. That explains the speed advantage of converting
to a list manually.

- Sebastian

> np.array(C())
> ===> array(<__main__.C object at 0x7f3f21ffff28>, dtype=object)
>
> So how can np.array(range(...)) even work?
>
> 2016-02-14 22:21 GMT-08:00 Ralf Gommers <ralf.gomm...@gmail.com>:
> >
> > On Sun, Feb 14, 2016 at 10:36 PM, Charles R Harris
> > <charlesr.har...@gmail.com> wrote:
> > >
> > > On Sun, Feb 14, 2016 at 7:36 AM, Ralf Gommers
> > > <ralf.gomm...@gmail.com> wrote:
> > > >
> > > > On Sun, Feb 14, 2016 at 9:21 AM, Antony Lee
> > > > <antony....@berkeley.edu> wrote:
> > > > > re: no reason why...
> > > > > This has nothing to do with Python2/Python3 (I personally
> > > > > stopped using Python2 at least 3 years ago.) Let me put it
> > > > > this way instead: if Python3's "range" (or Python2's
> > > > > "xrange") were not a builtin type but a type provided by
> > > > > numpy, I don't think it would be controversial at all to
> > > > > provide an `__array__` special method to efficiently convert
> > > > > it to an ndarray. It would be the same if `np.array` used a
> > > > > `functools.singledispatch` dispatcher rather than an
> > > > > `__array__` special method (which is obviously not possible
> > > > > for chronological reasons).
> > > > >
> > > > > re: iterable vs iterator: check for the presence of the
> > > > > __next__ special method (or isinstance(x, Iterator), vs.
> > > > > isinstance(x, Iterable) and not isinstance(x, Iterator))
> > > >
> > > > I think it's good to do something about this, but it's not
> > > > clear what the exact proposal is. I could imagine one or both
> > > > of:
> > > >
> > > > - special-case the range() object in array (and
> > > >   asarray/asanyarray?) such that array(range(N)) becomes as
> > > >   fast as arange(N).
> > > > - special-case all iterators, such that array(range(N))
> > > >   becomes as fast as deque(range(N))
> > >
> > > I think the last wouldn't help much, as numpy would still need
> > > to determine dimensions and type. I assume that is one of the
> > > reasons numpy itself doesn't do that.
> >
> > Not orders of magnitude, but this shows that there's something to
> > optimize for iterators:
> >
> > In [1]: %timeit np.array(range(100000))
> > 100 loops, best of 3: 14.9 ms per loop
> >
> > In [2]: %timeit np.array(list(range(100000)))
> > 100 loops, best of 3: 9.68 ms per loop
> >
> > Ralf
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
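For completeness, numpy already offers a single-pass route for iterators when the dtype is known up front: np.fromiter. A quick sketch checking that it agrees with the list route (the dtype choice here is an assumption; fromiter requires one to be given):

```python
import numpy as np

n = 100000
# fromiter consumes the iterator exactly once; passing count= lets
# numpy preallocate the buffer instead of growing it.
a = np.fromiter(range(n), dtype=np.intp, count=n)
b = np.array(list(range(n)))
print(np.array_equal(a, b))  # True
```

This sidesteps the two-pass problem entirely, at the cost of the caller specifying the dtype, which is presumably why it cannot simply replace the generic np.array path.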