On So, 2016-02-14 at 23:41 -0800, Antony Lee wrote:
> I wonder whether numpy is using the "old" iteration protocol
> (repeatedly calling x[i] for increasing i until IndexError is
> raised)?  A quick timing shows that it is indeed slower.
> ... actually it's not even clear to me what qualifies as a sequence
> for `np.array`:
> 
> class C:
>     def __iter__(self):
>         return iter(range(10))  # [0...9] under the new iteration protocol
>     def __getitem__(self, i):
>         raise IndexError  # [] under the old iteration protocol
> 

Numpy currently uses PySequence_Fast, but it has to run a two-pass
algorithm (one pass to find the dtype and dimensions, one to fill the
array), and this call converts the range to a list twice. That explains
the speed advantage of converting to a list manually.

- Sebastian


> np.array(C())
> ===> array(<__main__.C object at 0x7f3f21ffff28>, dtype=object)
> 
> So how can np.array(range(...)) even work?
> 
> 2016-02-14 22:21 GMT-08:00 Ralf Gommers <ralf.gomm...@gmail.com>:
> > 
> > 
> > On Sun, Feb 14, 2016 at 10:36 PM, Charles R Harris <
> > charlesr.har...@gmail.com> wrote:
> > > 
> > > 
> > > On Sun, Feb 14, 2016 at 7:36 AM, Ralf Gommers <
> > > ralf.gomm...@gmail.com> wrote:
> > > > 
> > > > 
> > > > On Sun, Feb 14, 2016 at 9:21 AM, Antony Lee <
> > > > antony....@berkeley.edu> wrote:
> > > > > re: no reason why...
> > > > > This has nothing to do with Python2/Python3 (I personally
> > > > > stopped using Python2 at least 3 years ago.)  Let me put it
> > > > > this way instead: if Python3's "range" (or Python2's
> > > > > "xrange") was not a builtin type but a type provided by
> > > > > numpy, I don't think it would be controversial at all to
> > > > > provide an `__array__` special method to efficiently convert
> > > > > it to a ndarray.  It would be the same if `np.array` used a
> > > > > `functools.singledispatch` dispatcher rather than an
> > > > > `__array__` special method (which is obviously not possible
> > > > > for chronological reasons).
> > > > > 
> > > > > re: iterable vs iterator: check for the presence of the
> > > > > __next__ special method (or isinstance(x, Iterator) vs.
> > > > > isinstance(x, Iterable) and not isinstance(x, Iterator))
> > > > > 
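A minimal sketch of that check (the helper name is made up):

```python
from collections.abc import Iterable, Iterator

def is_iterator(x):
    # An iterator carries __next__; a plain iterable (list, range, ...)
    # only carries __iter__.  Note that every Iterator is also an
    # Iterable, so the "iterable but not iterator" test must exclude
    # Iterator, not the other way around.
    return hasattr(x, '__next__')

r = range(10)
print(is_iterator(r), is_iterator(iter(r)))              # False True
print(isinstance(r, Iterable), isinstance(r, Iterator))  # True False
```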
> > > > I think it's good to do something about this, but it's not
> > > > clear what the exact proposal is. I could imagine one or both of:
> > > > 
> > > >   - special-case the range() object in array (and
> > > > asarray/asanyarray?) such that array(range(N)) becomes as fast
> > > > as arange(N).
> > > >   - special-case all iterators, such that array(range(N))
> > > > becomes as fast as deque(range(N))
> > > > 
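A hypothetical sketch of the first option (the function name and the
plain-Python dispatch are made up; a real change would live in the C
conversion path):

```python
import numpy as np

def fast_asarray(obj):
    # Hypothetical special case: a range already knows its start, stop,
    # step and length, so it can be handed to arange directly instead of
    # going through the generic two-pass sequence conversion.
    if isinstance(obj, range):
        return np.arange(obj.start, obj.stop, obj.step)
    return np.asarray(obj)

print(fast_asarray(range(2, 10, 3)))  # [2 5 8]
```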
> > > I think the last wouldn't help much, as numpy would still need to
> > > determine the dimensions and type.  I assume that is one of the
> > > reasons sparse itself doesn't do that.
> > > 
> > Not orders of magnitude, but this shows that there's something to
> > optimize for iterators:
> > 
> > In [1]: %timeit np.array(range(100000))
> > 100 loops, best of 3: 14.9 ms per loop
> > 
> > In [2]: %timeit np.array(list(range(100000)))
> > 100 loops, best of 3: 9.68 ms per loop
> > 
> > Ralf
> >  
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
