Accessing individual elements of NumPy arrays is slower than accessing 
individual elements of lists --- around 2.5x-3x slower.    NumPy has to do more 
work to figure out what kind of indexing you are trying to do because of its 
flexibility.   It also has to create the Python object to return.    In 
contrast, the list approach already has the Python objects created and you are 
just returning pointers to them and there is much less flexibility in the kinds 
of indexing you can do.  

Simple timings show that a.item(i,j) is about 2x slower than list element 
access (but faster than a[i,j] which is about 2.5x to 3x slower).   The 
slowness of a.item is due to the need to create the Python object to return 
(there are just raw bytes there) so it gives some idea of the relative cost of 
each part of the slowness of a[i,j].  

Also, math on the array scalars returned from NumPy will be slower than math on 
integers and floats --- because NumPy re-uses the ufunc machinery which is not 
optimized at all for scalars. 

The take-away is that NumPy is built for doing vectorized operations on bytes 
of data.   It is not optimized for doing element-by-element individual access.  
  The right approach there is to just use lists (or use a version specialized 
for the kind of data in the lists that removes the boxing and unboxing).  

Here are my timings using IPython for NumPy indexing: 

1-D: 

In[2]: a = arange(100)

In [3]: %timeit [a.item(i) for i in xrange(100)]
10000 loops, best of 3: 25.6 us per loop

In [4]: %timeit [a[i] for i in xrange(100)]
10000 loops, best of 3: 31.8 us per loop

In [5]: al = a.tolist()

In [6]: %timeit [al[i] for i in xrange(100)]
100000 loops, best of 3: 10.6 us per loop



2-D: 

In [7]: a = arange(100).reshape(10,10)

In [8]: al = a.tolist()

In [9]: %timeit [al[i][j] for i in xrange(10) for j in xrange(10)]
10000 loops, best of 3: 18.6 us per loop

In [10]: %timeit [a[i,j] for i in xrange(10) for j in xrange(10)]
10000 loops, best of 3: 44.4 us per loop

In [11]: %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)]
10000 loops, best of 3: 34.2 us per loop



-Travis



On Jun 22, 2012, at 8:48 AM, Benjamin Root wrote:

> 
> 
> On Fri, Jun 22, 2012 at 9:42 AM, eat <e.antero.ta...@gmail.com> wrote:
> Hi,
> 
> On Fri, Jun 22, 2012 at 7:51 AM, Gael Varoquaux 
> <gael.varoqu...@normalesup.org> wrote:
> On Thu, Jun 21, 2012 at 08:59:09PM -0400, Benjamin Root wrote:
> >      > munkres seems to be a pure python implementation ;-).
> 
> >      Oops! I could have sworn that I once tried one named munkres that used
> >      numpy. But that was several years ago.
> 
> >    There is a development branch of sk-learn with an implementation of the
> >    hungarian assignment solver using numpy. It will even do non-square
> >    matrices and matrices with an empty dimension.
> 
> Yes, absolutely, thanks to Ben:
> https://github.com/GaelVaroquaux/scikit-learn/blob/hungarian/sklearn/utils/hungarian.py
> I never merged this in the main scikit-learn tree, because munkres is not
> used so far. Maybe I should merge it in the main tree, or maybe it should
> be added to scipy or numpy.
> I made some simple timing comparisons (see attached picture) between numpy 
> based hungarian and pure python shortest path based hungarian_sp. It seems 
> that pure python based implementation outperforms numpy based implementation. 
> Timings are averaged over five runs.
> 
> The difference cannot totally be explained by different algorithms (although 
> shortest path based seem to scale better).  Rather the heavy access to rows 
> and columns seem to favor list of lists. So this type of algorithms may 
> indeed be real challenges for numpy.
> 
> 
> eat,
> 
> Thanks for that analysis.  Personally, I never needed high-performance so I 
> never bothered to optimize it.  However, it does appear that there is an 
> order-of-magnitude difference between the two, and so it might be worth it to 
> see what can be done to fix that.
> 
> Ben Root
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to