On Friday 11 December 2009 17:36:54, Bruce Southey wrote:
> On 12/11/2009 10:03 AM, Francesc Alted wrote:
> > On Friday 11 December 2009 16:44:29, Dag Sverre Seljebotn wrote:
> >> Jasper van de Gronde wrote:
> >>> Dag Sverre Seljebotn wrote:
> >>>> Jasper van de Gronde wrote:
> >>>>> I've attached a test file which shows the problem. It also tries
> >>>>> adding columns instead of rows (in case the memory layout is playing
> >>>>> tricks), but this seems to make no difference. This is the output I
> >>>>> got:
> >>>>>
> >>>>> Dot product: 5.188786
> >>>>> Add a row: 8.032767
> >>>>> Add a column: 8.070953
> >>>>>
> >>>>> Any ideas on why adding a row (or column) of a matrix is slower than
> >>>>> computing a matrix product with a similarly sized matrix? (Xi has
> >>>>> fewer columns than Xi2, but just as many rows.)
> >>>>
> >>>> I think we need some numbers to put this into context -- how big are
> >>>> the vectors/matrices? How many iterations was the loop run? If the
> >>>> vectors are small and the loop is run many times, how fast the
> >>>> operation "ought" to be is irrelevant, as it would drown in Python
> >>>> overhead.
> >>>
> >>> Originally I had attached a Python file demonstrating the problem, but
> >>> apparently this wasn't accepted by the list. In any case, the matrices
> >>> and vectors weren't too big (60x20), so I tried making them bigger and
> >>> indeed the "fast" version was now considerably faster.
> >>
> >> 60x20 is "nothing", so a full matrix multiplication and a single
> >> matrix-vector product probably take about the same time (that is, the
> >> difference between them is likely smaller than your measurement error).
> >>
> >> In this context, the benchmarks will be completely dominated by the
> >> number of Python calls you make (each one, especially taking the
> >> slice, means allocating Python objects, calling a bunch of functions
> >> in C, etc.).
> >> So it's not that strange -- taking a slice isn't free; some Python
> >> objects must be created, etc.
> >
> > Yeah, I think taking slices here is taking quite a lot of time:
> >
> > In [58]: timeit E + Xi2[P/2,:]
> > 100000 loops, best of 3: 3.95 µs per loop
> >
> > In [59]: timeit E + Xi2[P/2]
> > 100000 loops, best of 3: 2.17 µs per loop
> >
> > I don't know why the additional ',:' in the index is taking so much
> > time, but my guess is that passing & analyzing the second argument
> > (slice(None, None, None)) could be responsible for the slowdown
> > (although that seems like too much overhead for it). Mmh, perhaps it
> > would be worth studying this more carefully so that an optimization
> > could be done in NumPy.
> >
> >> I think the lesson mostly should be that with so little data,
> >> benchmarking becomes a very difficult art.
> >
> > Well, I think it is not so much that it is difficult as that you are
> > perhaps benchmarking Python/NumPy machinery instead ;-) I'm curious
> > whether Matlab can do slicing much faster than NumPy. Jasper?

What are you actually trying to test here?
Good question. I don't know for sure :-)

> I do not see any equivalence in the operations or output here.
> - With your slices you need two dot products, but ultimately you are
>   only using one for your dot product.
> - There are addition operations on the slices that are not present in
>   the dot product.
> - The final E arrays are not the same for all three operations.

I don't understand the ultimate goal of the OP either, but what caught my
attention was this:

In [74]: timeit Xi2[P/2]
1000000 loops, best of 3: 278 ns per loop

In [75]: timeit Xi2[P/2,:]
1000000 loops, best of 3: 1.04 µs per loop

i.e. adding an additional parameter (the ':') to the index makes it run
almost 4x slower. And this *partially* explains the problem the OP
reported, i.e.:

In [77]: timeit np.dot(Xi,w)
100000 loops, best of 3: 2.91 µs per loop

In [78]: timeit E + Xi2[P/2]
100000 loops, best of 3: 2.05 µs per loop

In [79]: timeit E + Xi2[P/2,:]
100000 loops, best of 3: 3.81 µs per loop

But again, don't ask me whether the results are okay or not. I'm playing
here the role of a pure computational scientist on a very concrete
problem ;-)

> Having said that, the more you can vectorize your function, the more
> efficient it will likely be, especially with Atlas etc.

Except if your arrays are small enough, which is the underlying issue
here IMO.

-- 
Francesc Alted
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
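For anyone wanting to reproduce the timings discussed in this thread, below
is a minimal self-contained sketch of the micro-benchmark. The shapes are
assumed from the thread (Xi2 and Xi are 60x20, P = 60); the exact variable
contents are guesses, `//` replaces the thread's Python 2 `/`, and absolute
numbers will of course vary by machine and NumPy version:

```python
# Micro-benchmark: integer row index vs. index-plus-explicit-slice,
# compared against a small matrix-vector product (shapes assumed
# from the thread: 60x20 arrays, P = 60).
import timeit
import numpy as np

P = 60
Xi2 = np.random.rand(P, 20)
Xi = np.random.rand(P, 20)
w = np.random.rand(20)
E = np.zeros(Xi2.shape[1])

results = {}
for stmt in ("Xi2[P//2]",       # plain integer index
             "Xi2[P//2, :]",    # same row, but with an explicit slice object
             "E + Xi2[P//2]",
             "E + Xi2[P//2, :]",
             "np.dot(Xi, w)"):
    n = 20000
    t = min(timeit.repeat(stmt, globals=globals(), number=n, repeat=3))
    results[stmt] = t / n * 1e9  # nanoseconds per call
    print(f"{stmt:18s} {results[stmt]:8.1f} ns per call")

# Both index forms select the same row; only the indexing path differs.
assert np.array_equal(Xi2[P//2], Xi2[P//2, :])
```

Note that at these sizes the loop body costs are dominated by per-call
Python and NumPy dispatch overhead, which is exactly Dag's point above.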