Re: [Numpy-discussion] matrix default to column vector?

2009-06-05 Thread Charles R Harris
On Fri, Jun 5, 2009 at 11:30 AM, Alan G Isaac  wrote:

> On 6/5/2009 11:38 AM Olivier Verdier apparently wrote:
> > I think matrices can be pretty tricky when used for
> > teaching.  For instance, you have to explain that all the
> > operators work component-wise, except the multiplication!
> > Another caveat is that since matrices are always 2-D, the
> > "scalar product" of two column vectors computed as "x.T
> > * y" will not be a scalar, but a 1×1 matrix. There is also
> > the fact that you must cast all your vectors to column/row
> > matrices (as in Matlab). For all these reasons, I prefer
> > to use arrays and dot for teaching, and I have never had
> > any complaints.
>
>
> I do not understand this "argument".
> You should take it very seriously when someone
> reports to you that the matrix object is crucial to them,
> e.g., as a teaching tool.  Even if you do not find
> personally persuasive an example like
> http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043001.html
> I have told you: this is important for my students.
> Reporting that your students do not complain about using
> arrays instead of matrices does not change this one bit.
>
> Student backgrounds differ by domain of application.  In
> economics, matrices are in *very* wide use, and
> multidimensional arrays get almost no use.  Textbooks in
> econometrics (a huge and important field, even outside of
> economics) are full of proofs using matrix algebra.
> A close match to what the students see is crucial.
> When working with multiplication or exponentiation,
> matrices do what they expect, and 2d arrays do not.
>
> One more point. As Python users we get used to installing
> a package here and a package there to add functionality.
> But this is not how most people looking for a matrix
> language see the world.  Removing the matrix object from
> NumPy will raise the barrier to adoption by social
> scientists, and there should be a strongly persuasive reason
> before taking such a step.
>
> Separately from all that, does anyone doubt that there is
> code that depends on the matrix object?  The core objection
> to a past proposal for useful change was that it could break
> extant code.  I would hope that nobody who took that
> position would subsequently propose removing the matrix
> object altogether.
>
> Cheers,
> Alan Isaac
>
> PS If x and y are "column vectors" (i.e., matrices), then
> x.T * y *should* be a 1×1 matrix.
> Since the * operator is doing matrix multiplication,
> this is the correct result, not an anomaly.
>

Well, one could argue that. The x.T is a member of the dual, hence maps
vectors to the reals. Usually the reals aren't represented by 1x1 matrices.
Just my [.02] cents.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
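[The point under debate is easy to demonstrate. A minimal sketch contrasting the two behaviors; note that np.matrix still works but is deprecated in current NumPy:]

```python
import numpy as np

# Column vectors as np.matrix: '*' is matrix multiplication,
# so x.T * y is a 1x1 matrix, not a scalar.
x = np.matrix([[1.0], [2.0]])
y = np.matrix([[3.0], [4.0]])
p = x.T * y
print(p.shape)   # (1, 1)

# With plain 1-D arrays, np.dot of two vectors is a true scalar.
xa = np.array([1.0, 2.0])
ya = np.array([3.0, 4.0])
s = np.dot(xa, ya)
print(s)         # 11.0
```

Whether the 1×1-matrix result is an anomaly (Olivier's view) or the mathematically correct product of a 1×n and an n×1 matrix (Alan's view) is exactly the disagreement in this thread.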


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread josef . pktd
On Fri, Jun 5, 2009 at 5:59 PM, Brent Pedersen  wrote:
> On Fri, Jun 5, 2009 at 2:01 PM, Keith Goodman wrote:
>> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen  wrote:
>>> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote:
 On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
> On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
>>> Hello,
>>> I have a vectorizing problem that I don't see an obvious way to solve.  
>>> What
>>> I have is a vector like:
>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>> and a matrix
>>> T=zeros((6,6))
>>> and what I want in T is a count of all of the transitions in obs, e.g.
>>> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
>>> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
>>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>>     T[o1,o2]+=1
>>>
>>> which gives the correct answer from above, which is:
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>>>
>>> but I thought there would be a better way.  I tried:
>>> o1=obs[:-1]
>>> o2=obs[1:]
>>> T[o1,o2]+=1
>>> but this doesn't give a count, it just yields 1's at the transition 
>>> points,
>>> like:
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>>> Is there a clever way to do this?  I could write a quick Cython 
>>> solution,
>>> but I wanted to keep this as an all-numpy implementation if I can.
>>>
>>
>> histogram2d or its imitation, there was a discussion on histogram2d a
>> short time ago
>>
> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
> obs2 = obs - 1
> trans = 
> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
>> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
>> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
>> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
>> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
> np.all(re == trans)
>> True
>>
> trans
>> array([[0, 0, 0, 0, 0, 0],
>>       [0, 0, 3, 0, 0, 1],
>>       [0, 3, 0, 1, 0, 0],
>>       [0, 0, 2, 0, 1, 0],
>>       [0, 0, 0, 2, 0, 0],
>>       [0, 0, 0, 0, 1, 0]])
>>
>>
>> or
>>
> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
> range=[[0,5],[0,5]])
> re
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
> h
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
> np.all(re == h)
>> True
>
> There's no way my list method can beat that. But by adding
>
> import psyco
> psyco.full()
>
> I get a total speed up of a factor of 15 when obs is length 1.

 Actually, it is faster:

 histogram:

>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>> range=[[0,5],[0,5]])
>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>> range=[[0,5],[0,5]])
 100 loops, best of 3: 4.14 ms per loop

 lists:

>> timeit test(obs3, T3)
 1000 loops, best of 3: 1.32 ms per loop
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

>>>
>>>
>>> here's a go:
>>>
>>> import numpy as np
>>> import random
>>> from itertools import groupby
>>>
>>> def test1(obs, T):
>>>   for o1,o2 in zip(obs[:-1],obs[1:]):
>>>       T[o1][o2] += 1
>>>   return T
>>>
>>>
>>> def test2(obs, T):
>>>    s = zip(obs[:-1], obs[1:])
>>>    for idx, g in groupby(sorted(s)):
>>>        T[idx] = len(list(g))
>>>    return T
>>>
>>> obs = [random.randint(0, 5) for z in range(1)]
>>>
>>> print test2(obs, np.zeros((6, 6)))
>>> print test1(obs, np.zeros((6, 6)))
>>>
>>>
>>> ##
>>>
>>> In [10]: timeit test1(obs, np.zeros((6, 6)))
>>> 100 loops, best of 3: 18.8 ms per loop
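[For reference, the fancy-indexing pitfall Brian hit (T[o1,o2] += 1 writes each repeated index only once) has a direct vectorized fix in later NumPy via np.add.at, and Josef's bincount trick simplifies with the minlength argument (added in NumPy 1.6). A sketch:]

```python
import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
o1, o2 = obs[:-1], obs[1:]

# bincount on the flattened pair index; minlength pads to a full 6x6.
T = np.bincount(o1 * 6 + o2, minlength=36).reshape(6, 6)

# np.add.at accumulates over repeated indices, unlike T2[o1, o2] += 1.
T2 = np.zeros((6, 6))
np.add.at(T2, (o1, o2), 1)

print(T[1, 2], T[3, 4])  # 3 1
```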

Re: [Numpy-discussion] matrix multiplication

2009-06-05 Thread Keith Goodman
On Fri, Jun 5, 2009 at 5:19 PM, Christopher Barker
 wrote:
> Robert Kern wrote:
> x = np.array([1,2,3])
> timeit x.sum()
>>> 10 loops, best of 3: 3.01 µs per loop
> from numpy import sum
> timeit sum(x)
>>> 10 loops, best of 3: 4.84 µs per loop
>
> that is a VERY short array, so one extra function call overhead could
> make the difference. Is it really your use case to have such tiny sums
> inside a big loop, and is there no way to vectorize that?

I was trying to make the timeit difference large. It is the overhead
that I was interested in. But it is still noticeable when x is a
"typical" size:

>> x = np.arange(1000)
>> timeit x.sum()
10 loops, best of 3: 5.46 µs per loop
>> from numpy import sum
>> timeit sum(x)
10 loops, best of 3: 7.31 µs per loop

>> x = np.random.rand(1000)
>> timeit x.sum()
10 loops, best of 3: 6.81 µs per loop
>> timeit sum(x)
10 loops, best of 3: 8.36 µs per loop

That reminds me of a big difference between arrays and matrices.
Matrices have the overhead of going through python code (the matrix
class) to get to the core C array code. I converted an iterative
optimization function from matrices to arrays and got a speed-up of a
factor of 3.5. Array sizes were around (500, 10) and (500,).
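[The overhead Keith describes is straightforward to measure with the stdlib timeit module; a sketch (absolute numbers are machine-dependent):]

```python
import timeit
import numpy as np

x = np.arange(1000)

setup = "import numpy as np; x = np.arange(1000)"
t_method = timeit.timeit("x.sum()", setup=setup, number=10000)
t_func = timeit.timeit("np.sum(x)", setup=setup, number=10000)

# np.sum dispatches through a Python-level wrapper before reaching
# the same C-backed method, so t_func is typically a bit larger.
print(t_method, t_func)
```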


Re: [Numpy-discussion] matrix multiplication

2009-06-05 Thread Christopher Barker
Robert Kern wrote:
 x = np.array([1,2,3])
 timeit x.sum()
>> 10 loops, best of 3: 3.01 µs per loop
 from numpy import sum
 timeit sum(x)
>> 10 loops, best of 3: 4.84 µs per loop

that is a VERY short array, so one extra function call overhead could 
make the difference. Is it really your use case to have such tiny sums 
inside a big loop, and is there no way to vectorize that?

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] matrix multiplication

2009-06-05 Thread Robert Kern
On Fri, Jun 5, 2009 at 17:54, Keith Goodman  wrote:
> On Fri, Jun 5, 2009 at 3:02 PM, Alan G Isaac  wrote:
>> I think something close to this would be possible:
>> add dot as an array method.
>>        A .dot(B) .dot(C)
>> is not as pretty as
>>        A * B * C
>> but it is much better than
>>        np.dot(np.dot(A,B),C)
>
> I've noticed that x.sum() is faster than sum(x)
>
>>> x = np.array([1,2,3])
>>> timeit x.sum()
> 10 loops, best of 3: 3.01 µs per loop
>>> from numpy import sum
>>> timeit sum(x)
> 10 loops, best of 3: 4.84 µs per loop
>
> Would the same be true of dot? That is, would x.dot(y) be faster than
> dot(x,y)? Or is it just that np.sum() has to go through some extra
> python code before it hits the C code?

No and yes, respectively.

> In general, if I'm trying to speed up an inner loop, I try to replace
> func(x) with x.func(). But I don't really understand the general
> principle at work here.

Most of the functions that mirror methods, including numpy.sum(), are
Python functions that have a little bit of code to convert the
input to an array if necessary and to dispatch to the method.
numpy.dot() is already implemented in C.
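[A hypothetical mini-version of that wrapper (my illustration, not NumPy's actual source) shows where the extra Python-level work goes:]

```python
import numpy as np

def my_sum(a, axis=None):
    """Illustrative stand-in for numpy.sum's dispatch path."""
    a = np.asarray(a)        # convert the input if necessary...
    return a.sum(axis=axis)  # ...then dispatch to the C-backed method

print(my_sum([1, 2, 3]))  # 6
```

Calling x.sum() directly skips the wrapper, which is the overhead Keith measured.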

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] matrix multiplication

2009-06-05 Thread Keith Goodman
On Fri, Jun 5, 2009 at 3:02 PM, Alan G Isaac  wrote:
> I think something close to this would be possible:
> add dot as an array method.
>        A .dot(B) .dot(C)
> is not as pretty as
>        A * B * C
> but it is much better than
>        np.dot(np.dot(A,B),C)

I've noticed that x.sum() is faster than sum(x)

>> x = np.array([1,2,3])
>> timeit x.sum()
10 loops, best of 3: 3.01 µs per loop
>> from numpy import sum
>> timeit sum(x)
10 loops, best of 3: 4.84 µs per loop

Would the same be true of dot? That is, would x.dot(y) be faster than
dot(x,y)? Or is it just that np.sum() has to go through some extra
python code before it hits the C code?

In general, if I'm trying to speed up an inner loop, I try to replace
func(x) with x.func(). But I don't really understand the general
principle at work here.


Re: [Numpy-discussion] matrix multiplication

2009-06-05 Thread Charles R Harris
On Fri, Jun 5, 2009 at 4:02 PM, Alan G Isaac  wrote:

> On 6/5/2009 5:41 PM Chris Colbert apparently wrote:
> > well, it sounded like a good idea.
>
>
> I think something close to this would be possible:
> add dot as an array method.
>A .dot(B) .dot(C)
> is not as pretty as
>A * B * C
> but it is much better than
>np.dot(np.dot(A,B),C)
>

I prefer using the evaluation operator, so that becomes

A(B)(C) or, changing the multiplication order a bit, A(B(C))

Chuck


Re: [Numpy-discussion] IndexExpression bug?

2009-06-05 Thread Robert Kern
On Fri, Jun 5, 2009 at 16:14, Michael McNeil Forbes
 wrote:
>  >>> np.array([0,1,2,3])[1:-1]
> array([1, 2])
>
> but
>
>  >>> np.array([0,1,2,3])[np.s_[1:-1]]
> array([1, 2, 3])
>  >>> np.array([0,1,2,3])[np.index_exp[1:-1]]
> array([1, 2, 3])
>
> Possible fix:
> class IndexExpression(object):
>     ...
>     def __len__(self):
>         return 0
>
> (Presently this returns sys.maxint)
>
> Does this break anything (I can't find any coverage tests)?  If not, I
> will submit a ticket.

I think that getting rid of __getslice__ and __len__ should work
better. I don't really understand what the logic was behind including
them in the first place, though. I might be missing something.

In [21]: %cpaste
Pasting code; enter '--' alone on the line to stop.
:class IndexExpression(object):
:"""
:A nicer way to build up index tuples for arrays.
:
:For any index combination, including slicing and axis insertion,
:'a[indices]' is the same as 'a[index_exp[indices]]' for any
:array 'a'. However, 'index_exp[indices]' can be used anywhere
:in Python code and returns a tuple of slice objects that can be
:used in the construction of complex index expressions.
:"""
:
:def __init__(self, maketuple):
:self.maketuple = maketuple
:
:def __getitem__(self, item):
:if self.maketuple and type(item) is not tuple:
:return (item,)
:else:
:return item
:--

In [22]: s2 = IndexExpression(False)

In [23]: s2[1:-1]
Out[23]: slice(1, -1, None)
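[For the record, the bug was fixed along the lines Robert describes; in modern NumPy, np.s_ and np.index_exp build plain slice objects or tuples with no __len__/__getslice__ interference, so negative slice ends behave the same as direct indexing. A quick check:]

```python
import numpy as np

a = np.array([0, 1, 2, 3])
print(a[1:-1])                # [1 2]
print(np.s_[1:-1])            # slice(1, -1, None)
print(a[np.s_[1:-1]])         # [1 2]  -- matches direct slicing
print(a[np.index_exp[1:-1]])  # [1 2]  -- index_exp wraps it in a tuple
```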


-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] matrix multiplication

2009-06-05 Thread Matthew Brett
> I think something close to this would be possible:
> add dot as an array method.
>        A .dot(B) .dot(C)
> is not as pretty as
>        A * B * C
> but it is much better than
>        np.dot(np.dot(A,B),C)

That is much better.

Matthew


Re: [Numpy-discussion] matrix multiplication

2009-06-05 Thread Alan G Isaac
On 6/5/2009 5:41 PM Chris Colbert apparently wrote:
> well, it sounded like a good idea.


I think something close to this would be possible:
add dot as an array method.
A .dot(B) .dot(C)
is not as pretty as
A * B * C
but it is much better than
np.dot(np.dot(A,B),C)

In fact it is so much better, that it
might (?) be worth considering separately
from the entire matrix discussion.

And it does not provide a matrix exponential,
of course.

Alan Isaac
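[As it turned out, ndarray.dot did land as a method in NumPy 1.5, and Python 3.5 later added a dedicated matrix-multiplication operator, @ (PEP 465), which gives the A * B * C look without matrices. A sketch of the three equivalent spellings:]

```python
import numpy as np

A = np.arange(4.0).reshape(2, 2)
B = np.eye(2)
C = np.array([[1.0], [1.0]])

r1 = np.dot(np.dot(A, B), C)  # nested-call style
r2 = A.dot(B).dot(C)          # method chaining
r3 = A @ B @ C                # PEP 465 infix operator

print(np.allclose(r1, r2) and np.allclose(r2, r3))  # True
```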



Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Brent Pedersen
On Fri, Jun 5, 2009 at 2:01 PM, Keith Goodman wrote:
> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen  wrote:
>> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote:
>>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
 On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
>> Hello,
>> I have a vectorizing problem that I don't see an obvious way to solve.  
>> What
>> I have is a vector like:
>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>> and a matrix
>> T=zeros((6,6))
>> and what I want in T is a count of all of the transitions in obs, e.g.
>> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
>> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>     T[o1,o2]+=1
>>
>> which gives the correct answer from above, which is:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>>
>> but I thought there would be a better way.  I tried:
>> o1=obs[:-1]
>> o2=obs[1:]
>> T[o1,o2]+=1
>> but this doesn't give a count, it just yields 1's at the transition 
>> points,
>> like:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>> Is there a clever way to do this?  I could write a quick Cython solution,
>> but I wanted to keep this as an all-numpy implementation if I can.
>>
>
> histogram2d or its imitation, there was a discussion on histogram2d a
> short time ago
>
 obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 obs2 = obs - 1
 trans = 
 np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
 re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
 np.all(re == trans)
> True
>
 trans
> array([[0, 0, 0, 0, 0, 0],
>       [0, 0, 3, 0, 0, 1],
>       [0, 3, 0, 1, 0, 0],
>       [0, 0, 2, 0, 1, 0],
>       [0, 0, 0, 2, 0, 0],
>       [0, 0, 0, 0, 1, 0]])
>
>
> or
>
 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
 re
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
 h
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
 np.all(re == h)
> True

 There's no way my list method can beat that. But by adding

 import psyco
 psyco.full()

 I get a total speed up of a factor of 15 when obs is length 1.
>>>
>>> Actually, it is faster:
>>>
>>> histogram:
>>>
> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
> range=[[0,5],[0,5]])
>>> 100 loops, best of 3: 4.14 ms per loop
>>>
>>> lists:
>>>
> timeit test(obs3, T3)
>>> 1000 loops, best of 3: 1.32 ms per loop
>>> ___
>>> Numpy-discussion mailing list
>>> Numpy-discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>> here's a go:
>>
>> import numpy as np
>> import random
>> from itertools import groupby
>>
>> def test1(obs, T):
>>   for o1,o2 in zip(obs[:-1],obs[1:]):
>>       T[o1][o2] += 1
>>   return T
>>
>>
>> def test2(obs, T):
>>    s = zip(obs[:-1], obs[1:])
>>    for idx, g in groupby(sorted(s)):
>>        T[idx] = len(list(g))
>>    return T
>>
>> obs = [random.randint(0, 5) for z in range(1)]
>>
>> print test2(obs, np.zeros((6, 6)))
>> print test1(obs, np.zeros((6, 6)))
>>
>>
>> ##
>>
>> In [10]: timeit test1(obs, np.zeros((6, 6)))
>> 100 loops, best of 3: 18.8 ms per loop
>>
>> In [11]: timeit test2(obs, np.zeros((6, 6)))
>> 100 loops, best of 3: 6.91 ms per loop
>
> Wait, you tested the list method with an array. Try
>
> timeit test1(obs, np.zeros((6, 6)).tolist())
>
> Probably best to

Re: [Numpy-discussion] Maturing the Matrix class in NumPy

2009-06-05 Thread Chris Colbert
well, it sounded like a good idea.

Oh, well.




On Fri, Jun 5, 2009 at 5:28 PM, Robert Kern  wrote:

> On Fri, Jun 5, 2009 at 16:24, Chris Colbert  wrote:
> > How about just introducing a slightly different syntax which tells numpy
> to
> > handle the array like a matrix:
> >
> > Some thing along the lines of this:
> >
> > A = array[[..]]
> > B = array[[..]]
> >
> > elementwise multipication (as it currently is):
> >
> > C = A * B
> >
> > matrix multiplication:
> >
> > C = {A} * {B}
> >
> > or
> >
> > C = [A] * [B]
> >
> > or any other brace we decide on
> >
> > Essentially, the brace tells numpy to handle the array objects like
> > matrices, but with no need for a specific matrix type.
>
> We don't control the Python language. We cannot make these kinds of
> changes to the syntax.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Chris Colbert
I'd caution anyone against using Atlas from the repos in Ubuntu 9.04, as the
package is broken:

https://bugs.launchpad.net/ubuntu/+source/atlas/+bug/363510


Just build Atlas yourself; you get better performance AND threading.
Building it is not the nightmare it sounds like. I think I've done it a
total of four times now, both 32-bit and 64-bit builds.

If you need help with it, just email me off-list.

Cheers,

Chris

On Fri, Jun 5, 2009 at 2:46 PM, Matthieu Brucher  wrote:

> 2009/6/5 David Cournapeau :
> > Eric Firing wrote:
> >>
> >> David,
> >>
> >> The eigen web site indicates that eigen achieves high performance
> >> without all the compilation difficulty of atlas.  Does eigen have enough
> >> functionality to replace atlas in numpy?
> >
> > No, eigen does not provide a (complete) BLAS/LAPACK interface. I don't
> > know if that's even a goal of eigen (it started as a project for KDE, to
> > support high performance core computations for things like spreadsheet
> > and co).
> >
> > But even then, it would be a huge undertaking. For all its flaws, LAPACK
> > is old, tested code, with a very stable language (F77). Eigen is:
> >- not mature.
> >- heavily expression-template-based C++, meaning compilation takes
> > ages + esoteric, impossible-to-decipher compilation errors. We have
> > enough build problems already :)
> >- SSE dependency hardcoded, since it is set up at build time. That's
> > going backward IMHO - I would rather see a numpy/scipy which can load
> > the optimized code at runtime.
>
> I would add that it relies on C++ compiler extensions (the restrict
> keyword) as does blitz. You unfortunately can't expect every compiler
> to support it unless the C++ committee finally adds it to the
> standard.
>
> Matthieu
> --
> Information System Engineer, Ph.D.
> Website: http://matthieu-brucher.developpez.com/
> Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
> LinkedIn: http://www.linkedin.com/in/matthieubrucher
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


[Numpy-discussion] IndexExpression bug?

2009-06-05 Thread Michael McNeil Forbes
 >>> np.array([0,1,2,3])[1:-1]
array([1, 2])

but

 >>> np.array([0,1,2,3])[np.s_[1:-1]]
array([1, 2, 3])
 >>> np.array([0,1,2,3])[np.index_exp[1:-1]]
array([1, 2, 3])

Possible fix:
class IndexExpression(object):
 ...
 def __len__(self):
 return 0

(Presently this returns sys.maxint)

Does this break anything (I can't find any coverage tests)?  If not, I  
will submit a ticket.

Michael.


Re: [Numpy-discussion] Maturing the Matrix class in NumPy

2009-06-05 Thread Robert Kern
On Fri, Jun 5, 2009 at 16:24, Chris Colbert  wrote:
> How about just introducing a slightly different syntax which tells numpy to
> handle the array like a matrix:
>
> Some thing along the lines of this:
>
> A = array[[..]]
> B = array[[..]]
>
> > elementwise multiplication (as it currently is):
>
> C = A * B
>
> matrix multiplication:
>
> C = {A} * {B}
>
> or
>
> C = [A] * [B]
>
> or any other brace we decide on
>
> Essentially, the brace tells numpy to handle the array objects like
> matrices, but with no need for a specific matrix type.

We don't control the Python language. We cannot make these kinds of
changes to the syntax.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Maturing the Matrix class in NumPy

2009-06-05 Thread Chris Colbert
How about just introducing a slightly different syntax which tells numpy to
handle the array like a matrix:

Some thing along the lines of this:

A = array[[..]]
B = array[[..]]

elementwise multiplication (as it currently is):

C = A * B

matrix multiplication:

C = {A} * {B}

or

C = [A] * [B]

or any other brace we decide on

Essentially, the brace tells numpy to handle the array objects like
matrices, but with no need for a specific matrix type.


Since textbook math typically has some kind of bracket around matrix variables,
I think this would jibe well.


just my .02

Chris

On Fri, Jun 5, 2009 at 5:14 PM, Alan G Isaac  wrote:

> On 6/5/2009 3:49 PM Stéfan van der Walt apparently wrote:
> > If the Matrix class is to remain, we need to take the steps
> > necessary to integrate it into NumPy properly.
>
> I think this requires a list of current problems.
> Many of the problems for NumPy have been addressed over time.
> I believe the remaining problems center more on SciPy than on NumPy.
> This requires that users report difficulties.
>
> For example, Jason Rennie says he ran into problems with
> scipy.optimize.fmin_cg, although I do not recall him reporting
> these (I do recall an optimization problem he reported using
> ndarrays).  Has he filed a bug report detailing his problem?
>
>
> > To get going we'll need a list of changes required (i.e. "in an ideal
> > world, how would matrices work?").
>
> The key anomaly concerning matrices comes with indexing.
> See the introduction here: http://www.scipy.org/MatrixIndexing
>
> However changing this for the current matrix object was rejected
> in the last (exhausting) go round.
>
>
> > There should be a set protocol for
> > all numpy functions that guarantees compatibility with ndarrays,
> > matrices and other derived classes.
>
> My impression was that this was resolved as follows:
> handle all ndarray based objects as arrays (using asarray)
> in any NumPy function, but return the subclass when possible.
> (E.g., using asmatrix, return a matrix output for a matrix input.)
> This seems fine to me.
>
>
> > Being one of the most vocal proponents of the Matrix class, would you
> > be prepared to develop your Matrix Proposal at
> > http://scipy.org/NewMatrixSpec further?
>
> I consider my proposal to have the following status: rejected.
> I consider the core reason to be: creates a backwards incompatibility.
> That was a very long and exhausting discussion that was productive
> in laying out the issues, but I do not think we can progress in that
> direction.
>
> The existing matrix object is very usable.
> Its primary problem is some indexing anomalies,
> http://www.scipy.org/MatrixIndexing
> and not everyone saw those as problems.
> In terms of NumPy functions, I think the asarray/asmatrix
> protocol fits the bill.  (Although perhaps I am overlooking
> something as a user that is obvious to a developer.)
>
> Cheers,
> Alan
>
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


Re: [Numpy-discussion] Maturing the Matrix class in NumPy

2009-06-05 Thread Alan G Isaac
On 6/5/2009 3:49 PM Stéfan van der Walt apparently wrote:
> If the Matrix class is to remain, we need to take the steps
> necessary to integrate it into NumPy properly.

I think this requires a list of current problems.
Many of the problems for NumPy have been addressed over time.
I believe the remaining problems center more on SciPy than on NumPy.
This requires that users report difficulties.

For example, Jason Rennie says he ran into problems with
scipy.optimize.fmin_cg, although I do not recall him reporting
these (I do recall an optimization problem he reported using
ndarrays).  Has he filed a bug report detailing his problem?


> To get going we'll need a list of changes required (i.e. "in an ideal
> world, how would matrices work?").

The key anomaly concerning matrices comes with indexing.
See the introduction here: http://www.scipy.org/MatrixIndexing

However changing this for the current matrix object was rejected
in the last (exhausting) go round.


> There should be a set protocol for
> all numpy functions that guarantees compatibility with ndarrays,
> matrices and other derived classes.

My impression was that this was resolved as follows:
handle all ndarray based objects as arrays (using asarray)
in any NumPy function, but return the subclass when possible.
(E.g., using asmatrix, return a matrix output for a matrix input.)
This seems fine to me.


> Being one of the most vocal proponents of the Matrix class, would you
> be prepared to develop your Matrix Proposal at
> http://scipy.org/NewMatrixSpec further?

I consider my proposal to have the following status: rejected.
I consider the core reason to be: creates a backwards incompatibility.
That was a very long and exhausting discussion that was productive
in laying out the issues, but I do not think we can progress in that
direction.

The existing matrix object is very usable.
Its primary problem is some indexing anomalies,
http://www.scipy.org/MatrixIndexing
and not everyone saw those as problems.
In terms of NumPy functions, I think the asarray/asmatrix
protocol fits the bill.  (Although perhaps I am overlooking
something as a user that is obvious to a developer.)

Cheers,
Alan
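[The asarray/asmatrix protocol Alan describes can be sketched as a wrapper pattern (illustrative only; the function name is mine, not a NumPy API):]

```python
import numpy as np

def double(x):
    """Compute on a plain ndarray, then restore the input's class."""
    a = np.asarray(x)  # matrices and other ndarray subclasses flatten to ndarray
    out = a * 2
    # Return a matrix for matrix input, a plain array otherwise.
    if isinstance(x, np.matrix):
        return np.asmatrix(out)
    return out

print(type(double(np.matrix("1 2; 3 4"))).__name__)  # matrix
print(type(double(np.array([1, 2]))).__name__)       # ndarray
```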



Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Keith Goodman
On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen  wrote:
> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote:
>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
>>> On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
 On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
> Hello,
> I have a vectorizing problem that I don't see an obvious way to solve.  
> What
> I have is a vector like:
> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
> and a matrix
> T=zeros((6,6))
> and what I want in T is a count of all of the transitions in obs, e.g.
> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
> for o1,o2 in zip(obs[:-1],obs[1:]):
>     T[o1,o2]+=1
>
> which gives the correct answer from above, which is:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
>
> but I thought there would be a better way.  I tried:
> o1=obs[:-1]
> o2=obs[1:]
> T[o1,o2]+=1
> but this doesn't give a count, it just yields 1's at the transition 
> points,
> like:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
> Is there a clever way to do this?  I could write a quick Cython solution,
> but I wanted to keep this as an all-numpy implementation if I can.
>

 histogram2d or its imitation, there was a discussion on histogram2d a
 short time ago

>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>> obs2 = obs - 1
>>> trans = 
>>> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
>>> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
 ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
 ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
 ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
 ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
 ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>> np.all(re == trans)
 True

>>> trans
 array([[0, 0, 0, 0, 0, 0],
       [0, 0, 3, 0, 0, 1],
       [0, 3, 0, 1, 0, 0],
       [0, 0, 2, 0, 1, 0],
       [0, 0, 0, 2, 0, 0],
       [0, 0, 0, 0, 1, 0]])


 or

>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>>> range=[[0,5],[0,5]])
>>> re
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.,  1.],
       [ 0.,  3.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>> h
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.,  1.],
       [ 0.,  3.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])

>>> np.all(re == h)
 True
>>>
>>> There's no way my list method can beat that. But by adding
>>>
>>> import psyco
>>> psyco.full()
>>>
>>> I get a total speed up of a factor of 15 when obs is length 1.
>>
>> Actually, it is faster:
>>
>> histogram:
>>
 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
 timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
>> 100 loops, best of 3: 4.14 ms per loop
>>
>> lists:
>>
 timeit test(obs3, T3)
>> 1000 loops, best of 3: 1.32 ms per loop
>>
>
>
> here's a go:
>
> import numpy as np
> import random
> from itertools import groupby
>
> def test1(obs, T):
>   for o1,o2 in zip(obs[:-1],obs[1:]):
>       T[o1][o2] += 1
>   return T
>
>
> def test2(obs, T):
>    s = zip(obs[:-1], obs[1:])
>    for idx, g in groupby(sorted(s)):
>        T[idx] = len(list(g))
>    return T
>
> obs = [random.randint(0, 5) for z in range(1)]
>
> print test2(obs, np.zeros((6, 6)))
> print test1(obs, np.zeros((6, 6)))
>
>
> ##
>
> In [10]: timeit test1(obs, np.zeros((6, 6)))
> 100 loops, best of 3: 18.8 ms per loop
>
> In [11]: timeit test2(obs, np.zeros((6, 6)))
> 100 loops, best of 3: 6.91 ms per loop

Wait, you tested the list method with an array. Try

timeit test1(obs, np.zeros((6, 6)).tolist())

Probably best to move the array/list creation out of the timeit loop.
Then my method won't have to pay the cost of converting to a list :)
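
The bincount trick quoted in this thread can be restated as a self-contained sketch: encode each transition (a, b) as the flat index a*6 + b, count occurrences with bincount, and reshape to 6x6. The `minlength` argument is a convenience from later NumPy releases (the original post padded the result with `hstack` instead):

```python
import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
flat = obs[:-1] * 6 + obs[1:]                    # pair (a, b) -> index a*6 + b
T = np.bincount(flat, minlength=36).reshape(6, 6)
print(T[1, 2])  # 3: the sequence 1-2 occurs three times
```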

Re: [Numpy-discussion] vectorizing

2009-06-05 Thread josef . pktd
On Fri, Jun 5, 2009 at 4:35 PM, Keith Goodman  wrote:
> On Fri, Jun 5, 2009 at 1:31 PM,   wrote:
>> On Fri, Jun 5, 2009 at 4:27 PM, Keith Goodman  wrote:
>>> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen  wrote:
 On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote:
> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
>> On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
>>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
 Hello,
 I have a vectorizing problem that I don't see an obvious way to solve. 
  What
 I have is a vector like:
 obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 and a matrix
 T=zeros((6,6))
 and what I want in T is a count of all of the transitions in obs, e.g.
 T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because 
 the
 sequence 3-4 only happens once, etc...  I can do it unvectorized like:
 for o1,o2 in zip(obs[:-1],obs[1:]):
     T[o1,o2]+=1

 which gives the correct answer from above, which is:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  3.,  0.,  0.,  1.],
        [ 0.,  3.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  2.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  2.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])


 but I thought there would be a better way.  I tried:
 o1=obs[:-1]
 o2=obs[1:]
 T[o1,o2]+=1
 but this doesn't give a count, it just yields 1's at the transition 
 points,
 like:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.,  1.],
        [ 0.,  1.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])

 Is there a clever way to do this?  I could write a quick Cython 
 solution,
 but I wanted to keep this as an all-numpy implementation if I can.

>>>
>>> histogram2d or its imitation, there was a discussion on histogram2d a
>>> short time ago
>>>
>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>> obs2 = obs - 1
>> trans = 
>> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
>> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
>>> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
>>> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
>>> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
>>> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
>> np.all(re == trans)
>>> True
>>>
>> trans
>>> array([[0, 0, 0, 0, 0, 0],
>>>       [0, 0, 3, 0, 0, 1],
>>>       [0, 3, 0, 1, 0, 0],
>>>       [0, 0, 2, 0, 1, 0],
>>>       [0, 0, 0, 2, 0, 0],
>>>       [0, 0, 0, 0, 1, 0]])
>>>
>>>
>>> or
>>>
>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>> range=[[0,5],[0,5]])
>> re
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>> h
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>> np.all(re == h)
>>> True
>>
>> There's no way my list method can beat that. But by adding
>>
>> import psyco
>> psyco.full()
>>
>> I get a total speed up of a factor of 15 when obs is length 1.
>
> Actually, it is faster:
>
> histogram:
>
>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>>> range=[[0,5],[0,5]])
>>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>>> range=[[0,5],[0,5]])
> 100 loops, best of 3: 4.14 ms per loop
>
> lists:
>
>>> timeit test(obs3, T3)
> 1000 loops, best of 3: 1.32 ms per loop
>


 here's a go:

 import numpy as np
 import random
 from itertools import groupby

 def test1(obs, T):
   for o1,o2 in zip(obs[:-1],obs[1:]):
       T[o1][o2] += 1
   return T


 def test2(obs, T):
    s = zip(obs[:-1], obs[1:])
    for idx, g in groupby(sorted(s)):
        T[idx] = len(list(g))
    return T

 obs = [random.randint(0, 5) for z in range(1)]

>>>

Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Keith Goodman
On Fri, Jun 5, 2009 at 1:31 PM,   wrote:
> On Fri, Jun 5, 2009 at 4:27 PM, Keith Goodman  wrote:
>> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen  wrote:
>>> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote:
 On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
> On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
>>> Hello,
>>> I have a vectorizing problem that I don't see an obvious way to solve.  
>>> What
>>> I have is a vector like:
>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>> and a matrix
>>> T=zeros((6,6))
>>> and what I want in T is a count of all of the transitions in obs, e.g.
>>> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
>>> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
>>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>>     T[o1,o2]+=1
>>>
>>> which gives the correct answer from above, which is:
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>>>
>>> but I thought there would be a better way.  I tried:
>>> o1=obs[:-1]
>>> o2=obs[1:]
>>> T[o1,o2]+=1
>>> but this doesn't give a count, it just yields 1's at the transition 
>>> points,
>>> like:
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>>> Is there a clever way to do this?  I could write a quick Cython 
>>> solution,
>>> but I wanted to keep this as an all-numpy implementation if I can.
>>>
>>
>> histogram2d or its imitation, there was a discussion on histogram2d a
>> short time ago
>>
> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
> obs2 = obs - 1
> trans = 
> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
>> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
>> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
>> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
>> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
> np.all(re == trans)
>> True
>>
> trans
>> array([[0, 0, 0, 0, 0, 0],
>>       [0, 0, 3, 0, 0, 1],
>>       [0, 3, 0, 1, 0, 0],
>>       [0, 0, 2, 0, 1, 0],
>>       [0, 0, 0, 2, 0, 0],
>>       [0, 0, 0, 0, 1, 0]])
>>
>>
>> or
>>
> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
> range=[[0,5],[0,5]])
> re
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
> h
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
> np.all(re == h)
>> True
>
> There's no way my list method can beat that. But by adding
>
> import psyco
> psyco.full()
>
> I get a total speed up of a factor of 15 when obs is length 1.

 Actually, it is faster:

 histogram:

>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>> range=[[0,5],[0,5]])
>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>> range=[[0,5],[0,5]])
 100 loops, best of 3: 4.14 ms per loop

 lists:

>> timeit test(obs3, T3)
 1000 loops, best of 3: 1.32 ms per loop

>>>
>>>
>>> here's a go:
>>>
>>> import numpy as np
>>> import random
>>> from itertools import groupby
>>>
>>> def test1(obs, T):
>>>   for o1,o2 in zip(obs[:-1],obs[1:]):
>>>       T[o1][o2] += 1
>>>   return T
>>>
>>>
>>> def test2(obs, T):
>>>    s = zip(obs[:-1], obs[1:])
>>>    for idx, g in groupby(sorted(s)):
>>>        T[idx] = len(list(g))
>>>    return T
>>>
>>> obs = [random.randint(0, 5) for z in range(1)]
>>>
>>> print test2(obs, np.zeros((6, 6)))
>>> print test1(obs, np.zeros((6, 6)))
>>>
>>>
>>> ##
>>>
>>> In [10]: timeit test1(obs, np.zeros((6, 6)))
>>> 100 loops, best of 3: 18.8 ms per loop
>>>
>>> In

Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Brent Pedersen
On Fri, Jun 5, 2009 at 1:27 PM, Keith Goodman wrote:
> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen  wrote:
>> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote:
>>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
 On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
>> Hello,
>> I have a vectorizing problem that I don't see an obvious way to solve.  
>> What
>> I have is a vector like:
>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>> and a matrix
>> T=zeros((6,6))
>> and what I want in T is a count of all of the transitions in obs, e.g.
>> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
>> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>     T[o1,o2]+=1
>>
>> which gives the correct answer from above, which is:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>>
>> but I thought there would be a better way.  I tried:
>> o1=obs[:-1]
>> o2=obs[1:]
>> T[o1,o2]+=1
>> but this doesn't give a count, it just yields 1's at the transition 
>> points,
>> like:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>> Is there a clever way to do this?  I could write a quick Cython solution,
>> but I wanted to keep this as an all-numpy implementation if I can.
>>
>
> histogram2d or its imitation, there was a discussion on histogram2d a
> short time ago
>
 obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 obs2 = obs - 1
 trans = 
 np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
 re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
 np.all(re == trans)
> True
>
 trans
> array([[0, 0, 0, 0, 0, 0],
>       [0, 0, 3, 0, 0, 1],
>       [0, 3, 0, 1, 0, 0],
>       [0, 0, 2, 0, 1, 0],
>       [0, 0, 0, 2, 0, 0],
>       [0, 0, 0, 0, 1, 0]])
>
>
> or
>
 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
 re
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
 h
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
 np.all(re == h)
> True

 There's no way my list method can beat that. But by adding

 import psyco
 psyco.full()

 I get a total speed up of a factor of 15 when obs is length 1.
>>>
>>> Actually, it is faster:
>>>
>>> histogram:
>>>
> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
> range=[[0,5],[0,5]])
>>> 100 loops, best of 3: 4.14 ms per loop
>>>
>>> lists:
>>>
> timeit test(obs3, T3)
>>> 1000 loops, best of 3: 1.32 ms per loop
>>>
>>
>>
>> here's a go:
>>
>> import numpy as np
>> import random
>> from itertools import groupby
>>
>> def test1(obs, T):
>>   for o1,o2 in zip(obs[:-1],obs[1:]):
>>       T[o1][o2] += 1
>>   return T
>>
>>
>> def test2(obs, T):
>>    s = zip(obs[:-1], obs[1:])
>>    for idx, g in groupby(sorted(s)):
>>        T[idx] = len(list(g))
>>    return T
>>
>> obs = [random.randint(0, 5) for z in range(1)]
>>
>> print test2(obs, np.zeros((6, 6)))
>> print test1(obs, np.zeros((6, 6)))
>>
>>
>> ##
>>
>> In [10]: timeit test1(obs, np.zeros((6, 6)))
>> 100 loops, best of 3: 18.8 ms per loop
>>
>> In [11]: timeit test2(obs, np.zeros((6, 6)))
>> 100 loops, best of 3: 6.91 ms per loop
>
> Nice!
>
> Try adding
>
> import psyco
> psyco.full()
>
> to test1. Or is that cheating?

it is if you're running 64bit.
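
The histogram2d variant discussed in this thread can also be restated as a self-contained sketch: treat each consecutive pair as an (x, y) point and bin it on a 6x6 grid covering states 0..5, so each integer state lands in its own bin:

```python
import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
h, xedges, yedges = np.histogram2d(obs[:-1], obs[1:], bins=6,
                                   range=[[0, 5], [0, 5]])
print(h[1, 2])  # 3.0: the sequence 1-2 occurs three times
```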

Re: [Numpy-discussion] vectorizing

2009-06-05 Thread josef . pktd
On Fri, Jun 5, 2009 at 4:27 PM, Keith Goodman  wrote:
> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen  wrote:
>> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote:
>>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
 On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
>> Hello,
>> I have a vectorizing problem that I don't see an obvious way to solve.  
>> What
>> I have is a vector like:
>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>> and a matrix
>> T=zeros((6,6))
>> and what I want in T is a count of all of the transitions in obs, e.g.
>> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
>> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>     T[o1,o2]+=1
>>
>> which gives the correct answer from above, which is:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>>
>> but I thought there would be a better way.  I tried:
>> o1=obs[:-1]
>> o2=obs[1:]
>> T[o1,o2]+=1
>> but this doesn't give a count, it just yields 1's at the transition 
>> points,
>> like:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>> Is there a clever way to do this?  I could write a quick Cython solution,
>> but I wanted to keep this as an all-numpy implementation if I can.
>>
>
> histogram2d or its imitation, there was a discussion on histogram2d a
> short time ago
>
 obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 obs2 = obs - 1
 trans = 
 np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
 re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
 np.all(re == trans)
> True
>
 trans
> array([[0, 0, 0, 0, 0, 0],
>       [0, 0, 3, 0, 0, 1],
>       [0, 3, 0, 1, 0, 0],
>       [0, 0, 2, 0, 1, 0],
>       [0, 0, 0, 2, 0, 0],
>       [0, 0, 0, 0, 1, 0]])
>
>
> or
>
 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
 re
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
 h
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
 np.all(re == h)
> True

 There's no way my list method can beat that. But by adding

 import psyco
 psyco.full()

 I get a total speed up of a factor of 15 when obs is length 1.
>>>
>>> Actually, it is faster:
>>>
>>> histogram:
>>>
> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
> range=[[0,5],[0,5]])
>>> 100 loops, best of 3: 4.14 ms per loop
>>>
>>> lists:
>>>
> timeit test(obs3, T3)
>>> 1000 loops, best of 3: 1.32 ms per loop
>>>
>>
>>
>> here's a go:
>>
>> import numpy as np
>> import random
>> from itertools import groupby
>>
>> def test1(obs, T):
>>   for o1,o2 in zip(obs[:-1],obs[1:]):
>>       T[o1][o2] += 1
>>   return T
>>
>>
>> def test2(obs, T):
>>    s = zip(obs[:-1], obs[1:])
>>    for idx, g in groupby(sorted(s)):
>>        T[idx] = len(list(g))
>>    return T
>>
>> obs = [random.randint(0, 5) for z in range(1)]
>>
>> print test2(obs, np.zeros((6, 6)))
>> print test1(obs, np.zeros((6, 6)))
>>
>>
>> ##
>>
>> In [10]: timeit test1(obs, np.zeros((6, 6)))
>> 100 loops, best of 3: 18.8 ms per loop
>>
>> In [11]: timeit test2(obs, np.zeros((6, 6)))
>> 100 loops, best of 3: 6.91 ms per loop
>
> Nice!
>
> Try adding
>
> import psyco
> psyco.full()
>
> to test1. Or is that cheating?
> __

Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Keith Goodman
On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen  wrote:
> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote:
>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
>>> On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
 On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
> Hello,
> I have a vectorizing problem that I don't see an obvious way to solve.  
> What
> I have is a vector like:
> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
> and a matrix
> T=zeros((6,6))
> and what I want in T is a count of all of the transitions in obs, e.g.
> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
> for o1,o2 in zip(obs[:-1],obs[1:]):
>     T[o1,o2]+=1
>
> which gives the correct answer from above, which is:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
>
> but I thought there would be a better way.  I tried:
> o1=obs[:-1]
> o2=obs[1:]
> T[o1,o2]+=1
> but this doesn't give a count, it just yields 1's at the transition 
> points,
> like:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
> Is there a clever way to do this?  I could write a quick Cython solution,
> but I wanted to keep this as an all-numpy implementation if I can.
>

 histogram2d or its imitation, there was a discussion on histogram2d a
 short time ago

>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>> obs2 = obs - 1
>>> trans = 
>>> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
>>> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
 ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
 ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
 ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
 ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
 ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>> np.all(re == trans)
 True

>>> trans
 array([[0, 0, 0, 0, 0, 0],
       [0, 0, 3, 0, 0, 1],
       [0, 3, 0, 1, 0, 0],
       [0, 0, 2, 0, 1, 0],
       [0, 0, 0, 2, 0, 0],
       [0, 0, 0, 0, 1, 0]])


 or

>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>>> range=[[0,5],[0,5]])
>>> re
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.,  1.],
       [ 0.,  3.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>> h
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.,  1.],
       [ 0.,  3.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])

>>> np.all(re == h)
 True
>>>
>>> There's no way my list method can beat that. But by adding
>>>
>>> import psyco
>>> psyco.full()
>>>
>>> I get a total speed up of a factor of 15 when obs is length 1.
>>
>> Actually, it is faster:
>>
>> histogram:
>>
 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
 timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
>> 100 loops, best of 3: 4.14 ms per loop
>>
>> lists:
>>
 timeit test(obs3, T3)
>> 1000 loops, best of 3: 1.32 ms per loop
>>
>
>
> here's a go:
>
> import numpy as np
> import random
> from itertools import groupby
>
> def test1(obs, T):
>   for o1,o2 in zip(obs[:-1],obs[1:]):
>       T[o1][o2] += 1
>   return T
>
>
> def test2(obs, T):
>    s = zip(obs[:-1], obs[1:])
>    for idx, g in groupby(sorted(s)):
>        T[idx] = len(list(g))
>    return T
>
> obs = [random.randint(0, 5) for z in range(1)]
>
> print test2(obs, np.zeros((6, 6)))
> print test1(obs, np.zeros((6, 6)))
>
>
> ##
>
> In [10]: timeit test1(obs, np.zeros((6, 6)))
> 100 loops, best of 3: 18.8 ms per loop
>
> In [11]: timeit test2(obs, np.zeros((6, 6)))
> 100 loops, best of 3: 6.91 ms per loop

Nice!

Try adding

import psyco
psyco.full()

to test1. Or is that cheating?
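
The reason the original fancy-indexed `T[o1,o2] += 1` undercounts is that duplicate index pairs are buffered: each repeated pair writes 1 once rather than accumulating. `np.add.at`, added to NumPy after this thread, performs the unbuffered, accumulating form of that assignment:

```python
import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
T = np.zeros((6, 6), dtype=int)
np.add.at(T, (obs[:-1], obs[1:]), 1)  # unbuffered in-place add at each pair
print(T[1, 2])  # 3, matching the explicit Python loop
```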


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Brent Pedersen
On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote:
> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
>> On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
>>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
 Hello,
 I have a vectorizing problem that I don't see an obvious way to solve.  
 What
 I have is a vector like:
 obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 and a matrix
 T=zeros((6,6))
 and what I want in T is a count of all of the transitions in obs, e.g.
 T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
 sequence 3-4 only happens once, etc...  I can do it unvectorized like:
 for o1,o2 in zip(obs[:-1],obs[1:]):
     T[o1,o2]+=1

 which gives the correct answer from above, which is:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  3.,  0.,  0.,  1.],
        [ 0.,  3.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  2.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  2.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])


 but I thought there would be a better way.  I tried:
 o1=obs[:-1]
 o2=obs[1:]
 T[o1,o2]+=1
 but this doesn't give a count, it just yields 1's at the transition points,
 like:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.,  1.],
        [ 0.,  1.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])

 Is there a clever way to do this?  I could write a quick Cython solution,
 but I wanted to keep this as an all-numpy implementation if I can.

>>>
>>> histogram2d or its imitation, there was a discussion on histogram2d a
>>> short time ago
>>>
>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>> obs2 = obs - 1
>> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
>> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
>>> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
>>> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
>>> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
>>> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
>> np.all(re == trans)
>>> True
>>>
>> trans
>>> array([[0, 0, 0, 0, 0, 0],
>>>       [0, 0, 3, 0, 0, 1],
>>>       [0, 3, 0, 1, 0, 0],
>>>       [0, 0, 2, 0, 1, 0],
>>>       [0, 0, 0, 2, 0, 0],
>>>       [0, 0, 0, 0, 1, 0]])
>>>
>>>
>>> or
>>>
>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>> range=[[0,5],[0,5]])
>> re
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>> h
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>> np.all(re == h)
>>> True
>>
>> There's no way my list method can beat that. But by adding
>>
>> import psyco
>> psyco.full()
>>
>> I get a total speed up of a factor of 15 when obs is length 1.
>
> Actually, it is faster:
>
> histogram:
>
>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>>> range=[[0,5],[0,5]])
> 100 loops, best of 3: 4.14 ms per loop
>
> lists:
>
>>> timeit test(obs3, T3)
> 1000 loops, best of 3: 1.32 ms per loop
>


here's a go:

import numpy as np
import random
from itertools import groupby

def test1(obs, T):
    for o1,o2 in zip(obs[:-1],obs[1:]):
        T[o1][o2] += 1
    return T


def test2(obs, T):
    s = zip(obs[:-1], obs[1:])
    for idx, g in groupby(sorted(s)):
        T[idx] = len(list(g))
    return T

obs = [random.randint(0, 5) for z in range(1)]

print test2(obs, np.zeros((6, 6)))
print test1(obs, np.zeros((6, 6)))


##

In [10]: timeit test1(obs, np.zeros((6, 6)))
100 loops, best of 3: 18.8 ms per loop

In [11]: timeit test2(obs, np.zeros((6, 6)))
100 loops, best of 3: 6.91 ms per loop
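
For reference, here is a self-contained version of the groupby approach above. The sequence length was truncated in the archived code; 10000 below is purely an illustrative choice:

```python
import random
from itertools import groupby

import numpy as np

def count_transitions(obs, n_states=6):
    T = np.zeros((n_states, n_states))
    pairs = zip(obs[:-1], obs[1:])           # consecutive (from, to) pairs
    for idx, grp in groupby(sorted(pairs)):  # group equal pairs together
        T[idx] = len(list(grp))              # count of this transition
    return T

obs = [random.randint(0, 5) for _ in range(10000)]
T = count_transitions(obs)
print(int(T.sum()))  # 9999: one transition per consecutive pair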


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Keith Goodman
On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman  wrote:
> On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
>>> Hello,
>>> I have a vectorizing problem that I don't see an obvious way to solve.  What
>>> I have is a vector like:
>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>> and a matrix
>>> T=zeros((6,6))
>>> and what I want in T is a count of all of the transitions in obs, e.g.
>>> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
>>> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
>>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>>     T[o1,o2]+=1
>>>
>>> which gives the correct answer from above, which is:
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>>>
>>> but I thought there would be a better way.  I tried:
>>> o1=obs[:-1]
>>> o2=obs[1:]
>>> T[o1,o2]+=1
>>> but this doesn't give a count, it just yields 1's at the transition points,
>>> like:
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>>> Is there a clever way to do this?  I could write a quick Cython solution,
>>> but I wanted to keep this as an all-numpy implementation if I can.
>>>
>>
>> histogram2d or its imitation, there was a discussion on histogram2d a
>> short time ago
>>
> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
> obs2 = obs - 1
> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
>> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
>> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
>> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
>> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
> np.all(re == trans)
>> True
>>
> trans
>> array([[0, 0, 0, 0, 0, 0],
>>       [0, 0, 3, 0, 0, 1],
>>       [0, 3, 0, 1, 0, 0],
>>       [0, 0, 2, 0, 1, 0],
>>       [0, 0, 0, 2, 0, 0],
>>       [0, 0, 0, 0, 1, 0]])
>>
>>
>> or
>>
> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
> re
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
> h
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
> np.all(re == h)
>> True
>
> There's no way my list method can beat that. But by adding
>
> import psyco
> psyco.full()
>
> I get a total speed up of a factor of 15 when obs is length 1.

Actually, it is faster:

histogram:

>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
>> range=[[0,5],[0,5]])
100 loops, best of 3: 4.14 ms per loop

lists:

>> timeit test(obs3, T3)
1000 loops, best of 3: 1.32 ms per loop


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread josef . pktd
On Fri, Jun 5, 2009 at 3:53 PM,   wrote:
> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
>> Hello,
>> I have a vectorizing problem that I don't see an obvious way to solve.  What
>> I have is a vector like:
>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>> and a matrix
>> T=zeros((6,6))
>> and what I want in T is a count of all of the transitions in obs, e.g.
>> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
>> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>     T[o1,o2]+=1
>>
>> which gives the correct answer from above, which is:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>>
>> but I thought there would be a better way.  I tried:
>> o1=obs[:-1]
>> o2=obs[1:]
>> T[o1,o2]+=1
>> but this doesn't give a count, it just yields 1's at the transition points,
>> like:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>> Is there a clever way to do this?  I could write a quick Cython solution,
>> but I wanted to keep this as an all-numpy implementation if I can.
>>
>
> histogram2d or its imitation, there was a discussion on histogram2d a
> short time ago
>
 obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 obs2 = obs - 1
 trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)

I don't think this is correct, there are problems for the first and
last position if (1,1) and (5,5) are possible. I don't remember how I
did it before. The histogram2d version still looks correct for this
case. histogram2d uses bincount under the hood in a similar way.

Josef
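[Editorial aside, not code from the thread: a corner-case-safe variant of the bincount idea encodes each transition pair as a flat index and pads the counts out to all 36 bins, so that (0,0) and (5,5) transitions are handled correctly without the hstack trick.]

```python
import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])

# Encode each transition (i, j) as the flat index i*6 + j, then count.
idx = obs[:-1] * 6 + obs[1:]
counts = np.bincount(idx)

# bincount stops at the largest index it sees, so pad to the full
# 36 bins before reshaping; this handles (0,0) and (5,5) as well.
T = np.zeros(36, dtype=int)
T[:len(counts)] = counts
T = T.reshape(6, 6)
```

On the example sequence this reproduces the matrix computed by the explicit loop and by histogram2d.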


 re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
 np.all(re == trans)
> True
>
 trans
> array([[0, 0, 0, 0, 0, 0],
>       [0, 0, 3, 0, 0, 1],
>       [0, 3, 0, 1, 0, 0],
>       [0, 0, 2, 0, 1, 0],
>       [0, 0, 0, 2, 0, 0],
>       [0, 0, 0, 0, 1, 0]])
>
>
> or
>
 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
 re
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
 h
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
 np.all(re == h)
> True
>


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Keith Goodman
On Fri, Jun 5, 2009 at 12:53 PM,   wrote:
> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
>> Hello,
>> I have a vectorizing problem that I don't see an obvious way to solve.  What
>> I have is a vector like:
>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>> and a matrix
>> T=zeros((6,6))
>> and what I want in T is a count of all of the transitions in obs, e.g.
>> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
>> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>     T[o1,o2]+=1
>>
>> which gives the correct answer from above, which is:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>>
>> but I thought there would be a better way.  I tried:
>> o1=obs[:-1]
>> o2=obs[1:]
>> T[o1,o2]+=1
>> but this doesn't give a count, it just yields 1's at the transition points,
>> like:
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>> Is there a clever way to do this?  I could write a quick Cython solution,
>> but I wanted to keep this as an all-numpy implementation if I can.
>>
>
> histogram2d or its imitation, there was a discussion on histogram2d a
> short time ago
>
 obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 obs2 = obs - 1
 trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
 re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
 np.all(re == trans)
> True
>
 trans
> array([[0, 0, 0, 0, 0, 0],
>       [0, 0, 3, 0, 0, 1],
>       [0, 3, 0, 1, 0, 0],
>       [0, 0, 2, 0, 1, 0],
>       [0, 0, 0, 2, 0, 0],
>       [0, 0, 0, 0, 1, 0]])
>
>
> or
>
 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
 re
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
 h
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
 np.all(re == h)
> True

There's no way my list method can beat that. But by adding

import psyco
psyco.full()

I get a total speed up of a factor of 15 when obs is length 1.


[Numpy-discussion] Maturing the Matrix class in NumPy

2009-06-05 Thread Stéfan van der Walt
Hi Alan

2009/6/5 Alan G Isaac :
> You should take it very seriously when someone
> reports to you that the matrix object is a crucial to them,
> e.g., as a teaching tool.  Even if you do not find
> personally persuasive an example like
> http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043001.html

You make a good point and the example you gave earlier shows how the
Matrix object can be used to write expressions that correspond clearly
to those in a textbook.

Christopher is right that most of the developers don't use the Matrix
class much, but that is no excuse to neglect it as we are currently
doing.  If the Matrix class is to remain, we need to take the steps
necessary to integrate it into NumPy properly.

To get going we'll need a list of changes required (i.e. "in an ideal
world, how would matrices work?").  There should be a set protocol for
all numpy functions that guarantees compatibility with ndarrays,
matrices and other derived classes.  This also ties in with a
suggestion by Darren Dale on ufuncs that hasn't been getting the
attention it deserves.  Indexing issues need to be cleared up, with a
chapter on matrix use written for the guide.

Being one of the most vocal proponents of the Matrix class, would you
be prepared to develop your Matrix Proposal at
http://scipy.org/NewMatrixSpec further?  We need to get to a point
where we can assign tasks, and start writing code.  The keyphrase here
should be "rough consensus" to avoid the bike shedding we've run into
in the past.

Thanks for following this through,

Regards
Stéfan


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread josef . pktd
On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais  wrote:
> Hello,
> I have a vectorizing problem that I don't see an obvious way to solve.  What
> I have is a vector like:
> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
> and a matrix
> T=zeros((6,6))
> and what I want in T is a count of all of the transitions in obs, e.g.
> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
> for o1,o2 in zip(obs[:-1],obs[1:]):
>     T[o1,o2]+=1
>
> which gives the correct answer from above, which is:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
>
> but I thought there would be a better way.  I tried:
> o1=obs[:-1]
> o2=obs[1:]
> T[o1,o2]+=1
> but this doesn't give a count, it just yields 1's at the transition points,
> like:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
> Is there a clever way to do this?  I could write a quick Cython solution,
> but I wanted to keep this as an all-numpy implementation if I can.
>

histogram2d or its imitation, there was a discussion on histogram2d a
short time ago

>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>> obs2 = obs - 1
>>> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
>>> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
... [ 0.,  0.,  3.,  0.,  0.,  1.],
... [ 0.,  3.,  0.,  1.,  0.,  0.],
... [ 0.,  0.,  2.,  0.,  1.,  0.],
... [ 0.,  0.,  0.,  2.,  0.,  0.],
... [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>> np.all(re == trans)
True

>>> trans
array([[0, 0, 0, 0, 0, 0],
   [0, 0, 3, 0, 0, 1],
   [0, 3, 0, 1, 0, 0],
   [0, 0, 2, 0, 1, 0],
   [0, 0, 0, 2, 0, 0],
   [0, 0, 0, 0, 1, 0]])


or

>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>>> re
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  3.,  0.,  0.,  1.],
   [ 0.,  3.,  0.,  1.,  0.,  0.],
   [ 0.,  0.,  2.,  0.,  1.,  0.],
   [ 0.,  0.,  0.,  2.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>> h
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  3.,  0.,  0.,  1.],
   [ 0.,  3.,  0.,  1.,  0.,  0.],
   [ 0.,  0.,  2.,  0.,  1.,  0.],
   [ 0.,  0.,  0.,  2.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  0.]])

>>> np.all(re == h)
True


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Keith Goodman
On Fri, Jun 5, 2009 at 11:07 AM, Brian Blais  wrote:
> Hello,
> I have a vectorizing problem that I don't see an obvious way to solve.  What
> I have is a vector like:
> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
> and a matrix
> T=zeros((6,6))
> and what I want in T is a count of all of the transitions in obs, e.g.
> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
> for o1,o2 in zip(obs[:-1],obs[1:]):
>     T[o1,o2]+=1
>
> which gives the correct answer from above, which is:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
>
> but I thought there would be a better way.  I tried:
> o1=obs[:-1]
> o2=obs[1:]
> T[o1,o2]+=1
> but this doesn't give a count, it just yields 1's at the transition points,
> like:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
> Is there a clever way to do this?  I could write a quick Cython solution,
> but I wanted to keep this as an all-numpy implementation if I can.

It's a little faster (8.5% for me when obs is length 1) if you do

T = np.zeros((6,6), dtype=np.int)

But it's more than 5 times faster if you use lists for T and obs. You're
just storing information here, so there is no reason to pay for the
overhead of arrays.

import random
import numpy as np

T = [[0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0],
     [0,0,0,0,0,0], [0,0,0,0,0,0]]
obs = [random.randint(0, 5) for z in range(1)]

def test(obs, T):
    for o1,o2 in zip(obs[:-1],obs[1:]):
        T[o1][o2] += 1
    return T
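[Editorial aside, not part of the original exchange: later NumPy releases (1.8 and up) added np.add.at, which performs exactly the unbuffered accumulation that the fancy-indexing attempt T[o1,o2] += 1 lacks.]

```python
import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
T = np.zeros((6, 6))

# add.at applies the increment once per occurrence of each index pair,
# so repeated transitions are counted rather than collapsing to 1.
np.add.at(T, (obs[:-1], obs[1:]), 1)
```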


[Numpy-discussion] vectorizing

2009-06-05 Thread Brian Blais

Hello,

I have a vectorizing problem that I don't see an obvious way to  
solve.  What I have is a vector like:


obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])

and a matrix

T=zeros((6,6))

and what I want in T is a count of all of the transitions in obs,  
e.g. T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1  
because the sequence 3-4 only happens once, etc...  I can do it  
unvectorized like:


for o1,o2 in zip(obs[:-1],obs[1:]):
T[o1,o2]+=1


which gives the correct answer from above, which is:

array([[ 0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  3.,  0.,  0.,  1.],
   [ 0.,  3.,  0.,  1.,  0.,  0.],
   [ 0.,  0.,  2.,  0.,  1.,  0.],
   [ 0.,  0.,  0.,  2.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  0.]])



but I thought there would be a better way.  I tried:

o1=obs[:-1]
o2=obs[1:]
T[o1,o2]+=1

but this doesn't give a count, it just yields 1's at the transition  
points, like:


array([[ 0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  1.,  0.,  0.,  1.],
   [ 0.,  1.,  0.,  1.,  0.,  0.],
   [ 0.,  0.,  1.,  0.,  1.,  0.],
   [ 0.,  0.,  0.,  1.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  0.]])


Is there a clever way to do this?  I could write a quick Cython  
solution, but I wanted to keep this as an all-numpy implementation if  
I can.



thanks,


Brian Blais



--
Brian Blais
bbl...@bryant.edu
http://web.bryant.edu/~bblais





Re: [Numpy-discussion] matrix default to column vector?

2009-06-05 Thread Jason Rennie
As someone who is very used to thinking in terms of matrices and who just
went through the adjustment of translating matlab-like code to numpy, I
found the current matrix module to be confusing.  Its poor integration with
the rest of numpy/scipy (in particular, scipy.optimize.fmin_cg) made it more
difficult to use than it was worth.  I'd rather have "matrix" and/or "matrix
multiplication" sections of the documentation explain how to do typical,
basic matrix operations with nparray, dot, T, and arr[None,:].  I think a
matrix class would still be worthwhile for findability, but it should simply
serve as documentation for how to do matrix stuff with nparray.

Cheers,

Jason

-- 
Jason Rennie
Research Scientist, ITA Software
617-714-2645
http://www.itasoftware.com/


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Matthieu Brucher
2009/6/5 David Cournapeau :
> Eric Firing wrote:
>>
>> David,
>>
>> The eigen web site indicates that eigen achieves high performance
>> without all the compilation difficulty of atlas.  Does eigen have enough
>> functionality to replace atlas in numpy?
>
> No, eigen does not provide a (complete) BLAS/LAPACK interface. I don't
> know if that's even a goal of eigen (it started as a project for KDE, to
> support high performance core computations for things like spreadsheet
> and co).
>
> But even then, it would be a huge undertaking. For all its flaws, LAPACK
> is old, tested code, with a very stable language (F77). Eigen is:
>    - not mature.
>    - heavily expression-template-based C++, meaning compilation takes
> ages + esoteric, impossible to decipher compilation errors. We have
> enough build problems already :)
>    - SSE dependency hardcoded, since it is set up at build time. That's
> going backward IMHO - I would rather see a numpy/scipy which can load
> the optimized code at runtime.

I would add that it relies on C++ compiler extensions (the restrict
keyword) as does blitz. You unfortunately can't expect every compiler
to support it unless the C++ committee finally adds it to the
standard.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Cournapeau
Eric Firing wrote:
>
> David,
>
> The eigen web site indicates that eigen achieves high performance 
> without all the compilation difficulty of atlas.  Does eigen have enough 
> functionality to replace atlas in numpy?

No, eigen does not provide a (complete) BLAS/LAPACK interface. I don't
know if that's even a goal of eigen (it started as a project for KDE, to
support high performance core computations for things like spreadsheet
and co).

But even then, it would be a huge undertaking. For all its flaws, LAPACK
is old, tested code, with a very stable language (F77). Eigen is:
- not mature.
- heavily expression-template-based C++, meaning compilation takes
ages + esoteric, impossible to decipher compilation errors. We have
enough build problems already :)
- SSE dependency hardcoded, since it is set up at build time. That's
going backward IMHO - I would rather see a numpy/scipy which can load
the optimized code at runtime.

cheers,

David


Re: [Numpy-discussion] matrix default to column vector?

2009-06-05 Thread josef . pktd
On Fri, Jun 5, 2009 at 1:33 PM, Geoffrey Ely  wrote:
> On Jun 5, 2009, at 10:18 AM, josef.p...@gmail.com wrote:
>> I'm only using arrays for consistency, but my econometrics code is
>> filled with arr[np.newaxis,:] .
>
>
> arr[None,:] is a lot cleaner in my opinion, especially when using
> "numpy" as the namespace.

It took me a long time to figure out that None and np.newaxis do the
same thing. I find newaxis more descriptive and mostly stick to it.
When I started looking at numpy, I was always wondering what these
"None"s were doing in the code.

Just one more useful trick to preserve dimensions: using slices instead
of scalar indices avoids arr[None,:], but requires more thinking.

Josef

>>> X
matrix([[1, 0],
        [1, 1]])
>>> Y
array([[1, 0],
       [1, 1]])


>>> X[0,:].shape == Y[0:1,:].shape
True
>>> X[0,:] == Y[0:1,:]
matrix([[ True,  True]], dtype=bool)

> -Geoff


Re: [Numpy-discussion] matrix default to column vector?

2009-06-05 Thread Geoffrey Ely
On Jun 5, 2009, at 10:18 AM, josef.p...@gmail.com wrote:
> I'm only using arrays for consistency, but my econometrics code is
> filled with arr[np.newaxis,:] .


arr[None,:] is a lot cleaner in my opinion, especially when using  
"numpy" as the namespace.

-Geoff


Re: [Numpy-discussion] matrix default to column vector?

2009-06-05 Thread Alan G Isaac
On 6/5/2009 11:38 AM Olivier Verdier apparently wrote:
> I think matrices can be pretty tricky when used for 
> teaching.  For instance, you have to explain that all the 
> operators work component-wise, except the multiplication! 
> Another caveat is that since matrices are always 
> 2-dimensional, the "scalar product" of two column vectors 
> computed as " x.T * y" will not be a scalar, but a 1x1 
> matrix. There is also the fact that you must cast all your vectors to column/row 
> matrices (as in matlab). For all these reasons, I prefer 
> to use arrays and dot for teaching, and I have never had 
> any complaints. 


I do not understand this "argument".
You should take it very seriously when someone
reports to you that the matrix object is crucial to them,
e.g., as a teaching tool.  Even if you do not find
personally persuasive an example like
http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043001.html
I have told you: this is important for my students.
Reporting that your students do not complain about using
arrays instead of matrices does not change this one bit.

Student backgrounds differ by domain of application.  In
economics, matrices are in *very* wide use, and
multidimensional arrays get almost no use.  Textbooks in
econometrics (a huge and important field, even outside of
economics) are full of proofs using matrix algebra.
A close match to what the students see is crucial.
When working with multiplication or exponentiation,
matrices do what they expect, and 2d arrays do not.

One more point. As Python users we get used to installing
a package here and a package there to add functionality.
But this is not how most people looking for a matrix
language see the world.  Removing the matrix object from
NumPy will raise the barrier to adoption by social
scientists, and there should be a strongly persuasive reason
before taking such a step.

Separately from all that, does anyone doubt that there is
code that depends on the matrix object?  The core objection
to a past proposal for useful change was that it could break
extant code.  I would hope that nobody who took that
position would subsequently propose removing the matrix
object altogether.

Cheers,
Alan Isaac

PS If x and y are "column vectors" (i.e., matrices), then
x.T * y *should* be a 1×1 matrix.
Since the * operator is doing matrix multiplication,
this is the correct result, not an anomaly.
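[Editorial sketch, not from the thread, illustrating the point:]

```python
import numpy as np

x = np.matrix([[1.0], [2.0]])
y = np.matrix([[3.0], [4.0]])

# With matrices, * is matrix multiplication: (1x2) * (2x1) gives a
# 1x1 matrix, exactly as matrix algebra says it should.
inner = x.T * y
assert inner.shape == (1, 1)

# With 1-d arrays, dot returns a true scalar instead.
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
s = np.dot(a, b)  # a plain float, 11.0
```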




Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Eric Firing
David Cournapeau wrote:

> 
> It really depends on the CPU, compiler, how atlas was compiled, etc...
> it can be slightly faster to 10 times faster (if you use a very poorly
> optimized ATLAS).
> 
> For some recent benchmarks:
> 
> http://eigen.tuxfamily.org/index.php?title=Benchmark
> 

David,

The eigen web site indicates that eigen achieves high performance 
without all the compilation difficulty of atlas.  Does eigen have enough 
functionality to replace atlas in numpy?  Presumably it would need C 
compatibility wrappers to emulate the blas functions.  Would that kill 
its performance?  Or be very difficult?

(I'm asking from curiosity combined with complete ignorance.  Until 
yesterday I had never even heard of eigen.)

Eric

> cheers,
> 
> David


Re: [Numpy-discussion] matrix default to column vector?

2009-06-05 Thread josef . pktd
On Fri, Jun 5, 2009 at 11:38 AM, Olivier Verdier  wrote:
> I agree. It would be a good idea to have matrices out of numpy as a
> standalone package.
> Indeed, having matrices in the numpy core comes at a pedagogical cost.
> Newcomers (as I once was) do not know which to use. Matrix or array? It
> turns out that the vast majority of numpy/scipy modules use arrays, so
> arrays is the preferred way to go.
> It would thus be clearer to have arrays in numpy and matrices available as
> an external package.
> Besides, I think matrices can be pretty tricky when used for teaching. For
> instance, you have to explain that all the operators work component-wise,
> except the multiplication! Another caveat is that since matrices are always
> 2-dimensional, the "scalar product" of two column vectors computed as " x.T * y" will
> not be a scalar, but a 1x1 matrix. There is also the fact that you must cast
> all your vectors to column/row matrices (as in matlab). For all these
> reasons, I prefer to use arrays and dot for teaching, and I have never had
> any complaints.

For anyone with an econometrics background in gauss or matlab, numpy
arrays have a huge adjustment cost.
I'm only using arrays for consistency, but my econometrics code is
filled with arr[np.newaxis,:] .

I wouldn't recommend my friends switching from gauss to numpy/scipy
when they want to write or develop some econometrics algorithm. gauss
feels like copying the formula from the notes or a book. With numpy
arrays I always have to recover one dimension, and I have to verify
the shape of any array more often.
In econometrics the default vector is often a 2d vector so x.T*x and
x*x.T have very different meanings.
python/numpy/scipy have a lot of other advantages compared to gauss or
matlab, but it's not the linear algebra syntax.

So I don't see any sense in removing matrices; you can just ignore
them and tell your students to do so. Removing them increases the
initial switching cost, even if users who get deeper into numpy
eventually switch to arrays.

Josef

(BTW: I think scipy.stats will soon become matrix friendly)

>>> X = np.matrix([[1,0],[1,1]])
>>> X.mean()
0.75
>>> X.mean(0)
matrix([[ 1. ,  0.5]])
>>> X-X.mean(0)
matrix([[ 0. , -0.5],
        [ 0. ,  0.5]])


>>> Y = np.array([[1,0],[1,1]])
>>> Y.mean(0).shape
(2,)
>>> Y - Y.mean(0)[np.newaxis,:]
array([[ 0. , -0.5],
       [ 0. ,  0.5]])

>>> np.matrix([[1,0],[1,1]])**2
matrix([[1, 0],
        [2, 1]])
>>> np.array([[1,0],[1,1]])**2
array([[1, 0],
       [1, 1]])

>>> np.matrix([[1,0],[1,1]])**(-1)
matrix([[ 1.,  0.],
        [-1.,  1.]])
>>> np.array([[1,0],[1,1]])**(-1)
array([[  1, -2147483648],
       [  1,   1]])


>>> x = np.matrix([[1],[0]])
>>> x.T*x
matrix([[1]])
>>> (x.T*x).shape
(1, 1)
>>> (x*x.T)
matrix([[1, 0],
        [0, 0]])
>>> (x*x.T).shape
(2, 2)


>>> y = np.array([1,0])
>>> np.dot(y,y)
1
>>> np.dot(y,y.T)
1
>>> np.dot(y[:,np.newaxis], y[np.newaxis,:])
array([[1, 0],
       [0, 0]])
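[Editorial sketch, not from the thread: the array-side equivalents of the matrix power and inverse examples above have to be written explicitly, since ** is elementwise on arrays.]

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0]])

# Array counterpart of np.matrix(...) ** 2: an explicit matrix product.
A2 = np.dot(A, A)

# Array counterpart of np.matrix(...) ** (-1): an explicit inverse.
# A ** -1 on an integer array does NOT do this, as shown above.
Ainv = np.linalg.inv(A)
```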


Re: [Numpy-discussion] Import fails of scipy.factorial wheninstalled as a zipped egg

2009-06-05 Thread Fadhley Salim

> I don't think numpy/scipy are zip safe; the numpy package loader uses
> os.path to find files.

Evidently! :-)

But the strange thing is that all this worked fine with Scipy 0.6.0 -
it's only since 0.7.0 was released that this started going wrong. 

Sal


Re: [Numpy-discussion] matrix default to column vector?

2009-06-05 Thread Olivier Verdier
I agree. It would be a good idea to have matrices out of numpy as a
standalone package.
Indeed, having matrices in the numpy core comes at a pedagogical cost.
Newcomers (as I once was) do not know which to use. Matrix or array? It
turns out that the vast majority of numpy/scipy modules use arrays, so
arrays is the preferred way to go.

It would thus be clearer to have arrays in numpy and matrices available as
an external package.

Besides, I think matrices can be pretty tricky when used for teaching. For
instance, you have to explain that all the operators work component-wise,
except the multiplication! Another caveat is that since matrices are always
2-dimensional, the "scalar product" of two column vectors computed as " x.T * y" will
not be a scalar, but a 1x1 matrix. There is also the fact that you must cast
all your vectors to column/row matrices (as in matlab). For all these
reasons, I prefer to use arrays and dot for teaching, and I have never had
any complaints.
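The array-vs-matrix difference described above can be seen in a few lines (a minimal sketch using numpy; the values are arbitrary):

```python
import numpy as np

# With arrays, dot gives a true scalar.
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
s = np.dot(x, y)
print(s)                 # 11.0, a plain scalar

# With matrices, vectors must be cast to 2-D column matrices,
# and x.T * y comes back as a 1x1 matrix, not a scalar.
xm = np.matrix(x).T
ym = np.matrix(y).T
p = xm.T * ym
print(p, p.shape)        # [[ 11.]] (1, 1)
```

Note that all other operators (`+`, `-`, `/`, comparisons) remain component-wise on both types; only `*` differs.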

== Olivier

2009/6/4 Tommy Grav 

>
> On Jun 4, 2009, at 5:41 PM, Alan G Isaac wrote:
> > On 6/4/2009 5:27 PM Tommy Grav apparently wrote:
> >> Or the core development team split the matrices out of numpy and
> >> make it
> >> as separate package that the people that use them could pick up and
> >> run with.
> >
> >
> > This too would be a mistake, I believe.
> > But it depends on whether a goal is to
> > have more people use NumPy.  I believe
> > the community will gain from growth.
> >
> > In sum, my argument is this:
> > Keeping a matrix object in NumPy has substantial
> > benefits in encouraging growth of the NumPy
> > community, and as far as I can tell, it is
> > imposing few costs.  Therefore I think there is
> > a very substantial burden on people who propose
> > removing the matrix object to demonstrate
> > just how the NumPy community will benefit from
> > this change.
>
> This is a perfectly valid argument. I am actually quite happy with the
> numpy package as it is (I work in astronomy), I was just pointing out
> that if there are few of the core numpy people interested in maintaining
> or upgrading the matrix class one solution might be to make it a
> scipy-like package that easily can be installed on top of numpy, but
> where the code base might be more accessible to those that are
> interested in matrices, but feel that numpy is a daunting beast to
> tackle.
> Some sense of ownership of a matrixpy package might encourage more
> people to contribute.
>
> Just an idea ;-)
> Tommy
>


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Paul Reichert
Hi,

Thanks for the suggestion.

Unfortunately I'm using university managed machines here, so
I have no control over the distribution, not even root access.

However, I just downloaded the latest Enthought distribution,
which uses numpy 1.3, and now numpy is only 30% to 60% slower
than matlab, instead of 5 times slower. I can live with that.
(whether it uses atlas now or not, I don't know).
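Whether a given numpy build is linked against ATLAS (or MKL) can be checked from Python, along with a crude timing comparison (a rough sketch; the matrix size and loop count are arbitrary):

```python
import time
import numpy as np

# Print which BLAS/LAPACK libraries numpy was built against;
# look for "atlas" or "mkl" entries in the output.
np.show_config()

# Crude matrix-multiplication timing, comparable across installs.
n = 500
a = np.random.rand(n, n)
b = np.random.rand(n, n)
t0 = time.time()
for _ in range(10):
    c = np.dot(a, b)
print("10 multiplications of %dx%d took %.3f s" % (n, n, time.time() - t0))
```

Running the same-sized multiplication in Matlab (`tic; for i=1:10, c = a*b; end; toc`) gives a like-for-like comparison.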

Cheers

David


Quoting Jason Rennie :

> Hi David,
>
> Let me suggest that you try the latest version of Ubuntu (9.04/Jaunty),
> which was released two months ago.  It sounds like you are effectively using
> release 5 of RedHat Linux which was originally released May 2007.  There
> have been updates (5.1, 5.2, 5.3), but, if my memory serves me correctly,
> RedHat updates are more focused on fixing bugs and security issues rather
> than improving functionality.  Ubuntu does a full, new release every 6
> months so you don't have to wait as long to see improvements.  Ubuntu also
> has a tremendously better package management system.  You generally
> shouldn't be installing packages by hand as it sounds like you are doing.
>
> This post suggests that the latest version of Ubuntu is up-to-date wrt
> ATLAS:
>
> http://www.mail-archive.com/numpy-discussion@scipy.org/msg13102.html
>
> Jason
>
> On Fri, Jun 5, 2009 at 5:44 AM, David Paul Reichert <
> d.p.reich...@sms.ed.ac.uk> wrote:
>
>> Thanks for the replies so far.
>>
>> I had already tested using an already transposed matrix in the loop,
>> it didn't make any difference. Oh and btw, I'm on (Scientific) Linux.
>>
>> I used the Enthought distribution, but I guess I'll have to get
>> my hands dirty and try to get that Atlas thing working (I'm not
>> a Linux expert though). My simulations pretty much consist of
>> matrix multiplications, so if I don't get rid of that factor 5,
>> I pretty much have to get back to Matlab.
>>
>> When you said Atlas is going to be optimized for my system, does
>> that mean I should compile everything on each machine separately?
>> I.e. I have a not-so-great desktop machine and one of those bigger
>> multicore things available...
>>
>> Cheers
>>
>> David
>>
>
> --
> Jason Rennie
> Research Scientist, ITA Software
> 617-714-2645
> http://www.itasoftware.com/
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




Re: [Numpy-discussion] Import fails of scipy.factorial when installed as a zipped egg

2009-06-05 Thread josef . pktd
On Fri, Jun 5, 2009 at 9:25 AM, Fadhley Salim
 wrote:
> I've noticed that with scipy 0.7.0 + numpy 1.2.1, importing the factorial
> function from the scipy module always seems to fail when scipy is installed
> as a zipped ".egg" file. When the project is installed as an unzipped
> directory it works fine.
>
> Is there any reason why this function should not be egg-safe?
>
> My test to verify this was pretty simple: I just installed my scipy egg
> (made by extracting the Windows, Python 2.4 Superpack) with the easy_install
> command. Whenever I install it with the "-Z" option (to uncompress) it works
> fine. With the "-z" option it always fails.
>

I don't think numpy/scipy are zip safe; the numpy package loader uses
os.path to find files.
I would expect that you are not able to import anything; factorial
might just be the first function that is loaded.

easy_install usually does a check for zip-safety, and if it unpacks the
egg it usually means it's not zip safe, for example because of the use
of __file__.
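The __file__ problem can be demonstrated with a tiny zipped package (a self-contained sketch; the package name `zpkg` is made up):

```python
import os
import sys
import tempfile
import textwrap
import zipfile

# Build a one-module package inside a zip archive and import it through
# Python's zipimport machinery, as happens with a zipped egg.
tmpdir = tempfile.mkdtemp()
zpath = os.path.join(tmpdir, "demo.zip")
with zipfile.ZipFile(zpath, "w") as z:
    z.writestr("zpkg/__init__.py", textwrap.dedent("""
        import os
        # Inside a zip, __file__ names a path *inside* the archive,
        # so os.path checks against it fail:
        FILE_EXISTS = os.path.isfile(__file__)
    """))

sys.path.insert(0, zpath)
import zpkg

print(zpkg.FILE_EXISTS)   # False: os.path cannot see into the archive
```

Any code that joins os.path.dirname(__file__) with a data filename and opens the result will break the same way inside a zipped egg.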

That's my guess,

Josef


[Numpy-discussion] Import fails of scipy.factorial when installed as a zipped egg

2009-06-05 Thread Fadhley Salim
I've noticed that with scipy 0.7.0 + numpy 1.2.1, importing the
factorial function from the scipy module always seems to fail when scipy
is installed as a zipped ".egg" file. When the project is installed as
an unzipped directory it works fine.
 
Is there any reason why this function should not be egg-safe? 
 
My test to verify this was pretty simple: I just installed my scipy egg
(made by extracting the Windows, Python 2.4 Superpack) with the
easy_install command. Whenever I install it with the "-Z" option (to
uncompress) it works fine. With the "-z" option it always fails.
 
Thanks!
 
Sal
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Jason Rennie
Hi David,

Let me suggest that you try the latest version of Ubuntu (9.04/Jaunty),
which was released two months ago.  It sounds like you are effectively using
release 5 of RedHat Linux which was originally released May 2007.  There
have been updates (5.1, 5.2, 5.3), but, if my memory serves me correctly,
RedHat updates are more focused on fixing bugs and security issues rather
than improving functionality.  Ubuntu does a full, new release every 6
months so you don't have to wait as long to see improvements.  Ubuntu also
has a tremendously better package management system.  You generally
shouldn't be installing packages by hand as it sounds like you are doing.

This post suggests that the latest version of Ubuntu is up-to-date wrt
ATLAS:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg13102.html

Jason

On Fri, Jun 5, 2009 at 5:44 AM, David Paul Reichert <
d.p.reich...@sms.ed.ac.uk> wrote:

> Thanks for the replies so far.
>
> I had already tested using an already transposed matrix in the loop,
> it didn't make any difference. Oh and btw, I'm on (Scientific) Linux.
>
> I used the Enthought distribution, but I guess I'll have to get
> my hands dirty and try to get that Atlas thing working (I'm not
> a Linux expert though). My simulations pretty much consist of
> matrix multiplications, so if I don't get rid of that factor 5,
> I pretty much have to get back to Matlab.
>
> When you said Atlas is going to be optimized for my system, does
> that mean I should compile everything on each machine separately?
> I.e. I have a not-so-great desktop machine and one of those bigger
> multicore things available...
>
> Cheers
>
> David
>

-- 
Jason Rennie
Research Scientist, ITA Software
617-714-2645
http://www.itasoftware.com/


[Numpy-discussion] scipy 0.7.1rc2 released

2009-06-05 Thread David Cournapeau
Hi,

   The RC2 for the 0.7.1 scipy release has just been tagged. This is a
bug-fix-only release; see below for the release notes. More information
can also be found on the trac website:

http://projects.scipy.org/scipy/milestone/0.7.1

The only code change compared to the RC1 is one fix which is essential
for the mac os x/python 2.6 combination. Tarballs, windows and mac os x
binaries are available.

Please test it! I am particularly interested in results for the scipy
binaries on mac os x (do they work on PPC?).

The scipy developers

--

=========================
SciPy 0.7.1 Release Notes
=========================

.. contents::

SciPy 0.7.1 is a bug-fix release with no new features compared to 0.7.0.

scipy.io
========

Bugs fixed:

- Several fixes in Matlab file IO

scipy.odr
=========

Bugs fixed:

- Work around a failure with Python 2.6

scipy.signal
============

A memory leak in lfilter has been fixed, and support for array objects added.

Bugs fixed:

- #880, #925: lfilter fixes
- #871: bicgstab fails on Win32


scipy.sparse
============

Bugs fixed:

- #883: scipy.io.mmread with scipy.sparse.lil_matrix broken
- lil_matrix and csc_matrix now reject unexpected sequences,
  cf. http://thread.gmane.org/gmane.comp.python.scientific.user/19996

scipy.special
=============

Several bugs of varying severity were fixed in the special functions:

- #503, #640: iv: problems at large arguments fixed by new implementation
- #623: jv: fix errors at large arguments
- #679: struve: fix wrong output for v < 0
- #803: pbdv produces invalid output
- #804: lqmn: fix crashes on some input
- #823: betainc: fix documentation
- #834: exp1 strange behavior near negative integer values
- #852: jn_zeros: more accurate results for large s, also in jnp/yn/ynp_zeros
- #853: jv, yv, iv: invalid results for non-integer v < 0, complex x
- #854: jv, yv, iv, kv: return nan more consistently when out-of-domain
- #927: ellipj: fix segfault on Windows
- #946: ellpj: fix segfault on Mac OS X/python 2.6 combination.
- ive, jve, yve, kv, kve: with real-valued input, return nan for out-of-domain
  instead of returning only the real part of the result.

Also, when ``scipy.special.errprint(1)`` has been enabled, warning
messages are now issued as Python warnings instead of printing them to
stderr.


scipy.stats
===========

- linregress, mannwhitneyu, describe: errors fixed
- kstwobign, norm, expon, exponweib, exponpow, frechet, genexpon, rdist,
  truncexpon, planck: improvements to numerical accuracy in distributions

Windows binaries for python 2.6
===============================

python 2.6 binaries for windows are now included. The binary for python 2.5
requires numpy 1.2.0 or above, and the one for python 2.6 requires numpy
1.3.0 or above.

Universal build for scipy
=========================

Mac OS X binary installer is now a proper universal build, and does not depend
on gfortran anymore (libgfortran is statically linked). The python 2.5 version
of scipy requires numpy 1.2.0 or above, the python 2.6 version requires numpy
1.3.0 or above.

Checksums
=========

08cdf8d344535fcb5407dafd9f120b9b  release/installers/scipy-0.7.1rc2.tar.gz
93595ca9f0b5690a6592c9fc43e9253d
release/installers/scipy-0.7.1rc2-py2.6-macosx10.5.dmg
fc8f434a9b4d76f1b38b7025f425127b  release/installers/scipy-0.7.1rc2.zip
8cdc2472f3282f08a703cdcca5c92952
release/installers/scipy-0.7.1rc2-win32-superpack-python2.5.exe
15c4c45de931bd7f13e4ce24bd59579e
release/installers/scipy-0.7.1rc2-win32-superpack-python2.6.exe
e42853e39b3b4f590824e3a262863ef6
release/installers/scipy-0.7.1rc2-py2.5-macosx10.5.dmg
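A downloaded installer can be checked against the checksums above by hashing the file, e.g. (a small helper sketch; the filename in the comment is the source tarball listed above):

```python
import hashlib

def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the announced value, e.g. for the source tarball:
# md5_of("scipy-0.7.1rc2.tar.gz") == "08cdf8d344535fcb5407dafd9f120b9b"
```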


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Cournapeau
Sebastian Walter wrote:
> On Fri, Jun 5, 2009 at 11:58 AM, David
> Cournapeau wrote:
>   
>> Sebastian Walter wrote:
>> 
>>> On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote:
>>>
>>>   
 I should update after reading the thread Sebastian linked:

 The current 1.3 version of numpy (don't know about previous versions) uses
 the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled 
 with
 these libraries. I've verified this on linux only, though it shouldn't be
 any different on windows AFAIK.

 
>>> in the  best of all possible worlds this would be done by a package
>>> maintainer
>>>
>>>   
>> Numpy packages on windows do use ATLAS, so I am not sure what you are
>> referring to ?
>> 
> I'm on debian unstable and my numpy (version 1.2.1) uses an unoptimized blas.
>   

Yes, that is because the packages on Linux are not well done in that
respect (in their defense, the numpy build is far from being packaging
friendly, and is both fragile and obscure).

> I had the impression that most people that use numpy are on linux.

Sourceforge numbers tell a different story at least. I think most users
on the ML use linux, and certainly almost every developer uses linux or
mac os x. But the ML already filters out most windows users - only geeks
read MLs :) I am pretty sure the vast majority of numpy users never even
bother to look for the ML.

>> On a side note,  correctly packaging ATLAS is almost
>> inherently impossible, since the build method of ATLAS can never produce
>> the same binary (even on the same machine), and the binary is optimized
>> for the machine it was built on. So if you want the best speed, you
>> should build atlas by yourself - which is painful on windows (you need
>> cygwin).
>> 
> in the debian repositories there are different builds of atlas so
> there could be different builds for numpy, too.
> But there aren't
>   

There are several problems:
- packagers (rightfully) hate to have many versions of the same software
    - as of now, if ATLAS is detected, numpy is built differently than
when it is linked against conventional blas/lapack
- numpy on debian is not built with atlas support

But there is certainly no need to build one numpy version for every
atlas: the linux loader can load the most appropriate library depending
on your architecture, the so called hwcap flag. If your CPU supports
SSE2, and you have ATLAS installed for SSE2, then the loader will
automatically load the libraries there instead of the one in /usr/lib by
default.

But because ATLAS is such a pain to support in a binary form, only
ancient versions of ATLAS are packaged anyways (3.6.*). So if you care
so much, you should build your own.

>> On windows, if you really care about speed, you should try linking
>> against the Intel MKL. That's what Matlab uses internally on recent
>> versions, so you would get the same speed. But that's rather involved.
>> 
>
>   

It really depends on the CPU, the compiler, how atlas was compiled, etc.;
it can be anywhere from slightly faster to 10 times faster (if you use a
very poorly optimized ATLAS).

For some recent benchmarks:

http://eigen.tuxfamily.org/index.php?title=Benchmark

cheers,

David


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Sebastian Walter
On Fri, Jun 5, 2009 at 11:58 AM, David
Cournapeau wrote:
> Sebastian Walter wrote:
>> On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote:
>>
>>> I should update after reading the thread Sebastian linked:
>>>
>>> The current 1.3 version of numpy (don't know about previous versions) uses
>>> the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with
>>> these libraries. I've verified this on linux only, though it shouldn't be
>>> any different on windows AFAIK.
>>>
>>
>> in the  best of all possible worlds this would be done by a package
>> maintainer
>>
>
> Numpy packages on windows do use ATLAS, so I am not sure what you are
> referring to ?
I'm on debian unstable and my numpy (version 1.2.1) uses an unoptimized blas.
I had the impression that most people that use numpy are on linux. But
apparently this is a misconception.

>On a side note,  correctly packaging ATLAS is almost
> inherently impossible, since the build method of ATLAS can never produce
> the same binary (even on the same machine), and the binary is optimized
> for the machine it was built on. So if you want the best speed, you
> should build atlas by yourself - which is painful on windows (you need
> cygwin).
in the debian repositories there are different builds of atlas so
there could be different builds for numpy, too.
But there aren't

>
> On windows, if you really care about speed, you should try linking
> against the Intel MKL. That's what Matlab uses internally on recent
> versions, so you would get the same speed. But that's rather involved.

How much faster is MKL than ATLAS?


>
> cheers,
>
> David


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Cournapeau
Sebastian Walter wrote:
> On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote:
>   
>> I should update after reading the thread Sebastian linked:
>>
>> The current 1.3 version of numpy (don't know about previous versions) uses
>> the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with
>> these libraries. I've verified this on linux only, though it shouldn't be
>> any different on windows AFAIK.
>> 
>
> in the  best of all possible worlds this would be done by a package
> maintainer
>   

Numpy packages on windows do use ATLAS, so I am not sure what you are
referring to ? On a side note,  correctly packaging ATLAS is almost
inherently impossible, since the build method of ATLAS can never produce
the same binary (even on the same machine), and the binary is optimized
for the machine it was built on. So if you want the best speed, you
should build atlas by yourself - which is painful on windows (you need
cygwin).

On windows, if you really care about speed, you should try linking
against the Intel MKL. That's what Matlab uses internally on recent
versions, so you would get the same speed. But that's rather involved.

cheers,

David


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Sebastian Walter
On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote:
> I should update after reading the thread Sebastian linked:
>
> The current 1.3 version of numpy (don't know about previous versions) uses
> the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with
> these libraries. I've verified this on linux only, though it shouldn't be
> any different on windows AFAIK.

in the  best of all possible worlds this would be done by a package
maintainer


>
> chris
>
> On Thu, Jun 4, 2009 at 4:54 PM, Chris Colbert  wrote:
>>
>> Sebastian is right.
>>
>> Since Matlab r2007 (i think that's the version) it has included support
>> for multi-core architecture. On my core2 Quad here at the office, r2008b has
>> no problem utilizing 100% cpu for large matrix multiplications.
>>
>>
>> If you download and build atlas and lapack from source and enable
>> parallel threads in atlas, then compile numpy against these libraries, you
>> should achieve similar if not better performance (since the atlas routines
>> will be tuned to your system).
>>
>> If you're on Windows, you need to do some trickery to get threading to
>> work (the instructions are on the atlas website).
>>
>> Chris
>>
>>
>
>


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Paul Reichert
Thanks for the replies so far.

I had already tested using an already transposed matrix in the loop,
it didn't make any difference. Oh and btw, I'm on (Scientific) Linux.

I used the Enthought distribution, but I guess I'll have to get
my hands dirty and try to get that Atlas thing working (I'm not
a Linux expert though). My simulations pretty much consist of
matrix multiplications, so if I don't get rid of that factor 5,
I pretty much have to get back to Matlab.

When you said Atlas is going to be optimized for my system, does
that mean I should compile everything on each machine separately?
I.e. I have a not-so-great desktop machine and one of those bigger
multicore things available...

Cheers

David



Quoting David Cournapeau :

> David Warde-Farley wrote:
>> On 4-Jun-09, at 5:03 PM, Anne Archibald wrote:
>>
>>
>>> Apart from the implementation issues people have chimed in about
>>> already, it's worth noting that the speed of matrix multiplication
>>> depends on the memory layout of the matrices. So generating B directly
>>> as a 100 by 500 matrix instead might affect the speed substantially
>>> (I'm not sure in which direction). If MATLAB's matrices have a
>>> different memory order, that might be a factor as well.
>>>
>>
>> AFAIK Matlab matrices are always Fortran ordered.
>>
>> Does anyone know if the defaults on Mac OS X (vecLib/Accelerate)
>> support multicore? Is there any sense in compiling ATLAS on OS X (I
>> know it can be done)?
>>
>
> It may be worthwhile if you use a recent gcc and recent ATLAS.
> Multithread support is supposed to be much better in 3.9.* compared to
> 3.6.* (which is likely the version used on vecLib/Accelerate). The main
> issue I could foresee is clashes between vecLib/Accelerate and Atlas if
> you mix software packages which use one or the other together.
>
> For the OP question: recent matlab versions use the MKL, which is likely
> to give higher performance than ATLAS, especially on windows (compilers
> on that platform are ancient, as building atlas with native compilers on
> windows requires superhuman patience).
>
> David



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
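The memory-layout point quoted in this thread is easy to experiment with (a small sketch; the 100-by-500 shape mirrors the example in the quote):

```python
import numpy as np

# Same values, different memory layout: C (row-major) vs Fortran
# (column-major, which is what Matlab uses). BLAS may take different
# code paths for each, so timings can differ between layouts.
a = np.random.rand(100, 500)
b_c = np.random.rand(500, 100)       # C-ordered
b_f = np.asfortranarray(b_c)         # Fortran-ordered copy, same values

print(b_c.flags['C_CONTIGUOUS'])     # True
print(b_f.flags['F_CONTIGUOUS'])     # True

# The products are numerically identical either way.
assert np.allclose(np.dot(a, b_c), np.dot(a, b_f))
```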




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-05 Thread David Warde-Farley
On 4-Jun-09, at 4:38 PM, Anne Archibald wrote:

> It seems to me that this is the basic source of the problem. Perhaps
> this can be addressed? I realize maintaining compatibility with the
> current behaviour is necessary, so how about a multistage deprecation:
>
> 1. add a keyword argument to intersect1d "assume_unique"; if it is not
> present, check for uniqueness and emit a warning if not unique
> 2. change the warning to an exception
> Optionally:
> 3. change the meaning of the function to that of intersect1d_nu if the
> keyword argument is not present
>
> One could do something similar with setmember1d.

+1 on this idea. I've been bitten by the non-unique stuff in the past,  
especially with setmember1d, not realizing that both inputs need to be unique.

David
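The assume_unique keyword proposed in this thread did eventually land in numpy; its behaviour can be sketched as follows (assuming a reasonably recent numpy):

```python
import numpy as np

a = np.array([1, 2, 2, 3])
b = np.array([2, 3, 3, 4])

# Default: inputs are deduplicated for you before intersecting.
print(np.intersect1d(a, b))                  # [2 3]

# With assume_unique=True the caller promises unique inputs and the
# deduplication step is skipped (faster, but wrong results if the
# promise is broken).
print(np.intersect1d(np.unique(a), np.unique(b), assume_unique=True))  # [2 3]
```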