Re: [Numpy-discussion] Quick Question about Optimization

2008-05-21 Thread Christopher Barker
James Snyder wrote:
> b = np.zeros((1,30)) # allocates new memory and disconnects the view

This is really about how python works, not how numpy works:

np.zeros() -- creates a new array with all zeros in it -- that's the 
whole point.

b = Something -- binds the name "b" to the Something object. Name 
binding will never, ever, change the object the name used to be bound 
to. This has nothing to do with whether the object formerly known as "b" 
is referencing the data from another array.

This is a nice write up of the concept of name binding in Python:

http://python.net/crew/mwh/hacks/objectthink.html
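
For example (a minimal sketch; the shapes are arbitrary):

import numpy as np

a = np.zeros((20, 30))
b = a[2, :]        # "b" names a view sharing a's memory
b[:] = 1           # in-place write through the view: a changes
print(a[2, 0])     # 1.0

b = np.zeros(30)   # rebinding: "b" now names a brand-new array
print(a[2, 0])     # still 1.0 -- the rebinding never touched a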


-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R   (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-20 Thread Eric Firing
James Snyder wrote:
>> Well, if you do f = a[n, :], you would get a view, another object that
>> shares the data in memory with a but is a separate object.
> 
> OK, so it is a new object, with the properties of the slice it
> references, but if I write anything to it, it will consistently go
> back to the same spot in the original array.
> 
> In general, if I work on that, and don't do something that allocates a
> new set of memory locations for that, it will reference the same
> memory location.
> 
> If I do:
> a = np.zeros((20,30))
> b = a[2,:]
> 
> b += 1 # will add 1 to the original slice
> 
> b.resize will fail...
> 
> b = np.zeros((1,30)) # allocates new memory and disconnects the view
> 
> The appropriate way to zero out the original memory locations would be
> to do something like b *= 0?

or, from fastest to slowest:

b.fill(0)
b.flat = 0
b[:] = 0
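
If you want to check that ordering yourself, something like this
(sizes made up) will do it:

import timeit

setup = "import numpy as np; b = np.zeros((1, 30))"
for stmt in ("b.fill(0)", "b.flat = 0", "b[:] = 0"):
    t = timeit.Timer(stmt, setup).timeit(number=100000)
    print("%-10s %.4f s" % (stmt, t))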


Eric


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-20 Thread James Snyder
>
> Well, if you do f = a[n, :], you would get a view, another object that
> shares the data in memory with a but is a separate object.

OK, so it is a new object, with the properties of the slice it
references, but if I write anything to it, it will consistently go
back to the same spot in the original array.

In general, if I work on that, and don't do something that allocates a
new set of memory locations for that, it will reference the same
memory location.

If I do:
a = np.zeros((20,30))
b = a[2,:]

b += 1 # will add 1 to the original slice

b.resize will fail...

b = np.zeros((1,30)) # allocates new memory and disconnects the view

The appropriate way to zero out the original memory locations would be
to do something like b *= 0?

Is there any way to force a writeback to the original view so long as
the dimensions of what is being assigned to b are the same as the
original?  Or, is there a way to, say, enable a warning if I'm
dropping a view?

> Sorry I didn't get back to you earlier on this -- I was a bit busy
> yesterday.  It looks like weave.blitz isn't working on your second
> line because you're not explicitly putting slices in some of the
> dimensions.  In numpy, v[0:2] works for 1, 2, 3, or 4 dimensions, but
> for a 2-d array in blitz you have to use v[0:2,:], and for 3-d, v[0:2,:,:].  It's
> a bit more picky.   I think that's the problem with your second line
> -- try replacing v[:] with v[0,:] and theta[1-curidx] with
> theta[1-curidx, :]. (I may have missed some others.)

OK, that seems to do it.  I still actually get better performance
(subsequent runs after compilation) with the straight numpy code.
Strangely, I'm also finding that the flip/flop method runs a bit
slower than having the separate prev_ variables.  aff_input is rather
large (~2000x14000), but the state vectors are only 14000 each (or
twice that, with flip-flopping, for some).

Is the slowdown maybe because it is doing those 3 lines of blitz
operations and then a bunch of Python numpy?

Either way, it seems like I've got pretty good performance as well as
a handle on using weave.blitz in the future.

-jsnyder


-- 
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]
PGP: http://fanplastic.org/key.txt


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-20 Thread Hoyt Koepke
> The time has now been shaved down to ~9 seconds with this suggestion
> from the original 13-14s, with the inclusion of Eric Firing's
> suggestions.  This is without scipy.weave, which at the moment I can't
> get to work for all lines, and when I just replace one of them
> successfully it seems to run more slowly, I assume because it is
> converting data back and forth.

There's a nontrivial constant-time cost to using scipy.weave.blitz,
but it isn't from copying the data. Thus it will slow you down on smaller
arrays, but you'll notice quite a speed-up on much larger ones.  I
should have mentioned that earlier -- I assumed your arrays were
really large.

> Are there any major pitfalls to be aware of?  It sounds like if I do:
> f = a[n,:] I get a reference, but if I did something like g = a[n,:]*2
> I would get a copy.

Well, if you do f = a[n, :], you would get a view, another object that
shares the data in memory with a but is a separate object.

> If anyone has any clues on why scipy.weave is blowing up
> (http://pastebin.com/m79699c04) using the code I attached, I wouldn't
> mind knowing.  Most of the times I've attempted using weave, I've been
> a little baffled about why things aren't working.  I also don't have a
> sense for whether all numpy functions should be "weavable" or if it's
> just general array operations that can be put through weave.

Sorry I didn't get back to you earlier on this -- I was a bit busy
yesterday.  It looks like weave.blitz isn't working on your second
line because you're not explicitly putting slices in some of the
dimensions.  In numpy, v[0:2] works for 1, 2, 3, or 4 dimensions, but
for a 2-d array in blitz you have to use v[0:2,:], and for 3-d, v[0:2,:,:].  It's
a bit more picky.   I think that's the problem with your second line
-- try replacing v[:] with v[0,:] and theta[1-curidx] with
theta[1-curidx, :]. (I may have missed some others.)

weave.blitz is currently limited to just array operations... it
doesn't really support the numpy functions.
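
A toy illustration of the slicing rule (the array names here are made
up; the commented-out form is the failure mode):

import numpy as np
import scipy.weave as wv

a = np.zeros((4, 5))
b = np.ones((4, 5))

wv.blitz("a[0:2, :] = 2.0 * b[0:2, :]")   # every dimension sliced: OK
# wv.blitz("a[0:2] = 2.0 * b[0:2]")       # missing trailing ':' -- blitz chokes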

Hope that helps a little

-- Hoyt

-- 
+++
Hoyt Koepke
UBC Department of Computer Science
http://www.cs.ubc.ca/~hoytak/
[EMAIL PROTECTED]
+++


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread James Snyder
I've done a little profiling with cProfile as well as with dtrace
since the bindings exist in mac os x, and you can use a lot of the d
scripts that apply to python, so previously I've found that the
np.random call and the where (in the original code) were heavy hitters
as far as amount of time consumed.

The time has now been shaved down to ~9 seconds with this suggestion
from the original 13-14s, with the inclusion of Eric Firing's
suggestions.  This is without scipy.weave, which at the moment I can't
get to work for all lines, and when I just replace one of them
successfully it seems to run more slowly, I assume because it is
converting data back and forth.

Quick question regarding the pointer abstraction that's going on. The
following seems to work:

np.putmask(S[n,:], np.squeeze(mask), 1)

with that section of S being worked on.  Is it safe to assume, in most
cases while working with NumPy, that plain indexing without additional
operations passes a reference rather than a copy?  It certainly seems
that way, with stuff like:

u = self.u
v = self.v
theta = self.theta
...

where not having to repack those data into self later (u, v, theta
are just references to the existing data) saves on code and whatnot,
but I'm a little worried about not being explicit.

Are there any major pitfalls to be aware of?  It sounds like if I do:
f = a[n,:] I get a reference, but if I did something like g = a[n,:]*2
I would get a copy.
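
A quick way to convince yourself which operations hand back views
(a sketch):

import numpy as np

a = np.zeros((20, 30))
f = a[3, :]        # basic slice: a view
f += 1
print(a[3, 0])     # 1.0 -- the write went through to a

g = a[3, :] * 2    # arithmetic builds a new array
g += 1
print(a[3, 0])     # still 1.0 -- a is untouched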

Thanks guys.  This is definitely useful, especially in combination
with using PyPar on my dual core system I'm getting pretty good
performance :-)

If anyone has any clues on why scipy.weave is blowing up
(http://pastebin.com/m79699c04) using the code I attached, I wouldn't
mind knowing.  Most of the times I've attempted using weave, I've been
a little baffled about why things aren't working.  I also don't have a
sense for whether all numpy functions should be "weavable" or if it's
just general array operations that can be put through weave.

I know this is the numpy list, so I can take things over to the scipy
list if that's more appropriate.


On Mon, May 19, 2008 at 7:36 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
> On Mon, May 19, 2008 at 6:55 PM, James Snyder <[EMAIL PROTECTED]> wrote:
>> Also note, I'm not asking to match MATLAB performance.  It'd be nice,
>> but again I'm just trying to put together decent, fairly efficient
>> numpy code.
>
> I can cut the time by about a quarter by just using the boolean mask
> directly instead of using where().
>
> for n in range(0, time_milliseconds):
>     u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
>     v = u + sigma * stdnormrvs[n, :]
>     theta = expfac_theta * prev_theta - (1-expfac_theta)
>
>     mask = (v >= theta)
>
>     S[n, np.squeeze(mask)] = 1
>     theta[mask] += b
>
>     prev_u = u
>     prev_theta = theta
>
>
> There aren't any good line-by-line profiling tools in Python, but you
> can fake it by making a local function for each line:
>
> def f1():
>     u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
>     return u
> def f2():
>     v = u + sigma * stdnormrvs[n, :]
>     return v
> def f3():
>     theta = expfac_theta * prev_theta - (1-expfac_theta)
>     return theta
> def f4():
>     mask = (v >= theta)
>     return mask
> def f5():
>     S[n, np.squeeze(mask)] = 1
> def f6():
>     theta[mask] += b
>
> # Run Standard, Unoptimized Model
> for n in range(0, time_milliseconds):
>     u = f1()
>     v = f2()
>     theta = f3()
>     mask = f4()
>     f5()
>     f6()
>
>     prev_u = u
>     prev_theta = theta
>
> I get f6() as being the biggest bottleneck, followed by the general
> time spent in the loop (about the same), followed by f5(), f1(), and
> f3() (each about half of f6()), followed by f2() (about half of f5()).
> f4() is negligible.
>
> Masked operations are inherently slow. They mess up the CPU's branch
> prediction. Worse, the use of iterators in that part of the code
> frustrates compilers' attempts to optimize that away in the case of
> contiguous arrays.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco



-- 
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]

Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Eric Firing
Robert Kern wrote:
> On Mon, May 19, 2008 at 6:55 PM, James Snyder <[EMAIL PROTECTED]> wrote:
>> Also note, I'm not asking to match MATLAB performance.  It'd be nice,
>> but again I'm just trying to put together decent, fairly efficient
>> numpy code.
> 
> I can cut the time by about a quarter by just using the boolean mask
> directly instead of using where().
> 
> for n in range(0, time_milliseconds):
>     u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
>     v = u + sigma * stdnormrvs[n, :]
>     theta = expfac_theta * prev_theta - (1-expfac_theta)
>
>     mask = (v >= theta)
>
>     S[n, np.squeeze(mask)] = 1
>     theta[mask] += b
>
>     prev_u = u
>     prev_theta = theta
> 
> 
> There aren't any good line-by-line profiling tools in Python, but you
> can fake it by making a local function for each line:
> 
> def f1():
>     u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
>     return u
> def f2():
>     v = u + sigma * stdnormrvs[n, :]
>     return v
> def f3():
>     theta = expfac_theta * prev_theta - (1-expfac_theta)
>     return theta
> def f4():
>     mask = (v >= theta)
>     return mask
> def f5():
>     S[n, np.squeeze(mask)] = 1
> def f6():
>     theta[mask] += b
>
> # Run Standard, Unoptimized Model
> for n in range(0, time_milliseconds):
>     u = f1()
>     v = f2()
>     theta = f3()
>     mask = f4()
>     f5()
>     f6()
>
>     prev_u = u
>     prev_theta = theta
> 
> I get f6() as being the biggest bottleneck, followed by the general
> time spent in the loop (about the same), followed by f5(), f1(), and
> f3() (each about half of f6()), followed by f2() (about half of f5()).
> f4() is negligible.
> 
> Masked operations are inherently slow. They mess up the CPU's branch
> prediction. Worse, the use of iterators in that part of the code
> frustrates compilers' attempts to optimize that away in the case of
> contiguous arrays.
> 

f6 can be sped up more than a factor of 2 by using putmask:

In [10]:xx = np.random.rand(10)

In [11]:mask = xx > 0.5

In [12]:timeit xx[mask] += 2.34
100 loops, best of 3: 4.06 ms per loop

In [14]:timeit np.putmask(xx, mask, xx+2.34)
1000 loops, best of 3: 1.4 ms per loop

I think that
xx += 2.34*mask
will be similarly quick, but I can't get ipython timeit to work with it.
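
The plain timeit module can time the in-place version, though (a
sketch; the array size here is made up):

import timeit

setup = "import numpy as np; xx = np.random.rand(100000); mask = xx > 0.5"
t = timeit.Timer("xx += 2.34*mask", setup).timeit(number=1000)
print("%.4f s for 1000 loops" % t)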

Eric




Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Robert Kern
On Mon, May 19, 2008 at 7:30 PM, Charles R Harris
<[EMAIL PROTECTED]> wrote:
>
> On Mon, May 19, 2008 at 5:52 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>>
>> On Mon, May 19, 2008 at 6:39 PM, Charles R Harris
>> <[EMAIL PROTECTED]> wrote:
>> >
>> > On Mon, May 19, 2008 at 4:36 PM, Robert Kern <[EMAIL PROTECTED]>
>> > wrote:
>> >>
>> >> On Mon, May 19, 2008 at 5:27 PM, Charles R Harris
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > The latest versions of Matlab use the ziggurat method to generate
>> >> > random
>> >> > normals and it is faster than the method used in numpy. I have
>> >> > ziggurat
>> >> > code
>> >> > at hand, but IIRC, Robert doesn't trust the method ;)
>> >>
>> >> Well, I outlined the tests that would satisfy me, but I don't think
>> >> you ever responded.
>> >>
>> >>
>> >>  http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
>> >>  which references
>> >>
>> >>  http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html
>> >
>> > It's been tested in the literature.
>>
>> And it happened to fail such tests. Hence the Doornik paper which
>> improves Marsaglia's method to pass the appropriate tests.
>
> Exactly. Doornik was more careful about using independent samples and also
> used a better random number generator (MWC8222), not exactly rocket
> science.  Believe it or not, I had read Doornik's paper before I did my
> implementation.

Good! I believe this is the first time you've mentioned it.

> I also used a better ziggurat, IMHO, than Marsaglia and
> Doornik.

Ah. HOs need testing. dieharder is probably de rigueur these days, and
it will accept input from a file.

  http://www.phy.duke.edu/~rgb/General/dieharder.php

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Robert Kern
On Mon, May 19, 2008 at 6:55 PM, James Snyder <[EMAIL PROTECTED]> wrote:
> Also note, I'm not asking to match MATLAB performance.  It'd be nice,
> but again I'm just trying to put together decent, fairly efficient
> numpy code.

I can cut the time by about a quarter by just using the boolean mask
directly instead of using where().

for n in range(0, time_milliseconds):
    u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
    v = u + sigma * stdnormrvs[n, :]
    theta = expfac_theta * prev_theta - (1-expfac_theta)

    mask = (v >= theta)

    S[n, np.squeeze(mask)] = 1
    theta[mask] += b

    prev_u = u
    prev_theta = theta


There aren't any good line-by-line profiling tools in Python, but you
can fake it by making a local function for each line:

def f1():
    u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
    return u
def f2():
    v = u + sigma * stdnormrvs[n, :]
    return v
def f3():
    theta = expfac_theta * prev_theta - (1-expfac_theta)
    return theta
def f4():
    mask = (v >= theta)
    return mask
def f5():
    S[n, np.squeeze(mask)] = 1
def f6():
    theta[mask] += b

# Run Standard, Unoptimized Model
for n in range(0, time_milliseconds):
    u = f1()
    v = f2()
    theta = f3()
    mask = f4()
    f5()
    f6()

    prev_u = u
    prev_theta = theta

I get f6() as being the biggest bottleneck, followed by the general
time spent in the loop (about the same), followed by f5(), f1(), and
f3() (each about half of f6()), followed by f2() (about half of f5()).
f4() is negligible.
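
For reference, one way to collect those per-function numbers is
cProfile -- a sketch, where run() is a hypothetical wrapper around the
loop above:

import cProfile
import pstats

def run():
    pass  # hypothetical stand-in: the f1()..f6() loop above goes here

cProfile.run("run()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)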

Masked operations are inherently slow. They mess up the CPU's branch
prediction. Worse, the use of iterators in that part of the code
frustrates compilers' attempts to optimize that away in the case of
contiguous arrays.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Charles R Harris
On Mon, May 19, 2008 at 5:52 PM, Robert Kern <[EMAIL PROTECTED]> wrote:

> On Mon, May 19, 2008 at 6:39 PM, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
> >
> > On Mon, May 19, 2008 at 4:36 PM, Robert Kern <[EMAIL PROTECTED]>
> wrote:
> >>
> >> On Mon, May 19, 2008 at 5:27 PM, Charles R Harris
> >> <[EMAIL PROTECTED]> wrote:
> >> > The latest versions of Matlab use the ziggurat method to generate
> random
> >> > normals and it is faster than the method used in numpy. I have
> ziggurat
> >> > code
> >> > at hand, but IIRC, Robert doesn't trust the method ;)
> >>
> >> Well, I outlined the tests that would satisfy me, but I don't think
> >> you ever responded.
> >>
> >>
> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
> >>  which references
> >>
> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html
> >
> > It's been tested in the literature.
>
> And it happened to fail such tests. Hence the Doornik paper which
> improves Marsaglia's method to pass the appropriate tests.


Exactly. Doornik was more careful about using independent samples and also
used a better random number generator (MWC8222), not exactly rocket
science.  Believe it or not, I had read Doornik's paper before I did my
implementation. I also used a better ziggurat, IMHO, than Marsaglia and
Doornik.

MWC8222 is also about twice as fast on AMD hardware as the Mersenne Twister,
but does require more careful initialization. On the Pentium V they are
about the same. I haven't benchmarked either on Core2.

Chuck


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread James Snyder
On to the code, here's a current implementation, attached.  I make no
claims about it being great code, I've modified it so that there is a
weave version and a sans-weave version.

Many of the suggestions make things a bit faster.  The weave version
bombs out with a rather long log, which can be found at:
http://pastebin.com/m79699c04

I can tell it's failing for the second weave.blitz line, but I don't
understand why exactly.

What does this mean?:
error: no match for call to '(blitz::FastArrayIterator)
(const blitz::TinyVector&)'

Also note, I'm not asking to match MATLAB performance.  It'd be nice,
but again I'm just trying to put together decent, fairly efficient
numpy code.

On Mon, May 19, 2008 at 3:53 PM, Christopher Barker
<[EMAIL PROTECTED]> wrote:
> Anne Archibald wrote:
>> 2008/5/19 James Snyder <[EMAIL PROTECTED]>:
>>> I can provide the rest of the code if needed, but it's basically just
>>> filling some vectors with random and empty data and initializing a few
>>> things.
>>
>> It would kind of help, since it would make it clearer what's a scalar
>> and what's an array, and what the dimensions of the various arrays
>> are.
>
> It would also help if you provided a complete example (as little code as
> possible), so we could try out and time our ideas before suggesting them.
>
>>> np.random.standard_normal(size=(1,self.naff))
>
> Anyone know how fast this is compared to Matlab? That could be the
> difference right there.
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R   (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> [EMAIL PROTECTED]



-- 
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]
import os.path as path
import sys
import time
import numpy as np
import scipy.weave as wv

class AfferentData(object):
    """Compute afferent response to transdermal potentials"""

    def __init__(self, b=0.09, sigma=0.04, tau_m=2,
                 aff_input=np.zeros((2000, 13857)), runnum=1):
        super(AfferentData, self).__init__()

        # Data + dimens
        self.aff_input = aff_input
        self.endtime = aff_input.shape[0]
        self.naff = aff_input.shape[1]
        self.runnum = runnum

        # Main Params
        self.b = b
        self.sigma = sigma
        self.tau_m = tau_m
        rnd_exp = -np.log(np.random.uniform(size=(1, self.naff)))

        # Implement approach in the future to hold over previous rates
        self.tau_theta = 21 + 18 * rnd_exp
        self.expfac_m = np.exp(-1/self.tau_m)
        self.expfac_theta = np.exp(-1/self.tau_theta)

        # Set initial states
        self.prev_u = np.zeros((1, self.naff))
        prev_rand = np.random.standard_normal(size=(1, self.naff))
        self.prev_theta = 0.053 + 0.040*prev_rand

        # Pre-allocation
        self.u = np.empty((1, self.naff))
        self.v = np.empty((1, self.naff))
        self.theta = np.empty((1, self.naff))

        self.S = np.uint8(np.zeros((self.endtime, self.naff)))

    def gen_spikes(self, runtype="numpy"):
        """generate all spikes for available transdermal voltages"""

        self.runtype = runtype

        print "Starting Run: %d" % self.runnum
        t1 = time.time()
        self.gen_spikes_to_time(self.endtime)
        t2 = time.time()
        self.runtime = t2 - t1
        print 'Run %d took %1.3f s' % (self.runnum, self.runtime)

    def gen_spikes_to_time(self, time_milliseconds):
        """generate spikes up until a particular time index"""

        # Unpack Data
        # I suppose this could be programmatic with local or something else?
        u = self.u
        v = self.v
        theta = self.theta
        S = self.S
        aff_input = self.aff_input
        prev_u = self.prev_u
        prev_theta = self.prev_theta
        expfac_theta = self.expfac_theta
        expfac_m = float(self.expfac_m)
        sigma = self.sigma
        naff = self.naff
        b = self.b

        stdnormrvs = np.random.standard_normal(size=(time_milliseconds, naff))

        if self.runtype == "weave":
            u = np.empty((2, naff))
            v = np.empty((1, naff))
            theta = np.empty((2, naff))
            curidx = int(0)

            for n in xrange(time_milliseconds):
                wv.blitz("u[curidx, :] = expfac_m * u[1-curidx, :] + (1-expfac_m) * aff_input[n,:]")
                wv.blitz("v[:] = u[curidx, :] + sigma * stdnormrvs[n, :]")
                wv.blitz("theta[curidx, :] = expfac_theta * theta[1-curidx] - (1-expfac_theta)")

                idx_spk = np.where(v >= theta)

Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Robert Kern
On Mon, May 19, 2008 at 6:39 PM, Charles R Harris
<[EMAIL PROTECTED]> wrote:
>
> On Mon, May 19, 2008 at 4:36 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>>
>> On Mon, May 19, 2008 at 5:27 PM, Charles R Harris
>> <[EMAIL PROTECTED]> wrote:
>> > The latest versions of Matlab use the ziggurat method to generate random
>> > normals and it is faster than the method used in numpy. I have ziggurat
>> > code
>> > at hand, but IIRC, Robert doesn't trust the method ;)
>>
>> Well, I outlined the tests that would satisfy me, but I don't think
>> you ever responded.
>>
>>  http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
>>  which references
>>  http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html
>
> It's been tested in the literature.

And it happened to fail such tests. Hence the Doornik paper which
improves Marsaglia's method to pass the appropriate tests.
Consequently, I want to see the tests performed on the actual
implementation before using it. It's a complicated algorithm that is
*demonstrably* easy to get wrong.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Charles R Harris
On Mon, May 19, 2008 at 4:36 PM, Robert Kern <[EMAIL PROTECTED]> wrote:

> On Mon, May 19, 2008 at 5:27 PM, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
> > The latest versions of Matlab use the ziggurat method to generate random
> > normals and it is faster than the method used in numpy. I have ziggurat
> code
> > at hand, but IIRC, Robert doesn't trust the method ;)
>
> Well, I outlined the tests that would satisfy me, but I don't think
> you ever responded.
>
>  http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
>  which references
>  http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html
>

It's been tested in the literature. It's basically just sampling a Gaussian,
but done so that most of the tests for being under the curve are trivial and
there are few misses: the Gaussian is covered with a stack (ziggurat) of
slices of equal areas, each slice is randomly chosen, then the position
along the slice is randomly chosen. Most of those points will be under
the curve, except at the ends, and it is only those that require computation.
However, like all sampling it has to be carefully implemented, and the
samples are discretized differently than in the current method. Floats are
strange that way because they are on a log scale. The tails will be fine;
the real question is how much precision you want when doubles are returned,
i.e., how fine the discretization of the resulting samples should be. The
same method also works for the exponential distribution.
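
The accept/reject step described above is, at bottom, rejection
sampling. A toy sketch of just that building block (plain rejection
under a constant envelope -- not a real ziggurat, which adds the
equal-area layer table):

import numpy as np

def toy_rejection_normal(n):
    # Accept x when y falls under the unnormalized Gaussian density.
    # The ziggurat's equal-area layers make this test almost always
    # pass without ever evaluating exp().
    out = np.empty(0)
    while out.size < n:
        x = np.random.uniform(-4.0, 4.0, size=n)   # truncated support
        y = np.random.uniform(0.0, 1.0, size=n)    # envelope = pdf peak
        out = np.concatenate([out, x[y <= np.exp(-0.5 * x * x)]])
    return out[:n]

print(toy_rejection_normal(5))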

I don't feel this is a pressing issue, when I need fast normals I use my own
code. But if we are in competition with Matlab maybe we should give it a
shot.

Chuck


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread James Snyder
Separating the response into 2 emails, here's the aspect that comes
from implementations of random:

In short, that's part of the difference. I ran these a few times to
check for consistency.

MATLAB (R2008a):
tic
for i = 1:2000
    a = randn(1,13857);
end
toc

Runtime: ~0.733489 s

NumPy (1.1.0rc1):
import numpy as np
import time

t1 = time.time()
for n in xrange(0, 2000):
    a = np.random.standard_normal(size=(1,14000))
t2 = time.time()

print 'Runtime: %1.3f s' % (t2-t1)

Runtime: ~2.716 s

On Mon, May 19, 2008 at 3:53 PM, Christopher Barker
<[EMAIL PROTECTED]> wrote:
> Anne Archibald wrote:
>> 2008/5/19 James Snyder <[EMAIL PROTECTED]>:
>>> I can provide the rest of the code if needed, but it's basically just
>>> filling some vectors with random and empty data and initializing a few
>>> things.
>>
>> It would kind of help, since it would make it clearer what's a scalar
>> and what's an array, and what the dimensions of the various arrays
>> are.
>
> It would also help if you provided a complete example (as little code as
> possible), so we could try out and time our ideas before suggesting them.
>
>>> np.random.standard_normal(size=(1,self.naff))
>
> Anyone know how fast this is compared to Matlab? That could be the
> difference right there.
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R   (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> [EMAIL PROTECTED]



-- 
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Robert Kern
On Mon, May 19, 2008 at 5:27 PM, Charles R Harris
<[EMAIL PROTECTED]> wrote:
> The latest versions of Matlab use the ziggurat method to generate random
> normals and it is faster than the method used in numpy. I have ziggurat code
> at hand, but IIRC, Robert doesn't trust the method ;)

Well, I outlined the tests that would satisfy me, but I don't think
you ever responded.

  http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
  which references
  http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Charles R Harris
On Mon, May 19, 2008 at 2:53 PM, Christopher Barker <[EMAIL PROTECTED]>
wrote:

> Anne Archibald wrote:
> > 2008/5/19 James Snyder <[EMAIL PROTECTED]>:
> >> I can provide the rest of the code if needed, but it's basically just
> >> filling some vectors with random and empty data and initializing a few
> >> things.
> >
> > It would kind of help, since it would make it clearer what's a scalar
> > and what's an array, and what the dimensions of the various arrays
> > are.
>
> It would also help if you provided a complete example (as little code as
> possible), so we could try out and time our ideas before suggesting them.
>
> >> np.random.standard_normal(size=(1,self.naff))
>
> Anyone know how fast this is compared to Matlab? That could be the
> difference right there.
>

The latest versions of Matlab use the ziggurat method to generate random
normals and it is faster than the method used in numpy. I have ziggurat code
at hand, but IIRC, Robert doesn't trust the method ;) I don't know if it
would actually speed things up, though.

Chuck


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Christopher Barker
Anne Archibald wrote:
> 2008/5/19 James Snyder <[EMAIL PROTECTED]>:
>> I can provide the rest of the code if needed, but it's basically just
>> filling some vectors with random and empty data and initializing a few
>> things.
> 
> It would kind of help, since it would make it clearer what's a scalar
> and what's an array, and what the dimensions of the various arrays
> are.

It would also help if you provided a complete example (as little code as 
possible), so we could try out and time our ideas before suggesting them.

>> np.random.standard_normal(size=(1,self.naff))

Anyone know how fast this is compared to Matlab? That could be the 
difference right there.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R   (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Hoyt Koepke
On Mon, May 19, 2008 at 12:53 PM, Robin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I think my understanding is somehow incomplete... It's not clear to me
> why (simplified case)
>
> a[curidx,:] = scalar * a[1-curidx,:]
> should be faster than
> a = scalar * b
>
> In both cases I thought the scalar multiplication results in a new
> array (new memory allocated) and then the difference between copying
> that result into the existing array u[curidx,:] or reassigning the
> reference u to that result should be very similar?
>
> If anything I would have thought the direct assignment would be
> quicker since then there is no copying.
>
> What am I missing?

Actually, I think you are correct.  My bad.  I was mainly thinking in
terms of weave.blitz, where it would make a difference, then
translating back...

--Hoyt

+++
Hoyt Koepke
UBC Department of Computer Science
http://www.cs.ubc.ca/~hoytak/
[EMAIL PROTECTED]
+++


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Eric Firing
Robin wrote:
> Also you could use xrange instead of range...
> 
> Again, not sure of the size of the effect but it seems to be
> recommended by the docstring.

No, it is going away in Python 3.0, and its only real benefit is a 
memory saving in extreme cases.

 From the Python library docs:
"The advantage of xrange() over range() is minimal (since xrange() still 
has to create the values when asked for them) except when a very large 
range is used on a memory-starved machine or when all of the range's 
elements are never used (such as when the loop is usually terminated 
with break)."

Eric


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Anne Archibald
2008/5/19 James Snyder <[EMAIL PROTECTED]>:

> First off, I know that optimization is evil, and I should make sure
> that everything works as expected prior to bothering with squeezing
> out extra performance, but the situation is that this particular block
> of code works, but it is about half as fast with numpy as in matlab,
> and I'm wondering if there's a better approach than what I'm doing.
>
> I have a chunk of code, below, that generally iterates over 2000
> iterations, and the vectors that are being worked on at a given step
> generally have ~14000 elements in them.

With arrays this size, I wouldn't worry about python overhead - things
like range versus xrange or self lookups.

> Is there anything in practice here that could be done to speed this
> up?  I'm looking more for general numpy usage tips that I can use
> while writing further code, and not things that would be obscure or
> difficult to maintain in the future.

Try using a profiler to find which steps are using most of your time.
With such a simple function it may not be very informative, but it's
worth a try.

> Also, the results of this are a binary array; I'm wondering if there's
> anything more compact for expressing it than using 8 bits to represent
> each single bit.  I've poked around, but I haven't come up with any
> clean and unhackish ideas :-)

There's a tradeoff between compactness and speed here. The *fastest*
is probably one boolean per 32-bit integer. It sounds awful, I know,
but most modern CPUs have to work harder to access bytes individually
than they do to access them four at a time. On the other hand, cache
performance can make a huge difference, so compactness might actually
amount to speed. I don't think numpy has a packed bit array data type
(which is a shame, but would require substantial implementation
effort).
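
That said, np.packbits/np.unpackbits (if your numpy version has them)
give the 8x storage saving at the cost of a conversion pass; a sketch:

import numpy as np

spikes = np.random.rand(14000) > 0.9            # boolean spike vector
packed = np.packbits(spikes.astype(np.uint8))   # 8 samples per byte
restored = np.unpackbits(packed)[:spikes.size].astype(bool)
assert (restored == spikes).all()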

> I can provide the rest of the code if needed, but it's basically just
> filling some vectors with random and empty data and initializing a few
> things.

It would kind of help, since it would make it clearer what's a scalar
and what's an array, and what the dimensions of the various arrays
are.

> for n in range(0, time_milliseconds):
>     self.u = self.expfac_m * self.prev_u + (1-self.expfac_m) * self.aff_input[n,:]
>     self.v = self.u + self.sigma * np.random.standard_normal(size=(1,self.naff))

You can use "scale" to rescale the random numbers on creation; that'll
save you a temporary.

>     self.theta = self.expfac_theta * self.prev_theta - (1-self.expfac_theta)
>
>     idx_spk = np.where(self.v >= self.theta)

You can probably skip the "where"; the result of the expression
self.v>=self.theta is a boolean array, which you can use directly for
indexing.

>     self.S[n, idx_spk] = 1
>     self.theta[idx_spk] = self.theta[idx_spk] + self.b

+= here might speed things up, not just in terms of temporaries but by
saving a fancy-indexing operation.

>     self.prev_u = self.u
>     self.prev_theta = self.theta
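
Pulling the above together, the loop might look something like this (a
sketch with made-up constants and 1-D state vectors; np.random.normal's
scale argument does the rescaling):

import numpy as np

T, naff = 2000, 14000
aff_input = np.random.rand(T, naff)
expfac_m, expfac_theta, sigma, b = 0.6, 0.95, 0.04, 0.09
S = np.zeros((T, naff), dtype=np.uint8)
prev_u = np.zeros(naff)
prev_theta = np.zeros(naff)

for n in range(T):
    u = expfac_m * prev_u + (1 - expfac_m) * aff_input[n, :]
    v = u + np.random.normal(scale=sigma, size=naff)  # scaled on creation
    theta = expfac_theta * prev_theta - (1 - expfac_theta)

    spk = v >= theta       # boolean mask used directly, no np.where
    S[n, spk] = 1
    theta[spk] += b        # in-place add saves a fancy-indexing pass

    prev_u = u
    prev_theta = theta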



Anne


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Robin
Hi,

I think my understanding is somehow incomplete... It's not clear to me
why (simplified case)

a[curidx,:] = scalar * a[1-curidx,:]

should be faster than

a = scalar * b

In both cases I thought the scalar multiplication results in a new
array (new memory allocated), and then the difference between copying
that result into the existing array u[curidx,:] or reassigning the
reference u to that result should be very similar?

If anything I would have thought the direct assignment would be
quicker, since then there is no copying.

What am I missing?

> This should give you a substantial speedup.  Also, I have to say that
> this is begging for weave.blitz, which compiles such statements using
> templated c++ code to avoid temporaries.  It doesn't work on all
> systems, but if it does in your case, here's what your code might look
> like:

If you haven't seen it, this page gives useful examples of methods to
speed up python code (including weave.blitz), which as Hoyt says would
be ideal in this case:
http://scipy.org/PerformancePython

Robin


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Hoyt Koepke
> for n in range(0, time_milliseconds):
>     self.u = self.expfac_m * self.prev_u + (1-self.expfac_m) * self.aff_input[n,:]
>     self.v = self.u + self.sigma * np.random.standard_normal(size=(1,self.naff))
>     self.theta = self.expfac_theta * self.prev_theta - (1-self.expfac_theta)
>
>     idx_spk = np.where(self.v >= self.theta)
>
>     self.S[n, idx_spk] = 1
>     self.theta[idx_spk] = self.theta[idx_spk] + self.b
>
>     self.prev_u = self.u
>     self.prev_theta = self.theta

Copying elements into array objects that already exist will always be
faster than creating a new object with separate data.  However, in
this case, you don't need to do any copying or creation if you use a
flip-flopping index to handle keeping track of the previous. If I drop
the selfs, you can translate the above into (untested):


curidx = 0  # prev will be 1-curidx

u = np.empty((2, naff))
v = np.empty(naff)
theta = np.empty((2, naff))

stdnormrvs = np.random.standard_normal(size=(time_milliseconds, naff))

for n in xrange(time_milliseconds):
    u[curidx, :] = expfac_m * u[1-curidx, :] + (1-expfac_m) * aff_input[n,:]
    v[:] = u[curidx, :] + sigma * stdnormrvs[n, :]
    theta[curidx, :] = expfac_theta * theta[1-curidx] - (1-expfac_theta)

    idx_spk = np.where(v >= theta)
    S[n, idx_spk] = 1
    theta[curidx, idx_spk] += b

    # Flip to handle previous stuff
    curidx = 1 - curidx

This should give you a substantial speedup.  Also, I have to say that
this is begging for weave.blitz, which compiles such statements using
templated c++ code to avoid temporaries.  It doesn't work on all
systems, but if it does in your case, here's what your code might look
like:

import scipy.weave as wv

curidx = 0

u = np.empty((2, naff))
v = np.empty((1, naff))
theta = np.empty((2, naff))

stdnormrvs = np.random.standard_normal(size=(time_milliseconds, naff))

for n in xrange(time_milliseconds):
    wv.blitz("u[curidx, :] = expfac_m * u[1-curidx, :] + (1-expfac_m) * aff_input[n,:]")
    wv.blitz("v[:] = u[curidx, :] + sigma * stdnormrvs[n, :]")
    wv.blitz("theta[curidx, :] = expfac_theta * theta[1-curidx] - (1-expfac_theta)")

    idx_spk = np.where(v >= theta)
    S[n, idx_spk] = 1
    theta[curidx, idx_spk] += b

    # Flip to handle previous stuff
    curidx = 1 - curidx



-- 
+++
Hoyt Koepke
UBC Department of Computer Science
http://www.cs.ubc.ca/~hoytak/
[EMAIL PROTECTED]
+++


On Mon, May 19, 2008 at 11:08 AM, James Snyder <[EMAIL PROTECTED]> wrote:
> Hi -
>
> First off, I know that optimization is evil, and I should make sure
> that everything works as expected prior to bothering with squeezing
> out extra performance, but the situation is that this particular block
> of code works, but it is about half as fast with numpy as in matlab,
> and I'm wondering if there's a better approach than what I'm doing.
>
> I have a chunk of code, below, that generally iterates over 2000
> iterations, and the vectors that are being worked on at a given step
> generally have ~14000 elements in them.
>
> In matlab, doing pretty much exactly the same thing takes about 6-7
> seconds, always around 13-14 with numpy on the same machine.  I've
> gotten this on Linux & Mac OS X.
>
> self.aff_input has the bulk of the data in it (2000x14000 array), and
> the various steps are for computing the state of some afferent neurons
> (if there's any interest, here's a paper that includes the model:
> Brandman, R. and Nelson ME (2002) A simple model of long-term spike
> train regularization. Neural Computation 14, 1575-1597.)
>
> I've imported numpy as np.
>
> Is there anything in practice here that could be done to speed this
> up?  I'm looking more for general numpy usage tips that I can use
> while writing further code, and not things that would be obscure or
> difficult to maintain in the future.
>
> Also, the results of this are a binary array; I'm wondering if there's
> anything more compact for expressing it than using 8 bits to represent
> each single bit.  I've poked around, but I haven't come up with any
> clean and unhackish ideas :-)
>
> Thanks!
>
> I can provide the rest of the code if needed, but it's basically just
> filling some vectors with random and empty data and initializing a few
> things.
>
> for n in range(0, time_milliseconds):
>     self.u = self.expfac_m * self.prev_u + (1-self.expfac_m) * self.aff_input[n,:]
>     self.v = self.u + self.sigma * np.random.standard_normal(size=(1,self.naff))
>     self.theta = self.expfac_theta * self.prev_theta - (1-self.expfac_theta)
>
>     idx_spk = np.where(self.v >= self.theta)
>
>     self.S[n, idx_spk] = 1
>     self.theta[idx_spk] = self.theta[idx_spk] + self.b
>
>     self.prev_u = self.u
>     self.prev_theta = self.theta
>
> --
> James Snyder
> Biomedical Engineering
> Northwestern University

Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Robin
Also you could use xrange instead of range...

Again, not sure of the size of the effect but it seems to be
recommended by the docstring.

Robin


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Robin
On Mon, May 19, 2008 at 7:08 PM, James Snyder <[EMAIL PROTECTED]> wrote:
>
> for n in range(0, time_milliseconds):
>     self.u = self.expfac_m * self.prev_u + (1-self.expfac_m) * self.aff_input[n,:]
>     self.v = self.u + self.sigma * np.random.standard_normal(size=(1,self.naff))
>     self.theta = self.expfac_theta * self.prev_theta - (1-self.expfac_theta)
>
>     idx_spk = np.where(self.v >= self.theta)
>
>     self.S[n, idx_spk] = 1
>     self.theta[idx_spk] = self.theta[idx_spk] + self.b
>
>     self.prev_u = self.u
>     self.prev_theta = self.theta

Hello,

The only thoughts I had were that depending on how 'vectorised'
random.standard_normal is, it might be better to calculate a big block
of random data outside the loop and index it in the same way as the
aff_input. Not certain if the indexing would be faster than the
function call but it could be worth a try if you have enough memory.

The other thing is there are a lot of self.'s there. I don't have a
lot of practical experience, but I've read (
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#head-aa6c07c46a630a2fa10bd6502510e532806f1f62
) that attribute (.) lookups are slower than local variables, so another
thing to try would be to rebind everything to a local variable outside
the loop:
u = self.u
v = self.v
etc. which although a bit unsightly actually can make the inner loop
more readable and might speed things up.

The only other thing is to be careful with things like this when
translating from matlab:
>     self.prev_u = self.u
since this is a reference, not a copy of the data. This is OK here because
when you recreate u as a product it creates a new object, but if you
changed u in another way, e.g. self.u[:100] = 10, then self.prev_u would
still be pointing to the same array and would also reflect those changes.
In this case it doesn't look like you explicitly need the prev_ values,
so it's possible you could do the v and theta updates in place
(although I'm not sure if that's quicker):
u *= expfac_m
u += (1-expfac_m)*aff_input.. etc.
Of course you can also take the (1-)'s outside of the loop although
again I'm not sure how much difference it would make.
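
A sketch of that in-place pattern, with made-up sizes and constants
standing in for the real ones:

import numpy as np

naff, T = 14000, 2000
expfac_m = 0.6
aff_input = np.random.rand(T, naff)
u = np.zeros(naff)

one_minus_em = 1 - expfac_m       # hoisted out of the loop
for n in range(T):
    u *= expfac_m                 # update u in place; no new array is bound
    u += one_minus_em * aff_input[n, :]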

So sorry I can't give any concrete advice, but I hope I've given some ideas...

Cheers

Robin


[Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread James Snyder
Hi -

First off, I know that optimization is evil, and I should make sure
that everything works as expected prior to bothering with squeezing
out extra performance, but the situation is that this particular block
of code works, but it is about half as fast with numpy as in matlab,
and I'm wondering if there's a better approach than what I'm doing.

I have a chunk of code, below, that generally iterates over 2000
iterations, and the vectors that are being worked on at a given step
generally have ~14000 elements in them.

In matlab, doing pretty much exactly the same thing takes about 6-7
seconds, always around 13-14 with numpy on the same machine.  I've
gotten this on Linux & Mac OS X.

self.aff_input has the bulk of the data in it (2000x14000 array), and
the various steps are for computing the state of some afferent neurons
(if there's any interest, here's a paper that includes the model:
Brandman, R. and Nelson ME (2002) A simple model of long-term spike
train regularization. Neural Computation 14, 1575-1597.)

I've imported numpy as np.

Is there anything in practice here that could be done to speed this
up?  I'm looking more for general numpy usage tips that I can use
while writing further code, and not things that would be obscure or
difficult to maintain in the future.

Also, the results of this are a binary array; I'm wondering if there's
anything more compact for expressing it than using 8 bits to represent
each single bit.  I've poked around, but I haven't come up with any
clean and unhackish ideas :-)

Thanks!

I can provide the rest of the code if needed, but it's basically just
filling some vectors with random and empty data and initializing a few
things.

for n in range(0, time_milliseconds):
    self.u = self.expfac_m * self.prev_u + (1-self.expfac_m) * self.aff_input[n,:]
    self.v = self.u + self.sigma * np.random.standard_normal(size=(1,self.naff))
    self.theta = self.expfac_theta * self.prev_theta - (1-self.expfac_theta)

    idx_spk = np.where(self.v >= self.theta)

    self.S[n, idx_spk] = 1
    self.theta[idx_spk] = self.theta[idx_spk] + self.b

    self.prev_u = self.u
    self.prev_theta = self.theta

-- 
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]