Re: [Numpy-discussion] Quick Question about Optimization
James Snyder wrote:
> b = np.zeros((1,30)) # allocates new memory and disconnects the view

This is really about how Python works, not how numpy works: np.zeros() creates a new array with all zeros in it -- that's the whole point. b = Something binds the name "b" to the Something object. Name binding will never, ever, change the object the name used to be bound to. This has nothing to do with whether the object formerly known as "b" was referencing the data from another array.

This is a nice write-up of the concept of name binding in Python: http://python.net/crew/mwh/hacks/objectthink.html

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

[EMAIL PROTECTED]

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
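A quick way to see the distinction Chris describes -- in-place writes go through a view, while rebinding a name never touches the array it used to point at (a small sketch, not part of the original message):

```python
import numpy as np

a = np.zeros((20, 30))
b = a[2, :]          # b is a view on row 2 of a

b += 1               # in-place: writes through the view into a
print(a[2, 0])       # 1.0 -- row 2 of a was modified

b = np.zeros(30)     # name binding: "b" now names a brand-new array;
                     # the array a is completely unaffected
print(a[2, 0])       # still 1.0
```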
Re: [Numpy-discussion] Quick Question about Optimization
James Snyder wrote:
>> Well, if you do f = a[n, :], you would get a view, another object that
>> shares the data in memory with a but is a separate object.
>
> OK, so it is a new object, with the properties of the slice it
> references, but if I write anything to it, it will consistently go
> back to the same spot in the original array.
>
> In general, if I work on that, and don't do something that allocates a
> new set of memory locations for that, it will reference the same
> memory location.
>
> If I do:
>
> a = np.zeros((20,30))
> b = a[2,:]
>
> b += 1 # will add 1 to the original slice
>
> b.resize will fail...
>
> b = np.zeros((1,30)) # allocates new memory and disconnects the view
>
> The appropriate way to zero out the original memory locations would be
> to do something like b *= 0?

or, from fastest to slowest:

b.fill(0)
b.flat = 0
b[:] = 0

Eric
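All three of Eric's variants write zeros into the existing memory, so the view stays attached to the original array (a minimal check, not part of the original message):

```python
import numpy as np

a = np.ones((20, 30))
b = a[2, :]          # view on row 2

b.fill(0)            # fastest: fills the viewed memory in place
b.flat = 0           # also writes in place
b[:] = 0             # slice assignment: in place as well

# b never stopped being a view, so the original row is now zero
print(a[2].sum())    # 0.0
```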
Re: [Numpy-discussion] Quick Question about Optimization
>> Well, if you do f = a[n, :], you would get a view, another object that
>> shares the data in memory with a but is a separate object.

OK, so it is a new object, with the properties of the slice it references, but if I write anything to it, it will consistently go back to the same spot in the original array.

In general, if I work on that, and don't do something that allocates a new set of memory locations for that, it will reference the same memory location.

If I do:

a = np.zeros((20,30))
b = a[2,:]

b += 1 # will add 1 to the original slice

b.resize will fail...

b = np.zeros((1,30)) # allocates new memory and disconnects the view

The appropriate way to zero out the original memory locations would be to do something like b *= 0?

Is there any way to force a writeback to the original view so long as the dimensions of what is being assigned to b are the same as the original? Or, is there a way to, say, enable a warning if I'm dropping a view?

> Sorry I didn't get back to you earlier on this -- I was a bit busy
> yesterday. It looks like weave.blitz isn't working on your second
> line because you're not explicitly putting slices in some of the
> dimensions. In numpy v[0:2] works for 1, 2, 3, 4 dimensions, but
> for a 2d array in blitz you have to use v[0:2,:], 3d v[0:2,:,:]. It's
> a bit more picky. I think that's the problem with your second line
> -- try replacing v[:] with v[0,:] and theta[1-curidx] with
> theta[1-curidx, :]. (I may have missed some others.)

OK, that seems to do it. I still actually get better performance (subsequent runs after compilation) with the straight numpy code. Strangely, I'm also getting that the flip/flop method is running a bit slower than having the separate prev_ variables.

aff_input is rather large (~2000x14000), but the state vectors are only 14000 (or that x2 w/ flipflopping for some) each. Is there slowdown maybe because it is doing those 3 lines of blitz operations and then doing a bunch of python numpy?
Either way, it seems like I've got pretty good performance as well as a handle on using weave.blitz in the future.

-jsnyder

--
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]
PGP: http://fanplastic.org/key.txt
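On the "force a writeback" question above: there is no way to re-attach a name once it has been rebound, but assigning through a slice of the view (b[:] = ...) writes into the original array, so the usual idiom is slice assignment rather than rebinding (a small sketch, not from the thread):

```python
import numpy as np

a = np.zeros((20, 30))
b = a[2, :]                          # view on row 2

new_values = np.arange(30, dtype=float)

b[:] = new_values    # writes through the view: a's row 2 changes
print(a[2, 5])       # 5.0

b = new_values       # rebinds the name; from here on, writes to b
                     # no longer reach a
```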
Re: [Numpy-discussion] Quick Question about Optimization
> The time has now been shaved down to ~9 seconds with this suggestion
> from the original 13-14s, with the inclusion of Eric Firing's
> suggestions. This is without scipy.weave, which at the moment I can't
> get to work for all lines, and when I just replace one of them
> successfully it seems to run more slowly, I assume because it is
> converting data back and forth.

There's a nontrivial constant time cost to using scipy.weave.blitz, but it isn't copying the data. Thus it will slow you down on smaller arrays, but you'll notice quite a speed-up on much larger ones. I should have mentioned that earlier -- I assumed your arrays were really large.

> Are there any major pitfalls to be aware of? It sounds like if I do
> f = a[n,:] I get a reference, but if I did something like g = a[n,:]*2
> I would get a copy.

Well, if you do f = a[n, :], you would get a view, another object that shares the data in memory with a but is a separate object.

> If anyone has any clues on why scipy.weave is blowing up
> (http://pastebin.com/m79699c04) using the code I attached, I wouldn't
> mind knowing. Most of the times I've attempted using weave, I've been
> a little baffled about why things aren't working. I also don't have a
> sense for whether all numpy functions should be "weavable" or if it's
> just general array operations that can be put through weave.

Sorry I didn't get back to you earlier on this -- I was a bit busy yesterday. It looks like weave.blitz isn't working on your second line because you're not explicitly putting slices in some of the dimensions. In numpy v[0:2] works for 1, 2, 3, 4 dimensions, but for a 2d array in blitz you have to use v[0:2,:], 3d v[0:2,:,:]. It's a bit more picky. I think that's the problem with your second line -- try replacing v[:] with v[0,:] and theta[1-curidx] with theta[1-curidx, :]. (I may have missed some others.)

weave.blitz is currently limited to just array operations... it doesn't really support the numpy functions.
Hope that helps a little -- Hoyt

--
+++ Hoyt Koepke
+++ UBC Department of Computer Science
+++ http://www.cs.ubc.ca/~hoytak/
+++ [EMAIL PROTECTED]
Re: [Numpy-discussion] Quick Question about Optimization
I've done a little profiling with cProfile as well as with dtrace, since the bindings exist in Mac OS X and you can use a lot of the d scripts that apply to python. Previously I've found that the np.random call and the where (in the original code) were heavy hitters as far as amount of time consumed.

The time has now been shaved down to ~9 seconds with this suggestion from the original 13-14s, with the inclusion of Eric Firing's suggestions. This is without scipy.weave, which at the moment I can't get to work for all lines, and when I just replace one of them successfully it seems to run more slowly, I assume because it is converting data back and forth.

Quick question regarding the pointer abstraction that's going on. The following seems to work:

np.putmask(S[n,:], np.squeeze(mask), 1)

with that section of S being worked on. Is it safe to assume in most cases while working with NumPy that without additional operations, aside from indexing, a reference rather than a copy is being passed?

It certainly seems like this sort of thing, including stuff like:

u = self.u
v = self.v
theta = self.theta
...

without having to repack those data into self later, since u, v, theta are just references to the existing data, saves on code and whatnot, but I'm a little worried about not being explicit.

Are there any major pitfalls to be aware of? It sounds like if I do f = a[n,:] I get a reference, but if I did something like g = a[n,:]*2 I would get a copy.

Thanks guys. This is definitely useful; especially in combination with using PyPar on my dual core system I'm getting pretty good performance :-)

If anyone has any clues on why scipy.weave is blowing up (http://pastebin.com/m79699c04) using the code I attached, I wouldn't mind knowing. Most of the times I've attempted using weave, I've been a little baffled about why things aren't working.
I also don't have a sense for whether all numpy functions should be "weavable" or if it's just general array operations that can be put through weave. I know this is the numpy list, so I can take things over to the scipy list if that's more appropriate.

On Mon, May 19, 2008 at 7:36 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
> On Mon, May 19, 2008 at 6:55 PM, James Snyder <[EMAIL PROTECTED]> wrote:
>> Also note, I'm not asking to match MATLAB performance. It'd be nice,
>> but again I'm just trying to put together decent, fairly efficient
>> numpy code.
>
> I can cut the time by about a quarter by just using the boolean mask
> directly instead of using where().
>
>     for n in range(0,time_milliseconds):
>         u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
>         v = u + sigma * stdnormrvs[n, :]
>         theta = expfac_theta * prev_theta - (1-expfac_theta)
>
>         mask = (v >= theta)
>
>         S[n,np.squeeze(mask)] = 1
>         theta[mask] += b
>
>         prev_u = u
>         prev_theta = theta
>
> There aren't any good line-by-line profiling tools in Python, but you
> can fake it by making a local function for each line:
>
>     def f1():
>         u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
>         return u
>     def f2():
>         v = u + sigma * stdnormrvs[n, :]
>         return v
>     def f3():
>         theta = expfac_theta * prev_theta - (1-expfac_theta)
>         return theta
>     def f4():
>         mask = (v >= theta)
>         return mask
>     def f5():
>         S[n,np.squeeze(mask)] = 1
>     def f6():
>         theta[mask] += b
>
>     # Run Standard, Unoptimized Model
>     for n in range(0,time_milliseconds):
>         u = f1()
>         v = f2()
>         theta = f3()
>         mask = f4()
>         f5()
>         f6()
>
>         prev_u = u
>         prev_theta = theta
>
> I get f6() as being the biggest bottleneck, followed by the general
> time spent in the loop (about the same), followed by f5(), f1(), and
> f3() (each about half of f6()), followed by f2() (about half of f5()).
> f4() is negligible.
>
> Masked operations are inherently slow. They mess up CPU's branch
> prediction.
> Worse, the use of iterators in that part of the code
> frustrates compilers' attempts to optimize that away in the case of
> contiguous arrays.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco

--
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]
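The view-versus-copy rule James asks about in the message above can be checked directly: basic slicing returns a view that shares the parent's memory, while arithmetic on a slice allocates a fresh array (a sketch using np.shares_memory, which is not from the thread):

```python
import numpy as np

a = np.zeros((20, 30))
n = 2

f = a[n, :]          # basic slice: a view onto a's buffer
g = a[n, :] * 2      # arithmetic result: a brand-new array

print(np.shares_memory(a, f))   # True  -- writes to f reach a
print(np.shares_memory(a, g))   # False -- g is an independent copy
```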
Re: [Numpy-discussion] Quick Question about Optimization
Robert Kern wrote:
> On Mon, May 19, 2008 at 6:55 PM, James Snyder <[EMAIL PROTECTED]> wrote:
>> Also note, I'm not asking to match MATLAB performance. It'd be nice,
>> but again I'm just trying to put together decent, fairly efficient
>> numpy code.
>
> I can cut the time by about a quarter by just using the boolean mask
> directly instead of using where().
>
>     for n in range(0,time_milliseconds):
>         u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
>         v = u + sigma * stdnormrvs[n, :]
>         theta = expfac_theta * prev_theta - (1-expfac_theta)
>
>         mask = (v >= theta)
>
>         S[n,np.squeeze(mask)] = 1
>         theta[mask] += b
>
>         prev_u = u
>         prev_theta = theta
>
> There aren't any good line-by-line profiling tools in Python, but you
> can fake it by making a local function for each line:
>
>     def f1():
>         u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
>         return u
>     def f2():
>         v = u + sigma * stdnormrvs[n, :]
>         return v
>     def f3():
>         theta = expfac_theta * prev_theta - (1-expfac_theta)
>         return theta
>     def f4():
>         mask = (v >= theta)
>         return mask
>     def f5():
>         S[n,np.squeeze(mask)] = 1
>     def f6():
>         theta[mask] += b
>
>     # Run Standard, Unoptimized Model
>     for n in range(0,time_milliseconds):
>         u = f1()
>         v = f2()
>         theta = f3()
>         mask = f4()
>         f5()
>         f6()
>
>         prev_u = u
>         prev_theta = theta
>
> I get f6() as being the biggest bottleneck, followed by the general
> time spent in the loop (about the same), followed by f5(), f1(), and
> f3() (each about half of f6()), followed by f2() (about half of f5()).
> f4() is negligible.
>
> Masked operations are inherently slow. They mess up CPU's branch
> prediction. Worse, the use of iterators in that part of the code
> frustrates compilers' attempts to optimize that away in the case of
> contiguous arrays.
f6 can be sped up more than a factor of 2 by using putmask:

In [10]: xx = np.random.rand(10)
In [11]: mask = xx > 0.5
In [12]: timeit xx[mask] += 2.34
100 loops, best of 3: 4.06 ms per loop
In [14]: timeit np.putmask(xx, mask, xx+2.34)
1000 loops, best of 3: 1.4 ms per loop

I think that xx += 2.34*mask will be similarly quick, but I can't get ipython timeit to work with it.

Eric
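Whatever their relative speeds (the timings above are from 2008-era numpy and will differ today), the three formulations Eric mentions all compute the same result, which is easy to verify (a small sketch, not part of the original message; the seed is arbitrary):

```python
import numpy as np

rng = np.random.RandomState(0)          # fixed seed for reproducibility
xx = rng.rand(10)
mask = xx > 0.5

a = xx.copy(); a[mask] += 2.34                  # fancy-index in-place add
b = xx.copy(); np.putmask(b, mask, xx + 2.34)   # putmask variant
c = xx.copy(); c += 2.34 * mask                 # mask-as-float trick

print(np.allclose(a, b) and np.allclose(a, c))  # True
```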
Re: [Numpy-discussion] Quick Question about Optimization
On Mon, May 19, 2008 at 7:30 PM, Charles R Harris <[EMAIL PROTECTED]> wrote:
> On Mon, May 19, 2008 at 5:52 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>> On Mon, May 19, 2008 at 6:39 PM, Charles R Harris
>> <[EMAIL PROTECTED]> wrote:
>>> On Mon, May 19, 2008 at 4:36 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>>>> On Mon, May 19, 2008 at 5:27 PM, Charles R Harris
>>>> <[EMAIL PROTECTED]> wrote:
>>>>> The latest versions of Matlab use the ziggurat method to generate
>>>>> random normals and it is faster than the method used in numpy. I
>>>>> have ziggurat code at hand, but IIRC, Robert doesn't trust the
>>>>> method ;)
>>>>
>>>> Well, I outlined the tests that would satisfy me, but I don't think
>>>> you ever responded.
>>>>
>>>> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
>>>> which references
>>>> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html
>>>
>>> It's been tested in the literature.
>>
>> And it happened to fail such tests. Hence the Doornik paper which
>> improves Marsaglia's method to pass the appropriate tests.
>
> Exactly. Doornik was more careful about using independent samples and also
> used a better random number generator (MWC8222), not exactly rocket
> science. Believe it or not, I had read Doornik's paper before I did my
> implementation.

Good! I believe this is the first time you've mentioned it.

> I also used a better ziggurat, IMHO, than Marsaglia and
> Doornik.

Ah. HOs need testing. dieharder is probably de rigueur these days, and it will accept input from a file.

http://www.phy.duke.edu/~rgb/General/dieharder.php

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
Re: [Numpy-discussion] Quick Question about Optimization
On Mon, May 19, 2008 at 6:55 PM, James Snyder <[EMAIL PROTECTED]> wrote:
> Also note, I'm not asking to match MATLAB performance. It'd be nice,
> but again I'm just trying to put together decent, fairly efficient
> numpy code.

I can cut the time by about a quarter by just using the boolean mask directly instead of using where().

    for n in range(0,time_milliseconds):
        u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
        v = u + sigma * stdnormrvs[n, :]
        theta = expfac_theta * prev_theta - (1-expfac_theta)

        mask = (v >= theta)

        S[n,np.squeeze(mask)] = 1
        theta[mask] += b

        prev_u = u
        prev_theta = theta

There aren't any good line-by-line profiling tools in Python, but you can fake it by making a local function for each line:

    def f1():
        u = expfac_m * prev_u + (1-expfac_m) * aff_input[n,:]
        return u
    def f2():
        v = u + sigma * stdnormrvs[n, :]
        return v
    def f3():
        theta = expfac_theta * prev_theta - (1-expfac_theta)
        return theta
    def f4():
        mask = (v >= theta)
        return mask
    def f5():
        S[n,np.squeeze(mask)] = 1
    def f6():
        theta[mask] += b

    # Run Standard, Unoptimized Model
    for n in range(0,time_milliseconds):
        u = f1()
        v = f2()
        theta = f3()
        mask = f4()
        f5()
        f6()

        prev_u = u
        prev_theta = theta

I get f6() as being the biggest bottleneck, followed by the general time spent in the loop (about the same), followed by f5(), f1(), and f3() (each about half of f6()), followed by f2() (about half of f5()). f4() is negligible.

Masked operations are inherently slow. They mess up CPU's branch prediction. Worse, the use of iterators in that part of the code frustrates compilers' attempts to optimize that away in the case of contiguous arrays.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
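Robert's masked-update loop runs as-is on a small self-contained setup; the array sizes and parameter values below are made up for illustration (the real run was ~2000 x 14000), and the state vectors here are 1-D, so the np.squeeze from the original (1, naff)-shaped version is not needed:

```python
import numpy as np

rng = np.random.RandomState(42)

time_milliseconds, naff = 50, 100        # toy sizes, not the real ones
expfac_m, expfac_theta = 0.6, 0.95       # illustrative constants
sigma, b = 0.04, 0.09

aff_input = rng.rand(time_milliseconds, naff)
stdnormrvs = rng.standard_normal(size=(time_milliseconds, naff))
S = np.zeros((time_milliseconds, naff), dtype=np.uint8)

prev_u = np.zeros(naff)
prev_theta = 0.053 + 0.040 * rng.standard_normal(naff)

for n in range(time_milliseconds):
    u = expfac_m * prev_u + (1 - expfac_m) * aff_input[n, :]
    v = u + sigma * stdnormrvs[n, :]
    theta = expfac_theta * prev_theta - (1 - expfac_theta)

    mask = (v >= theta)      # boolean mask replaces where()
    S[n, mask] = 1           # record spikes for this timestep
    theta[mask] += b         # raise threshold where a spike occurred

    prev_u = u
    prev_theta = theta

print(S.shape, S.dtype)
```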
Re: [Numpy-discussion] Quick Question about Optimization
On Mon, May 19, 2008 at 5:52 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
> On Mon, May 19, 2008 at 6:39 PM, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
>> On Mon, May 19, 2008 at 4:36 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>>> On Mon, May 19, 2008 at 5:27 PM, Charles R Harris
>>> <[EMAIL PROTECTED]> wrote:
>>>> The latest versions of Matlab use the ziggurat method to generate
>>>> random normals and it is faster than the method used in numpy. I
>>>> have ziggurat code at hand, but IIRC, Robert doesn't trust the
>>>> method ;)
>>>
>>> Well, I outlined the tests that would satisfy me, but I don't think
>>> you ever responded.
>>>
>>> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
>>> which references
>>> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html
>>
>> It's been tested in the literature.
>
> And it happened to fail such tests. Hence the Doornik paper which
> improves Marsaglia's method to pass the appropriate tests.

Exactly. Doornik was more careful about using independent samples and also used a better random number generator (MWC8222), not exactly rocket science. Believe it or not, I had read Doornik's paper before I did my implementation. I also used a better ziggurat, IMHO, than Marsaglia and Doornik.

MWC8222 is also about twice as fast on AMD hardware as the Mersenne Twister, but does require more careful initialization. On Pentium V they are about the same. I haven't benchmarked either on Core2.

Chuck
Re: [Numpy-discussion] Quick Question about Optimization
On to the code: here's a current implementation, attached. I make no claims about it being great code. I've modified it so that there is a weave version and a sans-weave version. Many of the suggestions make things a bit faster.

The weave version bombs out with a rather long log, which can be found at: http://pastebin.com/m79699c04

I can tell it's failing for the second weave.blitz line, but I don't understand why exactly. What does this mean?

error: no match for call to '(blitz::FastArrayIterator) (const blitz::TinyVector&)'

Also note, I'm not asking to match MATLAB performance. It'd be nice, but again I'm just trying to put together decent, fairly efficient numpy code.

On Mon, May 19, 2008 at 3:53 PM, Christopher Barker <[EMAIL PROTECTED]> wrote:
> Anne Archibald wrote:
>> 2008/5/19 James Snyder <[EMAIL PROTECTED]>:
>>> I can provide the rest of the code if needed, but it's basically just
>>> filling some vectors with random and empty data and initializing a few
>>> things.
>>
>> It would kind of help, since it would make it clearer what's a scalar
>> and what's an array, and what the dimensions of the various arrays
>> are.
>
> It would also help if you provided a complete example (as little code as
> possible), so we could try out and time our ideas before suggesting them.
>
>>> np.random.standard_normal(size=(1,self.naff))
>
> Anyone know how fast this is compared to Matlab? That could be the
> difference right there.
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959 voice
> 7600 Sand Point Way NE   (206) 526-6329 fax
> Seattle, WA 98115        (206) 526-6317 main reception
>
> [EMAIL PROTECTED]

--
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]

import os.path as path
import sys
import time

import numpy as np
import scipy.weave as wv

class AfferentData(object):
    """Compute afferent response to transdermal potentials"""
    def __init__(self, b=0.09, sigma=0.04, tau_m=2,
                 aff_input=np.zeros((2000,13857)), runnum=1):
        super(AfferentData, self).__init__()

        # Data + dimensions
        self.aff_input = aff_input
        self.endtime = aff_input.shape[0]
        self.naff = aff_input.shape[1]
        self.runnum = runnum

        # Main Params
        self.b = b
        self.sigma = sigma
        self.tau_m = tau_m

        rnd_exp = -np.log(np.random.uniform(size=(1,self.naff)))
        # Implement approach in the future to hold over previous rates
        self.tau_theta = 21 + 18 * rnd_exp
        self.expfac_m = np.exp(-1/self.tau_m)
        self.expfac_theta = np.exp(-1/self.tau_theta)

        # Set initial states
        self.prev_u = np.zeros((1,self.naff))
        prev_rand = np.random.standard_normal(size=(1,self.naff))
        self.prev_theta = 0.053 + 0.040*prev_rand

        # Pre-allocation
        self.u = np.empty((1,self.naff))
        self.v = np.empty((1,self.naff))
        self.theta = np.empty((1,self.naff))
        self.S = np.uint8(np.zeros((self.endtime, self.naff)))

    def gen_spikes(self, runtype="numpy"):
        """generate all spikes for available transdermal voltages"""
        self.runtype = runtype
        print "Starting Run: %d" % self.runnum
        t1 = time.time()
        self.gen_spikes_to_time(self.endtime)
        t2 = time.time()
        self.runtime = t2 - t1
        print 'Run %d took %1.3f s' % ((self.runnum, self.runtime))

    def gen_spikes_to_time(self, time_milliseconds):
        """generate spikes up until a particular time index"""
        # Unpack Data
        # I suppose this could be programmatic with locals() or something else?
        u = self.u
        v = self.v
        theta = self.theta
        S = self.S
        aff_input = self.aff_input
        prev_u = self.prev_u
        prev_theta = self.prev_theta
        expfac_theta = self.expfac_theta
        expfac_m = float(self.expfac_m)
        sigma = self.sigma
        naff = self.naff
        b = self.b

        stdnormrvs = np.random.standard_normal(size=(time_milliseconds, naff))

        if (self.runtype == "weave"):
            u = np.empty((2, naff))
            v = np.empty((1, naff))
            theta = np.empty((2, naff))
            curidx = int(0)
            for n in xrange(time_milliseconds):
                wv.blitz("u[curidx, :] = expfac_m * u[1-curidx, :] + (1-expfac_m) * aff_input[n,:]")
                wv.blitz("v[:] = u[curidx, :] + sigma * stdnormrvs[n, :]")
                wv.blitz("theta[curidx, :] = expfac_theta * theta[1-curidx] - (1-expfac_theta)")
                idx_spk = np.where(v >= theta)
Re: [Numpy-discussion] Quick Question about Optimization
On Mon, May 19, 2008 at 6:39 PM, Charles R Harris <[EMAIL PROTECTED]> wrote:
> On Mon, May 19, 2008 at 4:36 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>> On Mon, May 19, 2008 at 5:27 PM, Charles R Harris
>> <[EMAIL PROTECTED]> wrote:
>>> The latest versions of Matlab use the ziggurat method to generate random
>>> normals and it is faster than the method used in numpy. I have ziggurat
>>> code at hand, but IIRC, Robert doesn't trust the method ;)
>>
>> Well, I outlined the tests that would satisfy me, but I don't think
>> you ever responded.
>>
>> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
>> which references
>> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html
>
> It's been tested in the literature.

And it happened to fail such tests. Hence the Doornik paper which improves Marsaglia's method to pass the appropriate tests. Consequently, I want to see the tests performed on the actual implementation before using it. It's a complicated algorithm that is *demonstrably* easy to get wrong.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
Re: [Numpy-discussion] Quick Question about Optimization
On Mon, May 19, 2008 at 4:36 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
> On Mon, May 19, 2008 at 5:27 PM, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
>> The latest versions of Matlab use the ziggurat method to generate random
>> normals and it is faster than the method used in numpy. I have ziggurat
>> code at hand, but IIRC, Robert doesn't trust the method ;)
>
> Well, I outlined the tests that would satisfy me, but I don't think
> you ever responded.
>
> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
> which references
> http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html

It's been tested in the literature. It's basically just sampling a Gaussian, but done so that most of the tests for being under the curve are trivial and there are few misses, i.e., the Gaussian is covered with a stack (ziggurat) of slices of equal areas; a slice is randomly chosen, then the position along the slice is randomly chosen. Most of those last points will be under the curve except at the ends, and it is only those last that require computation.

However, like all sampling it has to be carefully implemented, and the samples are discretized differently than for the current way. Floats are strange that way because they are on a log scale. The tails will be fine; the real question is how much precision you want when doubles are returned, i.e., how fine the discretization of the resulting samples should be. The same method also works for the exponential distribution.

I don't feel this is a pressing issue; when I need fast normals I use my own code. But if we are in competition with Matlab maybe we should give it a shot.

Chuck
Re: [Numpy-discussion] Quick Question about Optimization
Separating the response into 2 emails, here's the aspect that comes from implementations of random. In short, that's part of the difference. I ran these a few times to check for consistency.

MATLAB (R2008a):

tic
for i = 1:2000
    a = randn(1,13857);
end
toc

Runtime: ~0.733489 s

NumPy (1.1.0rc1):

import numpy as np
import time

t1 = time.time()
for n in xrange(0,2000):
    a = np.random.standard_normal(size=(1,14000))
t2 = time.time()
print 'Runtime: %1.3f s' % ((t2-t1))

Runtime: ~2.716 s

On Mon, May 19, 2008 at 3:53 PM, Christopher Barker <[EMAIL PROTECTED]> wrote:
> Anne Archibald wrote:
>> 2008/5/19 James Snyder <[EMAIL PROTECTED]>:
>>> I can provide the rest of the code if needed, but it's basically just
>>> filling some vectors with random and empty data and initializing a few
>>> things.
>>
>> It would kind of help, since it would make it clearer what's a scalar
>> and what's an array, and what the dimensions of the various arrays
>> are.
>
> It would also help if you provided a complete example (as little code as
> possible), so we could try out and time our ideas before suggesting them.
>
>>> np.random.standard_normal(size=(1,self.naff))
>
> Anyone know how fast this is compared to Matlab? That could be the
> difference right there.
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959 voice
> 7600 Sand Point Way NE   (206) 526-6329 fax
> Seattle, WA 98115        (206) 526-6317 main reception
>
> [EMAIL PROTECTED]

--
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]
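The same measurement can be scripted with the timeit module, which repeats the loop for you and avoids the usual wall-clock pitfalls (a sketch; the draw count is scaled down from the 2000 in the thread, and absolute numbers will differ by machine and numpy version):

```python
import timeit
import numpy as np

def draw():
    # one batch of standard normals, shaped like the thread's state vectors
    return np.random.standard_normal(size=(1, 14000))

# total wall time for 200 draws
t = timeit.timeit(draw, number=200)
print("200 draws of 1x14000 normals: %.3f s" % t)
```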
Re: [Numpy-discussion] Quick Question about Optimization
On Mon, May 19, 2008 at 5:27 PM, Charles R Harris <[EMAIL PROTECTED]> wrote:
> The latest versions of Matlab use the ziggurat method to generate random
> normals and it is faster than the method used in numpy. I have ziggurat
> code at hand, but IIRC, Robert doesn't trust the method ;)

Well, I outlined the tests that would satisfy me, but I don't think you ever responded.

http://projects.scipy.org/pipermail/scipy-dev/2005-December/004405.html
which references
http://projects.scipy.org/pipermail/scipy-dev/2005-December/004400.html

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
Re: [Numpy-discussion] Quick Question about Optimization
On Mon, May 19, 2008 at 2:53 PM, Christopher Barker <[EMAIL PROTECTED]> wrote:
> Anne Archibald wrote:
>> 2008/5/19 James Snyder <[EMAIL PROTECTED]>:
>>> I can provide the rest of the code if needed, but it's basically just
>>> filling some vectors with random and empty data and initializing a few
>>> things.
>>
>> It would kind of help, since it would make it clearer what's a scalar
>> and what's an array, and what the dimensions of the various arrays
>> are.
>
> It would also help if you provided a complete example (as little code as
> possible), so we could try out and time our ideas before suggesting them.
>
>>> np.random.standard_normal(size=(1,self.naff))
>
> Anyone know how fast this is compared to Matlab? That could be the
> difference right there.

The latest versions of Matlab use the ziggurat method to generate random normals and it is faster than the method used in numpy. I have ziggurat code at hand, but IIRC, Robert doesn't trust the method ;) I don't know if it would actually speed things up, though.

Chuck
Re: [Numpy-discussion] Quick Question about Optimization
Anne Archibald wrote:
> 2008/5/19 James Snyder <[EMAIL PROTECTED]>:
>> I can provide the rest of the code if needed, but it's basically just
>> filling some vectors with random and empty data and initializing a few
>> things.
>
> It would kind of help, since it would make it clearer what's a scalar
> and what's an array, and what the dimensions of the various arrays
> are.

It would also help if you provided a complete example (as little code as possible), so we could try out and time our ideas before suggesting them.

>> np.random.standard_normal(size=(1,self.naff))

Anyone know how fast this is compared to Matlab? That could be the difference right there.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

[EMAIL PROTECTED]
Re: [Numpy-discussion] Quick Question about Optimization
On Mon, May 19, 2008 at 12:53 PM, Robin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I think my understanding is somehow incomplete... It's not clear to me
> why (simplified case)
>
> a[curidx,:] = scalar * a[2-curidx,:]
>
> should be faster than
>
> a = scalar * b
>
> In both cases I thought the scalar multiplication results in a new
> array (new memory allocated), and then the difference between copying
> that result into the existing array u[curidx,:] or reassigning the
> reference u to that result should be very similar?
>
> If anything I would have thought the direct assignment would be
> quicker, since then there is no copying.
>
> What am I missing?

Actually, I think you are correct. My bad. I was mainly thinking in terms of weave.blitz, where it would make a difference, then translating back...

--Hoyt

+++ Hoyt Koepke
+++ UBC Department of Computer Science
+++ http://www.cs.ubc.ca/~hoytak/
+++ [EMAIL PROTECTED]
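Robin's point can be pushed one step further: both plain forms allocate a temporary for the product; the only way to skip that allocation in pure numpy is to write the result directly into the target buffer via the out= argument of the ufunc (a sketch, not from the thread):

```python
import numpy as np

scalar = 0.5
a = np.ones((2, 30))
curidx = 0

# 1) temporary allocated for scalar * row, then copied into the existing row
a[curidx, :] = scalar * a[1 - curidx, :]

# 2) "a = scalar * b" would allocate the same temporary and merely
#    rebind the name -- no copy, but no reuse of a's memory either

# 3) no temporary at all: multiply writes straight into row curidx
np.multiply(a[1 - curidx, :], scalar, out=a[curidx, :])

print(a[curidx, 0])   # 0.5
```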
Re: [Numpy-discussion] Quick Question about Optimization
Robin wrote:
> Also you could use xrange instead of range...
>
> Again, not sure of the size of the effect but it seems to be
> recommended by the docstring.

No, it is going away in Python 3.0, and its only real benefit is a
memory saving in extreme cases. From the Python library docs:

"The advantage of xrange() over range() is minimal (since xrange()
still has to create the values when asked for them) except when a very
large range is used on a memory-starved machine or when all of the
range's elements are never used (such as when the loop is usually
terminated with break)."

Eric
Re: [Numpy-discussion] Quick Question about Optimization
2008/5/19 James Snyder <[EMAIL PROTECTED]>:
> First off, I know that optimization is evil, and I should make sure
> that everything works as expected prior to bothering with squeezing
> out extra performance, but the situation is that this particular block
> of code works, but it is about half as fast with numpy as in matlab,
> and I'm wondering if there's a better approach than what I'm doing.
>
> I have a chunk of code, below, that generally iterates over 2000
> iterations, and the vectors that are being worked on at a given step
> generally have ~14000 elements in them.

With arrays this size, I wouldn't worry about python overhead - things
like range versus xrange or self lookups.

> Is there anything in practice here that could be done to speed this
> up? I'm looking more for general numpy usage tips that I can use
> while writing further code, and not things that would be obscure or
> difficult to maintain in the future.

Try using a profiler to find which steps are using most of your time.
With such a simple function it may not be very informative, but it's
worth a try.

> Also, the results of this are a binary array. I'm wondering if there's
> anything more compact for expressing this than using 8 bits to
> represent each single bit. I've poked around, but I haven't come up
> with any clean and unhackish ideas :-)

There's a tradeoff between compactness and speed here. The *fastest*
is probably one boolean per 32-bit integer. It sounds awful, I know,
but most modern CPUs have to work harder to access bytes individually
than they do to access them four at a time. On the other hand, cache
performance can make a huge difference, so compactness might actually
amount to speed. I don't think numpy has a packed bit array data type
(which is a shame, but would require substantial implementation
effort).

> I can provide the rest of the code if needed, but it's basically just
> filling some vectors with random and empty data and initializing a
> few things.
It would kind of help, since it would make it clearer what's a scalar
and what's an array, and what the dimensions of the various arrays
are.

> for n in range(0,time_milliseconds):
>     self.u = self.expfac_m * self.prev_u +
>         (1-self.expfac_m) * self.aff_input[n,:]
>     self.v = self.u + self.sigma *
>         np.random.standard_normal(size=(1,self.naff))

You can use "scale" to rescale the random numbers on creation; that'll
save you a temporary.

>     self.theta = self.expfac_theta * self.prev_theta -
>         (1-self.expfac_theta)
>
>     idx_spk = np.where(self.v>=self.theta)

You can probably skip the "where"; the result of the expression
self.v>=self.theta is a boolean array, which you can use directly for
indexing.

>     self.S[n,idx_spk] = 1
>     self.theta[idx_spk] = self.theta[idx_spk] + self.b

+= here might speed things up, not just in terms of temporaries but by
saving a fancy-indexing operation.

>     self.prev_u = self.u
>     self.prev_theta = self.theta

Anne
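Anne's three suggestions can be sketched on stand-in data (the names
mirror James's code but the sizes and values here are invented; note
that the scale argument belongs to np.random.normal rather than
standard_normal):

```python
import numpy as np

naff = 8
sigma, b = 0.5, 0.05
u = np.zeros(naff)
theta = np.full(naff, 0.1)
S = np.zeros(naff)

# scale= draws sigma-scaled normals directly, saving the
# "sigma * standard_normal()" temporary
v = u + np.random.normal(scale=sigma, size=naff)

# the comparison already yields a boolean mask; no np.where needed
spk = v >= theta
S[spk] = 1

# += updates in place: one fancy-indexing pass instead of two
theta[spk] += b
assert np.allclose(theta[spk], 0.1 + b)
```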
Re: [Numpy-discussion] Quick Question about Optimization
Hi,

I think my understanding is somehow incomplete... It's not clear to me
why (simplified case)

a[curidx,:] = scalar * a[2-curidx,:]

should be faster than

a = scalar * b

In both cases I thought the scalar multiplication results in a new
array (new memory allocated), and then the difference between copying
that result into the existing array u[curidx,:] or rebinding the
reference u to that result should be very similar?

If anything I would have thought the direct assignment would be
quicker, since then there is no copying.

What am I missing?

> This should give you a substantial speedup. Also, I have to say that
> this is begging for weave.blitz, which compiles such statements using
> templated c++ code to avoid temporaries. It doesn't work on all
> systems, but if it does in your case, here's what your code might
> look like:

If you haven't seen it, this page gives useful examples of methods to
speed up python code (including weave.blitz), which as Hoyt says would
be ideal in this case:

http://scipy.org/PerformancePython

Robin
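Robin's reading is right: both forms allocate a temporary for the
product. The one way to actually skip the temporary in pure numpy
(short of weave.blitz) is to pass the target as a ufunc out= argument,
which writes the result straight into the existing row. A sketch with
made-up shapes:

```python
import numpy as np

a = np.zeros((3, 5))
b = np.arange(5.0)
scalar = 2.0

# Both of these allocate a temporary for scalar * b:
a[0, :] = scalar * b     # ...then copies it into a's row
c = scalar * b           # ...then just binds the name c to it

# To genuinely avoid the temporary, write directly into the target row:
np.multiply(b, scalar, out=a[1, :])
assert np.array_equal(a[0, :], a[1, :])
```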
Re: [Numpy-discussion] Quick Question about Optimization
> for n in range(0,time_milliseconds):
>     self.u = self.expfac_m * self.prev_u +
>         (1-self.expfac_m) * self.aff_input[n,:]
>     self.v = self.u + self.sigma *
>         np.random.standard_normal(size=(1,self.naff))
>     self.theta = self.expfac_theta * self.prev_theta -
>         (1-self.expfac_theta)
>
>     idx_spk = np.where(self.v>=self.theta)
>
>     self.S[n,idx_spk] = 1
>     self.theta[idx_spk] = self.theta[idx_spk] + self.b
>
>     self.prev_u = self.u
>     self.prev_theta = self.theta

Copying elements into array objects that already exist will always be
faster than creating a new object with separate data. However, in this
case, you don't need to do any copying or creation if you use a
flip-flopping index to keep track of the previous values. If I drop
the selfs, you can translate the above into (untested):

curidx = 0  # prev will be 2-curidx
u = np.empty((2, naff))
v = np.empty(naff)
theta = np.empty((2, naff))
stdnormrvs = np.random.standard_normal(size=(time_milliseconds, naff))

for n in xrange(time_milliseconds):
    u[curidx, :] = expfac_m * u[2-curidx, :] + (1-expfac_m) * aff_input[n, :]
    v[:] = u[curidx, :] + sigma * stdnormrvs[n, :]
    theta[curidx, :] = expfac_theta * theta[2-curidx, :] - (1-expfac_theta)

    idx_spk = np.where(v >= theta[curidx, :])
    S[n, idx_spk] = 1
    theta[curidx, idx_spk] += b

    # Flip to handle the previous values
    curidx = 2 - curidx

This should give you a substantial speedup. Also, I have to say that
this is begging for weave.blitz, which compiles such statements using
templated c++ code to avoid temporaries.
It doesn't work on all systems, but if it does in your case, here's
what your code might look like:

import scipy.weave as wv

curidx = 0
u = np.empty((2, naff))
v = np.empty((1, naff))
theta = np.empty((2, naff))
stdnormrvs = np.random.standard_normal(size=(time_milliseconds, naff))

for n in xrange(time_milliseconds):
    wv.blitz("u[curidx, :] = expfac_m * u[2-curidx, :] + (1-expfac_m) * aff_input[n, :]")
    wv.blitz("v[:] = u[curidx, :] + sigma * stdnormrvs[n, :]")
    wv.blitz("theta[curidx, :] = expfac_theta * theta[2-curidx, :] - (1-expfac_theta)")

    idx_spk = np.where(v >= theta[curidx, :])
    S[n, idx_spk] = 1
    theta[curidx, idx_spk] += b

    # Flip to handle the previous values
    curidx = 2 - curidx

--
+++
Hoyt Koepke
UBC Department of Computer Science
http://www.cs.ubc.ca/~hoytak/
[EMAIL PROTECTED]
+++

On Mon, May 19, 2008 at 11:08 AM, James Snyder <[EMAIL PROTECTED]> wrote:
> Hi -
>
> First off, I know that optimization is evil, and I should make sure
> that everything works as expected prior to bothering with squeezing
> out extra performance, but the situation is that this particular block
> of code works, but it is about half as fast with numpy as in matlab,
> and I'm wondering if there's a better approach than what I'm doing.
>
> I have a chunk of code, below, that generally iterates over 2000
> iterations, and the vectors that are being worked on at a given step
> generally have ~14000 elements in them.
>
> In matlab, doing pretty much exactly the same thing takes about 6-7
> seconds, always around 13-14 with numpy on the same machine. I've
> gotten this on Linux & Mac OS X.
>
> self.aff_input has the bulk of the data in it (2000x14000 array), and
> the various steps are for computing the state of some afferent neurons
> (if there's any interest, here's a paper that includes the model:
> Brandman, R. and Nelson ME (2002) A simple model of long-term spike
> train regularization. Neural Computation 14, 1575-1597.)
>
> I've imported numpy as np.
>
> Is there anything in practice here that could be done to speed this
> up? I'm looking more for general numpy usage tips that I can use
> while writing further code, and not things that would be obscure or
> difficult to maintain in the future.
>
> Also, the results of this are a binary array. I'm wondering if there's
> anything more compact for expressing this than using 8 bits to
> represent each single bit. I've poked around, but I haven't come up
> with any clean and unhackish ideas :-)
>
> Thanks!
>
> I can provide the rest of the code if needed, but it's basically just
> filling some vectors with random and empty data and initializing a
> few things.
>
> for n in range(0,time_milliseconds):
>     self.u = self.expfac_m * self.prev_u +
>         (1-self.expfac_m) * self.aff_input[n,:]
>     self.v = self.u + self.sigma *
>         np.random.standard_normal(size=(1,self.naff))
>     self.theta = self.expfac_theta * self.prev_theta -
>         (1-self.expfac_theta)
>
>     idx_spk = np.where(self.v>=self.theta)
>
>     self.S[n,idx_spk] = 1
>     self.theta[idx_spk] = self.theta[idx_spk] + self.b
>
>     self.prev_u = self.u
>     self.prev_theta = self.theta
>
> --
> James Snyder
> Biomedical Engineering
> Northwestern University
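On the side question quoted above about storing the binary spike array
more compactly than one byte per bit: numpy has no packed-bit dtype,
but np.packbits/np.unpackbits can round-trip a boolean array through
one-bit-per-element uint8 storage, at the cost of unpacking before
use. A sketch with illustrative shapes:

```python
import numpy as np

spikes = np.random.rand(4, 16) > 0.8     # boolean raster, 1 byte per element
packed = np.packbits(spikes, axis=1)     # uint8, 1 bit per element
assert packed.shape == (4, 2)            # 16 bits -> 2 bytes per row

# Unpack to recover the boolean array for computation
restored = np.unpackbits(packed, axis=1).astype(bool)
assert np.array_equal(restored, spikes)
```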
Re: [Numpy-discussion] Quick Question about Optimization
Also you could use xrange instead of range...

Again, not sure of the size of the effect but it seems to be
recommended by the docstring.

Robin
Re: [Numpy-discussion] Quick Question about Optimization
On Mon, May 19, 2008 at 7:08 PM, James Snyder <[EMAIL PROTECTED]> wrote:
>
> for n in range(0,time_milliseconds):
>     self.u = self.expfac_m * self.prev_u +
>         (1-self.expfac_m) * self.aff_input[n,:]
>     self.v = self.u + self.sigma *
>         np.random.standard_normal(size=(1,self.naff))
>     self.theta = self.expfac_theta * self.prev_theta -
>         (1-self.expfac_theta)
>
>     idx_spk = np.where(self.v>=self.theta)
>
>     self.S[n,idx_spk] = 1
>     self.theta[idx_spk] = self.theta[idx_spk] + self.b
>
>     self.prev_u = self.u
>     self.prev_theta = self.theta

Hello,

The only thoughts I had were that, depending on how 'vectorised'
random.standard_normal is, it might be better to calculate a big block
of random data outside the loop and index into it in the same way as
the aff_input. I'm not certain the indexing would be faster than the
function call, but it could be worth a try if you have enough memory.

The other thing is there are a lot of self.'s there. I don't have a
lot of practical experience, but I've read
(http://wiki.python.org/moin/PythonSpeed/PerformanceTips#head-aa6c07c46a630a2fa10bd6502510e532806f1f62)
that .-based attribute lookups are slower than local variables, so
another thing to try would be to rebind everything to a local variable
outside the loop:

u = self.u
v = self.v

etc., which although a bit unsightly actually can make the inner loop
more readable, and might speed things up.

The only other thing is to be careful with things like this when
translating from matlab:

> self.prev_u = self.u

since this is a reference, not a copy of the data. This is OK here
because when you recreate u as a product it creates a new object, but
if you changed u in another way, e.g.

self.u[:100] = 10

then self.prev_u would still be pointing to the same array and would
also reflect those changes. In this case it doesn't look like you
explicitly need the prev_ values, so it's possible you could do the v
and theta updates in place (although I'm not sure if that's quicker):

u *= expfac_m
u += (1-expfac_m)*aff_input...

etc.
Of course you can also take the (1-)'s outside of the loop, although
again I'm not sure how much difference it would make.

So sorry I can't give any concrete advice, but I hope I've given some
ideas...

Cheers

Robin
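Robin's two main suggestions (rebinding attributes to locals, and
updating in place) combine as in this sketch; the class, sizes, and
constants are made up for illustration, loosely following James's
names:

```python
import numpy as np

class Neurons:
    def __init__(self, naff=8):
        self.expfac_m = 0.9
        self.u = np.zeros(naff)

    def step(self, aff_row):
        # Hoist attribute lookups out of the hot path into locals...
        u = self.u
        expfac_m = self.expfac_m
        # ...and update in place, reusing u's buffer (no new allocation,
        # so self.u still points at the same, now-updated, array)
        u *= expfac_m
        u += (1 - expfac_m) * aff_row

n = Neurons()
n.step(np.ones(8))
assert np.allclose(n.u, 0.1)   # 0.9*0 + 0.1*1
```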
[Numpy-discussion] Quick Question about Optimization
Hi -

First off, I know that optimization is evil, and I should make sure
that everything works as expected prior to bothering with squeezing
out extra performance, but the situation is that this particular block
of code works, but it is about half as fast with numpy as in matlab,
and I'm wondering if there's a better approach than what I'm doing.

I have a chunk of code, below, that generally iterates over 2000
iterations, and the vectors that are being worked on at a given step
generally have ~14000 elements in them.

In matlab, doing pretty much exactly the same thing takes about 6-7
seconds, always around 13-14 with numpy on the same machine. I've
gotten this on Linux & Mac OS X.

self.aff_input has the bulk of the data in it (2000x14000 array), and
the various steps are for computing the state of some afferent neurons
(if there's any interest, here's a paper that includes the model:
Brandman, R. and Nelson ME (2002) A simple model of long-term spike
train regularization. Neural Computation 14, 1575-1597.)

I've imported numpy as np.

Is there anything in practice here that could be done to speed this
up? I'm looking more for general numpy usage tips that I can use while
writing further code, and not things that would be obscure or
difficult to maintain in the future.

Also, the results of this are a binary array. I'm wondering if there's
anything more compact for expressing this than using 8 bits to
represent each single bit. I've poked around, but I haven't come up
with any clean and unhackish ideas :-)

Thanks!

I can provide the rest of the code if needed, but it's basically just
filling some vectors with random and empty data and initializing a few
things.
for n in range(0, time_milliseconds):
    self.u = self.expfac_m * self.prev_u + \
        (1-self.expfac_m) * self.aff_input[n, :]
    self.v = self.u + self.sigma * \
        np.random.standard_normal(size=(1, self.naff))
    self.theta = self.expfac_theta * self.prev_theta - \
        (1-self.expfac_theta)

    idx_spk = np.where(self.v >= self.theta)

    self.S[n, idx_spk] = 1
    self.theta[idx_spk] = self.theta[idx_spk] + self.b

    self.prev_u = self.u
    self.prev_theta = self.theta

--
James Snyder
Biomedical Engineering
Northwestern University
[EMAIL PROTECTED]