Re: [Numpy-discussion] Broadcasting question
2008/5/2 Chris.Barker [EMAIL PROTECTED]:
> Hi all, I have an n X m X 3 array, and I have an n X m array. I want to
> assign the values in the n X m array to all three of the slices in the
> bigger array:
>
>     A1 = np.zeros((5,4,3))
>     A2 = np.ones((5,4))
>     A1[:,:,0] = A2
>     A1[:,:,1] = A2
>     A1[:,:,2] = A2
>
> However, it seems I should be able to broadcast that, so I don't have
> to repeat myself, but I can't figure out how.

All you need is a newaxis on A2:

    In [2]: A = np.zeros((3,4))

    In [3]: B = np.ones(3)

    In [4]: A[:,:] = B[:,np.newaxis]

    In [5]: A
    Out[5]:
    array([[ 1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.]])

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] FW: HELP: PROBLEMS WITH VPYTHON, NUMPY AND PY2EXE PART 2
Could anyone help me generate an executable that works for bounce.py with
py2exe? Thank you.

bounce.py:

    from visual import *
    from numpy import *
    ball = sphere(pos=(0,4,0), color=color.red)
    example = zeros((10,10), dtype=float)
    print(example)

setup.py:

    from distutils.core import setup
    import py2exe
    opts = {'py2exe': {'packages': ['numarray']}}
    setup(windows=[{'script': 'bounce.py'}], options=opts)
Re: [Numpy-discussion] Broadcasting question
Charles R Harris wrote:
> A1 += A2[:,:,newaxis] is one way.

Exactly what I was looking for -- thanks Charles (and Anne)

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception

[EMAIL PROTECTED]
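The thread's two spellings of the same trick can be checked side by side; a minimal self-contained sketch (shapes taken from the original post):

```python
import numpy as np

A1 = np.zeros((5, 4, 3))
A2 = np.ones((5, 4))

# A trailing newaxis turns A2 into shape (5, 4, 1), which broadcasts
# across the last axis of A1 on assignment.
A1[:, :, :] = A2[:, :, np.newaxis]

# The += form from this message behaves the same when A1 starts as zeros.
B1 = np.zeros((5, 4, 3))
B1 += A2[:, :, np.newaxis]
```

Either way, one statement replaces the three per-slice assignments.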
Re: [Numpy-discussion] a possible way to implement a plugin system
On 5/1/08, David Cournapeau [EMAIL PROTECTED] wrote:
> On Wed, 2008-04-30 at 16:44 -0300, Lisandro Dalcin wrote:
>> David, in order to make clear what I was proposing to you in a
>> previous mail regarding implementing a plugin system for numpy, please
>> take a look at the attached tarball.
>
> Thanks for looking at this Lisandro. The problem I see with the
> approach of a struct vs direct function pointers is ABI compatibility:
> it is easy to mess things up with structures.

That's a very valid point.

> There is the advantage of using only one dlsym (or equivalent) with the
> struct, which may be much faster than using hundreds of dlsym calls,
> one for each function. Linux does not seem to have problems with that,
> but Mac OS X for example felt slow when I tried doing this for several
> thousand functions. I did not go really far on that, though. I don't
> really see why using a struct is cleaner, though. That's really the
> same thing (function pointers); in both cases the name namespace
> pollution will be the same, and in both cases there will be a need to
> generate source code.

Perhaps cleaner is not the right word. I actually believe it is far more
portable, given the oddities of dlopening on different platforms.

> Concerning the loading mechanism, I don't understand the point of using
> PyCObject_Import. By quickly looking at the code of the function, the
> plugin needs to be a python module in that case, which is not needed in
> our case; I don't like the fact that it is not documented either.

Well, that is the same mechanism NumPy uses for exporting its C API to
extension modules. Look at the generated header '__multiarray_api.h', in
the function '_import_array()', in your numpy installation. This is the
standard, portable, and (as the Python docs say) recommended way to expose
C APIs from extension modules.

> The code for dlopen/dlclose is really short. For each platform, it is
> like 50 lines of code, and we have better control over what we can do
> (which may be needed; for example, you may want to load symbols from a
> dll A, but the symbols are only in dll B, which dll A depends on;
> that's a problem that does not happen for python extensions, I think; I
> don't really know, to be honest).

Please note that all my concerns about recommending you not to use dlopen
are just to save you from future headaches! See row 17 of the table at
http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html . And that table
does not cover stuff like RTLD_GLOBAL flags to dlopen (or equivalent).

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
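As a rough illustration of the per-symbol lookup being discussed, here is a ctypes sketch. This is not the plugin API proposed in the tarball: ctypes merely stands in for raw dlopen/dlsym, and it assumes a POSIX system, where loading with a NULL name exposes the symbols already linked into the process (including libc).

```python
import ctypes

# CDLL(None) is dlopen(NULL) underneath: a handle to the running
# process, through which libc symbols are visible on POSIX systems.
libc = ctypes.CDLL(None)

# Each attribute access below is one dlsym()-style lookup. The struct
# approach under discussion would resolve a single symbol and read
# hundreds of function pointers out of it instead.
strlen = libc.strlen
strlen.restype = ctypes.c_size_t
strlen.argtypes = [ctypes.c_char_p]
```

With hundreds of exported functions, the per-symbol pattern repeats this lookup once per function, which is the cost the one-dlsym struct avoids.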
[Numpy-discussion] Combining Sigmoid Curves
When I last visited I was given excellent advice about Gaussian and other
bell-shaped curves. Upon further reflection I realized that the Gaussian
curves will not do; the curve does need to have y=0.0 at each end. I tried
to apply a Beta distribution, but I cannot correlate the alpha and beta
parameters with the curve characteristics that I have.

What will work (I call it a pi curve) is a matched pair of sigmoid curves,
the ascending curve on the left and the descending curve on the right.
Using the Boltzmann function for these I can calculate and plot each
individually, but I'm having difficulty in appending the x and y values
across the entire range. This is where I would appreciate your assistance.

The curve parameters passed to the function are the left end, right end,
and midpoint. The inflection point (where y = 0.5) is half-way between the
ends and the midpoint. What I have in the function is this:

    def piCurve(ll, lr, mid):
        flexL = (mid - ll)/2.0
        flexR = (lr - mid)/2.0
        tau = (mid - ll)/10.0
        x = []
        y = []
        xL = nx.arange(ll,mid,0.1)
        for i in xL:
            x.append(xL[i])
        yL = 1.0 / (1.0 + nx.exp(-(xL-flexL)/tau))
        for j in yL:
            y.append(yL[j])
        xR = nx.arange(mid,lr,0.1)
        for i in xR:
            x.append(xR[i])
        yR = 1 - (1.0 / (1.0 + nx.exp(-(xR-flexR)/tau)))
        for j in yR:
            y.append(yR[j])
        appData.plotX = x
        appData.plotY = y

Python complains about adding to the list:

    yL = 1.0 / (1.0 + nx.exp(-(x-flexL)/tau))
    TypeError: unsupported operand type(s) for -: 'list' and 'float'

What is the appropriate way to generate two sigmoid curves so that the x
values range from the left end to the right end and the y values rise from
0.0 to 1.0 at the midpoint, then lower to 0.0 again?

Rich
Re: [Numpy-discussion] Combining Sigmoid Curves
2008/5/2 Rich Shepard [EMAIL PROTECTED]:
> [original question about joining an ascending and a descending
> Boltzmann sigmoid into a single 'pi curve' snipped]
How about multiplying two Boltzmann terms together, a la:

    f(x) = 1/(1+exp(-(x-flex1)/tau1)) * 1/(1+exp((x-flex2)/tau2))

You'll find that if your two flexion points get too close together, the
peak will drop below the maximum for each individual curve, but the
transition will be continuous.

Angus.

--
AJC McMorland, PhD candidate
Physiology, University of Auckland
(Nearly) post-doctoral research fellow
Neurobiology, University of Pittsburgh
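A runnable sketch of the product-of-Boltzmann-terms suggestion above; the flexion points match the 25/75 values mentioned later in the thread, but the tau values are purely illustrative (the thread does not give them):

```python
import numpy as np

def pi_curve(x, flex1, flex2, tau1, tau2):
    # Ascending Boltzmann term times a descending one.
    rise = 1.0 / (1.0 + np.exp(-(x - flex1) / tau1))
    fall = 1.0 / (1.0 + np.exp((x - flex2) / tau2))
    return rise * fall

x = np.linspace(0.0, 100.0, 1001)
y = pi_curve(x, 25.0, 75.0, 5.0, 5.0)
# y is near zero at both ends and peaks near x = 50; the peak height
# depends on how far apart the flexion points are relative to tau.
```

Shrinking the gap between flex1 and flex2 (or growing tau) pulls the peak below 1.0, which is the behavior Angus warns about.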
Re: [Numpy-discussion] Combining Sigmoid Curves
On Fri, 2 May 2008, Angus McMorland wrote:
> How about multiplying two Boltzmann terms together, a la:
>
>     f(x) = 1/(1+exp(-(x-flex1)/tau1)) * 1/(1+exp((x-flex2)/tau2))
>
> You'll find if your two flexion points get too close together, the peak
> will drop below the maximum for each individual curve, but the
> transition will be continuous.

Angus,

With an x range from 0.0-100.0 (and the flexion points at 25.0 and 75.0),
the above formula provides a nice bell-shaped curve from x=0.0 to x=50.0,
and a maximum y of only 0.25 rather than 2.0.

Modifying the above so the second term is subtracted from 1 before the
multiplication, or by negating the exponent in the second term, yields
only the first half: the ascending 'S' curve from 0-50.

Thanks,

Rich

--
Richard B. Shepard, Ph.D.        | Integrity - Credibility - Innovation
Applied Ecosystem Services, Inc.
http://www.appl-ecosys.com       Voice: 503-667-4517  Fax: 503-667-8863
Re: [Numpy-discussion] Combining Sigmoid Curves
Rich,

This could use some serious vectorization/numpyification! Poke around the
scipy wiki and whatever other tutorials you can find -- you'll be glad you
did. A hint: when you are writing a loop like:

    for i in xL:
        x.append(xL[i])

you should be doing array operations!

Specifics:

    xL = nx.arange(ll,mid,0.1)  # you've now created an array of your X values
    for i in xL:
        x.append(xL[i])  # this dumps them into a list - why not keep the array?
    # if you really need a list (you don't here), you can use: xL.tolist()

That's why you get this error:

    TypeError: unsupported operand type(s) for -: 'list' and 'float'

you're trying to do array operations on a list.

Also, you probably want np.linspace rather than arange; it's a better
option for floats. Here is my version:

    import numpy as np

    ll, lr, mid = (-10, 10, 0)

    flexL = (mid - ll)/2.0
    flexR = (lr - mid)/2.0
    tau = (mid - ll)/10.0

    xL = np.linspace(ll, mid, 5)
    yL = 1.0 / (1.0 + np.exp(-(xL-flexL)/tau))
    print xL
    print yL

    xR = np.linspace(mid, lr, 5)
    yR = 1 - (1.0 / (1.0 + np.exp(-(xR-flexR)/tau)))
    print xR
    print yR

    # now put them together:
    x = np.hstack((xL, xR[1:]))  # don't want to duplicate the midpoint
    y = np.hstack((yL, yR[1:]))
    print x
    print y

Though it doesn't look like the numbers are right. Also, you don't need to
create the separate left and right arrays and put them together; slicing
gives a view, so you could do:

    numpoints = 11  # should be an odd number to get the midpoint
    x = linspace(ll, lr, numpoints)
    y = zeros_like(x)
    xL = x[:numpoints/2]
    yL = y[:numpoints/2]
    yL[:] = 1.0 / (1.0 + np.exp(-(xL-flexL)/tau))

You could also use something like:

    np.where(x < 0, ...)

left as an exercise for the reader...

-Chris
Re: [Numpy-discussion] Combining Sigmoid Curves
On Fri, 2 May 2008, Christopher Barker wrote:
> This could use some serious vectorization/numpyification! Poke around
> the scipy wiki and whatever other tutorials you can find -- you'll be
> glad you did. A hint: when you are writing a loop like:
>
>     for i in xL:
>         x.append(xL[i])
>
> you should be doing array operations!

Chris,

Good suggestions. I need to append the two numpy arrays so I return all x
values in a single list and all y values in another list. I'll read the
numpy book again.

> That's why you get this error:
>
>     TypeError: unsupported operand type(s) for -: 'list' and 'float'
>
> you're trying to do array operations on a list.

Ah, so!

> Also, you probably want np.linspace rather than arange; it's a better
> option for floats.

arange() has worked just fine in other functions.

>     # now put them together:
>     x = np.hstack((xL, xR[1:]))  # don't want to duplicate the midpoint
>     y = np.hstack((yL, yR[1:]))
>
> Also, you don't need to create the separate left and right arrays and
> put them together; slicing gives a view, so you could do:

I assume that there's a way to mate the two curves in a single equation.
I've not been able to find what that is.

Thanks,

Rich
[Numpy-discussion] [IT] Maintenance and scheduled downtime this evening and weekend
Hi everyone,

This evening and this weekend, we will be doing a major overhaul of
Enthought's internal network infrastructure. We will be cleaning up a
large amount of legacy structure and transitioning to a more maintainable,
better documented configuration.

We have planned the work so that externally-facing servers will experience
a minimum of downtime. In the event of unforeseen difficulties that cause
outages to extend beyond the times given below, we will update the network
status page, located at http://dr0.enthought.com/status/ . This page will
remain available and be unaffected by any network outages.

Downtimes:

  Friday May 2, 2008, 8pm - 10pm Central time
    (= 9pm - 11pm Eastern time)
    (= 1am - 3am UTC Sat. May 3)

  Saturday May 3, 2008, 10am - 11am Central time
    (= 11am - 12 noon Eastern time)
    (= 3pm - 4pm UTC)

To reach us during the downtime, please use the contact information
provided on the network status page. Please let me know if you have any
questions or concerns. We will send out another email once the outage is
complete.

Thanks for your patience!

-Peter
Re: [Numpy-discussion] Combining Sigmoid Curves
2008/5/2 Rich Shepard [EMAIL PROTECTED]:
> What will work (I call it a pi curve) is a matched pair of sigmoid
> curves, the ascending curve on the left and the descending curve on the
> right. Using the Boltzmann function for these I can calculate and plot
> each individually, but I'm having difficulty in appending the x and y
> values across the entire range. This is where I would appreciate your
> assistance.

It's better not to work point-by-point, appending things, when working
with numpy. Ideally you could find a formula which just produces the right
curve, and then you'd apply it to the input vector and get the output
vector all at once. For example, for a Gaussian (which you're not using, I
know), you'd write a function something like:

    def f(x):
        return np.exp(-x**2)

and then call it like f(np.linspace(-1,1,1000)). This is efficient and
clear. It does not seem to me that the logistic function, 1/(1+np.exp(x)),
does quite what you want. How about using the cosine?

    def f(left, right, x):
        scaled_x = (x-(right+left)/2)/((right-left)/2)
        return (1+np.cos(np.pi * scaled_x))/2

This is exactly zero at both endpoints, exactly one at the midpoint, with
inflection points midway between, where the value is 1/2. If you want to
normalize it so that the area underneath is one, that's easy to do.

More generally, the trick of producing a scaled_x as above lets you move
any function anywhere you like. If you want the peak not to be at the
midpoint of the interval, you will need to do something a little more
clever, perhaps choosing a different function, scaling the x values with a
quadratic polynomial so that (left, middle, right) becomes (-1,0,1), or
using a piecewise function.

Good luck,
Anne
Re: [Numpy-discussion] Combining Sigmoid Curves
On Fri, 2 May 2008, Anne Archibald wrote:
> It's better not to work point-by-point, appending things, when working
> with numpy. Ideally you could find a formula which just produces the
> right curve, and then you'd apply it to the input vector and get the
> output vector all at once.

Anne,

That's been my goal. :-)

> How about using the cosine?
>
>     def f(left, right, x):
>         scaled_x = (x-(right+left)/2)/((right-left)/2)
>         return (1+np.cos(np.pi * scaled_x))/2
>
> This is exactly zero at both endpoints, exactly one at the midpoint,
> with inflection points midway between, where the value is 1/2. ...
> More generally, the trick of producing a scaled_x as above lets you
> move any function anywhere you like.

This looks like a pragmatic solution. When I print scaled_x (using left =
0.0 and right = 100.0), the values range from -1.0 to +0.998. So, I need
to figure out the scaled_x that sets the end points at 0 and 100.

Thanks very much,

Rich
Re: [Numpy-discussion] Combining Sigmoid Curves
2008/5/2 Rich Shepard [EMAIL PROTECTED]:
> This looks like a pragmatic solution. When I print scaled_x (using left
> = 0.0 and right = 100.0), the values range from -1.0 to +0.998. So, I
> need to figure out the scaled_x that sets the end points at 0 and 100.

No, no. You *want* scaled_x to range from -1 to 1. (The 0.998 is because
you didn't include the endpoint, 100.) The function I gave,
(1+np.cos(np.pi * scaled_x))/2, takes [-1, 1] to a nice bump-shaped
function. If you feed in numbers from 0 to 100 as x, they get transformed
to scaled_x, and you feed them to the function, getting a result that goes
from 0 at x=0 to 1 at x=50 to 0 at x=100.

Anne
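The cosine recipe from this exchange, collected into one runnable sketch (the 101-point grid is illustrative; linspace includes the endpoint, which is the point of the 0.998 remark):

```python
import numpy as np

def bump(left, right, x):
    # Map [left, right] onto [-1, 1], then apply the cosine bump.
    scaled_x = (x - (right + left) / 2.0) / ((right - left) / 2.0)
    return (1.0 + np.cos(np.pi * scaled_x)) / 2.0

x = np.linspace(0.0, 100.0, 101)  # includes the endpoint, unlike arange
y = bump(0.0, 100.0, x)
# y is 0 at x=0, rises to 1 at x=50, and falls back to 0 at x=100.
```

The scaled_x values themselves still run from -1 to 1; it is y, not scaled_x, that spans the 0-to-100 interval's shape.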
Re: [Numpy-discussion] [SciPy-dev] [IT] Maintenance and scheduled downtime this evening and weekend
Will this affect SVN and Trac access?

Thanks!

Damian

Peter Wang wrote:
> Hi everyone, This evening and this weekend, we will be doing a major
> overhaul of Enthought's internal network infrastructure. We will be
> cleaning up a large amount of legacy structure and transitioning to a
> more maintainable, better documented configuration.
Re: [Numpy-discussion] [SciPy-dev] [IT] Maintenance and scheduled downtime this evening and weekend
On May 2, 2008, at 5:45 PM, Damian Eads wrote:
> Will this affect SVN and Trac access? Thanks! Damian

Yes, this affects svn, trac, web, mailing lists... everything, because we
will be working on the underlying network infrastructure.

-Peter
Re: [Numpy-discussion] Combining Sigmoid Curves
Anne Archibald wrote:
> 2008/5/2 Rich Shepard [EMAIL PROTECTED]:
> No, no. You *want* scaled_x to range from -1 to 1.

Why not just scale to -pi to pi right there?

> (The 0.998 is because you didn't include the endpoint, 100.)

Which is why you want linspace, rather than arange. Really, trust me on
this!

If the cosine curve isn't right for you, a little creative use of np.where
would let you do what you want in maybe another line or two of code.
Something like:

    y = np.where(x < 0, Leftfun(x), Rightfun(x))

or just compute one side and then flip it to make the other side:

    y = np.zeros_like(x)
    y[center:] = fun(x[center:])
    y[:center] = y[center+1:][::-1]  # don't want the center point twice

if it's symmetric anyway.

-Chris
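The compute-one-side-and-flip trick above can be sketched like this; a descending Boltzmann half is used for the right side, and the flex and tau values are illustrative, not from the thread:

```python
import numpy as np

n = 101  # odd, so there is a single midpoint sample
x = np.linspace(0.0, 100.0, n)
center = n // 2

# Descending Boltzmann half to the right of the midpoint...
flex, tau = 75.0, 5.0
y = np.zeros_like(x)
y[center:] = 1.0 / (1.0 + np.exp((x[center:] - flex) / tau))

# ...mirrored to fill in the ascending left half, skipping the
# midpoint sample so it is not duplicated.
y[:center] = y[center + 1:][::-1]
```

With n odd, `y[center + 1:]` has exactly `center` elements, so the mirrored slice fits the left half with no off-by-one.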
Re: [Numpy-discussion] Combining Sigmoid Curves
On Fri, 2 May 2008, Christopher Barker wrote:
> Why not just scale to -pi to pi right there?

Dunno, Chris. As I wrote to Anne (including a couple of files and the
resulting plot), it's been almost three decades since I dealt with the
math underlying distribution functions.

> Which is why you want linspace, rather than arange. Really, trust me on
> this!

I see the difference in the book. I didn't know about linspace(), but
passing endpoint=True brings the end point to 100.0.

> If the cosine curve isn't right for you, a little creative use of
> np.where would let you do what you want in maybe another line or two of
> code. Something like:
>
>     y = np.where(x < 0, Leftfun(x), Rightfun(x))
>
> or just compute one side and then flip it to make the other side:
>
>     y = np.zeros_like(x)
>     y[center:] = fun(x[center:])
>     y[:center] = y[center+1:][::-1]  # don't want the center point twice

I'll look into this over the weekend (after upgrading my machines to
Slackware-12.1). I can get nice sigmoid curves with the Boltzmann
function. This thread started when I asked how to combine the two into a
single curve with one set of x,y points.

Much appreciated,

Rich
[Numpy-discussion] Ruby's NMatrix and NVector
http://narray.rubyforge.org/matrix-e.html

It seems they've implemented some of what Tim is looking for, in
particular. Perhaps there is information to be gleaned from what they are
doing. It looks promising.

-Travis
[Numpy-discussion] Faster
How can I make this function faster? It removes the i-th row and column
from an array.

    def cut(x, i):
        idx = range(x.shape[0])
        idx.remove(i)
        y = x[idx,:]
        y = y[:,idx]
        return y

    import numpy as np
    x = np.random.rand(500,500)

    timeit cut(x, 100)
    100 loops, best of 3: 16.8 ms per loop
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 5:38 PM, Charles R Harris [EMAIL PROTECTED] wrote:
> On Fri, May 2, 2008 at 6:24 PM, Keith Goodman [EMAIL PROTECTED] wrote:
>> How can I make this function faster? It removes the i-th row and
>> column from an array.
>
> Why do you want to do that?

Single linkage clustering; x is the distance matrix.
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 5:47 PM, Keith Goodman [EMAIL PROTECTED] wrote:
> On Fri, May 2, 2008 at 5:38 PM, Charles R Harris wrote:
>> Why do you want to do that?
>
> Single linkage clustering; x is the distance matrix.

Here's the full code if you are interested. I haven't used it yet other
than running test and test2, so it may be full of bugs.

    import time
    import numpy as np

    class Cluster:
        """Single linkage hierarchical clustering"""

        def __init__(self, dist, label=None, debug=False):
            """
            dist   Distance matrix, NxN numpy array
            label  Labels of each row of the distance matrix,
                   list of N items, default is range(N)
            """
            assert dist.shape[0] == dist.shape[1], 'dist must be square (nxn)'
            assert (np.abs(dist - dist.T) < 1e-8).all(), 'dist must be symmetric'
            if label is None:
                label = range(dist.shape[0])
            assert dist.shape[0] == len(label), 'dist and label must match in size'
            self.c = [[[z] for z in label]]
            self.label = label
            self.dist = dist
            self.debug = debug

        def run(self):
            for level in xrange(len(self.label) - 1):
                i, j = self.min_dist()
                self.join(i, j)

        def join(self, i, j):
            assert i != j, 'Cannot combine a cluster with itself'
            # Join labels
            new = list(self.c[-1])
            new[i] = new[i] + new[j]
            new.pop(j)
            self.c.append(new)
            # Join distance matrix
            self.dist[:,i] = self.dist[:,[i,j]].min(1)
            self.dist[i,:] = self.dist[:,i]
            idx = range(self.dist.shape[1])
            idx.remove(j)
            self.dist = self.dist[:,idx]
            self.dist = self.dist[idx,:]
            # Debug output
            if self.debug:
                print
                print len(self.c) - 1
                print 'Clusters'
                print self.c[-1]
                print 'Distance'
                print self.dist

        def min_dist(self):
            dist = self.dist + 1e10 * np.eye(self.dist.shape[0])
            i, j = np.where(dist == dist.min())
            return i[0], j[0]

    def test():
        # Example from
        # home.dei.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html
        label = ['BA', 'FI', 'MI', 'NA', 'RM', 'TO']
        dist = np.array([[  0, 662, 877, 255, 412, 996],
                         [662,   0, 295, 468, 268, 400],
                         [877, 295,   0, 754, 564, 138],
                         [255, 468, 754,   0, 219, 869],
                         [412, 268, 564, 219,   0, 669],
                         [996, 400, 138, 869, 669,   0]])
        clust = Cluster(dist, label, debug=True)
        clust.run()

    def test2(n):
        x = np.random.rand(n,n)
        x = x + x.T
        c = Cluster(x)
        t1 = time.time()
        c.run()
        t2 = time.time()
        print 'n = %d took %0.2f seconds' % (n, t2-t1)
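The class above is Python 2 (print statements, xrange). A compact Python 3 sketch of the same single-linkage join logic, run on the Italian-cities example from the linked tutorial, might look like this (the function name and return value are illustrative, not from the thread):

```python
import numpy as np

def single_linkage(dist, labels):
    """Greedy single-linkage agglomeration; returns the merge history."""
    dist = dist.astype(float).copy()
    clusters = [[name] for name in labels]
    merges = []
    while len(clusters) > 1:
        # Mask the diagonal, then locate the closest pair.
        d = dist + 1e10 * np.eye(dist.shape[0])
        i, j = divmod(d.argmin(), d.shape[0])
        if i > j:
            i, j = j, i
        merges.append((clusters[i], clusters[j]))
        # Merge cluster j into cluster i; single linkage takes the min.
        clusters[i] = clusters[i] + clusters[j]
        clusters.pop(j)
        dist[:, i] = dist[:, [i, j]].min(axis=1)
        dist[i, :] = dist[:, i]
        keep = [k for k in range(dist.shape[0]) if k != j]
        dist = dist[np.ix_(keep, keep)]
    return merges

label = ['BA', 'FI', 'MI', 'NA', 'RM', 'TO']
dist = np.array([[  0, 662, 877, 255, 412, 996],
                 [662,   0, 295, 468, 268, 400],
                 [877, 295,   0, 754, 564, 138],
                 [255, 468, 754,   0, 219, 869],
                 [412, 268, 564, 219,   0, 669],
                 [996, 400, 138, 869, 669,   0]])
merges = single_linkage(dist, label)
# First merge is MI/TO (distance 138), then NA/RM (219), matching
# the tutorial's dendrogram.
```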
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 7:24 PM, Keith Goodman [EMAIL PROTECTED] wrote:
> How can I make this function faster? It removes the i-th row and column
> from an array.
>
>     def cut(x, i):
>         idx = range(x.shape[0])
>         idx.remove(i)
>         y = x[idx,:]
>         y = y[:,idx]
>         return y
>
>     import numpy as np
>     x = np.random.rand(500,500)
>
>     timeit cut(x, 100)
>     100 loops, best of 3: 16.8 ms per loop

I can get a ~20% improvement with the following:

    In [8]: %timeit cut(x, 100)
    10 loops, best of 3: 21.6 ms per loop

    In [9]: def mycut(x, i):
       ...:     A = x[:i,:i]
       ...:     B = x[:i,i+1:]
       ...:     C = x[i+1:,:i]
       ...:     D = x[i+1:,i+1:]
       ...:     return hstack([vstack([A,C]),vstack([B,D])])
       ...:

    In [10]: %timeit mycut(x, 100)
    10 loops, best of 3: 17.3 ms per loop

The hstack(vstack, vstack) seems to be somewhat better than
vstack(hstack, hstack), at least for these sizes.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco
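The two versions can be checked against each other; a small sketch (note that on Python 3 the original `cut` needs `list()` around `range`, since a range object has no `remove`):

```python
import numpy as np

def cut(x, i):
    # the original: fancy indexing on rows, then on columns
    idx = list(range(x.shape[0]))
    idx.remove(i)
    y = x[idx, :]
    return y[:, idx]

def mycut(x, i):
    # the block version: slice the four corner blocks and stack them
    A = x[:i, :i]
    B = x[:i, i+1:]
    C = x[i+1:, :i]
    D = x[i+1:, i+1:]
    return np.hstack([np.vstack([A, C]), np.vstack([B, D])])

x = np.random.rand(10, 10)
```

The block version does two concatenations over contiguous slices instead of two fancy-indexing passes, which is where the speedup comes from.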
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 6:05 PM, Robert Kern [EMAIL PROTECTED] wrote:
> I can get a ~20% improvement with the following:
>
>     In [9]: def mycut(x, i):
>        ...:     A = x[:i,:i]
>        ...:     B = x[:i,i+1:]
>        ...:     C = x[i+1:,:i]
>        ...:     D = x[i+1:,i+1:]
>        ...:     return hstack([vstack([A,C]),vstack([B,D])])
>
> The hstack(vstack, vstack) seems to be somewhat better than
> vstack(hstack, hstack), at least for these sizes.

Wow. I never would have come up with that. And I probably never will.

Original code:

    cluster.test2(500)
    n = 500 took 5.28 seconds

Your improvement:

    cluster.test2(500)
    n = 500 took 3.52 seconds

Much more than a 20% improvement when used in the larger program. Thank
you.
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 7:16 PM, Keith Goodman [EMAIL PROTECTED] wrote:
> [quoting of the cut/mycut exchange snipped]
>
> Much more than a 20% improvement when used in the larger program. Thank
> you.

Isn't the lengthy part finding the distance between clusters? I can think
of several ways to do that, but I think you will get a real speedup by
doing that in c or c++. I have a module made in boost python that holds
clusters and returns a list of lists containing their elements. Clusters
are joined by joining any two elements, one from each. It wouldn't take
much to add a distance function, but you could use the list of indices in
each cluster to pull a subset out of the distance matrix and then find the
minimum in that. This also reminds me of Huffman codes.

Chuck
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 6:29 PM, Charles R Harris [EMAIL PROTECTED] wrote:
> Isn't the lengthy part finding the distance between clusters? I can
> think of several ways to do that, but I think you will get a real
> speedup by doing that in c or c++. ...

You're right. Finding the distance is slow. Is there any way to speed up
the function below? It returns the row and column indices of the min value
of the NxN array x.

    def dist(x):
        x = x + 1e10 * np.eye(x.shape[0])
        i, j = np.where(x == x.min())
        return i[0], j[0]

    x = np.random.rand(500,500)

    timeit dist(x)
    100 loops, best of 3: 14.1 ms per loop

If the clustering gives me useful results, I'll ask you about your boost
code. I'll also take a look at Damian Eads's scipy-cluster.
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 8:02 PM, Keith Goodman [EMAIL PROTECTED] wrote:
> On Fri, May 2, 2008 at 6:29 PM, Charles R Harris [EMAIL PROTECTED] wrote:
>> [...]
>
> You're right. Finding the distance is slow. Is there any way to speed up
> the function below? It returns the row and column indices of the min value
> of the NxN array x.
>
> def dist(x):
>     x = x + 1e10 * np.eye(x.shape[0])

x += diag(ones(x.shape[0]) * 1e10) would be faster.

>     i, j = np.where(x == x.min())
>     return i[0], j[0]

i = x.argmin()
j = i % x.shape[0]
i = i / x.shape[0]

But I wouldn't worry about speed yet if you are just trying things out.

Chuck
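[Editor's note: Chuck's two suggestions, assembled into one runnable function; the `divmod` form is equivalent to his modulo/integer-division pair, and the function name is my own:]

```python
import numpy as np

def dist_argmin(x):
    # Add a large value on the diagonal so an off-diagonal minimum wins.
    n = x.shape[0]
    y = x + np.diag(np.ones(n) * 1e10)
    k = y.argmin()       # index into the flattened array
    return divmod(k, n)  # (row, col)

x = np.array([[0.0, 3.0, 1.0],
              [3.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
i, j = dist_argmin(x)  # -> (0, 2)
```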
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 9:02 PM, Keith Goodman [EMAIL PROTECTED] wrote:
> [...]
>
> You're right. Finding the distance is slow. Is there any way to speed up
> the function below? It returns the row and column indices of the min value
> of the NxN array x.
>
> def dist(x):
>     x = x + 1e10 * np.eye(x.shape[0])
>     i, j = np.where(x == x.min())
>     return i[0], j[0]

Assuming x is contiguous and you can modify x in-place:

In [1]: from numpy import *

In [2]: def dist(x):
   ...:     x = x + 1e10 * eye(x.shape[0])
   ...:     i, j = where(x == x.min())
   ...:     return i[0], j[0]

In [3]: def symmdist(N):
   ...:     x = random.rand(N, N)
   ...:     x = x + x.T
   ...:     x.flat[::N+1] = 0
   ...:     return x

In [4]: symmdist(5)
Out[4]:
array([[ 0.        ,  0.87508654,  1.11691704,  0.80366071,  0.57966808],
       [ 0.87508654,  0.        ,  1.5521685 ,  1.74010886,  0.52156877],
       [ 1.11691704,  1.5521685 ,  0.        ,  1.22725396,  1.04101992],
       [ 0.80366071,  1.74010886,  1.22725396,  0.        ,  1.94577965],
       [ 0.57966808,  0.52156877,  1.04101992,  1.94577965,  0.        ]])

In [5]: def kerndist(x):
   ...:     N = x.shape[0]
   ...:     x.flat[::N+1] = x.max()
   ...:     ij = argmin(x.flat)
   ...:     i, j = divmod(ij, N)
   ...:     return i, j

In [10]: x = symmdist(500)

In [15]: %timeit dist(x)
10 loops, best of 3: 19.9 ms per loop

In [16]: %timeit kerndist(x)
100 loops, best of 3: 4.38 ms per loop

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 8:02 PM, Keith Goodman [EMAIL PROTECTED] wrote:
> [...]
>
> If the clustering gives me useful results, I'll ask you about your boost
> code. I'll also take a look at Damian Eads's scipy-cluster.

That package looks nice. I think your time would be better spent learning how
to use it than in rolling your own routines.

Chuck
Re: [Numpy-discussion] Faster
On Fri, May 2, 2008 at 10:48 PM, Keith Goodman [EMAIL PROTECTED] wrote:
> On Fri, May 2, 2008 at 7:25 PM, Robert Kern [EMAIL PROTECTED] wrote:
>> In [5]: def kerndist(x):
>>    ...:     N = x.shape[0]
>>    ...:     x.flat[::N+1] = x.max()
>>    ...:     ij = argmin(x.flat)
>>    ...:     i, j = divmod(ij, N)
>>    ...:     return i, j
>
> I replaced ij = argmin(x.flat) with x.argmin() (they're the same in this
> context, right?) for a slight speed up. Now I'm down to 1.9 seconds.

Yes, they're the same. I forgot that .argmin() flattened by default.

--
Robert Kern
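[Editor's note: putting the thread's pieces together, the version Keith ends up with would look roughly like this. This is my reconstruction, and, per Robert's caveat, it assumes x is contiguous and may be modified in place:]

```python
import numpy as np

def dist_final(x):
    # x: square distance matrix, overwritten in place.
    n = x.shape[0]
    x.flat[::n + 1] = x.max()  # overwrite the diagonal with the max value
    k = x.argmin()             # .argmin() flattens by default
    return divmod(k, n)        # (row, col) of the off-diagonal minimum

x = np.array([[0.0, 3.0, 1.0],
              [3.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
i, j = dist_final(x)  # -> (0, 2)
```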