Re: [Numpy-discussion] array from list of lists
On 12/11/06, Erin Sheldon <[EMAIL PROTECTED]> wrote:
> Actually, there is a problem with that approach. It first converts
> the entire array to a single type, by default a floating type. For
> very large integers this precision is insufficient. For example, I
> have the following integer in my arrays:
> 94137100072000193L
> which ends up as
> 94137100072000192
> after going to a float and then back to an integer.

That's an unfortunate limitation of numpy; it views double-precision
floats as higher precision than 64-bit integers, but of course they
aren't. If you want to put all your data in a record array, you could
try transposing the lists using a list comprehension - numpy is not
always as much faster than pure python as it looks. You could then
convert that to a list of four arrays and do the assignment as
appropriate. Alternatively, you could convert your array into a
higher-precision floating-point format (if one is available on your
machine) before transposing and storing in a record array.

A. M. Archibald
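A quick demonstration of the precision loss, using the integer from
the message above:

    >>> n = 94137100072000193L
    >>> long(float(n))
    94137100072000192L

A double has a 53-bit significand, so integers above 2**53 (about
9.0e15) cannot all be represented exactly, while a 64-bit integer
type holds them with no trouble.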
Re: [Numpy-discussion] Could numpy.matlib.nanmax return a matrix?
On 12/11/06, Keith Goodman <[EMAIL PROTECTED]> wrote:
> Is anybody interested in making x.max() and nanmax() behave the same
> for matrices, except for the NaN part? That is, make
> numpy.matlib.nanmax return a matrix instead of an array.

Sounds good to me; maybe put a patch in the tracker?

A. M. Archibald
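A minimal sketch of the kind of wrapper being proposed (my own
illustration, not a patch from the tracker, and ignoring corner cases
such as the scalar result for axis=None):

    import numpy as np

    def nanmax(a, axis=None):
        # Delegate the NaN handling to numpy's nanmax, then restore
        # the matrix type for matrix input.
        return np.asmatrix(np.nanmax(np.asarray(a), axis=axis))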
Re: [Numpy-discussion] x.min() depends on ordering
On 11/11/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> I think the problem is that the max and min functions use the first
> value in the array as the starting point. That could be fixed by using
> the first non-nan and returning nan if there aren't any "real" numbers.
> But it probably isn't worth the effort as the behavior becomes more
> complicated. A better rule of thumb is to note that comparisons
> involving nans are basically invalid because nans aren't comparable --
> the comparison violates trichotomy. Don't really know what to do about
> that.

Well, we could get simple consistent behaviour by taking inf as the
initial value for min and -inf as the initial value for max, then
reducing as normal. This would then, depending on how max and min are
implemented, either return NaN if any are present, or return the
smallest/largest non-NaN value (or inf/-inf if there are none).

A. M. Archibald
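A sketch of that proposal in pure python (my illustration; the real
reductions are implemented in C):

    import numpy as np

    def min_with_inf_start(a):
        result = np.inf
        for x in np.asarray(a).ravel():
            # Under IEEE rules "x < result" is False when x is NaN, so
            # this variant skips NaNs and returns the smallest non-NaN
            # value (or inf if there are none). An implementation
            # written as "if not (x >= result)" would instead latch
            # onto the first NaN and return it.
            if x < result:
                result = x
        return result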
Re: [Numpy-discussion] sparse SVD
On 11/11/06, koara <[EMAIL PROTECTED]> wrote:
> Hello, is there a way to do SVD of large sparse matrices efficiently in
> python? All i found was scipy (little sparse support, no SVD), pysparse
> (no SVD) and PROPACK (no python). Out of these, PROPACK seems the most
> enticing choice, but i have no experience porting fortran code and this
> seems too big a bite. Any pointers or suggestions are most welcome!

Numpy includes the program "f2py" for generating python wrappers for
fortran code; it's not too difficult to use, even for a non-fortran
programmer like me. If you do write wrappers, please send them to the
scipy list - it's always good to build up the library.

A. M. Archibald
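The basic f2py workflow, as a sketch (the file and module names here
are made up for illustration):

    # At the shell, compile a Fortran source file into an importable
    # extension module:
    #     f2py -c -m propack_wrapper dlansvd.f
    # Then, from python:
    import propack_wrapper
    # The Fortran routines are now callable as python functions, with
    # generated docstrings describing their argument lists.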
Re: [Numpy-discussion] A reimplementation of MaskedArray
On 09/11/06, Paul Dubois <[EMAIL PROTECTED]> wrote:
> Since the function of old retired persons is to tell youngsters stories
> around the campfire:

I'll pull up a log. But since I'm uppity, and not especially young, I
hope you won't mind if I heckle.

> A supercomputer hardware designer told me that when forced to add IEEE
> arithmetic to his designs that it decreased performance substantially,
> maybe 25-30%; it wasn't that doing the operations took so much longer,
> it was that the increased physical space needed for that circuitry
> pushed the memory farther away. Doubtless this inspired doing some of
> it in software instead.

The goal of IEEE floats is not to be faster but to make doing correct
numerical programming easier. (Numerical analysts can write robust
numerical code for almost any kind of float, but the vast majority of
numerical code is written by scientists who know the bare minimum or
less about numerical analysis, so a good system makes such code work
as well as possible.)

The other reason IEEE floats are good is, no disrespect to your
hardware designer friend intended, that they were designed with some
care. Reimplementations are liable to get some of the finicky details
wrong (round-to-even, denormalized numbers, what have you...). I found
Kahan's "How Java's Floating-point Hurts Everyone Everywhere"
( http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf ) very educational
when it comes to "what can go wrong with platform-dependent
floating-point".

> No standard for controlling the behaviors exists, either, so you can
> find out the hard way that underflow-to-zero is being done in software
> by default, and that you are doing a lot of it. Or that your code
> doesn't have the same behavior on different platforms.

Well, if the platform is IEEE-compliant, you can trust it to behave
the same way logically, even if some applications are slower on some
systems than others. Or did you mean that there's no standard way to
set the various IEEE flags? That seems like a language/compiler issue,
to me (which is not to minimize the pain it causes!)

> To my mind, all that was really accomplished was to convince said
> youngsters that somehow this NaN stuff was the solution to some
> problem. In reality, computing for 3 days and having it print out 1000
> NaNs is not exactly satisfying. I think this stuff was essentially a
> mistake in the sense that it is a nice ivory-tower idea that costs more
> in practice than it is worth. I do not think a properly thought-out and
> coded algorithm ought to do anything that this stuff is supposed to
> 'help' with, and if it does do it, the code should stop executing.

Well, then turn on floating-point exceptions. I find a few NaNs in my
calculations relatively benign. If I'm doing a calculation where
they'll propagate and destroy everything, I trap them. But it happens
just as often that I launch a job, a NaN appears early on, but I come
back in the morning to find that instead of aborting, the job has
completed except for a few bad data points.

If you want an algorithm that takes advantage of NaNs, I have one. I
was simulating light rays around a black hole, generating a map of the
bending for various view angles. So I naturally did a lot of exploring
of grazing incidence, where the calculations would often yield NaNs.
So I used the NaNs to fill in the gap between the internal bending of
light and the external bending of the light; no extra programming
effort required, since that was what came out of the ray tracer
anyway.
The rendering step just returned NaNs for the image pixels that came
from interpolating in the NaN-filled regions, which came out black;
just what I wanted. I just wrote the program ignoring what would
happen if a NaN appeared, and it came out right. Which is, I think,
the goal - to save the programmer having to think too much about
numerical-analysis headaches.

> Anyway, if I thought it would do the job I wouldn't have written MA in
> the first place.

It's certainly not always a good solution; in particular it's hard to
control the behaviour of things like where(). There's also no NaN for
integer types, which makes MaskedArray more useful.

Anyway, I am pleased that numpy has both controllable IEEE floats and
MaskedArray, and I apologize for a basically off-topic post.

A. M. Archibald
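"Turn on floating-point exceptions" can be done from numpy itself; a
minimal sketch:

    import numpy as np

    # Make invalid operations raise instead of silently producing NaN:
    np.seterr(invalid='raise')
    np.sqrt(np.array([-1.0]))    # now raises FloatingPointError

    # Or just warn, if a few NaNs are considered benign:
    np.seterr(invalid='warn')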
Re: [Numpy-discussion] A reimplementation of MaskedArray
On 08/11/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> It has always been my experience (on various flavors of Pentium) that
> operating on NANs is extremely slow. Does anyone know on what hardware
> NANs are *not* slow? Of course it's always possible I just never notice
> NANs on hardware where they aren't slow.

On an opteron machine I have access to, they appear to be no slower
(and even faster for some transcendental functions) than ordinary
floats:

In [13]: a=zeros(100)

In [14]: %time for i in xrange(1000): a += 1.1
CPU times: user 6.87 s, sys: 0.00 s, total: 6.87 s
Wall time: 6.87

In [15]: a *= NaN

In [16]: %time for i in xrange(1000): a += 1.1
CPU times: user 6.86 s, sys: 0.00 s, total: 6.86 s
Wall time: 6.87

On my Pentium M, they are indeed significantly slower (three times? I
didn't really do enough testing to say how much).

I am actually rather offended by this unfair discrimination against a
citizen in good standing of the IEEE floating point community.

A. M. Archibald
Re: [Numpy-discussion] A reimplementation of MaskedArray
On 08/11/06, Pierre GM <[EMAIL PROTECTED]> wrote:
> I like your idea, but not its implementation. If MA.masked_singleton is
> defined as an object, as you suggest, then the dtype of the ndarray it
> is passed to becomes 'object', as you pointed out, and that is not
> something one would naturally expect, as basic numerical functions
> don't work well with the 'object' dtype (just try
> N.sqrt(N.array([1],dtype=N.object)) to see what I mean).
> Even if we can construct a mask rather easily at the creation of the
> masked array, following your 'a==masked' suggestion, we still need to
> get the dtype of the non-masked section, and that doesn't seem
> trivial...

A good candidate for a "should be masked" marker is NaN. It is
supposed to mean, more or less, "no sensible value". Unfortunately,
integer types do not have such a special value. It's also conceivable
that some user might want to keep NaNs in their array separate from
the mask. Finally, on some hardware, operations with NaN are very slow
(so leaving them in the array, even masked, might not be a good idea).

The reason I suggest this is that in the last major application I had
for numpy, one stage of the problem would occasionally result in NaNs
for certain values, but the best thing I could do was leave them in
place to represent "no data". Switching to a MaskedArray might have
been a better idea, but the NaNs were a rare occurrence.

> About the conversion to ndarray:
> By default, the result should have the same dtype as the _data section.
> For this reason, I disagree with your idea of "(returning) an object
> ndarray with the missing value containing the masked singleton". If you
> really want an object ndarray, you can use the filled method or the
> filled function, with your own definition of the filling value (such as
> your MaskedScalar).

If you've got floating point, you can again fill in NaNs, but you have
a good point about wanting to extract the original values that were
masked out. Depending on what one is doing, one might want one or the
other.

A. M. Archibald
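A sketch of building the mask from NaNs at construction time (the
location of the masked-array module has varied between releases, so
the import path here is an assumption):

    import numpy as np
    import numpy.core.ma as ma   # adjust to your numpy version

    a = np.array([1.0, np.nan, 3.0])
    m = ma.masked_where(np.isnan(a), a)   # NaN entries become masked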
Re: [Numpy-discussion] numarray indexing problems with Python2.5 and 64-bit platforms
On 02/11/06, Francesc Altet <[EMAIL PROTECTED]> wrote:
> I see this as a major issue in numarray that poses in great danger the
> intended support of PyTables for numarray that we planned for some time
> (until end of 2007). It would be nice to know if the numarray crew
> would be willing to address this, or, now that NumPy 1.0 is out, they
> have decided to completely drop the support for it.
>
> We'd really like to continue offering support for numarray (in the end,
> it is a very good piece of software) in PyTables, but not having a
> solution for this problem anytime soon will make this very problematic
> to us.

Someone has to say it: you could just drop support for the obsolete
numarray and provide only support for its successor, numpy.

A. M. Archibald
Re: [Numpy-discussion] Compressed index sets, generators and iterators
On 01/11/06, Bill Baxter <[EMAIL PROTECTED]> wrote:
> What's the reason iterators are not supported currently?
> For instance A[range(0,4)] works for a 1d A, but A[xrange(0,4)] does
> not. Are iterators just too inefficient to bother with?

If you want an iterator back, a generator comprehension will do it:

(A[i] for i in xrange(0,4))

If the result is to be a numpy array, the size must be known ahead of
time (so that the memory can be allocated) and specified. At this
point, and considering the overhead in calling back and forth to the
generator, the cost of converting to a list (after all,
A[list(xrange(0,4))] should work fine) isn't necessarily worth the
trouble.

A. M. Archibald
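For the case where the size is known ahead of time, numpy's fromiter
builds a 1-d array from any iterator; a sketch:

    import numpy as np

    # Supplying count lets numpy allocate the result in one shot
    # instead of growing it as the iterator is consumed.
    a = np.fromiter((i*i for i in xrange(100)), dtype=float, count=100)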
Re: [Numpy-discussion] Need more comments from scientific community on python-dev
On 01/11/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> And it may be a good idea to also have a get_ctype method or some-such
> on the ctypes attribute so that one could get a "ctypes description"
> from the NumPy data-type.

It seems to me that at the python level, there's not much reason to
choose the dtypes proposal over ctypes. There is one at the C level,
it seems (though I, like perhaps most of the people on python-dev,
have never actually tried using either). So perhaps, to persuade the
python-dev folks, what is needed is a comparison of what has to be
done at the C level. What would it take to rewrite numpy to use
ctypes? There seems to be some problem with extending the type objects
used by ctypes, but it's not very clear to me what that problem is
(what the extensions are supposed to do).

The argument that *some* datatypes format should become standard is
much easier.

A. M. Archibald
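For concreteness, the same record layout written in both notations
(just an illustration of the two python-level descriptions):

    import numpy as np
    import ctypes

    # numpy dtype description of a two-field record:
    dt = np.dtype([('x', np.float64), ('n', np.int32)])

    # ctypes description of the corresponding C struct:
    class Rec(ctypes.Structure):
        _fields_ = [('x', ctypes.c_double), ('n', ctypes.c_int32)]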
Re: [Numpy-discussion] Strange and hard to reproduce crash
On 30/10/06, Fernando Perez <[EMAIL PROTECTED]> wrote:
> On 10/30/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
> > I suspect the real problem is that the refcount keeps going up. Even
> > if it was unsigned it would eventually wrap to zero and with a bit of
> > luck get garbage collected. So probably something isn't decrementing
> > the refcount.
>
> Oops, my bad: I meant *unsigned long long*, so that the refcount is a
> 64-bit object. By the time it wraps around, you'll have run out of
> memory long ago. Having 32 bit ref counters can potentially mean you
> run out of the counter before you run out of RAM on a system with
> sufficient memory.

Yes, this is a feature(?) of python as it currently stands (I checked
2.5) - reference counts are 32-bit signed integers, so if you have an
object that has enough references, python will be exceedingly unhappy:

http://mail.python.org/pipermail/python-dev/2002-September/028679.html

It is of course possible that you actually have that many references
to some object, but it seems to me you'd notice twenty-four gigabytes
of pointers floating around...

A. M. Archibald
Re: [Numpy-discussion] Sorting eigenvalues and vectors together
On 27/10/06, jeremito <[EMAIL PROTECTED]> wrote:
> I am using a = numpy.linalg.eig(A) to get the eigenvalues and
> eigenvectors. I am interested only in the largest eigenvalue so I
> would like to sort the first element of a. But if I do that, I won't
> know what is the associated eigenvector. Is there a function that will
> sort the values and vectors together or do I need to write it myself.

I think that they are sorted, but argsort() will do what you want (or
argmax(), even).

A. M. Archibald
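A sketch of keeping values and vectors together (assuming A has real
eigenvalues; note the eigenvectors are the *columns* of the returned
array):

    import numpy as np

    vals, vecs = np.linalg.eig(A)
    i = np.argmax(vals)          # index of the largest eigenvalue
    largest_val = vals[i]
    largest_vec = vecs[:, i]     # its associated eigenvector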
Re: [Numpy-discussion] vectorize pitfall
On 25/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> It takes old "type-codes" as one big string so you say
>
> vectorize(f,otypes='d')
>
> This should be modernized to handle a list of dtype objects.
>
> I've fixed vectorize in SVN.

Thanks!

A. M. Archibald
[Numpy-discussion] vectorize pitfall
Hi,

Vectorize is a very handy function, but it has at least one pitfall:

def f(x):
    if 1.3 < x
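The rest of this message was lost when the archive stripped the "<"
comparison as an HTML tag. Judging from the otypes fix in the reply
above, the pitfall it described is most likely the classic one:
vectorize infers its output dtype from the function's value on the
first element. A sketch of that behaviour (my reconstruction, not the
original example):

    import numpy as np

    def f(x):
        if 1.3 < x < 2.5:
            return 2.0
        else:
            return 0             # an int

    vf = np.vectorize(f)
    print vf(np.array([0.0, 2.0]))
    # The output dtype is inferred from f(0.0), which returns the int
    # 0, so the whole result array is integer and 2.0 is truncated to
    # 2 - unless the output type is forced:
    vf = np.vectorize(f, otypes='d')
    print vf(np.array([0.0, 2.0]))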
Re: [Numpy-discussion] memory position of numpy arrays
On 24/10/06, Lars Friedrich <[EMAIL PROTECTED]> wrote:
> I am not really sure about the term "locking". Does that mean that this
> part is not paged, or that this part is not accessed by two entities at
> the same time? Or both?

There are two kinds of locking, and really, you probably want both.
But mlock() just ensures that the virtual memory stays in actual RAM.

> Is my way a common way? I mean, letting python/numpy do the memory
> allocation by creating a numpy-array with zeros in it and passing its
> memory location to the hardware-API?

It's not necessary to do it this way. I think a more usual approach
would be to create the buffer however is convenient in your C code,
then provide its address to numpy. You can then use the ndarray
function from python to tell it how to interpret that buffer as an
array. Since the C code is creating the buffer, you can make sure it
is in a special locked area of memory, ensure that the garbage
collector never comes calling for it, or whatever you like.

If you're having problems with driver stability, though, you may be
safest having your C code copy the buffer into a numpy array in one
shot - then you have complete control over when and how the DMA memory
is accessed. (In C, I'm afraid, but for this sort of thing C is
well-suited.)

A. M. Archibald
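A sketch of wrapping a C-allocated buffer without copying (addr and
nbytes are placeholders for whatever your driver API hands back):

    import numpy as np
    import ctypes

    raw = (ctypes.c_uint8 * nbytes).from_address(addr)
    a = np.frombuffer(raw, dtype=np.uint8)   # a view, not a copy

Note that numpy never owns this memory, so it must not be freed while
the array is alive.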
Re: [Numpy-discussion] Model and experiment fitting.
On 20/10/06, Sebastian Żurek <[EMAIL PROTECTED]> wrote:
> Is there something like that in any numerical python modules (numpy,
> pylab) I could use?

In scipy there are some very convenient spline fitting tools which
will allow you to fit a nice smooth spline through the simulation data
points (or near, if they have some uncertainty); you can then easily
look at the RMS difference in the y values. You can also, less easily,
look at the distance from the curve allowing for some uncertainty in
the x values.

I suppose you could also fit a curve through the experimental points
and compare the two curves in some way.

> I can imagine, I can fit the data with some polynomial or whatever,
> and then compare the fitted data, but my goal is to operate on
> as raw data as it's possible.

If you want to avoid using an a priori model, Numerical Recipes
discuss some possible approaches ("Do two-dimensional distributions
differ?" at http://www.nrbook.com/a/bookcpdf.html is one) but it's not
clear how to turn the problem you describe into a solvable one - some
assumption about how the models vary between sampled x values appears
to be necessary, and that amounts to interpolation.

A. M. Archibald
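A sketch with scipy's spline tools (the variable names are
placeholders for the simulation and experimental samples):

    import numpy as np
    from scipy import interpolate

    # Fit a spline through the simulation points; s=0 interpolates
    # exactly, larger values smooth through noisy data.
    tck = interpolate.splrep(x_sim, y_sim, s=0)

    # Evaluate it at the experimental x positions and compare.
    y_fit = interpolate.splev(x_exp, tck)
    rms = np.sqrt(np.mean((y_fit - y_exp)**2))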
Re: [Numpy-discussion] The NumPy Fortran-ordering quiz
On 18/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> If there are any cases satisfying these rules where a copy does not
> have to occur then let me know.

For example, zeros((4,4))[:,1].reshape((2,2)) need not be copied. I
filed a bug in trac and supplied a patch to multiarray.c that avoids
copies in PyArray_NewShape unless absolutely necessary.

A. M. Archibald
Re: [Numpy-discussion] questions regarding assignement and copy
On 18/10/06, David Cournapeau <[EMAIL PROTECTED]> wrote:
> Sven Schreiber wrote:
> >
> > Yes it's intended; as far as I understand the python/numpy syntax,
> > <+> is an operator, and that triggers assignment by copy (even if you
> > do something trivial as bar = +foo, you get a copy, if I'm not
> > mistaken),
> >
> So basically, whenever you have
>
> foo = expr
>
> with expr is a numpy expression containing foo, you trigger a copy ?

No. "=" never copies, but "+" does: when you do "A+B" you produce a
new, freshly-allocated array, which you can then store (a reference
to) in any variable you like, including A or B. So

M = M+1

makes a copy because "M+1" is a new matrix, which you are assigning to
M. Unfortunately, this means if you do:

M = 2*(M+1)

python makes the new matrix M+1, then makes the new matrix 2*(M+1),
discards M+1, and sets M to point to 2*(M+1). If you want to avoid all
this copying, you can use python's in-place operators:

M += 1
M *= 2

This is actually a standard difficulty people have with python, made
more obvious because you're working with mutable arrays rather than
immutable scalars.

A. M. Archibald
Re: [Numpy-discussion] The NumPy Fortran-ordering quiz
On 18/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
>
> I just committed a change to the check-for-copy code to SVN. A copy
> will occur if an actual reshaping is taking place that involves more
> than just adding or deleting ones from the old shape and if
>
> 1) self is not "single-segment". Single-segment means either
> Fortran-order or C-order. If we don't have a single-segment array, then
> we have to get a (Fortran or C-) contiguous buffer to do any reshaping
> (there may be special cases when it's possible, but I haven't figured
> them out...)

Indeed it is often possible to reshape an array that is not
contiguous; for example zeros(100)[::10] is not contiguous but can be
reshaped to any shape without any copying. More generally, there are
all sorts of peculiar arrangements that can be reshaped without a
copy, for example:

input has lengths (8,2,3) and strides (39,9,3); output should have
lengths (4,4,3,2), so take strides (156,39,6,3)

Whether this sort of thing actually *occurs* very often, I don't
know...

> 2) self is single-segment but self.squeeze().ndim > 1 and it is in the
> "wrong" order from what was requested in the order argument (i.e. self
> is in C-contiguous order but a Fortran-order reshape was requested or
> self is in F-contiguous order but a C-order reshape was requested).
>
> If there are any cases satisfying these rules where a copy does not
> have to occur then let me know.

There are. I came up with a general condition for describing exactly
when you need to copy an array, but it's difficult to put into words.
Here goes:

(0) Dimensions of length 1 are an irrelevant annoyance. Conceptually
they should be squeezed out before beginning and newaxised back in at
the end. (What strides should they have?) Assume that I've done so
from now on. (Actual code can just skip them appropriately.)

(1) If the array looks like an evenly-strided 1D array that has been
reshaped (so strides[i-1]=lengths[i]*strides[i] for all i) it can
always be reshaped without a copy. (For example, zeros(1000)[::10] can
be reshaped arbitrarily and as many times as desired without copying.)

(2) If the array does *not* look like an evenly-strided 1D array that
has been reshaped, you can still reshape it without a copy if it looks
like an array of such arrays, each of which can be reshaped
separately. (For example, you can view an array of lengths (8,2,3)
that you have to reshape to (4,4,3,2) as two separate pieces: an array
of lengths (2,3) that you have to reshape to (3,2) and an array of
lengths (8,) that you need to reshape to (4,4). Each of these pieces
separately needs to look like a reshaped 1d array, but there need be
no relation between them.)

(The condition for when you can pull a smaller reshape operation off
the end is precisely when you can find a tail segment of each lengths
tuple that has the same product; in the case above, 2*3=3*2, so we
could stop and do that reshape separately.)

(3) If the array cannot be reshaped without a copy using the rules
above, it cannot be reshaped without a copy at all.

The python code in my previous message actually implements this rule,
if you want a less vague description.

I should also point out that views are not always more efficient than
copies (because of cache concerns, and because a small view can
prevent a giant block of memory from being deallocated); nevertheless
it should be up to the user to copy when it helps.

A. M. Archibald
Re: [Numpy-discussion] The NumPy Fortran-ordering quiz
On 18/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> On 10/17/06, A. M. Archibald <[EMAIL PROTECTED]> wrote:
> > On 17/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> > > On 10/17/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Thus, reshape does the equivalent of a Fortran ravel to
> > > > [1,4,2,5,3,6] and then a Fortran-order based fill of an empty
> > > > (3,2) array: giving you the result.
> > >
> > > Why a Fortran ravel? I am thinking of it as preserving the memory
> > > layout, just messing with how it is addressed.
> >
> > Because, to first order, the memory layout is totally irrelevant to a
> > python user.
> >
> > array([[1,2,3],[4,5,6]],order='C')
> > array([[1,2,3],[4,5,6]],order='F')
> >
> > should be nearly identical to the python user.
>
> So what is the point of having a fortran layout if things are not
> actually layed out in fortran order? As far as I can see, memory layout
> is the only reason for fortran ordering. Now I can sort of see where
> the fortran ravel comes from, because the result can be passed to a
> fortran routine. And it does look like a.ravel('F') is the same as
> a.T.ravel(). Hmmm. Now I wonder about this:

Things are laid out in Fortran order if you request Fortran order upon
array creation. You just can't see it, normally. Numpy tries hard to
make arrays of all different orders look identical.

Of course, the reason I said "to first order" is that when determining
when to copy and when not to, or when passing arrays to C or Fortran
functions, sometimes it does matter. For example, a typical Fortran
linear algebra routine needs its arrays to be in Fortran order in
memory.

> In [62]: array([[1,2,3],[4,5,6]], dtype=int8, order='F').flags
> Out[62]:
>   CONTIGUOUS : False
>   FORTRAN : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False

Unless you're working with C extensions, it doesn't make much sense to
ever look at flags. The memory layout does not affect normal array
behaviour (including reshape).

A. M. Archibald
Re: [Numpy-discussion] The NumPy Fortran-ordering quiz
On 17/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> Charles R Harris wrote:
> >
> > In [3]: array([[1,2,3,4,5,6]]).reshape((3,2),order='F')
> > Out[3]:
> > array([[1, 4],
> >        [2, 5],
> >        [3, 6]])
> >
> > I also don't understand why a copy is returned if 'F' just fiddles
> > with the indices and strides; the underlying data should be the same,
> > just the view changes. FWIW, I think both examples should be
> > returning views.
>
> You are right, it doesn't need to. My check is not general enough. It
> can be challenging to come up with a general way to differentiate the
> view-vs-copy situation and I struggled with it. In this case, it's the
> fact that while self->nd > 1, the other dimensions are only of shape 1
> and so don't really matter. If you could come up with some kind of
> striding check that would distinguish the two cases, I would
> appreciate it.

I wrote, just for exercise I guess, a routine that checks to see if a
copy is needed for a reshape (and if not, it computes the resulting
strides) for an arbitrary array. (It currently only does a C-order
reshape, but F-order would be an easy addition.) It does, however,
successfully handle arbitrarily strided arrays with arbitrary
arrangements of "1" dimensions having arbitrary strides. I think it
also correctly handles 0 strides.

I don't imagine you'd want to *use* it, but it does provide a complete
solution to copy-less reshaping. I'd be happy to help with the C
implementation built into numpy, but for the life of me I can't figure
out how it works.

A. M. Archibald

P.S. I haven't written down a *proof* that this implementation never
copies unnecessarily, but I'm pretty sure that it doesn't. It has a
reshape_restricted that effectively does a ravel() first; this can't
cope with things like lengths (5,2,3) -> lengths (5,3,2) unless the
strides are conveniently equal, so there's a wrapper, reshape(), that
breaks the reshape operation up into pieces that can be done by
reshape_restricted. -AMA

from numpy import multiply, array, empty, zeros, add, asarray
from itertools import izip

class MustCopy(Exception):
    pass

def index(indices, strides):
    # Flat offset of a multi-index, given the strides.
    indices = asarray(indices)
    strides = asarray(strides)
    if len(indices)!=len(strides):
        raise ValueError
    return add.reduce(indices*strides)

def reshape_restricted(from_strides, from_lengths, to_lengths):
    # Handles the case where the input looks like an evenly-strided
    # 1D array that has been reshaped; returns the new strides or
    # raises MustCopy.
    if multiply.reduce(from_lengths)!=multiply.reduce(to_lengths):
        raise ValueError, "Array sizes are not compatible"
    for i in xrange(1,len(from_strides)):
        # If a dimension has length one, who cares what its stride is?
        if not (from_strides[i-1]==from_strides[i]*from_lengths[i]
                or from_lengths[i]==1):
            raise MustCopy
    r = empty(len(to_lengths), dtype=int)
    j = len(from_lengths)-1
    while j>=0 and from_lengths[j]==1:
        j-=1   # Ignore strides on length-1 dimensions
    if j<0:
        # Both arrays have size 1
        r[-1] = 1
    else:
        r[-1] = from_strides[j]
    i = -1
    while len(to_lengths)+i>0:
        r[i-1]=to_lengths[i]*r[i]
        i -= 1
    return r

def reshape(from_strides, from_lengths, to_lengths):
    # Split the reshape into tail pieces whose sizes match, feeding
    # each piece to reshape_restricted.
    if multiply.reduce(from_lengths)!=multiply.reduce(to_lengths):
        raise ValueError
    r = empty(len(to_lengths), dtype=int)
    j1 = len(from_lengths)
    j2 = len(to_lengths)
    while j1!=0:
        i1 = j1-1
        i2 = j2-1
        p1 = from_lengths[i1]
        p2 = to_lengths[i2]
        while p1!=p2:
            if p1<p2:
                i1-=1
                assert i1>=0
                p1*=from_lengths[i1]
            else:
                i2-=1
                assert i2>=0
                p2*=to_lengths[i2]
        r[i2:j2] = reshape_restricted(from_strides[i1:j1],
                                      from_lengths[i1:j1],
                                      to_lengths[i2:j2])
        j1 = i1
        j2 = i2
    return r

def test_reshape(from_strides, from_lengths, to_strides, to_lengths):
    # Brute-force check: walk every multi-index of both shapes in
    # C order and verify the flat offsets agree.
    if multiply.reduce(from_lengths)!=multiply.reduce(to_lengths):
        raise ValueError
    if (len(from_strides)!=len(from_lengths)
            or len(to_strides)!=len(to_lengths)):
        raise ValueError
    def walk(lengths):
        i = zeros(len(lengths),int)
        while True:
            yield i
            j = len(lengths)-1
            while i[j]==lengths[j]-1:
                i[j]=0
                j-=1
                if j<0:
                    return
            i[j] += 1
    for (i1,i2) in izip(walk(from_lengths), walk(to_lengths)):
        if index(i1,from_strides)!=index(i2,to_strides):
            raise ValueError
Re: [Numpy-discussion] The NumPy Fortran-ordering quiz
On 17/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> On 10/17/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> > Thus, reshape does the equivalent of a Fortran ravel to [1,4,2,5,3,6]
> > and then a Fortran-order based fill of an empty (3,2) array: giving
> > you the result.
>
> Why a Fortran ravel? I am thinking of it as preserving the memory
> layout, just messing with how it is addressed.

Because, to first order, the memory layout is totally irrelevant to a
python user.

array([[1,2,3],[4,5,6]],order='C')
array([[1,2,3],[4,5,6]],order='F')

should be nearly identical to the python user. If the storage order
mattered, you'd have to know, every time you used reshape, what order
your matrix was stored in (did it come from a transpose operation, for
example?).

As for why a Fortran ravel rather than a C ravel, that's a simplifying
decision. If you like, you can think of reshape as happening in two
steps:

o1='C'
o2='C'
A.reshape((6,),order=o1).reshape((2,3),order=o2)

The design could have allowed, as Travis Oliphant says, o1!=o2, but
explaining that would have been quite difficult (even more than now, I
mean). And in any case you can use the code above to achieve that, if
you really want.

A. M. Archibald
Re: [Numpy-discussion] rcond in polyfit
On 14/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> On 10/13/06, A. M. Archibald <[EMAIL PROTECTED]> wrote:
> > On 13/10/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> > > Charles R Harris wrote:
> > >
> > > On the other hand if error handling is set to 'raise', then a
> > > FloatingPointError is issued. Is a FloatingPointWarning in order to
> > > mirror the FloatingPointError? And if so, would it be appropriate
> > > to use for condition number?
> >
> > I submitted a patch to use warnings for several functions in scipy a
> > while ago, and the approach I took was to create a ScipyWarning, from
> > which more specific warnings were derived (IntegrationWarning, for
> > example). That was perhaps a bit short-sighted.
> >
> > I'd suggest a FloatingPointWarning as a base class, with
> > IllConditionedMatrix as a subclass (it should include the condition
> > number, but probably not the matrix itself unless it's small, as
> > debugging information).
>
> Let's pin this down a bit. Numpy seems to need its own warning classes
> so we can control the printing of the warnings. For the polyfit
> function I think it should *always* warn by default because it is
> likely to be used interactively and one can fool around with the degree
> of the fit. For instance, I am now scaling the input x vector so
> norm_inf(x) == 1 and this seems to work pretty well for lots of stuff
> with rcond=-1 (about 1e-16) and a warning on rank reduction. As long as
> the rank stays the same things seem to work ok, up to fits of degree 21
> on the test data that started this conversation.

Numerical Recipes ( http://www.nrbook.com/a/bookcpdf/c15-4.pdf )
recommend setting rcond to the number of data points times machine
epsilon (which of course is different for single/double). We should
definitely warn the user if any singular value is below s[0]*rcond (as
that means that there is effectively a useless basis function, up to
roundoff).

I'm not sure how to control the default warnings setting ("once" vs.
"always"); it's definitely not possible using the standard API to save
the warnings state and restore it later. One might be able to push
such a change into the warnings module by including it in a
ContextManager. ipython should probably reset all the "show once"
warnings every time it shows an interactive prompt. I suppose more
accurately, it should do that only for warnings the user hasn't given
instructions about. That way you'll get a warning about bad polynomial
fits every time you run a command that contains one, but if your
function runs thousands of fits you don't drown in warnings.

> BTW, how does one turn warnings back on? If I do a
>
> >>> warnings.simplefilter('always', mywarn)
>
> things work fine. Following this by
>
> >>> warnings.simplefilter('once', mywarn)
>
> does what it is supposed to do. Once again issuing
>
> >>> warnings.simplefilter('always', mywarn)
>
> fails to have any effect. Resetwarnings doesn't help.

Hmmm... I don't get the impression that the warnings module is much
tested; I had similar headaches.

A. M. Archibald
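The Numerical Recipes suggestion, as a sketch (x and y are the data
arrays, and the rcond keyword is the one under discussion in this
thread):

    import numpy as np

    # rcond scales with the number of points and the machine epsilon
    # of the data's precision.
    rcond = len(x) * np.finfo(x.dtype).eps
    coeffs = np.polyfit(x, y, deg, rcond=rcond)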
Re: [Numpy-discussion] [SciPy-dev] NumPy/SciPy + MPI for Python
On 14/10/06, Gael Varoquaux <[EMAIL PROTECTED]> wrote:
> On Sat, Oct 14, 2006 at 06:58:45AM -0600, Bill Spotz wrote:
> > I would like to second the notion of converging on a single MPI
> > interface. My parallel project encapsulates most of the inter-
> > processor communication within higher-level objects because the
> > lower-level communication patterns can usually be determined from
> > higher-level data structures. But still, there are times when a user
> > would like access to the lower-level MPI interface.
>
> I think we are running into the same old problem of scipy/a
> super-package of scientific tools. It is important to keep numpy/scipy
> modular. People may want to install either one without an MPI
> interface. However it is nice that if somebody just wants "everything"
> without having to choose he can download that "everything" from
> scipy.org and that it comes well bundled together. So all we need to do
> is find a name for that super-package, put it together with good
> integration and add it to scipy.org.
>
> My 2 cents,
>
> Gaël

I agree. Moreover, being picked for such integration work would help
encourage people to converge on one MPI interface (for example).
There's some discussion of such a package at NumpyProConPage on the
wiki (and in particular at http://www.scipy.org/PyLabAwaits ). The
"scipy superpack" (which is only for Windows, as I understand it) is a
sort of beginning.

So, well, any suggestion for a name? pylab is already in use by
matplotlib (for some reason), as is Scientific Python (and numpy,
Numeric and numarray are obviously already confusing).

A. M. Archibald
Re: [Numpy-discussion] rcond in polyfit
On 13/10/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> Charles R Harris wrote:
> > That sounds good, but how to do it? Should I raise an exception?
> Use the warnings framework:
>
> >>> import warnings
> >>> warnings.warn("condition number is BAD")
> __main__:1: UserWarning: condition number is BAD
>
> The user can turn warnings on or off or turn them into exceptions
> based on a variety of criteria. Look for the warnings filter in the
> docs.
>
> Which brings up a question: do we want to have a FloatingPointWarning
> or some such? Currently, if you set the error handling to warn using
> seterr a runtime warning is issued:
>
> >>> np.seterr(all='warn')
> {'over': 'ignore', 'divide': 'ignore', 'invalid': 'ignore', 'under': 'ignore'}
> >>> np.arange(1) / 0
> __main__:1: RuntimeWarning: divide by zero encountered in divide
>
> On the other hand if error handling is set to 'raise', then a
> FloatingPointError is issued. Is a FloatingPointWarning in order to
> mirror the FloatingPointError? And if so, would it be appropriate to
> use for condition number?

I submitted a patch to use warnings for several functions in scipy a
while ago, and the approach I took was to create a ScipyWarning, from
which more specific warnings were derived (IntegrationWarning, for
example). That was perhaps a bit short-sighted.

I'd suggest a FloatingPointWarning as a base class, with
IllConditionedMatrix as a subclass (it should include the condition
number, but probably not the matrix itself unless it's small, as
debugging information).

The warnings module is frustratingly non-reentrant, unfortunately,
which makes writing tests very awkward.

> > I would also have to modify lstsq so it returns the degree of the
> > fit which would mess up the current interface.
> One approach would be to write lstsqcond (or a better name) that
> returns both the fit and the condition number. lstsq could then be
> just a wrapper over that which dumped the condition number. IIRC, the
> condition number is available, but we're not returning it.

This is a very good idea. scipy.integrate.quad returns a pair (result,
error_estimate) and every time I use it I trip over that. (Perhaps if
I were a fine upstanding numerical analyst I would be checking the
error estimate every time, but it is a pain.) Another option would be
a "full_output" optional argument.

A. M. Archibald
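A sketch of the warning hierarchy suggested above (the class names are
taken from this thread; nothing like this is committed anywhere):

    import warnings

    class FloatingPointWarning(UserWarning):
        pass

    class IllConditionedMatrix(FloatingPointWarning):
        pass

    # Include the condition number as debugging information:
    warnings.warn(IllConditionedMatrix("condition number is 3e+17"))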
Re: [Numpy-discussion] rcond in polyfit
On 13/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> On 10/13/06, A. M. Archibald <[EMAIL PROTECTED]> wrote:
> > On 12/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> > > Hi all,
> > >
> > > I note that polyfit looks like it should work for single and
> > > double, real and complex, polynomials. On the other hand, the
> > > default rcond needs to depend on the underlying precision. On the
> > > other, other hand, all the svd computations are done with dgelsd or
> > > zgelsd, i.e., double precision. Even so problems can arise from
> > > inherent errors of the input data if it is single precision to
> > > start with. I also think the final degree of the fit should be
> > > available somewhere if wanted, as it is an indication of what is
> > > going on. Sooo, any suggestions as to what to do? My initial
> > > impulse would be to set rcond=1e-6 for single, 1e-14 for double,
> > > make rcond a keyword, and kick the can down the road on returning
> > > the actual degree of the fit.
> >
> > I'd also be inclined to output a warning (which the user can ignore,
> > read or trap as necessary) if the condition number is too bad or they
> > supplied an rcond that is too small for the precision of their data.
>
> That sounds good, but how to do it? Should I raise an exception? I
> would also have to modify lstsq so it returns the degree of the fit
> which would mess up the current interface.

Python's warnings module is a decent solution for providing this
information.

Goodness-of-fit worries me less than ill-conditioning - users are
going to expect the curve to deviate from their function (and an easy
reliable way to get goodness of fit is
sqrt(sum(abs(f(xs)-polynomial(xs))**2)); this is certain to take into
account any roundoff errors introduced anywhere). But they may well
have no idea they should be worried about the condition number of some
matrix they've never heard of.

A. M. Archibald
Re: [Numpy-discussion] Polyfit
On 13/10/06, Greg Willden <[EMAIL PROTECTED]> wrote:
> What about including multiple algorithms each returning a figure of
> fit? Then I could try two or three different algorithms and then use
> the one that works best for my data.

The basic problem is that X^n is rarely a good basis for the functions
on [a,b]. So if you want it to return the coefficients of a
polynomial, you're basically stuck. If you *don't* want that, there's
a whole bestiary of other options.

If you're just looking to put a smooth curve through a bunch of data
points (perhaps with known uncertainties), scipy.interpolate includes
some nice spline fitting functions. If you're looking for polynomials,
orthogonal polynomials may serve as a better basis for your interval;
you can look in scipy.special for them (and leastsq will fit them to
your points). Extracting their coefficients is possible but will bring
you back to numerical instabilities.

In any case, all this is outside the purview of numpy (as is polyfit,
frankly).

A. M. Archibald
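A sketch of a least-squares fit in an orthogonal-polynomial basis
(Chebyshev, via scipy.special; x, y, and deg are placeholders for the
data arrays and the desired degree):

    import numpy as np
    from scipy import special

    # Rescale x to [-1, 1], where the Chebyshev basis lives.
    t = 2*(x - x.min())/(x.max() - x.min()) - 1

    # Design matrix with one column per basis polynomial.
    A = np.column_stack([special.chebyt(k)(t) for k in range(deg+1)])
    coeffs, res, rank, sv = np.linalg.lstsq(A, y)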
Re: [Numpy-discussion] Polyfit
On 13/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> You can also get *much* better results if you scale the x interval to
> [0,1] as the problem will be better posed. For instance, with your data
> and a degree 10 fit I get a condition number of about 2e7 when x is
> scaled to [0,1], as opposed to about 1e36 when left as is. The former
> yields a perfectly useable fit while the latter blows up. I suppose
> this could be built into the polyfit routine if one were only
> interested in polynomial fits of some sort, but the polynomial would
> have to carry around an offset and scale factor to make evaluation
> work.

[-1,1] would probably be even better, no?

> If Travis is interested in such a thing we could put together some
> variant of the polynomials that includes the extra data.

At this point you might as well use a polynomial class that can
accommodate a variety of bases for the space of polynomials - X^n,
(X-a)^n, orthogonal polynomials (translated and scaled as needed),
what have you.

I think I vote for a polyfit that is no more clever than it has to be
but which warns the user when the fit is bad.

A. M. Archibald
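The conditioning effect is easy to see directly (a sketch; the data
interval here is made up):

    import numpy as np

    x = np.linspace(500.0, 600.0, 50)
    t = (x - x.min())/(x.max() - x.min())    # x rescaled to [0,1]

    # Condition numbers of the degree-10 Vandermonde matrices:
    print np.linalg.cond(np.vander(x, 11))   # astronomically large
    print np.linalg.cond(np.vander(t, 11))   # much more reasonable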
Re: [Numpy-discussion] rcond in polyfit
On 12/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I note that polyfit looks like it should work for single and double,
> real and complex, polynomials. On the other hand, the default rcond
> needs to depend on the underlying precision. On the other, other hand,
> all the svd computations are done with dgelsd or zgelsd, i.e., double
> precision. Even so problems can arise from inherent errors of the input
> data if it is single precision to start with. I also think the final
> degree of the fit should be available somewhere if wanted, as it is an
> indication of what is going on. Sooo, any suggestions as to what to do?
> My initial impulse would be to set rcond=1e-6 for single, 1e-14 for
> double, make rcond a keyword, and kick the can down the road on
> returning the actual degree of the fit.

I'd also be inclined to output a warning (which the user can ignore,
read or trap as necessary) if the condition number is too bad or they
supplied an rcond that is too small for the precision of their data.

A. M. Archibald
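For reference, the machine epsilons those proposed defaults are keyed
to:

    import numpy as np

    print np.finfo(np.single).eps   # ~1.2e-07
    print np.finfo(np.double).eps   # ~2.2e-16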
Re: [Numpy-discussion] Should numpy.sqrt(-1) return 1j rather than nan?
On 12/10/06, David Goldsmith <[EMAIL PROTECTED]> wrote:
> I don't use scipy (and don't want to because of the overhead) but it
> sounds like I should because if I'm taking the square root of a
> variable whose value at run time happens to be real but less than zero,
> I *want* the language I'm using to return an imaginary; in other words,
> it's not the scipy behavior which "scares" me, it's the numpy (which I
> do/have been using) behavior. To which you might say, "Well, if that's
> what you want, and you have Matlab (as you've said you do), then just
> use that." But that's precisely the point: people who don't want to be
> bothered with having to be "a bit more care[ful]" (as Chuck put it) -
> and who can afford it - are going to be inclined to choose Matlab over
> numpy. Perhaps one doesn't care - in the grand scheme of things, it
> certainly doesn't matter - but I think that you all should be aware
> that this numpy "feature" will be seen by many as more than just a
> nuisance.

Scipy doesn't do what you want. What you want is to use complex
numbers throughout; then numpy and scipy will do exactly what you
request.

A. M. Archibald
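Concretely, a sketch of the behaviour being discussed:

    >>> import numpy as np
    >>> np.sqrt(-1.0)
    nan
    >>> np.sqrt(complex(-1.0))
    1j
    >>> np.sqrt(np.array([-1.0, 4.0], dtype=complex))
    array([ 0.+1.j,  2.+0.j])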
Re: [Numpy-discussion] Should numpy.sqrt(-1) return 1j rather than nan?
On 11/10/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> """
> In SciPy 0.3.x the ufuncs were overloaded by more "intelligent" versions.
> A very attractive feature was that sqrt(-1) would yield 1j as in Matlab.
> Then you can program formulas directly (e.g., roots of a 2nd order
> polynomial) and the right answer is always achieved. In the Matlab-Python
> battle in mathematics education, this feature is important.
>
> Now in SciPy 0.5.x sqrt(-1) yields nan. A lot of code we have, especially
> for introductory numerics and physics courses, is now broken.
> This has already made my colleagues at the University skeptical of
> Python, as "this lack of backward compatibility would never happen in
> Matlab".
>
> Another problem related to Numeric and numpy is that in these courses we
> use ScientificPython in several places, which applies Numeric and will
> continue to do so. You then easily get a mix of numpy and Numeric
> in scripts, which may cause problems and at least extra overhead.
> Just converting to numpy in your own scripts isn't enough if you call
> up libraries using and returning Numeric.
> """
>
> I wonder, what are the reasons that numpy.sqrt(-1) returns nan?
> Could sqrt(-1) be made to return 1j again? If not, shouldn't
> numpy.sqrt(-1) raise a ValueError instead of silently returning nan?

What is the desired behaviour of sqrt?

Should sqrt always return a complex array, regardless of the type of
its input? This will be extremely surprising to many users, whose
memory usage suddenly doubles and for whom many functions no longer
work the way they're accustomed to.

Should it return a complex array only when any entry in its input is
negative? This will be even *more* surprising when a negative (perhaps
even -0) value appears in their matrix (for example, does a+min(a)
yield -0s in the minimal values?) and suddenly it's complex.

A ValueError is also surprising, and it forces the user to sanitize
her array before taking the square root, instead of whenever
convenient.

If you want MATLAB behaviour, use only complex arrays. If the problem
is backward incompatibility, there's a reason 1.0 hasn't been released
yet...

A. M. Archibald
Re: [Numpy-discussion] Things to address for Py3K
On 11/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> Speaking long term, what about data types? The basic 80 bit extended
> precision float now occurs in 80, 96, and 128 bit versions depending on
> alignment. So what happens when quad precision, which will probably be in
> the next IEEE standard, comes down the pike and is 128 bits long? The
> length of a float will no longer be sufficient to distinguish the various
> types.

This doesn't require Python 3K; it can happen in numpy with no
language support. But perhaps Python 3K can be viewed as a chance to
ditch backwards-compatibility baggage? (cough, Numeric compatibility,
cough)

Would it be of interest to have numeric datatypes integrated with
python datatypes? How about IEEE floats in python proper, at least? It
can be rather confusing when doing a calculation yields different
results for arrays than for ordinary scalars...

A. M. Archibald
Re: [Numpy-discussion] can this be made faster?
> > > > c contains arbitrary floats.
> > > > essentially it is to compute class totals
> > > > as in total[class[i]] += value[i]
>
> > This seems like a rather common operation - I know I've needed it on
> > at least two occasions - is it worth creating some sort of C
> > implementation? What is the appropriate generalization?
>
> Some sort of indirect addressing infrastructure. But it looks like this
> could be tricky to make safe, it would need to do bounds checking at the
> least and would probably work best with a contiguous array as the target.
> I could see some sort of low-level function called argassign(target,
> indirect index, source) that could be used to build more complicated
> things in python.

If it were only assignment that was needed, fancy indexing could
already handle it. The problem is that this is something that can't
*quite* be done with the current fancy indexing infrastructure - every
time an index comes up we want to add the value to what's there,
rather than replacing it.

I suppose histogram covers one major application; in fact, if
histogram allowed weightings ("count this point as -0.6") it would
solve the OP's problem.

A. M. Archibald
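The weighted count being wished for, in miniature. The explicit loop is the baseline; bincount's weights argument, if your numpy version has it, is the vectorized equivalent:

import numpy as np

cls = np.array([0, 2, 1, 2, 0])              # class index per sample (the b array)
val = np.array([1.0, -0.6, 2.0, 0.5, 3.0])   # value per sample (the c array)

total = np.zeros(3)
for i in range(len(cls)):                    # total[class[i]] += value[i]
    total[cls[i]] += val[i]

total2 = np.bincount(cls, weights=val)       # same numbers, no Python loop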
Re: [Numpy-discussion] can this be made faster?
On 09/10/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Daniel Mahler wrote:
> > In my case all a, b, c are large, with b and c being orders of
> > magnitude larger than a.
> > b is known to contain only, but potentially any, a-indexes, repeated
> > many times.
> > c contains arbitrary floats.
> > essentially it is to compute class totals
> > as in total[class[i]] += value[i]
>
> In that case, a slight modification to Greg's suggestion will probably be
> fastest:

If a is even moderately large and you don't care what's left behind in
b and c, you will probably accelerate the process by sorting b and c
together (for cache coherency in a).

This seems like a rather common operation - I know I've needed it on
at least two occasions - is it worth creating some sort of C
implementation? What is the appropriate generalization?

A. M. Archibald
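"Sorting b and c together" with stock numpy, as a sketch:

import numpy as np

b = np.array([3, 0, 3, 1, 0])
c = np.array([0.5, 1.0, 0.25, 2.0, 3.0])

order = np.argsort(b)        # one permutation sorts both arrays
b, c = b[order], c[order]    # equal indices are now adjacent, so the
                             # accumulation hits each a-element in one burst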
Re: [Numpy-discussion] can this be made faster?
On 08/10/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Bill Baxter wrote:
> > Yes, that'd be
> >    a[b] += c
>
> No, I'm afraid that fancy indexing does not do the loop that you are
> thinking it would (and for reasons that we've discussed previously on this
> list, *can't* do that loop). That statement reduces to something like the
> following:

So the question remains: is there a for-loop-free way to do this?
(This, specifically, is:

for i in range(len(b)):
    a[b[i]] += c[i]

where b may contain repetitions.) I didn't find one, but came to the
conclusion that for loops are not necessarily slower than fancy
indexing, so the way to do this one is just to use a for loop.

A. M. Archibald
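Why the fancy-indexed spelling silently drops the repeats, session-style:

>>> import numpy as np
>>> a = np.zeros(3); b = np.array([0, 0, 1]); c = np.array([1., 2., 4.])
>>> a[b] += c     # reduces to a[b] = a[b] + c; the two writes to a[0] collide
>>> a
array([ 2.,  4.,  0.])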
Re: [Numpy-discussion] super-packages [was "Hello and my first patch"]
On 06/10/06, Christopher Barker <[EMAIL PROTECTED]> wrote:
> A. M. Archibald wrote:
> > Is setting out to create a "MATLAB replacement" a good goal?
>
> No.

I see what you mean, but I still think that using MATLAB as a
yardstick ("Can users do everything with the python superpackage that
they could with MATLAB?" "Is it of comparable or lesser difficulty?")
is a good idea, at least until we've clearly outstripped it in all
categories.

> MATLAB has two advantages (and they are big ones!): integrated plotting
> and a large and broad library of ready-to-use functions for a wide
> variety of computing.

This is what I mean about using it as a yardstick. As a MATLAB
survivor, would you be interested in going to
http://www.scipy.org/NumPy_for_Matlab_Users and adding numerical tools
that MATLAB has, so that we can use it as a checklist for what tools
we'd like to have (first)? I put a short list there, but it's years
since I last used MATLAB.

> MPL has almost solved one, and SciPy is getting better and better -- so
> we're really on the right track!

I agree.

A. M. Archibald
Re: [Numpy-discussion] super-packages [was "Hello and my first patch"]
On 06/10/06, Jon Peirce <[EMAIL PROTECTED]> wrote:
> > Perhaps that is the best way to move forward along with the work on a
> > "pylab" super-package.
> [...]
>
> But why not scipy? It seems the right name for a super-package.

It does, particularly if (as is being discussed) the current content
of scipy might get subdivided into separately buildable modules
(weave, fortran, non-fortran, say). scipy would then refer to the lot
- numpy, matplotlib, weave, scipy-fortran, scipy-nonfortran, and any
others that seem essential.

Is setting out to create a "MATLAB replacement" a good goal? I think
so. Really the goal should be to create a useful tool for scientists,
but I think MATLAB is clearly already that, so (at first) it can serve
as a definite objective. Of course, anyone who wants to make scipy
*better* than MATLAB should be warmly encouraged...

A. M. Archibald
Re: [Numpy-discussion] Hello and my first patch
On 05/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> I think a hybrid for weave / f2py / ctypes that allows "inlining in
> multiple languages" as well as automatic extension module generation for
> "already-written" code is in order.

It might make sense to also include SWIG (since that seems to be a
popular choice for wrapping "already-written" C and C++ code).

A. M. Archibald
Re: [Numpy-discussion] Hello and my first patch
On 05/10/06, Greg Willden <[EMAIL PROTECTED]> wrote:
> On 10/5/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> > Perhaps that is the best way to move forward along with the work on a
> > "pylab" super-package.
>
> That is exactly what I want.

What is unsatisfactory about installing numpy+scipy+matplotlib? I've
found they're generally pretty complete (except where no decent python
alternative exists).

> In the end I want a nice collection of functions, logically organized,
> that let me analyze/filter/plot etc. etc. etc.
>
> The key for me is "logically organized".

For the most part, the organization is pretty logical:

* Basic array and matrix operations are in numpy
* linear algebra, differential equation, interpolation, etc. tools are
in scipy, each in their own subpackage
* weave is mysteriously in scipy
* plotting tools are in matplotlib

There are a few historical quirks, like window functions in numpy
(they really belong in scipy), and some of the less-used scipy
subpackages are a bit of a mess internally (scipy.interpolate for
example), but otherwise I'm not sure what you want to be different.

> And right now that means "So a newbie can find the function I need and
> the function I need is there"
>
> I'm not criticising. I'd like to help get there.

Install all three major packages. Use the window functions from scipy
in matplotlib. Task-oriented documentation is so far a bit scant,
although the scipy cookbook (http://www.scipy.org/Cookbook ) and the
numpy examples list (http://www.scipy.org/Numpy_Example_List ) are a
good start.

A. M. Archibald
Re: [Numpy-discussion] Hello and my first patch
On 05/10/06, Greg Willden <[EMAIL PROTECTED]> wrote:
> That is a much better policy in my view.
>
> I (gently) encourage this group (Travis?) to make this the policy for
> Numpy/Scipy.
>
> From my view as a newbie to numpy/scipy/matplotlib it isn't clear where I
> should look for what functionality. Matplotlib plots the spectrogram but
> it only supports two or three window functions. Numpy supports 4 or 5
> window functions and Scipy apparently supports more, but Matplotlib
> doesn't support Scipy. Of course this is a minor example and I could just
> write the window function myself and then use it in Matplotlib, but I want
> to give back so that the project can grow. I'd really like to be able to
> leave Matlab behind and encourage everyone else to do the same, but there
> are still these annoyances that need to be worked out.

Unfortunately, that policy (put it in numpy if it doesn't make the
build dependencies any worse) makes it even harder for the user to
figure out what is where. Say I want a fast matrix product. Do I look
in numpy or scipy? It'll run faster if it uses a tuned BLAS, so it
ought to have external requirements, so I'd look in scipy, but maybe
there's a simple non-tuned implementation in numpy instead...

Frankly, I tend to prefer the other approach to solving all these
issues: distribute numpy, scipy, and matplotlib as one bundle. The
requirements for scipy are not particularly onerous, particularly if
it comes as part of your distribution. There are currently some
problems successfully finding optimized versions of LAPACK and BLAS,
but to me the benefits of bundling the packages together outweigh the
difficulties:

* routines and objects can be in the package in which they make the
most semantic sense, rather than the one with the correct external
dependencies (how is a user supposed to know whether convolution uses
an external library or not?)
* documentation can be cross-referenced between packages (so the
Matrix class can tell people to look in scipy.linalg for inverses, for
example)
* users can more easily find what's available rather than rewriting
from scratch
* derived packages (ipython, IDEs, MATLAB-alikes) have many fewer
possibilities to support

I'm not arguing that the development processes need to be connected,
just that making the primary distributed object a package containing
all the components will make life easier for everyone involved.

A. M. Archibald
Re: [Numpy-discussion] "Manually" broadcasting arrays in Python
On 04/10/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Yeah, a segfault would be problematic. Otherwise, it works and is in
> fact faster than what I wrote.

It's a bit tricky to trigger, but I think it was fixed in 1.0rc1 (in
changeset 3125, in fact:
http://projects.scipy.org/scipy/numpy/changeset/3125 )

Would it be useful for me to contribute the tiny script I wrote to
trigger it as a regression test?

A. M. Archibald

from numpy import vectorize, zeros

vt = vectorize(lambda *args: args)

# Removing either of the following lines cures the segfault
vt(zeros((1,2,1)), zeros((2,1,1)), zeros((1,1,2)))
vt(zeros((1,2,1)), zeros((2,1,1)), zeros((1,1,2)), zeros((2,2)))
Re: [Numpy-discussion] "Manually" broadcasting arrays in Python
On 03/10/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Has anyone implemented an easier or more efficient way to broadcast
> arrays to a common shape at the Python level? I was hoping that the
> broadcast iterator would actually provide the broadcasted arrays, but it
> does not.

How about vectorize(lambda *args: args)? Almost works, only it
sometimes segfaults with more than two arguments... but that's clearly
a numpy bug.

A. M. Archibald
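The trick in a session (with more than two arguments, pre-1.0rc1 builds could segfault, as noted in the adjacent messages):

>>> from numpy import vectorize, zeros
>>> bcast = vectorize(lambda *args: args)
>>> x, y = bcast(zeros((3, 1)), zeros((1, 4)))
>>> x.shape, y.shape
((3, 4), (3, 4))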
Re: [Numpy-discussion] Vectorizing code, for loops, and all that
On 03/10/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> I had an idea regarding which axis to operate on first. Rather than
> operate on strictly the longest axis or strictly the innermost axis, a
> hybrid approach could be used. We would operate on the longest axis that
> would not result in the inner loop overflowing the cache. The idea is to
> minimize the loop overhead as we do now by choosing the largest axis,
> while at the same time attempting to maintain cache friendliness.

If elements are smaller than cache lines (usually at least eight
bytes, I think), we might end up pulling many times as many bytes into
the cache as we actually need if we don't loop along axes with small
strides first.

Can BLAS be used for some of these operations?

A. M. Archibald
Re: [Numpy-discussion] Vectorizing code, for loops, and all that
On 02/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> Perhaps those inner 1-d loops could be optimized (using prefetch or
> something) to reduce the number of cache misses on the inner
> computation, and the concept of looping over the largest dimension
> (instead of the last dimension) should be re-considered.

Cache control seems to be the main factor deciding the speed of many
algorithms. Prefetching could make a huge difference, particularly on
NUMA machines (like a dual Opteron). I think GCC has a moderately
portable way to request it (though it may be only in beta versions as
yet).

More generally, all the tricks that ATLAS uses to accelerate BLAS
routines would (in principle) be applicable here. The implementation
would be extremely difficult, though, even if all the basic loops
could be expressed in a few primitives.

A. M. Archibald
Re: [Numpy-discussion] eval shortcomings?
On 25/09/06, Angus McMorland <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Can someone explain why the following occurs?
>
> a = numpy.zeros((100))
> b = numpy.ones((10))
> a[20:30] = b  # okay
> eval('a[50:60] = b')  # raises SyntaxError: invalid syntax
>
> Is there some line mangling that the interpreter does that eval doesn't
> do?

No. eval evaluates expressions, that is, formulas producing a value.
"a = b" does not produce a value, so you are obtaining the same error
you would if you'd written

if a = b:
    ...

The way you run code that doesn't return a value is with "exec".

A. M. Archibald
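The exec spelling in action (Python 2 session, where exec is a statement):

>>> import numpy
>>> a = numpy.zeros(100)
>>> b = numpy.ones(10)
>>> exec 'a[50:60] = b'        # a statement, so exec rather than eval
>>> eval('a[50:60].sum()')     # eval is fine once it's an expression
10.0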
[Numpy-discussion] ufunc.reduce and conversion
Hi,

What are the rules for datatype conversion in ufuncs? Does ufunc(a,b)
always yield the smallest type big enough to represent both a and b?
What is the datatype of ufunc.reduce(a)?

I ask because I was startled by the following behaviour:

>>> a = array([1,1],uint8); print a.dtype; print maximum.reduce(a).dtype
'|u1'
'|u1'

(does reduce ever up-convert to a larger type - to float, possibly
only for add.reduce?)

Thanks,
A. M. Archibald
Re: [Numpy-discussion] sorting -inf, nan, inf
On 19/09/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> I'm still somewhat mystified by the desire to move the nans to one end
> of the sorted object. I see two scenarios:

It's mostly to have something to do with them other than throw an
exception. Leaving them in place while the rest of the array is
reshuffled requires a lot of work and isn't particularly better. I
mostly presented it as an alternative to throwing an exception.

Throwing a Python exception now seems like the most reasonable idea.

A. M. Archibald
Re: [Numpy-discussion] sorting -inf, nan, inf
On 19/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
> For floats we could use something like:
>
> lessthan(a,b) := a < b || (a == nan && b != nan)
>
> Which would put all the nans at one end and might not add too much
> overhead.

You could put an any(isnan()) out front and run this slower version
only if there are any NaNs (also, you can't use == for NaNs, you have
to use C's isnan).

But I'm starting to see the wisdom in simply throwing an exception,
since sorting is not well-defined with NaNs.

A. M. Archibald
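The corrected comparison as a Python sketch, using the x != x trick since only NaN is unequal to itself:

def isnan(x):
    return x != x          # IEEE: NaN is the only value unequal to itself

def lessthan_nan_last(a, b):
    if isnan(a):
        return False       # NaNs never compare less, so they sink to the end
    if isnan(b):
        return True        # every real number sorts before a NaN
    return a < b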
Re: [Numpy-discussion] sorting -inf, nan, inf
On 19/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> If this sort of thing can cause unexpected errors I wonder if it would be
> worth it to have a global debugging flag that essentially causes isnan to
> be called before any function applications.

That sounds very like the IEEE floating-point flags, which would be
extremely useful to have, and which are being worked on, IIRC.

A. M. Archibald
Re: [Numpy-discussion] sorting -inf, nan, inf
On 19/09/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> I'm not sure where the breakpoint is, but I was seeing failures for all
> three sort types with N as high as 1. I suspect that they're all
> broken in the presence of NaNs. I further suspect you'd need some
> punishingly slow n**2 algorithm to be robust in the presence of NaNs.

Not at all. Just temporarily make NaNs compare greater than any other
floating-point value and use quicksort (say). You can even do this for
python lists without much trouble.

That's actually a viable suggestion for numpy's sorting, although it
would be kind of ugly to implement: do a quick any(isnan(A)), and if
not, use the fast stock sorting algorithms; if there is a NaN
somewhere in the array, use a version of the sort that has a tweaked
comparison function so the NaNs wind up at the end and are easy to
trim off.

But the current situation, silently returning arrays in which the
non-NaNs are unsorted, is really bad.

A. M. Archibald
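The two-path idea in miniature (a pure-Python sketch, not a patch to numpy's C sorts):

import numpy as np

def sort_nan_last(a):
    a = np.asarray(a, dtype=float)
    bad = np.isnan(a)
    if bad.any():                      # slow path only when NaNs are present
        finite = np.sort(a[~bad])
        return np.concatenate([finite, a[bad]])
    return np.sort(a)                  # fast stock sort on the common path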
Re: [Numpy-discussion] sorting -inf, nan, inf
On 19/09/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> A. M. Archibald wrote:
> > Mmm. Somebody who's working with NaNs has more or less already decided
> > they don't want to be pestered with exceptions for invalid data.
>
> Do you really think so? In my experience NaNs are nearly always just an
> indication of a mistake somewhere that didn't get trapped for one reason
> or another.

Well, I said that because for an image processing project I was doing,
the easiest thing to do with certain troublesome pixels was to fill in
NaNs, and then at the end replace the NaNs with sensible values. It
seems as if the point of NaNs is to allow you to keep working with
those numbers that make sense while ignoring those that don't. If you
wanted exceptions, why not get them as soon as the first NaN would
have been generated?

> > I'd be happy if they wound up at either end, but I'm not sure it's
> > worth hacking up the sort algorithm when a simple isnan() can pull
> > them out.
>
> Moving them to the end seems to be the worst choice to me. Leaving them
> alone is fine with me. Or raising an exception would be fine. Or doing
> one or the other depending on the error mode settings would be even
> better if it is practical.

I was just thinking in terms of easy removal.

> Is that true? Are all of numpy's sorting algorithms robust against
> nontransitive objects laying around? The answer to that appears to be
> no. Try running this a couple of times to see what I mean:
>
> The values don't correctly cross the inserted NaN and the sort is
> incorrect.

You're quite right: when NaNs are present in the array, sorting and
then removing them does not yield a sorted array. For example,
mergesort just output

[ 2. 4. 6. 9. nan 0. 1. 3. 5. 7. 8.]

The other two are no better (and arguably worse). Python's built-in
sort() for lists has the same problem.

This is definitely a bug, and the best way to fix it is not clear to
me - perhaps sort() needs to always do any(isnan(A)) before starting
to sort. I don't really like raising an exception, but sort() isn't
really very meaningful with NaNs in the array. The only other option I
can think of is to somehow remove them, sort without, and reintroduce
them at the end, which is going to be a nightmare when sorting a
single axis of a large array. Or, I suppose, sort() could simply fill
the array with NaNs; I'm sure users will love that.

A. M. Archibald
Re: [Numpy-discussion] sorting -inf, nan, inf
On 19/09/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> Keith Goodman wrote:
> > In what order would you like argsort to sort the values -inf, nan, inf?
>
> Ideally, -inf should sort first, inf should sort last and nan should
> raise an exception if present.
>
> -tim

Mmm. Somebody who's working with NaNs has more or less already decided
they don't want to be pestered with exceptions for invalid data. I'd
be happy if they wound up at either end, but I'm not sure it's worth
hacking up the sort algorithm when a simple isnan() can pull them out.

What's happening now is that comparisons with NaNs all come out false,
which rather confuses the sorting algorithm. But as long as it doesn't
actually *break* (that is, leave some of the non-NaNs incorrectly
sorted) I don't care.

A. M. Archibald
Re: [Numpy-discussion] visual programming
On 18/09/06, Mathew Yeates <[EMAIL PROTECTED]> wrote:
> semi off topic.
> Does anyone know of any good visual programming tools? Forever ago, I
> used to use something called Khoros for image processing and I found it
> very useful. You connect boxes which represent different processing
> steps. At about the same time, there was something similar to Khoros
> from SGI. Explorer was its name, I think.
>
> Anybody out there use this sort of thing? Matlab doesn't offer it,
> right? Nor matplotlib?
>
> Mathew

I doubt it's what you're looking for, but PD and GEM are a visual
programming environment, oriented towards music, audio signal
processing, and 2D and 3D video. (They're related to Max and jMax,
which do some of the same things.)

A. M. Archibald
Re: [Numpy-discussion] In-place operations
On 12/09/06, Pierre Thibault <[EMAIL PROTECTED]> wrote:
> I would like to have information on the best techniques to do in-place
> calculations and to minimize temporary array creations. To me this
> seems to be very important whenever the arrays become very large.

The first rule of optimization: don't do it yet.

You can usually go through and banish temporary arrays (using ufuncs
and so on) at the cost of readability, code encapsulation, and
thread-safe-ness. But it may not do what you want. I had an
image-processing code that was taking longer than I thought it should
and using two hundred megabytes or so of RAM. So I rewrote it, with a
certain amount of pain, in a way that used the fewest possible
temporary arrays. It didn't run any faster, and it then took five
hundred megabytes: because all the arrays ended up being in memory at
once, the memory footprint increased drastically. malloc() is fast,
typically just a handful of instructions; if you're allocating a giant
array, it's almost certainly being allocated using mmap(), and it can
be released back to the OS on deallocation.

But you probably still want to avoid temporary arrays. So:

> More specifically, here are examples that occurred in my code
>
> 1) FFTs: Let A and B be two large arrays, already allocated. I want
> the fft of A to be stored in B. If I just type B = fft(A), there is a
> temporary array creation, right? Is it possible to avoid that?

Doing an FFT in-place is a major challenge, and involves its own
slowdowns, so generally high-level toolkits don't bother. But fft
seems to be like many functions (those generated by interp1d, for
example) that insist on malloc()ing their own arrays to return. Short
of rooting around in the numpy/scipy code, there's no real way around
this for such functions. The best you can do is make actual use of the
allocated array (rather than copy its contents to *another* array and
discard it).

> 2) Function output: In general, I think the same thing happens with
> functions like
>
> def f1(array_in):
>     array_out = # something using array_in
>     return array_out
>
> Then, if B is already allocated, writing B = f1(A) involves again a
> temporary array creation

Uh, no, not really. The way you have written f1, it probably malloc()s
space for array_out. The address of that space (roughly) is saved in
the array_out variable. If you write B = f1(A), you are just storing
the address in B. The memory is not copied. Even if you do
B = f1(A)[::10] you don't copy the memory.

> I thought instead of doing something like
>
> def f2(array_in, array_out):
>     array_out[:] = # something
>     # Is this good practice?
>
> and call f2(A,B).

This is a ufunc-like solution; you could even make array_out an
optional argument, and return it.

> If I understand well, this still requires a temporary array creation.
> Is there another way of doing that (apart from actually looping
> through the indices of A and B)?

It depends what #something is. If, say, it is 2*array_in, you can
simply do multiply(array_in,2,array_out) to avoid any dynamic
allocation.

> I guess these considerations are not standard python problems because
> you expect python to take care of memory issues. With big arrays in
> scientific computations, I feel the question is more relevant. I might
> be wrong...

Some of these issues come up when dealing with mutable objects (lists,
dictionaries, and so on). Some of them (the fact that python variables
contain simply references) are discussed in various python FAQs.

A. M. Archibald
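The out-argument pattern described above, session-style:

>>> import numpy as np
>>> A = np.arange(5.)
>>> B = np.empty_like(A)
>>> np.multiply(A, 2, B)       # writes into B; no temporary is allocated
array([ 0.,  2.,  4.,  6.,  8.])
>>> A *= 2                     # in-place arithmetic also avoids a temporary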
Re: [Numpy-discussion] Problem with concatenate and object arrays
Maybe I should stay out of this, but it seems like constructing object
arrays is complicated and involves a certain amount of guesswork on
the part of Numeric. For example, if you do array([a,b,c]).shape, the
answer is normally (3,), unless a, b and c happen to all be lists of
the same length, at which point your array could have a much more
complicated shape... but as the person who wrote "array([a,b,c])" it's
tempting to assume that the result has shape (3,), only to discover
subtle bugs much later.

If we were writing an array-creation function from scratch, would
there be any reason to include object-array creation in the same
function as uniform array creation? It seems like a bad idea to me.

If not, the problem is just compatibility with Numeric. Why not simply
write a wrapper function in python that does Numeric-style guesswork,
and put it in the compatibility modules? How much code will actually
break?

A. M. Archibald
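The guesswork in question, session-style (behaviour as in numpy of this era, where ragged input silently produced an object array; later releases changed this):

>>> from numpy import array
>>> array([[1, 2], [3, 4], [5, 6]]).shape    # equal-length lists: a 2-D array
(3, 2)
>>> array([[1, 2], [3, 4, 5], [6]]).shape    # ragged lists: objects, shape (3,)
(3,)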
Re: [Numpy-discussion] Problem with concatenate and object arrays
On 06/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> On 9/6/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
> > order - Specify the order of the array. If order is 'C', then the
> >     array will be in C-contiguous order (last-index varies the
> >     fastest). If order is 'FORTRAN', then the returned array
> >     will be in Fortran-contiguous order (first-index varies the
> >     fastest). If order is None, then the returned array may
> >     be in either C-, or Fortran-contiguous order or even
> >     discontiguous.

This one's a bit complicated. If array() is passed a list of lists,
there are two different orders that are relevant - the output order of
the array, and the order used to interpret the input. I suppose that
if L is a list of lists, array(L)[2,3] == L[2][3]; that is, in some
sense the arrays are always logically C-ordered even if the underlying
representation is different.

Does it make sense to specify this somewhere in the docstring? At
least it would be good to make it clear that the order parameter
affects only the underlying storage format, and not the indexing of
the array.

A. M. Archibald
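The point that indexing is independent of storage order, in a session:

>>> import numpy as np
>>> L = [[1, 2, 3], [4, 5, 6]]
>>> C = np.array(L, order='C')
>>> F = np.array(L, order='F')
>>> C[1, 2] == F[1, 2] == L[1][2]
True
>>> C.strides == F.strides      # only the memory layout differs
False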
Re: [Numpy-discussion] Random number generators
On 04/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> On 9/4/06, A. M. Archibald <[EMAIL PROTECTED]> wrote:
> > On 04/09/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> > > Charles R Harris wrote:
> >
> > However, a random number generator based on a stream cipher is
> > probably going to be pretty good, if slow, so implementing one of those
> > can't hurt. But I think leaving the possibility open that one could
> > use custom sources of random bytes is a very good idea. There's no
> > particular need that custom ones should be fast, as they're for use
> > when you really want *good* random numbers.
>
> I was thinking it might be nice to have one, just to generate seeds if
> nothing else. This could actually be written in python and use something
> like the python cryptography toolkit, no need to reinvent the wheel. It
> might be worth looking at that code just to see how someone else thought
> through the interface. It's under the Python license, so I need to know
> if that license is compatible with the BSD license used for numpy.

Of the generators they have, only their random pool is of any use for
seeding, and even that needs some clever feeding of random data.
However, on practically any platform this will run on, random bytes
based on real entropy can be extracted from the system (/dev/random,
windows crypto API, something on the Mac); a uniform API for those
would be a handy thing to have.

But I can't recommend spending much effort connecting the python
crypto stuff to numpy's random number generators; it's the sort of
thing application writers can easily cook up by subclassing
RandomGenerator (or whatever).

A. M. Archibald
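An entropy-based seed is only a few lines on platforms with os.urandom, which wraps /dev/urandom or the Windows crypto API; a sketch:

import os, struct

def system_seed():
    # four bytes of system entropy -> one 32-bit unsigned seed
    return struct.unpack('I', os.urandom(4))[0]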
Re: [Numpy-discussion] Random number generators
On 04/09/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Charles R Harris wrote:
>
> > What sort of api should this be? It occurs to me that there are already
> > 4 sources of random bytes:
> >
> > Initialization:
> >
> > /dev/random (pseudo random, I think)

/dev/random is (supposed to be) true randomness derived from various
sources. It's also fed through a bunch of hashing and CRC functions
chosen for convenience, but is probably pretty unbiased.

> > /dev/urandom
> > crypto system on windows
> >
> > Pseudo random generators:
> >
> > mtrand
> >
> > I suppose we could add some cryptologically secure source as well. That
> > indicates to me that one set of random number generators would just be
> > streams of random bytes, possibly in 4 byte chunks. If I were doing
> > this for linux these would all look like file systems, FUSE comes to
> > mind. Another set of functions would transform these into the different
> > distributions. So, how much should stay in numpy? What sort of API are
> > folks asking for?

"Cryptographically secure random number generator" means something
different to everybody, and implementing one that everyone is happy
with is going to be a colossal pain. Look at Yarrow and its design
documents for some idea of the complexities involved.

However, a random number generator based on a stream cipher is
probably going to be pretty good, if slow, so implementing one of
those can't hurt. But I think leaving the possibility open that one
could use custom sources of random bytes is a very good idea. There's
no particular need that custom ones should be fast, as they're for use
when you really want *good* random numbers.

> It would be good to also support quasi-random generators like Sobol and
> Halton sequences. They tend to only generate floating point numbers in
> [0, 1] (or (0, 1) or [0, 1) or (0, 1]; I think it varies). There are
> probably other PRNGs that only output floating point numbers, but I
> doubt we want to implement any of them. You can look at the 1.6 version
> of RandomKit for an implementation of Sobol sequences and ISAAC, a
> cryptographic PRNG.
>
> http://www.jeannot.org/~js/code/index.en.html

Sobol' sequences may require special care in constructing non-uniform
distributions in order to preserve their uniformity properties. If we
can simply import ISAAC, it won't hurt, but I'd rather trust (say) AES
in counter mode than some custom cryptosystem that hasn't been
analyzed as thoroughly.

> I'm thinking that the core struct that we're going to pass around will
> look something like this:
>
> struct random_state {
>     void* state;
>     unsigned int (*gen_uint32)(struct random_state*);
>     double (*gen_double)(struct random_state*);
> }
>
> state -- A pointer to the generator-specific struct.
>
> gen_uint32 -- A function pointer that yields a 32-bit unsigned integer.
> Possibly NULL if the generator can not implement such a generator.
>
> gen_double -- A function pointer that yields a double-precision number
> in [0, 1] (possibly omitting one or both of the endpoints depending on
> the implementation).

Are 32-bit numbers really the right least common denominator? There
are plenty of 64-bit platforms out there...

Given this API, implementing a subclassable class that exports it
should satisfy most people's needs for interchangeable generators.

A. M. Archibald
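The "subclassable class" idea as a Python sketch mirroring the struct above; the class and method names are illustrative, not a numpy API:

class RandomSource:
    """Override gen_uint32 with any source of random 32-bit words."""
    def gen_uint32(self):
        raise NotImplementedError
    def gen_double(self):
        # derive a double in [0, 1) from 32 random bits
        return self.gen_uint32() / 4294967296.0

class SystemEntropy(RandomSource):
    """A slow but high-quality source reading the OS entropy pool."""
    def gen_uint32(self):
        import os, struct
        return struct.unpack('I', os.urandom(4))[0]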
Re: [Numpy-discussion] Random number generators
On 04/09/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> A. M. Archibald wrote:
>
> > In the same project I also noticed it would be nice to be able to
> > (say) do "exponential(2+sin(arange(10)))" to get an array of
> > exponentials with varying parameters.
>
> Travis O. recently added this capability. The distribution parameters
> (loc, scale, alpha, whatever) are broadcast against each other using the
> same rules as ufunc parameters.

Wonderful! Thank you, Travis O.

A. M. Archibald
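With that capability, the wished-for call just works; each draw uses its own scale parameter (session-style):

>>> from numpy import sin, arange
>>> from numpy.random import exponential
>>> exponential(2 + sin(arange(10))).shape
(10,)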
Re: [Numpy-discussion] bug in round with negative number of decimals
On 04/09/06, Sebastian Haase <[EMAIL PROTECTED]> wrote:
> Thanks for the reply - but read what the doc says:
>
> >>> N.around.__doc__
> 'Round 'a' to the given number of decimal places. Rounding
> behaviour is equivalent to Python.
>
> Return 'a' if the array is not floating point. Round both the real
> and imaginary parts separately if the array is complex.
> '
>
> it is *not* done this way in Python:
>
> >>> round(.5)
> 1.0
> >>> round(1.5)
> 2.0
> >>> round(2.5)
> 3.0
>
> (the round obj. method is missing this doc string)

Um, well, the doc is wrong. Numpy should *not* always follow python's
lead, and in particular it's explicit that Numeric's floating point is
closer to IEEE floating-point behaviour than python's - compare, for
example, 1./0. and array(1.)/0.

> I really think we should stick to what the doc string says - everybody
> expects x.5 to round up !!

Not everybody. This (admittedly odd) behaviour wasn't decided on
because it was more efficient to implement, it was decided on because
a group of very smart numerical analysts agreed that it was the best
way to avoid surprising naive users with biased results. (Non-naive
users can normally pick from any of several other rounding modes if
they want.)

A better question to ask is, "Can I change numpy's rounding behaviour
for my programs?" (And, more generally, "can I set all the various
floating-point options that the IEEE standard and my processor both
support?") I don't know the answer to that one, but it does seem to be
a goal that numpy is trying for (hence, for example, the difference
between numpy float scalars and python floats).

A. M. Archibald
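Round-half-to-even in a numpy session (Python 2's round, by contrast, gives 1., 2., 3., 4. for these inputs):

>>> import numpy as np
>>> np.around([0.5, 1.5, 2.5, 3.5])
array([ 0.,  2.,  2.,  4.])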
Re: [Numpy-discussion] diagflat
On 03/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> Travis has introduced the function diagflat for creating diagonal arrays.
> It flattens whatever array is supplied and returns its values as the
> diagonal of a 2-D array or matrix. As the function is currently
> undocumented, I thought now might be a good time to discuss other
> possible choices for the name. Here is a quick list that comes to mind:
>
> diagflat (current name)
> asdiagonal
> todiagonal
> asdiag
> todiag
> ...

I was wishing for that just the other day... I think my vote is for
"diagflat", to emphasize the initial flattening.

In particular, there's another function I could wish for: given an
array, pick an axis, treat the array like an array of vectors along
that axis, and replace each vector with the diagonal matrix. So,
supposing that it was called todiagonal:

>>> A = todiagonal(ones((2,3,4)), axis=1)
>>> shape(A)
(2, 3, 3, 4)
>>> A[1,:,:,2]
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Specifically, I find the forcible flattening a bit monstrous. And you
can always just do todiagonal(A.flat).

No such function seems to exist at the moment, so I'm attaching one;
probably not as efficient as it could be with appropriate fancy
indexing wizardry.

A. M. Archibald

from numpy import zeros, shape, indices, asarray

def todiagonal(A, axis=-1):
    # NB: the attachment was cut off in transit; this body is a
    # reconstruction of the behaviour described above, not the original.
    A = asarray(A)
    s = shape(A)
    if axis < 0:
        axis += len(s)
    if not 0 <= axis < len(s):
        raise ValueError("axis out of range")
    out = zeros(s[:axis+1] + s[axis:], A.dtype)  # duplicate the chosen axis
    ix = list(indices(s))
    ix.insert(axis, ix[axis])  # equal indices on both copies: the diagonal
    out[tuple(ix)] = A
    return out
Re: [Numpy-discussion] Random number generators
On 04/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> Except for the last, none conflict with current routines and can be added
> without a branch. I expect adding MWC8222 might need more extensive work
> and I will branch for that. So the questions are of utility and naming. I
> see some utility for myself, otherwise I wouldn't be considering doing
> the work. OTOH, I already have (C++) routines that I use for these
> things, so a larger question might be if anyone else sees a use for
> these. I like speed, but it is not always that important in everyday
> apps.

How painful would it be to allow users to drop in their own sources of
raw random numbers? A couple of years ago I was trying to analyze some
X-ray timing data and we saw some peaks at the end of a long chain of
periodicity-detecting software (Fourier transforms, parameter
searching, et cetera). To rule out the possibility that they were
coming from the random numbers we were using at one stage, I wanted to
supply raw random numbers from /dev/random. Unfortunately, that forced
me to write my own program to convert uniform random numbers to
exponential. Allowing the function that generates raw random bytes to
be overridden would be extremely handy, even if it were painfully
slow.

In the same project I also noticed it would be nice to be able to
(say) do "exponential(2+sin(arange(10)))" to get an array of
exponentials with varying parameters.

I realize all this is outside the scope of what you were asking. I
would have found another pseudorandom number generator that I could
just switch to a benefit; the rest of them are less exciting.

When you say "32-bit integers" do you really mean 32-bit, or do you
mean "machine width"? 32-bit integers may not be faster on 64-bit
platforms...

A. M. Archibald
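The uniform-to-exponential conversion that had to be hand-rolled is just the inverse CDF; a sketch reading raw bytes from the system entropy source (the function name is illustrative):

import os, math, struct

def exponential_from_entropy(scale=1.0):
    # 8 raw bytes -> uniform u in (0, 1] -> invert the exponential CDF
    u = (struct.unpack('Q', os.urandom(8))[0] + 1.0) / 18446744073709551616.0
    return -scale * math.log(u)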