Re: [Numpy-discussion] array from list of lists

2006-11-12 Thread A. M. Archibald
On 12/11/06, Erin Sheldon <[EMAIL PROTECTED]> wrote:

> Actually, there is a problem with that approach.  It first converts
> the entire array to a single type, by default a floating type.  For
> very large integers this precision is insufficient.  For example, I
> have the following integer in my arrays:
>  94137100072000193L
> which ends up as
>   94137100072000192
> after going to a float and then back to an integer.

That's an unfortunate limitation of numpy; it views double-precision
floats as higher precision than 64-bit integers, but of course they
aren't. If you want to put all your data in a record array, you could
try transposing the lists using a list comprehension - numpy is not
always as much faster than pure python as it looks. You could then
convert that to a list of four arrays and do the assignment as
appropriate. Alternatively, you could convert your array into a
higher-precision floating-point format (if one is available on your
machine) before transposing and storing in a record array.
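For concreteness, here is a sketch of that transpose-then-assign approach; the field names, dtypes, and sample values below are invented for illustration:

```python
import numpy as np

# Invented sample data: rows of mixed types, one field holding integers
# too large to survive a round trip through a 64-bit float.
rows = [[1, 2.5, 94137100072000193, 'a'],
        [2, 3.5, 94137100072000194, 'b']]

# Transpose with a list comprehension so each field keeps its own type...
cols = [[row[i] for row in rows] for i in range(len(rows[0]))]

# ...then assign column by column into a record array whose dtype is
# chosen per field (int64 for the big integers).
rec = np.zeros(len(rows), dtype=[('id', np.int64), ('x', np.float64),
                                 ('big', np.int64), ('tag', 'S1')])
for name, col in zip(rec.dtype.names, cols):
    rec[name] = col
```

The big integers land in an int64 field directly, so no precision is lost on the way in.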

A. M. Archibald

-
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] Could numpy.matlib.nanmax return a matrix?

2006-11-12 Thread A. M. Archibald
On 12/11/06, Keith Goodman <[EMAIL PROTECTED]> wrote:
> Is anybody interested in making x.max() and nanmax() behave the same
> for matrices, except for the NaN part? That is, make
> numpy.matlib.nanmax return a matrix instead of an array.

Sounds good to me; maybe put a patch in the tracker?

A. M. Archibald



Re: [Numpy-discussion] x.min() depends on ordering

2006-11-11 Thread A. M. Archibald
On 11/11/06, Charles R Harris <[EMAIL PROTECTED]> wrote:

> I think the problem is that the max and min functions use the first value in
> the array as the starting point. That could be fixed by using the first
> non-nan and returning nan if there aren't any "real" numbers.  But it
> probably isn't worth the effort as the behavior becomes more complicated. A
> better rule of thumb is to note that comparisons involving nans are
> basically invalid because nans aren't comparable -- the comparison violates
> trichotomy. Don't really know what to do about that.

Well, we could get simple, consistent behaviour by taking +inf as the
initial value for min and -inf as the initial value for max, then
reducing as normal. Depending on how max and min are implemented, this
would either return NaN if any are present, or return the
smallest/largest non-NaN value (or +inf/-inf if there are none).
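A minimal sketch of that reduction strategy (`nanmin_sketch` is a made-up name; the idea is just to substitute the identity element, +inf, for NaN before reducing):

```python
import numpy as np

def nanmin_sketch(a):
    """Minimum that skips NaNs by starting the reduction from +inf."""
    a = np.asarray(a, dtype=float)
    bad = np.isnan(a)
    if bad.all():
        return np.nan                      # no "real" numbers at all
    return np.where(bad, np.inf, a).min()  # NaNs can no longer win
```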

A. M. Archibald



Re: [Numpy-discussion] sparse SVD

2006-11-11 Thread A. M. Archibald
On 11/11/06, koara <[EMAIL PROTECTED]> wrote:
> Hello, is there a way to do SVD of large sparse matrices efficiently in
> python? All i found was scipy (little sparse support, no SVD), pysparse
> (no SVD) and PROPACK (no python). Out of these, PROPACK seems the most
> enticing choice, but i have no experience porting fortran code and this
> seems too big a bite. Any pointers or suggestions are most welcome!

Numpy includes the program "f2py" for generating python wrappers for
fortran code; it's not too difficult to use, even for a non-fortran
programmer like me.

If you do write wrappers, please send them to the scipy list - it's
always good to build up the library.

A. M. Archibald



Re: [Numpy-discussion] A reimplementation of MaskedArray

2006-11-09 Thread A. M. Archibald
On 09/11/06, Paul Dubois <[EMAIL PROTECTED]> wrote:

> Since the function of old retired persons is to tell youngsters stories
> around the campfire:

I'll pull up a log. But since I'm uppity, and not especially young, I
hope you won't mind if I heckle.

> A supercomputer hardware designer told me that when forced to add IEEE
> arithmetic to his designs that it decreased performance substantially, maybe
> 25-30%; it wasn't that doing the operations took so much longer, it was that
> the increased physical space needed for that circuitry pushed the memory
> farther away. Doubtless this inspired doing some of it in software instead.

The goal of IEEE floats is not to be faster but to make doing correct
numerical programming easier. (Numerical analysts can write robust
numerical code for almost any kind of float, but the vast majority of
numerical code is written by scientists who know the bare minimum or
less about numerical analysis, so a good system makes such code work
as well as possible.)

The other reason IEEE floats are good is, no disrespect to your
hardware designer friend intended, that they were designed with some
care. Reimplementations are liable to get some of the finicky details
wrong (round-to-even, denormalized numbers, what have you...). I found
Kahan's "How Java's Floating-point Hurts Everyone Everywhere" (
http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf ) very educational
when it comes to "what can go wrong with platform-dependent
floating-point".

> No standard for controlling the behaviors exists, either, so you can find
> out the hard way that underflow-to-zero is being done in software by
> default, and that you are doing a lot of it. Or that your code doesn't have
> the same behavior on different platforms.

Well, if the platform is IEEE-compliant, you can trust it to behave
the same way logically, even if some applications are slower on some
systems than others. Or did you mean that there's no standard way to
set the various IEEE flags? That seems like a language/compiler issue
to me (which is not to minimize the pain it causes!)

> To my mind, all that was really accomplished was to convince said youngsters
> that somehow this NaN stuff was the solution to some problem. In reality,
> computing for 3 days and having it print out 1000 NaNs is not exactly
> satisfying. I think this stuff was essentially a mistake in the sense that
> it is a nice ivory-tower idea that costs more in practice than it is worth.
> I do not think a properly thought-out and coded algorithm ought to do
> anything that this stuff is supposed to 'help' with, and if it does do it,
> the code should stop executing.

Well, then turn on floating-point exceptions. I find a few NaNs in my
calculations relatively benign. If I'm doing a calculation where
they'll propagate and destroy everything, I trap them. But it happens
just as often that I launch a job, a NaN appears early on, but I come
back in the morning to find that instead of aborting, the job has
completed except for a few bad data points.
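Turning on floating-point exceptions is a one-liner in numpy; a quick sketch using np.errstate (the context-manager form of np.seterr):

```python
import numpy as np

a = np.array([1.0, -1.0])

# By default an invalid operation just produces a NaN that flows onward.
with np.errstate(invalid='ignore'):
    r = np.sqrt(a)               # second entry is NaN

# To abort instead, promote the invalid-operation flag to an exception.
trapped = False
try:
    with np.errstate(invalid='raise'):
        np.sqrt(a)
except FloatingPointError:
    trapped = True
```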

If you want an algorithm that takes advantage of NaNs, I have one. I
was simulating light rays around a black hole, generating a map of the
bending for various view angles. So I naturally did a lot of exploring
of grazing incidence, where the calculations would often yield NaNs.
So I used the NaNs to fill in the gap between the internal bending of
light and the external bending of the light; no extra programming
effort required, since that was what came out of the ray tracer
anyway. The rendering step just returned NaNs for the image pixels
that came from interpolating in the NaN-filled regions, which came out
black; just what I wanted. I just wrote the program ignoring what
would happen if a NaN appeared, and it came out right. Which is, I
think, the goal - to save the programmer having to think too much
about numerical-analysis headaches.

> Anyway, if I thought it would do the job I wouldn't have written MA in the
> first place.

It's certainly not always a good solution; in particular it's hard to
control the behaviour of things like where(). There's also no NaN for
integer types, which makes MaskedArray more useful.

Anyway, I am pleased that numpy has both controllable IEEE floats and
MaskedArray, and I apologize for a basically off-topic post.

A. M. Archibald



Re: [Numpy-discussion] A reimplementation of MaskedArray

2006-11-08 Thread A. M. Archibald
On 08/11/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:

> It has always been my experience (on various flavors or Pentium) that
> operating on NANs is extremely slow. Does anyone know on what hardware
> NANs are *not* slow? Of course it's always possible I just never notice
> NANs on hardware where they aren't slow.

On an opteron machine I have access to, they appear to be no slower
(and even faster for some transcendental functions) than ordinary
floats:

In [13]: a=zeros(100)

In [14]: %time for i in xrange(1000): a += 1.1
CPU times: user 6.87 s, sys: 0.00 s, total: 6.87 s
Wall time: 6.87

In [15]: a *= NaN

In [16]: %time for i in xrange(1000): a += 1.1
CPU times: user 6.86 s, sys: 0.00 s, total: 6.86 s
Wall time: 6.87

On my Pentium M, they are indeed significantly slower (three times? I
didn't really do enough testing to say how much). I am actually rather
offended by this unfair discrimination against a citizen in good
standing of the IEEE floating point community.

A. M. Archibald



Re: [Numpy-discussion] A reimplementation of MaskedArray

2006-11-08 Thread A. M. Archibald
On 08/11/06, Pierre GM <[EMAIL PROTECTED]> wrote:

> I like your idea, but not its implementation. If MA.masked_singleton is
> defined as an object, as you suggest, then the dtype of the ndarray it is
> passed to becomes 'object', as you pointed out, and that is not something one
> would naturally expec, as basic numerical functions don't  work well  with the
> 'object' dtype (just try  N.sqrt(N.array([1],dtype=N.object)) to see what I
> mean).
> Even if we can construct a mask rather easily at the creation of the masked
> array, following your 'a==masked' suggestion, we still need to get the dtype
> of the non-masked section, and that doesn't seem trivial...

A good candidate for a "should be masked" marker is NaN. It is supposed
to mean, more or less, "no sensible value". Unfortunately, integer
types have no such special value. It's also conceivable that
some user might want to keep NaNs in their array separate from the
mask. Finally, on some hardware, operations with NaN are very slow (so
leaving them in the array, even masked, might not be a good idea).
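One way to get the best of both is to derive the mask from the NaNs and then work with a MaskedArray, so the bad entries are skipped rather than propagated. A sketch:

```python
import numpy as np
import numpy.ma as ma

a = np.array([1.0, np.nan, 3.0])

# Build the mask directly from the NaNs.
m = ma.masked_where(np.isnan(a), a)

mean = m.mean()   # the masked entry is ignored: (1.0 + 3.0) / 2
```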

The reason I suggest this is that in the last major application I had
for numpy, one stage of the problem would occasionally result in NaNs
for certain values, but the best thing I could do was leave them in
place to represent "no data". Switching to a MaskedArray might have
been a better idea, but the NaNs were a rare occurrence.

> About the conversion to ndarray:
> By default, the result should have the same dtype as the _data section.
> For this reason, I disagree with your idea of "(returning) an object ndarray
> with the missing value containing the masked singleton". If you really want
> an object ndarray, you can use the filled method or the filled function, with
> your own definition of the filling value (such as your MaskedScalar).

If you've got floating point, you can again fill in NaNs, but you have
a good point about wanting to extract the original values that were
masked out. Depending on what one is doing, one might want one or the
other.

A. M. Archibald



Re: [Numpy-discussion] numarray indexing problems with Python2.5 and 64-bit platforms

2006-11-02 Thread A. M. Archibald
On 02/11/06, Francesc Altet <[EMAIL PROTECTED]> wrote:

> I see this as a major issue in numarray and poses in great danger the
> intended support of PyTables for numarray that we planned for some time
> (until end of 2007). It would be nice to know if the numarray crew would
> be willing to address this, or, now that NumPy 1.0 is out, they have
> decided to completely drop the support for it.
>
> We'd really like to continue offering support for numarray (in the end,
> it is a very good piece of software) in PyTables, but don't having a
> solution for this problem anytime soon, will make this very problematic
> to us.

Someone has to say it: you could just drop support for the obsolete
numarray and provide only support for its successor, numpy.

A. M. Archibald



Re: [Numpy-discussion] Compressed index sets, generators and iterators

2006-11-01 Thread A. M. Archibald
On 01/11/06, Bill Baxter <[EMAIL PROTECTED]> wrote:

> What's the reason iterators are not supported currently?
> For instance  A[range(0,4)] works for a 1d A, but A[xrange(0,4)] does not.
> Are iterators just too inefficient to bother with?

If you want an iterator back, a generator comprehension will do it:
(A[i] for i in xrange(0,4))

If the result is to be a numpy array, the size must be known ahead of
time (so that the memory can be allocated) and specified. At that
point, and considering the overhead of calling back and forth to the
generator, the cost of converting to a list (after all,
A[list(xrange(0,4))] should work fine) isn't necessarily worth the
trouble.
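For completeness, numpy's fromiter does consume iterators directly, precisely by requiring the element count up front so the memory can be allocated in one shot:

```python
import numpy as np

# A generator has no length, so we pass count explicitly.
a = np.fromiter((i * i for i in range(10)), dtype=float, count=10)
```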

A. M. Archibald



Re: [Numpy-discussion] Need more comments from scientific community on python-dev

2006-11-01 Thread A. M. Archibald
On 01/11/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:

> And it may be a good idea to also have a get_ctype method or some-such
> on the ctypes attribute so that  one could get a "ctypes description"
> from the NumPy data-type.

It seems to me that at the python level, there's not much reason to
choose the dtypes proposal over ctypes. There is one at the C level,
it seems (though I, like perhaps most of the people on python-dev,
have never actually tried using either). So perhaps, to persuade the
python-dev folks, what is needed is a comparison of what has to be
done at the C level. What would it take to rewrite numpy to use
ctypes? There seems to be some problem with extending the type objects
used by ctypes, but it's not very clear to me what that problem is
(what the extensions are supposed to do).

The argument that *some* datatype format should become standard is much easier to make.

A. M. Archibald



Re: [Numpy-discussion] Strange and hard to reproduce crash

2006-10-30 Thread A. M. Archibald
On 30/10/06, Fernando Perez <[EMAIL PROTECTED]> wrote:
> On 10/30/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
> >  I suspect the real problem is that the refcount keeps going up. Even if it
> > was unsigned it would eventually wrap to zero and with a bit of luck get
> > garbage collected. So probably something isn't decrementing the refcount.
>
> Oops, my bad: I meant *unsigned long long*, so that the refcount is a
> 64-bit object.  By the time it wraps around, you'll have run out of
> memory long ago.  Having 32 bit ref counters can potentially mean you
> run out of the counter before you run out of RAM on a system with
> sufficient memory.

Yes, this is a feature(?) of python as it currently stands (I checked
2.5) - reference counts are 32-bit signed integers, so if you have an
object that has enough references, python will be exceedingly unhappy:
http://mail.python.org/pipermail/python-dev/2002-September/028679.html

It is of course possible that you actually have that many references
to some object, but it seems to me you'd notice twenty-four gigabytes
of pointers floating around...
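The per-pointer arithmetic is easy to see from python; sys.getrefcount counts every live pointer to the object (plus one temporary reference for its own argument):

```python
import sys

x = object()
base = sys.getrefcount(x)

refs = [x] * 1000              # a thousand new pointers to the same object
after = sys.getrefcount(x)     # goes up by exactly one per pointer
```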

A. M. Archibald



Re: [Numpy-discussion] Sorting eigenvalues and vectors together

2006-10-27 Thread A. M. Archibald
On 27/10/06, jeremito <[EMAIL PROTECTED]> wrote:
> I am using a = numpy.linalg.eig(A) to get the eigenvalues and
> eigenvectors.  I am interested only in the largest eigenvalue so I
> would like to sort the first element of a.  But if I do that, I won't
> know what is the associated eigenvector.  Is there a function that will
> sort the values and vectors together or do I need to write it myself.

I think that they are sorted, but argsort() will do what you want (or
argmax(), even).
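A sketch of the argsort approach; note that eigenvector i is the *column* v[:, i], so values and vectors must be permuted together:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 5.0]])
w, v = np.linalg.eig(A)

order = np.argsort(w)            # one permutation for both
w, v = w[order], v[:, order]

largest, vec = w[-1], v[:, -1]   # largest eigenvalue and its eigenvector
```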

A. M. Archibald



Re: [Numpy-discussion] vectorize pitfall

2006-10-25 Thread A. M. Archibald
On 25/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:

> It takes old "type-codes" as one big string so you say
>
> vectorize(f,otypes='d')
>
> This should be modernized to handle a list of dtype objects.
>
> I've fixed vectorize in SVN.

Thanks!

A. M. Archibald



[Numpy-discussion] vectorize pitfall

2006-10-25 Thread A. M. Archibald
Hi,

Vectorize is a very handy function, but it has at least one pitfall:

def f(x):
    if 1.3 < x < 2.5:
        return 2.0
    else:
        return 0

vectorize(f) decides its output dtype by evaluating f on the first
element of its argument, so if that first call happens to return an
integer, the whole result comes back as an integer array and the
floating-point values are silently truncated. Is there a way to tell
vectorize what output type to use?

A. M. Archibald


Re: [Numpy-discussion] memory position of numpy arrays

2006-10-23 Thread A. M. Archibald
On 24/10/06, Lars Friedrich <[EMAIL PROTECTED]> wrote:

> I am not really sure about the term "locking". Does that mean that this
> part is not paged, or that this part is not accessed by two entities at
> the same time? Or both?

There are two kinds of locking, and really, you probably want both.
But mlock() just ensures that the virtual memory stays in actual RAM.

> Is my way a common way? I mean, letting python/numpy do the memory
> allocation by creating a numpy-array with zeros in and passing its
> memory location to the hardware-API?

It's not necessary to do it this way. I think a more usual approach
would be to create the buffer however is convenient in your C code,
then provide its address to numpy. You can then use the ndarray
function from python to tell it how to interpret that buffer as an
array. Since the C code is creating the buffer, you can make sure it
is in a special locked area of memory, ensure that the garbage
collector never comes calling for it, or whatever you like.

If you're having problems with driver stability, though, you may be
safest having your C code copy the buffer into a numpy array in one
shot - then you have complete control over when and how the DMA memory
is accessed. (In C, I'm afraid, but for this sort of thing C is
well-suited.)
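A sketch of that approach from the python side, using a ctypes array as a stand-in for a driver-allocated buffer (in the real setup the address would come from the hardware API):

```python
import ctypes
import numpy as np

# Pretend this block of 16 doubles was allocated by the C driver.
buf = (ctypes.c_double * 16)()

# View the existing memory as an ndarray -- no copy is made, so writes
# through either side are visible to the other.
a = np.frombuffer(buf, dtype=np.float64)
a[:] = 1.5
```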

A. M. Archibald



Re: [Numpy-discussion] Model and experiment fitting.

2006-10-20 Thread A. M. Archibald
On 20/10/06, Sebastian Żurek <[EMAIL PROTECTED]> wrote:


> Is there something like that in any numerical python modules (numpy,
> pylab) I could use?

In scipy there are some very convenient spline fitting tools which
will allow you to fit a nice smooth spline through the simulation data
points (or near, if they have some uncertainty); you can then easily
look at the RMS difference in the y values. You can also, less easily,
look at the distance from the curve allowing for some uncertainty in
the x values.
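A numpy-only sketch of the curve-then-RMS idea; scipy's spline tools (e.g. scipy.interpolate.splrep/splev) would replace the polynomial fit here, and all the sample data below is invented:

```python
import numpy as np

# Invented data: simulation sampled densely, experiment sampled sparsely.
x_sim = np.linspace(0.0, 1.0, 20)
y_sim = np.sin(2 * np.pi * x_sim)
x_exp = np.linspace(0.0, 1.0, 7)
y_exp = np.sin(2 * np.pi * x_exp) + 0.01   # small systematic offset

# Fit a smooth curve through the simulation points, evaluate it at the
# experimental x values, and compare there.
coeffs = np.polyfit(x_sim, y_sim, deg=7)
y_fit = np.polyval(coeffs, x_exp)

rms = np.sqrt(np.mean((y_fit - y_exp) ** 2))
```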

I suppose you could also fit a curve through the experimental points
and compare the two curves in some way.

> I can imagine, I can fit the data with some polynomial or whatever,
> and than compare the fitted data, but my goal is to operate on
> as raw data as it's possible.

If you want to avoid using an a priori model, Numerical Recipes
discuss some possible approaches ("Do two-dimensional distributions
differ?" at http://www.nrbook.com/a/bookcpdf.html is one) but it's not
clear how to turn the problem you describe into a solvable one - some
assumption about how the models vary between sampled x values appears
to be necessary, and that amounts to interpolation.

A. M. Archibald


Re: [Numpy-discussion] The NumPy Fortran-ordering quiz

2006-10-20 Thread A. M. Archibald
On 18/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:

> If there are any cases satisfying these rules where a copy does not have
> to occur then let me know.

For example, zeros((4,4))[:,1].reshape((2,2)) need not be copied.

I filed a bug in trac and supplied a patch to multiarray.c that avoids
copies in PyArray_NewShape unless absolutely necessary.

A. M. Archibald



Re: [Numpy-discussion] questions regarding assignement and copy

2006-10-18 Thread A. M. Archibald
On 18/10/06, David Cournapeau <[EMAIL PROTECTED]> wrote:
> Sven Schreiber wrote:
> >
> > Yes it's intended; as far as I understand the python/numpy syntax, <+>
> > is an operator, and that triggers assignment by copy (even if you do
> > something trivial as bar = +foo, you get a copy, if I'm not mistaken),
> >
> So basically, whenever you have
>
> foo = expr
>
> with expr is a numpy expression containing foo, you trigger a copy ?

No. "=" never copies, but "+" does: when you do "A+B" you produce a
new, freshly-allocated array, which you can then store (a reference
to) in any variable you like, including A or B. So

M = M+1

makes a copy because "M+1" is a new matrix, which you are assigning to
M. Unfortunately, this means if you do:

M = 2*(M+1)

python makes the new matrix M+1, then makes the new matrix 2*(M+1),
discards M+1, and sets M to point to 2*(M+1). If you want to avoid all
this copying, you can use python's in-place operators:

M += 1
M *= 2

This is actually a standard difficulty people have with python, made
more obvious because you're working with mutable arrays rather than
immutable scalars.
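The distinction is easy to verify: rebinding allocates a new array, while the in-place operators reuse the old one:

```python
import numpy as np

M = np.zeros(3)
before = id(M)

M = M + 1                  # builds a brand-new array, rebinds the name M
rebound = id(M) != before

before = id(M)
M += 1                     # modifies the existing array in place
M *= 2
inplace = id(M) == before
```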

A. M. Archibald



Re: [Numpy-discussion] The NumPy Fortran-ordering quiz

2006-10-18 Thread A. M. Archibald
On 18/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
>
> I just committed a change to the check-for-copy code to SVN.   A copy
> will occur if an actual reshaping is taking place that involves more
> than just adding or deleting ones from the old shape and if
>
> 1) self is not "single-segment".  Single-segment means either
> Fortran-order or C-order.  If we don't have a single-segment array, then
> we have to get a (Fortran or C-) contiguous buffer to do any reshaping
> (there may be special cases when it's possible, but I haven't figure
> them out...)

Indeed it is often possible to reshape an array that is not
contiguous; for example zeros(100)[::10] is not contiguous but can be
reshaped to any shape without any copying.

More generally, there are all sorts of peculiar arrangements that can
be reshaped without a copy, for example:

input has lengths (16,2,3) and strides (39,9,3); output should have
lengths (4,4,3,2), so take strides (156,39,6,3)

Whether this sort of thing actually *occurs* very often, I don't know...
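Current numpy does handle the evenly-strided case without copying, which can be checked by writing through the reshaped view:

```python
import numpy as np

a = np.zeros(100)
b = a[::10]                # evenly strided, not contiguous
c = b.reshape(2, 5)        # reshaping just re-derives the strides

c[0, 0] = 7.0              # writing through c...
```

...shows up in `a`, so `c` is still a view onto the original buffer.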

> 2) self is single-segment but self.squeeze().ndim > 1 and it is in the
> "wrong" order from what was requested in the order argument (i.e. self
> is in C-contiguous order but a Fortran-order reshape was requested or
> self is in F-contiguous order but a C-order reshape was requested).
>
> If there are any cases satisfying these rules where a copy does not have
> to occur then let me know.

There are. I came up with a general condition for describing exactly
when you need to copy an array, but it's difficult to put into words.
Here goes:

(0) Dimensions of length 1 are an irrelevant annoyance. Conceptually
they should be squeezed out before beginning and newaxised back in at
the end. (What strides should they have?) Assume that I've done so
from now on. (Actual code can just skip them appropriately.)

(1) If the array looks like an evenly-strided 1D array that has been
reshaped (so strides[i-1] = lengths[i]*strides[i] for all i) it can
always be reshaped without a copy. (For example, zeros(1000)[::10] can
be reshaped arbitrarily and as many times as desired without copying.)

(2) If the array does *not* look like an evenly-strided 1D array that
has been reshaped, you can still reshape it without a copy if it looks
like an array of such arrays, each of which can be reshaped
separately. (For example, you can view an array of size (16,2,3) that
you have to resize to an array of size (4,4,3,2) as two separate
pieces: an array of size (2,3) that you have to resize to (3,2) and an
array of size (16,) that you need to resize to size (4,4). Each of
these pieces separately needs to look like a reshaped 1d array, but
there need be no relation between them.) (The condition for when you
can pull a smaller resize operation off the end is precisely when you
can find a tail segment of each shape tuple with the same product; in
the case above, 2*3 = 3*2, so we could stop and do that reshape
separately.)

(3) If the array cannot be reshaped without a copy using the rules
above, it cannot be reshaped without a copy at all.
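Rule (1) by itself can be written as a small predicate; this sketch works in element (not byte) units and deliberately covers only the first rule, not the full piecewise algorithm:

```python
def looks_like_reshaped_1d(lengths, strides):
    """True if (lengths, strides) describe an evenly-strided 1-D array
    that has been reshaped: each stride equals the next length times
    the next stride."""
    return all(strides[i - 1] == lengths[i] * strides[i]
               for i in range(1, len(lengths)))

# zeros(1000)[::10] viewed as (10, 10, 10): a stride-10 1-D array
# reshaped, so no copy is ever needed.
assert looks_like_reshaped_1d((10, 10, 10), (1000, 100, 10))

# A (2, 3) piece with strides (9, 3) also passes on its own (it is a
# stride-3 1-D array of length 6)...
assert looks_like_reshaped_1d((2, 3), (9, 3))

# ...but putting it under an outer axis of stride 39 breaks the chain
# (39 != 2 * 9), so rule (2)'s piecewise treatment becomes necessary.
assert not looks_like_reshaped_1d((2, 2, 3), (39, 9, 3))
```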


The python code in my previous message actually implements this rule,
if you want a less vague description.

I should also point out that views are not always more efficient than
copies (because of cache concerns, and because a small view can
prevent a giant block of memory from being deallocated); nevertheless
it should be up to the user to copy when it helps.

A. M. Archibald



Re: [Numpy-discussion] The NumPy Fortran-ordering quiz

2006-10-18 Thread A. M. Archibald
On 18/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
>
> On 10/17/06, A. M. Archibald <[EMAIL PROTECTED]> wrote:
> > On 17/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > > On 10/17/06, Travis Oliphant <[EMAIL PROTECTED] > wrote:
> >
> > > > Thus, reshape does the equivalent of a Fortran ravel to [1,4,2,5,3,6]
> > > > and then a Fortran-order based fill of an empty (3,2) array: giving
> > > > you the result.
> > >
> > > Why a Fortran ravel? I am thinking of it as preserving the memory
> > > layout, just messing with how it is addressed.
> >
> > Because, to first order, the memory layout is totally irrelevant to a
> > python user.
> >
> > array([[1,2,3],[4,5,6]],order='C')
> > array([[1,2,3],[4,5,6]],order='F')
> >
> > should be nearly identical to the python user.
>
> So what is the point of having a fortran layout if things are not actually
> laid out in fortran order? As far as I can see, memory layout is the only
> reason for fortran ordering. Now I can sort of see where the fortran ravel
> comes from, because the result can be passed to a fortran routine. And it
> does look like a.ravel('F') is the same as a.T.ravel(). Hmmm. Now I wonder
> about this:

Things are laid out in Fortran order if you request Fortran order upon
array creation. You just can't see it, normally. Numpy tries hard to
make arrays of all different orders look identical.

Of course, the reason I said "to first order" is that when determining
when to copy and when not to, or when passing arrays to C or Fortran
functions, sometimes it does matter. For example, a typical Fortran
linear algebra routine needs its arrays to be in Fortran order in
memory.

> In [62]: array([[1,2,3],[4,5,6]], dtype=int8, order='F').flags
> Out[62]:
>   CONTIGUOUS : False
>   FORTRAN : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False

Unless you're working with C extensions, it doesn't make much sense to
ever look at flags. The memory layout does not affect normal array
behaviour (including reshape).
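A small demonstration of the point (a sketch using numpy's standard flags):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], order='C')
b = np.array([[1, 2, 3], [4, 5, 6]], order='F')

# To ordinary python code the two are interchangeable...
assert (a == b).all() and a[1, 2] == b[1, 2] == 6

# ...even though the bytes sit in memory in different orders:
assert a.flags['C_CONTIGUOUS'] and b.flags['F_CONTIGUOUS']

# ravel('F') is defined by index order, not memory order, so it
# gives the same answer for both layouts:
assert list(a.ravel('F')) == list(b.ravel('F')) == [1, 4, 2, 5, 3, 6]
```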

A. M. Archibald



Re: [Numpy-discussion] The NumPy Fortran-ordering quiz

2006-10-18 Thread A. M. Archibald

On 17/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:

Charles R Harris wrote:
>
> In [3]: array([[1,2,3,4,5,6]]).reshape((3,2),order='F')
> Out[3]:
> array([[1, 4],
>[2, 5],
>[3, 6]])
>
> I also don't understand why a copy is returned if 'F' just fiddles
> with the indices and strides; the underlying data should be the same,
> just the view changes. FWIW, I think both examples should be returning
> views.

You are right, it doesn't need to.   My check is not general enough.

It can be challenging to come up with a general way to differentiate the
view-vs-copy situation and I struggled with it.  In this case, it's the
fact that while self->nd > 1, the other dimensions are only of shape 1
and so don't really matter.   If you could come up with some kind of
striding check that would distinguish the two cases, I would appreciate it.


I wrote, just for exercise I guess, a routine that checks to see if a
copy is needed for a reshape (and if not, it computes the resulting
strides) for an arbitrary array. (It currently only does a C-order
reshape, but F-order would be an easy addition.) It does, however,
successfully handle arbitrarily strided arrays with arbitrary
arrangements of "1" dimensions having arbitrary strides. I think it
also correctly handles 0 strides. I don't imagine you'd want to *use*
it, but it does provide a complete solution to copy-less reshaping.

I'd be happy to help with the C implementation built into numpy, but
for the life of me I can't figure out how it works.


A. M. Archibald

P.S. I haven't written down a *proof* that this implementation never
copies unnecessarily, but I'm pretty sure that it doesn't. It has a
reshape_restricted that effectively does a ravel() first; this can't
cope with things like length (5,2,3) -> length (5,3,2) unless the
strides are conveniently equal, so there's a wrapper, reshape(), that
breaks the reshape operation up into pieces that can be done by
reshape_restricted. -AMA

from numpy import multiply, array, empty, zeros, add, asarray
from itertools import izip

class MustCopy(Exception):
    pass

def index(indices, strides):
    indices = asarray(indices)
    strides = asarray(strides)
    if len(indices)!=len(strides):
        raise ValueError
    return add.reduce(indices*strides)

def reshape_restricted(from_strides,from_lengths,to_lengths):

    if multiply.reduce(from_lengths)!=multiply.reduce(to_lengths):
        raise ValueError, "Array sizes are not compatible"
    for i in xrange(1,len(from_strides)):
        if not (from_strides[i-1]==from_strides[i]*from_lengths[i] or
                from_lengths[i]==1): # If a dimension has length one, who cares what its stride is?
            raise MustCopy

    r = empty(len(to_lengths), dtype=int)
    j = len(from_lengths)-1
    while j>=0 and from_lengths[j]==1:
        j-=1 # Ignore strides on trailing length-1 dimensions
    if j<0: # Both arrays have size 1
        r[-1] = 1
    else:
        r[-1] = from_strides[j]
    i = -1
    while len(to_lengths)+i>0:
        r[i-1]=to_lengths[i]*r[i]
        i -= 1

    return r

def reshape(from_strides,from_lengths,to_lengths):
    if multiply.reduce(from_lengths)!=multiply.reduce(to_lengths):
        raise ValueError

    r = empty(len(to_lengths), dtype=int)

    j1 = len(from_lengths)
    j2 = len(to_lengths)

    while j1!=0:
        i1 = j1-1
        i2 = j2-1

        p1 = from_lengths[i1]
        p2 = to_lengths[i2]

        # Grow the tail products until they match, pulling in dimensions
        # from whichever side is smaller.
        while p1!=p2:
            if p1<p2:
                i1-=1
                assert i1>=0
                p1*=from_lengths[i1]
            else:
                i2-=1
                assert i2>=0
                p2*=to_lengths[i2]

        r[i2:j2] = reshape_restricted(from_strides[i1:j1],
                                      from_lengths[i1:j1],
                                      to_lengths[i2:j2])

        j1 = i1
        j2 = i2

    return r

def test_reshape(from_strides,from_lengths,to_strides,to_lengths):
    if multiply.reduce(from_lengths)!=multiply.reduce(to_lengths):
        raise ValueError

    if len(from_strides)!=len(from_lengths) or len(to_strides)!=len(to_lengths):
        raise ValueError

    def walk(lengths):
        i = zeros(len(lengths),int)
        while True:
            yield i
            j = len(lengths)-1
            while i[j]==lengths[j]-1:
                i[j]=0
                j-=1
                if j<0:
                    return
            i[j] += 1

    for (i1,i2) in izip(walk(from_lengths), walk(to_lengths)):
        if index(i1,from_strides)!=index(i2,to_strides):
            raise ValueError


Re: [Numpy-discussion] The NumPy Fortran-ordering quiz

2006-10-17 Thread A. M. Archibald
On 17/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
>
> On 10/17/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:

> > Thus, reshape does the equivalent of a Fortran ravel to [1,4,2,5,3,6]
> > and then a Fortran-order based fill of an empty (3,2) array:  giving you
> > the result.
>
> Why a Fortran ravel? I am thinking of it as preserving the memory layout,
> just messing with how it is addressed.

Because, to first order, the memory layout is totally irrelevant to a
python user.

array([[1,2,3],[4,5,6]],order='C')
array([[1,2,3],[4,5,6]],order='F')

should be nearly identical to the python user. If the storage order
mattered, you'd have to know, every time you used reshape, what order
your matrix was stored in (did it come from a transpose operation, for
example?).

As for why a Fortran ravel rather than a C ravel, that's a simplifying
decision. If you like, you can think of reshape as happening in two
steps:

o1='C'
o2='C'
A.reshape((6,),order=o1).reshape((2,3),order=o2)

The design could have allowed, as Travis Oliphant says, o1!=o2, but
explaining that would have been quite difficult (even more than now, I
mean). And in any case you can use the code above to achieve that, if
you really want.

A. M. Archibald



Re: [Numpy-discussion] rcond in polyfit

2006-10-14 Thread A. M. Archibald
On 14/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
>
> On 10/13/06, A. M. Archibald <[EMAIL PROTECTED]> wrote:
> > On 13/10/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> > > Charles R Harris wrote:
>
> 
>
> > > On the other hand if error handling is set to 'raise', then a
> > > FloatingPointError is issued. Is a FloatingPointWarning in order to
> > > mirror the FloatingPointError? And if so, would it be appropriate to use
> > > for condition number?
> >
> > I submitted a patch to use warnings for several functions in scipy a
> > while ago, and the approach I took was to create a ScipyWarning, from
> > which more specific warnings were derived (IntegrationWarning, for
> > example). That was perhaps a bit short-sighted.
> >
> > I'd suggest a FloatingPointWarning as a base class, with
> > IllConditionedMatrix as a subclass (it should include the condition
> > number, but probably not the matrix itself unless it's small, as
> > debugging information).
> >
>
> Let's pin this down a bit. Numpy seems to need its own warning classes so we
> can control the printing of the warnings. For the polyfit function I think
> it should *always* warn by default because it is likely to be used
> interactively and one can fool around with the degree of the fit. For
> instance, I am now scaling the input x vector so norm_inf(x) == 1 and
> this seems to work pretty well for lots of stuff with rcond=-1 (about 1e-16)
> and a warning on rank reduction. As long as the rank stays the same things
> seem to work ok, up to fits of degree 21 on the test data that started this
> conversation.

Numerical Recipes (http://www.nrbook.com/a/bookcpdf/c15-4.pdf)
recommends setting rcond to the number of data points times machine
epsilon (which of course is different for single/double). We should
definitely warn the user if any singular value is below s[0]*rcond (as
that means that there is effectively a useless basis function, up to
roundoff).
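In numpy terms that recommendation amounts to (a sketch):

```python
import numpy as np

# Numerical Recipes' rule of thumb: rcond = n_points * machine epsilon,
# which naturally differs between single and double precision.
npts = 100
rcond_double = npts * np.finfo(np.float64).eps   # ~2.2e-14 for 100 points
rcond_single = npts * np.finfo(np.float32).eps   # ~1.2e-5  for 100 points
```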

I'm not sure how to control the default warnings setting ("once" vs.
"always"); it's definitely not possible using the standard API to save
the warnings state and restore it later. One might be able to push
such a change into the warnings module by including it in a
ContextManager.
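A sketch of the kind of context manager meant here — the class name is made up, and later versions of the standard library's warnings module provide warnings.catch_warnings with essentially this behaviour:

```python
import warnings

class saved_filters(object):
    """Save the warnings filter list on entry and restore it on exit.
    (Hypothetical name; roughly what warnings.catch_warnings does in
    later Python versions.)"""
    def __enter__(self):
        self._saved = warnings.filters[:]
        return self
    def __exit__(self, *exc_info):
        warnings.filters[:] = self._saved
        return False

before = warnings.filters[:]
with saved_filters():
    warnings.simplefilter('error', UserWarning)  # escalate inside the block only
    try:
        warnings.warn("ill-conditioned fit", UserWarning)
        raised = False
    except UserWarning:
        raised = True
assert raised                       # the escalation was in effect...
assert warnings.filters == before   # ...and the state is restored on exit
```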

ipython should probably reset all the "show once" warnings every time
it shows an interactive prompt. I suppose more accurately, it should
do that only for warnings the user hasn't given instructions about.
That way you'll get a warning about bad polynomial fits every time you
run a command that contains one, but if your function runs thousands
of fits you don't drown in warnings.

> BTW, how does one turn warnings back on? If I do a
>
> >>> warnings.simplefilter('always', mywarn)
>
> things work fine. Following this by
>
> >>> warnings.simplefilter('once', mywarn)
>
> does what is supposed to do. Once again issuing
>
> >>> warnings.simplefilter('always', mywarn)
>
> fails to have any effect. Resetwarnings doesn't help. Hmmm...

I don't get the impression that the warnings module is much tested; I
had similar headaches.

A. M. Archibald



Re: [Numpy-discussion] [SciPy-dev] NumPy/SciPy + MPI for Python

2006-10-14 Thread A. M. Archibald
On 14/10/06, Gael Varoquaux <[EMAIL PROTECTED]> wrote:
> On Sat, Oct 14, 2006 at 06:58:45AM -0600, Bill Spotz wrote:
> > I would like to second the notion of converging on a single MPI
> > interface.  My parallel project encapsulates most of the inter-
> > processor communication within higher-level objects because the lower-
> > level communication patterns can usually be determined from higher-
> > level data structures.  But still, there are times when a user would
> > like access to the lower-level MPI interface.
>
> I think we are running in the same old problem of scipy/a super-package
> of scientific tools. It is important to keep numpy/scipy modular. People
> may want to install either one without an MPI interface. However it is
> nice that if somebody just wants "everything" without having to choose
> he can download that "everything" from scpy.org and that it comes well
> bundled together. So all we need to do is find a name for that
> super-package, put it together with good integration and add it to
> scipy.org.
>
> My 2 cents,
>
> Gaël

I agree. Moreover, being picked for such integration work would help
encourage people to converge on one MPI interface (for example).
There's some discussion of such a package at NumpyProConPage on the
wiki (and in particular at http://www.scipy.org/PyLabAwaits ). The
"scipy superpack" (which is only for Windows, as I understand it) is a
sort of beginning.

So, well, any suggestion for a name? pylab is already in use by
matplotlib (for some reason), as is Scientific Python (and numpy,
Numeric and numarray are obviously already confusing).

A. M. Archibald



Re: [Numpy-discussion] rcond in polyfit

2006-10-13 Thread A. M. Archibald
On 13/10/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> Charles R Harris wrote:

> > That sounds good, but how to do it? Should I raise an exception?
> Use the warnings framework:
>
>  >>> import warnings
>  >>> warnings.warn("condition number is BAD")
> __main__:1: UserWarning: condition number is BAD
>
> The user can turn warnings on or off or turned in exceptions based on a
> variety of criteria. Look for the warnings filter in the docs.
>
> Which brings up a question: do we want to have a FloatingPointWarning or
> some such? Currently, if you use set the error handling to warn using
> seterr a runtime warning is issued:
>
>  >>> np.seterr(all='warn')
> {'over': 'ignore', 'divide': 'ignore', 'invalid': 'ignore', 'under':
> 'ignore'}
>  >>> np.arange(1) / 0
> __main__:1: RuntimeWarning: divide by zero encountered in divide
>
>
> On the other hand if error handling is set to 'raise', then a
> FloatingPointError is issued. Is a FloatingPointWarning in order to
> mirror the FloatingPointError? And if so, would it be appropriate to use
> for condition number?

I submitted a patch to use warnings for several functions in scipy a
while ago, and the approach I took was to create a ScipyWarning, from
which more specific warnings were derived (IntegrationWarning, for
example). That was perhaps a bit short-sighted.

I'd suggest a FloatingPointWarning as a base class, with
IllConditionedMatrix as a subclass (it should include the condition
number, but probably not the matrix itself unless it's small, as
debugging information).

The warnings module is frustratingly non-reentrant, unfortunately,
which makes writing tests very awkward.

> > I would also have to modify lstsq so it returns the degree of the fit
> > which would mess up the current  interface.
> One approach would be to write lstsqcond (or a better name) that returns
> both the fit and the condition number. listsq could then be just a
> wrapper over that which dumped the condition number.  IIRC, the
> condition number is available, but we're not returning it.

This is a very good idea. scipy.integrate.quad returns a pair (result,
error_estimate) and every time I use it I trip over that. (Perhaps if
I were a fine upstanding numerical analyst I would be checking the
error estimate every time, but it is a pain.) Another option would be
a "full_output" optional argument.

A. M. Archibald



Re: [Numpy-discussion] rcond in polyfit

2006-10-13 Thread A. M. Archibald
On 13/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
>
>
> On 10/13/06, A. M. Archibald <[EMAIL PROTECTED]> wrote:
> > On 12/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> > > Hi all,
> > >
> > > I note that polyfit looks like it should work for single and double,
> > > real and complex, polynomials. On the otherhand, the default rcond
> > > needs to depend on the underlying precision. On the other, other
> > > hand, all the svd computations are done with dgelsd or zgelsd, i.e.,
> > > double precision. Even so problems can arise from inherent errors of
> > > the input data if it is single precision to start with. I also think
> > > the final degree of the fit should be available somewhere if wanted,
> > > as it is an indication of what is going on. Sooo, any suggestions as
> > > to what to do? My initial impulse would be to set rcond=1e-6 for
> > > single, 1e-14 for double, make rcond a keyword, and kick the can
> > > down the road on returning the actual degree of the fit.
> >
> > I'd also be inclined to output a warning (which the user can ignore,
> > read or trap as necessary) if the condition number is too bad or they
> > supplied an rcond that is too small for the precision of their data.
>
> That sounds good, but how to do it? Should I raise an exception? I would
> also have to modify lstsq so it returns the degree of the fit which would
> mess up the current interface.

Python's warnings module is a decent solution for providing this
information. Goodness-of-fit worries me less than ill-conditioning -
users are going to expect the curve to deviate from their function
(and an easy reliable way to get goodness of fit is
sqrt(sum(abs(f(xs)-polynomial(xs))**2)); this is certain to take into
account any roundoff errors introduced anywhere). But they may well
have no idea they should be worried about the condition number of some
matrix they've never heard of.
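The goodness-of-fit check suggested above is a one-liner; a sketch using numpy's polyfit and polyval:

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 50)
ys = np.sin(2 * np.pi * xs)
coeffs = np.polyfit(xs, ys, 5)
# Residual as described above; any roundoff introduced anywhere along
# the way shows up in this number automatically.
resid = np.sqrt(np.sum(np.abs(ys - np.polyval(coeffs, xs)) ** 2))
```

A well-behaved fit gives a residual on the order of the data's noise level; an ill-conditioned one gives something far larger.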

A. M. Archibald



Re: [Numpy-discussion] Polyfit

2006-10-13 Thread A. M. Archibald
On 13/10/06, Greg Willden <[EMAIL PROTECTED]> wrote:

> What about including multiple algorithms each returning a figure of fit?
> Then I could try two or three different algorithms and then use the one that
> works best for my data.

The basic problem is that X^n is rarely a good basis for the functions
on [a,b]. So if you want it to return the coefficients of a
polynomial, you're basically stuck. If you *don't* want that, there's
a whole bestiary of other options.

If you're just looking to put a smooth curve through a bunch of data
points (perhaps with known uncertainties), scipy.interpolate includes
some nice spline fitting functions.

If you're looking for polynomials, orthogonal polynomials may serve as
a better basis for your interval; you can look in scipy.special for
them (and leastsq will fit them to your points). Extracting their
coefficients is possible but will bring you back to numerical
instabilities.
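A sketch of the orthogonal-polynomial route, built by hand from T_n(x) = cos(n*arccos(x)) so as not to assume any particular scipy.special interface:

```python
import numpy as np

# Fit exp(x) on [-1, 1] in the Chebyshev basis instead of x**n.
xs = np.linspace(-1.0, 1.0, 200)
ys = np.exp(xs)
degree = 8

# Design matrix: column n holds T_n(x) = cos(n*arccos(x)) at each point.
A = np.cos(np.outer(np.arccos(xs), np.arange(degree + 1)))
coeffs, _, _, sv = np.linalg.lstsq(A, ys, rcond=None)

cond = sv[0] / sv[-1]                          # stays modest in this basis
max_err = np.max(np.abs(A.dot(coeffs) - ys))   # near machine precision here
```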

In any case, all this is outside the purview of numpy (as is polyfit, frankly).

A. M. Archibald



Re: [Numpy-discussion] Polyfit

2006-10-13 Thread A. M. Archibald
On 13/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:

> You can also get *much* better results if you scale the x interval to [0,1]
> as the problem will be better posed. For instance, with your data and a
> degree 10 fit I get  a condition number of about 2e7 when x is scaled to
> [0,1], as opposed to about 1e36 when left as is. The former yields a
> perfectly useable fit while the latter blows up. I suppose this could be
> built into the polyfit routine if one were only interested in polynomial
> fits of some sort, but the polynomial would have to carry around an offset
> and scale factor to make evaluation work.

[-1,1] would probably be even better, no?
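The conditioning difference is easy to see directly (a sketch; np.vander builds the x**n design matrix):

```python
import numpy as np

xs = np.linspace(0.0, 2000.0, 50)          # raw, wide interval
V_raw = np.vander(xs, 11)                  # degree-10 x**n design matrix
V_scaled = np.vander(xs / xs.max(), 11)    # same data scaled to [0, 1]

cond_raw = np.linalg.cond(V_raw)           # astronomically large
cond_scaled = np.linalg.cond(V_scaled)     # large but usable
```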

> If Travis is interested in such a thing we could put together some variant
> of the polynomials that includes the extra data.

At this point you might as well use a polynomial class that can
accomodate a variety of bases for the space of polynomials - X^n,
(X-a)^n, orthogonal polynomials (translated and scaled as needed),
what have you.

I think I vote for polyfit that is no more clever than it has to be
but which warns the user when the fit is bad.

A. M. Archibald



Re: [Numpy-discussion] rcond in polyfit

2006-10-13 Thread A. M. Archibald
On 12/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I note that polyfit looks like it should work for single and double, real
> and complex, polynomials. On the otherhand, the default rcond needs to
> depend on the underlying precision. On the other, other hand, all the svd
> computations are done with dgelsd or zgelsd, i.e., double precision. Even so
> problems can arise from inherent errors of the input data if it is single
> precision to start with. I also think the final degree of the fit should be
> available somewhere if wanted, as it is an indication of what is going on.
> Sooo, any suggestions as to what to do? My initial impulse would be to set
> rcond=1e-6 for single, 1e-14 for double, make rcond a keyword, and kick the
> can down the road on returning the actual degree of the fit.

I'd also be inclined to output a warning (which the user can ignore,
read or trap as necessary) if the condition number is too bad or they
supplied an rcond that is too small for the precision of their data.

A. M. Archibald



Re: [Numpy-discussion] Should numpy.sqrt(-1) return 1j rather than nan?

2006-10-12 Thread A. M. Archibald
On 12/10/06, David Goldsmith <[EMAIL PROTECTED]> wrote:

> I don't use scipy (and don't want to because of the overhead) but it
> sounds like I should because if I'm taking the square root of a variable
> whose value at run time happens to be real but less than zero, I *want*
> the language I'm using to return an imaginary; in other words, it's not
> the scipy behavior which "scares" me, its the numpy (which I do/have
> been using) behavior.  To which you might say, "Well, if that's what you
> want, and you have Matlab (as you've said you do), then just use that."
> But that's precisely the point: people who don't want to be bothered
> with having to be "a bit more care[ful]" (as Chuck put it) - and who can
> afford it - are going to be inclined to choose Matlab over numpy.
> Perhaps one doesn't care - in the grand scheme of things, it certainly
> doesn't matter - but I think that you all should be aware that this
> numpy "feature" will be seen by many as more than just a nuisance.

Scipy doesn't do what you want. What you want is to use complex
numbers throughout; then numpy and scipy will do exactly what you
request.

A. M. Archibald



Re: [Numpy-discussion] Should numpy.sqrt(-1) return 1j rather than nan?

2006-10-11 Thread A. M. Archibald
On 11/10/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> """
> In SciPy 0.3.x the ufuncs were overloaded by more "intelligent" versions.
> A very attractive feature was that sqrt(-1) would yield 1j as in Matlab.
> Then you can program formulas directly (e.g., roots of a 2nd order
> polynomial) and the right answer is always achieved. In the Matlab-Python
> battle in mathematics education, this feature is important.
>
> Now in SciPy 0.5.x sqrt(-1) yields nan. A lot of code we have, especially
> for introductory numerics and physics courses, is now broken.
> This has already made my colleagues at the University skeptical to
> Python as "this lack of backward compatibility would never happen in Matlab".
>
> Another problem related to Numeric and numpy is that in these courses we
> use ScientificPython several places, which applies Numeric and will
> continue to do so. You then easily get a mix of numpy and Numeric
> in scripts, which may cause problems and at least extra overhead.
> Just converting to numpy in your own scripts isn't enough if you call
> up libraries using and returning Numeric.
> """
>
> I wonder, what are the reasons that numpy.sqrt(-1) returns nan?
> Could sqrt(-1) made to return 1j again? If not, shouldn't
> numpy.sqrt(-1) raise a ValueError instead of returning silently nan?

What is the desired behaviour of sqrt?

Should sqrt always return a complex array, regardless of the type of
its input? This will be extremely surprising to many users, whose
memory usage suddenly doubles and for whom many functions no longer
work the way they're accustomed to.

Should it return a complex array only when any entry in its input is
negative? This will be even *more* surprising when a negative (perhaps
even -0) value appears in their matrix (for example, does a+min(a)
yield -0s in the minimal values?) and suddenly it's complex.

A ValueError is also surprising, and it forces the user to sanitize
her array before taking the square root, instead of whenever
convenient.

If you want MATLAB behaviour, use only complex arrays.
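Concretely (a sketch; the errstate context manager just silences the 'invalid' warning in the real case):

```python
import numpy as np

# Real input stays in the real domain, so the answer is nan:
with np.errstate(invalid='ignore'):
    r = np.sqrt(np.array([-1.0]))
assert np.isnan(r[0])

# Complex input gets the MATLAB-style answer:
c = np.sqrt(np.array([-1.0 + 0j]))
assert c[0] == 1j
```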

If the problem is backward incompatibility, there's a reason 1.0
hasn't been released yet...

A. M. Archibald



Re: [Numpy-discussion] Things to address for Py3K

2006-10-11 Thread A. M. Archibald
On 11/10/06, Charles R Harris <[EMAIL PROTECTED]> wrote:

> Speaking long term, what about data types? The basic 80 bit extended
> precision float now occurs in 80, 96, and 128 bit versions depending on
> alignment. So what happens when quad precision, which will probably be in
> the next IEEE standard, comes down the pike and is 128 bits long? The length
> of a float will no longer be sufficient to distinguish the various types.

This doesn't require python 3K; it can happen in numpy with no language
support. But perhaps python 3K can be viewed as a chance to ditch
backwards-compatibility baggage? (cough, Numeric compatibility, cough)

Would it be of interest to have numeric datatypes integrated with
python datatypes? How about IEEE floats in python proper, at least? It
can be rather confusing when doing a calculation yields different
results for arrays than for ordinary scalars...

A. M. Archibald



Re: [Numpy-discussion] can this be made faster?

2006-10-09 Thread A. M. Archibald
> > > > c contains arbitray floats.
> > > > essentially it is to compute class totals
> > > > as in total[class[i]] += value[i]

> > This seems like a rather common operation - I know I've needed it on
> > at least two occasions - is it worth creating some sort of C
> > implementation? What is the appropriate generalization?
>
> Some sort of indirect addressing infrastructure. But it looks like this
> could be tricky to make safe, it would need to do bounds checking at the
> least and would probably work best with a contiguous array as the target. I
> could see some sort of low-level function called argassign(target, indirect
> index, source) that could be used to build more complicated things in
> python.

If it were only assignment that was needed, fancy indexing could
already handle it. The problem is that this is something that can't
*quite* be done with the current fancy indexing infrastructure - every
time an index comes up we want to add the value to what's there,
rather than replacing it. I suppose histogram covers one major
application; in fact if histogram allowed weightings ("count this
point as -0.6") it would solve the OP's problem.

A. M. Archibald



Re: [Numpy-discussion] can this be made faster?

2006-10-09 Thread A. M. Archibald
On 09/10/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Daniel Mahler wrote:
> > In my case all a, b, c are large with b and c being orders of
> > magnitude larger than a.
> > b is known to contain only, but potentially any, a-indexes, repeated
> > many times.
> > c contains arbitrary floats.
> > essentially it is to compute class totals
> > as in total[class[i]] += value[i]
>
> In that case, a slight modification to Greg's suggestion will probably be 
> fastest:

If a is even moderately large and you don't care what's left behind in
b and c, you will probably accelerate the process by sorting b and c
together (for cache coherency in a).

This seems like a rather common operation - I know I've needed it on
at least two occasions - is it worth creating some sort of C
implementation? What is the appropriate generalization?

A. M. Archibald



Re: [Numpy-discussion] can this be made faster?

2006-10-08 Thread A. M. Archibald
On 08/10/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Bill Baxter wrote:
> > Yes, that'd be
> >a[b] += c
>
> No, I'm afraid that fancy indexing does not do the loop that you are thinking 
> it
> would (and for reasons that we've discussed previously on this list, *can't* 
> do
> that loop). That statement reduces to something like the following:

So the question remains, is there a for-loop-free way to do this?

(This, specifically, is:
for i in range(len(b)):
a[b[i]]+=c[i]
where b[i] may contain repetitions.)

I didn't find one, but came to the conclusion that for loops are not
necessarily slower than fancy indexing, so the way to do this one is
just to use a for loop.
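
To spell the difference out on a tiny made-up example (the repeated
index is exactly where fancy indexing goes wrong):

```python
import numpy as np

b = np.array([0, 0, 1])          # note the repeated index 0
c = np.array([1.0, 2.0, 5.0])

# a1[b] += c expands to a1[b] = a1[b] + c: index 0 is written twice
# with stale values, so the second write just overwrites the first.
a1 = np.zeros(3)
a1[b] += c                       # a1[0] ends up 2.0, not 3.0

# The plain loop accumulates every occurrence correctly.
a2 = np.zeros(3)
for i in range(len(b)):
    a2[b[i]] += c[i]
print(a1, a2)                    # [2. 5. 0.] [3. 5. 0.]
```

(Newer numpy eventually grew add.at for exactly this accumulating
store, though that postdates this thread.)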

A. M. Archibald



Re: [Numpy-discussion] super-packages [was "Hello and my first patch"]

2006-10-06 Thread A. M. Archibald
On 06/10/06, Christopher Barker <[EMAIL PROTECTED]> wrote:
> A. M. Archibald wrote:
> > Is setting out to create a "MATLAB replacement" a good goal?
>
> No.

I see what you mean, but I still think that using MATLAB as a
yardstick ("Can users do everything with the python superpackage that
they could with MATLAB?" "Is it of comparable or lesser difficulty?")
is a good idea, at least until we've clearly outstripped it in all
categories.

> MATLAB has two advantages (and they are big ones!): integrated plotting and
> a large and broad library of ready-to-use functions for a wide variety
> of computing.

This is what I mean about using it as a yardstick.

As a MATLAB survivor, would you be interested in going to
http://www.scipy.org/NumPy_for_Matlab_Users and adding numerical tools
that MATLAB has, so that we can use it as a checklist for what tools
we'd like to have (first)? I put a short list there, but it's years
since I last used MATLAB.

> MPL has almost solved one, and SciPy is getting better an better -- so
> we're really on the right track!

I agree.

A. M. Archibald



Re: [Numpy-discussion] super-packages [was "Hello and my first patch"]

2006-10-06 Thread A. M. Archibald
On 06/10/06, Jon Peirce <[EMAIL PROTECTED]> wrote:
>  > Perhaps that is the best way to move forward along with the work on a
>  > "pylab" super-package.
>
> [...]
>
> But why not scipy? It seems the right name for a super-package.

It does, particularly if (as is being discussed) the current content
of scipy might get subdivided into separately buildable modules
(weave, fortran, non-fortran, say). scipy would then refer to the lot
- numpy, matplotlib, weave, scipy-fortran, scipy-nonfortran, and any
others that seem essential.

Is setting out to create a "MATLAB replacement" a good goal? I think
so. Really the goal should be to create a useful tool for scientists,
but I think MATLAB is clearly already that, so (at first) it can serve
as a definite objective. Of course, anyone who wants to make scipy
*better* than MATLAB should be warmly encouraged...

A. M. Archibald



Re: [Numpy-discussion] Hello and my first patch

2006-10-05 Thread A. M. Archibald
On 05/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:

> I think a hybrid for weave / f2py / ctypes that allows "inlining in
> multiple languages" as well as automatic extension module generation for
> "already-written" code is in order.

It might make sense to also include SWIG (since that seems to be a
popular choice for wrapping "already-written" C and C++ code).

A. M. Archibald



Re: [Numpy-discussion] Hello and my first patch

2006-10-05 Thread A. M. Archibald
On 05/10/06, Greg Willden <[EMAIL PROTECTED]> wrote:
> On 10/5/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> > Perhaps that is the best way to move forward along with the work on a
> > "pylab" super-package.
>
> That is exactly what I want.

What is unsatisfactory about installing numpy+scipy+matplotlib? I've
found they're generally pretty complete (except where no decent python
alternative exists).

> In the end I want a nice collection of functions, logically organized, that
> let me analyze/filter/plot etc. etc. etc.
>
> The key for me is "logically organized".

For the most part, the organization is pretty logical:
* Basic array and matrix operations in numpy
* linear algebra, differential equation, interpolation, etc. tools are
in scipy, each in their own subpackage
* weave is mysteriously in scipy
* plotting tools are in matplotlib

There are a few historical quirks, like window functions in numpy
(they really belong in scipy), and some of the less-used scipy
subpackages are a bit of a mess internally (scipy.interpolate for
example), but otherwise I'm not sure what you want to be different.

> And right now that means "So a newbie can find the function I need and the
> function I need is there"
>
> I'm not criticising.  I'd like to help get there.

Install all three major packages. Use the window functions from scipy
in matplotlib.

Task-oriented documentation is so far a bit scant, although the scipy
cookbook (http://www.scipy.org/Cookbook ) and the numpy examples list
(http://www.scipy.org/Numpy_Example_List ) are a good start.

A. M. Archibald



Re: [Numpy-discussion] Hello and my first patch

2006-10-05 Thread A. M. Archibald
On 05/10/06, Greg Willden <[EMAIL PROTECTED]> wrote:

> That is a much better policy in my view.
>
> I (gently) encourage this group (Travis?) to make this the policy for
> Numpy/Scipy.
>
> From my view as a newbie to numpy/scipy/matplotlib it isn't clear where I
> should look for what functionality.  Matplotlib plots the spectrogram but it
> only supports two or three window functions.  Numpy supports 4 or 5 window
> functions and Scipy apparently supports more but Matplotlib doesn't support
> Scipy.  Of course this is a minor example and I could just write the window
> function myself and then use it in Matplotlib but I want to give back so
> that the project can grow.  I'd really like to be able to leave Matlab
> behind and encourage everyone else to do the same but there are still these
> annoyances that need to be worked out.

Unfortunately, that policy (put it in numpy if it doesn't make the
build dependencies any worse) makes it even harder for the user to
figure out what is where. Say I want a fast matrix product. Do I look
in numpy or scipy? It'll run faster if it uses a tuned BLAS, so it
ought to have external requirements, so I'd look in scipy, but maybe
there's a simple non-tuned implementation in numpy instead...

Frankly, I tend to prefer the other approach to solving all these
issues: distribute numpy, scipy, and matplotlib as one bundle. The
requirements for scipy are not particularly onerous, particularly if
it comes as part of your distribution. There are currently some
problems successfully finding optimized versions of LAPACK and BLAS,
but to me the benefits of bundling the packages together outweigh the
difficulties:

* routines and objects can be in the package in which they make the
most semantic sense, rather than the one with the correct external
dependencies (how is a user supposed to know whether convolution uses
an external library or not?)

* documentation can be cross-referenced between packages (so the
Matrix class can tell people to look in scipy.linalg for inverses, for
example)

* users can more easily find what's available rather than rewriting from scratch

* derived packages (ipython, IDEs, MATLAB-alikes) have many fewer
possibilities to support

I'm not arguing that the development processes need to be connected,
just that making the primary distributed object a package containing
all the components will make life easier for everyone involved.

A. M. Archibald



Re: [Numpy-discussion] "Manually" broadcasting arrays in Python

2006-10-03 Thread A. M. Archibald

On 04/10/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Yeah, a segfault would be problematic. Otherwise, it works and is in fact faster
> than what I wrote.


It's a bit tricky to trigger but I think it was fixed in 1.0rc1 (in
changeset 3125, in fact:
http://projects.scipy.org/scipy/numpy/changeset/3125 )

Would it be useful for me to contribute the tiny script I wrote to
trigger it as a regression test?

A. M. Archibald

from numpy import vectorize, zeros

vt = vectorize(lambda *args: args)
# Removing either of the following lines cures the segfault
vt(zeros((1,2,1)), zeros((2,1,1)), zeros((1,1,2)))
vt(zeros((1,2,1)), zeros((2,1,1)), zeros((1,1,2)), zeros((2,2)))


Re: [Numpy-discussion] "Manually" broadcasting arrays in Python

2006-10-03 Thread A. M. Archibald
On 03/10/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Has anyone implemented an easier or more efficient way to broadcast arrays to 
> a
> common shape at the Python level? I was hoping that the broadcast iterator 
> would
> actually provide the broadcasted arrays, but it does not.

How about vectorize(lambda *args: args)? Almost works, only it
sometimes segfaults with more than two arguments... but that's clearly
a numpy bug.

A. M. Archibald



Re: [Numpy-discussion] Vectorizing code, for loops, and all that

2006-10-03 Thread A. M. Archibald
On 03/10/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:

> I had an idea regarding which axis to operate on first. Rather than
> operate on strictly the longest axis or strictly the innermost axis, a
> hybrid approach could be used. We would operate on the longest axis that
> would not result in the inner loop overflowing the cache. The idea is
> minimize the loop overhead as we do now by choosing the largest axis,
> while at the same time attempting to maintain cache friendliness.

If elements are smaller than cache lines (usually at least eight
bytes, I think), we might end up pulling many times as many bytes into
the cache as we actually need if we don't loop along axes with small
strides first.

Can BLAS be used for some of these operations?

A. M. Archibald



Re: [Numpy-discussion] Vectorizing code, for loops, and all that

2006-10-03 Thread A. M. Archibald
On 02/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:

> Perhaps those inner 1-d loops could be optimized (using prefetch or
> something) to reduce the number of cache misses on the inner
> computation, and the concept of looping over the largest dimension
> (instead of the last dimension) should be re-considered.

Cache control seems to be the main factor deciding the speed of many
algorithms. Prefetching could make a huge difference, particularly on
NUMA machines (like a dual Opteron). I think GCC has a moderately
portable way to request it (though it may be only in beta versions as
yet).

More generally, all the tricks that ATLAS uses to accelerate BLAS
routines would (in principle) be applicable here. The implementation
would be extremely difficult, though, even if all the basic loops
could be expressed in a few primitives.

A. M. Archibald



Re: [Numpy-discussion] eval shortcomings?

2006-09-24 Thread A. M. Archibald
On 25/09/06, Angus McMorland <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Can someone explain why the following occurs?
>
> a = numpy.zeros((100))
> b = numpy.ones((10))
> a[20:30] = b  # okay
> eval('a[50:60] = b') # raises SyntaxError: invalid syntax
>
> Is there some line mangling that the interpretor does that eval doesn't do?

No. Eval evaluates expressions, that is, formulas producing a value.
"a=b" does not produce a value, so you are obtaining the same error
you would if you'd written

if a=b:
   ...

The way you run code that doesn't return a value is with "exec".
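
A minimal sketch of the distinction (print syntax modernized):

```python
import numpy as np

a = np.zeros(100)
b = np.ones(10)

print(eval('a[20:30].sum()'))   # an expression: eval is fine, prints 0.0
exec('a[50:60] = b')            # a statement: needs exec, not eval
print(a[50:60].sum())           # 10.0
```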

A. M. Archibald



[Numpy-discussion] ufunc.reduce and conversion

2006-09-19 Thread A. M. Archibald
Hi,

What are the rules for datatype conversion in ufuncs? Does ufunc(a,b)
always yield the smallest type big enough to represent both a and b?
What is the datatype of ufunc.reduce(a)?

I ask because I was startled by the following behaviour:
>>> a = array([1,1],uint8); print a.dtype; print maximum.reduce(a).dtype;
'|u1'
(I had heard that reduce converts to a larger type, int ->
float, possibly only for add.reduce.)

Thanks,
A. M. Archibald



Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:

> I'm still somewhat mystified by the desire to move the nans to one end
> of the sorted object. I see two scenarios:

It's mostly to have something to do with them other than throw an
exception. Leaving them in place while the rest of the array is
reshuffled requires a lot of work and isn't particularly better. I
mostly presented it as an alternative to throwing an exception.

Throwing a Python exception now seems like the most reasonable idea.

A. M. Archibald



Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
>
>
> For floats we could use something like:
>
> lessthan(a,b) := a < b || (a == nan && b != nan)
>
> Which would put all the nans at one end and might not add too much overhead.

You could put an any(isnan()) check out front and run this slower
version only if there are any NaNs (also, you can't use == for NaNs;
you have to use C's isnan()). But I'm starting to see the wisdom in
simply throwing an exception, since sorting is not well-defined with
NaNs.

A. M. Archibald



Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:

> If this sort of thing can cause unexpected errors I wonder if it would be
> worth it to have a global debugging flag that essentially causes  isnan to
> be called before any function applications.

That sounds very like the IEEE floating-point flags, which would be
extremely useful to have, and which are being worked on, IIRC.

A. M. Archibald



Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:

> I'm not sure where the breakpoint is, but I was seeing failures for all
> three sort types with N as high as 1. I suspect that they're all
> broken in the presence of  NaNs.  I further suspect you'd need some
> punishingly slow n**2 algorithm to be robust in the presence of NaNs.

Not at all. Just temporarily make NaNs compare greater than any other
floating-point value and use quicksort (say). You can even do this for
python lists without much trouble.

That's actually a viable suggestion for numpy's sorting, although it
would be kind of ugly to implement: do a quick any(isnan(A)), and if
not, use the fast stock sorting algorithms; if there is a NaN
somewhere in the array, use a version of the sort that has a tweaked
comparison function so the NaNs wind up at the end and are easy to
trim off.
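
A sketch of that fast path in pure Python, pulling the NaNs out rather
than tweaking the C comparison function (assumes a 1-d float array):

```python
import numpy as np

def sort_nans_last(a):
    a = np.asarray(a, dtype=float)
    nans = np.isnan(a)
    if not nans.any():
        return np.sort(a)        # fast path: stock sort, unchanged
    # Slow path: sort the non-NaN part, append the NaNs at the end
    # where they are easy to trim off.
    return np.concatenate([np.sort(a[~nans]), a[nans]])

print(sort_nans_last([3.0, np.nan, 1.0, 2.0]))
```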

But the current situation, silently returning arrays in which the
non-NaNs are unsorted, is really bad.

A. M. Archibald



Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> A. M. Archibald wrote:
> > Mmm. Somebody who's working with NaNs has more or less already decided
> > they don't want to be pestered with exceptions for invalid data.
> Do you really think so? In my experience NaNs are nearly always just an
> indication of a mistake somewhere that didn't get trapped for one reason
> or another.

Well, I said that because for an image processing project I was doing,
the easiest thing to do with certain troublesome pixels was to fill in
NaNs, and then at the end replace the NaNs with sensible values. It
seems as if the point of NaNs is to allow you to keep working with
those numbers that make sense while ignoring those that don't. If you
wanted exceptions, why not get them as soon as the first NaN would
have been generated?

> >  I'd
> > be happy if they wound up at either end, but I'm not sure it's worth
> > hacking up the sort algorithm when a simple isnan() can pull them out.
> >
> Moving them to the end seems to be the worst choice to me. Leaving them
> alone is fine with me. Or raising an exception would be fine. Or doing
> one or the other depending on the error mode settings would be even
> better if it is practical.

I was just thinking in terms of easy removal.

> Is that true? Are all of numpy's sorting algorithms robust against
> nontransitive objects laying around? The answer to that appears to be
> no. Try running this a couple of times to see what I mean:

> The values don't correctly cross the inserted NaN and the sort is incorrect.

You're quite right: when NaNs are present in the array, sorting and
then removing them does not yield a sorted array. For example,
mergesort just output
[ 2.  4.  6.  9. nan  0.  1.  3.  5.  7.  8.]

The other two are no better (and arguably worse).  Python's built-in
sort() for lists has the same problem.

This is definitely a bug, and the best way to fix it is not clear to
me - perhaps sort() needs to always do any(isnan(A)) before starting
to sort. I don't really like raising an exception, but sort() isn't
really very meaningful with NaNs in the array. The only other option I
can think of is to somehow remove them, sort without, and reintroduce
them at the end, which is going to be a nightmare when sorting a
single axis of a large array. Or, I suppose, sort() could simply fill
the array with NaNs; I'm sure users will love that.

A. M. Archibald



Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> Keith Goodman wrote:
> > In what order would you like argsort to sort the values -inf, nan, inf?
> >
> Ideally, -inf should sort first, inf should sort last and nan should
> raise an exception if present.
>
> -tim

Mmm. Somebody who's working with NaNs has more or less already decided
they don't want to be pestered with exceptions for invalid data. I'd
be happy if they wound up at either end, but I'm not sure it's worth
hacking up the sort algorithm when a simple isnan() can pull them out.

What's happening now is that all comparisons with NaNs come out false,
which rather confuses the sorting algorithm. But as long as it doesn't
actually *break* (that is, leave some of the non-NaNs incorrectly
sorted) I don't care.
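
The comparisons in question, for anyone following along:

```python
nan = float('nan')

# Every ordering comparison involving a NaN is false, so a
# comparison-based sort has no consistent place to put it.
print(nan < 1.0, nan > 1.0, nan == nan)   # False False False
```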

A. M. Archibald



Re: [Numpy-discussion] visual programming

2006-09-18 Thread A. M. Archibald
On 18/09/06, Mathew Yeates <[EMAIL PROTECTED]> wrote:
> semi off topic.
> Does anyone know of any good visual programming tools? Forever ago, I
> used to use something called Khoros for image processing and I found it
> very useful. You connect boxes which represent different processing
> steps. At about the same time, there was something similar to Khoros
> from SGI. Explorer was its name, I think.
>
> Anybody out there use this sort of thing? Matlab doesn't offer it,
> right? Nor matplotlib?
>
> Mathew

I doubt it's what you're looking for, but Pd and GEM together form a
visual programming environment, oriented towards music, audio signal
processing, and 2D and 3D video. (They're related to Max and jMax,
which do some of the same things.)

A. M. Archibald



Re: [Numpy-discussion] In-place operations

2006-09-12 Thread A. M. Archibald
On 12/09/06, Pierre Thibault <[EMAIL PROTECTED]> wrote:

> I would like to have information on the best techniques to do in-place
> calculations and to minimize temporary array creations. To me this
> seems to be very important whenever the arrays become very large.

The first rule of optimization: don't do it yet.

You can usually go through and banish temporary arrays (using ufuncs
and so on) at the cost of readability, code encapsulation, and
thread-safety. But it may not do what you want. I had an
image-processing code that was taking longer than I thought it should
and using two hundred megabytes or so of RAM. So I rewrote it, with a
certain amount of pain, in a way that it used the fewest possible
temporary arrays. It didn't run any faster, and it then took five
hundred megabytes. Because all the arrays ended up being in memory at
once, the memory footprint increased drastically.

malloc() is fast, typically just a handful of instructions; if you're
allocating a giant array, it's almost certainly being allocated using
mmap(), and it can be released back to the OS on deallocation.

But you probably still want to avoid temporary arrays. So:

> More specifically, here are examples that occured in my code
>
> 1) FFTs:  Let A and B be two large arrays, already allocated. I want
> the fft of A to be stored in B. If I just type B = fft(A), there is a
> temporary array creation, right? Is it possible to avoid that?

Doing an FFT in-place is a major challenge, and involves its own
slowdowns, so generally high-level toolkits don't bother. But fft
seems to be like many functions (those generated by interp1d, for
example) that insist on malloc()ing their own arrays to return. Short
of rooting around in the numpy/scipy code, there's no real way around
this for such functions. The best you can do is make actual use of the
allocated array (rather than copy its contents to *another* array and
discard it).

> 2) Function output: In general, I think the same thing happens with
> functions like
>
> def f1(array_in):
>array_out = # something using array_in
>return array_out
>
> Then, if B is already allocated, writing B = f1(A) involves again a
> temporary array creation

Uh, no, not really. The way you have written f1, it probably malloc()s
space for array_out. The address of that space (roughly) is saved in
the array_out variable. If you write B = f1(A), you are just storing the
address in B. The memory is not copied. Even if you do B = f1(A)[::10]
you don't copy the memory.
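
A tiny sketch of those reference semantics (the name f1 is from the
quoted mail; the body is made up):

```python
import numpy as np

def f1(a):
    return a * 2              # malloc()s a fresh array for the result

A = np.ones(10)
B = f1(A)                     # B just binds the returned array; no copy
C = B[::2]                    # a strided view; still no copy
C[0] = 99.0
print(B[0])                   # 99.0, because C shares B's memory
```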

> I thought instead of doing something like
>
> def f2(array_in, array_out):
>   array_out[:] = # something
>   # Is this good practice?
>
> and call f2(A,B).

This is a ufunc-like solution; you could even make array_out an
optional argument, and return it.

> If I understand well, this still requires a temporary array creation.
> Is there another way of doing that (apart from actually looping
> through the indices of A and B)?

It depends what #something is. If, say, it is 2*array_in, you can
simply do multiply(array_in,2,array_out) to avoid any dynamic
allocation.
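
Putting that together with the optional-output idea from above, a
sketch (the helper name is made up, and #something is taken to be
2*array_in):

```python
import numpy as np

def double(array_in, array_out=None):
    # ufunc-style: allocate only when the caller didn't supply an output
    if array_out is None:
        array_out = np.empty_like(array_in)
    np.multiply(array_in, 2, array_out)   # writes into array_out in place
    return array_out

A = np.arange(4, dtype=float)
B = np.empty_like(A)
double(A, B)          # no temporary: the result lands directly in B
print(B)              # [0. 2. 4. 6.]
```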

> I guess these considerations are not standard python problems because
> you expect python to take care of memory issues. With big arrays in
> scientific computations, I feel the question is more relevant. I might
> be wrong...

Some of these issues come up when dealing with mutable objects (lists,
dictionaries, and so on). Some of them (the fact that python variables
contain simply references) are discussed in various python FAQs.

A. M. Archibald

-
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] Problem with concatenate and object arrays

2006-09-07 Thread A. M. Archibald
Maybe I should stay out of this, but it seems like constructing object
arrays is complicated and involves a certain amount of guesswork on
the part of Numeric.

For example, if you look at array([a,b,c]).shape, the answer is normally
(3,) unless a, b, and c happen to all be lists of the same length, at
which point your array could have a much more complicated shape... but
as the person who wrote "array([a,b,c])" it's tempting to assume that
the result has shape (3,), only to discover subtle bugs much later.
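For instance (recent numpy requires dtype=object to be spelled out for ragged input, but the shape guesswork it describes is the same):

```python
import numpy as np

# Ragged input: numpy falls back to a 1-d object array of lists
a, b, c = [1, 2], [3, 4, 5], [6]
ragged = np.array([a, b, c], dtype=object)
assert ragged.shape == (3,)

# Same-length input: the seemingly identical call builds a 2-d array
a, b, c = [1, 2], [3, 4], [5, 6]
uniform = np.array([a, b, c])
assert uniform.shape == (3, 2)
```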

If we were writing an array-creation function from scratch, would
there be any reason to include object-array creation in the same
function as uniform array creation? It seems like a bad idea to me.

If not, the problem is just compatibility with Numeric. Why not simply
write a wrapper function in python that does Numeric-style guesswork,
and put it in the compatibility modules? How much code will actually
break?

A. M. Archibald



Re: [Numpy-discussion] Problem with concatenate and object arrays

2006-09-06 Thread A. M. Archibald
On 06/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
> On 9/6/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
> >   order  - Specify the order of the array.  If order is 'C', then the
> > array will be in C-contiguous order (last-index varies the
> > fastest).  If order is 'FORTRAN', then the returned array
> > will be in Fortran-contiguous order (first-index varies
> the
> > fastest).  If order is None, then the returned array may
> > be in either C-, or Fortran-contiguous order or even
> > discontiguous.

This one's a bit complicated. If array() is passed a list of lists,
there are two different orders that are relevant - the output order of
the array, and the order used to interpret the input. I suppose that
if L is a list of lists, array(L)[2,3]==L[2][3]; that is, in some
sense the arrays are always logically C-ordered even if the underlying
representation is different. Does it make sense to specify this
somewhere in the docstring? At least it would be good to make it clear
that the order parameter affects only the underlying storage format,
and not the indexing of the array.
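A short check of this behaviour (using a small list of lists L):

```python
import numpy as np

L = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

C = np.array(L, order='C')
F = np.array(L, order='F')

# Indexing follows the nested-list structure regardless of storage order:
assert C[2, 3] == L[2][3] == F[2, 3]

# Only the underlying memory layout differs:
assert C.flags['C_CONTIGUOUS'] and F.flags['F_CONTIGUOUS']
assert C.strides != F.strides
```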

A. M. Archibald



Re: [Numpy-discussion] Random number generators

2006-09-04 Thread A. M. Archibald
On 04/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
>
>
> On 9/4/06, A. M. Archibald <[EMAIL PROTECTED]> wrote:
> > On 04/09/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> > > Charles R Harris wrote:

> > However, a random number generator based on a stream cipher is
> > probably going to be pretty good, if slow, so implementing one of those
> > can't hurt. But I think leaving the possibility open that one could
> > use custom sources of random bytes is a very good idea. There's no
> > particular need that custom ones should be fast, as they're for use
> > when you really want *good* random numbers.
>
>
> I was thinking it might be nice to have one, just to generate seeds if
> nothing else. This could actually be written in python and use something
> like the python cryptography toolkit, no need to reinvent the wheel. It
> might be worth looking at that code just to see how someone else thought
> through the interface. It's under the Python license, so I need to know if
> that license is compatible with the BSD license used for numpy.

Of the generators they have, only their random pool is of any use for
seeding, and even that needs some clever feeding of random data.
However, on practically any platform this will run on, random bytes
based on real entropy can be extracted from the system (/dev/random,
windows crypto API, something on the Mac); a uniform API for those
would be a handy thing to have. But I can't recommend spending much
effort connecting the python crypto stuff to numpy's random number
generators; it's the sort of thing application writers can easily cook
up by subclassing RandomGenerator (or whatever).
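For example, seed material can be pulled from the system entropy pool portably with nothing but the standard library (os.urandom wraps /dev/urandom on Unix and the Windows crypto API):

```python
import os
import numpy as np

# os.urandom gives a platform-independent source of real entropy,
# suitable for seeding a fast pseudorandom generator like mtrand.
seed = int.from_bytes(os.urandom(4), "little")
rng = np.random.RandomState(seed)

sample = rng.random_sample(3)
assert sample.shape == (3,)
assert ((0.0 <= sample) & (sample < 1.0)).all()
```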

A. M. Archibald



Re: [Numpy-discussion] Random number generators

2006-09-04 Thread A. M. Archibald
On 04/09/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> Charles R Harris wrote:
>
> > What sort of api should this be? It occurs to me that there are already
> > 4 sources of random bytes:
> >
> > Initialization:
> >
> > /dev/random (pseudo random, I think)

/dev/random is (supposed to be) true randomness derived from various
sources. It's also fed through a bunch of hashing and CRC functions
chosen for convenience, but is probably pretty unbiased.

> > /dev/urandom
> > crypto system on windows
> >
> > Pseudo random generators:
> >
> > mtrand
> >
> > I suppose we could add some cryptologically secure source as well. That
> > indicates to me that one set of random number generators would just be
> > streams of random bytes, possibly in 4 byte chunks. If I were doing this
> > for linux these would all look like file systems, FUSE comes to mind.
> > Another set of functions would transform these into the different
> > distributions. So, how much should stay in numpy? What sort of API are
> > folks asking for?

"Cryptographically secure random number generator" means something
different to everybody, and implementing one that everyone is happy
with is going to be a colossal pain. Look at Yarrow and its design
documents for some idea of the complexities involved.

However, a random number generator based on a stream cipher is
probably going to be pretty good, if slow, so implementing one of those
can't hurt. But I think leaving the possibility open that one could
use custom sources of random bytes is a very good idea. There's no
particular need that custom ones should be fast, as they're for use
when you really want *good* random numbers.
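A toy sketch of the counter-mode idea, using a hash function in place of a real cipher (purely illustrative; a production generator would use a vetted cipher such as AES in counter mode):

```python
import hashlib
import struct

class HashCounterRNG:
    """Toy counter-mode generator: hash(key || counter) -> random bytes.

    This only illustrates the structure; it has not been analyzed
    the way a real stream cipher has.
    """
    def __init__(self, key):
        self.key = key
        self.counter = 0

    def bytes(self, n):
        out = b""
        while len(out) < n:
            # Each block is the hash of the key and an incrementing counter
            block = hashlib.sha256(
                self.key + struct.pack("<Q", self.counter)).digest()
            self.counter += 1
            out += block
        return out[:n]

rng = HashCounterRNG(b"seed")
a = rng.bytes(16)
b = rng.bytes(16)
assert len(a) == 16 and a != b   # the stream advances with the counter
```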

> It would be good to also support quasi-random generators like Sobol and Halton
> sequences. They tend to only generate floating point numbers in [0, 1] (or (0,
> 1) or [0, 1) or (0, 1]; I think it varies). There are probably other PRNGs that
> only output floating point numbers, but I doubt we want to implement any of
> them. You can look at the 1.6 version of RandomKit for an implementation of
> Sobol sequences and ISAAC, a cryptographic PRNG.
>
>http://www.jeannot.org/~js/code/index.en.html

Sobol' sequences may require special care in constructing non-uniform
distributions in order to preserve their uniformity properties. If we
can simply import ISAAC, it won't hurt, but I'd rather trust (say) AES
in counter mode than some custom cryptosystem that hasn't been
analyzed as thoroughly.

> I'm thinking that the core struct that we're going to pass around will look
> something like this:
>
> struct random_state {
>void* state;
>unsigned int (*gen_uint32)(struct random_state*);
>double (*gen_double)(struct random_state*);
> }
>
> state -- A pointer to the generator-specific struct.
>
> gen_uint32 -- A function pointer that yields a 32-bit unsigned integer.
> Possibly NULL if the generator can not implement such a generator.
>
> gen_double -- A function pointer that yields a double-precision number in
> [0, 1] (possibly omitting one or both of the endpoints depending on the
> implementation).

Are 32-bit numbers really the right least common denominator? There
are plenty of 64-bit platforms out there...

Given this API, implementing a subclassable class that exports it
should satisfy most people's needs for interchangeable generators.
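In Python terms, a minimal version of that subclassable wrapper might look like this (class and method names are hypothetical):

```python
import os
import struct

class RandomSource:
    """Base class mirroring the proposed C struct: subclasses supply
    raw 32-bit words, and gen_double is derived from them."""
    def gen_uint32(self):
        raise NotImplementedError
    def gen_double(self):
        # map 32 random bits into [0, 1)
        return self.gen_uint32() / 4294967296.0

class OSEntropySource(RandomSource):
    """A custom (slow but high-quality) source: words drawn from the
    operating system's entropy pool."""
    def gen_uint32(self):
        return struct.unpack("<I", os.urandom(4))[0]

src = OSEntropySource()
x = src.gen_double()
assert 0.0 <= x < 1.0
```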

A. M. Archibald



Re: [Numpy-discussion] Random number generators

2006-09-03 Thread A. M. Archibald
On 04/09/06, Robert Kern <[EMAIL PROTECTED]> wrote:
> A. M. Archibald wrote:
>
> > In the same project I also noticed it would be nice to be able to
> > (say) do "exponential(2+sin(arange(10)))" to get an array of
> > exponentials with varying parameters.
>
> Travis O. recently added this capability. The distribution parameters (loc,
> scale, alpha, whatever) are broadcast against each other using the same rules 
> as
> ufunc parameters.

Wonderful!  Thank you, Travis O.
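With that capability, the example from the earlier message works directly:

```python
import numpy as np

rng = np.random.RandomState(0)

# The scale parameter broadcasts like a ufunc argument: one draw per
# element of the array, each with its own scale 2 + sin(i).
scales = 2 + np.sin(np.arange(10))
samples = rng.exponential(scales)
assert samples.shape == (10,)
assert (samples >= 0).all()
```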

A. M. Archibald



Re: [Numpy-discussion] bug in round with negative number of decimals

2006-09-03 Thread A. M. Archibald
On 04/09/06, Sebastian Haase <[EMAIL PROTECTED]> wrote:

> Thanks for the reply - but read what the doc says:
>  >>> N.around.__doc__
> 'Round 'a' to the given number of decimal places.  Rounding
>  behaviour is equivalent to Python.
>
>  Return 'a' if the array is not floating point.  Round both the real
>  and imaginary parts separately if the array is complex.
>  '
>
> it is *not* done this way in Python:
>  >>> round(.5)
> 1.0
>  >>> round(1.5)
> 2.0
>  >>> round(2.5)
> 3.0
>
> ( the round obj. method is missing this doc string )

Um, well, the doc is wrong. Numpy should *not* always follow python's
lead, and in particular it's explicit that Numeric's floating point is
closer to IEEE floating-point behaviour than python's - compare, for
example 1./0. and array(1.)/0.

> I really think we should stick to what the doc string say - everybody
> expects x.5  to round up !!

Not everybody. This (admittedly odd) behaviour wasn't decided on
because it was more efficient to implement, it was decided on because
a group of very smart numerical analysts agreed that it was the best
way to avoid surprising naive users with biased results. (Non-naive
users can normally pick from any of several other rounding modes if
they want.)
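A quick demonstration of round-half-to-even (Python 3's built-in round has since adopted the same rule):

```python
import numpy as np

# Halfway cases round to the nearest *even* value ("banker's
# rounding"), so rounding errors don't accumulate a systematic bias:
assert np.around(0.5) == 0.0
assert np.around(1.5) == 2.0
assert np.around(2.5) == 2.0
assert list(np.around([0.5, 1.5, 2.5, 3.5])) == [0.0, 2.0, 2.0, 4.0]
```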

A better question to ask is, "Can I change numpy's rounding behaviour
for my programs?" (And, more generally, "can I set all the various
floating-point options that the IEEE standard and my processor both
support?") I don't know the answer to that one, but it does seem to be
a goal that numpy is trying for (hence, for example, the difference
between numpy float scalars and python floats).

A. M. Archibald



Re: [Numpy-discussion] diagflat

2006-09-03 Thread A. M. Archibald

On 03/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:

> Travis has introduced the function diagflat for creating diagonal arrays. It
> flattens whatever array is supplied and returns its values as the diagonal
> of a 2-D array or matrix. As the function is currently undocumented I
> thought now might be a good time to discuss other possible choices for the
> name. Here is a quick list that comes to mind
>
> diagflat (current name)
> asdiagonal
> todiagonal
> asdiag
> todiag
> ...


I was wishing for that just the other day... I think my vote is for
"diagflat", to emphasize the initial flattening.

In particular, there's another function I could wish for: given an
array, pick an axis, treat the array like an array of vectors along
that axis, and replace each vector with the diagonal matrix. So,
supposing that it was called todiagonal:

>>> A = todiagonal(ones((2,3,4)), axis=1)
>>> shape(A)
(2, 3, 3, 4)
>>> A[1,:,:,2]
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Specifically I find the forcible flattening a bit monstrous. And you
can always just do todiagonal(A.flat).

No such function seems to exist at the moment, so I'm attaching one;
probably not as efficient as it could be with appropriate fancy
indexing wizardry.

A. M. Archibald

from numpy import zeros, shape, indices, asarray

def todiagonal(A, axis=-1):
    A = asarray(A)
    s = shape(A)
    if axis < 0:
        axis += len(s)
    if not 0 <= axis < len(s):
        raise ValueError("axis out of range")
    n = s[axis]
    # Duplicate the chosen axis, then scatter each input vector onto
    # the diagonal of the corresponding n-by-n block.
    out = zeros(s[:axis] + (n, n) + s[axis + 1:], A.dtype)
    idx = indices(s)
    out[tuple(idx[:axis]) + (idx[axis], idx[axis]) + tuple(idx[axis + 1:])] = A
    return out


Re: [Numpy-discussion] Random number generators

2006-09-03 Thread A. M. Archibald
On 04/09/06, Charles R Harris <[EMAIL PROTECTED]> wrote:

> Except for the last, none conflict with current routines and can be added
> without a branch. I expect adding MWC8222 might need more extensive work and
> I will branch for that. So the questions are of utility and naming. I see
> some utility for myself, otherwise I wouldn't be considering doing the work.
> OTOH, I already have (C++) routines that I use for these things, so a larger
> question might be if anyone else sees a use for these. I like speed, but it
> is not always that important in everyday apps.

How painful would it be to allow users to drop in their own sources of
raw random numbers? A couple of years ago I was trying to analyze some
X-ray timing data and we saw some peaks at the end of a long chain of
periodicity-detecting software (Fourier transforms, parameter
searching, et cetera). To rule out the possibility that they were
coming from the random numbers we were using at one stage, I wanted to
supply raw random numbers from /dev/random. Unfortunately, that forced
me to write my own program to convert uniform random numbers to
exponential. Allowing the function that generates raw random bytes to
be overwritten would be extremely handy, even if it were painfully
slow.
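A sketch of that conversion, via the inverse CDF x = -scale*log(1 - u) (the function name and word size here are illustrative):

```python
import os
import numpy as np

def exponential_from_bytes(raw, scale=1.0):
    """Turn raw random bytes (e.g. read from /dev/random) into
    exponential deviates via the inverse CDF x = -scale*log(1 - u)."""
    n = len(raw) // 4
    words = np.frombuffer(raw[:4 * n], dtype='<u4')
    u = words / 2.0**32              # uniform deviates in [0, 1)
    return -scale * np.log1p(-u)     # log1p(-u) == log(1-u), but more accurate

samples = exponential_from_bytes(os.urandom(400), scale=2.0)
assert samples.shape == (100,)
assert (samples >= 0).all()
```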

In the same project I also noticed it would be nice to be able to
(say) do "exponential(2+sin(arange(10)))" to get an array of
exponentials with varying parameters.

I realize all this is outside the scope of what you were asking. I
would have found another pseudorandom number generator that I could
just switch to a benefit; the rest of them are less exciting.

When you say "32-bit integers" do you really mean 32-bit, or do you
mean "machine width"? 32-bit integers may not be faster on 64-bit
platforms...

A. M. Archibald
