On Tue, 2008-09-09 at 07:53 +0100, Hanni Ali wrote:
> Hi David,
>
> Forgot to answer last week, I was under a fair bit of pressure time
> wise, but thanks for your input. I sorted it all in the end and just
> in time, but the main issue here was the change from numarray to
> numpy. Previously where a typecode of 'f' was used in numarray, the
> calculation was performed in double precision whereas in numpy it was
> calculated in single precision. Hence when migrating the code, the
> differences popped up, which were fairly big when considering the size
> and number of mean calcs we perform.
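To make the single- versus double-precision accumulator difference concrete, here is a small stdlib-only sketch. It is not numarray's or numpy's actual code. It simulates adding up 100,000 copies of 0.1 with a float32 running total and with a float64 one. `to_f32`, `running_sum_f32` and `running_sum_f64` are hypothetical helpers written for illustration, and float32 rounding is emulated with `struct`.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def running_sum_f32(values):
    """Hypothetical helper: strictly sequential sum with a float32 accumulator.

    Each addition is done in double and rounded back to float32. This gives the
    correctly rounded float32 sum, because a double has at least 2 * 24 + 2
    significand bits.
    """
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def running_sum_f64(values):
    """Hypothetical helper: strictly sequential sum with a float64 accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1] * 100_000
# The division itself is done in double; the error comes from the accumulation.
mean_f32 = running_sum_f32(values) / len(values)
mean_f64 = running_sum_f64(values) / len(values)

print(f"float32 accumulator: mean = {mean_f32!r}")
print(f"float64 accumulator: mean = {mean_f64!r}")
```

In NumPy itself, the usual fix is to request a wider accumulator on a float32 array, for example `a.mean(dtype=np.float64)` or `a.sum(dtype=np.float64)`. Recent NumPy versions also use pairwise summation in many reductions, so their float32 drift is usually smaller than this strictly sequential sketch suggests.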
Hi Hanni, glad it worked ok for you.

> I now have a distinct dislike of float values (it'll probably wear off
> over time), how can the sum of 100,000 numbers be anything other than
> the sum of those numbers. I know the reasoning, as highlighted by the
> couple of other e-mails we have had, but I feel the default should
> probably lean towards accuracy than speed. 2.0+2.0=4.0 and
> 2.0+2.0.....=200,000.0 not 2array.sum() != 200,000...

I think it is a fallacy to say you prefer accuracy over speed: the fallacy is in thinking it is a binary choice. You care about speed, because otherwise you would not use a computer at all; you would do everything by hand [1]. Floating point is by itself an approximation: it cannot even represent most rational numbers exactly (1/10, for instance), let alone algebraic or transcendental ones! There are packages for exact computation (look at Sage, for example, for something based on Python), but numpy/scipy are first and foremost tools for numerical computation, which means approximation along the way.

It is true that floating point can give some unexpected results, and you should be aware of its limitations [2]. That being said, for a lot of computations, when you see an unexpected difference between float and double, you have a problem in your implementation. For example, IIRC, you computed the average of a large number of values in one go: you can get better results if you first normalize your numbers. Another example which bites me all the time in statistics is computing the exponential of very negative numbers: done naively, log(exp(-1000)) gives -Inf, but you and I know the answer is of course -1000. Again, you should think more about your computation.

IOW, floating point is a useful approximation/abstraction (I don't know if you are familiar with fixed-point computation, as done in some DSPs, but it is not pretty), but it breaks in some cases.
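The points above can be checked with a few lines of plain Python using only the standard library. The `logsumexp` helper below is a hypothetical illustration, not SciPy's function of the same name. The first two parts show that 0.1 is not stored exactly and that naive repeated addition accumulates rounding error, which `math.fsum` avoids. The last part shows the log(exp(-1000)) problem and the usual fix of working in log space, shifting by the maximum before exponentiating.

```python
import math
from fractions import Fraction

# 1. 0.1 has no exact binary representation: the stored double is a nearby
#    dyadic rational, not 1/10.
print(Fraction(0.1))                      # 3602879701896397/36028797018963968
print(Fraction(0.1) == Fraction(1, 10))   # False

# 2. Naive repeated addition accumulates rounding error; math.fsum tracks the
#    lost low-order bits and returns the correctly rounded sum.
naive = 0.0
for _ in range(10):
    naive += 0.1
print(naive)                  # 0.9999999999999999
print(math.fsum([0.1] * 10))  # 1.0

# 3. exp(-1000) underflows to 0.0 in double precision, so log(exp(-1000))
#    cannot recover -1000: math.log(0.0) raises ValueError (NumPy gives -inf).
print(math.exp(-1000))        # 0.0


def logsumexp(xs):
    """Hypothetical helper: log(sum(exp(x) for x in xs)) computed in log space.

    Subtracting the maximum first keeps every exponent <= 0, so the largest
    term is exp(0) = 1: nothing overflows, and the sum can never underflow to 0.
    """
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


print(logsumexp([-1000.0]))           # -1000.0
print(logsumexp([-1000.0, -1000.0]))  # -1000 + log(2)
```

NumPy and SciPy ship vectorized versions of the same idea (`np.logaddexp` for pairs, `scipy.special.logsumexp` for arrays). The helper above is only meant to show why the shift by the maximum works.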
cheers,

David

[1] I know some people do this for some kinds of computation. In a context different from numerical computation, I found the following interview with Alain Connes (one of the most famous French mathematicians currently alive) to be extremely enlightening: http://www.ipm.ac.ir/IPM/news/connes-interview.pdf (see pages 2-3 for the discussion about computers and computation)

[2] David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys, 1991.

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion