Re: [Numpy-discussion] Log Arrays

2008-05-11 Thread Gael Varoquaux
On Thu, May 08, 2008 at 10:04:28AM -0600, Charles R Harris wrote:
>What realistic probability is in the range exp(-1000) ?

I recently tried to do Fisher information estimation of a very noisy
experiment. For this I needed to calculate the norm of the derivative of
the probability over the entire state space as a function of a hidden
parameter. My probabilities were very small over most of the phase space,
and calculating the derivative in these regions was numerically very
unstable. However, I could not drop these regions, as they were
contributing significantly to the Fisher information. I could never get
clean results (everything was very noisy) and I gave up.

This was just to say that extremely small numbers, and their differences,
can be very significant, and that there is a strong use case for
logarithmic operations (actually I should probably have worked a bit on
my formulas to express them all in logarithmic form, but I ran out of
time).

Gaël


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 2:37 PM, Warren Focke <[EMAIL PROTECTED]>
wrote:

>
>
> On Thu, 8 May 2008, Charles R Harris wrote:
>
> > On Thu, May 8, 2008 at 11:46 AM, Anne Archibald <
> [EMAIL PROTECTED]>
> > wrote:
> >
> >> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
> >>>
> >>> On Thu, May 8, 2008 at 10:56 AM, Robert Kern <[EMAIL PROTECTED]>
> >> wrote:
> 
>  When you're running an optimizer over a PDF, you will be stuck in the
>  region of exp(-1000) for a substantial amount of time before you get
>  to the peak. If you don't use the log representation, you will never
>  get to the peak because all of the gradient information is lost to
>  floating point error. You can consult any book on computational
>  statistics for many more examples. This is a long-established best
>  practice in statistics.
> >>>
> >>> But IEEE is already a log representation. You aren't gaining precision,
> >> you
> >>> are gaining more bits in the exponent at the expense of fewer bits in
> the
> >>> mantissa, i.e., less precision. As I say, it may be convenient, but if
> >> cpu
> >>> cycles matter, it isn't efficient.
> >>
> >> Efficiency is not the point here. IEEE floats simply cannot represent
> >> the difference between exp(-1000) and exp(-1001). This difference can
> >> matter in many contexts.
> >>
> >> For example, suppose I have observed a source in X-rays and I want to
> >> fit a blackbody spectrum. I have, say, hundreds of spectral bins, with
> >> a number of photons in each. For any temperature T and amplitude A, I
> >> can compute the distribution of photons in each bin (Poisson with mean
> >> depending on T and A), and obtain the probability of obtaining the
> >> observed number of photons in each bin. Even if a blackbody with
> >> temperature T and amplitude A is a perfect fit I should expect this
> >
> >
> > This is an interesting problem, partly because I was one of the first
> guys
> > to use synthetic spectra (NO, OH), to fit temperatures to spectral data.
> But
> > also because it might look like the Poisson matters. But probably not,
> the
> > curve fitting effectively aggregates data and the central limit theorem
> > kicks in, so that the normal least squares approach will probably work.
> > Also, if the number of photons in a bin is much more than about 10, the
> > computation of the Poisson distribution probably uses a Gaussian, with
> > perhaps a small adjustment for the tail. So I'm curious, did you find any
> > significant difference trying both methods?
>
> I can't address Anne's results, but I've certainly found it to make a
> difference
> in my work, and it's pretty standard in high-energy astronomy.  The central
> limit theorem does not get you out of having to use the right PDF to
> compare
> data to model.  It does mean that, if you have enough events, the PDF of
> the
> fitness function is fairly chi-squarish, so much of the logic developed for
> least-squares fitting still applies (like for finding confidence intervals
> on
> the fitted parameters).  Using Poisson statistics instead of a Gaussian
> approximation is actually pretty easy, see Cash (Astrophysical Journal,
> Part 1,
> vol. 228, Mar. 15, 1979, p. 939-947
> http://adsabs.harvard.edu/abs/1979ApJ...228..939C)
>

Interesting paper.  I've also been playing with a 64-bit floating-point format
with a 31-bit offset-binary exponent and a 33-bit mantissa with a hidden bit,
which can hold the probability of any given sequence of 2**30 - 1 coin
tosses, or about 1e-323228496. It looks to be pretty efficient for
multiplication and comparison, and functions like log and exp aren't hard to
do.

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Warren Focke


On Thu, 8 May 2008, Charles R Harris wrote:

> On Thu, May 8, 2008 at 11:46 AM, Anne Archibald <[EMAIL PROTECTED]>
> wrote:
>
>> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
>>>
>>> On Thu, May 8, 2008 at 10:56 AM, Robert Kern <[EMAIL PROTECTED]>
>> wrote:

 When you're running an optimizer over a PDF, you will be stuck in the
 region of exp(-1000) for a substantial amount of time before you get
 to the peak. If you don't use the log representation, you will never
 get to the peak because all of the gradient information is lost to
 floating point error. You can consult any book on computational
 statistics for many more examples. This is a long-established best
 practice in statistics.
>>>
>>> But IEEE is already a log representation. You aren't gaining precision,
>> you
>>> are gaining more bits in the exponent at the expense of fewer bits in the
>>> mantissa, i.e., less precision. As I say, it may be convenient, but if
>> cpu
>>> cycles matter, it isn't efficient.
>>
>> Efficiency is not the point here. IEEE floats simply cannot represent
>> the difference between exp(-1000) and exp(-1001). This difference can
>> matter in many contexts.
>>
>> For example, suppose I have observed a source in X-rays and I want to
>> fit a blackbody spectrum. I have, say, hundreds of spectral bins, with
>> a number of photons in each. For any temperature T and amplitude A, I
>> can compute the distribution of photons in each bin (Poisson with mean
>> depending on T and A), and obtain the probability of obtaining the
>> observed number of photons in each bin. Even if a blackbody with
>> temperature T and amplitude A is a perfect fit I should expect this
>
>
> This is an interesting problem, partly because I was one of the first guys
> to use synthetic spectra (NO, OH), to fit temperatures to spectral data. But
> also because it might look like the Poisson matters. But probably not, the
> curve fitting effectively aggregates data and the central limit theorem
> kicks in, so that the normal least squares approach will probably work.
> Also, if the number of photons in a bin is much more than about 10, the
> computation of the Poisson distribution probably uses a Gaussian, with
> perhaps a small adjustment for the tail. So I'm curious, did you find any
> significant difference trying both methods?

I can't address Anne's results, but I've certainly found it to make a
difference in my work, and it's pretty standard in high-energy astronomy.
The central
limit theorem does not get you out of having to use the right PDF to compare 
data to model.  It does mean that, if you have enough events, the PDF of the 
fitness function is fairly chi-squarish, so much of the logic developed for 
least-squares fitting still applies (like for finding confidence intervals on 
the fitted parameters).  Using Poisson statistics instead of a Gaussian 
approximation is actually pretty easy, see Cash (Astrophysical Journal, Part 1, 
vol. 228, Mar. 15, 1979, p. 939-947 
http://adsabs.harvard.edu/abs/1979ApJ...228..939C)

w



Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 T J <[EMAIL PROTECTED]>:
> On 5/8/08, Anne Archibald <[EMAIL PROTECTED]> wrote:
>  > Is "logarray" really the way to handle it, though? it seems like you
>  > could probably get away with providing a logsum ufunc that did the
>  > right thing. I mean, what operations does one want to do on logarrays?
>  >
>  > add -> logsum
>  > subtract -> ?
>  > multiply -> add
>  > mean -> logsum/N
>  > median -> median
>  > exponentiate to recover normal-space values -> exp
>  > str -> ?
>  >
>
>  That's about it, as far as my usage goes.  Additionally, I would also
>  benefit from:
>
>logdot
>logouter
>
>  In addition to the elementwise operations, it would be nice to have
>
>logsum along an axis
>logprod along an axis
>cumlogsum
>cumlogprod
>
>  Whether these are through additional ufuncs or through a subclass is
>  not so much of an issue for me---either would be a huge improvement to
>  the current situation.  One benefit of a subclass, IMO, is that it
>  maintains the feel of non-log arrays.  That is, when I want to
>  multiply two logarrays, I just do x*y rather than x+y... but I can
>  understand arguments that this might not be desirable.

Well, how about a two-step process? Let's come up with a nice
implementation of each of the above, then we can discuss whether a
subclass is warranted (and at that point each operation on the
subclass will be very simple to implement).

Most of them could be implemented almost for free by providing a ufunc
that did logsum(); then logsum along an axis becomes logsum.reduce(),
and cumlogsum becomes logsum.accumulate(). logprod is of course just
add. logouter is conveniently and efficiently written as
logprod.outer. logdot can be written in terms of logsum() and
logprod() at the cost of a sizable temporary, but should really have
its own implementation.

It might be possible to test this by using vectorize() on a
single-element version of logsum(). This would be slow, but if
vectorize() makes a real ufunc object, it should provide all the usual
handy methods (reduce, accumulate, outer, etcetera). Alternatively,
since logsum can be written in terms of existing ufuncs, it should be
possible to implement a class providing all the ufunc methods by hand
without too too much pain.

Anne


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 11:46 AM, Anne Archibald <[EMAIL PROTECTED]>
wrote:

> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
> >
> > On Thu, May 8, 2008 at 10:56 AM, Robert Kern <[EMAIL PROTECTED]>
> wrote:
> > >
> > > When you're running an optimizer over a PDF, you will be stuck in the
> > > region of exp(-1000) for a substantial amount of time before you get
> > > to the peak. If you don't use the log representation, you will never
> > > get to the peak because all of the gradient information is lost to
> > > floating point error. You can consult any book on computational
> > > statistics for many more examples. This is a long-established best
> > > practice in statistics.
> >
> > But IEEE is already a log representation. You aren't gaining precision,
> you
> > are gaining more bits in the exponent at the expense of fewer bits in the
> > mantissa, i.e., less precision. As I say, it may be convenient, but if
> cpu
> > cycles matter, it isn't efficient.
>
> Efficiency is not the point here. IEEE floats simply cannot represent
> the difference between exp(-1000) and exp(-1001). This difference can
> matter in many contexts.
>
> For example, suppose I have observed a source in X-rays and I want to
> fit a blackbody spectrum. I have, say, hundreds of spectral bins, with
> a number of photons in each. For any temperature T and amplitude A, I
> can compute the distribution of photons in each bin (Poisson with mean
> depending on T and A), and obtain the probability of obtaining the
> observed number of photons in each bin. Even if a blackbody with
> temperature T and amplitude A is a perfect fit I should expect this


This is an interesting problem, partly because I was one of the first guys
to use synthetic spectra (NO, OH) to fit temperatures to spectral data. But
also because it might look like the Poisson matters. But probably not: the
curve fitting effectively aggregates data and the central limit theorem
kicks in, so that the normal least squares approach will probably work.
Also, if the number of photons in a bin is much more than about 10, the
computation of the Poisson distribution probably uses a Gaussian, with
perhaps a small adjustment for the tail. So I'm curious, did you find any
significant difference trying both methods?

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread T J
On 5/8/08, Anne Archibald <[EMAIL PROTECTED]> wrote:
> Is "logarray" really the way to handle it, though? it seems like you
> could probably get away with providing a logsum ufunc that did the
> right thing. I mean, what operations does one want to do on logarrays?
>
> add -> logsum
> subtract -> ?
> multiply -> add
> mean -> logsum/N
> median -> median
> exponentiate to recover normal-space values -> exp
> str -> ?
>

That's about it, as far as my usage goes.  Additionally, I would also
benefit from:

   logdot
   logouter

In addition to the elementwise operations, it would be nice to have

   logsum along an axis
   logprod along an axis
   cumlogsum
   cumlogprod

Whether these are through additional ufuncs or through a subclass is
not so much of an issue for me---either would be a huge improvement to
the current situation.  One benefit of a subclass, IMO, is that it
maintains the feel of non-log arrays.  That is, when I want to
multiply two logarrays, I just do x*y rather than x+y... but I can
understand arguments that this might not be desirable.
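For what it's worth, here is a rough sketch of how the logdot above could be
written with nothing but broadcasting plus a shifted log-sum-exp over the
shared axis (the function name and the handling of the all -inf corner case
are assumptions, not a tuned implementation):

import numpy as np

def logdot(a, b):
    # log(dot(exp(a), exp(b))) computed without leaving log space:
    # entry (i, j) is a log-sum-exp over k of a[i, k] + b[k, j].
    s = a[:, :, None] + b[None, :, :]    # shape (m, k, n), a sizable temporary
    m = s.max(axis=1, keepdims=True)     # shift so the largest term is exp(0)
    return np.squeeze(m, axis=1) + np.log(np.exp(s - m).sum(axis=1))

a = np.log(np.random.rand(3, 4))
b = np.log(np.random.rand(4, 2))
print(np.allclose(logdot(a, b), np.log(np.dot(np.exp(a), np.exp(b)))))  # True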


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Robert Kern
On Thu, May 8, 2008 at 2:12 PM, Charles R Harris
<[EMAIL PROTECTED]> wrote:
>
> On Thu, May 8, 2008 at 11:46 AM, Anne Archibald <[EMAIL PROTECTED]>
> wrote:
>>
>> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
>> >
>> > On Thu, May 8, 2008 at 10:56 AM, Robert Kern <[EMAIL PROTECTED]>
>> > wrote:
>> > >
>> > > When you're running an optimizer over a PDF, you will be stuck in the
>> > > region of exp(-1000) for a substantial amount of time before you get
>> > > to the peak. If you don't use the log representation, you will never
>> > > get to the peak because all of the gradient information is lost to
>> > > floating point error. You can consult any book on computational
>> > > statistics for many more examples. This is a long-established best
>> > > practice in statistics.
>> >
>> > But IEEE is already a log representation. You aren't gaining precision,
>> > you
>> > are gaining more bits in the exponent at the expense of fewer bits in
>> > the
>> > mantissa, i.e., less precision. As I say, it may be convenient, but if
>> > cpu
>> > cycles matter, it isn't efficient.
>>
>> Efficiency is not the point here. IEEE floats simply cannot represent
>> the difference between exp(-1000) and exp(-1001). This difference can
>> matter in many contexts.
>>
>> For example, suppose I have observed a source in X-rays and I want to
>> fit a blackbody spectrum. I have, say, hundreds of spectral bins, with
>> a number of photons in each. For any temperature T and amplitude A, I
>> can compute the distribution of photons in each bin (Poisson with mean
>> depending on T and A), and obtain the probability of obtaining the
>> observed number of photons in each bin. Even if a blackbody with
>> temperature T and amplitude A is a perfect fit I should expect this
>> number to be very small, since the chance of obtaining *exactly* this
>> sequence of integers is quite small. But when I start the process I
>> need to guess a T and an A and evaluate the probability. If my values
>> are far off, the probability will almost certainly be lower than
>> exp(-1000). But I need to know whether, if I increase T, this
>> probability will increase or decrease, so I cannot afford to treat it
>> as zero, or throw up my hands and say "This is smaller than one over
>> the number of baryons in the universe! My optimization problem doesn't
>> make any sense!".
>
> You want the covariance of the parameters of the fit, which will be much
> more reasonable. And if not, then you have outliers. Mostly, folks are
> looking for the probability of classes of things, in this case the class of
> curves you are fitting; the probability of any given sequence, which
> approaches zero, is of much less interest. So the errors in the parameters
> of the model matter; the probability of an essentially unique sequence, much
> less so.

Except that when you are doing practical computations, you will often
be computing likelihoods as part of the larger computation. Yes, the
final probabilities that you interpret won't be exp(-1000) (and if
they are, then 0 is close enough), but the calculations *in between*
do require that exp(-1000) and exp(-1001) be distinguishable from each
other.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 11:46 AM, Anne Archibald <[EMAIL PROTECTED]>
wrote:

> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
> >
> > On Thu, May 8, 2008 at 10:56 AM, Robert Kern <[EMAIL PROTECTED]>
> wrote:
> > >
> > > When you're running an optimizer over a PDF, you will be stuck in the
> > > region of exp(-1000) for a substantial amount of time before you get
> > > to the peak. If you don't use the log representation, you will never
> > > get to the peak because all of the gradient information is lost to
> > > floating point error. You can consult any book on computational
> > > statistics for many more examples. This is a long-established best
> > > practice in statistics.
> >
> > But IEEE is already a log representation. You aren't gaining precision,
> you
> > are gaining more bits in the exponent at the expense of fewer bits in the
> > mantissa, i.e., less precision. As I say, it may be convenient, but if
> cpu
> > cycles matter, it isn't efficient.
>
> Efficiency is not the point here. IEEE floats simply cannot represent
> the difference between exp(-1000) and exp(-1001). This difference can
> matter in many contexts.
>
> For example, suppose I have observed a source in X-rays and I want to
> fit a blackbody spectrum. I have, say, hundreds of spectral bins, with
> a number of photons in each. For any temperature T and amplitude A, I
> can compute the distribution of photons in each bin (Poisson with mean
> depending on T and A), and obtain the probability of obtaining the
> observed number of photons in each bin. Even if a blackbody with
> temperature T and amplitude A is a perfect fit I should expect this
> number to be very small, since the chance of obtaining *exactly* this
> sequence of integers is quite small. But when I start the process I
> need to guess a T and an A and evaluate the probability. If my values
> are far off, the probability will almost certainly be lower than
> exp(-1000). But I need to know whether, if I increase T, this
> probability will increase or decrease, so I cannot afford to treat it
> as zero, or throw up my hands and say "This is smaller than one over
> the number of baryons in the universe! My optimization problem doesn't
> make any sense!".
>

You want the covariance of the parameters of the fit, which will be much
more reasonable. And if not, then you have outliers. Mostly, folks are
looking for the probability of classes of things, in this case the class of
curves you are fitting; the probability of any given sequence, which
approaches zero, is of much less interest. So the errors in the parameters
of the model matter; the probability of an essentially unique sequence, much
less so.

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 12:39 PM, Anne Archibald <[EMAIL PROTECTED]>
wrote:

> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
> >
> > David, what you are using is a log(log(x)) representation internally.
> IEEE
> > is *not* linear, it is logarithmic.
>
> As Robert Kern says, yes, this is exactly what the OP and all the rest
> of us want.
>
> But it's a strange thing to say that IEEE is logarithmic - "2.3*10**1"
> is not exactly logarithmic, since the "2.3" is not the logarithm of
> anything. IEEE floats work the same way, which is important, since it
> means they can exactly represent integers of moderate size. For
> example, 257 is represented as
>
> sign 0, exponent 135, (implied leading 1).00000001b.
>
> The exponent is indeed the integer part of the log base 2 of the
> value, up to some fiddling, but the mantissa is not a logarithm of any
> kind.
>

First order in the Taylor series. The log computation uses the fact that
the log has small curvature over the range 1..2.

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
>
> David, what you are using is a log(log(x)) representation internally. IEEE
> is *not* linear, it is logarithmic.

As Robert Kern says, yes, this is exactly what the OP and all the rest
of us want.

But it's a strange thing to say that IEEE is logarithmic - "2.3*10**1"
is not exactly logarithmic, since the "2.3" is not the logarithm of
anything. IEEE floats work the same way, which is important, since it
means they can exactly represent integers of moderate size. For
example, 257 is represented as

sign 0, exponent 135, (implied leading 1).00000001b.

The exponent is indeed the integer part of the log base 2 of the
value, up to some fiddling, but the mantissa is not a logarithm of any
kind.
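For anyone who wants to see that split directly, np.frexp and Python's
float.hex expose it (just an illustration of the point above):

import numpy as np

m, e = np.frexp(257.0)   # 257.0 == m * 2**e with 0.5 <= m < 1
print(m, e)              # 0.501953125 9, i.e. 1.00000001b * 2**8
print((257.0).hex())     # '0x1.0100000000000p+8': unbiased exponent 8, fraction 0x01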

Anyway, all this is immaterial. The point is, in spite of the fact
that floating-point numbers can represent a very wide range of
numbers, there are some important contexts in which this range is not
wide enough. One could in principle store an additional power of two
in an accompanying integer, but this would be less efficient in terms
of space and time, and more cumbersome, whereas for the usual situations
where this is applied, simply taking the logarithm works fine.

Anne


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 11:31 AM, Warren Focke <[EMAIL PROTECTED]>
wrote:

>
>
> On Thu, 8 May 2008, Charles R Harris wrote:
>
> > On Thu, May 8, 2008 at 10:11 AM, Anne Archibald <
> [EMAIL PROTECTED]>
> > wrote:
> >
> >> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
> >>>
> >>> What realistic probability is in the range exp(-1000) ?
> >>
> >> Well, I ran into it while doing a maximum-likelihood fit - my early
> >> guesses had exceedingly low probabilities, but I needed to know which
> >> way the probabilities were increasing.
> >>
> >
> > The number of bosons in the universe is only on the order of 1e-42.
> > Exp(-1000) may be convenient, but as a probability it is a delusion. The
> > hypothesis "none of the above" would have a much larger prior.
>
> You might like to think so.  Sadly, not.
>
> If you're doing a least-square (or any other maximum-likelihood) fit to
> 2000
> data points, exp(-1000) is the highest probability you can reasonably hope
> for.
> For a good fit.  Chi-square is -2*ln(P).  In the course of doing the fit,
> you
> will evaluate many parameter sets which are bad fits, and the probability
> will
> be much lower.
>
> This has no real effect on the current discussion, but:
>
> The number of bosons in the universe (or any subset thereof) is not
> well-defined.  It's not just a question of not knowing the number; there
> really
> is no answer to that question (well, ok, 'mu').  It's like asking which
> slit the
> particle went through in a double-slit interference experiment.  The
> question is
> incorrect.  Values <<1 will never be tenable, but I suspect that the minus
> sign
> was a typo.  The estimates I hear for the number of baryons (protons,
> atoms) are
> ~ 1e80.
>

Say, mostly photons. Temperature (~2.7 K) determines the density; multiply by
volume. But I meant baryons, and the last number I saw was about 1e42.

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Robert Kern
On Thu, May 8, 2008 at 12:53 PM, Charles R Harris
<[EMAIL PROTECTED]> wrote:
>
> On Thu, May 8, 2008 at 11:18 AM, David Cournapeau <[EMAIL PROTECTED]>
> wrote:
>>
>> On Fri, May 9, 2008 at 2:06 AM, Nadav Horesh <[EMAIL PROTECTED]>
>> wrote:
>> > Isn't the 80-bit float (float96 on IA32, float128 on AMD64) enough?
>> > It has a 64-bit mantissa and can represent numbers up to nearly
>> > 1E(+-)5000.
>>
>> It only makes the problem happen later, I think. If you have a GMM with
>> millions of samples of high dimension with many clusters, any "linear"
>> representation will fail, I think. In a sense, the IEEE format is not
>> adequate for that kind of computation.
>
> David, what you are using is a log(log(x)) representation internally. IEEE
> is *not* linear, it is logarithmic.

*YES*. That is precisely the point. I want 53 bits devoted to the "x"
part of "exp(-x)". The straight IEEE representation is not logarithmic
*enough*.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 11:18 AM, David Cournapeau <[EMAIL PROTECTED]>
wrote:

> On Fri, May 9, 2008 at 2:06 AM, Nadav Horesh <[EMAIL PROTECTED]>
> wrote:
> > Isn't the 80-bit float (float96 on IA32, float128 on AMD64) enough?
> It has a 64-bit mantissa and can represent numbers up to nearly 1E(+-)5000.
>
> It only makes the problem happen later, I think. If you have a GMM with
> millions of samples of high dimension with many clusters, any "linear"
> representation will fail, I think. In a sense, the IEEE format is not
> adequate for that kind of computation.
>

David, what you are using is a log(log(x)) representation internally. IEEE
is *not* linear, it is logarithmic.

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
>
> On Thu, May 8, 2008 at 10:56 AM, Robert Kern <[EMAIL PROTECTED]> wrote:
> >
> > When you're running an optimizer over a PDF, you will be stuck in the
> > region of exp(-1000) for a substantial amount of time before you get
> > to the peak. If you don't use the log representation, you will never
> > get to the peak because all of the gradient information is lost to
> > floating point error. You can consult any book on computational
> > statistics for many more examples. This is a long-established best
> > practice in statistics.
>
> But IEEE is already a log representation. You aren't gaining precision, you
> are gaining more bits in the exponent at the expense of fewer bits in the
> mantissa, i.e., less precision. As I say, it may be convenient, but if cpu
> cycles matter, it isn't efficient.

Efficiency is not the point here. IEEE floats simply cannot represent
the difference between exp(-1000) and exp(-1001). This difference can
matter in many contexts.

For example, suppose I have observed a source in X-rays and I want to
fit a blackbody spectrum. I have, say, hundreds of spectral bins, with
a number of photons in each. For any temperature T and amplitude A, I
can compute the distribution of photons in each bin (Poisson with mean
depending on T and A), and obtain the probability of obtaining the
observed number of photons in each bin. Even if a blackbody with
temperature T and amplitude A is a perfect fit I should expect this
number to be very small, since the chance of obtaining *exactly* this
sequence of integers is quite small. But when I start the process I
need to guess a T and an A and evaluate the probability. If my values
are far off, the probability will almost certainly be lower than
exp(-1000). But I need to know whether, if I increase T, this
probability will increase or decrease, so I cannot afford to treat it
as zero, or throw up my hands and say "This is smaller than one over
the number of baryons in the universe! My optimization problem doesn't
make any sense!".

I could also point out that frequently when one obtains these crazy
numbers, one is not working with probabilities at all, but probability
densities. A probability density of exp(-1000) means nothing special.

Finally, the fact that *you* don't think this is a useful technique
doesn't affect the fact that there is a substantial community of users
who use it daily and who would like some support for it in scipy.

Anne


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Warren Focke


On Thu, 8 May 2008, Charles R Harris wrote:

> On Thu, May 8, 2008 at 10:11 AM, Anne Archibald <[EMAIL PROTECTED]>
> wrote:
>
>> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
>>>
>>> What realistic probability is in the range exp(-1000) ?
>>
>> Well, I ran into it while doing a maximum-likelihood fit - my early
>> guesses had exceedingly low probabilities, but I needed to know which
>> way the probabilities were increasing.
>>
>
> The number of bosons in the universe is only on the order of 1e-42.
> Exp(-1000) may be convenient, but as a probability it is a delusion. The
> hypothesis "none of the above" would have a much larger prior.

You might like to think so.  Sadly, not.

If you're doing a least-squares (or any other maximum-likelihood) fit to 2000
data points, exp(-1000) is the highest probability you can reasonably hope for,
for a good fit.  Chi-square is -2*ln(P).  In the course of doing the fit, you
will evaluate many parameter sets which are bad fits, and the probability will
be much lower.

This has no real effect on the current discussion, but:

The number of bosons in the universe (or any subset thereof) is not
well-defined.  It's not just a question of not knowing the number; there really
is no answer to that question (well, ok, 'mu').  It's like asking which slit the
particle went through in a double-slit interference experiment.  The question is
incorrect.  Values <<1 will never be tenable, but I suspect that the minus sign
was a typo.  The estimates I hear for the number of baryons (protons, atoms) are
~ 1e80.

w



Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread David Cournapeau
On Fri, May 9, 2008 at 2:06 AM, Nadav Horesh <[EMAIL PROTECTED]> wrote:
> Isn't the 80-bit float (float96 on IA32, float128 on AMD64) enough? It
> has a 64-bit mantissa and can represent numbers up to nearly 1E(+-)5000.

It only makes the problem happen later, I think. If you have a GMM with
millions of samples of high dimension with many clusters, any "linear"
representation will fail, I think. In a sense, the IEEE format is not
adequate for that kind of computation.

David


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread David Cournapeau
On Fri, May 9, 2008 at 1:54 AM, Charles R Harris
<[EMAIL PROTECTED]> wrote:

> Yes, and Gaussians are a delusion beyond a few sigma. One of my pet peeves.
> If you have more than 8 standard deviations, then something is fundamentally
> wrong in the concept and formulation.

If you have a mixture of Gaussians, and the components are not all
mostly overlapping, you will get those ranges, and nothing is wrong in
the formulation. I mean, it is not as if EM algorithms were untested
and totally new things. They are used in many different fields, and all
successful implementations use the logsumexp trick.

Look at here for the formula involved:

http://en.wikipedia.org/wiki/Expectation-maximization_algorithm

If you need to compute log(exp(-1000) + exp(-1001)), how would you
do it? If you do it the naive way, you get -inf, and it quickly propagates
across all your computation. -inf instead of -1000 seems like
a precision win to me. Of course you are trading precision for range,
but when you are out of range for your number representation, the
tradeoff is not a loss anymore. It is really like denormals: they are
less precise than the normal format *for the usual range*, but in the
range where denormals are used, they are much more precise; they are
actually infinitely more precise, since the normal representation
would be 0 :)

David


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Nadav Horesh
Isn't the 80-bit float (float96 on IA32, float128 on AMD64) enough? It has
a 64-bit mantissa and can represent numbers up to nearly 1E(+-)5000.

  Nadav.


-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Charles R Harris
Sent: Thu 08-May-08 19:25
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Log Arrays
 
On Thu, May 8, 2008 at 10:11 AM, Anne Archibald <[EMAIL PROTECTED]>
wrote:

> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
> >
> > What realistic probability is in the range exp(-1000) ?
>
> Well, I ran into it while doing a maximum-likelihood fit - my early
> guesses had exceedingly low probabilities, but I needed to know which
> way the probabilities were increasing.
>

The number of bosons in the universe is only on the order of 1e-42.
Exp(-1000) may be convenient, but as a probability it is a delusion. The
hypothesis "none of the above" would have a much larger prior.

But to expand on David's computation... If the numbers are stored without
using logs, i.e., as the exponentials, then the sum is of the form:

x_1*2**y_1 + ... + x_i*2**y_i

Where 1 <= x_i < 2 and both x_i and y_i are available. When the numbers are
all of the same general magnitude you get essentially the same result as
David's formula simply by dividing out the first value.

Chuck



Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Robert Kern
On Thu, May 8, 2008 at 12:02 PM, Charles R Harris
<[EMAIL PROTECTED]> wrote:
>
> On Thu, May 8, 2008 at 10:56 AM, Robert Kern <[EMAIL PROTECTED]> wrote:
>>
>> On Thu, May 8, 2008 at 11:25 AM, Charles R Harris
>> <[EMAIL PROTECTED]> wrote:
>> >
>> > On Thu, May 8, 2008 at 10:11 AM, Anne Archibald
>> > <[EMAIL PROTECTED]>
>> > wrote:
>> >>
>> >> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
>> >> >
>> >> > What realistic probability is in the range exp(-1000) ?
>> >>
>> >> Well, I ran into it while doing a maximum-likelihood fit - my early
>> >> guesses had exceedingly low probabilities, but I needed to know which
>> >> way the probabilities were increasing.
>> >
>> > The number of bosons in the universe is only on the order of 1e-42.
>> > Exp(-1000) may be convenient, but as a probability it is a delusion. The
>> > hypothesis "none of the above" would have a much larger prior.
>>
>> When you're running an optimizer over a PDF, you will be stuck in the
>> region of exp(-1000) for a substantial amount of time before you get
>> to the peak. If you don't use the log representation, you will never
>> get to the peak because all of the gradient information is lost to
>> floating point error. You can consult any book on computational
>> statistics for many more examples. This is a long-established best
>> practice in statistics.
>
> But IEEE is already a log representation. You aren't gaining precision, you
> are gaining more bits in the exponent at the expense of fewer bits in the
> mantissa, i.e., less precision.

*YES*. As David pointed out, many of these PDFs are in exponential
form. Most of the meaningful variation is in the exponent, not the
mantissa.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 10:56 AM, Robert Kern <[EMAIL PROTECTED]> wrote:

> On Thu, May 8, 2008 at 11:25 AM, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
> >
> > On Thu, May 8, 2008 at 10:11 AM, Anne Archibald <
> [EMAIL PROTECTED]>
> > wrote:
> >>
> >> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
> >> >
> >> > What realistic probability is in the range exp(-1000) ?
> >>
> >> Well, I ran into it while doing a maximum-likelihood fit - my early
> >> guesses had exceedingly low probabilities, but I needed to know which
> >> way the probabilities were increasing.
> >
> > The number of bosons in the universe is only on the order of 1e-42.
> > Exp(-1000) may be convenient, but as a probability it is a delusion. The
> > hypothesis "none of the above" would have a much larger prior.
>
> When you're running an optimizer over a PDF, you will be stuck in the
> region of exp(-1000) for a substantial amount of time before you get
> to the peak. If you don't use the log representation, you will never
> get to the peak because all of the gradient information is lost to
> floating point error. You can consult any book on computational
> statistics for many more examples. This is a long-established best
> practice in statistics.
>

But IEEE is already a log representation. You aren't gaining precision, you
are gaining more bits in the exponent at the expense of fewer bits in the
mantissa, i.e., less precision. As I say, it may be convenient, but if cpu
cycles matter, it isn't efficient.

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 10:52 AM, David Cournapeau <[EMAIL PROTECTED]>
wrote:

> On Fri, May 9, 2008 at 1:25 AM, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
> >
>
> >
> > But to expand on David's computation... If the numbers are stored without
> > using logs, i.e., as the exponentials, then the sum is of the form:
> >
> > x_1*2**y_1 + ... + x_i*2**y_i
>
> You missed the part on parametric models: in parametric settings, your
> x_i are often exponential,


I'm talking IEEE floating point. The x_i are never exponentials.

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Robert Kern
On Thu, May 8, 2008 at 11:25 AM, Charles R Harris
<[EMAIL PROTECTED]> wrote:
>
> On Thu, May 8, 2008 at 10:11 AM, Anne Archibald <[EMAIL PROTECTED]>
> wrote:
>>
>> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
>> >
>> > What realistic probability is in the range exp(-1000) ?
>>
>> Well, I ran into it while doing a maximum-likelihood fit - my early
>> guesses had exceedingly low probabilities, but I needed to know which
>> way the probabilities were increasing.
>
> The number of bosons in the universe is only on the order of 1e-42.
> Exp(-1000) may be convenient, but as a probability it is a delusion. The
> hypothesis "none of the above" would have a much larger prior.

When you're running an optimizer over a PDF, you will be stuck in the
region of exp(-1000) for a substantial amount of time before you get
to the peak. If you don't use the log representation, you will never
get to the peak because all of the gradient information is lost to
floating point error. You can consult any book on computational
statistics for many more examples. This is a long-established best
practice in statistics.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 10:42 AM, David Cournapeau <[EMAIL PROTECTED]>
wrote:

> On Fri, May 9, 2008 at 1:04 AM, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
>
> > <  1e-308 ?
>
> Yes, all the time. I mean, if it were not, why would people bother with
> long double and co.? Why would denormals exist? I don't consider the
> comparison with the number of particles to be really relevant here.
> We are talking about implementation problems.
>
> >
> > Yes, logs can be useful there, but I still fail to see any precision
> > advantage. As I say, to all intents and purposes, IEEE floating point
> *is* a
> > logarithm. You will see that if you look at how log is implemented in
> > hardware. I'm less sure of the C floating point library because it needs
> to
> > be portable.
> >
> >>
> >> a = np.array([-1000., -1001.])
> >> np.log(np.sum(np.exp(a))) -> -inf
> >> -1000 + np.log(np.sum([1 + np.exp(-1)])) -> correct result
> >
> > What realistic probability is in the range exp(-1000) ?
>
> Realistic as significant, none of course. Realistic as it happens in
> computation ? Certainly. Typically, when you do clustering, and you
> have distant clusters, when you compute the probabilities of the
> points from one cluster relatively to another one, you can quickly get
> several units from the mean. Adds a small variance, and you can
> quickly get (x-mu)**2/sigma**2 around 1000.
>

Yes, and Gaussians are a delusion beyond a few sigma. One of my pet peeves.
If you have more than 8 standard deviations, then something is fundamentally
wrong in the concept and formulation. It is more likely that a particle-sized
black hole has whacked out some component of the experiment.


> You cannot just clip to 0, specially in online settings.
>
> >
> > If you have a hammer... It's portable, but there are wasted cpu cycles in
> > there. If speed was important, I suspect you could do better writing a
> low
> > level function that assumed IEEE doubles and twiddled the bits.
>
> When you call a function in python, you waste thousand cycles at every
> call. Yet, you use python, and not assembly :) The above procedure is
> extremely standard, and used in all robust implementations of machine
> learning algorithms I am aware of, it is implemented in HTK, a widely
> used toolkit for HMM for speech recognition for example.
>
> Twiddling bits is all fun, but it takes time and is extremely error
> prone. Also, I don't see what kind of method you have in mind here,
> exactly: how would you do a logsumexp algorithm with bit twiddling ?
>

You are complaining of inadequate range, but that is what scale factors are
for. Why compute exponentials and logs when all you need to do is store an
exponent in an integer?
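A rough sketch of that kind of scale-factor bookkeeping, carrying an explicit
base-2 exponent in an integer via np.frexp (the helper name is made up here;
this is only an illustration of the idea, not a tuned implementation):

import numpy as np

def scaled_prod(p):
    # Product of many tiny numbers kept as (mantissa, integer exponent) so the
    # mantissa never underflows; p[i] == mant[i] * 2**exps[i] after frexp.
    mant, exps = np.frexp(np.asarray(p, dtype=float))
    total = int(exps.sum())
    acc = 1.0
    for x in mant:
        acc *= x
        acc, d = np.frexp(acc)      # renormalize, moving the scale into 'total'
        total += int(d)
    return acc, total               # product == acc * 2**total

m, e = scaled_prod([1e-10] * 500)   # about 1e-5000, far below double underflow
print(m, e)                         # e is roughly -16610 (log2(1e-5000) ~ -16610)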

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread David Cournapeau
On Fri, May 9, 2008 at 1:25 AM, Charles R Harris
<[EMAIL PROTECTED]> wrote:
>

>
> But to expand on David's computation... If the numbers are stored without
> using logs, i.e., as the exponentials, then the sum is of the form:
>
> x_1*2**y_1 + ... + x_i*2**y_i

You missed the part on parametric models: in parametric settings, your
x_i are often exponential, so it makes sense to compute in the log
domain (you don't compute more log/exp than the naive implementation).

Of course, if we were not interested in log  x_i in the first place,
the thing would not have made any sense.

David


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread David Cournapeau
On Fri, May 9, 2008 at 1:04 AM, Charles R Harris
<[EMAIL PROTECTED]> wrote:

> <  1e-308 ?

Yes, all the time. I mean, if it were not, why would people bother with
long double and co.? Why would denormals exist? I don't consider the
comparison with the number of particles to be really relevant here.
We are talking about implementation problems.

>
> Yes, logs can be useful there, but I still fail to see any precision
> advantage. As I say, to all intents and purposes, IEEE floating point *is* a
> logarithm. You will see that if you look at how log is implemented in
> hardware. I'm less sure of the C floating point library because it needs to
> be portable.
>
>>
>> a = np.array([-1000., -1001.])
>> np.log(np.sum(np.exp(a))) -> -inf
>> -1000 + np.log(np.sum([1 + np.exp(-1)])) -> correct result
>
> What realistic probability is in the range exp(-1000) ?

Realistic as in significant? None, of course. Realistic as in it happens
in computation? Certainly. Typically, when you do clustering and you
have distant clusters, when you compute the probabilities of the
points from one cluster relative to another one, you can quickly get
several units from the mean. Add a small variance, and you can
quickly get (x-mu)**2/sigma**2 around 1000.

You cannot just clip to 0, especially in online settings.

>
> If you have a hammer... It's portable, but there are wasted cpu cycles in
> there. If speed was important, I suspect you could do better writing a low
> level function that assumed IEEE doubles and twiddled the bits.

When you call a function in python, you waste thousands of cycles at every
call. Yet, you use python, and not assembly :) The above procedure is
extremely standard, and used in all robust implementations of machine
learning algorithms I am aware of; it is implemented in HTK, a widely
used HMM toolkit for speech recognition, for example.

Twiddling bits is all fun, but it takes time and is extremely error
prone. Also, I don't see what kind of method you have in mind here,
exactly: how would you do a logsumexp algorithm with bit twiddling ?

cheers,

David


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 10:11 AM, Anne Archibald <[EMAIL PROTECTED]>
wrote:

> 2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
> >
> > What realistic probability is in the range exp(-1000) ?
>
> Well, I ran into it while doing a maximum-likelihood fit - my early
> guesses had exceedingly low probabilities, but I needed to know which
> way the probabilities were increasing.
>

The number of bosons in the universe is only on the order of 1e-42.
Exp(-1000) may be convenient, but as a probability it is a delusion. The
hypothesis "none of the above" would have a much larger prior.

But to expand on David's computation... If the numbers are stored without
using logs, i.e., as the exponentials, then the sum is of the form:

x_1*2**y_1 + ... + x_i*2**y_i

Where 1 <= x_i < 2 and both x_i and y_i are available. When the numbers are
all of the same general magnitude you get essentially the same result as
David's formula simply by dividing out the first value.

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 Charles R Harris <[EMAIL PROTECTED]>:
>
> What realistic probability is in the range exp(-1000) ?

Well, I ran into it while doing a maximum-likelihood fit - my early
guesses had exceedingly low probabilities, but I needed to know which
way the probabilities were increasing.

Anne


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 David Cournapeau <[EMAIL PROTECTED]>:
> On Thu, May 8, 2008 at 10:20 PM, Charles R Harris
>  <[EMAIL PROTECTED]> wrote:
>  >
>  >
>  > Floating point numbers are essentially logs to base 2, i.e., integer
>  > exponent and mantissa between 1 and 2. What does using the log buy you?
>
>  Precision, of course. I am not sure I understand the notation base =
>  2, but doing computation in the so called log-domain is a must in many
>  statistical computations. In particular, in machine learning with
>  large datasets, it is common to have some points whose pdf is
>  extremely small, and well below the precision of double. Typically,
>  internally, the computation of my EM toolbox are done in the log
>  domain, and use the logsumexp trick to compute likelihood given some
>  data:

I'm not sure I'd describe this as precision, exactly; it's an issue of
numerical range. But yes, I've come across this while doing
maximum-likelihood fitting, and a coworker ran into it doing Bayesian
statistics. It definitely comes up.

Is "logarray" really the way to handle it, though? it seems like you
could probably get away with providing a logsum ufunc that did the
right thing. I mean, what operations does one want to do on logarrays?

add -> logsum
subtract -> ?
multiply -> add
mean -> logsum/N
median -> median
exponentiate to recover normal-space values -> exp
str -> ?

I suppose numerical integration is also valuable, so it would help to
have a numerical integrator that was reasonably smart about working
with logs. (Though really it's just a question of rescaling and
exponentiating, I think.)

A "logarray" type would help by keeping track of the fact that its
contents were in log space, and would make expressions a little less
cumbersome, I guess. How much effort would it take to write it so that
it got all the corner cases right?

Anne


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 9:20 AM, David Cournapeau <[EMAIL PROTECTED]> wrote:

> On Thu, May 8, 2008 at 10:20 PM, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
> >
> >
> > Floating point numbers are essentially logs to base 2, i.e., integer
> > exponent and mantissa between 1 and 2. What does using the log buy you?
>
> Precision, of course. I am not sure I understand the notation base =
> 2,


lg(x) = ln(x)/ln(2)


> but doing computation in the so called log-domain is a must in many
> statistical computations. In particular, in machine learning with
> large datasets, it is common to have some points whose pdf is
> extremely small, and well below the precision of double.


<  1e-308 ?


> Typically,
> internally, the computation of my EM toolbox are done in the log
> domain, and use the logsumexp trick to compute likelihood given some
> data:
>

Yes, logs can be useful there, but I still fail to see any precision
advantage. As I say, to all intents and purposes, IEEE floating point *is* a
logarithm. You will see that if you look at how log is implemented in
hardware. I'm less sure of the C floating point library because it needs to
be portable.


>
> a = np.array([-1000., -1001.])
> np.log(np.sum(np.exp(a))) -> -inf
> -1000 + np.log(np.sum([1 + np.exp(-1)])) -> correct result
>

What realistic probability is in the range exp(-1000) ?


>
> Where you use log(exp(x) + exp(y)) = x + log(1 + exp(y-x)). It is
> useful when x and y are in the same range but far from 0, which happens
> a lot practically in many machine learning algorithms (EM, SVM, etc...
> everywhere you need to compute likelihood of densities from the
> exponential family, which covers most practical cases of parametric
> estimation)
>

If you have a hammer... It's portable, but there are wasted cpu cycles in
there. If speed was important, I suspect you could do better writing a low
level function that assumed IEEE doubles and twiddled the bits.

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread David Cournapeau
On Thu, May 8, 2008 at 10:20 PM, Charles R Harris
<[EMAIL PROTECTED]> wrote:
>
>
> Floating point numbers are essentially logs to base 2, i.e., integer
> exponent and mantissa between 1 and 2. What does using the log buy you?

Precision, of course. I am not sure I understand the notation base =
2, but doing computation in the so called log-domain is a must in many
statistical computations. In particular, in machine learning with
large datasets, it is common to have some points whose pdf is
extremely small, and well below the precision of double. Typically,
internally, the computations of my EM toolbox are done in the log
domain, and use the logsumexp trick to compute likelihood given some
data:

a = np.array([-1000., -1001.])
np.log(np.sum(np.exp(a))) -> -inf
-1000 + np.log(np.sum([1 + np.exp(-1)])) -> correct result

Where you use log(exp(x) + exp(y)) = x + log(1 + exp(y-x)). It is
useful when x and y are in the same range but far from 0, which happens
a lot practically in many machine learning algorithms (EM, SVM, etc...
everywhere you need to compute the likelihood of densities from the
exponential family, which covers most practical cases of parametric
estimation).
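Spelled out as code, the shift trick described above looks like this (the
helper name logsumexp is chosen here just for illustration):

import numpy as np

def logsumexp(x):
    # log(sum(exp(x))) = m + log(sum(exp(x - m))) with m = max(x), so the
    # largest term becomes exp(0) = 1 and nothing underflows to zero.
    m = np.max(x)
    if np.isneginf(m):               # all terms are exp(-inf) == 0
        return m
    return m + np.log(np.sum(np.exp(x - m)))

a = np.array([-1000., -1001.])
print(np.log(np.sum(np.exp(a))))     # -inf: the naive evaluation underflows
print(logsumexp(a))                  # about -999.6867, the finite answer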


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 1:26 AM, T J <[EMAIL PROTECTED]> wrote:

> Hi,
>
> For precision reasons, I almost always need to work with arrays whose
> elements are log values.  My thought was that it would be really neat
> to have a 'logarray' class implemented in C or as a subclass of the
> standard array class.  Here is a sample of how I'd like to work with
> these objects:
>

Floating point numbers are essentially logs to base 2, i.e., integer
exponent and mantissa between 1 and 2. What does using the log buy you?

Chuck


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread T J
On Thu, May 8, 2008 at 12:26 AM, T J <[EMAIL PROTECTED]> wrote:
>
>  >>> x = array([-2,-2,-3], base=2)
>  >>> y = array([-1,-2,-inf], base=2)
>  >>> z = x + y
>  >>> z
>  array([-0.415037499279, -1.0, -3])
>  >>> z = x * y
>  >>> z
>  array([-3, -4, -inf])
>  >>> z[:2].sum()
>  -2.41503749928
>

Whoops

s/array/logarray/


[Numpy-discussion] Log Arrays

2008-05-08 Thread T J
Hi,

For precision reasons, I almost always need to work with arrays whose
elements are log values.  My thought was that it would be really neat
to have a 'logarray' class implemented in C or as a subclass of the
standard array class.  Here is a sample of how I'd like to work with
these objects:

>>> x = array([-2,-2,-3], base=2)
>>> y = array([-1,-2,-inf], base=2)
>>> z = x + y
>>> z
array([-0.415037499279, -1.0, -3])
>>> z = x * y
>>> z
array([-3, -4, -inf])
>>> z[:2].sum()
-2.41503749928

This would do a lot for the code I write, and some of the numerical
stability issues would be handled more appropriately.  For example, the
logsum function is frequently handled as:

 log(x + y) == log(x) + log(1 + exp(log(y) - log(x)))

when log(x) > log(y).  So the goal of the above function is to return
log(x + y) using only the logarithms of x and y.


Does this sound like a good idea?
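For concreteness, the log-space addition in the example above can be written
against plain ndarrays as below; since logarray is only a proposal, the helper
name logadd2 and the base-2 convention are just an illustrative sketch:

import numpy as np

def logadd2(x, y):
    # Elementwise log2(2**x + 2**y), shifted by the larger argument so that
    # very negative log-values do not underflow when exponentiated.
    m = np.maximum(x, y)
    d = -np.abs(x - y)
    out = m + np.log2(1.0 + np.exp2(d))
    return np.where(np.isneginf(m), m, out)   # both -inf stays -inf, not nan

x = np.array([-2., -2., -3.])
y = np.array([-1., -2., -np.inf])
print(logadd2(x, y))   # ~[-0.41503750, -1.0, -3.0], matching z = x + y above
print(x + y)           # log-space multiply is plain addition: [-3., -4., -inf]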