Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-12-17 Thread Robert Haas
On Thu, Dec 17, 2015 at 10:38 AM, Dean Rasheed  wrote:
> On 10 December 2015 at 20:02, Tom Lane  wrote:
>> Robert Haas  writes:
>>> It seems to be a loss of 4 digits in every case I've seen.
>>
>> I wouldn't have a problem with, say, throwing in an extra DEC_DIGITS worth
>> of rscale in each of these functions so that the discrepancies tend to
>> favor more significant digits out, rather than fewer.  I don't know that
>> it's worth trying to guarantee that the result is never fewer digits than
>> before, and I certainly wouldn't want to make the rules a lot more complex
>> than what's there now.  But perhaps we could cover most cases easily.
>>
>
> Looking at this, it appears that those extra digits of precision for
> log(0.5) in the old code are an anomaly that only occurs for a certain
> range of inputs. According to the code comments these functions
> intentionally output at least around 16 significant digits (or more if
> the input has greater precision), so that they output at least the
> precision of floating point. For example, in both 9.5 and HEAD:
>
> select exp(5::numeric);
>         exp
> --------------------
>  148.41315910257660
>
> select exp(0.5::numeric);
>         exp
> --------------------
>  1.6487212707001281
>
> select ln(5::numeric);
>          ln
> --------------------
>  1.6094379124341004
>
> select ln(0.5::numeric);
>           ln
> ---------------------
>  -0.6931471805599453
>
> select power(0.5::numeric, 0.4::numeric);
>        power
> --------------------
>  0.7578582832551990
>
>
> However, the old log() code would occasionally output 4 more digits
> than that, due to its mis-estimation of the result weight, which was
> used to determine the output scale. So, for example, in 9.5:
>
> select log(0.0005::numeric);
>          log
> ---------------------
>  -3.3010299956639812
>
> select log(0.005::numeric);
>          log
> ---------------------
>  -2.3010299956639812
>
> select log(0.05::numeric);
>            log
> -------------------------
>  -1.30102999566398119521
>
> select log(0.5::numeric);
>            log
> -------------------------
>  -0.30102999566398119521
>
> select log(5::numeric);
>            log
> ------------------------
>  0.69897000433601880479
>
> select log(50::numeric);
>         log
> --------------------
>  1.6989700043360188
>
> select log(500::numeric);
>         log
> --------------------
>  2.6989700043360188
>
> i.e., for a certain range of inputs the result precision jumps from 16
> to 20 digits after the decimal point, whereas in HEAD the precision of
> the results is more consistent across the range:
>
> select log(0.0005::numeric);
>          log
> ---------------------
>  -3.3010299956639812
>
> select log(0.005::numeric);
>          log
> ---------------------
>  -2.3010299956639812
>
> select log(0.05::numeric);
>          log
> ---------------------
>  -1.3010299956639812
>
> select log(0.5::numeric);
>          log
> ---------------------
>  -0.3010299956639812
>
> select log(5::numeric);
>         log
> --------------------
>  0.6989700043360188
>
> select log(50::numeric);
>         log
> --------------------
>  1.6989700043360188
>
> select log(500::numeric);
>         log
> --------------------
>  2.6989700043360188
>
>
> With other inputs, the actual number of significant digits can vary
> between 16 and 17, but it's generally better behaved than the old
> code, even though it sometimes produces fewer digits. I think it ought
> to be sufficient to release note that the number of digits returned by
> these functions may have changed, in addition to the results being
> more accurate.

Thanks for the additional explanation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-12-17 Thread Dean Rasheed
On 10 December 2015 at 20:02, Tom Lane  wrote:
> Robert Haas  writes:
>> It seems to be a loss of 4 digits in every case I've seen.
>
> I wouldn't have a problem with, say, throwing in an extra DEC_DIGITS worth
> of rscale in each of these functions so that the discrepancies tend to
> favor more significant digits out, rather than fewer.  I don't know that
> it's worth trying to guarantee that the result is never fewer digits than
> before, and I certainly wouldn't want to make the rules a lot more complex
> than what's there now.  But perhaps we could cover most cases easily.
>

Looking at this, it appears that those extra digits of precision for
log(0.5) in the old code are an anomaly that only occurs for a certain
range of inputs. According to the code comments these functions
intentionally output at least around 16 significant digits (or more if
the input has greater precision), so that they output at least the
precision of floating point. For example, in both 9.5 and HEAD:

select exp(5::numeric);
        exp
--------------------
 148.41315910257660

select exp(0.5::numeric);
        exp
--------------------
 1.6487212707001281

select ln(5::numeric);
         ln
--------------------
 1.6094379124341004

select ln(0.5::numeric);
          ln
---------------------
 -0.6931471805599453

select power(0.5::numeric, 0.4::numeric);
       power
--------------------
 0.7578582832551990
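
As a side note, 16-17 significant decimal digits is roughly what it
takes to cover double precision: on IEEE-754 platforms DBL_DIG is 15
and DBL_DECIMAL_DIG is 17, which an illustrative C11 one-liner (my
check, nothing to do with the patch) confirms:

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* 15 digits always survive a round trip through a double; 17 are
     * needed to reproduce any particular double exactly. */
    printf("DBL_DIG = %d, DBL_DECIMAL_DIG = %d\n",
           DBL_DIG, DBL_DECIMAL_DIG);
    return 0;
}

So ~16 significant digits keeps these numeric functions at least as
precise as their float8 counterparts.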


However, the old log() code would occasionally output 4 more digits
than that, due to its mis-estimation of the result weight, which was
used to determine the output scale. So, for example, in 9.5:

select log(0.0005::numeric);
         log
---------------------
 -3.3010299956639812

select log(0.005::numeric);
         log
---------------------
 -2.3010299956639812

select log(0.05::numeric);
           log
-------------------------
 -1.30102999566398119521

select log(0.5::numeric);
           log
-------------------------
 -0.30102999566398119521

select log(5::numeric);
           log
------------------------
 0.69897000433601880479

select log(50::numeric);
        log
--------------------
 1.6989700043360188

select log(500::numeric);
        log
--------------------
 2.6989700043360188

i.e., for a certain range of inputs the result precision jumps from 16
to 20 digits after the decimal point, whereas in HEAD the precision of
the results is more consistent across the range:

select log(0.0005::numeric);
         log
---------------------
 -3.3010299956639812

select log(0.005::numeric);
         log
---------------------
 -2.3010299956639812

select log(0.05::numeric);
         log
---------------------
 -1.3010299956639812

select log(0.5::numeric);
         log
---------------------
 -0.3010299956639812

select log(5::numeric);
        log
--------------------
 0.6989700043360188

select log(50::numeric);
        log
--------------------
 1.6989700043360188

select log(500::numeric);
        log
--------------------
 2.6989700043360188


With other inputs, the actual number of significant digits can vary
between 16 and 17, but it's generally better behaved than the old
code, even though it sometimes produces fewer digits. I think it ought
to be sufficient to release note that the number of digits returned by
these functions may have changed, in addition to the results being
more accurate.
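
For anyone wondering where the 16 comes from, the idea is to pick the
result scale as a minimum significant-digit count minus the estimated
decimal weight of the result, clamped to the display-scale limits.
Here's a rough paraphrase of that approach in plain C doubles and ints
(illustrative only, not the actual numeric.c code):

#include <math.h>
#include <stdio.h>

#define MIN_SIG_DIGITS    16
#define MAX_DISPLAY_SCALE 1000

static int choose_rscale(double approx_result, int input_dscale)
{
    int dweight = (int) log10(fabs(approx_result)); /* truncates toward zero */
    int rscale = MIN_SIG_DIGITS - dweight;

    if (rscale < input_dscale)
        rscale = input_dscale;
    if (rscale < 0)
        rscale = 0;
    if (rscale > MAX_DISPLAY_SCALE)
        rscale = MAX_DISPLAY_SCALE;
    return rscale;
}

int main(void)
{
    /* log(0.5) ~ -0.30103 and log(500) ~ 2.69897 both get rscale 16,
     * which is where the 16-17 significant digits above come from. */
    printf("%d %d\n", choose_rscale(-0.30103, 1), choose_rscale(2.69897, 0));
    return 0;
}

The weight estimation and clamping in numeric.c differ in detail, but
that is the gist of it.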

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-12-10 Thread Dean Rasheed
On 10 December 2015 at 20:02, Tom Lane  wrote:
>> It seems to be a loss of 4 digits in every case I've seen.
>
> I wouldn't have a problem with, say, throwing in an extra DEC_DIGITS worth
> of rscale in each of these functions so that the discrepancies tend to
> favor more significant digits out, rather than fewer.  I don't know that
> it's worth trying to guarantee that the result is never fewer digits than
> before, and I certainly wouldn't want to make the rules a lot more complex
> than what's there now.  But perhaps we could cover most cases easily.
>
> Dean, do you want to recheck the patch with an eye to that?
>

OK, I'll take a look at it.

Regards,
Dean





Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-12-10 Thread Tom Lane
Robert Haas  writes:
> On Thu, Dec 10, 2015 at 2:47 PM, Tom Lane  wrote:
>> That's on me as author of the commit message, I guess.  The rscale
>> in most of these functions is exactly the number of fraction digits
>> that will be emitted, and we changed the rules for computing it.
>> Not by much, in most cases.  I don't think we should be too worried
>> about being bug-compatible with the old behavior.

> It seems to be a loss of 4 digits in every case I've seen.

I wouldn't have a problem with, say, throwing in an extra DEC_DIGITS worth
of rscale in each of these functions so that the discrepancies tend to
favor more significant digits out, rather than fewer.  I don't know that
it's worth trying to guarantee that the result is never fewer digits than
before, and I certainly wouldn't want to make the rules a lot more complex
than what's there now.  But perhaps we could cover most cases easily.

Dean, do you want to recheck the patch with an eye to that?

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-12-10 Thread Robert Haas
On Thu, Dec 10, 2015 at 2:47 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> This patch, or something nearby, seems to have changed the number of
>> significant figures produced by log() and maybe some of the other
>> functions this patch touched.
>
> Yeah, not surprising.
>
>> It's certainly not obvious from the commit message that this change
>> was expected.
>
> That's on me as author of the commit message, I guess.  The rscale
> in most of these functions is exactly the number of fraction digits
> that will be emitted, and we changed the rules for computing it.
> Not by much, in most cases.  I don't think we should be too worried
> about being bug-compatible with the old behavior.

It seems to be a loss of 4 digits in every case I've seen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-12-10 Thread Tom Lane
Robert Haas  writes:
> This patch, or something nearby, seems to have changed the number of
> significant figures produced by log() and maybe some of the other
> functions this patch touched.

Yeah, not surprising.

> It's certainly not obvious from the commit message that this change
> was expected.

That's on me as author of the commit message, I guess.  The rscale
in most of these functions is exactly the number of fraction digits
that will be emitted, and we changed the rules for computing it.
Not by much, in most cases.  I don't think we should be too worried
about being bug-compatible with the old behavior.

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-12-10 Thread Robert Haas
On Thu, Dec 10, 2015 at 2:25 PM, Simon Riggs  wrote:
> On 10 December 2015 at 18:59, Robert Haas  wrote:
>> Why did we make the change?  I'm not sure it's bad, but
>> it seems funny to whack a user-visible behavior around like this
>> without a clearly-explained reason.
>
> Surely the title of the post explains?

Nope.  I'm not asking why we fixed the inaccurate results.  I'm asking
why we now produce fewer output digits than before.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-12-10 Thread Simon Riggs
On 10 December 2015 at 18:59, Robert Haas  wrote:
> Why did we make the change?  I'm not sure it's bad, but
> it seems funny to whack a user-visible behavior around like this
> without a clearly-explained reason.
>

Surely the title of the post explains?

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-12-10 Thread Robert Haas
On Sun, Nov 15, 2015 at 4:40 AM, Dean Rasheed  wrote:
> On 14 November 2015 at 20:05, Tom Lane  wrote:
>> I committed this with that change and some other
>> mostly-cosmetic revisions.
>
> Thanks.

This patch, or something nearby, seems to have changed the number of
significant figures produced by log() and maybe some of the other
functions this patch touched.  On master:

rhaas=# select log(.5);
         log
---------------------
 -0.3010299956639812
(1 row)

But on REL9_5_STABLE:

rhaas=# select log(.5);
           log
-------------------------
 -0.30102999566398119521
(1 row)

It's certainly not obvious from the commit message that this change
was expected.  There is something about setting the rscales for
intermediate results, but there isn't any indication that the number
of digits in the final result should be expected to differ.  Was that
intentional?  Why did we make the change?  I'm not sure it's bad, but
it seems funny to whack a user-visible behavior around like this
without a clearly-explained reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-15 Thread Dean Rasheed
On 14 November 2015 at 20:05, Tom Lane  wrote:
> I committed this with that change and some other
> mostly-cosmetic revisions.

Thanks.


>  Notable non-cosmetic changes:
>
> * I forced all the computed rscales to be at least 0, via code like
> local_rscale = Max(local_rscale, NUMERIC_MIN_DISPLAY_SCALE);
>
> You had done that in some places but not all, with the result that
> functions like mul_var could be invoked with negative rscale and hence
> produce outputs with negative dscale.  This is not allowed according to
> the comments in the code, and while it might accidentally fail to fail,
> I do not trust the code to operate properly in that regime.  It might be
> worth revisiting the whole question of negative dscale with an eye to not
> calculating digits we don't need, but I think that's material for a
> completely separate patch.
>

Hmm, I wondered about that when deciding on the rscale for the sqrt()s
in the range reduction code for ln_var(). For very large inputs,
forcing the rscale to be non-negative can cause the sqrt() to compute
far more digits than are needed, but I take your point that it might
be playing with fire if the underlying functions aren't sure to handle
negative rscales properly.

In most normal cases it makes little difference to performance. For
ln(9.9e99) it's only around 5% slower than my original code, and for
ln(9.9e999) it is around 5 times slower, but still pretty fast (520
microseconds vs 128). Once you get to cases like ln(2.0^435411),
however, it is pretty bad (13 seconds vs 165ms). But that is a very
contrived worst-case example, and actually that performance is no
different than it was before the patch. I very much doubt anyone will
ever do such a computation, except out of academic interest.


> * I moved some of the new regression test cases into numeric_big.sql.
> While they don't add more than a couple hundred msec on my dev machine,
> they're probably going to cost a lot more than that on the slower
> buildfarm machines, and I'm not sure that they add enough benefit to be
> worth running all the time.  This code really shouldn't suffer from many
> portability issues.
>
> (I am going to go run numeric_big manually on all my own buildfarm
> critters just to be sure, though.)
>
> regards, tom lane

OK, that seems fair enough.

Thanks.

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-14 Thread Tom Lane
Dean Rasheed  writes:
> Yeah, that makes sense. Thinking about it some more, its potential
> benefit becomes even less apparent at higher rscales because the cost
> of ln() relative to sqrt() will become larger -- the number of Taylor
> series terms required will grow roughly in proportion to the number of
> digits N, whereas the number of iterations sqrt() needs to do only
> grows like log(N) I think, because of its quadratic convergence. So
> let's get rid of it.

OK, done that way.  I committed this with that change and some other
mostly-cosmetic revisions.  Notable non-cosmetic changes:

* I forced all the computed rscales to be at least 0, via code like
local_rscale = Max(local_rscale, NUMERIC_MIN_DISPLAY_SCALE);

You had done that in some places but not all, with the result that
functions like mul_var could be invoked with negative rscale and hence
produce outputs with negative dscale.  This is not allowed according to
the comments in the code, and while it might accidentally fail to fail,
I do not trust the code to operate properly in that regime.  It might be
worth revisiting the whole question of negative dscale with an eye to not
calculating digits we don't need, but I think that's material for a
completely separate patch.

* I moved some of the new regression test cases into numeric_big.sql.
While they don't add more than a couple hundred msec on my dev machine,
they're probably going to cost a lot more than that on the slower
buildfarm machines, and I'm not sure that they add enough benefit to be
worth running all the time.  This code really shouldn't suffer from many
portability issues.

(I am going to go run numeric_big manually on all my own buildfarm
critters just to be sure, though.)

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-14 Thread Dean Rasheed
On 14 November 2015 at 16:13, Tom Lane  wrote:
>> We might well want to keep the power-10 argument reduction step, but
>> it would now be there purely on performance grounds so there's scope
>> for playing around with the threshold at which it kicks in.
>
> My inclination is to get rid of it.  I thought having ln_var recurse was
> rather ugly, and I'm suspicious of whether there's really any performance
> benefit.  Even with the pow_10 reduction, you can have up to 7 sqrt steps
> (if the first digit is 9999, or indeed anything above 445), and will
> almost always have 4 or 5.  So if the pow_10 reduction costs about as much
> as 7 sqrt steps, the input has to exceed something like 1e170 before it
> can hope to win.  Admittedly there's daylight between there and 1e128000,
> but I doubt it's worth carrying extra code for.
>

Yeah, that makes sense. Thinking about it some more, its potential
benefit becomes even less apparent at higher rscales because the cost
of ln() relative to sqrt() will become larger -- the number of Taylor
series terms required will grow roughly in proportion to the number of
digits N, whereas the number of iterations sqrt() needs to do only
grows like log(N) I think, because of its quadratic convergence. So
let's get rid of it.
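
(For anyone who wants to see that quadratic convergence concretely,
here's a toy illustration using plain doubles rather than NumericVar,
nothing to do with the patch itself:)

#include <math.h>
#include <stdio.h>

int main(void)
{
    double target = 2.0;
    double x = 1.0;                     /* initial guess for sqrt(2) */

    for (int i = 1; i <= 5; i++)
    {
        x = 0.5 * (x + target / x);     /* Babylonian iteration */
        printf("step %d: error = %.3e\n", i, fabs(x - sqrt(target)));
    }
    return 0;
}

The error drops roughly as 1e-1, 1e-3, 1e-6, 1e-12 and then hits the
double-precision floor, i.e. the number of correct digits doubles per
step, whereas each extra Taylor term for ln() only buys a roughly
fixed number of digits.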

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-14 Thread Tom Lane
Dean Rasheed  writes:
> I'm much less happy with the sqrt() range reduction step though. I now
> realise that the whole "increment local_rscale before each sqrt()"
> approach is totally bogus.

Yeah, I was wondering about that yesterday ...

> So repeated use of sqrt() can be used for range reduction, even in
> extreme cases, and it won't lose precision if it's done right. In
> fact, in the worst case there are only 22 sqrts() by my count.

Cool.

> We might well want to keep the power-10 argument reduction step, but
> it would now be there purely on performance grounds so there's scope
> for playing around with the threshold at which it kicks in.

My inclination is to get rid of it.  I thought having ln_var recurse was
rather ugly, and I'm suspicious of whether there's really any performance
benefit.  Even with the pow_10 reduction, you can have up to 7 sqrt steps
(if the first digit is 9999, or indeed anything above 445), and will
almost always have 4 or 5.  So if the pow_10 reduction costs about as much
as 7 sqrt steps, the input has to exceed something like 1e170 before it
can hope to win.  Admittedly there's daylight between there and 1e128000,
but I doubt it's worth carrying extra code for.
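
(A throwaway double-precision check of those step counts, just
counting how many sqrt() calls it takes to fall below the 1.1
range-reduction bound -- plain doubles, not the numeric code:)

#include <math.h>
#include <stdio.h>

static int sqrt_steps(double x)
{
    int n = 0;

    while (x >= 1.1)            /* range-reduction target is 0.9 < x < 1.1 */
    {
        x = sqrt(x);
        n++;
    }
    return n;
}

int main(void)
{
    /* prints "6 7 7": anything above ~445 needs the full 7 sqrt steps */
    printf("%d %d %d\n",
           sqrt_steps(445.0), sqrt_steps(446.0), sqrt_steps(9999.0));
    return 0;
}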

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-14 Thread Dean Rasheed
On 13 November 2015 at 21:34, Dean Rasheed  wrote:
> On 13 November 2015 at 18:36, Tom Lane  wrote:
>> Next question: in the additional range-reduction step you added to ln_var,
>> why stop there, ie, what's the rationale for this magic number:
>>
>> if (Abs((x.weight + 1) * DEC_DIGITS) > 10)
>>
>> Seems like we arguably should do this whenever the weight isn't zero,
>> so as to minimize the number of sqrt() steps.  (Yes, I see the point
>> about not getting into infinite recursion, but that only says that
>> the threshold needs to be more than 10, not that it needs to be 10^10.)
>>
>
> It's a bit arbitrary. There is a tradeoff here -- computing ln(10) is
> more expensive than doing a sqrt() since the Babylonian algorithm used
> for sqrt() is quadratically convergent, whereas the Taylor series for
> ln() converges more slowly. At the default precision, ln(10) is around
> 7 times slower than sqrt() on my machine, although that will vary with
> precision, and the sqrt()s increase the local rscale and so they will
> get slower. Anyway, it seemed reasonable to not do the extra ln()
> unless it was going to save at least a couple of sqrt()s.
>
>
>> Also, it seems a little odd to put the recursive calculation of ln(10)
>> where you did, rather than down where it's used, ie why not
>>
>> mul_var(result, &fact, result, local_rscale);
>>
>> ln_var(&const_ten, &ln_10, local_rscale);
>> int64_to_numericvar((int64) pow_10, &ni);
>> mul_var(&ln_10, &ni, &xx, local_rscale);
>> add_var(result, &xx, result);
>>
>> round_var(result, rscale);
>>
>> As you have it, ln_10 will be calculated with possibly a smaller rscale
>> than is used in this stanza.  That might be all right but it seems dubious
>> --- couldn't the lower-precision result leak into digits we care about?
>>
>
> Well it still has 8 digits more than the final rscale, so it's
> unlikely to cause any loss of precision when added to the result

I thought I'd try an extreme test of that claim, so I tried the
following example:

select ln(2.0^435411);
            ln
--------------------------
 301803.9070347863471187

The input to ln() in this case is truly huge (possibly larger than we
ought to allow), and results in pow_10 = 131068 in ln_var(). So we
compute ln(10) with 8 extra digits of precision, multiply that by
131068, effectively shifting it up by 6 decimal digits, leaving a
safety margin of 2 extra digits at the lower end, before we add that
to the result. And the above result is indeed correct according to bc
(just don't try to compute it using l(2^435411), or you'll be waiting
a long time :-).
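
(A quick double-precision cross-check of the leading digits is easy,
though, since ln(2^435411) = 435411 * ln(2):)

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Only validates the integer part and the first few fraction digits,
     * since a double carries ~16 significant digits. */
    printf("%.4f\n", 435411 * log(2.0));    /* ~301803.9070 */
    return 0;
}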

That leaves me feeling pretty happy about that aspect of the
computation because in all realistic cases pow_10 ought to fall way
short of that, so there's a considerable margin of safety.

I'm much less happy with the sqrt() range reduction step though. I now
realise that the whole "increment local_rscale before each sqrt()"
approach is totally bogus. Like elsewhere in this patch, what we ought
to be doing is ensuring that the number of significant digits remains
fixed as the weight of x is reduced. Given the slow rate of increase
of logarithms as illustrated above, we only need to keep rscale + a
few significant digits during the sqrt() range reduction. I tried the
following:

/*
 * Use sqrt() to reduce the input to the range 0.9 < x < 1.1.
 *
 * The final logarithm will have up to around rscale+6 significant digits.
 * Each sqrt() will roughly halve the weight of x, so adjust the local
 * rscale as we work so that we keep this many significant digits at each
 * step (plus a few more for good measure).
 */
while (cmp_var(&x, &const_zero_point_nine) <= 0)
{
sqrt_rscale = rscale - x.weight * DEC_DIGITS / 2 + 8;
sqrt_var(&x, &x, sqrt_rscale);
mul_var(&fact, &const_two, &fact, 0);
}
while (cmp_var(&x, &const_one_point_one) >= 0)
{
sqrt_rscale = rscale - x.weight * DEC_DIGITS / 2 + 8;
sqrt_var(&x, &x, sqrt_rscale);
mul_var(&fact, &const_two, &fact, 0);
}

and it passed all the tests (including the extreme case above) with
the power-10 range reduction step disabled.

So repeated use of sqrt() can be used for range reduction, even in
extreme cases, and it won't lose precision if it's done right. In
fact, in the worst case there are only 22 sqrts() by my count.

Note that this also reduces the rscale used in the Taylor series,
since local_rscale is no longer being increased above its original
value of rscale+8. I think that's OK for largely the same reasons --
the result of the Taylor series is multiplied by a factor of at most
2^22 (and usually much less than that), so the 8 extra digits ought to
be sufficient, although that could be upped a bit if you really wanted
to be sure.
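
(Back-of-envelope check on that claim, mine rather than anything in
the patch: multiplying by fact <= 2^22 can inflate the absolute error
of the Taylor-series result by at most log10(2^22) decimal digits,
which a one-liner confirms is about 6.6, so 8 guard digits leave a bit
over one digit of headroom even in the worst case:)

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* digits of error growth when the Taylor result is scaled by 2^22 */
    printf("%.2f\n", 22 * log10(2.0));      /* ~6.62 */
    return 0;
}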

We might well want to keep the power-10 argument reduction step, but
it would now be there purely on performance grounds so there's scope
for playing around with the threshold at which it kicks in.

Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-13 Thread Tom Lane
Dean Rasheed  writes:
> On 14 November 2015 at 00:16, Dean Rasheed  wrote:
>> I can't see a way to make that work reliably. It would need to be
>> 10^estimate_ln_weight(base) and the problem is that both exp and
>> ln_weight could be too big to fit in double variables, and become
>> HUGE_VAL, losing all precision.

> Of course I meant 10^ln_weight could be too big to fit in a double.

Drat, that's the second time in this review that I've confused
ln_weight(x) with ln(x).  Time to call it a day ...

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-13 Thread Dean Rasheed
On 14 November 2015 at 00:16, Dean Rasheed  wrote:
> I can't see a way to make that work reliably. It would need to be
> 10^estimate_ln_weight(base) and the problem is that both exp and
> ln_weight could be too big to fit in double variables, and become
> HUGE_VAL, losing all precision.

Of course I meant 10^ln_weight could be too big to fit in a double.

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-13 Thread Dean Rasheed
On 13 November 2015 at 23:10, Tom Lane  wrote:
> One more thing: the approach you used in power_var() of doing a whole
> separate exp * ln(base) calculation to approximate the result weight
> seems mighty expensive, even if it is done at minimal precision.
> Couldn't we get a good-enough approximation using basically
> numericvar_to_double_no_overflow(exp) * estimate_ln_weight(base) ?
>

I can't see a way to make that work reliably. It would need to be
10^estimate_ln_weight(base) and the problem is that both exp and
ln_weight could be too big to fit in double variables, and become
HUGE_VAL, losing all precision.

An interesting example is the limit of (1+1/x)^x as x approaches
infinity, which is e (the base of natural logarithms). In that case
both the exponent and ln_weight can be arbitrarily big (well, too big
for doubles anyway). For example, (1+1/1.2e+500)^(1.2e500) =
2.7182818284...
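
(The same effect is easy to see with plain doubles, at a scale a
double can actually hold -- 1.2e500 itself overflows straight to
HUGE_VAL, which is exactly the problem; illustration only:)

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 1.0e8;

    /* approaches e = 2.718281828... as x grows */
    printf("%.8f\n", pow(1.0 + 1.0 / x, x));    /* ~2.71828181 */

    /* the exponent from the example above doesn't fit in a double at all */
    printf("%g\n", 1.2 * pow(10.0, 500.0));     /* prints inf */
    return 0;
}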

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-13 Thread Tom Lane
Dean Rasheed  writes:
> On 13 November 2015 at 18:36, Tom Lane  wrote:
>> Seems like we arguably should do this whenever the weight isn't zero,
>> so as to minimize the number of sqrt() steps.

> It's a bit arbitrary. There is a tradeoff here -- computing ln(10) is
> more expensive than doing a sqrt() since the Babylonian algorithm used
> for sqrt() is quadratically convergent, whereas the Taylor series for
> ln() converges more slowly. At the default precision, ln(10) is around
> 7 times slower than sqrt() on my machine, although that will vary with
> precision, and the sqrt()s increase the local rscale and so they will
> get slower. Anyway, it seemed reasonable to not do the extra ln()
> unless it was going to save at least a couple of sqrt()s.

OK --- I think I miscounted how many sqrt's we could expect to save.

One more thing: the approach you used in power_var() of doing a whole
separate exp * ln(base) calculation to approximate the result weight
seems mighty expensive, even if it is done at minimal precision.
Couldn't we get a good-enough approximation using basically
numericvar_to_double_no_overflow(exp) * estimate_ln_weight(base) ?

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-13 Thread Dean Rasheed
On 13 November 2015 at 21:00, Tom Lane  wrote:
> BTW, something I find confusing and error-prone is that this patch keeps
> on using the term "weight" to refer to numbers expressed in decimal digits
> (ie, the approximate log10 of something).  Basically everywhere in the
> existing code, "weights" are measured in base-NBASE digits, while "scales"
> are measured in decimal digits.  I've not yet come across anyplace where
> you got the units wrong, but it seems like a gotcha waiting to bite the
> next hacker.
>
> I thought for a bit about s/weight/scale/g in the patch, but that is not
> le mot juste either, since weight is generally counting digits to the left
> of the decimal point while scale is generally counting digits to the
> right.
>
> The best idea that has come to me is to use "dweight" to refer to a weight
> measured in decimal digits.  Anyone have a better thought?
>

Yeah, dweight works for me.

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-13 Thread Dean Rasheed
On 13 November 2015 at 18:36, Tom Lane  wrote:
> Next question: in the additional range-reduction step you added to ln_var,
> why stop there, ie, what's the rationale for this magic number:
>
> if (Abs((x.weight + 1) * DEC_DIGITS) > 10)
>
> Seems like we arguably should do this whenever the weight isn't zero,
> so as to minimize the number of sqrt() steps.  (Yes, I see the point
> about not getting into infinite recursion, but that only says that
> the threshold needs to be more than 10, not that it needs to be 10^10.)
>

It's a bit arbitrary. There is a tradeoff here -- computing ln(10) is
more expensive than doing a sqrt() since the Babylonian algorithm used
for sqrt() is quadratically convergent, whereas the Taylor series for
ln() converges more slowly. At the default precision, ln(10) is around
7 times slower than sqrt() on my machine, although that will vary with
precision, and the sqrt()s increase the local rscale and so they will
get slower. Anyway, it seemed reasonable to not do the extra ln()
unless it was going to save at least a couple of sqrt()s.


> Also, it seems a little odd to put the recursive calculation of ln(10)
> where you did, rather than down where it's used, ie why not
>
> mul_var(result, &fact, result, local_rscale);
>
> ln_var(&const_ten, &ln_10, local_rscale);
> int64_to_numericvar((int64) pow_10, &ni);
> mul_var(&ln_10, &ni, &xx, local_rscale);
> add_var(result, &xx, result);
>
> round_var(result, rscale);
>
> As you have it, ln_10 will be calculated with possibly a smaller rscale
> than is used in this stanza.  That might be all right but it seems dubious
> --- couldn't the lower-precision result leak into digits we care about?
>

Well it still has 8 digits more than the final rscale, so it's
unlikely to cause any loss of precision when added to the result and
then rounded to rscale. But on the other hand it does look neater to
compute it there, right where it's needed.

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-13 Thread Tom Lane
BTW, something I find confusing and error-prone is that this patch keeps
on using the term "weight" to refer to numbers expressed in decimal digits
(ie, the approximate log10 of something).  Basically everywhere in the
existing code, "weights" are measured in base-NBASE digits, while "scales"
are measured in decimal digits.  I've not yet come across anyplace where
you got the units wrong, but it seems like a gotcha waiting to bite the
next hacker.

I thought for a bit about s/weight/scale/g in the patch, but that is not
le mot juste either, since weight is generally counting digits to the left
of the decimal point while scale is generally counting digits to the
right.

The best idea that has come to me is to use "dweight" to refer to a weight
measured in decimal digits.  Anyone have a better thought?

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-13 Thread Tom Lane
Dean Rasheed  writes:
> On 12 November 2015 at 21:01, Tom Lane  wrote:
>> I started to look at this patch, and was immediately bemused by the
>> comment in estimate_ln_weight:

> That's nonsense. The comment is perfectly correct. It's not saying the
> logarithm is negative, it's saying that the *weight* of the logarithm
> is negative.

Ah, you're right --- I'd gotten confused about the distinction between
ln(x) and ln(ln(x)).  Nevermind ...

Next question: in the additional range-reduction step you added to ln_var,
why stop there, ie, what's the rationale for this magic number:

if (Abs((x.weight + 1) * DEC_DIGITS) > 10)

Seems like we arguably should do this whenever the weight isn't zero,
so as to minimize the number of sqrt() steps.  (Yes, I see the point
about not getting into infinite recursion, but that only says that
the threshold needs to be more than 10, not that it needs to be 10^10.)

Also, it seems a little odd to put the recursive calculation of ln(10)
where you did, rather than down where it's used, ie why not

mul_var(result, &fact, result, local_rscale);

ln_var(&const_ten, &ln_10, local_rscale);
int64_to_numericvar((int64) pow_10, &ni);
mul_var(&ln_10, &ni, &xx, local_rscale);
add_var(result, &xx, result);

round_var(result, rscale);

As you have it, ln_10 will be calculated with possibly a smaller rscale
than is used in this stanza.  That might be all right but it seems dubious
--- couldn't the lower-precision result leak into digits we care about?

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-13 Thread Dean Rasheed
On 12 November 2015 at 21:01, Tom Lane  wrote:
> Dean Rasheed  writes:
>> These results are based on the attached, updated patch which includes
>> a few minor improvements.
>
> I started to look at this patch, and was immediately bemused by the
> comment in estimate_ln_weight:
>
> /*
>  * 0.9 <= var <= 1.1
>  *
>  * Its logarithm has a negative weight (possibly very large). Estimate
>  * it using ln(var) = ln(1+x) = x + O(x^2) ~= x.
>  */
>
> This comment's nonsense of course: ln(0.9) is about -0.105, and ln(1.1) is
> about 0.095, which is not even negative much less "very large".

That's nonsense. The comment is perfectly correct. It's not saying the
logarithm is negative, it's saying that the *weight* of the logarithm
is negative. For the examples you cite, the weight is small and
negative (around -1), but for inputs closer to 1 it can be large and
negative.


>  We could
> just replace that with "For values reasonably close to 1, we can estimate
> the result using ln(var) = ln(1+x) ~= x."  I am wondering what's the point
> though: why not flush this entire branch in favor of always using the
> generic case?  I rather doubt that on any modern machine two uses of
> cmp_var plus init_var/sub_var/free_var are going to be cheaper than the
> log() call you save.  You would need a different way to special-case 1.0,
> but that's not expensive.
>

No, you're missing the point entirely. Suppose var =
1.000000000000000000001 (i.e. 1 + 1e-21), in which case ln(var) is
approximately 1e-21.
The code in the first branch uses the approximation ln(var) = ln(1+x)
~= x = var-1 to see that the weight of ln(var) is around -21. You
can't do that with the code in second branch just by looking at the
first couple of digits of var and doing a floating point
approximation, because all you'd see would be 1.0000... and your
approximation to the logarithm would be orders of magnitude out (as is
the case with the code that this function is replacing, which comes
with no explanatory comments at all).
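
(A two-line double-precision illustration of that, for var = 1 + 1e-21
-- nothing to do with NumericVar, just showing why the naive float
approach sees only 1.0 while looking at x directly recovers the weight:)

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x   = 1e-21;
    double var = 1.0 + x;       /* rounds to exactly 1.0 in a double */

    printf("log(var) = %g\n", log(var));    /* prints 0: the offset is gone */
    printf("log10(x) = %.1f\n", log10(x));  /* ~ -21.0: weight of ln(var) */
    return 0;
}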


> Larger questions are
>
> (1) why the Abs() in the specification of estimate_ln_weight --- that
> doesn't comport with the text about "Estimate the weight of the most
> significant digit".

Yes it does, and the formula is correct. The function returns an
estimate for the weight of the logarithm of var. Let L = ln(var), then
what we are trying to return is the weight of L. So we don't care
about the sign of L, we just need to know it's weight -- i.e., we want
approximately log10(Abs(L)) which is log10(Abs(ln(var))) as the
comment says. The Abs() is necessary because L might be negative (when
0 < var < 1).

>  (The branch I'm proposing you remove fails to
> honor that anyway.)
>

No it doesn't. It honours the Abs() by ignoring the sign of x, and
just looking at its weight.


> (2) what should the behavior be for input == 1, and why?  The code
> is returning zero, but that seems inconsistent or at least
> underdocumented.
>

ln(1) is 0, which has a weight of 0. I suppose you could argue that
technically 0 could have any weight, but looking at the bigger
picture, what this function is for is deciding how many digits of
precision are needed in intermediate calculations to retain full
precision in the final result. When the input is exactly 1, the
callers will have nice exact results, and no extra intermediate
precision is needed, so returning 0 is quite sensible.

I guess adding words to that effect to the comment makes sense, since
it clearly wasn't obvious.


> (3) if it's throwing an error for zero input, why not for negative
> input?  I'm not sure that either behavior is within the purview of
> this function, anyway.  Seems like returning zero might be fine.
>
> regards, tom lane

[Shrug] It doesn't make much difference since both those error cases
will be caught a little later on in the callers, but since this
function needs to test for an empty digit array in the second branch
anyway, it seemed like it might as well report that error there.
Returning zero would be fine too.

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-11-12 Thread Tom Lane
Dean Rasheed  writes:
> These results are based on the attached, updated patch which includes
> a few minor improvements.

I started to look at this patch, and was immediately bemused by the
comment in estimate_ln_weight:

/*
 * 0.9 <= var <= 1.1
 *
 * Its logarithm has a negative weight (possibly very large). Estimate
 * it using ln(var) = ln(1+x) = x + O(x^2) ~= x.
 */

This comment's nonsense of course: ln(0.9) is about -0.105, and ln(1.1) is
about 0.095, which is not even negative much less "very large".  We could
just replace that with "For values reasonably close to 1, we can estimate
the result using ln(var) = ln(1+x) ~= x."  I am wondering what's the point
though: why not flush this entire branch in favor of always using the
generic case?  I rather doubt that on any modern machine two uses of
cmp_var plus init_var/sub_var/free_var are going to be cheaper than the
log() call you save.  You would need a different way to special-case 1.0,
but that's not expensive.

Larger questions are

(1) why the Abs() in the specification of estimate_ln_weight --- that
doesn't comport with the text about "Estimate the weight of the most
significant digit".  (The branch I'm proposing you remove fails to
honor that anyway.)

(2) what should the behavior be for input == 1, and why?  The code
is returning zero, but that seems inconsistent or at least
underdocumented.

(3) if it's throwing an error for zero input, why not for negative
input?  I'm not sure that either behavior is within the purview of
this function, anyway.  Seems like returning zero might be fine.

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-09-16 Thread Dean Rasheed
On 16 September 2015 at 16:14, Dean Rasheed  wrote:
> On 16 September 2015 at 14:49, Merlin Moncure  wrote:
>>> AFAICT, this kind of slowdown only happens in cases like this where a
>>> very large number of digits are being returned.
>>
>> Can you clarify "very large"?
>>
>
> I haven't done much performance testing because I've been mainly
> focussed on accuracy. I just did a quick test of exp() for various
> result sizes. For results up to around 50 digits, the patched code was
> twice as fast as HEAD. After that the gap narrows until at around 250
> digits they become about the same speed, and beyond that the patched
> code is slower. At around 450 digits the patched code is twice as
> slow.
>

OK, scratch that. I managed to compare an optimised build with a debug
build somehow.

Comparing 2 optimised builds, it's still 2x faster at computing 16 or
17 digits, becomes about the same speed at 30 digits, and is 2x slower
at around 60 digits. So not nearly as impressive, but not too bad
either.

I'll try to do some more comprehensive performance testing over the
next few days.

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-09-16 Thread Merlin Moncure
On Wed, Sep 16, 2015 at 10:14 AM, Dean Rasheed  wrote:
> On 16 September 2015 at 14:49, Merlin Moncure  wrote:
>>> AFAICT, this kind of slowdown only happens in cases like this where a
>>> very large number of digits are being returned.
>>
>> Can you clarify "very large"?
>>
>
> I haven't done much performance testing because I've been mainly
> focussed on accuracy. I just did a quick test of exp() for various
> result sizes. For results up to around 50 digits, the patched code was
> twice as fast as HEAD. After that the gap narrows until at around 250
> digits they become about the same speed, and beyond that the patched
> code is slower. At around 450 digits the patched code is twice as
> slow.
>
> My guess is that no one is actually using it for numbers that large.

well, I'm sold :-).

(I certainly agree that a slow accurate answer is better than a fast
inaccurate one, but it's nice to know that reasonable users of these
functions won't be impacted)

merlin




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-09-16 Thread Dean Rasheed
On 16 September 2015 at 14:49, Merlin Moncure  wrote:
>> AFAICT, this kind of slowdown only happens in cases like this where a
>> very large number of digits are being returned.
>
> Can you clarify "very large"?
>

I haven't done much performance testing because I've been mainly
focussed on accuracy. I just did a quick test of exp() for various
result sizes. For results up to around 50 digits, the patched code was
twice as fast as HEAD. After that the gap narrows until at around 250
digits they become about the same speed, and beyond that the patched
code is slower. At around 450 digits the patched code is twice as
slow.

My guess is that no one is actually using it for numbers that large.

Regards,
Dean




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-09-16 Thread Simon Riggs
On 16 September 2015 at 09:32, Tom Lane  wrote:

> Dean Rasheed  writes:
> > ... For example, exp() works for inputs up to 6000. However, if you
> > compute exp(5999.999) the answer is truly huge -- probably only of
> > academic interest to anyone. With HEAD, exp(5999.999) produces a
> > number with 2609 significant digits in just 1.5ms (on my ageing
> > desktop box). However, only the first 9 digits returned are correct.
> > The other 2600 digits are pure noise. With my patch, all 2609 digits
> > are correct (confirmed using bc), but it takes 27ms to compute, making
> > it 18x slower.
>
> > AFAICT, this kind of slowdown only happens in cases like this where a
> > very large number of digits are being returned. It's not obvious what
> > we should be doing in cases like this. Is a performance reduction like
> > that acceptable to generate the correct answer? Or should we try to
> > produce a more approximate result more quickly, and where do we draw
> > the line?
>
> FWIW, in that particular example I'd happily take the 27ms time to get
> the more accurate answer.  If it were 270ms, maybe not.  I think my
> initial reaction to this patch is "are there any cases where it makes
> things 100x slower ... especially for non-outrageous inputs?"  If not,
> sure, let's go for more accuracy.
>

Agreed

Hopefully things can be made faster with fewer significant digits.

I figure this is important enough to trigger a maint release, but since we
already agreed when the next one is, I don't see we need to do it any
quicker, do we?

Well done Dean for excellent research.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-09-16 Thread Tom Lane
Dean Rasheed  writes:
> ... For example, exp() works for inputs up to 6000. However, if you
> compute exp(5999.999) the answer is truly huge -- probably only of
> academic interest to anyone. With HEAD, exp(5999.999) produces a
> number with 2609 significant digits in just 1.5ms (on my ageing
> desktop box). However, only the first 9 digits returned are correct.
> The other 2600 digits are pure noise. With my patch, all 2609 digits
> are correct (confirmed using bc), but it takes 27ms to compute, making
> it 18x slower.

> AFAICT, this kind of slowdown only happens in cases like this where a
> very large number of digits are being returned. It's not obvious what
> we should be doing in cases like this. Is a performance reduction like
> that acceptable to generate the correct answer? Or should we try to
> produce a more approximate result more quickly, and where do we draw
> the line?

FWIW, in that particular example I'd happily take the 27ms time to get
the more accurate answer.  If it were 270ms, maybe not.  I think my
initial reaction to this patch is "are there any cases where it makes
things 100x slower ... especially for non-outrageous inputs?"  If not,
sure, let's go for more accuracy.

regards, tom lane




Re: [HACKERS] Inaccurate results from numeric ln(), log(), exp() and pow()

2015-09-16 Thread Merlin Moncure
On Wed, Sep 16, 2015 at 2:31 AM, Dean Rasheed  wrote:
> Hi,
>
> I recently noticed that numeric log() produces inaccurate results for
> certain ranges of inputs. This isn't just small errors in the last 1
> or 2 digits, but sometimes quite large errors, with over half the
> significant digits returned being incorrect.

yikes.

> The initial issue was a block of code to estimate the weight of the
> logarithm, repeated in 3 different places, and called from ln(), log()
> and pow(). After coming up with a fix for that, however, testing
> revealed a whole range of additional issues causing further
> inaccuracies in those functions and also in exp().

> AFAICT, this kind of slowdown only happens in cases like this where a
> very large number of digits are being returned.

Can you clarify "very large"?

merlin

