Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-09 Thread Erik Nordström
Tom,

You are of course right that you cannot add precision that was not there
to begin with, and my patch does nothing for input values that cannot be
represented accurately in the first place. However, that was not my main
point.

The idea behind the calculation is this: when you strip off the integral
(seconds) part of the value before converting to integral microseconds,
you create a number that a float can represent more accurately than the
original value (simply because there are fewer digits in the
mantissa/significand). Thus, doing 0.312311172485352 * USECS_PER_SEC
suffers fewer precision-related errors, with regard to microseconds, than
doing 14864803242.312311172485352 * USECS_PER_SEC as in the original
code. In other words, with this approach there are fewer cases where
1 ULP is bigger than 1 microsecond.
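
To make this concrete, here is a small standalone C sketch (my own
illustration, not part of the patch; USECS_PER_SEC is redefined locally
to mirror the server's constant, and it needs -lm to build) that compares
the two computations and prints the size of 1 ULP at each magnitude:

/*
 * Standalone illustration (not part of the patch): compare scaling the
 * full Unix time against scaling only its fractional part.
 */
#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define USECS_PER_SEC 1000000

int main(void)
{
    double  seconds = 14864803242.312311;
    int64_t whole = (int64_t) seconds;

    /* Original approach: scale the full value, then round. */
    int64_t direct = (int64_t) rint(seconds * USECS_PER_SEC);

    /* Patched approach: strip the whole seconds before scaling. */
    int64_t split = whole * USECS_PER_SEC +
        (int64_t) rint((seconds - whole) * USECS_PER_SEC);

    /* One ULP at each magnitude: above 1 usec for the full value,
     * far below it for the fractional part alone. */
    printf("ulp(full) = %.17g\n", nextafter(seconds, INFINITY) - seconds);
    printf("ulp(frac) = %.17g\n",
           nextafter(seconds - whole, INFINITY) - (seconds - whole));
    printf("direct = %lld, split = %lld\n",
           (long long) direct, (long long) split);
    return 0;
}

On a typical IEEE-754 double build this should print a ULP of about
1.9e-06 seconds for the full value (i.e., larger than a microsecond) and
reproduce the .312312 vs .312311 difference shown below.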

FWIW, this is the output from latest postgres HEAD, which includes your
rint() patch:

postgres=# select to_timestamp(14864803242.312311);
         to_timestamp
-------------------------------
 2441-01-17 09:00:42.312312+01
(1 row)

And this is the output with my patch:

postgres=# select to_timestamp(14864803242.312311);
         to_timestamp
-------------------------------
 2441-01-17 09:00:42.312311+01
(1 row)


I don't know why you would get different output (that would be worrying in
itself).

In any case, I would prefer a to_timestamp(numeric), although I see no
reason not to make the float8 version as accurate as possible anyway.

-Erik



On Thu, Feb 9, 2017 at 3:56 PM, Tom Lane  wrote:

> Erik Nordström  writes:
> > Thanks for the insightful feedback. You are right, the patch does suffer
> > from overflow (and other possible issues when I think about it). Using
> > rint(), as you suggest, helps in my original example case, but there are
> > still cases when the output is not what you would expect. For instance,
> > converting the Unix time 14864803242.312311 gives back the timestamp
> > "2441-01-17 09:00:42.312312+01", even if using rint() (see below).
>
> Hm, that particular case works for me using the patch I committed
> yesterday (which just added a rint() call to the existing code).
> I'm a bit skeptical of the version you propose here because it seems
> mighty prone to subtractive cancellation.  You're basically computing
> x - int(x) which can't add any precision that wasn't there before.
> Looking at successive microsecond values in this range:
>
> regression=# select 14864803242.312310::float8 - 14864803242;
>      ?column?
> -------------------
>  0.312309265136719
> (1 row)
>
> regression=# select 14864803242.312311::float8 - 14864803242;
>      ?column?
> -------------------
>  0.312311172485352
> (1 row)
>
> regression=# select 14864803242.312312::float8 - 14864803242;
>      ?column?
> -------------------
>  0.312311172485352
> (1 row)
>
> regression=# select 14864803242.312313::float8 - 14864803242;
>      ?column?
> -------------------
>  0.312313079833984
> (1 row)
>
> Basically, 1 ULP in this range is more than 1 microsecond, so there are
> going to be places where you cannot get the right answer.  Reformulating
> the computation will just shift the errors to nearby values.  float8
> simply doesn't have enough bits to represent this many microseconds since
> 1970 exactly, and the problem's only going to get worse the further out
> you look.
>
> I think we might be better advised to add to_timestamp(numeric) alongside
> to_timestamp(float8).  There's plenty of precedent for that (e.g., exp(),
> ln()) so I would not expect problems with ambiguous function calls.
> It'd be slower though ...
>
> regards, tom lane
>


Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-09 Thread Erik Nordström
Hi Tom, and others,

Thanks for the insightful feedback. You are right, the patch does suffer
from overflow (and other possible issues when I think about it). Using
rint(), as you suggest, helps in my original example case, but there are
still cases when the output is not what you would expect. For instance,
converting the Unix time 14864803242.312311 gives back the timestamp
"2441-01-17 09:00:42.312312+01", even if using rint() (see below).

I guess precision-related errors are unavoidable when working with
floating point numbers (especially large ones). But I wonder whether it
is not desirable to avoid (or reduce) errors at least when the input
value can be accurately represented to the microsecond by a float (i.e.,
when rounded to microseconds, as printf does)? For example, splitting the
float into second and microsecond integers might reduce errors,
especially with large numbers. Something like this:

/* Split into integral seconds and a rounded microsecond remainder. */
ts_seconds = (int64) seconds;
ts_microseconds = (int64) rint((seconds - ts_seconds) * USECS_PER_SEC);
result = (ts_seconds - epoch_diff_seconds) * USECS_PER_SEC +
         ts_microseconds;

Note that stripping off the full seconds before scaling the microsecond
fractional part keeps that part more accurate, and it should solve the
problem for the case above, including the overflow.
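
To illustrate the overflow point, here is a small standalone sketch
(again my own illustration, needing -lm; EPOCH_DIFF_SECONDS is just a
local name for the 946684800-second offset between the Unix and Postgres
epochs):

#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define USECS_PER_SEC 1000000
#define EPOCH_DIFF_SECONDS 946684800LL  /* 1970-01-01 to 2000-01-01 */

int main(void)
{
    double  seconds = 9224318015999.0;  /* max input from the examples */
    double  scaled = seconds * USECS_PER_SEC;

    /* Scaling first: the product (about 9.2243e18) exceeds INT64_MAX
     * (about 9.2234e18), so casting it to int64 would overflow, and a
     * double's ULP up there is 2048, i.e. millisecond-scale error. */
    printf("scaled = %.17g, INT64_MAX = %.17g\n",
           scaled, (double) INT64_MAX);
    printf("ulp(scaled) = %.17g\n", nextafter(scaled, INFINITY) - scaled);

    /* Subtracting the epoch in integer seconds first keeps the result
     * in int64 range and, for this integral input, exact. */
    int64_t usecs = ((int64_t) seconds - EPOCH_DIFF_SECONDS) * USECS_PER_SEC;
    printf("usecs since 2000 = %lld\n", (long long) usecs);
    return 0;
}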

Or maybe I am overthinking this, and the only proper solution is to
provide a to_timestamp() that takes a microsecond integer? That certainly
wouldn't hurt in any case, and it makes sense since the timestamp is
itself an integer internally.

Anyway, I am attaching an updated version of the patch. Below are some
example outputs (including the minimum and maximum allowed inputs) from
original Postgres, with your rint() fix, and with the included patch:

Original postgres:

test=# select to_timestamp(9224318015999);
          to_timestamp
---------------------------------
 294277-01-01 00:59:58.999552+01
(1 row)

test=# select to_timestamp(-210866803200);
          to_timestamp
---------------------------------
 4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
         to_timestamp
-------------------------------
 2017-02-07 16:12:04.236537+01
(1 row)

test=# select to_timestamp(14864803242.312311);
         to_timestamp
-------------------------------
 2441-01-17 09:00:42.312312+01
(1 row)


With rint():

test=# select to_timestamp(9224318015999);
          to_timestamp
---------------------------------
 294277-01-01 00:59:58.999552+01
(1 row)

test=# select to_timestamp(-210866803200);
          to_timestamp
---------------------------------
 4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
         to_timestamp
-------------------------------
 2017-02-07 16:12:04.236538+01
(1 row)

test=# select to_timestamp(14864803242.312311);
         to_timestamp
-------------------------------
 2441-01-17 09:00:42.312312+01
(1 row)


Included patch:

test=# select to_timestamp(9224318015999);
       to_timestamp
--------------------------
 294277-01-01 00:59:59+01
(1 row)

test=# select to_timestamp(-210866803200);
          to_timestamp
---------------------------------
 4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
         to_timestamp
-------------------------------
 2017-02-07 16:12:04.236538+01
(1 row)

test=# select to_timestamp(14864803242.312311);
         to_timestamp
-------------------------------
 2441-01-17 09:00:42.312311+01
(1 row)


--Erik


On Wed, Feb 8, 2017 at 9:52 PM, Tom Lane  wrote:

> I wrote:
> > I wonder if we could make things better just by using rint() rather than
> > a naive cast-to-integer.  The cast will truncate, not round, and I think
> > that might be what's mostly biting you.  Does this help for you?
>
> > #ifdef HAVE_INT64_TIMESTAMP
> > - result = seconds * USECS_PER_SEC;
> > + result = rint(seconds * USECS_PER_SEC);
> > #else
>
> I poked around looking for possible similar issues elsewhere, and found
> that a substantial majority of the datetime-related code already uses
> rint() when trying to go from float to int representations.  As far as
> I can find, this function and make_interval() are the only ones that
> fail to do so.  So I'm now thinking that this is a clear oversight,
> and both those places need to be patched to use rint().
>
> regards, tom lane
>


to_timestamp_fix_2.patch
Description: Binary data



[HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-08 Thread Erik Nordström
Hello hackers,

I stumbled upon a precision issue with the to_timestamp() function that
causes it to return unexpected timestamp values. For instance, the query
SELECT to_timestamp(1486480176.236538) returns the timestamp "2017-02-07
16:09:36.236537+01", which is off by one microsecond. Looking at the
source code, the issue seems to be that the conversion is unnecessarily
done with imprecise floating point calculations. Since the target
timestamp has microsecond precision and is internally represented by a
64-bit integer (on modern platforms), it is better to first convert the
given floating point value to a microsecond integer and then do the epoch
conversion, rather than doing all the arithmetic in floating point and
casting to an integer/timestamp at the end.
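
In sketch form, the idea looks something like this (illustrative names
only; the attached patch is the authoritative version, and overflow/range
checks are omitted here):

#include <stdint.h>
#include <math.h>

#define USECS_PER_SEC 1000000
#define EPOCH_DIFF_SECONDS 946684800LL  /* 1970-01-01 to 2000-01-01 */

static int64_t
unix_seconds_to_pg_usecs(double seconds)
{
    /* Round to integral microseconds first... */
    int64_t usecs = (int64_t) rint(seconds * USECS_PER_SEC);

    /* ...then do the epoch shift in integer arithmetic. */
    return usecs - EPOCH_DIFF_SECONDS * USECS_PER_SEC;
}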

I am attaching a patch that fixes the above issue.

Regards,

Erik


to_timestamp_fix.patch
Description: Binary data
