Re: [HACKERS] [GENERAL] Floating point error
On 03/05/2013 07:23 PM, Tom Lane wrote: Maciek Sakrejda m.sakre...@gmail.com writes: Thank you: I think this is what I was missing, and what wasn't clear from the proposed doc patch. But then how can pg_dump assume that it's always safe to set extra_float_digits = 3? It's been proven (don't have a link handy, but the paper is at least a dozen years old) that 3 extra digits are sufficient to accurately reconstruct any IEEE single or double float value, given properly written conversion functions in libc. So that's where that number comes from. Now, if either end is not using IEEE floats, you may or may not get equivalent results --- but it's pretty hard to make any guarantees at all in such a case. There's also gdtoa, which returns the shortest decimal representation which rounds to the same decimal number. It would print 0.1 as 0.1, but 0.1 + 0.2 as 0.30004. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
Daniel Farina wrote: On Mon, Mar 4, 2013 at 2:27 PM, Maciek Sakrejda m.sakre...@gmail.com wrote: On Sun, Mar 3, 2013 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: The real difficulty is that there may be more than one storable value that corresponds to 1.23456 to six decimal digits. To be certain that we can reproduce the stored value uniquely, we have to err in the other direction, and print *more* decimal digits than the underlying precision justifies, rather than a bit less. Some of those digits are going to look like garbage to the naked eye. I think part of the difficulty here is that psql (if I understand this correctly) conflates the wire-format text representations with what should be displayed to the user. E.g., a different driver might parse the wire representation into a native representation, and then format that native representation when it is to be displayed. That's what the JDBC driver does, so it doesn't care about how the wire format actually looks. pg_dump cares about reproducing values exactly, and not about whether things are nice-looking, so it cranks up extra_float_digits. The JDBC driver might be justified in doing likewise, to ensure that the identical binary float value is stored on both client and server --- but that isn't even a valid goal unless you assume that the server's float implementation is the same as Java's, which is a bit of a leap of faith, even if IEEE 754 is nigh universal these days. I would hope that any driver cares about reproducing values exactly (or at least as exactly as the semantics of the client and server representations of the data type allow). Once you start talking operations, sure, things get a lot more complicated and you're better off not relying on any particular semantics. But IEEE 754 unambiguously defines certain bit patterns to correspond to certain values, no? If both client and server talk IEEE 754 floating point, it should be possible to round-trip values with no fuss and end up with the same bits you started with (and as far as I can tell, it is, as long as extra_float_digits is set to the max), even if the implementations of actual operations on these numbers behave very differently on client and server. I think given that many ORMs can cause UPDATEs on tuple fields that have not changed as part of saving an object, stable round trips seem like a desirable feature. But all these things are already available: Any driver that cares can set extra_float_digits=3, and if it prefers the binary format, the wire protocol supports sending floating point values as such. I also find the rationale for extra_float digits quite mysterious for the same reason: why would most programs care about precision less than pg_dump does? If a client wants floating point numbers to look nice, I think the rendering should be on them (e.g. psql and pgadmin), and the default should be to expose whatever precision is available to clients that want an accurate representation of what is in the database. This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set the extra float digits), but I can't quite grasp the rationale for well, the only program that cares about the most precision available is pg_dump. It seems like most programs would care just as much. I don't think that it is about looking nice. C doesn't promise you more than FLT_DIG or DBL_DIG digits of precision, so PostgreSQL cannot either. If you allow more, that would mean that if you store the same number on different platforms and query it, it might come out differently. Among other things, that would be a problem for the regression tests. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
Daniel Farina dan...@heroku.com wrote: This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set the extra float digits), but I can't quite grasp the rationale for well, the only program that cares about the most precision available is pg_dump. It seems like most programs would care just as much. Something to keep in mind is that when you store 0.01 into a double precision column, the precise value stored, when written in decimal, is: 0.0120816681711721685132943093776702880859375 Of course, some values can't be precisely written in decimal with so few digits. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
On 05.03.2013 15:59, Kevin Grittner wrote: Daniel Farinadan...@heroku.com wrote: This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set the extra float digits), but I can't quite grasp the rationale for well, the only program that cares about the most precision available is pg_dump. It seems like most programs would care just as much. Something to keep in mind is that when you store 0.01 into a double precision column, the precise value stored, when written in decimal, is: 0.0120816681711721685132943093776702880859375 Of course, some values can't be precisely written in decimal with so few digits. It would be nice to have a base-2 text format to represent floats. It wouldn't be as human-friendly as base-10, but it could be used when you don't want to lose precision. pg_dump in particular. - Heikki -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
On Tue, Mar 5, 2013 at 12:03 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: I don't think that it is about looking nice. C doesn't promise you more than FLT_DIG or DBL_DIG digits of precision, so PostgreSQL cannot either. If you allow more, that would mean that if you store the same number on different platforms and query it, it might come out differently. Among other things, that would be a problem for the regression tests. Thank you: I think this is what I was missing, and what wasn't clear from the proposed doc patch. But then how can pg_dump assume that it's always safe to set extra_float_digits = 3? Why the discrepancy between default behavior and what pg_dump gets? It can't know whether the dump is to be restored into the same system or a different one (and AFAICT, there's not even an option to tweak extra_float_digits there). -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
HL == Heikki Linnakangas hlinnakan...@vmware.com writes: HL It would be nice to have a base-2 text format to represent floats. HL It wouldn't be as human-friendly as base-10, but it could be used HL when you don't want to lose precision. pg_dump in particular. hexidecimal notation for floats exists. The printf format flag is %a for miniscule and %A for majuscule. The result of 1./3. is 0xa.aabp-5. This site has some info and a conversion demo: http://gregstoll.dyndns.org/~gregstoll/floattohex/ -JimC -- James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
Maciek Sakrejda m.sakre...@gmail.com writes: Thank you: I think this is what I was missing, and what wasn't clear from the proposed doc patch. But then how can pg_dump assume that it's always safe to set extra_float_digits = 3? It's been proven (don't have a link handy, but the paper is at least a dozen years old) that 3 extra digits are sufficient to accurately reconstruct any IEEE single or double float value, given properly written conversion functions in libc. So that's where that number comes from. Now, if either end is not using IEEE floats, you may or may not get equivalent results --- but it's pretty hard to make any guarantees at all in such a case. Why the discrepancy between default behavior and what pg_dump gets? Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 - 0.10001 (1 row) regression=# select 0.1::float8; float8 - 0.10001 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Why the discrepancy between default behavior and what pg_dump gets? Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 - 0.10001 (1 row) regression=# select 0.1::float8; float8 - 0.10001 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
Maciek Sakrejda m.sakre...@gmail.com writes: On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 - 0.10001 (1 row) regression=# select 0.1::float8; float8 - 0.10001 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue? Nope, at least not unless you ask for binary output format (which introduces a whole different set of portability gotchas, so it's not the default either). regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
This conversation has moved beyond my ability to be useful but I want to remind everyone of my original issues in case it helps you improve the docs: 1) Data shown in psql did not match data retrieved by JDBC. I had to debug pretty deep into the JDBC code to confirm that a value I was staring at in psql was different in JDBC. Pretty weird, but I figured it had something to do with floating point malarky. 2) The problem in #1 could not be reproduced when running on our test database. Again very weird, because as far as psql was showing me the values in the two databases were identical. I used COPY to transfer some data from the production database to the test database. I now know that what you see in psql is not necessarily what you see in JDBC. I also know that you need to set extra_float_digits = 3 before using COPY to transfer data from one database to another or risk differences in floating point values. Sounds like both pg_dump and the JDBC driver must be doing this or its equivalent on their own. If the numeric types page of the documentation had mentioned the extra_float_digits then I might have been able to solve my own problem. I'd like you to add some mention of it even if it is just handwaving but will let you guys hash it out from here. Either way, PostgreSQL rocks! Tom On Mar 5, 2013, at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maciek Sakrejda m.sakre...@gmail.com writes: On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 - 0.10001 (1 row) regression=# select 0.1::float8; float8 - 0.10001 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue? Nope, at least not unless you ask for binary output format (which introduces a whole different set of portability gotchas, so it's not the default either). regards, tom lane -- Tom Duffey tduf...@trillitech.com 414-751-0600 x102 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
Em 05/03/2013 16:01, Tom Duffey escreveu: This conversation has moved beyond my ability to be useful but I want to remind everyone of my original issues in case it helps you improve the docs: 1) Data shown in psql did not match data retrieved by JDBC. I had to debug pretty deep into the JDBC code to confirm that a value I was staring at in psql was different in JDBC. Pretty weird, but I figured it had something to do with floating point malarky. As stated before, Java tries to be consistent across platforms. This is different than a dozen of C and C++ implementations you will find (some implementations will not be consistent even using same library on different platforms). 2) The problem in #1 could not be reproduced when running on our test database. Again very weird, because as far as psql was showing me the values in the two databases were identical. I used COPY to transfer some data from the production database to the test database. Yes, this is really interesting. Is the production database running on same platform (OS, architecture, updates) as the test database? Regards, Edson I now know that what you see in psql is not necessarily what you see in JDBC. I also know that you need to set extra_float_digits = 3 before using COPY to transfer data from one database to another or risk differences in floating point values. Sounds like both pg_dump and the JDBC driver must be doing this or its equivalent on their own. If the numeric types page of the documentation had mentioned the extra_float_digits then I might have been able to solve my own problem. I'd like you to add some mention of it even if it is just handwaving but will let you guys hash it out from here. Either way, PostgreSQL rocks! Tom On Mar 5, 2013, at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maciek Sakrejda m.sakre...@gmail.com writes: On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 - 0.10001 (1 row) regression=# select 0.1::float8; float8 - 0.10001 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue? Nope, at least not unless you ask for binary output format (which introduces a whole different set of portability gotchas, so it's not the default either). regards, tom lane -- Tom Duffey tduf...@trillitech.com 414-751-0600 x102 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
On Sun, Mar 3, 2013 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: The real difficulty is that there may be more than one storable value that corresponds to 1.23456 to six decimal digits. To be certain that we can reproduce the stored value uniquely, we have to err in the other direction, and print *more* decimal digits than the underlying precision justifies, rather than a bit less. Some of those digits are going to look like garbage to the naked eye. I think part of the difficulty here is that psql (if I understand this correctly) conflates the wire-format text representations with what should be displayed to the user. E.g., a different driver might parse the wire representation into a native representation, and then format that native representation when it is to be displayed. That's what the JDBC driver does, so it doesn't care about how the wire format actually looks. pg_dump cares about reproducing values exactly, and not about whether things are nice-looking, so it cranks up extra_float_digits. The JDBC driver might be justified in doing likewise, to ensure that the identical binary float value is stored on both client and server --- but that isn't even a valid goal unless you assume that the server's float implementation is the same as Java's, which is a bit of a leap of faith, even if IEEE 754 is nigh universal these days. I would hope that any driver cares about reproducing values exactly (or at least as exactly as the semantics of the client and server representations of the data type allow). Once you start talking operations, sure, things get a lot more complicated and you're better off not relying on any particular semantics. But IEEE 754 unambiguously defines certain bit patterns to correspond to certain values, no? If both client and server talk IEEE 754 floating point, it should be possible to round-trip values with no fuss and end up with the same bits you started with (and as far as I can tell, it is, as long as extra_float_digits is set to the max), even if the implementations of actual operations on these numbers behave very differently on client and server. I think given that many ORMs can cause UPDATEs on tuple fields that have not changed as part of saving an object, stable round trips seem like a desirable feature. We could have dumbed it down to a boolean look nice versus reproduce the value exactly switch, but it seemed like there might be applications that could use some additional flexibility. In any case, it's not Postgres' fault that there is an issue here; it's fundamental to the use of binary rather than decimal stored values. It seems like getting things to look nice should be the client's job, no? Why does that factor into wire protocol data representations (and yes, I know part of the answer here--presumably literals are intimately tied to the same code paths, so it's not quite so simple)? Going back to the documentation patch, what should the advice be? How about something along these lines: Due to the nature of floating point numeric values, a faithful textual representation of a typereal/type or typedouble precision/type value requires some decimal digits that are generally insignificant, impairing readability of common values. Because of this, Postgres supports a limited output precision for floating point numbers by default. In order to preserve floating point values more exactly, you can use the xref linkend=guc-extra-float-digits to adjust this setting. Is that reasonable? It still feels like extra_float_digits should be opt-out rather than opt-in (leaving any formatting issues to clients), but this could be a start. It doesn't address non-IEEE 754 platforms, but the note in the other proposed patch is so high-level as to just be hand-waving. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
On Mon, Mar 4, 2013 at 2:27 PM, Maciek Sakrejda m.sakre...@gmail.com wrote: On Sun, Mar 3, 2013 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: The real difficulty is that there may be more than one storable value that corresponds to 1.23456 to six decimal digits. To be certain that we can reproduce the stored value uniquely, we have to err in the other direction, and print *more* decimal digits than the underlying precision justifies, rather than a bit less. Some of those digits are going to look like garbage to the naked eye. I think part of the difficulty here is that psql (if I understand this correctly) conflates the wire-format text representations with what should be displayed to the user. E.g., a different driver might parse the wire representation into a native representation, and then format that native representation when it is to be displayed. That's what the JDBC driver does, so it doesn't care about how the wire format actually looks. pg_dump cares about reproducing values exactly, and not about whether things are nice-looking, so it cranks up extra_float_digits. The JDBC driver might be justified in doing likewise, to ensure that the identical binary float value is stored on both client and server --- but that isn't even a valid goal unless you assume that the server's float implementation is the same as Java's, which is a bit of a leap of faith, even if IEEE 754 is nigh universal these days. I would hope that any driver cares about reproducing values exactly (or at least as exactly as the semantics of the client and server representations of the data type allow). Once you start talking operations, sure, things get a lot more complicated and you're better off not relying on any particular semantics. But IEEE 754 unambiguously defines certain bit patterns to correspond to certain values, no? If both client and server talk IEEE 754 floating point, it should be possible to round-trip values with no fuss and end up with the same bits you started with (and as far as I can tell, it is, as long as extra_float_digits is set to the max), even if the implementations of actual operations on these numbers behave very differently on client and server. I think given that many ORMs can cause UPDATEs on tuple fields that have not changed as part of saving an object, stable round trips seem like a desirable feature. I also find the rationale for extra_float digits quite mysterious for the same reason: why would most programs care about precision less than pg_dump does? If a client wants floating point numbers to look nice, I think the rendering should be on them (e.g. psql and pgadmin), and the default should be to expose whatever precision is available to clients that want an accurate representation of what is in the database. This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set the extra float digits), but I can't quite grasp the rationale for well, the only program that cares about the most precision available is pg_dump. It seems like most programs would care just as much. -- fdr -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
While having more docs around extra_float_digits is a great idea, I don't think the patch really clarifies much. (Disclaimer: I honestly have only a vague idea of the reasoning behind extra_float_digits existing in the first place, but perhaps that means I'm a good target audience for the doc patch. Also, I apologize if anything below sounds abrasive--having worked on both my own driver and a few things here and there on the JDBC one, plus running into this in assorted places in the wild, I still find extra_float_digits baffling at best. I immensely appreciate the effort to make improvements here.) That is, the patch explains some of the reasoning behind the setting, but it doesn't really help the reader identify where setting this is useful and/or necessary. E.g., the JDBC driver just indiscriminately sets extra_float_digits to 3 if the server supports it (the max allowable). Is this a bogus approach? + The (inherently inaccurate) textual representation of a typereal/type + or typedouble precision/type value... Isn't full fidelity possible assuming sensible rounding semantics and enough characters of precision? Isn't that the point of extra_float_digits? + will by default contain only as many + significant digits as can be represented without losing precision + on any platform supported by PostgreSQL. How would providing more digits lose precision? Platform as in a non-IEEE 754 server restoring a dump? A non-IEEE 754 client running queries? Something more subtle? And how does having more precision in the serialized cause these to lose precision when decoding the value? I think extra_float_digits is an awkward, surprising implementation detail--at first glance it seems like a flip this switch for correct behavior knob. I'd love to have a clear explanation in the docs of why it's needed, and when you should and should not set it (e.g., is the JDBC driver wrong to set it all the time? should every driver set it all the time?). Thanks, Maciek -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Floating point error
Maciek Sakrejda m.sakre...@gmail.com writes: [ a bunch of questions that boil down to: ] Isn't full fidelity possible assuming sensible rounding semantics and enough characters of precision? The fundamental issue is that the underlying representation is binary and so its precision limit doesn't correspond to an exact number of decimal digits. The default print format tries to hide that from you by printing only as many decimal digits as the stored format is certain to be able to reproduce, ie if you enter 1.23456 you should see that printed again, not 1.23455 or 1.23457. However, the stored value is not going to be exactly equal to 1.23456. The real difficulty is that there may be more than one storable value that corresponds to 1.23456 to six decimal digits. To be certain that we can reproduce the stored value uniquely, we have to err in the other direction, and print *more* decimal digits than the underlying precision justifies, rather than a bit less. Some of those digits are going to look like garbage to the naked eye. pg_dump cares about reproducing values exactly, and not about whether things are nice-looking, so it cranks up extra_float_digits. The JDBC driver might be justified in doing likewise, to ensure that the identical binary float value is stored on both client and server --- but that isn't even a valid goal unless you assume that the server's float implementation is the same as Java's, which is a bit of a leap of faith, even if IEEE 754 is nigh universal these days. I think extra_float_digits is an awkward, surprising implementation detail We could have dumbed it down to a boolean look nice versus reproduce the value exactly switch, but it seemed like there might be applications that could use some additional flexibility. In any case, it's not Postgres' fault that there is an issue here; it's fundamental to the use of binary rather than decimal stored values. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general