Re: [HACKERS] [GENERAL] Floating point error
This conversation has moved beyond my ability to be useful, but I want to remind everyone of my original issues in case it helps you improve the docs:

1) Data shown in psql did not match data retrieved by JDBC. I had to debug pretty deep into the JDBC code to confirm that a value I was staring at in psql was different in JDBC. Pretty weird, but I figured it had something to do with floating point malarkey.

2) The problem in #1 could not be reproduced when running on our test database. Again very weird, because as far as psql was showing me, the values in the two databases were identical. I used COPY to transfer some data from the production database to the test database.

I now know that what you see in psql is not necessarily what you see in JDBC. I also know that you need to set extra_float_digits = 3 before using COPY to transfer data from one database to another, or risk differences in floating point values. It sounds like both pg_dump and the JDBC driver must be doing this, or its equivalent, on their own.

If the numeric types page of the documentation had mentioned extra_float_digits then I might have been able to solve my own problem. I'd like you to add some mention of it, even if it is just handwaving, but will let you guys hash it out from here. Either way, PostgreSQL rocks!

Tom

On Mar 5, 2013, at 12:38 PM, Tom Lane wrote:

> Maciek Sakrejda writes:
>> On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane wrote:
>>> Basically, the default behavior is tuned to the expectations of people
>>> who think that what they put in is what they should get back, ie we
>>> don't want the system doing this by default:
>>>
>>> regression=# set extra_float_digits = 3;
>>> SET
>>> regression=# select 0.1::float4;
>>>    float4
>>> -------------
>>>  0.100000001
>>> (1 row)
>>>
>>> regression=# select 0.1::float8;
>>>        float8
>>> ---------------------
>>>  0.10000000000000001
>>> (1 row)
>>>
>>> We would get a whole lot more bug reports, not fewer, if that were
>>> the default behavior.
>> Isn't this a client rendering issue, rather than an on-the-wire encoding
>> issue?
>
> Nope, at least not unless you ask for binary output format (which
> introduces a whole different set of portability gotchas, so it's
> not the default either).
>
> regards, tom lane

--
Tom Duffey
tduf...@trillitech.com
414-751-0600 x102

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
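Tom Lane's psql session can be reproduced outside PostgreSQL, since the behavior is plain IEEE 754 rounding. A minimal Python sketch (the digit counts 6/9 and 15/17 mirror FLT_DIG and DBL_DIG with and without extra_float_digits = 3; the helper name is mine, not anything from the server):

```python
import struct

def as_float4(x):
    # Round-trip through IEEE 754 single precision, like PostgreSQL's float4.
    return struct.unpack('f', struct.pack('f', x))[0]

# Default float4 text output uses 6 significant digits (FLT_DIG), so the
# stored value masquerades as exactly what was inserted...
print(f"{as_float4(0.1):.6g}")   # -> 0.1
# ...while extra_float_digits = 3 bumps that to 9 digits and exposes it.
print(f"{as_float4(0.1):.9g}")   # -> 0.100000001
# float8 behaves the same way: 15 digits by default, 17 with the setting.
print(f"{0.1:.15g}")             # -> 0.1
print(f"{0.1:.17g}")             # -> 0.10000000000000001
```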
Re: [GENERAL] Floating point error
Hi Everyone,

To bring closure to this thread, my whole problem was caused by not knowing about the extra_float_digits setting. We have a script that uses COPY to transfer a subset of rows from a very large production table to a test table. The script was not setting extra_float_digits, so the values did not match even though they appeared to match when running queries in psql. Definitely another gotcha for floating point values, and it might be a good idea to mention this setting on the "Numeric Types" page of the docs.

Thanks to all who chimed in to help!

Tom

On Feb 28, 2013, at 7:05 PM, James Cloos wrote:

>>>>>> "TD" == Tom Duffey writes:
>
> TD> Riddle me this. I have a database column of type "real" that gets
> TD> mapped to a Java field of type double via JDBC. ...
>
> TD> - Selecting values from both test and production DBs using psql
> TD> shows "10.3885" as the value
>
> TD> - The Java app on production shows "10.3884573" while the test app
> TD> shows "10.3885"
>
> I suspect the issue is that psql(1) and whatever java method you use to
> convert the floats to text choose different rounding.
>
> By default, it seems that psql(1) uses something like printf("%.4f",...)
> whereas your java app calls a routine which works more like "%.7f".
>
> (The wire format for floats is the same as they are stored, not a text
> representation thereof.)
>
> -JimC
> --
> James Cloos         OpenPGP: 1024D/ED7DAEA6
Re: [GENERAL] Floating point error
That's exactly what I was looking for. We use COPY to transfer data from a 1 billion+ row table to a test database and were confused why the results looked the same but were obviously not. Sounds like we need to use the extra_float_digits setting to include all the available information when transferring the data. Thanks for the explanation.

Tom

On Feb 25, 2013, at 8:00 AM, Albe Laurenz wrote:

> Tom Duffey wrote:
>> Here is a smaller test case that does not involve Java. I guess this
>> probably is just due to floating point error when the initial value is
>> inserted that is too large for the field but it's still a surprise.
>>
>> Create a test table, insert a couple values and view the results:
>>
>> CREATE TABLE test (
>>     id INTEGER PRIMARY KEY,
>>     value REAL NOT NULL
>> );
>>
>> INSERT INTO test (id, value) VALUES (1, 10.3884573), (2, 10.3885);
>> SELECT * FROM test;
>>
>>  id |  value
>> ----+---------
>>   1 | 10.3885
>>   2 | 10.3885
>> (2 rows)
>
> SET extra_float_digits = 3;
> SELECT * FROM test;
>
>  id |   value
> ----+------------
>   1 | 10.3884573
>   2 | 10.3885002
> (2 rows)
>
> PostgreSQL by default omits the last three digits to avoid
> differences on different architectures (I think).
>
> When you convert to double precision, you'll see these digits.
>
>> At this point you would think you have two equal values. Now change the type:
>>
>> ALTER TABLE test ALTER COLUMN value TYPE DOUBLE PRECISION;
>> SELECT * FROM test;
>>
>>  id |      value
>> ----+------------------
>>   1 | 10.3884572982788
>>   2 | 10.388500213623
>> (2 rows)
>
> Yours,
> Laurenz Albe
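Laurenz's point, that two different stored float4 values print identically at the default precision, can be checked without a database. A hedged Python sketch using the two values from the test case (the helper simulates storage in a "real" column):

```python
import struct

def as_real(x):
    # Round-trip through IEEE 754 single precision, like a "real" column.
    return struct.unpack('f', struct.pack('f', x))[0]

a = as_real(10.3884573)  # the value originally inserted
b = as_real(10.3885)     # the value a default-format COPY would carry over

# At psql's default 6 significant digits the two look identical...
print(f"{a:.6g} {b:.6g}")   # -> 10.3885 10.3885
# ...but with extra_float_digits = 3 (9 digits) the difference shows up.
print(f"{a:.9g} {b:.9g}")   # -> 10.3884573 10.3885002
assert a != b
```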
Re: [GENERAL] Floating point error
Here is a smaller test case that does not involve Java. I guess this probably is just due to floating point error when the initial value is inserted that is too large for the field, but it's still a surprise.

Create a test table, insert a couple values and view the results:

CREATE TABLE test (
    id INTEGER PRIMARY KEY,
    value REAL NOT NULL
);

INSERT INTO test (id, value) VALUES (1, 10.3884573), (2, 10.3885);
SELECT * FROM test;

 id |  value
----+---------
  1 | 10.3885
  2 | 10.3885
(2 rows)

At this point you would think you have two equal values. Now change the type:

ALTER TABLE test ALTER COLUMN value TYPE DOUBLE PRECISION;
SELECT * FROM test;

 id |      value
----+------------------
  1 | 10.3884572982788
  2 | 10.388500213623
(2 rows)

The values are no longer equal, and the first one is in fact closer to what was originally inserted. Why is this? Is this simply caused by how the initially inserted value is stored as floating point? If you create a "copy" of the database before changing the field type then both values get inserted as "10.3885". Changing the type then results in two equal values.

Maybe this is just another pitfall of using floating point numbers, and at this point I am just trying to identify exactly where our errors are being introduced, so can anyone confirm the above behavior is correct? In our real world example we are not changing the type but are instead getting the second value w/id = 1 above when using JDBC to retrieve values into a Java double field.

I ran the above on PostgreSQL 9.1.2 and 9.2.2 with the same results.

Tom

On Feb 24, 2013, at 9:17 PM, Adrian Klaver wrote:

> On 02/24/2013 06:58 PM, Tom Duffey wrote:
>>
>> On Feb 24, 2013, at 8:44 PM, Adrian Klaver wrote:
>>
>>> On 02/24/2013 06:13 PM, Tom Duffey wrote:
>>>> Hi Everyone,
>>>>
>>>> Riddle me this. I have a database column of type "real" that gets mapped
>>>> to a Java field of type double via JDBC. We have two databases, test and
>>>> production, and the test database is periodically blown away and reloaded
>>>> from a copy of production. We recently noticed that some values do not
>>>> match when viewed within our application on test vs. production. More
>>>> specifically:
>>>>
>>>> - Selecting values from both test and production DBs using psql shows
>>>> "10.3885" as the value
>>>> - The Java app on production shows "10.3884573" while the test app shows
>>>> "10.3885"
>>>>
>>>> I have a hunch that when the value was originally inserted into the
>>>> production DB it probably contained more than the 6 digits supported by
>>>> the real data type. It may have even been exactly the "10.3884573" value
>>>> we see when retrieving via JDBC on production. What I don't understand is
>>>> why when the value gets mapped back to Java via JDBC those extra digits
>>>> are coming back. Can anyone explain this or do you think I'm on the wrong
>>>> track? I stepped through code and it sure seems like the extra information
>>>> is coming back from the JDBC driver.
>>>
>>> Are the production and test apps running on the same platform i.e. OS,
>>> bitness, etc.?
>>
>> Yes, the production and test apps are running on the same platform. The Java
>> apps themselves are physically on the same Linux server. The production and
>> test databases reside within the same instance of PostgreSQL.
>>
>> Also, I should have mentioned up front that I am well aware of the pitfalls
>> of using floating point values and also the fact that PostgreSQL's "real"
>> data type supports 6 digits of precision. What I do not understand is why my
>> JDBC driver is returning more information than what I receive in psql or if
>> I operate on a copy of the database. This leads me to believe that more
>> information was available at insertion time and is somehow being made
>> available to my application even though the data type should only store 6
>> digits. Let me see if I can write a quick little test case.
>>
>
> Well I guess you could look in the dump file and see what is recorded there.
>
>> Tom
>>
>
> --
> Adrian Klaver
> adrian.kla...@gmail.com
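The surprise in the ALTER TYPE step can be sketched in Python, under the assumption the thread converges on: widening float4 to float8 (or reading it into a Java double) preserves the stored single-precision value exactly, so a wider formatter shows digits the 6-digit float4 default was hiding:

```python
import struct

# Exact double value of the float4 nearest to 10.3884573 -- what
# ALTER COLUMN ... TYPE DOUBLE PRECISION (or a Java double via JDBC) sees.
stored = struct.unpack('f', struct.pack('f', 10.3884573))[0]

print(f"{stored:.6g}")   # -> 10.3885          (psql's view of the real column)
print(f"{stored:.15g}")  # -> 10.3884572982788 (the double precision view)
```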
Re: [GENERAL] Floating point error
On Feb 24, 2013, at 8:44 PM, Adrian Klaver wrote:

> On 02/24/2013 06:13 PM, Tom Duffey wrote:
>> Hi Everyone,
>>
>> Riddle me this. I have a database column of type "real" that gets mapped to
>> a Java field of type double via JDBC. We have two databases, test and
>> production, and the test database is periodically blown away and reloaded
>> from a copy of production. We recently noticed that some values do not match
>> when viewed within our application on test vs. production. More specifically:
>>
>> - Selecting values from both test and production DBs using psql shows
>> "10.3885" as the value
>> - The Java app on production shows "10.3884573" while the test app shows
>> "10.3885"
>>
>> I have a hunch that when the value was originally inserted into the
>> production DB it probably contained more than the 6 digits supported by the
>> real data type. It may have even been exactly the "10.3884573" value we see
>> when retrieving via JDBC on production. What I don't understand is why when
>> the value gets mapped back to Java via JDBC those extra digits are coming
>> back. Can anyone explain this or do you think I'm on the wrong track? I
>> stepped through code and it sure seems like the extra information is coming
>> back from the JDBC driver.
>
> Are the production and test apps running on the same platform i.e. OS,
> bitness, etc.?

Yes, the production and test apps are running on the same platform. The Java apps themselves are physically on the same Linux server. The production and test databases reside within the same instance of PostgreSQL.

Also, I should have mentioned up front that I am well aware of the pitfalls of using floating point values and also the fact that PostgreSQL's "real" data type supports 6 digits of precision. What I do not understand is why my JDBC driver is returning more information than what I receive in psql or if I operate on a copy of the database. This leads me to believe that more information was available at insertion time and is somehow being made available to my application even though the data type should only store 6 digits. Let me see if I can write a quick little test case.

Tom
[GENERAL] Floating point error
Hi Everyone,

Riddle me this. I have a database column of type "real" that gets mapped to a Java field of type double via JDBC. We have two databases, test and production, and the test database is periodically blown away and reloaded from a copy of production. We recently noticed that some values do not match when viewed within our application on test vs. production. More specifically:

- Selecting values from both test and production DBs using psql shows "10.3885" as the value
- The Java app on production shows "10.3884573" while the test app shows "10.3885"

I have a hunch that when the value was originally inserted into the production DB it probably contained more than the 6 digits supported by the real data type. It may have even been exactly the "10.3884573" value we see when retrieving via JDBC on production. What I don't understand is why when the value gets mapped back to Java via JDBC those extra digits are coming back. Can anyone explain this or do you think I'm on the wrong track? I stepped through code and it sure seems like the extra information is coming back from the JDBC driver.

Tom
Re: [GENERAL] Query memory usage
On May 15, 2010, at 8:00 PM, Tom Lane wrote:

> Tom Duffey writes:
>> On May 15, 2010, at 7:28 PM, Tom Lane wrote:
>>> Well, I tried executing a large "copy (select ...)" query and couldn't
>>> see any memory bloat at all in either the backend or psql. So there's
>>> something relevant that you haven't told us.
>>
>> I hope you are right! The actual query is different because I was not
>> aware until right before I posted this question that you can have a
>> WHERE clause with COPY. Here is the actual query I ran:
>>
>> SELECT point_id || E'\t' || status || E'\t' || value || E'\t' || timestamp
>> FROM point_history
>> WHERE timestamp > NOW() - interval '18 months';
>
> Ermm ... is that the whole query, or did you wrap it in COPY (...) TO
> STDOUT? The former case will cause psql to eat memory, because it tries
> to buffer the whole result of an ordinary query. In the latter case psql
> will just stream the data through to the output file.

That's the whole query. If I understand your reply correctly it sounds like psql was the culprit and that I should try again using COPY (...) TO STDOUT, no?

Tom
Re: [GENERAL] Query memory usage
On May 15, 2010, at 7:28 PM, Tom Lane wrote:

> Tom Duffey writes:
>> On May 15, 2010, at 4:51 PM, Tom Lane wrote:
>>> What's being done on the client side with the data?
>>
>> I am executing the query in psql at the command line and piping the
>> result to a file, e.g., psql < get_data.sql > data.sql
>
> Well, I tried executing a large "copy (select ...)" query and couldn't
> see any memory bloat at all in either the backend or psql. So there's
> something relevant that you haven't told us. Could we see the full
> schema (eg via psql \dt) for the table being copied?

I hope you are right! The actual query is different because I was not aware until right before I posted this question that you can have a WHERE clause with COPY. Here is the actual query I ran:

SELECT point_id || E'\t' || status || E'\t' || value || E'\t' || timestamp
FROM point_history
WHERE timestamp > NOW() - interval '18 months';

And here is the table schema:

prod=> \dt point_history
           List of relations
 Schema |     Name      | Type  | Owner
--------+---------------+-------+-------
 public | point_history | table | prod
(1 row)

prod=> \d point_history
              Table "public.point_history"
  Column   |            Type             | Modifiers
-----------+-----------------------------+-----------
 point_id  | integer                     | not null
 value     | real                        | not null
 status    | integer                     | not null
 timestamp | timestamp without time zone | not null
Indexes:
    "point_history_pkey" PRIMARY KEY, btree (point_id, "timestamp")
Foreign-key constraints:
    "$1" FOREIGN KEY (point_id) REFERENCES point(id)

Tom
Re: [GENERAL] Query memory usage
On May 15, 2010, at 4:51 PM, Tom Lane wrote:

> Tom Duffey writes:
>> I have a table with several hundred million rows of timestamped values.
>> Using pg_dump we are able to dump the entire table to disk no problem.
>> However, I would like to retrieve a large subset of data from this table
>> using something like:
>>
>> COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;
>>
>> Executing this query causes our server to consume all available swap
>> and crash.
>
> What's being done on the client side with the data? AFAIK that operation
> really shouldn't consume a lot of memory on the server side. It would
> help if you'd be more specific about which process is consuming swap
> space.

I am executing the query in psql at the command line and piping the result to a file, e.g., psql < get_data.sql > data.sql

Tom
[GENERAL] Query memory usage
Hi Everyone,

I have a table with several hundred million rows of timestamped values. Using pg_dump we are able to dump the entire table to disk no problem. However, I would like to retrieve a large subset of data from this table using something like:

COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;

Executing this query causes our server to consume all available swap and crash. Can anyone help me figure out what needs to be done to allow this query to execute? How long it takes doesn't really matter as long as it can be performed reliably. The database currently lives on a Red Hat EL 5.3 server with 16GB RAM and 4GB swap running PostgreSQL 8.3.7.

Possibly relevant lines from postgresql.conf:

shared_buffers = 4GB
work_mem = 32MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB

Tom
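A crash like the one described above is consistent with client-side buffering: for an ordinary (non-COPY) query, psql/libpq holds the entire result set in memory. A back-of-the-envelope Python sketch, where both the row count and the per-row cost are illustrative assumptions rather than figures from this thread:

```python
# Rough client-side memory needed to buffer a whole ordinary-query result.
rows = 300_000_000        # "several hundred million" rows (assumed figure)
bytes_per_row = 40        # text width plus libpq tuple bookkeeping (assumed)
gib = rows * bytes_per_row / 2**30
# Enough, alongside shared_buffers and the OS cache, to push a machine
# with 16GB RAM and 4GB swap deep into swap.
print(f"{gib:.1f} GiB")   # -> 11.2 GiB
```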
Re: [GENERAL] VMWare file system / database corruption
On Sep 21, 2009, at 12:40 PM, Scott Marlowe wrote:

> On Mon, Sep 21, 2009 at 11:09 AM, Tom Duffey wrote:
>> Hi All,
>>
>> We're having numerous problems with a PostgreSQL 8.3.7 database running on
>> a virtual Linux server w/VMWare ESX. This is not by choice and I have been
>> asking the operator of this equipment for details about the disk setup and
>> here's what I got:
>>
>> "We have a SAN that is presenting an NFS share. VMWare sees that share and
>> reads the VMDK files that make up the virtual file system."
>>
>> Does anyone with a better understanding of PostgreSQL and VMWare know if
>> this is an unreliable setup for PostgreSQL? I see things like "NFS" and
>> "VMWare" and start to get worried.
>
> I see VMWare and think performance issues; I see NFS and think dear god
> help us all. Even if properly set up, NFS is a problem waiting to happen,
> and it's not reliable storage for a database in my opinion. That said,
> lots of folks do it. Ask for the NFS mount options from the sysadmin.

Thanks to everyone so far for the insight. I'm trying to get more details about the hardware setup but am not making much progress. Here are some of the errors we're getting. I searched through the archives and they all seem to point at hardware trouble, but is there anything else I should be looking at?

ERROR: invalid page header in block 2 of relation "pg_toast_19466_index"

ERROR: invalid memory alloc request size 1667592311
STATEMENT: COPY public.version_bundle (node_id_hi, node_id_lo, bundle_data) TO stdout;

ERROR: unexpected chunk number 1632 (expected 1629) for toast value 19711 in pg_toast_19184
STATEMENT: COPY public.data_binval (binval_id, binval_data) TO stdout;

ERROR: invalid page header in block 414 of relation "pg_toast_19460_index"

ERROR: could not open segment 1 of relation 1663/16386/16535 (target block 3966127611): No such file or directory

I dealt with some of the above by reindexing or finding and deleting bad rows. I can now successfully dump the database but of course have missing data, so the application is toast. What I'm really wondering now is how to prevent this from happening again and if that means moving the database to new hardware.

Best Regards,

Tom
[GENERAL] VMWare file system / database corruption
Hi All,

We're having numerous problems with a PostgreSQL 8.3.7 database running on a virtual Linux server w/VMWare ESX. This is not by choice and I have been asking the operator of this equipment for details about the disk setup and here's what I got:

"We have a SAN that is presenting an NFS share. VMWare sees that share and reads the VMDK files that make up the virtual file system."

Does anyone with a better understanding of PostgreSQL and VMWare know if this is an unreliable setup for PostgreSQL? I see things like "NFS" and "VMWare" and start to get worried.

Tom

--
Tom Duffey
Technology by Design :: http://techbydesign.com/
p: 414.431.0800
Re: [GENERAL] could not access status of transaction
Hi Tom,

On Mar 25, 2009, at 9:02 PM, Tom Lane wrote:

> Tom Duffey writes:
>> One of our databases suffered a problem yesterday during a normal update,
>> something we have been doing for years. Near the end of the process a
>> foreign key constraint is rebuilt on a table containing several hundred
>> million rows. Rebuilding the constraint failed with the following message:
>>
>> ERROR: could not access status of transaction 4294918145
>> DETAIL: Could not open file "pg_clog/0FFF": No such file or directory.
>
> This looks like a garden-variety data corruption problem to me. Trashed
> rows tend to yield this type of error because the "xmin" transaction ID
> is the first field that the server can check with any amount of finesse.
> 4294918145 is FFFF4001 in hex, saith my calculator, so it looks like a
> bunch of bits went to ones --- or perhaps more likely, the row offset in
> the page header got clobbered and we're looking at some bytes that never
> were a transaction ID at all. So I'd try looking around for flaky RAM,
> failing disks, loose cables, that sort of thing ...

Are you aware of any issues like this related to VMWare ESX? Our PostgreSQL server is running in such an environment and I asked the guys to review your email and they thought maybe this type of corruption could happen when the virtual machine was moved from one physical server to another, which we have done once or twice in the past few months.

Tom
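The bit pattern behind the diagnosis can be checked directly. The clog layout constants below are assumptions based on default build options (8 kB BLCKSZ, 2 status bits per transaction, 32 pages per SLRU segment), not taken from this thread:

```python
xid = 4294918145
# The upper 16 bits of this 32-bit xid are all ones, the kind of value
# clobbered header bytes produce rather than a real transaction ID.
print(hex(xid))  # -> 0xffff4001

# pg_clog packs 2 status bits per transaction (4 xacts/byte). With an
# assumed 8 kB page and 32 pages per segment, one clog file covers
# 0x100000 transactions, so this xid lands in the very file the error
# names: pg_clog/0FFF.
XACTS_PER_SEGMENT = 4 * 8192 * 32
print(f"{xid // XACTS_PER_SEGMENT:04X}")  # -> 0FFF
```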
[GENERAL] could not access status of transaction
Hi All,

One of our databases suffered a problem yesterday during a normal update, something we have been doing for years. Near the end of the process a foreign key constraint is rebuilt on a table containing several hundred million rows. Rebuilding the constraint failed with the following message:

ERROR: could not access status of transaction 4294918145
DETAIL: Could not open file "pg_clog/0FFF": No such file or directory.

Here are the table and constraint definitions:

CREATE TABLE point_history (
    point_id integer NOT NULL,
    value real NOT NULL,
    status integer NOT NULL,
    "timestamp" timestamp without time zone NOT NULL
);

ALTER TABLE point_history
    ADD CONSTRAINT point_history_pkey PRIMARY KEY (point_id, "timestamp");
ALTER TABLE point_history
    ADD CONSTRAINT "$1" FOREIGN KEY (point_id) REFERENCES point(id);

I read about this and created the pg_clog/0FFF file, filling it with 256K of zeroes, and then vacuumed the database. Then I tried rebuilding the constraint and received a foreign key violation:

DETAIL: Key (point_id)=(2) is not present in table "point".

The crappy thing about this is that there was no record in the point_history table with point_id = 2:

db=> select * from point_history where point_id = 2;
 point_id | value | status | timestamp
----------+-------+--------+-----------
(0 rows)

I scratched my head for a while and decided to reload the database from a backup, which I'm still working on now. I'm wondering if anyone has any thoughts or ideas about this? I found references to similar problems but they were all for older versions of PostgreSQL. When the problem occurred we were running 8.3.6 and are now running 8.3.7.

Tom