Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
Excerpts from Aren Cambre's message of dom mar 18 23:39:16 -0300 2012: Thanks. I am running this on 1 year old hardware on a wholly-encrypted C drive, encrypted with TrueCrypt. I suspect that I would have been alerted already with that or SMART if I had hardware issues? That sounds excessively optimistic -- not necessarily a good characteristic on a database administrator. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
Thanks. I am running this on 1 year old hardware on a wholly-encrypted C drive, encrypted with TrueCrypt. I suspect that I would have been alerted already with that or SMART if I had hardware issues? That sounds excessively optimistic -- not necessarily a good characteristic on a database administrator. Which is why I could never be a DBA. :-) I am the only user. This is for my doctoral research project. Aren
Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
Thanks. While creating that very file, I discovered that 1 row had blanks in every field despite a column having a *NOT NULL* constraint and another column being a* *serial. Removing that column appears to fixed the problem. Something about that column made Postgres unhappy, though. If I ran these queries: SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NOT NULL UNION SELECT COUNT(*) FROM consistent.master UNION SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NULL I got this result: 2085344 2085343 0 Not clear how adding a WHERE clause, whose only practical effect is to reduce the number of rows returned, could cause *more* rows to be returned. That seems buggy to me. Aren On Fri, Mar 16, 2012 at 12:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: a...@arencambre.com writes: (Note that this issue is described fully at http://stackoverflow.com/questions/9713537/postgres-left-join-is-creating-new-data . A few knowledgeable people have weighed in, and there is no solution identified.) This might be a bug, but you've not provided sufficient information for someone else to reproduce the problem. What would be good is a SQL script that reproduces the error from a standing start (empty database). regards, tom lane
Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
Aren Cambre a...@arencambre.com wrote: SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NOT NULL UNION SELECT COUNT(*) FROM consistent.master UNION SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NULL I got this result: 2085344 2085343 0 Not clear how adding a WHERE clause, whose only practical effect is to reduce the number of rows returned, could cause *more* rows to be returned. That seems buggy to me. Never assume that the rows will be returned in any particular order from a query unless you specify ORDER BY. Assuming, as you seem to be doing, that rows from the left side of a UNION will be output before rows from the right side is not safe. You have no way of knowing which row in a result set like that came from which of the UNIONed SELECTs. In this case your assumption is almost certainly wrong. -Kevin -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
Kevin, You're absolutely correct; there's no telling what order the planner will use. I did in fact run the queries separately and got the same result that I intimated in the UNIONed queries. I can no longer reproduce, however, because I have since altered the table. So if there is a bug, I may not be much help in nailing it down. The only thing I can help with is this appears to have happened at row # 583847 of just over 2 million rows, per a straight CSV dump of the table. Aren On Sun, Mar 18, 2012 at 3:22 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Aren Cambre a...@arencambre.com wrote: SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NOT NULL UNION SELECT COUNT(*) FROM consistent.master UNION SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NULL I got this result: 2085344 2085343 0 Not clear how adding a WHERE clause, whose only practical effect is to reduce the number of rows returned, could cause *more* rows to be returned. That seems buggy to me. Never assume that the rows will be returned in any particular order from a query unless you specify ORDER BY. Assuming, as you seem to be doing, that rows from the left side of a UNION will be output before rows from the right side is not safe. You have no way of knowing which row in a result set like that came from which of the UNIONed SELECTs. In this case your assumption is almost certainly wrong. -Kevin
Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
Kevin Grittner kevin.gritt...@wicourts.gov wrote: Aren Cambre a...@arencambre.com wrote: SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NOT NULL UNION SELECT COUNT(*) FROM consistent.master UNION SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NULL I got this result: 2085344 2085343 0 Not clear how adding a WHERE clause, whose only practical effect is to reduce the number of rows returned, could cause *more* rows to be returned. Never assume that the rows will be returned in any particular order from a query unless you specify ORDER BY. Hmm. That doesn't explain why the numbers don't add up, though. Is that a copy/paste from an actual query run, or was there some hand-editing there? In particular, you might easily get that result if that last line was really: WHERE citation_id = '' instead of the IS NULL test. In the ANSI standard and in PostgreSQL there is a big difference between an empty string and NULL, although there is at least one product I know of which breaks from standard compliance by treating them as equivalent. -Kevin -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
Kevin, Thanks. That was a direct copy/paste. It happened that the *UNION*ed queries spat out those results in the same order that the queries appeared. Just want to again emphasize that my database state has changed, so I am not sure this is remains a good case for finding a bug. Aren On Sun, Mar 18, 2012 at 3:33 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: Aren Cambre a...@arencambre.com wrote: SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NOT NULL UNION SELECT COUNT(*) FROM consistent.master UNION SELECT COUNT(*) FROM consistent.master WHERE citation_id IS NULL I got this result: 2085344 2085343 0 Not clear how adding a WHERE clause, whose only practical effect is to reduce the number of rows returned, could cause *more* rows to be returned. Never assume that the rows will be returned in any particular order from a query unless you specify ORDER BY. Hmm. That doesn't explain why the numbers don't add up, though. Is that a copy/paste from an actual query run, or was there some hand-editing there? In particular, you might easily get that result if that last line was really: WHERE citation_id = '' instead of the IS NULL test. In the ANSI standard and in PostgreSQL there is a big difference between an empty string and NULL, although there is at least one product I know of which breaks from standard compliance by treating them as equivalent. -Kevin
Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
Aren Cambre a...@arencambre.com writes: Just want to again emphasize that my database state has changed, so I am not sure this is remains a good case for finding a bug. Your description of an apparently-all-nulls row sounds a great deal like data corruption. I suspect there may be more wrong than just the one row, and that that's what's leading to the inconsistent results. It'd be worthwhile to run hardware diagnostics (memtest, disk tests), make sure you are on the latest minor release of Postgres, and perhaps do a database dump and reload. (If the latter seems impractical, at least consider reindexing everything.) regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
Thanks. I am running this on 1 year old hardware on a wholly-encrypted C drive, encrypted with TrueCrypt. I suspect that I would have been alerted already with that or SMART if I had hardware issues? I am on 9.1.3, although I think the data was inserted back when I was running 9.0.x. I upgraded to 9.1 almost as soon as it came out, doing the requisite dumps and reloads. I have also reindexed a couple of times as diagnostic steps. So far, looks like only that one row has data corruption. Aren On Sun, Mar 18, 2012 at 9:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Aren Cambre a...@arencambre.com writes: Just want to again emphasize that my database state has changed, so I am not sure this is remains a good case for finding a bug. Your description of an apparently-all-nulls row sounds a great deal like data corruption. I suspect there may be more wrong than just the one row, and that that's what's leading to the inconsistent results. It'd be worthwhile to run hardware diagnostics (memtest, disk tests), make sure you are on the latest minor release of Postgres, and perhaps do a database dump and reload. (If the latter seems impractical, at least consider reindexing everything.) regards, tom lane
Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data
a...@arencambre.com writes: (Note that this issue is described fully at http://stackoverflow.com/questions/9713537/postgres-left-join-is-creating-new-data. A few knowledgeable people have weighed in, and there is no solution identified.) This might be a bug, but you've not provided sufficient information for someone else to reproduce the problem. What would be good is a SQL script that reproduces the error from a standing start (empty database). regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs