Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-19 Thread Alvaro Herrera

Excerpts from Aren Cambre's message of dom mar 18 23:39:16 -0300 2012:
 Thanks. I am running this on 1 year old hardware on a wholly-encrypted C
 drive, encrypted with TrueCrypt. I suspect that I would have been alerted
 already with that or SMART if I had hardware issues?

That sounds excessively optimistic -- not necessarily a good
characteristic on a database administrator.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-19 Thread Aren Cambre

  Thanks. I am running this on 1 year old hardware on a wholly-encrypted C
  drive, encrypted with TrueCrypt. I suspect that I would have been alerted
  already with that or SMART if I had hardware issues?

 That sounds excessively optimistic -- not necessarily a good
 characteristic on a database administrator.


Which is why I could never be a DBA. :-)

I am the only user. This is for my doctoral research project.

Aren


Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-18 Thread Aren Cambre
Thanks. While creating that very file, I discovered that 1 row had blanks
in every field despite a column having a *NOT NULL* constraint and another
column being a* *serial. Removing that column appears to fixed the problem.

Something about that column made Postgres unhappy, though. If I ran these
queries:

SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NOT NULL
UNION
SELECT COUNT(*)
FROM consistent.master
UNION
SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL

I got this result:

2085344
2085343
0

Not clear how adding a WHERE clause, whose only practical effect is to
reduce the number of rows returned, could cause *more* rows to be returned.
That seems buggy to me.

Aren

On Fri, Mar 16, 2012 at 12:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 a...@arencambre.com writes:
  (Note that this issue is described fully at
 
 http://stackoverflow.com/questions/9713537/postgres-left-join-is-creating-new-data
 .
  A few knowledgeable people have weighed in, and there is no solution
  identified.)

 This might be a bug, but you've not provided sufficient information for
 someone else to reproduce the problem.  What would be good is a SQL
 script that reproduces the error from a standing start (empty database).

regards, tom lane



Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-18 Thread Kevin Grittner
Aren Cambre a...@arencambre.com wrote:
 
 SELECT COUNT(*)
 FROM consistent.master
 WHERE citation_id IS NOT NULL
 UNION
 SELECT COUNT(*)
 FROM consistent.master
 UNION
 SELECT COUNT(*)
 FROM consistent.master
 WHERE citation_id IS NULL
 
 I got this result:
 
 2085344
 2085343
 0
 
 Not clear how adding a WHERE clause, whose only practical effect
 is to reduce the number of rows returned, could cause *more* rows
 to be returned. That seems buggy to me.
 
Never assume that the rows will be returned in any particular order
from a query unless you specify ORDER BY.  Assuming, as you seem to
be doing, that rows from the left side of a UNION will be output
before rows from the right side is not safe.  You have no way of
knowing which row in a result set like that came from which of the
UNIONed SELECTs.  In this case your assumption is almost certainly
wrong.
 
-Kevin

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-18 Thread Aren Cambre
Kevin,

You're absolutely correct; there's no telling what order the planner will
use.

I did in fact run the queries separately and got the same result that I
intimated in the UNIONed queries. I can no longer reproduce, however,
because I have since altered the table. So if there is a bug, I may not be
much help in nailing it down.

The only thing I can help with is this appears to have happened at row
# 583847 of just over 2 million rows, per a straight CSV dump of the table.

Aren

On Sun, Mar 18, 2012 at 3:22 PM, Kevin Grittner kevin.gritt...@wicourts.gov
 wrote:

 Aren Cambre a...@arencambre.com wrote:

  SELECT COUNT(*)
  FROM consistent.master
  WHERE citation_id IS NOT NULL
  UNION
  SELECT COUNT(*)
  FROM consistent.master
  UNION
  SELECT COUNT(*)
  FROM consistent.master
  WHERE citation_id IS NULL
 
  I got this result:
 
  2085344
  2085343
  0
 
  Not clear how adding a WHERE clause, whose only practical effect
  is to reduce the number of rows returned, could cause *more* rows
  to be returned. That seems buggy to me.

 Never assume that the rows will be returned in any particular order
 from a query unless you specify ORDER BY.  Assuming, as you seem to
 be doing, that rows from the left side of a UNION will be output
 before rows from the right side is not safe.  You have no way of
 knowing which row in a result set like that came from which of the
 UNIONed SELECTs.  In this case your assumption is almost certainly
 wrong.

 -Kevin



Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-18 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Aren Cambre a...@arencambre.com wrote:
 
 SELECT COUNT(*)
 FROM consistent.master
 WHERE citation_id IS NOT NULL
 UNION
 SELECT COUNT(*)
 FROM consistent.master
 UNION
 SELECT COUNT(*)
 FROM consistent.master
 WHERE citation_id IS NULL
 
 I got this result:
 
 2085344
 2085343
 0
 
 Not clear how adding a WHERE clause, whose only practical effect
 is to reduce the number of rows returned, could cause *more* rows
 to be returned.
 
 Never assume that the rows will be returned in any particular
 order from a query unless you specify ORDER BY.
 
Hmm.  That doesn't explain why the numbers don't add up, though. Is
that a copy/paste from an actual query run, or was there some
hand-editing there?  In particular, you might easily get that result
if that last line was really:
 
  WHERE citation_id = ''
 
instead of the IS NULL test.  In the ANSI standard and in PostgreSQL
there is a big difference between an empty string and NULL, although
there is at least one product I know of which breaks from standard
compliance by treating them as equivalent.
 
-Kevin

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-18 Thread Aren Cambre
Kevin,

Thanks. That was a direct copy/paste. It happened that the *UNION*ed
queries spat out those results in the same order that the queries appeared.

Just want to again emphasize that my database state has changed, so I am
not sure this is remains a good case for finding a bug.

Aren

On Sun, Mar 18, 2012 at 3:33 PM, Kevin Grittner kevin.gritt...@wicourts.gov
 wrote:

 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
  Aren Cambre a...@arencambre.com wrote:
 
  SELECT COUNT(*)
  FROM consistent.master
  WHERE citation_id IS NOT NULL
  UNION
  SELECT COUNT(*)
  FROM consistent.master
  UNION
  SELECT COUNT(*)
  FROM consistent.master
  WHERE citation_id IS NULL
 
  I got this result:
 
  2085344
  2085343
  0
 
  Not clear how adding a WHERE clause, whose only practical effect
  is to reduce the number of rows returned, could cause *more* rows
  to be returned.

  Never assume that the rows will be returned in any particular
  order from a query unless you specify ORDER BY.

 Hmm.  That doesn't explain why the numbers don't add up, though. Is
 that a copy/paste from an actual query run, or was there some
 hand-editing there?  In particular, you might easily get that result
 if that last line was really:

  WHERE citation_id = ''

 instead of the IS NULL test.  In the ANSI standard and in PostgreSQL
 there is a big difference between an empty string and NULL, although
 there is at least one product I know of which breaks from standard
 compliance by treating them as equivalent.

 -Kevin



Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-18 Thread Tom Lane
Aren Cambre a...@arencambre.com writes:
 Just want to again emphasize that my database state has changed, so I am
 not sure this is remains a good case for finding a bug.

Your description of an apparently-all-nulls row sounds a great deal like
data corruption.  I suspect there may be more wrong than just the one
row, and that that's what's leading to the inconsistent results.  It'd
be worthwhile to run hardware diagnostics (memtest, disk tests), make
sure you are on the latest minor release of Postgres, and perhaps do a
database dump and reload.  (If the latter seems impractical, at least
consider reindexing everything.)

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-18 Thread Aren Cambre
Thanks. I am running this on 1 year old hardware on a wholly-encrypted C
drive, encrypted with TrueCrypt. I suspect that I would have been alerted
already with that or SMART if I had hardware issues?

I am on 9.1.3, although I think the data was inserted back when I was
running 9.0.x. I upgraded to 9.1 almost as soon as it came out, doing the
requisite dumps and reloads. I have also reindexed a couple of times as
diagnostic steps.

So far, looks like only that one row has data corruption.

Aren

On Sun, Mar 18, 2012 at 9:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Aren Cambre a...@arencambre.com writes:
  Just want to again emphasize that my database state has changed, so I am
  not sure this is remains a good case for finding a bug.

 Your description of an apparently-all-nulls row sounds a great deal like
 data corruption.  I suspect there may be more wrong than just the one
 row, and that that's what's leading to the inconsistent results.  It'd
 be worthwhile to run hardware diagnostics (memtest, disk tests), make
 sure you are on the latest minor release of Postgres, and perhaps do a
 database dump and reload.  (If the latter seems impractical, at least
 consider reindexing everything.)

regards, tom lane



Re: [BUGS] BUG #6535: LEFT JOIN on large table is altering data

2012-03-16 Thread Tom Lane
a...@arencambre.com writes:
 (Note that this issue is described fully at
 http://stackoverflow.com/questions/9713537/postgres-left-join-is-creating-new-data.
 A few knowledgeable people have weighed in, and there is no solution
 identified.)

This might be a bug, but you've not provided sufficient information for
someone else to reproduce the problem.  What would be good is a SQL
script that reproduces the error from a standing start (empty database).

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs