Re: [HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint

2006-06-07 Thread Kenneth Marshall
Travis,

We have used postgresql 7.4, 8.0, and 8.1 with DSPAM and have
never had a single problem like you are describing. In the past
on this mailing list, these sorts of issues have been caused by
hardware problems on the DB server in some cases. Good luck with
tracking it down.

Ken

On Tue, Jun 06, 2006 at 05:45:26AM -0400, Travis Cross wrote:
 I have a table that I am using to store email token data for DSPAM.
 
 I'm noticing that a handful (4-16) of rows with duplicate columns
 (uid,token) are sneaking into the table every day despite the
 primary key constraint.
 
 The server currently processes a few thousand emails per day, and
 this particular table currently has about 4.5 million rows in it.
 
 I feel as though I must be missing something here, because I have
 always strongly assumed that postgresql prevents this sort of chaos
 from happening by default.
 
 When these duplicate pairs make their way into the table, all havoc
 breaks loose ;)  The rows with the duplicate pairs seem to become
 land mines.  The postgresql process handling a query that
 subsequently runs across one of these rows dies, taking down
 the DSPAM daemon with it, and sometimes corrupting the postgresql
 shared memory enough that the postmaster has to shutdown the other
 processes and restart everything anew [1].
 
 I am usually able to clean things up by running the following, but
 once or twice I've had to drop the unique constraint before
 postgresql would process the request without choking:
 
 delete from dspam_token_data
where row(uid,token) in
(select uid,token
  from dspam_token_data
  group by uid,token
  having count(*)  1);
 
 (I don't worry about preserving one of the duplicates here.)
 
 I'm running postgresql-8.1.3.  Here is the table in question:
 
 CREATE TABLE dspam.dspam_token_data
 (
uid int4 NOT NULL,
token int8 NOT NULL,
spam_hits int4,
innocent_hits int4,
last_hit date,
CONSTRAINT dspam_token_data_pkey PRIMARY KEY (uid, token)
 )
 WITHOUT OIDS;
 
 [2]
 
 What steps can I take to fix this?  Is there any additional
 information I can provide?
 
 I've cleaned the table many times now.  I then run VACUUM ANALYZE.
 
 My next step will probably be to hack the DSPAM sources to make the
 application more careful about not trying to insert rows that would
 violate the unique constraint.  Even still, it seems that would only
 reduce the frequency of these occurrences, not eliminate them
 completely.
 
 Thanks!
 
 Cheers,
 
 -- Travis
 
 
 
 Notes:
 [1] A condensed log file showing off exactly what happens here is
 attached.
 
 [2]  Previously, the table lacked a primary key and instead used a
 unique constraint and index.  This yielded the exact same results I
 am currently seeing using a two-column primary key, as above.  The
 old table schema was:
 
 CREATE TABLE dspam.dspam_token_data
 (
uid int4,
token int8,
spam_hits int4,
innocent_hits int4,
last_hit date,
CONSTRAINT dspam_token_data_uid_key UNIQUE (uid, token)
 )
 WITHOUT OIDS;
 

 Jun  5 17:58:14 shadow postgres[28775]: [21-1] PANIC:  right sibling's 
 left-link doesn't match
 Jun  5 17:58:14 shadow postgres[28775]: [21-2] STATEMENT:  PREPARE 
 dspam_update_plan (bigint) AS UPDATE dspam_token_data SET last_hit = 
 CURRENT_DATE, innocent_hits =
 Jun  5 17:58:14 shadow postgres[28775]: [21-3]  innocent_hits + 1 WHERE uid = 
 '1' AND token = $1;PREPARE dspam_insert_plan (bigint, int, int) AS INSERT 
 INTO dspam_token_data
 Jun  5 17:58:14 shadow postgres[28775]: [21-4]  (uid, token, spam_hits, 
 innocent_hits, last_hit) VALUES (1, $1, $2, $3, CURRENT_DATE);
 Jun  5 17:58:14 shadow dspam[7780]: query error: (null): see sql.errors for 
 more details
 Jun  5 17:58:14 shadow postgres[10566]: [21-1] LOG:  server process (PID 
 28775) was terminated by signal 6
 Jun  5 17:58:14 shadow postgres[10566]: [22-1] LOG:  terminating any other 
 active server processes
 Jun  5 17:58:14 shadow postgres[7219]: [21-1] WARNING:  terminating 
 connection because of crash of another server process
 Jun  5 17:58:14 shadow postgres[7219]: [21-2] DETAIL:  The postmaster has 
 commanded this server process to roll back the current transaction and exit, 
 because another server
 Jun  5 17:58:14 shadow postgres[7219]: [21-3]  process exited abnormally and 
 possibly corrupted shared memory.
 
 Jun  5 17:58:14 shadow postgres[7525]: [24-1] LOG:  database system was 
 interrupted at 2006-06-05 17:58:01 UTC
 Jun  5 17:58:14 shadow postgres[7525]: [25-1] LOG:  checkpoint record is at 
 16/DA1D6868
 Jun  5 17:58:14 shadow postgres[7525]: [26-1] LOG:  redo record is at 
 16/DA0081D8; undo record is at 0/0; shutdown FALSE
 Jun  5 17:58:14 shadow postgres[7525]: [27-1] LOG:  next transaction ID: 
 5691748; next OID: 148615
 Jun  5 17:58:14 shadow postgres[7525]: [28-1] LOG:  next MultiXactId: 4; next 
 MultiXactOffset: 10
 Jun  5 17:58:14 shadow postgres[7525]: [29-1] LOG:  database 

[HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint

2006-06-06 Thread Travis Cross

I have a table that I am using to store email token data for DSPAM.

I'm noticing that a handful (4-16) of rows with duplicate columns
(uid,token) are sneaking into the table every day despite the
primary key constraint.

The server currently processes a few thousand emails per day, and
this particular table currently has about 4.5 million rows in it.

I feel as though I must be missing something here, because I have
always strongly assumed that postgresql prevents this sort of chaos
from happening by default.

When these duplicate pairs make their way into the table, all havoc
breaks loose ;)  The rows with the duplicate pairs seem to become
land mines.  The postgresql process handling a query that
subsequently runs across one of these rows dies, taking down
the DSPAM daemon with it, and sometimes corrupting the postgresql
shared memory enough that the postmaster has to shutdown the other
processes and restart everything anew [1].

I am usually able to clean things up by running the following, but
once or twice I've had to drop the unique constraint before
postgresql would process the request without choking:

delete from dspam_token_data
   where row(uid,token) in
   (select uid,token
 from dspam_token_data
 group by uid,token
 having count(*)  1);

(I don't worry about preserving one of the duplicates here.)

I'm running postgresql-8.1.3.  Here is the table in question:

CREATE TABLE dspam.dspam_token_data
(
   uid int4 NOT NULL,
   token int8 NOT NULL,
   spam_hits int4,
   innocent_hits int4,
   last_hit date,
   CONSTRAINT dspam_token_data_pkey PRIMARY KEY (uid, token)
)
WITHOUT OIDS;

[2]

What steps can I take to fix this?  Is there any additional
information I can provide?

I've cleaned the table many times now.  I then run VACUUM ANALYZE.

My next step will probably be to hack the DSPAM sources to make the
application more careful about not trying to insert rows that would
violate the unique constraint.  Even still, it seems that would only
reduce the frequency of these occurrences, not eliminate them
completely.

Thanks!

Cheers,

-- Travis



Notes:
[1] A condensed log file showing off exactly what happens here is
attached.

[2]  Previously, the table lacked a primary key and instead used a
unique constraint and index.  This yielded the exact same results I
am currently seeing using a two-column primary key, as above.  The
old table schema was:

CREATE TABLE dspam.dspam_token_data
(
   uid int4,
   token int8,
   spam_hits int4,
   innocent_hits int4,
   last_hit date,
   CONSTRAINT dspam_token_data_uid_key UNIQUE (uid, token)
)
WITHOUT OIDS;

Jun  5 17:58:14 shadow postgres[28775]: [21-1] PANIC:  right sibling's 
left-link doesn't match
Jun  5 17:58:14 shadow postgres[28775]: [21-2] STATEMENT:  PREPARE 
dspam_update_plan (bigint) AS UPDATE dspam_token_data SET last_hit = 
CURRENT_DATE, innocent_hits =
Jun  5 17:58:14 shadow postgres[28775]: [21-3]  innocent_hits + 1 WHERE uid = 
'1' AND token = $1;PREPARE dspam_insert_plan (bigint, int, int) AS INSERT INTO 
dspam_token_data
Jun  5 17:58:14 shadow postgres[28775]: [21-4]  (uid, token, spam_hits, 
innocent_hits, last_hit) VALUES (1, $1, $2, $3, CURRENT_DATE);
Jun  5 17:58:14 shadow dspam[7780]: query error: (null): see sql.errors for 
more details
Jun  5 17:58:14 shadow postgres[10566]: [21-1] LOG:  server process (PID 28775) 
was terminated by signal 6
Jun  5 17:58:14 shadow postgres[10566]: [22-1] LOG:  terminating any other 
active server processes
Jun  5 17:58:14 shadow postgres[7219]: [21-1] WARNING:  terminating connection 
because of crash of another server process
Jun  5 17:58:14 shadow postgres[7219]: [21-2] DETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and exit, 
because another server
Jun  5 17:58:14 shadow postgres[7219]: [21-3]  process exited abnormally and 
possibly corrupted shared memory.

Jun  5 17:58:14 shadow postgres[7525]: [24-1] LOG:  database system was 
interrupted at 2006-06-05 17:58:01 UTC
Jun  5 17:58:14 shadow postgres[7525]: [25-1] LOG:  checkpoint record is at 
16/DA1D6868
Jun  5 17:58:14 shadow postgres[7525]: [26-1] LOG:  redo record is at 
16/DA0081D8; undo record is at 0/0; shutdown FALSE
Jun  5 17:58:14 shadow postgres[7525]: [27-1] LOG:  next transaction ID: 
5691748; next OID: 148615
Jun  5 17:58:14 shadow postgres[7525]: [28-1] LOG:  next MultiXactId: 4; next 
MultiXactOffset: 10
Jun  5 17:58:14 shadow postgres[7525]: [29-1] LOG:  database system was not 
properly shut down; automatic recovery in progress
Jun  5 17:58:14 shadow postgres[7525]: [30-1] LOG:  redo starts at 16/DA0081D8
Jun  5 17:58:14 shadow postgres[7525]: [31-1] LOG:  record with zero length at 
16/DC2858F8
Jun  5 17:58:14 shadow postgres[7525]: [32-1] LOG:  redo done at 16/DC2858C8
Jun  5 17:58:16 shadow postgres[7542]: [24-1] FATAL:  the database system is 
starting up
Jun  5 17:58:16 shadow postgres[7543]: [24-1] FATAL:  the database 

Re: [HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint

2006-06-06 Thread Tom Lane
Travis Cross [EMAIL PROTECTED] writes:
 I'm noticing that a handful (4-16) of rows with duplicate columns
 (uid,token) are sneaking into the table every day despite the
 primary key constraint.

Corrupt index, looks like ... you might try reindexing the index.

I don't believe that the PANIC you show has anything directly to do
with duplicate entries.  It is a symptom of corrupt index structure.
Now a corrupt index might also explain failure to notice duplications,
but changing your application isn't going to fix whatever is causing
it.  You need to look for server-side causes.

Any database or system crashes on this server (before this problem
started)?  Do you *know* that the disk drive will not lie about write
complete?  What is the platform and storage system, anyway?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint

2006-06-06 Thread Ian Barwick

On 6/6/06, Tom Lane [EMAIL PROTECTED] wrote:

Travis Cross [EMAIL PROTECTED] writes:
 I'm noticing that a handful (4-16) of rows with duplicate columns
 (uid,token) are sneaking into the table every day despite the
 primary key constraint.

Corrupt index, looks like ... you might try reindexing the index.

I don't believe that the PANIC you show has anything directly to do
with duplicate entries.  It is a symptom of corrupt index structure.
Now a corrupt index might also explain failure to notice duplications,
but changing your application isn't going to fix whatever is causing
it.  You need to look for server-side causes.

Any database or system crashes on this server (before this problem
started)?  Do you *know* that the disk drive will not lie about write
complete?  What is the platform and storage system, anyway?


FWIW I've seen similar behaviour to this (PostgreSQL processes exiting
abnormally, index corruption with duplicate primary keys) on servers
with defective RAM chips.

Ian Barwick

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match