I have a table that I am using to store email token data for DSPAM.

I'm noticing that a handful (4-16) of rows with duplicate columns
(uid,token) are sneaking into the table every day despite the
primary key constraint.

The server currently processes a few thousand emails per day, and
this particular table currently has about 4.5 million rows in it.

I feel as though I must be missing something here, because I have
always strongly assumed that postgresql prevents this sort of chaos
from happening by default.

When these duplicate pairs make their way into the table, all havoc
breaks loose ;)  The rows with the duplicate pairs seem to become
land mines.  The postgresql process handling a query that
subsequently runs across one of these rows dies, taking down
the DSPAM daemon with it, and sometimes corrupting the postgresql
shared memory enough that the postmaster has to shutdown the other
processes and restart everything anew [1].

I am usually able to clean things up by running the following, but
once or twice I've had to drop the unique constraint before
postgresql would process the request without choking:

delete from dspam_token_data
   where row(uid,token) in
           (select uid,token
             from dspam_token_data
             group by uid,token
             having count(*) > 1);

(I don't worry about preserving one of the duplicates here.)

I'm running postgresql-8.1.3.  Here is the table in question:

CREATE TABLE dspam.dspam_token_data
(
   uid int4 NOT NULL,
   token int8 NOT NULL,
   spam_hits int4,
   innocent_hits int4,
   last_hit date,
   CONSTRAINT dspam_token_data_pkey PRIMARY KEY (uid, token)
)
WITHOUT OIDS;

[2]

What steps can I take to fix this?  Is there any additional
information I can provide?

I've cleaned the table many times now.  I then run VACUUM ANALYZE.

My next step will probably be to hack the DSPAM sources to make the
application more careful about not trying to insert rows that would
violate the unique constraint.  Even still, it seems that would only
reduce the frequency of these occurrences, not eliminate them
completely.

Thanks!

Cheers,

-- Travis


----
Notes:
[1] A condensed log file showing off exactly what happens here is
attached.

[2]  Previously, the table lacked a primary key and instead used a
unique constraint and index.  This yielded the exact same results I
am currently seeing using a two-column primary key, as above.  The
old table schema was:

CREATE TABLE dspam.dspam_token_data
(
   uid int4,
   token int8,
   spam_hits int4,
   innocent_hits int4,
   last_hit date,
   CONSTRAINT dspam_token_data_uid_key UNIQUE (uid, token)
)
WITHOUT OIDS;

Jun  5 17:58:14 shadow postgres[28775]: [21-1] PANIC:  right sibling's 
left-link doesn't match
Jun  5 17:58:14 shadow postgres[28775]: [21-2] STATEMENT:  PREPARE 
dspam_update_plan (bigint) AS UPDATE dspam_token_data SET last_hit = 
CURRENT_DATE, innocent_hits =
Jun  5 17:58:14 shadow postgres[28775]: [21-3]  innocent_hits + 1 WHERE uid = 
'1' AND token = $1;PREPARE dspam_insert_plan (bigint, int, int) AS INSERT INTO 
dspam_token_data
Jun  5 17:58:14 shadow postgres[28775]: [21-4]  (uid, token, spam_hits, 
innocent_hits, last_hit) VALUES (1, $1, $2, $3, CURRENT_DATE);
Jun  5 17:58:14 shadow dspam[7780]: query error: (null): see sql.errors for 
more details
Jun  5 17:58:14 shadow postgres[10566]: [21-1] LOG:  server process (PID 28775) 
was terminated by signal 6
Jun  5 17:58:14 shadow postgres[10566]: [22-1] LOG:  terminating any other 
active server processes
Jun  5 17:58:14 shadow postgres[7219]: [21-1] WARNING:  terminating connection 
because of crash of another server process
Jun  5 17:58:14 shadow postgres[7219]: [21-2] DETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and exit, 
because another server
Jun  5 17:58:14 shadow postgres[7219]: [21-3]  process exited abnormally and 
possibly corrupted shared memory.
....
Jun  5 17:58:14 shadow postgres[7525]: [24-1] LOG:  database system was 
interrupted at 2006-06-05 17:58:01 UTC
Jun  5 17:58:14 shadow postgres[7525]: [25-1] LOG:  checkpoint record is at 
16/DA1D6868
Jun  5 17:58:14 shadow postgres[7525]: [26-1] LOG:  redo record is at 
16/DA0081D8; undo record is at 0/0; shutdown FALSE
Jun  5 17:58:14 shadow postgres[7525]: [27-1] LOG:  next transaction ID: 
5691748; next OID: 148615
Jun  5 17:58:14 shadow postgres[7525]: [28-1] LOG:  next MultiXactId: 4; next 
MultiXactOffset: 10
Jun  5 17:58:14 shadow postgres[7525]: [29-1] LOG:  database system was not 
properly shut down; automatic recovery in progress
Jun  5 17:58:14 shadow postgres[7525]: [30-1] LOG:  redo starts at 16/DA0081D8
Jun  5 17:58:14 shadow postgres[7525]: [31-1] LOG:  record with zero length at 
16/DC2858F8
Jun  5 17:58:14 shadow postgres[7525]: [32-1] LOG:  redo done at 16/DC2858C8
Jun  5 17:58:16 shadow postgres[7542]: [24-1] FATAL:  the database system is 
starting up
Jun  5 17:58:16 shadow postgres[7543]: [24-1] FATAL:  the database system is 
starting up
Jun  5 17:58:24 shadow postgres[7525]: [33-1] LOG:  database system is ready
Jun  5 17:58:24 shadow postgres[7525]: [34-1] LOG:  transaction ID wrap limit 
is 1078513827, limited by database "xmail"
Jun  5 17:59:24 shadow postgres[7632]: [24-1] LOG:  unexpected EOF on client 
connection
Jun  5 17:59:24 shadow postgres[7768]: [24-1] LOG:  unexpected EOF on client 
connection
Jun  5 17:59:37 shadow postgres[7836]: [24-1] WARNING:  there is no transaction 
in progress

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Reply via email to