I have a table that I am using to store email token data for DSPAM.
I'm noticing that a handful (4-16) of rows with duplicate columns (uid,token) are sneaking into the table every day despite the primary key constraint. The server currently processes a few thousand emails per day, and this particular table currently has about 4.5 million rows in it. I feel as though I must be missing something here, because I have always strongly assumed that postgresql prevents this sort of chaos from happening by default. When these duplicate pairs make their way into the table, all havoc breaks loose ;) The rows with the duplicate pairs seem to become land mines. The postgresql process handling a query that subsequently runs across one of these rows dies, taking down the DSPAM daemon with it, and sometimes corrupting the postgresql shared memory enough that the postmaster has to shutdown the other processes and restart everything anew [1]. I am usually able to clean things up by running the following, but once or twice I've had to drop the unique constraint before postgresql would process the request without choking: delete from dspam_token_data where row(uid,token) in (select uid,token from dspam_token_data group by uid,token having count(*) > 1); (I don't worry about preserving one of the duplicates here.) I'm running postgresql-8.1.3. Here is the table in question: CREATE TABLE dspam.dspam_token_data ( uid int4 NOT NULL, token int8 NOT NULL, spam_hits int4, innocent_hits int4, last_hit date, CONSTRAINT dspam_token_data_pkey PRIMARY KEY (uid, token) ) WITHOUT OIDS; [2] What steps can I take to fix this? Is there any additional information I can provide? I've cleaned the table many times now. I then run VACUUM ANALYZE. My next step will probably be to hack the DSPAM sources to make the application more careful about not trying to insert rows that would violate the unique constraint. Even still, it seems that would only reduce the frequency of these occurrences, not eliminate them completely. Thanks! Cheers, -- Travis ---- Notes: [1] A condensed log file showing off exactly what happens here is attached. [2] Previously, the table lacked a primary key and instead used a unique constraint and index. This yielded the exact same results I am currently seeing using a two-column primary key, as above. The old table schema was: CREATE TABLE dspam.dspam_token_data ( uid int4, token int8, spam_hits int4, innocent_hits int4, last_hit date, CONSTRAINT dspam_token_data_uid_key UNIQUE (uid, token) ) WITHOUT OIDS;
Jun 5 17:58:14 shadow postgres[28775]: [21-1] PANIC: right sibling's left-link doesn't match Jun 5 17:58:14 shadow postgres[28775]: [21-2] STATEMENT: PREPARE dspam_update_plan (bigint) AS UPDATE dspam_token_data SET last_hit = CURRENT_DATE, innocent_hits = Jun 5 17:58:14 shadow postgres[28775]: [21-3] innocent_hits + 1 WHERE uid = '1' AND token = $1;PREPARE dspam_insert_plan (bigint, int, int) AS INSERT INTO dspam_token_data Jun 5 17:58:14 shadow postgres[28775]: [21-4] (uid, token, spam_hits, innocent_hits, last_hit) VALUES (1, $1, $2, $3, CURRENT_DATE); Jun 5 17:58:14 shadow dspam[7780]: query error: (null): see sql.errors for more details Jun 5 17:58:14 shadow postgres[10566]: [21-1] LOG: server process (PID 28775) was terminated by signal 6 Jun 5 17:58:14 shadow postgres[10566]: [22-1] LOG: terminating any other active server processes Jun 5 17:58:14 shadow postgres[7219]: [21-1] WARNING: terminating connection because of crash of another server process Jun 5 17:58:14 shadow postgres[7219]: [21-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server Jun 5 17:58:14 shadow postgres[7219]: [21-3] process exited abnormally and possibly corrupted shared memory. .... Jun 5 17:58:14 shadow postgres[7525]: [24-1] LOG: database system was interrupted at 2006-06-05 17:58:01 UTC Jun 5 17:58:14 shadow postgres[7525]: [25-1] LOG: checkpoint record is at 16/DA1D6868 Jun 5 17:58:14 shadow postgres[7525]: [26-1] LOG: redo record is at 16/DA0081D8; undo record is at 0/0; shutdown FALSE Jun 5 17:58:14 shadow postgres[7525]: [27-1] LOG: next transaction ID: 5691748; next OID: 148615 Jun 5 17:58:14 shadow postgres[7525]: [28-1] LOG: next MultiXactId: 4; next MultiXactOffset: 10 Jun 5 17:58:14 shadow postgres[7525]: [29-1] LOG: database system was not properly shut down; automatic recovery in progress Jun 5 17:58:14 shadow postgres[7525]: [30-1] LOG: redo starts at 16/DA0081D8 Jun 5 17:58:14 shadow postgres[7525]: [31-1] LOG: record with zero length at 16/DC2858F8 Jun 5 17:58:14 shadow postgres[7525]: [32-1] LOG: redo done at 16/DC2858C8 Jun 5 17:58:16 shadow postgres[7542]: [24-1] FATAL: the database system is starting up Jun 5 17:58:16 shadow postgres[7543]: [24-1] FATAL: the database system is starting up Jun 5 17:58:24 shadow postgres[7525]: [33-1] LOG: database system is ready Jun 5 17:58:24 shadow postgres[7525]: [34-1] LOG: transaction ID wrap limit is 1078513827, limited by database "xmail" Jun 5 17:59:24 shadow postgres[7632]: [24-1] LOG: unexpected EOF on client connection Jun 5 17:59:24 shadow postgres[7768]: [24-1] LOG: unexpected EOF on client connection Jun 5 17:59:37 shadow postgres[7836]: [24-1] WARNING: there is no transaction in progress
---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match