I'm trying to track down the cause of some duplicate rows in a table
which I would expect to be impossible due to a unique constraint. I'm
hoping that somebody here will be able to suggest something I might have
missed.
The problem relates to a bug filed against our application
(https://github.com/matrix-org/synapse/issues/6696). At first I put this
down to random data corruption on a single user's postgres instance, but
I've now seen three separate reports in as many days and am wondering if
there is more to it.
We have a table whose schema is as follows:
synapse=# \d current_state_events
Table "public.current_state_events"
Column | Type | Modifiers
------------+------+-----------
event_id | text | not null
room_id | text | not null
type | text | not null
state_key | text | not null
membership | text |
Indexes:
"current_state_events_event_id_key" UNIQUE CONSTRAINT, btree (event_id)
"current_state_events_room_id_type_state_key_key" UNIQUE
CONSTRAINT, btree (room_id, type, state_key)
"current_state_events_member_index" btree (state_key) WHERE type =
'm.room.member'::text
Despite the presence of the
current_state_events_room_id_type_state_key_key constraint, several
users have reported seeing errors which suggest that their tables have
duplicate rows for the same (room_id, type, state_key) triplet and
indeed querying confirms that to be the case:
synapse=> select count(*), room_id, type, state_key from
current_state_events group by 2,3,4 order by count(*) DESC LIMIT 2;
count | room_id | type | state_key
-------+-----------------------------------+---------------+-------------------------------------
3 | !ueLfVrSWYGMnFnoCbT:darkfasel.net | m.room.member |
@irc_ebi_:darkfasel.net
3 | !HwocBmCtBcHQhILtYQ:matrix.org | m.room.member |
@freenode_AlmarShenwan_:matrix.org
(2 rows)
Further investigation suggests that these are genuinely separate rows
rather than duplicate entries in an index.
The index appears to consider itself valid:
synapse=> select i.* from pg_class c join pg_index i on
i.indexrelid=c.oid where
relname='current_state_events_room_id_type_state_key_key';
indexrelid | indrelid | indnatts | indisunique | indisprimary |
indisexclusion | indimmediate | indisclustered | indisvalid |
indcheckxmin | indisready | indislive | indisreplident | indkey |
indcollation | indclass | indoption | indexprs | indpred
------------+----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+-----------+----------------+--------+--------------+----------------+-----------+----------+---------
17023 | 16456 | 3 | t | f | f
| t | f | t | f | t
| t | f | 2 3 4 | 100 100 100 | 3126
3126 3126 | 0 0 0 | |
(1 row)
So, question: what could we be doing wrong to get ourselves into this
situation?
Some other datapoints which may be relevant:
* this has been reported by one user on postgres 9.6.15 and one on
10.10, though it's hard to be certain of the version that was running
when the duplication occurred
* the constraint is added when the table is first created (before any
data is added)
* At least one user reports that he has recently migrated his database
from one server to another via a `pg_dump -C` and later piping into psql.