I just had some interaction with RhodiumToad on IRC about a duplicated primary key problem I ran into today. After some poking around he suggested that I send this to -bugs since it seems like an interesting error.
The short version is that I have a table ("ways") whose primary key ("id")
has managed to get 4 of its records duplicated (out of ~168 million):

select id, version, tstamp from ways
 where id in (26245218,26709186,26709284,26926573)
 order by id, version;

    id    | version |       tstamp
----------+---------+---------------------
 26245218 |      12 | 2011-02-06 06:54:10
 26245218 |      13 | 2013-01-28 02:37:56
 26709186 |       4 | 2008-09-02 04:39:21
 26709186 |       4 | 2008-09-02 04:39:21
 26709284 |       4 | 2008-10-26 14:06:03
 26709284 |       5 | 2013-01-28 02:38:30
 26926573 |       4 | 2009-12-27 07:13:28
 26926573 |       4 | 2009-12-27 07:13:28

version and tstamp are just additional data, included to show that two of the
duplicated pairs have identical values while the other two differ. More on
that in a bit. Here is the table definition:

    Column    |            Type             | Modifiers | Storage
--------------+-----------------------------+-----------+----------
 id           | bigint                      | not null  | plain
 version      | integer                     | not null  | plain
 user_id      | integer                     | not null  | plain
 tstamp       | timestamp without time zone | not null  | plain
 changeset_id | bigint                      | not null  | plain
 tags         | hstore                      |           | extended
 nodes        | bigint[]                    |           | extended
 linestring   | geometry(Geometry,4326)     |           | main
Indexes:
    "pk_ways" PRIMARY KEY, btree (id)
    "idx_ways_linestring" gist (linestring), tablespace "ssd"
    "idx_ways_tags" gin (tags)
Has OIDs: no
Tablespace: "ssd"

If you don't read your email with a monospaced font, here is a pastie'd
version: http://pastie.org/5979660

Some background: I am running a PostGIS database filled with OpenStreetMap
data. The database was initially populated with data from the "planet" file
in early January. The process involves creating a set of data files and then
loading them into the database using \copy. The basic load process can be
seen in this file:

https://github.com/openstreetmap/osmosis/blob/master/package/script/pgsnapshot_load_0.6.sql

The file has some sections that are optional, depending on your setup, so I
did not execute everything in there. Taking out only the statements that
affect the problematic ways table:

ALTER TABLE ways DROP CONSTRAINT pk_ways;
DROP INDEX idx_ways_linestring;
\copy ways FROM 'ways.txt'
ALTER TABLE ONLY ways ADD CONSTRAINT pk_ways PRIMARY KEY (id);
CREATE INDEX idx_ways_linestring ON ways USING gist (linestring);

plus an index creation on the hstore "tags" column that isn't in that file.

After the copy and the index builds were done, I set the database up to apply
the OpenStreetMap minutely diff files, which means that every minute updates
are applied to the database. This had been working fine for about 3 weeks,
until this morning the minutely update crashed with a duplicate key error:

Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value
violates unique constraint "pk_ways"
  Detail: Key (id)=(26926573) already exists.

Knowing how the update software works, my first response was "that's
impossible", since it checks for the existence of an object in the database
and issues an UPDATE query if the object exists; otherwise it does an INSERT.
Of course the real problem is that there are already two copies of this
object in the database, so the update software finds an existing record and
issues an UPDATE query, which then fails with the duplicate key error.

As I said, I did the initial import at the beginning of January, so if you
look back up at the data you will see that at least all of the earlier
versions of the objects have a timestamp long before the import.
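To make the update logic described above concrete, here is roughly what the
updater does per object (a simplified sketch with made-up values; the actual
Osmosis code is Java and more involved):

-- Hypothetical sketch only; the values are placeholders, not real data.
SELECT count(*) FROM ways WHERE id = 26926573;

-- A row exists, so an UPDATE is issued rather than an INSERT:
UPDATE ways SET version = 5, tstamp = '2013-02-05 00:00:00'
 WHERE id = 26926573;

With two heap tuples already present for id 26926573, that UPDATE creates two
new row versions, and inserting their pk_ways entries presumably trips the
unique check against each other, which would explain how an UPDATE (rather
than an INSERT) can raise the duplicate key error.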
My first intuition was that this meant there were duplicates in the ways.txt
file that somehow went undetected when I added the PK constraint. While I
don't have the original files any more, I do still have the planet file and
the same software that generated the files, so I just ran that again, and I
do not see any duplicates in the file. This leaves me completely clueless as
to how the duplicates got into the database, especially the one with ID
26709186, since it hasn't been changed in OpenStreetMap in years and so there
is no reason for it to have been touched in any way since the import.

But looking at the timestamps again, it looks like an UPDATE query may have
successfully executed on two of the duplicates during minutely diff
application, updating one of the records for each of ways 26245218 and
26709284. (The timestamp is when the object was last touched in
OpenStreetMap.) These two ways were changed within seconds of each other,
although they don't seem to share anything: they were edited by different
users in different locations.

Here are some queries, and their results, that RhodiumToad had me run to try
to track things down:

select ctid,xmin,xmax,id from ways
 where id in (26926573,26709284,26709186,26245218) order by id;

     ctid      |  xmin  |  xmax  |    id
---------------+--------+--------+----------
 (11249625,3)  | 979685 |      0 | 26245218
 (1501614,5)   | 955993 |      0 | 26245218
 (11247884,18) | 979686 |      0 | 26709186
 (1520052,19)  | 955993 |      0 | 26709186
 (1520056,15)  | 976128 |      0 | 26709284
 (11249780,8)  | 979686 |      0 | 26709284
 (1528888,14)  | 955993 |      0 | 26926573
 (11249622,6)  | 979685 | 982467 | 26926573

( http://pastie.org/5979497 )

The last one there seems to be showing the transaction that failed and was
rolled back earlier this morning.

select * from heap_page_items(get_raw_page('ways',11249622)) where lp=6;

 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 |    t_ctid    | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+--------+--------+----------+--------------+-------------+------------+--------+--------+-------
  6 |   7888 |        1 |    304 | 979685 | 982467 |      143 | (11399071,5) |           8 |      10498 |     24 |        |

( http://pastie.org/pastes/5979566/text )

select i, (x).*
  from (select i, bt_page_items('pk_ways',i::integer) as x
          from generate_series(1, pg_relation_size('pk_ways')/8192 - 1) i
        offset 0) s
 where (x).ctid in ('(11249625,3)','(1501614,5)','(11247884,18)','(1520052,19)',
                    '(1520056,15)','(11249780,8)','(1528888,14)','(11249622,6)');

   i    | itemoffset |     ctid      | itemlen | nulls | vars |          data
--------+------------+---------------+---------+-------+------+-------------------------
  50039 |        181 | (11249625,3)  |      16 | f     | f    | 62 78 90 01 00 00 00 00
  50039 |        183 | (1501614,5)   |      16 | f     | f    | 62 78 90 01 00 00 00 00
 506579 |        214 | (11249622,6)  |      16 | f     | f    | ed dd 9a 01 00 00 00 00
 506579 |        219 | (1528888,14)  |      16 | f     | f    | ed dd 9a 01 00 00 00 00
 535456 |          4 | (11247884,18) |      16 | f     | f    | c2 8c 97 01 00 00 00 00
 535456 |          5 | (1520052,19)  |      16 | f     | f    | c2 8c 97 01 00 00 00 00
 535456 |         77 | (11249780,8)  |      16 | f     | f    | 24 8d 97 01 00 00 00 00
 535456 |         80 | (1520056,15)  |      16 | f     | f    | 24 8d 97 01 00 00 00 00

( http://pastie.org/5980159 )

Additional system information:

OS: Linux Mint 14 (based on Ubuntu 12.10)
PostgreSQL version: PostgreSQL 9.1.7 on x86_64-unknown-linux-gnu, compiled by
gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 64-bit
PostGIS version: 2.0.1 r9979
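A note for anyone who wants to rerun the page-level queries above:
get_raw_page(), heap_page_items() and bt_page_items() are not built in; they
come from the pageinspect contrib module, which on 9.1 can be enabled per
database (assuming the contrib package, e.g. postgresql-contrib-9.1 on
Ubuntu, is installed) with:

CREATE EXTENSION pageinspect;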
The database is running on a 4 disk software RAID5 (yeah... not built for
speed) plus one SSD, which holds the ways table along with its linestring
index (but not the PK index).

And one more thing I just realized: I turned off fsync for the import,
intending to turn it back on for normal usage, but it looks like I never
turned it back on. However, I haven't had any ungraceful shutdowns since the
import, so I don't think this should be a factor. The nature of the database
is that I can rebuild it at any time from the source (although it does take
several days), so I'm not overly vigilant about this.

I have left the database in its current state for now so I can run some more
checks if needed.

Toby
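P.S. In case it is useful to anyone in a similar situation: fsync can be
checked and re-enabled without a restart, since it is a reload-time setting.
For example:

SHOW fsync;               -- reports the currently active setting
-- after changing fsync = on in postgresql.conf:
SELECT pg_reload_conf();  -- picks up the change, no restart needed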