If you've ever loaded 100 million rows, you'll know just how annoying it is to find that you have a duplicate row somewhere in there. Experience shows that there is always one, whatever oath the analyst swears beforehand.
It's hard to find out which row is the duplicate, plus you've just screwed up a big table. It needs a VACUUM, then a reload. I'd like to find a way to handle this manual task programmatically. What I'd like to do is add an ERRORTABLE clause to COPY. The main problem is how we detect a duplicate row violation, yet prevent it from aborting the transaction. What I propose is to log uniqueness violations only when there is only a single unique index on a table. Flow of control would be to: locate page of index where value should go lock index block _bt_check_unique, but don't error if violation then insert row into ERRORTABLE else insert row into data block insert row into unique index unlock index block do other indexes Which is very similar code to the recently proposed MERGE logic. With that logic, a COPY will run to completion, yet be able to report the odd couple of unique index violations in found along the way. More importantly we can then handle rows those with another program to locate where those errors came from and resolve them. In most cases with a single unique index, the index inserts are rightmost index entries anyway, so there is scope here for an additional optimisation: keep both index and data blocks locked across multiple row inserts until either the unique index or the data block fills. Thats better than taking a full table lock, since it allows concurrent access to the rest of the table, but its also more efficient than continually re-requesting the same blocks (which looks like about 10-15% saving on performance from hash lookups, lock/unlock, etc). Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match