Tom Lane wrote:
> Ultimately, there's always going to be a tradeoff between speed and
> flexibility.  It may be that we should just say "if you want to import
> dirty data, it's gonna cost ya" and not worry about the speed penalty
> of subtransaction-per-row.  But that still leaves us with the 2^32
> limit.  I wonder whether we could break down COPY into sub-sub
> transactions to work around that...
Regarding that tradeoff between speed and flexibility, I think we could propose multiple options:
- maximum speed: the current implementation, which fails on the first error
- speed with error logging: the COPY command fails if there is an error but continues to log all errors
- speed with error logging, best effort: no use of sub-transactions, but errors that can safely be trapped with PG_TRY/PG_CATCH (no index violation, no BEFORE INSERT trigger, etc.) are logged and the command can complete
- pre-loading (2-phase copy): phase 1 copies good tuples into a [temp] table and bad tuples into an error table; phase 2 pushes the good tuples to the destination table. Note that if phase 2 fails, it can be retried since the temp table would be dropped only on success of phase 2 (see the sketch after this list)
- slow but flexible: have every row in a sub-transaction -> is there any real benefit compared to pg_loader?
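A minimal sketch of what the pre-loading (2-phase copy) option could look like if done by hand in SQL today; the staging, errors and destination tables, the file path, and the cast-based validity check are all placeholders for illustration:

  -- Phase 1: load raw lines into a one-column staging table so that
  -- malformed values cannot abort the load.
  CREATE TEMP TABLE staging (raw_line text);
  COPY staging FROM '/path/to/data.csv';

  -- Separate the tuples that would fail the integer cast into an error table.
  CREATE TEMP TABLE errors AS
      SELECT raw_line FROM staging
      WHERE split_part(raw_line, ',', 2) !~ '^[0-9]+$';

  -- Phase 2: push the good tuples to the destination table.  The temp
  -- tables only go away once this succeeds, so this phase can be retried.
  INSERT INTO destination (name, qty)
      SELECT split_part(raw_line, ',', 1),
             split_part(raw_line, ',', 2)::int
      FROM staging
      WHERE split_part(raw_line, ',', 2) ~ '^[0-9]+$';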

Tom also suggested 'refactoring COPY into a series of steps that the user can control'. What would these steps be? Would they be per row and allow discarding a bad tuple?
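For what it's worth, the "slow but flexible" per-row behavior can already be approximated in PL/pgSQL, since each EXCEPTION block runs in its own subtransaction; the sketch below reuses the placeholder staging/errors/destination tables from above and is only meant to illustrate the per-row subtransaction cost versus the ability to discard bad tuples:

  -- Every iteration pays for a subtransaction (the EXCEPTION block),
  -- which is the per-row overhead discussed above, but bad tuples are
  -- diverted to the error table instead of aborting the whole load.
  CREATE OR REPLACE FUNCTION load_with_skip() RETURNS void AS $$
  DECLARE
      r record;
  BEGIN
      FOR r IN SELECT raw_line FROM staging LOOP
          BEGIN
              INSERT INTO destination (name, qty)
              VALUES (split_part(r.raw_line, ',', 1),
                      split_part(r.raw_line, ',', 2)::int);
          EXCEPTION WHEN others THEN
              INSERT INTO errors VALUES (r.raw_line);
          END;
      END LOOP;
  END;
  $$ LANGUAGE plpgsql;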

Emmanuel

--
Emmanuel Cecchet
FTO @ Frog Thinker Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: m...@frogthinker.org
Skype: emmanuel_cecchet


