On Wed, Jun 7, 2017 at 12:34 PM, Alex K <kondratov.alek...@gmail.com> wrote:
> (1) One of my mentors, Alvaro Herrera, suggested that I have a look at
> UPSERT. It may be a good point to be able to achieve the same functionality
> as with ON CONFLICT DO NOTHING when COPY actually inserts tuples and error
> handling is turned on. It could additionally reduce the number of failed
> subtransactions and reduce XID consumption, while still ignoring some common
> errors like unique index violations.

Alvaro and I talked about this informally at PGCon.

> Adding full support for ON CONFLICT DO NOTHING/UPDATE to COPY seems to be a
> large, separate task that is out of the current project's scope, but maybe
> there is a relatively simple way to somehow perform tuple inserts internally
> with ON CONFLICT DO NOTHING? I have added Peter Geoghegan to cc, as I
> understand he is the major contributor of UPSERT in PostgreSQL. It would be
> great if he could answer this question.

I think that there is a way of making COPY use "speculative insertion", so that it behaves the same as ON CONFLICT DO NOTHING with no inference specification. Whether or not this is useful depends on a lot of things.
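To make that concrete, this is roughly the SQL-level behavior in question: with no inference specification, ON CONFLICT DO NOTHING swallows conflicts arising from any unique index or exclusion constraint, which is what a speculative-insertion COPY path would emulate internally. (The table names, file path, and staging-table shape below are invented purely for illustration; they are not part of any proposal.)

    -- illustrative schema only; "target" and "staging" are made-up names
    CREATE TABLE target (id int PRIMARY KEY, val text);
    CREATE TEMP TABLE staging (id int, val text);

    COPY staging FROM '/tmp/input.csv' WITH (FORMAT csv);  -- path is made up

    -- no inference specification: conflicts against any unique index or
    -- exclusion constraint are skipped, with no error raised
    INSERT INTO target
    SELECT * FROM staging
    ON CONFLICT DO NOTHING;

Today, a staging table plus INSERT ... ON CONFLICT DO NOTHING is a common workaround for not having anything like this in COPY itself.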
You seem to be talking about doing this as an optimization on top of a base feature that does the main thing you want (captures all errors within an implementation-level subxact without passing them to the client). That could make sense, as a way of preventing extreme bloat in a very bad case where almost all inserts have conflicts. (This seems quite possible, whereas it seems much less likely that users would have an input file simply full of ill-formed tuples.)

I think that you need to more formally identify what errors your new COPY error handling will need to swallow.

I'm not sure if it's possible to avoid using subtransactions altogether, but speculative insertion would help if you find that you can do it without subtransactions. Using subtransactions is always going to be a bit ugly, because you'll need to continually reassess whether or not you're batching insertions together at the right granularity (that is, whether you've weighed the rate of XID consumption against how much work you lose when a batched transaction has to be "replayed" to include the things that are known to be valid). And, if you care about duplicate violations, then you can't really be sure that a replayed "known good" tuple will stay good from one moment to the next.
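Just to illustrate the shape of that trade-off, here is the naive per-row version written as client-side plpgsql, reusing the same made-up staging/target tables as above -- not anything COPY would literally do internally. Every entry into a block with an EXCEPTION clause is a subtransaction, and each one that actually inserts gets assigned an XID.

    -- sketch only: per-row subtransactions at their most expensive
    DO $$
    DECLARE
        rec staging%ROWTYPE;
    BEGIN
        FOR rec IN SELECT * FROM staging LOOP
            BEGIN
                -- entering this block starts a subtransaction; it is
                -- assigned an XID as soon as the INSERT writes
                INSERT INTO target (id, val) VALUES (rec.id, rec.val);
            EXCEPTION WHEN unique_violation THEN
                NULL;  -- swallow the duplicate row and carry on
            END;
        END LOOP;
    END;
    $$;

Batching N rows per subtransaction divides the XID cost by roughly N, but then one bad row forces a rollback and a replay of the other N-1 rows -- and that's the granularity question you'd be stuck continually re-tuning.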
My advice right now is: see if you can figure out a way of doing what you want without subtransactions at all, possibly by cutting some scope. For example, maybe it would be satisfactory to have the implementation just ignore constraint violations, but still raise errors for invalid input for types.

Is there really much value in ignoring errors due to invalid encoding? It's not as if such problems can be reliably detected today. If you use the wrong encoding, and ignore some errors that COPY would ordinarily raise, then there is an excellent chance that you'll still insert some remaining rows whose text has been incorrectly interpreted as valid in the database encoding -- some text datums are bound to accidentally appear valid. There are probably similar issues with other types. It's not clear at what point the user is no longer helped by ignoring problems, because we cannot reliably detect *all* problems at the level of each row.

If you must ignore errors within the input functions of types, then maybe you can optionally let the user do that by way of a "dry run", where the entire input file is examined for basic structural soundness ahead of considering constraints. Any errors are saved then and there, in a format that can be used to make sure that those entries are skipped on a later COPY. As a further enhancement, the user might then be able to define special transform functions that correct the errors for those rows only. You kind of need to see all the faulty rows together to do something like that, so a dry run could make a lot of sense.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers