Thank you for your comments, Peter. There are some points that I had not thought about before.
I am not going to start with "speculative insertion" right now, but it would be very useful if you could give me a pointer on where to start. Maybe I will at least try to evaluate the complexity of the problem.
Initially I was thinking only about malformed rows, e.g. missing or extra columns. Honestly, I did not know that there are so many levels and ways in which an error can occur. So currently (and especially after your comments) I prefer to focus only on the following list of errors:

1) File format issues
   a. Fewer columns than needed
   b. Extra columns

2) I have doubts about type mismatch. It is possible to imagine a situation where, e.g., some integers are exported as int and some as "int", but I am not sure that it is a common situation.

3) Some constraint violations, e.g. unique index.

The first appeared to be easily achievable without subtransactions. I have created a proof-of-concept version of COPY, where the error handling is turned on by default. Please see the small patch attached (applicable to 76b11e8a43eca4612dfccfe7f3ebd293fb8a46ec) or the GUI version on GitHub: https://github.com/ololobus/postgres/pull/1/files. It throws warnings instead of errors for malformed lines with fewer/extra columns and reports the line number.

The second is probably achievable without subtransactions via PG_TRY/PG_CATCH around heap_form_tuple, since the tuple is not yet inserted into the heap.

But the third is questionable without subtransactions, since even if we check constraints once, there may be various before/after triggers that can modify the tuple so that it no longer satisfies them. The corresponding comment inside copy.c states: "Note that a BR trigger might modify tuple such that the partition constraint is no longer satisfied, so we need to check in that case." Thus, there may be different situations here, as I understand. However, it is a point where "speculative insertion" is able to help.

These three cases should cover most real-life scenarios.
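
To make the first case more concrete, here is a minimal standalone sketch of the idea: count the fields in a row, and on a mismatch emit a warning with the line number so that the caller can skip the row instead of aborting the whole load. This is not the code from the attached patch and it does not use any PostgreSQL internals. The names count_fields, check_row and RowStatus are hypothetical, and the sketch assumes plain delimiter-separated text with no quoting or escaping.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes; not taken from copy.c or from the patch. */
typedef enum
{
    ROW_OK,
    ROW_TOO_FEW,
    ROW_TOO_MANY
} RowStatus;

/*
 * Count delimiter-separated fields in one line.
 * Assumption: no quoting or escaping, so every delimiter starts a new field.
 * An empty line therefore counts as a single (empty) field.
 */
int
count_fields(const char *line, char delim)
{
    int nfields = 1;

    assert(line != NULL);
    for (; *line != '\0'; line++)
    {
        if (*line == delim)
            nfields++;
    }
    return nfields;
}

/*
 * Compare the number of fields found with the expected number of columns.
 * On a mismatch, print a warning that carries the line number and return
 * a status, so the caller can skip the row rather than fail the load.
 */
RowStatus
check_row(const char *line, char delim, int expected, long lineno)
{
    int found = count_fields(line, delim);

    if (found < expected)
    {
        fprintf(stderr,
                "WARNING: line %ld: missing data, expected %d columns, found %d\n",
                lineno, expected, found);
        return ROW_TOO_FEW;
    }
    if (found > expected)
    {
        fprintf(stderr,
                "WARNING: line %ld: extra data, expected %d columns, found %d\n",
                lineno, expected, found);
        return ROW_TOO_MANY;
    }
    return ROW_OK;
}
```

A loader built on this would call check_row for every input line and insert only the rows for which it returns ROW_OK. The type mismatch and constraint cases are not covered by this sketch.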
Now, I have some doubts about it too. If there is an encoding problem, it probably affects the whole file, not only a few rows.

Alexey
Attachment: copy-errors-v0.1.patch