>> 2) A modified command syntax for introducing a direct single row error
>> handling. By direct I mean - a row that if rejected from within the COPY
>> command context does not throw an error and rollsback the whole transaction.
>> Instead the error is caught and recorded elsewhere, maybe in some error
>> table, with some more information that can later on be retrieved. The
>> following rows continue to be processed. This way there is barely any error
>> handling overhead.
> 
> Is there any idea on exactly how would this be done?  Do you plan on
> using savepoints to implement it?  I fail to see how is this "barely any
> overhead".  Savepoints are not that expensive but they are not free either.
> (No, I haven't measured it.)

Good question, I am not entirely sure if this is possible yet, as I didn't
think it through entirely yet.

I guess data errors could be divided into 2 main categories: mal-formed data
where error is detected even before forming a tuple, and the other is errors
that are caused by some constraint violation, that is, after the tuple is
formed and inserted.

>From what I hear and experience the big majority of errors are of the first
type. In that case the error could be caught, the data line + line number +
error description could be inserted into an ERROR table (all TEXT fields),
and then COPY can skip forming a tuple, and move to parsing the next. In
this process there is barely any overhead.

The more difficult part obviously is handling the second error type, which I
haven't looked at yet deeply. Hopefully it is not impossible to do while
keeping transaction integrity (Any ideas anyone?). The overhead for this one
will probably be larger, but again, we expect those to happen less (in most
cases at least). Nevertheless, it is surely much faster than recursively
narrowing down batch sizes.


Alon.



---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to