>> 2) A modified command syntax for introducing a direct single row error >> handling. By direct I mean - a row that if rejected from within the COPY >> command context does not throw an error and rollsback the whole transaction. >> Instead the error is caught and recorded elsewhere, maybe in some error >> table, with some more information that can later on be retrieved. The >> following rows continue to be processed. This way there is barely any error >> handling overhead. > > Is there any idea on exactly how would this be done? Do you plan on > using savepoints to implement it? I fail to see how is this "barely any > overhead". Savepoints are not that expensive but they are not free either. > (No, I haven't measured it.)
Good question, I am not entirely sure if this is possible yet, as I didn't think it through entirely yet. I guess data errors could be divided into 2 main categories: mal-formed data where error is detected even before forming a tuple, and the other is errors that are caused by some constraint violation, that is, after the tuple is formed and inserted. >From what I hear and experience the big majority of errors are of the first type. In that case the error could be caught, the data line + line number + error description could be inserted into an ERROR table (all TEXT fields), and then COPY can skip forming a tuple, and move to parsing the next. In this process there is barely any overhead. The more difficult part obviously is handling the second error type, which I haven't looked at yet deeply. Hopefully it is not impossible to do while keeping transaction integrity (Any ideas anyone?). The overhead for this one will probably be larger, but again, we expect those to happen less (in most cases at least). Nevertheless, it is surely much faster than recursively narrowing down batch sizes. Alon. ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match