On Friday 2007-12-14 16:22, Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> > to drop (and log) rows that contain malformed data. That is, rows with
> > too many or too few columns, rows that result in constraint violations,
> > and rows containing columns where the data type's input function raises
> > an error. The last case is the only thing that would be a bit tricky to
> > implement, I think: you could use PG_TRY() around the InputFunctionCall,
> > but I guess you'd need a subtransaction to ensure that you reset your
> > state correctly after catching an error.
>
> Yeah.  It's the subtransaction per row that's daunting --- not only the
> cycles spent for that, but the ensuing limitation to 4G rows imported
> per COPY.

You could extend the COPY FROM syntax with a COMMIT EVERY n clause.  This 
would help with the 4G subtransaction limit.  The cost to the ETL process is 
that a simple rollback would not be guaranteed send the process back to it's 
initial state.  There are easy ways to deal with the rollback issue though.  

A {NO} RETRY {USING algorithm} clause might be useful.   If the NO RETRY 
option is selected then the COPY FROM can run without subtransactions and in 
excess of the 4G per transaction limit.  NO RETRY should be the default since 
it preserves the legacy behavior of COPY FROM.

You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give the 
option of sending exceptions to a table since they are presumably malformed, 
otherwise they would not be exceptions.  (Users should re-process exception 
files if they want an if good then table a else exception to table b ...)

EXCEPTIONS TO and NO RETRY would be mutually exclusive.


> If we could somehow only do a subtransaction per failure, things would
> be much better, but I don't see how.



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Reply via email to