On Saturday 2007-12-15 02:14, Simon Riggs wrote:
> On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
> > Neil Conway <[EMAIL PROTECTED]> writes:
> > > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> > > to drop (and log) rows that contain malformed data. That is, rows with
> > > too many or too few columns, rows that result in constraint violations,
> > > and rows containing columns where the data type's input function raises
> > > an error. The last case is the only thing that would be a bit tricky to
> > > implement, I think: you could use PG_TRY() around the
> > > InputFunctionCall, but I guess you'd need a subtransaction to ensure
> > > that you reset your state correctly after catching an error.
> >
> > Yeah. It's the subtransaction per row that's daunting --- not only the
> > cycles spent for that, but the ensuing limitation to 4G rows imported
> > per COPY.
>
> I'd suggest doing everything at block level
> - wrap each new block of data in a subtransaction
> - apply data to the table block by block (can still work with FSM)
> - apply indexes in bulk for each block, unique ones first
>
> That then gives you a limit of more than 500 trillion rows, which should
> be enough for anyone.
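As an aside, the block-level strategy quoted above can be sketched outside the backend. This is a hypothetical Python simulation, not PostgreSQL internals: it tries each block of rows in one subtransaction, and on failure replays that block one row per subtransaction, counting how many xids (the 32-bit resource behind the 4G limit) get consumed. The function name, `is_bad` predicate, and block size are all illustrative assumptions:

```python
def copy_with_block_subxacts(rows, block_size, is_bad):
    """Simulate error-tolerant COPY with block-level subtransactions.

    Each subtransaction consumes one xid. A clean block costs one xid;
    a block containing a bad row costs one xid for the failed attempt
    plus one xid per row for the row-by-row replay. Bad rows are
    dropped, mimicking a hypothetical COPY ... IGNORE ERRORS.
    Returns (rows_loaded, xids_used).
    """
    loaded = 0
    xids = 0
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        xids += 1  # one subtransaction for the whole block
        if not any(is_bad(r) for r in block):
            loaded += len(block)  # block committed in one subxact
        else:
            # block subxact rolled back; roll forward one row,
            # one subtransaction, at a time
            for r in block:
                xids += 1
                if not is_bad(r):
                    loaded += 1
    return loaded, xids
```

With clean input, 10,000 rows in 100-row blocks cost 100 xids; with one bad row per block, the same input costs 10,100 xids, which is the degradation discussed below.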
Wouldn't it only give you more than 500T rows in the best case? If COPY hits a bad row, it has to back off and replay the failed block one row, one subtransaction, at a time. So in the worst case, where there is at least one bad row per block, I think you would still wind up with a capacity of only about 4G rows.
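To put rough numbers on both claims: the "500 trillion" best case and the ~4G worst case fall out of the same arithmetic, assuming a 32-bit xid space and some illustrative batch size (the 2^17 rows-per-block figure here is my assumption, chosen because it reproduces Simon's number, not something from the thread):

```python
XID_SPACE = 2**32        # 32-bit transaction id space
ROWS_PER_BLOCK = 2**17   # assumed rows per block subtransaction (illustrative)

# best case: every block is clean, so each block costs exactly one xid
best_case_rows = XID_SPACE * ROWS_PER_BLOCK

# worst case: every block contains a bad row, so each block costs
# 1 + ROWS_PER_BLOCK xids to load ROWS_PER_BLOCK rows
worst_case_rows = XID_SPACE * ROWS_PER_BLOCK // (1 + ROWS_PER_BLOCK)

print(best_case_rows)   # 562949953421312, i.e. "more than 500 trillion"
print(worst_case_rows)  # just under 2**32, i.e. back to ~4G rows
```

The worst-case capacity is essentially independent of block size: once every block degrades to per-row subtransactions, the xid space caps the import at roughly 4G rows no matter how large the blocks are.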