Neil Conway <[EMAIL PROTECTED]> writes:
> One approach would be to essentially implement the pg_bulkloader
> approach inside the backend. That is, begin by doing a subtransaction
> for every k rows (with k = 1000, say). If you get any errors, then
> either repeat the process with k/2 until you locate the individual
> row(s) causing the trouble, or perhaps just immediately switch to k = 1.
> Fairly ugly though, and would be quite slow for data sets with a high
> proportion of erroneous data.
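[For illustration, a minimal sketch of that batching scheme in plain C.
The helpers begin_subxact/commit_subxact/rollback_subxact/insert_row are
hypothetical stand-ins, not real PostgreSQL backend APIs; a failed batch
is bisected until each bad row is isolated and skipped.]

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Stand-ins for backend facilities, for illustration only --
     * these are NOT the real PostgreSQL subtransaction APIs. */
    static void begin_subxact(void)    { printf("  BEGIN subxact\n"); }
    static void commit_subxact(void)   { printf("  COMMIT subxact\n"); }
    static void rollback_subxact(void) { printf("  ROLLBACK subxact\n"); }

    /* Pretend any row containing "BAD" fails datatype input. */
    static bool
    insert_row(const char *row)
    {
        return strstr(row, "BAD") == NULL;
    }

    /*
     * Try to load rows[lo..hi) inside a single subtransaction.
     * Returns true if the whole batch committed, false if rolled back.
     */
    static bool
    load_batch(const char **rows, size_t lo, size_t hi)
    {
        begin_subxact();
        for (size_t i = lo; i < hi; i++)
        {
            if (!insert_row(rows[i]))
            {
                rollback_subxact();
                return false;
            }
        }
        commit_subxact();
        return true;
    }

    /*
     * Load rows[lo..hi): try the whole range in one subtransaction; on
     * failure, bisect and recurse until each bad row is isolated and
     * skipped.
     */
    static void
    load_range(const char **rows, size_t lo, size_t hi)
    {
        if (lo >= hi)
            return;
        if (load_batch(rows, lo, hi))
            return;
        if (hi - lo == 1)
        {
            printf("  skipping bad row %zu: %s\n", lo, rows[lo]);
            return;
        }
        size_t mid = lo + (hi - lo) / 2;
        load_range(rows, lo, mid);
        load_range(rows, mid, hi);
    }

    /* Drive the load k rows at a time. */
    static void
    bulk_load(const char **rows, size_t nrows, size_t k)
    {
        for (size_t i = 0; i < nrows; i += k)
            load_range(rows, i, (i + k < nrows) ? i + k : nrows);
    }

    int
    main(void)
    {
        const char *rows[] = { "1", "2", "BAD", "4", "5", "6", "BAD", "8" };

        bulk_load(rows, sizeof(rows) / sizeof(rows[0]), 4);
        return 0;
    }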
You could make it self-tuning, perhaps: initially, or after an error,
set k = 1, and increase k after a successful set of rows.

> Another approach would be to distinguish between errors that require a
> subtransaction to recover to a consistent state, and less serious errors
> that don't have this requirement (e.g. invalid input to a data type
> input function). If all the errors that we want to tolerate during a
> bulk load fall into the latter category, we can do without
> subtransactions.

I think such an approach is doomed to hopeless unreliability. There is
no concept of an error that doesn't require a transaction abort in the
system now, and that doesn't seem to me like something that can be
successfully bolted on after the fact. Also, there's a lot of
bookkeeping (eg buffer pins) that has to be cleaned up regardless of the
exact nature of the error, and all those mechanisms are hung off
transactions.

			regards, tom lane
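[A sketch of the self-tuning variant suggested near the top of this
message, reusing load_batch() from the previous snippet. The
doubling-on-success policy and the K_MAX cap are assumptions; the
message only says to increase k after a successful set of rows.]

    /*
     * Self-tuning driver, reusing load_batch() from the sketch above.
     * Assumed policy: drop to k = 1 after any failed batch, double k
     * after each successful one, up to K_MAX.
     */
    #define K_MAX 1000

    static void
    bulk_load_adaptive(const char **rows, size_t nrows)
    {
        size_t i = 0;
        size_t k = 1;                   /* start cautiously */

        while (i < nrows)
        {
            size_t end = (i + k < nrows) ? i + k : nrows;

            if (load_batch(rows, i, end))
            {
                i = end;
                if (k < K_MAX)          /* grow batch size on success */
                    k = (k * 2 < K_MAX) ? k * 2 : K_MAX;
            }
            else if (end - i == 1)
            {
                printf("  skipping bad row %zu: %s\n", i, rows[i]);
                i++;                    /* bad row isolated; move on */
            }
            else
            {
                k = 1;                  /* back off after an error */
            }
        }
    }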