Greg Smith wrote:
After some thought, I think that Andrew's feature *is* generally
applicable, if done as IGNORE COLUMN COUNT (or, more likely,
column_count=ignore). I can think of a lot of data sets where column
count is jagged and you want to do ELT instead of ETL.

Exactly, the ELT approach gives you so many more options for cleaning up the data that I think it would be used more if it weren't so hard to do in Postgres right now.



+1. That's exactly what my client wants to do. They know perfectly well that they get junk data. They want to get it into the database with a minimum of fuss where they will have the right tools for checking and cleaning it. If they have to spend effort whacking it into shape just to get it into the database, then their cleanup effort essentially has to be done in two pieces, part inside and part outside the database.



While complicated, COPY is a pretty walled-off command of around 3500 lines of code, and the hackery required is pretty small. For example, it turns out we already have the code to get it to ignore column overruns, and it's all of 50 new lines--much of which is shared with code that does other error-ignoring bits too. It's easy to make a case for a grand future extensibility cleanup, but it's really not necessary in order to provide a significant benefit for the cases I mentioned. And I would guess the maintenance burden of a more general solution has to be higher than that of a simple implementation of the feature list I gave in my last message.

In short: there's a presumption that adding any error-ignoring code would require significant contortions. I don't think that's really true though, and would like to keep open the possibility of accepting some simple but useful ad-hoc features in this area, even if they don't solve every possible problem in this space just yet.
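
To make the behaviour under discussion concrete, here is a rough Python model of what ignoring the column count could mean for a single input row. This is only a sketch and not the COPY code: the name fit_row is hypothetical, and padding short rows with NULL (None) is an assumption, since the thread only states that column overruns get ignored.

```python
from typing import List, Optional


def fit_row(fields: List[str], ncols: int) -> List[Optional[str]]:
    """Fit one parsed input row to a table with ncols columns.

    Extra trailing fields (overruns) are dropped. Missing trailing
    fields are filled with None, standing in for SQL NULL; that
    padding rule is an assumption made for this illustration.
    """
    kept = fields[:ncols]
    return kept + [None] * (ncols - len(kept))


# A jagged data set: one exact row, one short row, one overrun.
rows = [["1", "a", "b"], ["2", "c"], ["3", "d", "e", "extra"]]
fitted = [fit_row(r, 3) for r in rows]
```

Every row comes out with exactly three fields, so the whole file can be loaded first and the cleanup can then happen inside the database, which is the ELT ordering argued for above.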



Right. What I proposed would not have been terribly invasive or difficult, certainly less so than what seems to be our direction by an order of magnitude at least. I don't for a moment accept the assertion that we can get a general solution for the same effort.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)