Greg Smith wrote:
After some thought, I think that Andrew's feature *is* generally
applicable, if done as IGNORE COLUMN COUNT (or, more likely,
column_count=ignore). I can think of a lot of data sets where column
count is jagged and you want to do ELT instead of ETL.
Exactly, the ELT approach gives you so many more options for cleaning
up the data that I think it would be used more if it weren't so hard
to do in Postgres right now.
+1. That's exactly what my client wants to do. They know perfectly well
that they get junk data. They want to get it into the database with a
minimum of fuss where they will have the right tools for checking and
cleaning it. If they have to spend effort whacking it into shape just to
get it into the database, then their cleanup effort essentially has to
be done in two pieces, part inside and part outside the database.
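
To make the idea concrete, here is a rough sketch of what a lenient column-count load could mean in practice. It is written in Python purely for illustration and is not the COPY patch discussed in this thread. The function name read_jagged and the pad-with-NULL, drop-the-extras semantics are assumptions, not anything COPY currently does.

```python
import csv
import io


def read_jagged(text, ncols):
    """Yield rows of exactly ncols fields from jagged CSV text.

    Assumed semantics (hypothetical, for illustration only):
    short rows are padded with None, extra fields are dropped.
    """
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip completely blank lines
        # Truncate overruns, then pad underruns up to ncols.
        yield fields[:ncols] + [None] * (ncols - len(fields))


if __name__ == "__main__":
    sample = "a,b,c\nd\ne,f,g,h\n"
    for row in read_jagged(sample, 3):
        print(row)
```

Every row then lands in the staging table with the same shape, and deciding what to do about the padded or truncated rows becomes a cleanup job done with SQL inside the database.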
While complicated, COPY is a pretty walled off command of around 3500
lines of code, and the hackery required here is pretty small. For
example, it turns out we do already have the code to get it to ignore
column overruns here, and it's all of 50 new lines--much of which is
shared with code that does other error ignoring bits too. It's easy to
make a case for a grand future extensibility cleanup here, but it's
really not necessary to provide a significant benefit here for the
cases I mentioned. And I would guess the maintenance burden of a more
general solution has to be higher than a simple implementation of the
feature list I gave in my last message.
In short: there's a presumption that adding any error-ignoring code
would require significant contortions. I don't think that's really
true though, and would like to keep open the possibility of accepting
some simple but useful ad-hoc features in this area, even if they
don't solve every possible problem in this space just yet.
Right. What I proposed would not have been terribly invasive or
difficult, certainly less so than what seems to be our direction by an
order of magnitude at least. I don't for a moment accept the assertion
that we can get a general solution for the same effort.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)