"Tom Lane" <[EMAIL PROTECTED]> writes: > Dimitri Fontaine <[EMAIL PROTECTED]> writes: >> Here's a proposal for COPY to support the T part of an ETL, that is adding >> the >> capability for COPY FROM to Transform the data it gets. > >> The idea is quite simple: adding to COPY FROM the option to run a function >> on >> the data before to call datatype_in functions. > > The major concern I have about this is to ensure that no detectable > overhead is added to COPY when the feature isn't being used. > > I am not actually convinced that the column-by-column design you seem to > have in mind is worth anything. The examples that I remember seeing > often involve removing columns, generating one column from multiple ones > or vice versa, dealing with nonstandard column delimiters, etc. What > would makes sense in my mind is a single function taking and returning > text, which is invoked once on each complete input line before it is > broken into fields.
I think not having to deal with separating fields is actually one of the few reasons to do this within COPY. If you can separate out yourself or need to do something more clever than COPY is capable of to split the columns then you're better off preprocessing it with perl or something anyways. To that end all the other use cases you describe could be handled with his plan. There's nothing stopping you from doing CREATE READER foo (a integer, b integer) INSERT INTO b (SELECT a+b FROM foo); or INSERT INTO b (SELECT 1, a, b, greatest(a,b) FROM foo) However I'm not sure we even need new syntax for CREATE READER. I would think something like this would make more sense: CREATE FUNCTION transform(integer, integer) RETURNS SETOF b; COPY b FROM 'foo' USING transform(integer,integer); > So the whole thing seems just marginally attractive to me. Everything about ETL is only marginally attractive, but it's something people spend a lot of time doing. Nobody's come up with any particularly clean solutions I think. AFAIK the state of the art is actually to load the data into a table which closely matches the source material, sometimes just columns of text. Then copy it all to another table doing transformations. Not impressed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers