Dimitri Fontaine <[EMAIL PROTECTED]> writes: > Here's a proposal for COPY to support the T part of an ETL, that is adding > the > capability for COPY FROM to Transform the data it gets.
> The idea is quite simple: adding to COPY FROM the option to run a function on > the data before to call datatype_in functions. The major concern I have about this is to ensure that no detectable overhead is added to COPY when the feature isn't being used. I am not actually convinced that the column-by-column design you seem to have in mind is worth anything. The examples that I remember seeing often involve removing columns, generating one column from multiple ones or vice versa, dealing with nonstandard column delimiters, etc. What would makes sense in my mind is a single function taking and returning text, which is invoked once on each complete input line before it is broken into fields. This is, of course, just a substitute for running a sed or perl or similar script over the data before feeding it to COPY --- and probably not an amazingly good substitute at that. For instance, assuming you like perl for text-wrangling, I'd fully expect the function approach to be slower than an external script because of the large overhead of getting into and out of libperl for each line, In situations where it's actually useful to apply SQL functions rather than text-mangling operations to the data, you always have the option to COPY into a temp table and then do INSERT/SELECT from there. So the whole thing seems just marginally attractive to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers