As I understand it, COPY is intended as a simple & fast way to get clean data 
into Postgres.
It is not intended to groom data during the process. There are a variety of ETL 
tools that can apply filters & rules to data streams to do what you are asking, 
so no need to invent another Postgres specific wheel.
That said, there are some ways to do this without a formal ETL application.
If the data rules you want to apply can be provided in a script, an approach I 
use is to pipe the data file through a set of checks & filters into a COPY 
command reading from STDIN. Very effective on Linux, Windows doesn't really do 
this very well.
eg: to change any instance of Grent in your data to great, use:
cat <file> | sed 's/Grent/great/' | psql <db> -c "copy <table> from STDIN with 
delimiter ',' null '';" 
You can chain as many awk, sed, tr, grep, etc commands as you need together to 
transform the data as required on the way to COPY
A Postgres only approach that I've also used is to create a loading table with 
columns to match the incoming data, but each column is an unconstrained text or 
varchar. Then modify these data to remove any errors using SQL, finishing up 
with a select to insert the cleaned data into the final table. If the final 
table has appropriate constraints to validate the data, only valid data will 
get there. As you find errors, you fix them & rerun the insert until it works.
HTH,
Brent Wood

 
 
 On Friday, September 26, 2025 at 07:49:09 AM GMT+12, Bryan Sayer 
<[email protected]> wrote: 





 
 
Hi,

I'm not sure if this is the best list to ask this (and I am very new to 
PostgreSQL) but I think more options are needed in the COPY from command, in 
order to better deal with exceptions. By exceptions I mean data not consistent 
with the format, empty rows, extra delimiters at the end of rows, etc.

Is there someone or a party that deals with the details of the COPY command 
that I could discuss this with?

Just for context, I am dealing with delimited data of millions of rows and 
perhaps 60 columns or so that I wish to read into tables in a Postgres 17 
database under Windows 11, with everything on local drives.

-- 
Bryan Sayer
Retired Demographer/Statistician
In a world in which you can be anything, be kind 




Reply via email to