2018-05-20 20:48 GMT-03:00 Carter Thaxton <carter.thax...@gmail.com>: > Many times I've wanted to export a subset of a database, using some sort of > row filter condition on some of the large tables. E.g. copying a production > database to a staging environment, but with some time series data only from > the past month. > How would you handle foreign keys? It seems easier to produce a dump that won't restore.
> We have the existing options: > --include-table=table (and its -t synonym) > --exclude-table=table > --exclude-table-data=table > > I propose a new option: > --include-table-data-where=table:filter_clause > I remembered an old thread [1]. At that time pg_dump was not so decoupled from the backend. We are far from being decoupled in a way that someone can write his own pg_dump using only calls from a library. I'm not sure pg_dump is the right place to add another ETL parameter. We already have too much parameters that could break a restore (flexibility is always welcome but too much is not so good). > One would use this option as follows: > > pg_dump --include-table-data-where=largetable:"created_at >= '2018-05-01'" > database_name > How would you check that that expression is correct? Every parameter could quote its value. It means that your parameter have to escape the quote in '2018-05-01'. Another problem is that your spec does not show us how you would handle tables like Foo.Bar or "foo:bar" (colon have to be escaped)? > The filter_clause is used as the contents of a WHERE clause when querying > the data to generate the COPY statement produced by pg_dump. > You are forgetting about --inserts parameter. Could I use --include-table-data-where and --inserts? [1] https://www.postgresql.org/message-id/1212299813.17810.17.camel%40ubuntu -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento