Re: [PATCH] minor bug fix for pg_dump --clean
Hi, Good catch! Here are several points for improvement: 1. pg_dump.c:17380 Maybe better to write simpler appendPQExpBuffer(delcmd, "CREATE SCHEMA IF NOT EXISTS %s;\n", tbinfo->dobj.namespace->dobj.name); because there is a schema name inside the `tbinfo->dobj.namespace->dobj.name ` 2. pg_backup_archiver.c:588 Here are no necessary spaces before and after braces, and no spaces around the '+' sign. ( strncmp(dropStmt, "CREATE SCHEMA IF NOT EXISTS", 27) == 0 && strstr(dropStmt+29, "CREATE OR REPLACE VIEW") )) Best regards, Viktoria Shepard чт, 1 сент. 2022 г. в 12:13, Frédéric Yhuel : > Hello, > > When using pg_dump (or pg_restore) with option "--clean", there is some > SQL code to drop every objects at the beginning. > > The DROP statement for a view involving circular dependencies is : > > CREATE OR REPLACE VIEW [...] > > (see commit message of d8c05aff for a much better explanation) > > If the view is not in the "public" schema, and the target database is > empty, this statement fails, because the schema hasn't been created yet. > > The attached patches are a TAP test which can be used to reproduce the > bug, and a proposed fix. They apply to the master branch. > > Best regards, > Frédéric
Re: Re[2]: Possible solution for masking chosen columns when using pg_dump
Hi, Thank you, Oleg Tselebrovskiy, for your valuable review, here are the fixes Best regards, Viktoria Shepard ср, 12 окт. 2022 г. в 12:19, Виктория Шепард : > Hi, > > Here is an idea of how to read masking options from a file. Please, take a > look. > > пн, 10 окт. 2022 г. в 14:54, Олег Целебровский >: > >> Hi, >> >> I applied most of suggestions: used separate files for most of added >> code, fixed typos/mistakes, got rid of that pesky TODO that was already >> implemented, just not deleted. >> >> Added tests (and functionality) for cases when you need to mask columns >> in tables with the same name in different schemas. If schema is not >> specified, then columns in all tables with specified name are masked >> (Example - pg_dump -t ‘t0’ --mask-columns id --mask-function mask_int will >> mask all ids in all tables with names ‘t0’ in all existing schemas). >> >> Wrote comments for all ‘magic numbers’ >> >> About that >> >> >- Also it can be hard to use a lot of different functions for different >> fields, maybe it would be better to set up functions in a file. >> >> I agree with that, but I know about at least 2 other patches (both are >> WIP, but still) that are interacting with reading command-line options from >> file. And if everyone will write their own version of reading command-line >> options from file, it will quickly get confusing. >> >> A solution to that problem is another patch that will put all options >> from file (one file for any possible options, from existing to future ones) >> into **argv in main, so that pg_dump can process them as if they came form >> command line. >> >> >> Пятница, 7 октября 2022, 8:01 +07:00 от Виктория Шепард < >> we.vikt...@gmail.com>: >> >> Hi, >> I took a look, here are several suggestions for improvement: >> >> - Masking is not a main functionality of pg_dump and it is better to >> write most of the connected things in a separate file like parallel.c or >> dumputils.c. This will help slow down the growth of an already huge pg_dump >> file. >> >> - Also it can be hard to use a lot of different functions for different >> fields, maybe it would be better to set up functions in a file. >> >> - How will it work for the same field and tables in the different >> schemas? Can we set up the exact schema for the field? >> >> - misspelling in a word >> >/* >> >* Add all columns and funcions to list of MaskColumnInfo structures, >> >*/ >> >> - Why did you use 256 here? >> > char* table = (char*) pg_malloc(256 * sizeof(char)); >> Also for malloc you need malloc on 1 symbol more because you have to >> store '\0' symbol. >> >> - Instead of addFuncToDatabase you can run your query using something >> already defined from fe_utils/query_utils.c. And It will be better to set >> up a connection only once and create all functions. Establishing a >> connection is a resource-intensive procedure. There are a lot of magic >> numbers, better to leave some comments explaining why there are 64 or 512. >> >> - It seems that you are not using temp_string >> > char *temp_string = (char*)malloc(256 * sizeof(char)); >> >> - Grammar issues >> >/* >> >* mask_column_info_list contains info about every to-be-masked column: >> >* its name, a name its table (if nothing is specified - mask all columns >> with this name), >> >* name of masking function and name of schema containing this function >> (public if not specified) >> >*/ >> the name of its table >> >> >> пн, 3 окт. 2022 г. в 20:45, Julien Rouhaud > >: >> >> Hi, >> >> On Mon, Oct 03, 2022 at 06:30:17PM +0300, Олег Целебровский wrote: >> > >> > Hello, here's my take on masking data when using pg_dump >> > >> > The main idea is using PostgreSQL functions to replace data during a >> SELECT. >> > When table data is dumped SELECT a,b,c,d ... from ... query is >> generated, the columns that are marked for masking are replaced with result >> of functions on those columns >> > Example: columns name, count are to be masked, so the query will look >> as such: SELECT id, mask_text(name), mask_int(count), date from ... >> > >> > So about the interface: I added 2 more command-line options: >> > >> > --mask-columns, which specifies what columns from what tables will be >> masked >> > usage example: >> > --mask-columns "t1.name, t2.descriptio
Re: Re[2]: Possible solution for masking chosen columns when using pg_dump
Hi, Here is an idea of how to read masking options from a file. Please, take a look. пн, 10 окт. 2022 г. в 14:54, Олег Целебровский : > Hi, > > I applied most of suggestions: used separate files for most of added code, > fixed typos/mistakes, got rid of that pesky TODO that was already > implemented, just not deleted. > > Added tests (and functionality) for cases when you need to mask columns in > tables with the same name in different schemas. If schema is not specified, > then columns in all tables with specified name are masked (Example - > pg_dump -t ‘t0’ --mask-columns id --mask-function mask_int will mask all > ids in all tables with names ‘t0’ in all existing schemas). > > Wrote comments for all ‘magic numbers’ > > About that > > >- Also it can be hard to use a lot of different functions for different > fields, maybe it would be better to set up functions in a file. > > I agree with that, but I know about at least 2 other patches (both are > WIP, but still) that are interacting with reading command-line options from > file. And if everyone will write their own version of reading command-line > options from file, it will quickly get confusing. > > A solution to that problem is another patch that will put all options from > file (one file for any possible options, from existing to future ones) into > **argv in main, so that pg_dump can process them as if they came form > command line. > > > Пятница, 7 октября 2022, 8:01 +07:00 от Виктория Шепард < > we.vikt...@gmail.com>: > > Hi, > I took a look, here are several suggestions for improvement: > > - Masking is not a main functionality of pg_dump and it is better to write > most of the connected things in a separate file like parallel.c or > dumputils.c. This will help slow down the growth of an already huge pg_dump > file. > > - Also it can be hard to use a lot of different functions for different > fields, maybe it would be better to set up functions in a file. > > - How will it work for the same field and tables in the different schemas? > Can we set up the exact schema for the field? > > - misspelling in a word > >/* > >* Add all columns and funcions to list of MaskColumnInfo structures, > >*/ > > - Why did you use 256 here? > > char* table = (char*) pg_malloc(256 * sizeof(char)); > Also for malloc you need malloc on 1 symbol more because you have to store > '\0' symbol. > > - Instead of addFuncToDatabase you can run your query using something > already defined from fe_utils/query_utils.c. And It will be better to set > up a connection only once and create all functions. Establishing a > connection is a resource-intensive procedure. There are a lot of magic > numbers, better to leave some comments explaining why there are 64 or 512. > > - It seems that you are not using temp_string > > char *temp_string = (char*)malloc(256 * sizeof(char)); > > - Grammar issues > >/* > >* mask_column_info_list contains info about every to-be-masked column: > >* its name, a name its table (if nothing is specified - mask all columns > with this name), > >* name of masking function and name of schema containing this function > (public if not specified) > >*/ > the name of its table > > > пн, 3 окт. 2022 г. в 20:45, Julien Rouhaud >: > > Hi, > > On Mon, Oct 03, 2022 at 06:30:17PM +0300, Олег Целебровский wrote: > > > > Hello, here's my take on masking data when using pg_dump > > > > The main idea is using PostgreSQL functions to replace data during a > SELECT. > > When table data is dumped SELECT a,b,c,d ... from ... query is > generated, the columns that are marked for masking are replaced with result > of functions on those columns > > Example: columns name, count are to be masked, so the query will look as > such: SELECT id, mask_text(name), mask_int(count), date from ... > > > > So about the interface: I added 2 more command-line options: > > > > --mask-columns, which specifies what columns from what tables will be > masked > > usage example: > > --mask-columns "t1.name, t2.description" - both columns > will be masked with the same corresponding function > > or --mask-columns name - ALL columns with name "name" from > all dumped tables will be masked with correspoding function > > > > --mask-function, which specifies what functions will mask data > > usage example: > > --mask-function mask_int - corresponding columns will be > masked with function named "mask_int" from default schema (public) > > or --mask-function my_schema.mask_varchar - same as above >
Re: Possible solution for masking chosen columns when using pg_dump
Hi, I took a look, here are several suggestions for improvement: - Masking is not a main functionality of pg_dump and it is better to write most of the connected things in a separate file like parallel.c or dumputils.c. This will help slow down the growth of an already huge pg_dump file. - Also it can be hard to use a lot of different functions for different fields, maybe it would be better to set up functions in a file. - How will it work for the same field and tables in the different schemas? Can we set up the exact schema for the field? - misspelling in a word >/* >* Add all columns and funcions to list of MaskColumnInfo structures, >*/ - Why did you use 256 here? > char* table = (char*) pg_malloc(256 * sizeof(char)); Also for malloc you need malloc on 1 symbol more because you have to store '\0' symbol. - Instead of addFuncToDatabase you can run your query using something already defined from fe_utils/query_utils.c. And It will be better to set up a connection only once and create all functions. Establishing a connection is a resource-intensive procedure. There are a lot of magic numbers, better to leave some comments explaining why there are 64 or 512. - It seems that you are not using temp_string > char *temp_string = (char*)malloc(256 * sizeof(char)); - Grammar issues >/* >* mask_column_info_list contains info about every to-be-masked column: >* its name, a name its table (if nothing is specified - mask all columns with this name), >* name of masking function and name of schema containing this function (public if not specified) >*/ the name of its table пн, 3 окт. 2022 г. в 20:45, Julien Rouhaud : > Hi, > > On Mon, Oct 03, 2022 at 06:30:17PM +0300, Олег Целебровский wrote: > > > > Hello, here's my take on masking data when using pg_dump > > > > The main idea is using PostgreSQL functions to replace data during a > SELECT. > > When table data is dumped SELECT a,b,c,d ... from ... query is > generated, the columns that are marked for masking are replaced with result > of functions on those columns > > Example: columns name, count are to be masked, so the query will look as > such: SELECT id, mask_text(name), mask_int(count), date from ... > > > > So about the interface: I added 2 more command-line options: > > > > --mask-columns, which specifies what columns from what tables will be > masked > > usage example: > > --mask-columns "t1.name, t2.description" - both columns > will be masked with the same corresponding function > > or --mask-columns name - ALL columns with name "name" from > all dumped tables will be masked with correspoding function > > > > --mask-function, which specifies what functions will mask data > > usage example: > > --mask-function mask_int - corresponding columns will be > masked with function named "mask_int" from default schema (public) > > or --mask-function my_schema.mask_varchar - same as above > but with specified schema where the function is stored > > or --mask-function somedir/filename - the function is > "defined" here - more on the structure below > > FTR I wrote an extension POC [1] last weekend that does that but on the > backend > side. The main advantage is that it's working with any existing versions > of > pg_dump (or any client relying on COPY or even plain interactive SQL > statements), and that the DBA can force a dedicated role to only get a > masked > dump, even if they forgot to ask for it. > > I only had a quick look at your patch but it seems that you left some todo > in > russian, which isn't helpful at least to me. > > [1] https://github.com/rjuju/pg_anonymize > > >