Re: Implementing Incremental View Maintenance
On Wed, 24 Nov 2021 04:31:25 + "r.takahash...@fujitsu.com" wrote: > > > ivm=# create table t (c1 int, c2 int); > > > CREATE TABLE > > > ivm=# create incremental materialized view ivm_t as select distinct c1 > > > from t; > > > NOTICE: created index "ivm_t_index" on materialized view "ivm_t" > > > SELECT 0 > > > > > > Then I executed pg_dump. > > > > > > In the dump, the following SQLs appear. > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS > > > SELECT DISTINCT t.c1 > > >FROM public.t > > > WITH NO DATA; > > > > > > ALTER TABLE ONLY public.ivm_t > > > ADD CONSTRAINT ivm_t_index UNIQUE (c1); > > > > > > If I execute psql with the result of pg_dump, following error occurs. > > > > > > ERROR: ALTER action ADD CONSTRAINT cannot be performed on relation > > "ivm_t" > > > DETAIL: This operation is not supported for materialized views. > > > > Good catch! It was my mistake creating unique constraints on IMMV in spite > > of > > we cannot defined them via SQL. I'll fix it to use unique indexes instead of > > constraints. > > I checked the same procedure on v24 patch. > But following error occurs instead of the original error. > > ERROR: relation "ivm_t_index" already exists Thank you for pointing out it! Hmmm, an index is created when IMMV is defined, so CREAE INDEX called after this would fail... Maybe, we should not create any index automatically if IMMV is created WITH NO DATA. I'll fix it after some investigation. Regards, Yugo Nagata -- Yugo NAGATA
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Tue, Oct 5, 2021 at 7:07 PM Dilip Kumar wrote: > > Patch details: > 0001 to 0006 implements an approach1 > 0007 removes the code of pg_class scanning and adds the directory scan. > I had a scan through the patches, though have not yet actually run any tests to try to better gauge their benefit. I do have some initial review comments though: 0003 src/backend/commands/tablecmds.c (1) RelationCopyAllFork() In the following comment: +/* + * Copy source smgr all fork's data to the destination smgr. + */ Shouldn't it say "smgr relation"? Also, you could additionally say ", using a specified fork data copying function." or something like that, to account for the additional argument. 0006 src/backend/commands/dbcommands.c (1) function prototype location The following prototype is currently located in the "non-export function prototypes" section of the source file, but it's not static - shouldn't it be in dbcommands.h? +void RelationCopyStorageUsingBuffer(SMgrRelation src, SMgrRelation dst, +ForkNumber forkNum, char relpersistence); (2) CreateDirAndVersionFile() Shouldn't the following code: + fd = OpenTransientFile(versionfile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY); + if (fd < 0 && errno == EEXIST && isRedo) + fd = OpenTransientFile(versionfile, O_RDWR | PG_BINARY); actually be: + fd = OpenTransientFile(versionfile, O_WRONLY | O_CREAT | O_EXCL | PG_BINARY); + if (fd < 0 && errno == EEXIST && isRedo) + fd = OpenTransientFile(versionfile, O_WRONLY | O_TRUNC | PG_BINARY); since we're only writing to that file descriptor and we want to truncate the file if it already exists. The current comment says "... open it in the write mode.", but should say "... open it in write mode." Also, shouldn't you be writing a newline (\n) after the PG_MAJORVERSION ? (compare with code in initdb.c) (3) GetDatabaseRelationList() Shouldn't: + if (PageIsNew(page) || PageIsEmpty(page)) +continue; be: + if (PageIsNew(page) || PageIsEmpty(page)) + { +UnlockReleaseBuffer(buf); +continue; + } ? Also, in the following code: + if (rnodelist == NULL) +rnodelist = list_make1(relinfo); + else +rnodelist = lappend(rnodelist, relinfo); it should really be "== NIL" rather than "== NULL". But in any case, that code can just be: rnodelist = lappend(rnodelist, relinfo); because lappend() will create a list if the first arg is NIL. (4) RelationCopyStorageUsingBuffer() In the function comments, IMO it is better to use "APIs" instead of "apis". Also, better to use "get" instead of "got" in the following comment: + /* If we got a cancel signal during the copy of the data, quit */ 0007 (I think I prefer the first approach rather than this 2nd approach) src/backend/commands/dbcommands.c (1) createdb() pfree(srcpath) seems to be missing, in the case that CopyDatabase() gets called. (2) GetRelfileNodeFromFileName() %s in sscanf() allows an unbounded read and is considered potentially dangerous (allows buffer overflow), especially here where FORKNAMECHARS is so small. + nmatch = sscanf(filename, "%u_%s", , forkname); how about using the following instead in this case: + nmatch = sscanf(filename, "%u_%4s", , forkname); ? (even if there were > 4 chars after the underscore, it would still match and InvalidOid would be returned because nmatch==2) Regards, Greg Nancarrow Fujitsu Australia
Re: Deduplicate code updating ControleFile's DBState.
On Thu, Nov 25, 2021 at 10:21:40AM +0530, Amul Sul wrote: > Thanks for the inputs -- moved timestamp setting inside update_controlfile(). I have not check the performance implication of that with a micro benchmark or the like, but I can get behind 0001 on consistency grounds between the backend and the frontend. 0002 does not seem worth the trouble, though, as it is changing only two code paths. -- Michael signature.asc Description: PGP signature
Re: Implementing Incremental View Maintenance
Hello Takahashi-san, On Wed, 24 Nov 2021 04:27:13 + "r.takahash...@fujitsu.com" wrote: > Hi Nagata-san, > > > Sorry for late reply. > > > > However, even if we create triggers recursively on the parents or children, > > we would still > > need more consideration. This is because we will have to convert the format > > of tuple of > > modified table to the format of the table specified in the view for cases > > that the parent > > and some children have different format. > > > > I think supporting partitioned tables can be left for the next release. > > OK. I understand. > In the v24-patch, creating IVM on partions or partition table is prohibited. > It is OK but it should be documented. > > Perhaps, the following statement describe this. > If so, I think the definition of "simple base table" is ambiguous for some > users. > > + IMMVs must be based on simple base tables. It's not supported to > + create them on top of views or materialized views. Oh, I forgot to fix the documentation. I'll fix it. Ragards, Yugo Nagata -- Yugo NAGATA
Re: [BUG]Missing REPLICA IDENTITY check when DROP NOT NULL
On Thu, Nov 25, 2021 at 10:44:53AM +0530, Dilip Kumar wrote: > Good catch, your patch looks fine to me and solves the reported problem. And applied down to 10. A couple of comments in the same area did not get the call, though. -- Michael signature.asc Description: PGP signature
Re: prevent immature WAL streaming
On Wed, Nov 24, 2021 at 2:10 AM Alvaro Herrera wrote: > > On 2021-Nov-23, Tom Lane wrote: > > > We're *still* not out of the woods with 026_overwrite_contrecord.pl, > > as we are continuing to see occasional "mismatching overwritten LSN" > > failures, further down in the test where it tries to start up the > > standby: > > Augh. > > > Looking at adjacent successful runs, it seems that the exact point > > where the "missing contrecord" starts varies substantially, even after > > our previous fix to disable autovacuum in this test. How could that be? > > Well, there is intentionally some variability. Maybe not as much as one > would wish, but I expect that that should explain why that point is not > always the same. > > > It's probably for the best though, because I think this is exposing > > an actual bug that we would not have seen if the start point were > > completely consistent. I have not dug into the code, but it looks to > > me like if the "consistent recovery state" is reached exactly at a > > page boundary (0/1FFE000 in all these cases), then the standby expects > > that to be what the OVERWRITE_CONTRECORD record will point at. But > > actually it points to the first WAL record on that page, resulting > > in a bogus failure. > > So what is happening is that we set state->overwrittenRecPtr to the LSN > of page start, ignoring the page header. Is that the LSN of the first > record in a page? I'll see if I can reproduce the problem. > In XLogReadRecord(), both the variables being compared have inconsistency in the assignment -- one gets assigned from state->currRecPtr where other is from RecPtr. . state->overwrittenRecPtr = state->currRecPtr; . state->abortedRecPtr = RecPtr; . Before the place where assembled flag sets, there is a bunch of code that adjusts RecPtr. I think instead of RecPtr, the latter assignment should use state->currRecPtr as well. Regards, Amul
Re: pg_dump, pg_basebackup don't error out with wrong option for "--format"
On Thu, Nov 25, 2021 at 11:02 AM Dilip Kumar wrote: > > On Thu, Nov 25, 2021 at 10:44 AM Bharath Rupireddy > wrote: > > > > Hi, > > > > I noticed that the pg_dump and pg_basebackup are not erroring out when > > "--fo"/"--for"/"--form"/"--forma"/" are specified(use cases 1 - 4, 7 - > > 9) whereas it fails if a pattern that doesn't match with "format" is > > specified (use case 5, 10). This seems to be true only for "--format" > > option, for other options it just errors out (use case 6). Why is the > > behaviour for "--format" isn't consistent? > > > > Is it something expected? Is the option parsing logic here buggy? > > I think for parsing we use getopt_long(), as per that if you use the > prefix of the string and that is not conflicting with any other option > then that is allowed. So --fo, --for all are accepted, --f will not > be accepted because --file and --format will conflict, --foo is also > not allowed because it is not a valid prefix string of any valid > option string. Yeah, that makes sense. Thanks. Regards, Bharath Rupireddy.
Re: pg_dump, pg_basebackup don't error out with wrong option for "--format"
On Thu, Nov 25, 2021 at 10:44 AM Bharath Rupireddy wrote: > > Hi, > > I noticed that the pg_dump and pg_basebackup are not erroring out when > "--fo"/"--for"/"--form"/"--forma"/" are specified(use cases 1 - 4, 7 - > 9) whereas it fails if a pattern that doesn't match with "format" is > specified (use case 5, 10). This seems to be true only for "--format" > option, for other options it just errors out (use case 6). Why is the > behaviour for "--format" isn't consistent? > > Is it something expected? Is the option parsing logic here buggy? I think for parsing we use getopt_long(), as per that if you use the prefix of the string and that is not conflicting with any other option then that is allowed. So --fo, --for all are accepted, --f will not be accepted because --file and --format will conflict, --foo is also not allowed because it is not a valid prefix string of any valid option string. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [BUG]Missing REPLICA IDENTITY check when DROP NOT NULL
On Thu, Nov 25, 2021 at 8:21 AM tanghy.f...@fujitsu.com wrote: > > On Wed, Nov 24, 2021 7:29 PM, Michael Paquier wrote: > > Thanks for your comment. I agree with you. > I have fixed it and attached v2 patch. Good catch, your patch looks fine to me and solves the reported problem. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pg_dump, pg_basebackup don't error out with wrong option for "--format"
Hi, I noticed that the pg_dump and pg_basebackup are not erroring out when "--fo"/"--for"/"--form"/"--forma"/" are specified(use cases 1 - 4, 7 - 9) whereas it fails if a pattern that doesn't match with "format" is specified (use case 5, 10). This seems to be true only for "--format" option, for other options it just errors out (use case 6). Why is the behaviour for "--format" isn't consistent? Is it something expected? Is the option parsing logic here buggy? Thoughts? Use cases: 1) ./pg_dump --dbname=postgres --host=localhost --port=5432 --username=bharath --no-password --fo=plain --file=mydump.sql 2) ./pg_dump --dbname=postgres --host=localhost --port=5432 --username=bharath --no-password --for=plain --file=mydump.sql 3) ./pg_dump --dbname=postgres --host=localhost --port=5432 --username=bharath --no-password --form=plain --file=mydump.sql 4) ./pg_dump --dbname=postgres --host=localhost --port=5432 --username=bharath --no-password --forma=plain --file=mydump.sql 5) ./pg_dump --dbname=postgres --host=localhost --port=5432 --username=bharath --no-password --foo=plain --file=mydump.sql 6) ./pg_dump --dbname=postgres --host=localhost --port=5432 --username=bharath --no-password --format=plain --fi=mydump.sql 7) ./pg_basebackup -D bkupdata --fo=plain 8) ./pg_basebackup -D bkupdata --for=plain 9) ./pg_basebackup -D bkupdata --forma=plain 10) ./pg_basebackup -D bkupdata --foo=plain Regards, Bharath Rupireddy.
Re: Skipping logical replication transactions on subscriber side
On Wed, Nov 24, 2021 at 5:14 PM Masahiko Sawada wrote: > > Changed. I've removed first_error_time as per discussion on the thread > for adding xact stats. > We also agreed to change the column names to start with last_error_* [1]. Is there a reason to not make those changes? Do you think that we can change it just before committing that patch? I thought it might be better to do it that way now itself. [1] - https://www.postgresql.org/message-id/CAD21AoCQ8z5goy3BCqfk2gn5p8NVH5B-uxO3Xc-dXN-MXVfnKg%40mail.gmail.com -- With Regards, Amit Kapila.
Re: pg_replslotdata - a tool for displaying replication slot information
On Wed, Nov 24, 2021 at 9:40 PM Japin Li wrote: > > > On Wed, 24 Nov 2021 at 23:59, Bharath Rupireddy > wrote: > > On Wed, Nov 24, 2021 at 9:09 PM Bharath Rupireddy > >> > Thoughts? > >> > >> Attaching the rebased v2 patch. > > > > On windows the previous patches were failing, fixed that in the v3 > > patch. I'm really sorry for the noise. > > > > Cool! When I try to use it, there is an error for -v, --verbose option. > > px@ubuntu:~/Codes/postgres/Debug$ pg_replslotdata -v > pg_replslotdata: invalid option -- 'v' > Try "pg_replslotdata --help" for more information. > > This is because the getopt_long() missing 'v' in the third parameter. > > while ((c = getopt_long(argc, argv, "D:v", long_options, NULL)) != -1) Thanks for taking a look at the patch, attaching v4. There are many things that I could do in the patch, for instance, more comments, documentation, code improvements etc. I would like to first know what hackers think about this tool, and then start spending more time on it. Regards, Bharath Rupireddy. v4-0001-pg_replslotdata.patch Description: Binary data
Re: Deduplicate code updating ControleFile's DBState.
On Thu, Nov 11, 2021 at 1:30 AM Bossart, Nathan wrote: > > On 10/1/21, 10:40 PM, "Michael Paquier" wrote: > > On Fri, Oct 01, 2021 at 05:47:45PM +, Bossart, Nathan wrote: > >> I'm inclined to agree that anything that calls update_controlfile() > >> should update the timestamp. > > > > pg_control.h tells that: > > pg_time_t time; /* time stamp of last pg_control update */ > > So, yes, that would be more consistent. > > > >> However, I wonder if the additional > >> calls to time() would have a noticeable impact. > > > > I would not take that lightly either. Now, I don't think that any of > > the code paths where UpdateControlFile() or update_controlfile() is > > called are hot enough to worry about that. > > > > UpdateControlFile(void) > > { > > + ControlFile->time = (pg_time_t) time(NULL); > > update_controlfile(DataDir, ControlFile, true); > > } > > I have to admit that it is a bit strange to do that in the backend but > > not the frontend, so there is a good argument for doing that directly > > in update_controlfile(). pg_resetwal does an update of the time, but > > pg_rewind does not. > Thanks for the inputs -- moved timestamp setting inside update_controlfile(). > I don't see any recent updates to this thread from Amul, so I'm > marking this one as waiting-for-author. > Sorry for the delay, please have a look at the attached version -- changing status to Needs review, thanks. Regards, Amul v5-0001-Do-the-ControlFile-timestamp-setting-in-update_co.patch Description: Binary data v5-0002-Deduplicate-code-updating-ControleFile-s-DBState.patch Description: Binary data
Re: Non-superuser subscription owners
On Thu, Nov 25, 2021 at 6:00 AM Jeff Davis wrote: > > On Fri, 2021-11-19 at 16:45 -0800, Mark Dilger wrote: > > Renamed as 0001 in version 3, as it is the only remaining patch. For > > anyone who reviewed the older patch set, please note that I made some > > changes to the src/test/subscription/t/026_nosuperuser.pl test case > > relative to the prior version. > > We need to do permission checking for WITH CHECK OPTION and RLS. The > patch right now allows the subscription to write data that an RLS > policy forbids. > Won't it be better to just check if the current user is superuser before applying each change as a matter of this first patch? Sorry, I was under impression that first, we want to close the current gap where we allow to proceed with replication if the user's superuser privileges were revoked during replication. To allow non-superusers owners, I thought it might be better to first try to detect the change of ownership as soon as possible instead of at the transaction boundary. -- With Regards, Amit Kapila.
Re: Reduce function call costs on ELF platforms
Hi, On 2021-11-24 17:55:03 -0500, Andrew Dunstan wrote: > On 11/24/21 13:55, Andres Freund wrote: > > On 2021-11-23 17:28:08 +0100, Peter Eisentraut wrote: > >> On 22.11.21 23:32, Tom Lane wrote: > The easier approach for this class of issues is to use the linker option > -Bsymbolic. > >>> I don't recall details, but we've previously rejected the idea of > >>> trying to use -Bsymbolic widely; apparently it has undesirable > >>> side-effects on some platforms. See commit message for e3d77ea6b > >>> (hopefully there's some detail in the email thread [1]). It sounds > >>> like you're not actually proposing that, but I thought it would be > >>> a good idea to note the hazard here. > >> Also, IIRC, -Bsymbolic was once frowned upon by packaging policies, since > >> it > >> prevents use of LD_PRELOAD. I'm not sure what the current thinking there > >> is, however. > > It doesn't break some (most?) of the uses of LD_PRELOAD. In particular, it > > doesn't break things like replacing the malloc implementation. When do you > > have a symbol that you want to override *inside* your library (executables > > already bind to their own symbols at compile time)? I've seen that for > > replacing buggy functions in closed source things, but that's about it? > > > > Which things does it break exactly? -Bsymbolic causes symbols that are defined and referenced within one shared library to use that definition. E.g. if a shared lib has a function "do_something()" and some of its code calls do_something(), you cannot use LD_PRELOAD (or a definition in the main binary) to redirect the call to do_something() inside the shared library to something else. I.e. if a shared library calls a function that's *not* defined within that shared library, -Bsymbolic doesn't have an effect for that symbol. > I have a case where a library that > is LD_PRELOADed calls PQsetSSLKeyPassHook_OpenSSL() in its constructor > function. I'd be very unhappy if that stopped working (and so would our > client). Bsymbolic shouldn't affect that at all. Greetings, Andres Freund
Re: row filtering for logical replication
On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila wrote: > > On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian wrote: > > > > Attaching a new patchset v41 which includes changes by both Peter and > > myself. > > > ... > I suggest at this stage we can combine 0001, 0003, and 0004. Then move > pg_dump and psql (describe.c) related changes to 0002 and make 0002 as > the last patch in the series. This will help review backend changes > first and then we can look at client-side changes. > The patch combining and reordering was as suggested. BEFORE: v41-0001 Euler's main patch v41-0002 Tab-complete v41-0003 ExprState cache v41-0004 OR/AND v41-0005 Validation walker v41-0006 new/old tuple updates AFTER: v42-0001 main patch <== v41-0001 + v41-0003 + v41-0004 v42-0002 validation walker <== v41-0005 v42-0003 new/old tuple updates <== v41-0006 v42-0004 tab-complete and pgdump <== v41-0002 (plus pgdump code from v41-0001) ~ Please note, I did not remove the describe.c changes from the v42-0001 patch at this time. I left this as-is because I felt the ability for psql \d+ or \dRp+ etc to display the current row-filter is *essential* functionality to be able to test and debug the 0001 patch properly. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: row filtering for logical replication
On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila wrote: > > On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian wrote: > > > > Attaching a new patchset v41 which includes changes by both Peter and > > myself. > > ... > Few more comments: > = ... > 0005 > 2. > +typedef struct { > + Relation rel; > + bool check_replident; > + Bitmapset *bms_replident; > +} > +rf_context; > > Add rf_context in the same line where } ends. > Fixed in v42* [1] -- [1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: row filtering for logical replication
On Wed, Nov 24, 2021 at 8:52 PM vignesh C wrote: > > On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian wrote: > > > > Attaching a new patchset v41 which includes changes by both Peter and > > myself. > > > > Patches v40-0005 and v40-0006 have been merged to create patch > > v41-0005 which reduces the patches to 6 again. > > Few comments: ... > 2) Should '"delete" or "delete"' be '"delete" or "update"' > --- a/src/backend/catalog/pg_publication.c > +++ b/src/backend/catalog/pg_publication.c > @@ -340,7 +340,7 @@ rowfilter_walker(Node *node, rf_context *context) > * 1. Only certain simple node types are permitted in the expression. See > * function rowfilter_walker for details. > * > - * 2. If the publish operation contains "delete" then only columns that > + * 2. If the publish operation contains "delete" or "delete" then > only columns that > * are allowed by the REPLICA IDENTITY rules are permitted to be used in the > * row-filter WHERE clause. > */ > @@ -352,12 +352,10 @@ rowfilter_expr_checker(Publication *pub, > Relation rel, Node *rfnode) > context.rel = rel; > > /* > -* For "delete", check that filter cols are also valid replica > identity > +* For "delete" or "update", check that filter cols are also > valid replica identity > * cols. Fixed in v42* [1] > 4) This should be included in typedefs.list, also we could add some > comments for this structure > +typedef struct { > + Relationrel; > + Bitmapset *bms_replident; > +} > +rf_context; > Fixed in v42* [1] > 6) typo "filte" should be "filter": > +/* > + * The row filte walker checks that the row filter expression is legal. > + * > + * Rules: Node-type validation > + * --- > + * Allow only simple or compound expressions like: > + * - "(Var Op Const)" or > + * - "(Var Op Const) Bool (Var Op Const)" Fixed in v42* [1] -- [1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: row filtering for logical replication
On Tue, Nov 23, 2021 at 4:40 PM tanghy.f...@fujitsu.com wrote: > > On Thursday, November 18, 2021 9:34 AM Peter Smith > wrote: > > > > PSA new set of v40* patches. > > > > Besides, a small comment on 0004 patch: > > +* Multiple row-filter expressions for the same publication > will later be > +* combined by the COPY using OR, but this means if any of > the filters is > > Should we change it to: > Multiple row-filter expressions for the same table ... Fixed in v42* [1] -- [1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: row filtering for logical replication
On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila wrote: > > On Thu, Nov 18, 2021 at 11:02 AM Peter Smith wrote: > > > > On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila wrote: > > > > > > > > 4. I think we should add some comments in pgoutput_row_filter() as to > > > why we are caching the row_filter here instead of > > > get_rel_sync_entry()? That has been discussed multiple times so it is > > > better to capture that in comments. > > > > Added comment in v40 [1] > > > > I think apart from truncate and error cases, it can also happen for > other operations because we decide whether to publish a change > (operation) after calling get_rel_sync_entry() in pgoutput_change. I > think we can reflect that as well in the comment. Fixed in v42* [1] -- [1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Parallel vacuum workers prevent the oldest xmin from advancing
On Wed, Nov 24, 2021 at 7:46 PM John Naylor wrote: > > On Wed, Nov 17, 2021 at 7:28 AM Amit Kapila wrote: > > > > > The patch looks good to me. But I can't come up with a stable test for > > > this. It seems to be hard without stopping and resuming parallel > > > vacuum workers. Do you have any good idea? > > > > > > > No, let's wait for a day or so to see if anybody else has any ideas to > > write a test for this case, otherwise, I'll check these once again and > > push. > > I set this "committed" in the CF app. > Thanks! -- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Thu, Nov 18, 2021 at 9:35 PM Greg Nancarrow wrote: > > On Thu, Nov 18, 2021 at 12:33 PM Peter Smith wrote: > > > > PSA new set of v40* patches. > > > > Thanks for the patch updates. > > A couple of comments so far: > > (1) compilation warning > WIth the patches applied, there's a single compilation warning when > Postgres is built: > > pgoutput.c: In function ‘pgoutput_row_filter_init’: > pgoutput.c:854:8: warning: unused variable ‘relid’ [-Wunused-variable] > Oid relid = RelationGetRelid(relation); > ^ > Fixed in v42* [1] -- [1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
RE: [BUG]Missing REPLICA IDENTITY check when DROP NOT NULL
On Wed, Nov 24, 2021 7:29 PM, Michael Paquier wrote: > > On Wed, Nov 24, 2021 at 07:04:51AM +, tanghy.f...@fujitsu.com wrote: > > create table tbl (a int not null unique); > > alter table tbl replica identity using INDEX tbl_a_key; > > alter table tbl alter column a drop not null; > > insert into tbl values (null); > > Oops. Yes, that's obviously not good. > > > To solve the above problem, I think it's better to add a check when > > executing ALTER COLUMN DROP NOT NULL, > > and report an error if this column is part of replica identity. > > I'd say that you are right to block the operation. I'll try to play a > bit with this stuff tomorrow. > > > Attaching a patch that disallow DROP NOT NULL on a column if it's in > > a REPLICA IDENTITY index. Also added a test in it. > >if (indexStruct->indkey.values[i] == attnum) >ereport(ERROR, >(errcode(ERRCODE_INVALID_TABLE_DEFINITION), > - errmsg("column \"%s\" is in a primary key", > + errmsg(ngettext("column \"%s\" is in a primary key", > + "column \"%s\" is in a REPLICA IDENTITY > index", > + indexStruct->indisprimary), > colName))); > Using ngettext() looks incorrect to me here as it is used to get the > plural form of a string, so you'd better make these completely > separated instead. Thanks for your comment. I agree with you. I have fixed it and attached v2 patch. Regards, Tang v2-0001-Disallow-DROP-NOT-NULL-on-a-column-in-the-REPLICA.patch Description: v2-0001-Disallow-DROP-NOT-NULL-on-a-column-in-the-REPLICA.patch
RE: pg_get_publication_tables() output duplicate relid
On Wed, Nov 24, 2021 4:48 PM Amit Kapila wrote: > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote > wrote: > > > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila > wrote: > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote > wrote: > > > > As in, > > > > do we know of any replication (initial/streaming) misbehavior > > > > caused by the duplicate partition OIDs in this case or is the only > > > > problem that pg_publication_tables output looks odd? > > > > > > The latter one but I think either we should document this or change > > > it as we can't assume users will follow what subscriber-side code does. > > > > On second thought, I agree that de-duplicating partitions from this > > view is an improvement. > > > > Fair enough. Hou-San, Can you please submit the updated patch after fixing > any pending comments including the test case? OK, I will submit the updated patch soon. Best regards, Hou zj
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Thanks for working on this. I was just trying to find something like "pg_stat_checkpointer". You wrote beentry++ at the start of two loops, but I think that's wrong; it should be at the end, as in the rest of the file (or as a loop increment). BackendStatusArray[0] is actually used (even though its backend has backendId==1, not 0). "MyBEEntry = [MyBackendId - 1];" You could put *_NUM_TYPES as the last value in these enums, like NUM_AUXPROCTYPES, NUM_PMSIGNALS, and NUM_PROCSIGNALS: +#define IOOP_NUM_TYPES (IOOP_WRITE + 1) +#define IOPATH_NUM_TYPES (IOPATH_STRATEGY + 1) +#define BACKEND_NUM_TYPES (B_LOGGER + 1) There's extraneous blank lines in these functions: +pgstat_sum_io_path_ops +pgstat_report_live_backend_io_path_ops +pgstat_recv_resetsharedcounter +GetIOPathDesc +StrategyRejectBuffer This function is doubly-indented: +pgstat_send_buffers_reset As support for C99 is now required by postgres, variables can be declared as part of various loops. + int io_path; + for (io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++) Rather than memset(), you could initialize msg like this. PgStat_MsgIOPathOps msg = {0}; +pgstat_send_buffers(void) +{ + PgStat_MsgIOPathOps msg; + + PgBackendStatus *beentry = MyBEEntry; + + if (!beentry) + return; + + memset(, 0, sizeof(msg)); -- Justin
Re: Non-superuser subscription owners
On Fri, 2021-11-19 at 16:45 -0800, Mark Dilger wrote: > Renamed as 0001 in version 3, as it is the only remaining patch. For > anyone who reviewed the older patch set, please note that I made some > changes to the src/test/subscription/t/026_nosuperuser.pl test case > relative to the prior version. We need to do permission checking for WITH CHECK OPTION and RLS. The patch right now allows the subscription to write data that an RLS policy forbids. A couple other points: * We shouldn't refer to the behavior of previous versions in the docs unless there's a compelling reason * Do we need to be smarter about partitioned tables, where an insert can turn into an update? * Should we refactor to borrow logic from ExecInsert so that it's less likely that we miss something in the future? Regards, Jeff Davis
Re: Reduce function call costs on ELF platforms
On 11/24/21 13:55, Andres Freund wrote: > Hi, > > On 2021-11-23 17:28:08 +0100, Peter Eisentraut wrote: >> On 22.11.21 23:32, Tom Lane wrote: The easier approach for this class of issues is to use the linker option -Bsymbolic. >>> I don't recall details, but we've previously rejected the idea of >>> trying to use -Bsymbolic widely; apparently it has undesirable >>> side-effects on some platforms. See commit message for e3d77ea6b >>> (hopefully there's some detail in the email thread [1]). It sounds >>> like you're not actually proposing that, but I thought it would be >>> a good idea to note the hazard here. >> Also, IIRC, -Bsymbolic was once frowned upon by packaging policies, since it >> prevents use of LD_PRELOAD. I'm not sure what the current thinking there >> is, however. > It doesn't break some (most?) of the uses of LD_PRELOAD. In particular, it > doesn't break things like replacing the malloc implementation. When do you > have a symbol that you want to override *inside* your library (executables > already bind to their own symbols at compile time)? I've seen that for > replacing buggy functions in closed source things, but that's about it? > Which things does it break exactly? I have a case where a library that is LD_PRELOADed calls PQsetSSLKeyPassHook_OpenSSL() in its constructor function. I'd be very unhappy if that stopped working (and so would our client). cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Wed, Nov 24, 2021 at 02:12:19PM -0800, SATYANARAYANA NARLAPURAM wrote: > While an exclusive backup is in progress if Postgres restarts, postgres > runs the recovery from the checkpoint identified by the label file instead > of the control file. This can cause long recovery or even sometimes fail to > recover as the WAL records corresponding to that checkpoint location are > removed. I can write a layer in my control plane to remove the backup_label > file when I know the server is not in restore from the base backup but I > don't see a reason why everyone has to repeat this step. Am I missing > something? This is a known issue with exclusive backups, which is a reason why non-exclusive backups have been implemented. pg_basebackup does that, and using "false" as the third argument of pg_start_backup() would have the same effect. So I would recommend to switch to that. -- Michael signature.asc Description: PGP signature
Re: internal subtransactions, memory contexts, resource owners
Chapman Flack writes: > By inspection of plperl and plpython, it looks like the canonical pattern > for a PL using internal subtransactions is: > save CurrentMemoryContext > save CurrentResourceOwner > BeginInternalSubTransaction > reimpose the saved memory context > // but not the saved resource owner > ... > (RollbackAnd)?ReleaseCurrentSubTransaction > reimpose the saved memory context > and the saved resource owner > Therefore, during the subtransaction, its newly-established memory context > is accessible as CurTransactionMemoryContext, but the caller can still use > CurrentMemoryContext to refer to the same context it already expected. > By contrast, the newly established resource owner is both the > CurTransactionResourceOwner and the CurrentResourceOwner within the scope > of the subtransaction. > Is there more explanation of this pattern written somewhere than I have > managed to find, and in particular of the motivation for treating the memory > context and the resource owner in these nearly-but-not-quite matching > ways? You normally want a separate resource owner for a subtransaction, since the main point of a subtransaction is to be able to clean up after errors and release resources. What to do with CurrentMemoryContext is a lot more specific to a particular PL. I don't want to speak to plperl and plpython in particular, but plpgsql does it this way because it uses the same function parsetree data structure and the same variable values throughout execution of a function. You would not, say, want the function's local variables to revert to previous values upon failure of a BEGIN block; so they have to be kept in the same memory context throughout. regards, tom lane
Postgres restart in the middle of exclusive backup and the presence of backup_label file
Hi Hackers, While an exclusive backup is in progress if Postgres restarts, postgres runs the recovery from the checkpoint identified by the label file instead of the control file. This can cause long recovery or even sometimes fail to recover as the WAL records corresponding to that checkpoint location are removed. I can write a layer in my control plane to remove the backup_label file when I know the server is not in restore from the base backup but I don't see a reason why everyone has to repeat this step. Am I missing something? If there are no standby.signal or recovery.signal, what is the use case of honoring backup_label file? Even when they exist, for a long running recovery, should we honor the backup_label file as the majority of the WAL already applied? It does slow down the recovery on restart right as it has to start all the way from the beginning? Thanks, Satya
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Fri, Nov 19, 2021 at 11:49 AM Andres Freund wrote: > > +/* > > + * On modern systems this is really just *counter++. On some older systems > > + * there might be more to it, due to inability to read and write 64 bit > > values > > + * atomically. > > + */ > > +static inline void inc_counter(pg_atomic_uint64 *counter) > > +{ > > + pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1); > > +} > > + > > #undef INSIDE_ATOMICS_H > > Why is this using a completely different naming scheme from the rest of the > file? It was what Thomas originally named it. Also, I noticed all the other pg_atomic* in this file were wrappers around the same impl function, so I thought maybe naming it this way would be confusing. I renamed it to pg_atomic_inc_counter(), though maybe pg_atomic_readonly_write() would be better? > > > doc/src/sgml/monitoring.sgml| 116 +- > > src/backend/catalog/system_views.sql| 11 ++ > > src/backend/postmaster/checkpointer.c | 3 +- > > src/backend/postmaster/pgstat.c | 161 +++- > > src/backend/storage/buffer/bufmgr.c | 46 -- > > src/backend/storage/buffer/freelist.c | 23 ++- > > src/backend/storage/buffer/localbuf.c | 3 + > > src/backend/storage/sync/sync.c | 1 + > > src/backend/utils/activity/backend_status.c | 60 +++- > > src/backend/utils/adt/pgstatfuncs.c | 152 ++ > > src/include/catalog/pg_proc.dat | 9 ++ > > src/include/miscadmin.h | 2 + > > src/include/pgstat.h| 53 +++ > > src/include/storage/buf_internals.h | 4 +- > > src/include/utils/backend_status.h | 80 ++ > > src/test/regress/expected/rules.out | 8 + > > 16 files changed, 701 insertions(+), 31 deletions(-) > > This is a pretty large change, I wonder if there's a way to make it a bit more > granular. > I have done this. See v15 patch set attached. - Melanie v15-0007-small-comment-correction.patch Description: Binary data v15-0004-Add-buffers-to-pgstat_reset_shared_counters.patch Description: Binary data v15-0006-Remove-superfluous-bgwriter-stats.patch Description: Binary data v15-0003-Send-IO-operations-to-stats-collector.patch Description: Binary data v15-0005-Add-system-view-tracking-IO-ops-per-backend-type.patch Description: Binary data v15-0002-Add-IO-operation-counters-to-PgBackendStatus.patch Description: Binary data v15-0001-Read-only-atomic-backend-write-function.patch Description: Binary data
Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)
On 11/23/21, 4:59 PM, "Mark Dilger" wrote: >> On Nov 23, 2021, at 4:51 PM, Bossart, Nathan wrote: >> >> This is a good point. Right now, you'd have to manually inspect the >> infomask field to determine the exact form of the invalid combination. >> My only worry with this is that we'd want to make sure these checks >> stayed consistent with the definition of HEAP_XMAX_IS_LOCKED_ONLY in >> htup_details.h. I'm guessing HEAP_XMAX_IS_LOCKED_ONLY is unlikely to >> change all that often, though. > > I expect that your patch will contain some addition to the amcheck (or > pg_amcheck) tests, so if we ever allow some currently disallowed bit > combination, we'd be reminded of the need to update amcheck. So I'm not too > worried about that aspect of this. > > I prefer not to have to get a page (or full file) from a customer when the > check reports corruption. Even assuming they are comfortable giving that, > which they may not be, it is an extra round trip to the customer asking for > stuff. Another option we might consider is only checking for the HEAP_XMAX_LOCK_ONLY bit instead of everything in HEAP_XMAX_IS_LOCKED_ONLY. IIUC everything else is only expected to happen for upgrades from v9.2 or earlier, so it might be pretty rare at this point. Otherwise, I'll extract the exact bit pattern for the error message as you suggest. Nathan
internal subtransactions, memory contexts, resource owners
Hi, By inspection of plperl and plpython, it looks like the canonical pattern for a PL using internal subtransactions is: save CurrentMemoryContext save CurrentResourceOwner BeginInternalSubTransaction reimpose the saved memory context // but not the saved resource owner ... (RollbackAnd)?ReleaseCurrentSubTransaction reimpose the saved memory context and the saved resource owner Therefore, during the subtransaction, its newly-established memory context is accessible as CurTransactionMemoryContext, but the caller can still use CurrentMemoryContext to refer to the same context it already expected. By contrast, the newly established resource owner is both the CurTransactionResourceOwner and the CurrentResourceOwner within the scope of the subtransaction. Is there more explanation of this pattern written somewhere than I have managed to find, and in particular of the motivation for treating the memory context and the resource owner in these nearly-but-not-quite matching ways? Regards, -Chap
Re: Post-CVE Wishlist
On Wed, Nov 24, 2021 at 3:14 PM Tom Lane wrote: > Robert Haas writes: > > On Wed, Nov 24, 2021 at 2:53 PM Tom Lane wrote: > >> One other point to be made here is that it seems like a stretch to call > >> these particular bugs "high-severity". > > > Well, I was referring to the CVSS score, which was in the "high" range. > > The server CVE is; the client CVE, not so much, precisely because we > couldn't find exciting exploits. Right, I understand that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Post-CVE Wishlist
Robert Haas writes: > On Wed, Nov 24, 2021 at 2:53 PM Tom Lane wrote: >> One other point to be made here is that it seems like a stretch to call >> these particular bugs "high-severity". > Well, I was referring to the CVSS score, which was in the "high" range. The server CVE is; the client CVE, not so much, precisely because we couldn't find exciting exploits. regards, tom lane
Re: Post-CVE Wishlist
On Wed, 2021-11-24 at 14:01 -0500, Robert Haas wrote: > The bar for actually ripping it out > on an expedited time scale would be proving not only that it's broken > in multiple ways, but that it's so badly broken that it can't be fixed > with any reasonable amount of effort. And I just don't see one bug > that had a pretty localized fix is really moving the needle as far as > that burden of proof is concerned. I'm not suggesting that we rip it out today, for the record. I agree with you on where the bar is for that. --Jacob
Re: Post-CVE Wishlist
On Wed, Nov 24, 2021 at 2:53 PM Tom Lane wrote: > One other point to be made here is that it seems like a stretch to call > these particular bugs "high-severity". Well, I was referring to the CVSS score, which was in the "high" range. > Given what we learned about > the difficulty of exploiting the libpq bug, and the certainty that any > other clients sharing the issue would have their own idiosyncrasies > necessitating a custom-designed attack, I rather doubt that we're going > to hear of anybody trying to exploit the issue in the field. I don't know. The main thing that I find consoling is the fact that most people probably have the libpq connection behind a firewall where nasty people can't even connect to the port. But there are probably exceptions. > (By no means do I suggest that these bugs aren't worth fixing when we > find them. But so far they seem very easy to fix. So moving mountains > to design out just this one type of bug doesn't seem like a great use > of our finite earth-moving capacity.) I have enough trouble just moving the couch. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Post-CVE Wishlist
On Wed, 2021-11-24 at 14:03 -0500, Tom Lane wrote: > > I don't buy the idea that, because we have fixed that particular > > vulnerability, we've rendered this entire class of bugs "hypothetical". > > There will be more code and more clients. There will always be bugs. > > I'd rather the bugs that people write be in places that are less > > security-critical. > > Unless we actively remove the existing way of starting SSL encryption > --- and GSS encryption, and anything else somebody proposes in future --- > we are not going to be able to design out this class of bugs. _We_ can't. I get that. But if this feature is introduced, new clients will begin to have the option of designing it out of their code. And DBAs will have the option of locking down their servers so that any new bugs we introduce in the TLS-upgrade codepath will simply not affect them. The ecosystem has the option of transitioning faster than we can. And then, some number of releases later, an entirely new conversation might happen. (Or it might not.) > Maybe > we could start the process now in the hopes of making such a breaking > change ten years down the road; but whether anyone will remember to > pull the trigger then is doubtful, and even if we do remember, you can > be dead certain it will still break some people's clients. I am familiar with the "we didn't plant a tree 20 years ago, so we shouldn't plant one now" line of argument. :D I hope it's not as persuasive as it used to be. > So I don't > put much stock in the argument that this will make things more secure. > (Ten years from now, SSL may be dead and replaced by something more > secure against quantum computers.) That would be great! But I suspect that if that happens, the new argument will be "we can't upgrade our server to XQuantum-only! Look at all these legacy SSL clients." --Jacob
Re: Post-CVE Wishlist
Robert Haas writes: > I think it would take an overwhelming amount of evidence to convince > the project to remove support for the current method. One or even two > or three high-severity bugs will probably not convince the project to > do more than spend more studying that code and trying to tighten > things up in a systematic way. One other point to be made here is that it seems like a stretch to call these particular bugs "high-severity". Given what we learned about the difficulty of exploiting the libpq bug, and the certainty that any other clients sharing the issue would have their own idiosyncrasies necessitating a custom-designed attack, I rather doubt that we're going to hear of anybody trying to exploit the issue in the field. (By no means do I suggest that these bugs aren't worth fixing when we find them. But so far they seem very easy to fix. So moving mountains to design out just this one type of bug doesn't seem like a great use of our finite earth-moving capacity.) regards, tom lane
Re: Split xlog.c
On Wed, Nov 24, 2021 at 12:16 PM Heikki Linnakangas wrote: > And here's another rebase, now that Robert got rid of ReadRecPtr and > EndRecPtr. In general, I think 0001 is a good idea, but the comment that says "Set the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag on the page header" seems to me to be telling the reader about what's already obvious instead of explaining to them the thing they might have missed. GetXLogBuffer() says that it's only safe to use if you hold a WAL insertion lock and don't go backwards, and here you don't hold a WAL insertion lock and I guess you're not going backwards only because you're staying in exactly the same place? It seems to me that the only reason this is safe is because, at the time this is called, only the startup process is able to write WAL, and therefore the race condition that would otherwise exist does not. Even then, I wonder what keeps the buffer from being flushed after we return from XLogInsert() and before we set the bit, and if the answer is that nothing prevents that, whether that's OK. It might be good to talk about these issues too. Just to be clear, I'm not saying that I think the code is broken. But I am concerned about someone using this as precedent for code that runs in some other place, which would be highly likely to be broken, and the way to avoid that is for the comment to explain the tricky points. Also, you've named the parameter to this new function so that it's exactly the same as the global variable. I do approve of trying to pass the value as a parameter instead of relying on a global variable, and I wonder if you could find a way to remove the global variable entirely. But if not, I think the function parameter and the global variable should have different names, because otherwise it's easy for anyone reading the code to get confused about which one is being referenced in any particular spot, and it's also hard to grep. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Should rename "startup process" to something else?
On Wed, Nov 24, 2021 at 1:54 PM Alvaro Herrera wrote: > I don't object to an underscore, but it looks a bit uglier to me. > AFAIK the main problem with uuid-ossp was that it is used as an identifier, so > it required quoting, which won't be the case with this process name. I agree, and for that reason I would prefer no separator, or a space. That's what we do with other processes, and I think it's fine. It's worth thinking too about the fact that we may want to rename functions, adjust comments, etc. Each of those things has their own conventions. For example consider StartupProcessMain(). If we decide to call this the WAL replay process, I suppose that is going to become WALReplayMain(). For sure it's not going to be come WAL-Replay-Main(). But what displays in the 'ps' status should look like what we do in other cases. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Should rename "startup process" to something else?
Alvaro Herrera writes: > On 2021-Nov-23, Tom Lane wrote: >> Bad memories of uuid-ossp float up ... can we use underscore? > I don't object to an underscore, but it looks a bit uglier to me. > AFAIK the main problem with uuid-ossp was that it is used as an identifier, so > it required quoting, which won't be the case with this process name. Well, we can call the process whatever we want, but we will need to derive identifiers from that name, so you can't just ignore the issue. regards, tom lane
Re: Post-CVE Wishlist
Jacob Champion writes: > What I'm trying to convince you of is that this pattern of beginning a > TLS conversation is known to be particularly error-prone, across > multiple protocols and implementations. I think this is supported by > the fact that at least three independent client libraries made changes > in response to this Postgres CVE, a decade after the first writeup of > this exact vulnerability. Well, it's not clear that they didn't just copy libpq's buggy logic, so I'm not sure how "independent" these bugs are. I actually took considerable comfort from the number of clients that *weren't* vulnerable. > I don't buy the idea that, because we have fixed that particular > vulnerability, we've rendered this entire class of bugs "hypothetical". > There will be more code and more clients. There will always be bugs. > I'd rather the bugs that people write be in places that are less > security-critical. Unless we actively remove the existing way of starting SSL encryption --- and GSS encryption, and anything else somebody proposes in future --- we are not going to be able to design out this class of bugs. Maybe we could start the process now in the hopes of making such a breaking change ten years down the road; but whether anyone will remember to pull the trigger then is doubtful, and even if we do remember, you can be dead certain it will still break some people's clients. So I don't put much stock in the argument that this will make things more secure. (Ten years from now, SSL may be dead and replaced by something more secure against quantum computers.) The suggestion that we could remove one network roundtrip is worth something. And perhaps the argument about improving compatibility with tools that know SSL but not the PG wire protocol (although that one seems pretty unbacked by concrete facts). Whether these things make it worth the effort is dubious in my mind, but others may evaluate that differently. Note, however, that neither argument impels us to break compatibility with existing clients. That's a far heavier price to pay, and basically I don't believe that we are willing to pay it. regards, tom lane
Re: Post-CVE Wishlist
On Wed, Nov 24, 2021 at 1:29 PM Jacob Champion wrote: > What I'm trying to convince you of is that this pattern of beginning a > TLS conversation is known to be particularly error-prone, across > multiple protocols and implementations. I think this is supported by > the fact that at least three independent client libraries made changes > in response to this Postgres CVE, a decade after the first writeup of > this exact vulnerability. > > - https://github.com/postgres/postgres/commit/160c0258802 > - https://github.com/pgbouncer/pgbouncer/commit/e4453c9151a > - https://github.com/yandex/odyssey/commit/4e00bf797a Sure, that's certainly true, but there are many programming patterns that have well-known gotchas, and people still write programs that do those things, debug them, and are satisfied with the results. Beginning programming classes often cover the abuse of recursion using fact() and fib() as examples. It's extremely easy to write a program that concates a large number of strings with O(n^2) runtime, and tons of people must have made that mistake. I've personally written fork() bombs multiple times, sometimes unintentionally. None of that proves that computing factorials or fibonacci numbers, concatenating strings, or calling fork() are things that you just should not do. However, if you do those things and make the classic errors, somebody's probably going to think that you're kind of stupid. So here. I think it would take an overwhelming amount of evidence to convince the project to remove support for the current method. One or even two or three high-severity bugs will probably not convince the project to do more than spend more studying that code and trying to tighten things up in a systematic way. Even if we did agree to move away from it, we would mostly likely add support for the replacement method now with a view to deprecating the old way in 6-10 years when existing releases are out of support, which means we'd still need to fix all of the bugs in the existing implementation, or at least all of the ones discovered between now and then. The bar for actually ripping it out on an expedited time scale would be proving not only that it's broken in multiple ways, but that it's so badly broken that it can't be fixed with any reasonable amount of effort. And I just don't see one bug that had a pretty localized fix is really moving the needle as far as that burden of proof is concerned. And if the existing method is not going away, then adding a new method just means that we have two things that can have bugs instead of one. That might or might not be an advancement in usability or convenience, but it certainly can't be less buggy. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Partial foreign key updates in referential integrity triggers
On 05.01.21 22:40, Paul Martinez wrote: CREATE TABLE tenants (id serial PRIMARY KEY); CREATE TABLE users ( tenant_id int REFERENCES tenants ON DELETE CASCADE, id serial, PRIMARY KEY (tenant_id, id), ); CREATE TABLE posts ( tenant_id int REFERENCES tenants ON DELETE CASCADE, id serial, author_id int, PRIMARY KEY (tenant_id, id), FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL ); INSERT INTO tenants VALUES (1); INSERT INTO users VALUES (1, 101); INSERT INTO posts VALUES (1, 201, 101); DELETE FROM users WHERE id = 101; ERROR: null value in column "tenant_id" violates not-null constraint DETAIL: Failing row contains (null, 201, null). I was looking through this example to see if it could be adapted for the documentation. The way the users table is defined, it appears that "id" is actually unique and the primary key ought to be just (id). The DELETE command you show also just uses the id column to find the user, which would be bad if the user id is not unique across tenants. If the id were unique, then the foreign key from posts to users would just use the user id column and the whole problem of the ON DELETE SET NULL action would go away. If the primary key of users is indeed supposed to be (tenant_id, id), then maybe the definition of the users.id column should not use serial, and the DELETE command should also look at the tenant_id column. (The same question applies to posts.id.) Also, you initially wrote that this is a denormalized schema. I think if we keep the keys the way you show, then this isn't denormalized. But if we considered users.id globally unique, then there would be normalization concerns. What do you think?
Re: pgsql: xlog.c: Remove global variables ReadRecPtr and EndRecPtr.
On Wed, Nov 24, 2021 at 1:51 PM Alvaro Herrera wrote: > > There is a buildfarm failure on lapwing which looks likely to be > > attributable to this commit, but I don't understand what has gone > > wrong exactly. The error message that is reported is: > > No, this is an unrelated problem; Tom reported this yesterday also: > https://postgr.es/m/45597.1637694...@sss.pgh.pa.us Oh, OK. I couldn't see exactly what that could have to do with my commit, but it happened right afterwards so I was hesitant to call it a coincidence. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
Re: Reduce function call costs on ELF platforms
Hi, On 2021-11-23 17:28:08 +0100, Peter Eisentraut wrote: > On 22.11.21 23:32, Tom Lane wrote: > > > The easier approach for this class of issues is to use the linker option > > > -Bsymbolic. > > I don't recall details, but we've previously rejected the idea of > > trying to use -Bsymbolic widely; apparently it has undesirable > > side-effects on some platforms. See commit message for e3d77ea6b > > (hopefully there's some detail in the email thread [1]). It sounds > > like you're not actually proposing that, but I thought it would be > > a good idea to note the hazard here. > > Also, IIRC, -Bsymbolic was once frowned upon by packaging policies, since it > prevents use of LD_PRELOAD. I'm not sure what the current thinking there > is, however. It doesn't break some (most?) of the uses of LD_PRELOAD. In particular, it doesn't break things like replacing the malloc implementation. When do you have a symbol that you want to override *inside* your library (executables already bind to their own symbols at compile time)? I've seen that for replacing buggy functions in closed source things, but that's about it? Greetings, Andres Freund
Re: Should rename "startup process" to something else?
On 2021-Nov-23, Tom Lane wrote: > Alvaro Herrera writes: > > On 2021-Nov-20, Andrew Dunstan wrote: > >> Maybe something along those lines but using a dash/hyphen would work: > >> e.g. wal-replayer > > > Yeah, the idea of a dash occurred to me too. > > Bad memories of uuid-ossp float up ... can we use underscore? I don't object to an underscore, but it looks a bit uglier to me. AFAIK the main problem with uuid-ossp was that it is used as an identifier, so it required quoting, which won't be the case with this process name. Anyway, as a second and separate point, should we rename the other WAL-related process names to use the same character? walreceiver -> wal_receiver / wal-receiver walsender -> wal_sender / wal-sender walwriter -> wal_writer / wal-writer. I assume we would *not* rename other existing processes that already have spaces in their names, such as "autovacuum launcher" to "autovacuum_launcher", as that would be a net loss IMO. Currently, the process list in an idle primary server looks like USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND alvherre 1091970 0.0 0.0 199608 22252 pts/0S+ 12:22 0:00 /pgsql/install/master/bin/postmaster -p 55432 alvherre 1091983 0.0 0.0 199608 3644 ?Ss 12:22 0:00 \_ postgres: checkpointer alvherre 1091984 0.0 0.0 199608 5432 ?Ss 12:22 0:00 \_ postgres: background writer alvherre 1091986 0.0 0.0 199608 8956 ?Ss 12:22 0:00 \_ postgres: walwriter alvherre 1091987 0.0 0.0 200148 7988 ?Ss 12:22 0:00 \_ postgres: autovacuum launcher alvherre 1091988 0.0 0.0 54432 4156 ?Ss 12:22 0:00 \_ postgres: stats collector alvherre 1091989 0.0 0.0 200036 6416 ?Ss 12:22 0:00 \_ postgres: logical replication launcher I think this looks better: USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND alvherre 1091970 0.0 0.0 199608 22252 pts/0S+ 12:22 0:00 /pgsql/install/master/bin/postmaster -p 55432 alvherre 1091983 0.0 0.0 199608 3644 ?Ss 12:22 0:00 \_ postgres: checkpointer alvherre 1091984 0.0 0.0 199608 5432 ?Ss 12:22 0:00 \_ postgres: background writer alvherre 1091986 0.0 0.0 199608 8956 ?Ss 12:22 0:00 \_ postgres: wal-writer alvherre 1091987 0.0 0.0 200148 7988 ?Ss 12:22 0:00 \_ postgres: autovacuum launcher alvherre 1091988 0.0 0.0 54432 4156 ?Ss 12:22 0:00 \_ postgres: stats collector alvherre 1091989 0.0 0.0 200036 6416 ?Ss 12:22 0:00 \_ postgres: logical replication launcher -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: pgsql: xlog.c: Remove global variables ReadRecPtr and EndRecPtr.
On 2021-Nov-24, Robert Haas wrote: > On Wed, Nov 24, 2021 at 11:28 AM Robert Haas wrote: > > xlog.c: Remove global variables ReadRecPtr and EndRecPtr. > > https://git.postgresql.org/pg/commitdiff/d2ddfa681db27a138acb63c8defa8cc6fa588922 > > There is a buildfarm failure on lapwing which looks likely to be > attributable to this commit, but I don't understand what has gone > wrong exactly. The error message that is reported is: No, this is an unrelated problem; Tom reported this yesterday also: https://postgr.es/m/45597.1637694...@sss.pgh.pa.us -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "La espina, desde que nace, ya pincha" (Proverbio africano)
Re: pgsql: xlog.c: Remove global variables ReadRecPtr and EndRecPtr.
Robert Haas writes: > 2021-11-24 17:07:41.428 UTC [1449:6] FATAL: mismatching overwritten > LSN 0/1FFE014 -> 0/1FFE000 > 2021-11-24 17:07:41.428 UTC [1449:7] CONTEXT: WAL redo at 0/224 > for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFE014; time 2021-11-24 > 17:07:41.127049+00 > Beyond the fact that something bad has happened, I'm not sure what > that is trying to tell me. Pre-existing issue: https://www.postgresql.org/message-id/45597.1637694259%40sss.pgh.pa.us regards, tom lane
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Wed, Nov 24, 2021 at 9:53 AM Alvaro Herrera wrote: > OK, this makes a lot more sense. I wasn't aware of ae7291ac (and I > wasn't aware of the significance of 8523492d either, but that's not > really relevant here.) Thanks for hearing me out about the significance of 8523492d. Having the right formalisms seems to really matter here, because they enable decoupling, which is generally very useful. This makes it easy to understand (just for example) that index vacuuming and heap vacuuming are just additive, optional steps (in principle) -- an idea that will become even more important once we get Robert's pending TID conveyor belt design. I believe that that design goes one step further than what we have today, by making index vacuuming and heap vacuuming occur in a distinct operation to VACUUM proper (VACUUM would only need to set up the LP_DEAD item list for index vacuuming and heap vacuuming, which may or may not happen immediately after). An interesting question (at least to me) is: within a non-aggressive VACUUM, what remaining steps are *not* technically optional? I am pretty sure that they're all optional in principle (or will be soon), because soon we will be able to independently advance relfrozenxid without freezing all tuples with XIDs before our original FreezeLimit (FreezeLimit should only be used to decide which tuples to freeze, not to decide on a new relfrozenxid). Soon almost everything will be decoupled, without changing the basic invariants that we've had for many years. This flexibility seems really important to me. That just leaves avoiding pruning without necessarily avoiding ostensibly related processing for indexes. We can already independently prune without doing index/heap vacuuming (the bypass indexes optimization). We will also be able to do the opposite thing, with my new patch: we can perform index/heap vacuuming *without* pruning ourselves. This makes sense in the case where we cannot acquire a cleanup lock on a heap page with preexisting LP_DEAD items. > I think we could say "LP_DEAD line pointer" and that would be perfectly > clear. Given how nuanced we have to be if we want to be clear about > this, I would rather not use "LP_DEAD item"; that seems slightly > contradictory, since the item is the storage and such a line pointer > does not have storage. Perhaps change that define in progress.h to > PROGRESS_VACUUM_NUM_DEAD_LPS, and, in the first comment in vacuumlazy.c, > use wording such as I agree with all that, I think. But it's still not clear what the variable dead_tuples should be renamed to within the structure that you lay out (I imagine that you agree with me that dead_tuples is now a bad name). This one detail affects more individual lines of code than the restructuring of comments. -- Peter Geoghegan
Re: Sequence's value can be rollback after a crashed recovery.
Laurenz Albe writes: > On Tue, 2021-11-23 at 16:41 -0500, Tom Lane wrote: >> A bit of polishing later, maybe like the attached. > That looks good to me. Pushed, thanks. regards, tom lane
Re: BUFFERS enabled by default in EXPLAIN (ANALYZE)
I think it *should* be enabled for planning, since that makes the default > easier to understand and document, and it makes a user's use of "explain" > easier. I’d be keen to see BUFFERS off by default with EXPLAIN, and on by default with EXPLAIN ANALYZE. The SUMMARY flag was implemented that way, which I think has been easy enough for folks to understand and document. In fact, I think the only BUFFERS information goes in the “summary” section for EXPLAIN (BUFFERS), so maybe it makes perfect sense? If it would be great if that made implementation easier, too. In any case, thank you all, I’m so glad that this is being discussed again. It’d be so good to start seeing buffers in more plans. — Michael
Re: Post-CVE Wishlist
On Wed, 2021-11-24 at 09:40 -0500, Robert Haas wrote: > I am not really persuaded by Jacob's argument that, had this only > worked the other way from the start, this bug wouldn't have occurred. > That's just a tautology, because we can only have bugs in the code we > write, not the code we didn't write. So perhaps we would have just had > some other bug, which might have been more or less serious than the > one we actually had. It's hard to say, really, because the situation > is hypothetical. I'm not trying to convince you that there wouldn't have been bugs. There will always be bugs. What I'm trying to convince you of is that this pattern of beginning a TLS conversation is known to be particularly error-prone, across multiple protocols and implementations. I think this is supported by the fact that at least three independent client libraries made changes in response to this Postgres CVE, a decade after the first writeup of this exact vulnerability. - https://github.com/postgres/postgres/commit/160c0258802 - https://github.com/pgbouncer/pgbouncer/commit/e4453c9151a - https://github.com/yandex/odyssey/commit/4e00bf797a I don't buy the idea that, because we have fixed that particular vulnerability, we've rendered this entire class of bugs "hypothetical". There will be more code and more clients. There will always be bugs. I'd rather the bugs that people write be in places that are less security-critical. > This argument doesn't answer the question of whether speaking pure SSL > on a separate port is better or worse than having a single port that > does either. If I had to guess, the latter is more convenient for > users but less convenient to code. I don't even see a compelling > reason why we can't support multiple models here, supposing someone is > willing to do the work and fix the bugs that result. I only have experience in the area of HTTP(S), which supports three models of plaintext-only, plaintext-upgrade-to-TLS (which is rare in practice), and implicit-TLS. I'm not aware of mainstream efforts to mix plaintext and implicit-TLS traffic on the same HTTP port -- but there are projects that fill that niche [1] -- so I don't know what security issues might arise from that approach. --Jacob [1] https://github.com/mscdex/httpolyglot
Re: pgsql: xlog.c: Remove global variables ReadRecPtr and EndRecPtr.
On Wed, Nov 24, 2021 at 11:28 AM Robert Haas wrote: > xlog.c: Remove global variables ReadRecPtr and EndRecPtr. > https://git.postgresql.org/pg/commitdiff/d2ddfa681db27a138acb63c8defa8cc6fa588922 There is a buildfarm failure on lapwing which looks likely to be attributable to this commit, but I don't understand what has gone wrong exactly. The error message that is reported is: [17:07:19] t/025_stuck_on_old_timeline.pl ... ok 6048 ms # poll_query_until timed out executing this query: # SELECT '0/201E8E0'::pg_lsn <= pg_last_wal_replay_lsn() # expecting this output: # t # last actual query output: # # with stderr: # psql: error: connection to server on socket "/tmp/SrZekgRh7K/.s.PGSQL.57959" failed: No such file or directory # Is the server running locally and accepting connections on that socket? This output confused me for a while because I thought the failure was in test 025, but I should have realized that the problem must be in 026, since 025 finished with "ok". Anyway, in 026_overwrite_contrecord_standby.log there's this: 2021-11-24 17:07:41.388 UTC [1473:1] LOG: started streaming WAL from primary at 0/200 on timeline 1 2021-11-24 17:07:41.428 UTC [1449:6] FATAL: mismatching overwritten LSN 0/1FFE014 -> 0/1FFE000 2021-11-24 17:07:41.428 UTC [1449:7] CONTEXT: WAL redo at 0/224 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFE014; time 2021-11-24 17:07:41.127049+00 Beyond the fact that something bad has happened, I'm not sure what that is trying to tell me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On 2021-Nov-24, Peter Geoghegan wrote: > TIDs (ItemPointerData structs) are of course not the same thing as > line pointers (ItemIdData structs). There is a tendency to refer to > the latter as "item pointers" all the same, which was confusing. I > personally corrected/normalized this in commit ae7291ac in 2019. I > think that it's worth being careful about precisely because they're > closely related (but distinct) concepts. And so FWIW "LP_DEAD item > pointer" is not a thing. I agree that an LP_DEAD item pointer has no > tuple storage, and so you could say that it points to nothing (though > only in heapam). I probably would just say that it has no tuple > storage, though. OK, this makes a lot more sense. I wasn't aware of ae7291ac (and I wasn't aware of the significance of 8523492d either, but that's not really relevant here.) > I agree with others that the term "item" is vague, but I don't think > that that's necessarily a bad thing here -- I deliberately changed the > comments to say either "TIDs" or "LP_DEAD items", emphasizing whatever > the important aspect seemed to be in each context (they're LP_DEAD > items to the heap structure, TIDs to index structures). I think we could say "LP_DEAD line pointer" and that would be perfectly clear. Given how nuanced we have to be if we want to be clear about this, I would rather not use "LP_DEAD item"; that seems slightly contradictory, since the item is the storage and such a line pointer does not have storage. Perhaps change that define in progress.h to PROGRESS_VACUUM_NUM_DEAD_LPS, and, in the first comment in vacuumlazy.c, use wording such as + * The major space usage for LAZY VACUUM is storage for the array of TIDs + * of dead line pointers that are to be removed from indexes. or + * The major space usage for LAZY VACUUM is storage for the array of TIDs + * of LP_DEAD line pointers that are to be removed from indexes. (The point being that TIDs are not dead themselves, only the line pointers that they refer to.) -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Most hackers will be perfectly comfortable conceptualizing users as entropy sources, so let's move on." (Nathaniel Smith)
Re: Python 3.11 vs. Postgres
Peter Eisentraut writes: > On 24.11.21 04:07, Tom Lane wrote: >> According to [1], we need to stop including Python's . > See attached patch. The minimum Python version for this change is 2.4, > which is the oldest version supported by PG10, so we can backpatch this > to all live branches. LGTM. Tested with v10 and prairiedog's Python 2.4.1. regards, tom lane
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Wed, Nov 24, 2021 at 7:16 AM Alvaro Herrera wrote: > Sorry to reply to myself, but I realized that I forgot to return to the > main point of this thread. If we agree that "an LP_DEAD item pointer > does not point to any item" (an assertion that gives a precise meaning > to both those terms), then a patch that renames "tuples" to "items" is > not doing anything useful IMO, because those two terms are synonyms. TIDs (ItemPointerData structs) are of course not the same thing as line pointers (ItemIdData structs). There is a tendency to refer to the latter as "item pointers" all the same, which was confusing. I personally corrected/normalized this in commit ae7291ac in 2019. I think that it's worth being careful about precisely because they're closely related (but distinct) concepts. And so FWIW "LP_DEAD item pointer" is not a thing. I agree that an LP_DEAD item pointer has no tuple storage, and so you could say that it points to nothing (though only in heapam). I probably would just say that it has no tuple storage, though. > Now maybe Peter doesn't agree with the definitions I suggest, in which > case I would like to know what his definitions are. I agree with others that the term "item" is vague, but I don't think that that's necessarily a bad thing here -- I deliberately changed the comments to say either "TIDs" or "LP_DEAD items", emphasizing whatever the important aspect seemed to be in each context (they're LP_DEAD items to the heap structure, TIDs to index structures). I'm not attached to the term "item". To me the truly important point is what these items are *not*: they're not tuples. The renaming is intended to enforce the concepts that I went into at the end of the commit message for commit 8523492d. Now the pruning steps in lazy_scan_prune always avoiding keeping around a DEAD tuple with tuple storage on return to lazy_scan_heap (only LP_DEAD items can remain), since (as of that commit) lazy_scan_prune alone is responsible for things involving the "logical database". This means that index vacuuming and heap vacuuming can now be thought of as removing garbage items from physical data structures (they're purely "physical database" concepts), and nothing else. They don't need recovery conflicts. How could they? Where are you supposed to get the XIDs for that from, when you've only got LP_DEAD items? This is also related to the idea that pruning by VACUUM isn't necessarily all that special compared to earlier pruning or concurrent opportunistic pruning. As I go into on the other recent thread on removing special cases in vacuumlazy.c, ISTM that we ought to do everything except pruning itself (and freezing tuples, which effectively depends on pruning) without even acquiring a cleanup lock. Which is actually quite a lot of things. -- Peter Geoghegan
Re: Minor documentation fix - missing blank space
On 24/11/2021 16:49, Japin Li wrote: When I read the documentation about Overview of PostgreSQL Internals - Executor [1], I find there is a missing space in the documentation. diff --git a/doc/src/sgml/arch-dev.sgml b/doc/src/sgml/arch-dev.sgml index 7aff059e82..c2be28fac8 100644 --- a/doc/src/sgml/arch-dev.sgml +++ b/doc/src/sgml/arch-dev.sgml @@ -559,7 +559,7 @@ A simple INSERT ... VALUES command creates a trivial plan tree consisting of a single Result node, which computes just one result row, feeding that up -toModifyTable to perform the insertion. +to ModifyTable to perform the insertion. Applied, thanks! - Heikki
Re: xlog.c: removing ReadRecPtr and EndRecPtr
On Tue, Nov 23, 2021 at 1:36 PM Robert Haas wrote: > Attached please find the test case not for commit as > v2-0001-dubious-test-case.patch; the fix, for commit and now with a > proper commit message as > v2-0002-Fix-corner-case-failure-to-detect-improper-timeli.patch; and > the back-ported test case for v14 as v14-0001-dubious-test-case.patch > in case anyone wants to play around with that. (with apologies for > using the idea of a version number in two different and conflicting > senses) OK, I have committed this patch, rebased the original combined patch over top of that, and committed that too. By my count that means I've now removed a total of 3 global variables from this file and Amul got rid of one as well so that makes 4 in total. If we continue at this brisk pace the code may become understandable and maintainable sometime prior to the heat death of the universe, which I think would be rather nice. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Synchronizing slots from primary to standby
Hi all, Peter Eisentraut writes: > I want to reactivate $subject. I took Petr Jelinek's patch from [0], > rebased it, added a bit of testing. It basically works, but as > mentioned in [0], there are various issues to work out. Thanks for working on that topic, I believe it's an important part of Postgres' HA story. > The idea is that the standby runs a background worker to periodically fetch > replication slot information from the primary. On failover, a logical > subscriber would then ideally find up-to-date replication slots on the new > publisher and can just continue normally. Is there a case to be made about doing the same thing for physical replication slots too? That's what pg_auto_failover [1] does by default: it creates replication slots on every node for every other node, in a way that a standby Postgres instance now maintains a replication slot for the primary. This ensures that after a promotion, the standby knows to retain any and all WAL segments that the primary might need when rejoining, at pg_rewind time. > The previous thread didn't have a lot of discussion, but I have gathered > from off-line conversations that there is a wider agreement on this > approach. So the next steps would be to make it more robust and > configurable and documented. I suppose part of the configuration would then include taking care of physical slots. Some people might want to turn that off and use the Postgres 13+ ability to use the remote primary restore_command to fetch missing WAL files, instead. Well, people who have setup an archiving system, anyway. > As I said, I added a small test case to > show that it works at all, but I think a lot more tests should be added. I > have also found that this breaks some seemingly unrelated tests in the > recovery test suite. I have disabled these here. I'm not sure if the patch > actually breaks anything or if these are just differences in timing or > implementation dependencies. This patch adds a LIST_SLOTS replication > command, but I think this could be replaced with just a SELECT FROM > pg_replication_slots query now. (This patch is originally older than when > you could run SELECT queries over the replication protocol.) Given the admitted state of the patch, I didn't focus on tests. I could successfully apply the patch on-top of current master's branch, and cleanly compile and `make check`. Then I also updated pg_auto_failover to support Postgres 15devel [2] so that I could then `make NODES=3 cluster` there and play with the new replication command: $ psql -d "port=5501 replication=1" -c "LIST_SLOTS;" psql:/Users/dim/.psqlrc:24: ERROR: XX000: cannot execute SQL commands in WAL sender for physical replication LOCATION: exec_replication_command, walsender.c:1830 ... I'm not too sure about this idea of running SQL in a replication protocol connection that you're mentioning, but I suppose that's just me needing to brush up on the topic. > So, again, this isn't anywhere near ready, but there is already a lot here > to gather feedback about how it works, how it should work, how to configure > it, and how it fits into an overall replication and HA architecture. Maybe the first question about configuration would be about selecting which slots a standby should maintain from the primary. Is it all of the slots that exists on both the nodes, or a sublist of that? Is it possible to have a slot with the same name on a primary and a standby node, in a way that the standby's slot would be a completely separate entity from the primary's slot? If yes (I just don't know at the moment), well then, should we continue to allow that? Other aspects of the configuration might include a list of databases in which to make the new background worker active, and the polling delay, etc. Also, do we want to even consider having the slot management on a primary node depend on the ability to sync the advancing on one or more standby nodes? I'm not sure to see that one as a good idea, but maybe we want to kill it publically very early then ;-) Regards, -- dim Author of “The Art of PostgreSQL”, see https://theartofpostgresql.com [1]: https://github.com/citusdata/pg_auto_failover [2]: https://github.com/citusdata/pg_auto_failover/pull/838
Re: psql - add SHOW_ALL_RESULTS option
Hello Daniel, This patch still has a few of the problems reported earlier this year. The patch fails to apply and the thread seems to have taken a nap. I'm not napping:-) I just do not have enough time available this month. I intend to work on the patch in the next CF (January). AFAICR there is one necessary rebase and one bug to fix. -- Fabien.
Re: pg_replslotdata - a tool for displaying replication slot information
On Wed, 24 Nov 2021 at 23:59, Bharath Rupireddy wrote: > On Wed, Nov 24, 2021 at 9:09 PM Bharath Rupireddy >> > Thoughts? >> >> Attaching the rebased v2 patch. > > On windows the previous patches were failing, fixed that in the v3 > patch. I'm really sorry for the noise. > Cool! When I try to use it, there is an error for -v, --verbose option. px@ubuntu:~/Codes/postgres/Debug$ pg_replslotdata -v pg_replslotdata: invalid option -- 'v' Try "pg_replslotdata --help" for more information. This is because the getopt_long() missing 'v' in the third parameter. while ((c = getopt_long(argc, argv, "D:v", long_options, NULL)) != -1) -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: pg_replslotdata - a tool for displaying replication slot information
On Wed, Nov 24, 2021 at 9:09 PM Bharath Rupireddy wrote: > > On Tue, Nov 23, 2021 at 10:39 AM Bharath Rupireddy > wrote: > > > > Hi, > > > > The replication slots data is stored in binary format on the disk under the > > pg_replslot/<> directory which isn't human readable. If the > > server is crashed/down (for whatever reasons) and unable to come up, > > currently there's no way for the user/admin/developer to know what were all > > the replication slots available at the time of server crash/down to figure > > out what's the restart lsn, xid, two phase info or types of slots etc. > > > > pg_replslotdata is a tool that interprets the replication slots information > > and displays it onto the stdout even if the server is crashed/down. The > > design of this tool is similar to other tools available in the core today > > i.e. pg_controldata, pg_waldump. > > > > Attaching initial patch herewith. I will improve it with documentation and > > other stuff a bit later. > > > > Please see the attached picture for the sample output. > > > > Thoughts? > > Attaching the rebased v2 patch. On windows the previous patches were failing, fixed that in the v3 patch. I'm really sorry for the noise. Regards, Bharath Rupireddy. v3-0001-pg_replslotdata.patch Description: Binary data
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Wed, Nov 24, 2021 at 9:51 AM Alvaro Herrera wrote: > On 2021-Nov-24, Robert Haas wrote: > > Hmm. I think in my model an item and an item pointer and a line > > pointer are all the same thing, but a TID is different. When I talk > > about a TID, I mean the location of an item pointer, not its contents. > > So a TID is what tells me that I want block 5 and the 4th slot in the > > item pointer array. The item pointer tells me that the associate tuple > > is at a certain position in the page and has a certain length. > > OK, but you can have item pointers that don't have any item. > LP_REDIRECT, LP_DEAD, LP_UNUSED item pointers don't have items. I guess so. I said before that I thought an item and an item pointer were the same, but on reflection, that doesn't entirely make sense. But I don't know that I like making item and tuple synonymous either. I think perhaps the term "item" by itself is not very clear. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_replslotdata - a tool for displaying replication slot information
On Tue, Nov 23, 2021 at 10:39 AM Bharath Rupireddy wrote: > > Hi, > > The replication slots data is stored in binary format on the disk under the > pg_replslot/<> directory which isn't human readable. If the server > is crashed/down (for whatever reasons) and unable to come up, currently > there's no way for the user/admin/developer to know what were all the > replication slots available at the time of server crash/down to figure out > what's the restart lsn, xid, two phase info or types of slots etc. > > pg_replslotdata is a tool that interprets the replication slots information > and displays it onto the stdout even if the server is crashed/down. The > design of this tool is similar to other tools available in the core today > i.e. pg_controldata, pg_waldump. > > Attaching initial patch herewith. I will improve it with documentation and > other stuff a bit later. > > Please see the attached picture for the sample output. > > Thoughts? Attaching the rebased v2 patch. Regards, Bharath Rupireddy. v2-0001-pg_replslotdata.patch Description: Binary data
Re: Windows build warnings
Tom Lane writes: > =?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= writes: >> Should we change the compiler checks for attributes in c.h to include >> `|| __has_attribute(…)`, so that we automatically get them on compilers >> that support that (particularly clang)? > > clang already #defines GCC, no? __GNUC__, but yes, I didn't realise that. Clang 11 seems to claim to be GCC 4.2 by default, but that can be overridden usng the -fgnuc-version (and turned off by setting it to zero). Do any other compilers support __has_attribute()? > regards, tom lane - ilmari
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On 2021-Nov-24, Alvaro Herrera wrote: > On 2021-Nov-24, Robert Haas wrote: > > > Hmm. I think in my model an item and an item pointer and a line > > pointer are all the same thing, but a TID is different. When I talk > > about a TID, I mean the location of an item pointer, not its contents. > > So a TID is what tells me that I want block 5 and the 4th slot in the > > item pointer array. The item pointer tells me that the associate tuple > > is at a certain position in the page and has a certain length. > > OK, but you can have item pointers that don't have any item. > LP_REDIRECT, LP_DEAD, LP_UNUSED item pointers don't have items. Sorry to reply to myself, but I realized that I forgot to return to the main point of this thread. If we agree that "an LP_DEAD item pointer does not point to any item" (an assertion that gives a precise meaning to both those terms), then a patch that renames "tuples" to "items" is not doing anything useful IMO, because those two terms are synonyms. Now maybe Peter doesn't agree with the definitions I suggest, in which case I would like to know what his definitions are. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "How strange it is to find the words "Perl" and "saner" in such close proximity, with no apparent sense of irony. I doubt that Larry himself could have managed it." (ncm, http://lwn.net/Articles/174769/)
Re: Windows build warnings
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= writes: > Should we change the compiler checks for attributes in c.h to include > `|| __has_attribute(…)`, so that we automatically get them on compilers > that support that (particularly clang)? clang already #defines GCC, no? regards, tom lane
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On 2021-Nov-24, Robert Haas wrote: > Hmm. I think in my model an item and an item pointer and a line > pointer are all the same thing, but a TID is different. When I talk > about a TID, I mean the location of an item pointer, not its contents. > So a TID is what tells me that I want block 5 and the 4th slot in the > item pointer array. The item pointer tells me that the associate tuple > is at a certain position in the page and has a certain length. OK, but you can have item pointers that don't have any item. LP_REDIRECT, LP_DEAD, LP_UNUSED item pointers don't have items. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Nunca confiaré en un traidor. Ni siquiera si el traidor lo he creado yo" (Barón Vladimir Harkonnen)
Minor documentation fix - missing blank space
Hi, When I read the documentation about Overview of PostgreSQL Internals - Executor [1], I find there is a missing space in the documentation. diff --git a/doc/src/sgml/arch-dev.sgml b/doc/src/sgml/arch-dev.sgml index 7aff059e82..c2be28fac8 100644 --- a/doc/src/sgml/arch-dev.sgml +++ b/doc/src/sgml/arch-dev.sgml @@ -559,7 +559,7 @@ A simple INSERT ... VALUES command creates a trivial plan tree consisting of a single Result node, which computes just one result row, feeding that up -toModifyTable to perform the insertion. +to ModifyTable to perform the insertion. [1] https://www.postgresql.org/docs/14/executor.html -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Wed, Nov 24, 2021 at 9:37 AM Alvaro Herrera wrote: > My mental model is that "tuple" (in the narrow context of heap vacuum) > is the variable-size on-disk representation of a row in a page; "line > pointer" is the fixed-size struct at the bottom of each page that > contains location, size and flags of a tuple: struct ItemIdData. The > TID is the address of a line pointer -- an ItemPointerData. > > What is an item? Is an item the same as a line pointer? That seems > confusing. I think "item" means the tuple as a whole. In that light, > using the term TID for some of the things that the patch renames to > "item" seems more appropriate. Hmm. I think in my model an item and an item pointer and a line pointer are all the same thing, but a TID is different. When I talk about a TID, I mean the location of an item pointer, not its contents. So a TID is what tells me that I want block 5 and the 4th slot in the item pointer array. The item pointer tells me that the associate tuple is at a certain position in the page and has a certain length. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Post-CVE Wishlist
On Tue, Nov 23, 2021 at 5:41 PM Heikki Linnakangas wrote: > All that said, I'm not sure how serious I am about this. I think it > would work, and it wouldn't even be very complicated, but it feels > hacky, and that's not a good thing with anything security related. And > the starttls-style negotiation isn't that bad, really. I'm inclined to > do nothing I guess. Thoughts? I am not really persuaded by Jacob's argument that, had this only worked the other way from the start, this bug wouldn't have occurred. That's just a tautology, because we can only have bugs in the code we write, not the code we didn't write. So perhaps we would have just had some other bug, which might have been more or less serious than the one we actually had. It's hard to say, really, because the situation is hypothetical. But on reflection, one thing that isn't very nice about the current approach is that it won't work with anything that doesn't support the PostgreSQL wire protocol specifically. Imagine that you have a driver for PostgreSQL that for some reason does not support SSL, but you want to use SSL to talk to the server. You cannot stick a generic proxy that speaks plaintext on one side and SSL on the other side between that driver and the server and have it work. You will need something that knows how to proxy the PostgreSQL protocol specifically, and that will probably end up being higher-overhead than a generic proxy. There are all sorts of other variants of this scenario, and one of them is probably the motivation behind the request for proxy protocol support. I don't use these kinds of software myself, but I think a lot of people do, and it wouldn't be a bad thing if we could be "plug-compatible" with things that people on the Internet want to do, without needing a PostgreSQL-specific adapter. SSL is certainly one of those things. This argument doesn't answer the question of whether speaking pure SSL on a separate port is better or worse than having a single port that does either. If I had to guess, the latter is more convenient for users but less convenient to code. I don't even see a compelling reason why we can't support multiple models here, supposing someone is willing to do the work and fix the bugs that result. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On 2021-Nov-24, Robert Haas wrote: > On Wed, Nov 24, 2021 at 7:48 AM Masahiko Sawada wrote: > > I think it's more consistent if we change it to one side. I prefer > > "dead items". > > I feel like "items" is quite a generic word, so I think I would prefer > TIDs. But it's probably not a big deal. Is there clarity on what each term means? Since this patch only changes things that are specific to heap vacuuming, it seems OK to rely the convention that "item" means "heap item" (not just any generic item). However, I'm not sure that we fully agree exactly what a heap item is. Maybe if we agree to a single non ambiguous definition for each of those terms we can agree what terminology to use. It seems to me we have the following terms: - tuple - line pointer - [heap] item - TID My mental model is that "tuple" (in the narrow context of heap vacuum) is the variable-size on-disk representation of a row in a page; "line pointer" is the fixed-size struct at the bottom of each page that contains location, size and flags of a tuple: struct ItemIdData. The TID is the address of a line pointer -- an ItemPointerData. What is an item? Is an item the same as a line pointer? That seems confusing. I think "item" means the tuple as a whole. In that light, using the term TID for some of the things that the patch renames to "item" seems more appropriate. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Parallel vacuum workers prevent the oldest xmin from advancing
On Wed, Nov 17, 2021 at 7:28 AM Amit Kapila wrote: > > > The patch looks good to me. But I can't come up with a stable test for > > this. It seems to be hard without stopping and resuming parallel > > vacuum workers. Do you have any good idea? > > > > No, let's wait for a day or so to see if anybody else has any ideas to > write a test for this case, otherwise, I'll check these once again and > push. I set this "committed" in the CF app. -- John Naylor EDB: http://www.enterprisedb.com
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Wed, Nov 24, 2021 at 7:48 AM Masahiko Sawada wrote: > I think it's more consistent if we change it to one side. I prefer "dead > items". I feel like "items" is quite a generic word, so I think I would prefer TIDs. But it's probably not a big deal. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Support for NSS as a libpq TLS backend
On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle wrote: > > On Wed, Nov 24, 2021 at 6:59 AM Daniel Gustafsson wrote: > > > > > On 23 Nov 2021, at 23:39, Joshua Brindle > > > wrote: > > > > > It no longer happens with v49, since it was a null deref of the pr_fd > > > which no longer happens. > > > > > > I'll continue testing now, so far it's looking better. > > > > Great, thanks for confirming. I'm still keen on knowing how you triggered > > the > > segfault so I can ensure there are no further bugs around there. > > > > It happened when I ran psql with hostssl on the server but before I'd > initialized my client certificate store. I don't know enough about NSS to know if this is problematic or not but if I try verify-full without having the root CA in the certificate store I get: $ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres" psql: error: SSL error: Issuer certificate is invalid. unable to shut down NSS context: NSS could not shutdown. Objects are still in use.
Re: Support for NSS as a libpq TLS backend
On Wed, Nov 24, 2021 at 6:59 AM Daniel Gustafsson wrote: > > > On 23 Nov 2021, at 23:39, Joshua Brindle > > wrote: > > > It no longer happens with v49, since it was a null deref of the pr_fd > > which no longer happens. > > > > I'll continue testing now, so far it's looking better. > > Great, thanks for confirming. I'm still keen on knowing how you triggered the > segfault so I can ensure there are no further bugs around there. > It happened when I ran psql with hostssl on the server but before I'd initialized my client certificate store.
Re: row filtering for logical replication
On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian wrote: > > Attaching a new patchset v41 which includes changes by both Peter and myself. > > Patches v40-0005 and v40-0006 have been merged to create patch > v41-0005 which reduces the patches to 6 again. > This patch-set contains changes addressing the following review comments: > > On Mon, Nov 15, 2021 at 5:48 PM Amit Kapila wrote: > > > > What I meant was that with this new code we have regressed the old > > behavior. Basically, imagine a case where no filter was given for any > > of the tables. Then after the patch, we will remove all the old tables > > whereas before the patch it will remove the oldrels only when they are > > not specified as part of new rels. If you agree with this, then we can > > retain the old behavior and for the new tables, we can always override > > the where clause for a SET variant of command. > > Fixed and modified the behaviour to match with what the schema patch > implemented. > + + /* + * If the new relation or the old relation has a where clause, + * we need to remove it so that it can be added afresh later. + */ + if (RelationGetRelid(newpubrel->relation) == oldrelid && + newpubrel->whereClause == NULL && rfisnull) Can't we use _equalPublicationTable() here? It compares the whereClause as well. Few more comments: = 0001 1. @@ -1039,10 +1081,11 @@ PublicationAddTables(Oid pubid, List *rels, bool if_not_exists, { PublicationRelInfo *pub_rel = (PublicationRelInfo *) lfirst(lc); Relation rel = pub_rel->relation; + Oid relid = RelationGetRelid(rel); ObjectAddress obj; /* Must be owner of the table or superuser. */ - if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId())) + if (!pg_class_ownercheck(relid, GetUserId())) Here, you can directly use RelationGetRelid as was used in the previous code without using an additional variable. 0005 2. +typedef struct { + Relation rel; + bool check_replident; + Bitmapset *bms_replident; +} +rf_context; Add rf_context in the same line where } ends. 3. In the function header comment of rowfilter_walker, you mentioned the simple expressions allowed but we should write why we are doing so. It has been discussed in detail in various emails in this thread. AFAIR, below are the reasons: A. We don't want to allow user-defined functions or operators because (a) if the user drops such a function/operator or if there is any other error via that function, the walsender won't be able to recover from such an error even if we fix the function's problem because it uses a historic snapshot to access row-filter; (b) any other table could be accessed via a function which won't work because of historic snapshots in logical decoding environment. B. We don't allow anything other immutable built-in functions as those can access database and would lead to the problem (b) mentioned in the previous paragraph. Don't we need to check for user-defined types similar to user-defined functions and operators? If not why? 4. + * Rules: Node-type validation + * --- + * Allow only simple or compound expressions like: + * - "(Var Op Const)" or It seems Var Op Var is allowed. I tried below and it works: create publication pub for table t1 where (c1 < c2) WITH (publish = 'insert'); I think it should be okay to allow it provided we ensure that we never access some other table/view etc. as part of the expression. Also, we should document the behavior correctly. -- With Regards, Amit Kapila.
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Wed, Nov 24, 2021 at 2:46 PM Peter Geoghegan wrote: > > Attached patch performs polishing within vacuumlazy.c, as follow-up > work to the refactoring work in Postgres 14. This mainly consists of > changing references of dead tuples to dead items, which reflects the > fact that VACUUM no longer deals with TIDs that might point to > remaining heap tuples with storage -- the TIDs in the array must now > strictly point to LP_DEAD stub line pointers that remain in the heap, > following pruning. +1 > If there are no objections, I'll move on this soon. It's mostly just > mechanical changes. The patch renames dead tuples to dead items at some places and to dead TIDs at some places. For instance, it renames dead tuples to dead TIDs here: - * Return the maximum number of dead tuples we can record. + * Computes the number of dead TIDs that VACUUM will have to store in the + * worst case, where all line pointers are allocated, and all are LP_DEAD whereas renames to dead items here: - * extra cost of bsearch(), especially if dead tuples on the heap are + * extra cost of bsearch(), especially if dead items on the heap are I think it's more consistent if we change it to one side. I prefer "dead items". --- There is one more place where we can rename "dead tuples": /* * Allocate the space for dead tuples. Note that this handles parallel * VACUUM initialization as part of allocating shared memory space used * for dead_items. */ Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: VS2022: Support Visual Studio 2022 on Windows
Hi On Wed, Nov 24, 2021 at 11:36 AM Michael Paquier wrote: > On Wed, Nov 24, 2021 at 10:00:19AM +, Dave Page wrote: > > It's extremely unlikely that we'd shift to such a new version for PG15. > We > > build many components aside from PostgreSQL, and need to use the same > > toolchain for all of them (we've had very painful experiences with mix n > > match CRT versions in the past) so it's not just PG that needs to support > > VS2022 as far as we're concerned > > Yes, I can understand that upgrading the base version of VS used is a > very difficult exercise. I have been through that, on Windows for > Postgres.. As well as for the compilation of all its dependencies. > > > - Perl, Python, TCL, MIT Kerberos, > > OpenSSL, libxml2, libxslt etc. are all built with the same toolchain for > > consistency. > > Dave, do you include LZ4 in 14? Just asking, as a matter of > curiosity. > Yes we do :-) C:\Program Files\PostgreSQL\14\bin>pg_config BINDIR = C:/PROGRA~1/POSTGR~1/14/bin DOCDIR = C:/PROGRA~1/POSTGR~1/14/doc HTMLDIR = C:/PROGRA~1/POSTGR~1/14/doc INCLUDEDIR = C:/PROGRA~1/POSTGR~1/14/include PKGINCLUDEDIR = C:/PROGRA~1/POSTGR~1/14/include INCLUDEDIR-SERVER = C:/PROGRA~1/POSTGR~1/14/include/server LIBDIR = C:/Program Files/PostgreSQL/14/lib PKGLIBDIR = C:/Program Files/PostgreSQL/14/lib LOCALEDIR = C:/PROGRA~1/POSTGR~1/14/share/locale MANDIR = C:/Program Files/PostgreSQL/14/man SHAREDIR = C:/PROGRA~1/POSTGR~1/14/share SYSCONFDIR = C:/Program Files/PostgreSQL/14/etc PGXS = C:/Program Files/PostgreSQL/14/lib/pgxs/src/makefiles/pgxs.mk CONFIGURE = --enable-thread-safety --enable-nls --with-ldap --with-ssl=openssl --with-uuid --with-libxml --with-libxslt --with-lz4 --with-icu --with-tcl --with-perl --with-python CC = not recorded CPPFLAGS = not recorded CFLAGS = not recorded CFLAGS_SL = not recorded LDFLAGS = not recorded LDFLAGS_EX = not recorded LDFLAGS_SL = not recorded LIBS = not recorded VERSION = PostgreSQL 14.1 -- Dave Page Blog: https://pgsnake.blogspot.com Twitter: @pgsnake EDB: https://www.enterprisedb.com
Re: Support for NSS as a libpq TLS backend
> On 23 Nov 2021, at 23:39, Joshua Brindle > wrote: > It no longer happens with v49, since it was a null deref of the pr_fd > which no longer happens. > > I'll continue testing now, so far it's looking better. Great, thanks for confirming. I'm still keen on knowing how you triggered the segfault so I can ensure there are no further bugs around there. -- Daniel Gustafsson https://vmware.com/
Re: Teach pg_receivewal to use lz4 compression
On Wed, Nov 24, 2021 at 10:55 AM Michael Paquier wrote: > On Mon, Nov 22, 2021 at 09:02:47AM -0500, Robert Haas wrote: > > On Mon, Nov 22, 2021 at 12:46 AM Jeevan Ladhe > > wrote: > >> Fair enough. But, still I have a doubt in mind what benefit would that > >> really bring to us here, because we are immediately also freeing the > >> lz4buf without using it anywhere. > > > > Yeah, I'm also doubtful about that. If we're freeng the compression > > context, we shouldn't need to guarantee that it's in any particular > > state before doing so. Why would any critical cleanup be part of > > LZ4F_compressEnd() rather than LZ4F_freeCompressionContext()? The > > point of LZ4F_compressEnd() is to make sure all of the output bytes > > get written, and it would be stupid to force people to write the > > output bytes even when they've decided that they no longer care about > > them due to some error. > > Hmm. I have double-checked all that, and I agree that we could just > skip LZ4F_compressEnd() in this error code path. From what I can see > in the upstream code, what we have now is not broken either, but the > compressEnd() call does some work that's not needed here. Yes I agree that we are not broken, but as you said we are doing some an extra bit of work here. Regards, Jeevan Ladhe
Re: Skipping logical replication transactions on subscriber side
On Wed, Nov 17, 2021 at 8:14 PM Amit Kapila wrote: > > On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada > wrote: > > > > Right. I've fixed this issue and attached an updated patch. > > > > Few comments/questions: > = > 1. > + > + The pg_stat_subscription_workers view will > contain > + one row per subscription error reported by workers applying logical > + replication changes and workers handling the initial data copy of the > + subscribed tables. The statistics entry is removed when the subscription > + the worker is running on is removed. > + > > The last line of this paragraph is not clear to me. First "the" before > "worker" in the following part of the sentence seems unnecessary > "..when the subscription the worker..". Then the part "running on is > removed" is unclear because it could also mean that we remove the > entry when a subscription is disabled. Can we rephrase it to: "The > statistics entry is removed when the corresponding subscription is > dropped"? Agreed. Fixed. > > 2. > Between v20 and v23 versions of patch the size of hash table > PGSTAT_SUBWORKER_HASH_SIZE is increased from 32 to 256. I might have > missed the comment which lead to this change, can you point me to the > same or if you changed it for some other reason, can you let me know > the same? I'd missed reverting this change. I considered increasing this value since the lifetime of subscription is long. But when it comes to unshared hashtable can be expanded on-the-fly, it's better to start with a small value. Reverted. > > 3. > + > + /* > + * Repeat for subscription workers. Similarly, we needn't bother > + * in the common case where no function stats are being collected. > + */ > > /function/subscription workers' Fixed. > > 4. > + > + Name of command being applied when the error occurred. This field > + is always NULL if the error was reported during the initial data > + copy. > + > + > + > + > + > + xid xid > + > + > + Transaction ID of the publisher node being applied when the error > + occurred. This field is always NULL if the error was reported > + during the initial data copy. > + > > Is it important to stress on 'always' in the above two descriptions? No, removed. > > 5. > The current description of first/last_error_time seems sliglthy > misleading as one can interpret that these are about different errors. > Let's slightly change the description of first/last_error_time as > follows or something on those lines: > > > + > + Time at which the first error occurred > + > + > > First time at which this error occurred > > last_error_time timestamp with time > zone > + > + > + Time at which the last error occurred > > Last time at which this error occurred. This will be the same as > first_error_time except when the same error occurred more than once > consecutively. Changed. I've removed first_error_time as per discussion on the thread for adding xact stats. > > 6. > + > +pg_stat_reset_subscription_worker ( > subid oid, > relid oid ) > +void > + > + > +Resets the statistics of a single subscription worker running on the > +subscription with subid shown in the > +pg_stat_subscription_worker view. If the > +argument relid is not NULL, > +resets statistics of the subscription worker handling the initial > data > +copy of the relation with relid. Otherwise, > +resets the subscription worker statistics of the main apply worker. > +If the argument relid is omitted, resets the > +statistics of all subscription workers running on the subscription > +with subid. > + > > The first line of this description seems to indicate that we can only > reset the stats of a single worker but the later part indicates that > we can reset stats of all subscription workers. Can we change the > first line as: "Resets the statistics of subscription workers running > on the subscription with subid shown in the > pg_stat_subscription_worker view.". > Changed. > 7. > pgstat_vacuum_stat() > { > .. > + pgstat_setheader(_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE); > + spmsg.m_databaseid = MyDatabaseId; > + spmsg.m_nentries = 0; > .. > } > > Do we really need to set the header here? It seems to be getting set > in pgstat_send_subscription_purge() while sending this message. Removed. > > 8. > pgstat_vacuum_stat() > { > .. > + > + if (hash_search(htab, (void *) &(subwentry->key.subid), HASH_FIND, NULL) > + != NULL) > + continue; > + > + /* This subscription is dead, add the subid to the message */ > + spmsg.m_subids[spmsg.m_nentries++] = subwentry->key.subid; > .. > } > > I think it is better to use a separate variable here for subid as we > are using for funcid and tableid. That will make this part of the code > easier to follow and look consistent.
Re: VS2022: Support Visual Studio 2022 on Windows
On Wed, Nov 24, 2021 at 10:00:19AM +, Dave Page wrote: > It's extremely unlikely that we'd shift to such a new version for PG15. We > build many components aside from PostgreSQL, and need to use the same > toolchain for all of them (we've had very painful experiences with mix n > match CRT versions in the past) so it's not just PG that needs to support > VS2022 as far as we're concerned Yes, I can understand that upgrading the base version of VS used is a very difficult exercise. I have been through that, on Windows for Postgres.. As well as for the compilation of all its dependencies. > - Perl, Python, TCL, MIT Kerberos, > OpenSSL, libxml2, libxslt etc. are all built with the same toolchain for > consistency. Dave, do you include LZ4 in 14? Just asking, as a matter of curiosity. -- Michael signature.asc Description: PGP signature
Re: [BUG]Missing REPLICA IDENTITY check when DROP NOT NULL
On Wed, Nov 24, 2021 at 07:04:51AM +, tanghy.f...@fujitsu.com wrote: > create table tbl (a int not null unique); > alter table tbl replica identity using INDEX tbl_a_key; > alter table tbl alter column a drop not null; > insert into tbl values (null); Oops. Yes, that's obviously not good. > To solve the above problem, I think it's better to add a check when > executing ALTER COLUMN DROP NOT NULL, > and report an error if this column is part of replica identity. I'd say that you are right to block the operation. I'll try to play a bit with this stuff tomorrow. > Attaching a patch that disallow DROP NOT NULL on a column if it's in > a REPLICA IDENTITY index. Also added a test in it. if (indexStruct->indkey.values[i] == attnum) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), - errmsg("column \"%s\" is in a primary key", + errmsg(ngettext("column \"%s\" is in a primary key", + "column \"%s\" is in a REPLICA IDENTITY index", + indexStruct->indisprimary), colName))); Using ngettext() looks incorrect to me here as it is used to get the plural form of a string, so you'd better make these completely separated instead. -- Michael signature.asc Description: PGP signature
Re: Windows build warnings
Daniel Gustafsson writes: > On 22 Nov 2021, at 16:40, Tom Lane wrote: > >> I can't find anything that is providing a non-empty definition of >> PG_USED_FOR_ASSERTS_ONLY (a/k/a pg_attribute_unused) for anything >> except GCC. > > It's supported in clang as well per the documentation [0] in at least some > configurations or distributions: > > "The [[maybe_unused]] (or __attribute__((unused))) attribute can be > used to silence such diagnostics when the entity cannot be removed. > For instance, a local variable may exist solely for use in an assert() > statement, which makes the local variable unused when NDEBUG is > defined." Should we change the compiler checks for attributes in c.h to include `|| __has_attribute(…)`, so that we automatically get them on compilers that support that (particularly clang)? - ilmari
Re: VS2022: Support Visual Studio 2022 on Windows
Hi On Wed, Nov 24, 2021 at 9:12 AM Hans Buschmann wrote: > Hello Michael, > > thanks for your hard work and quick response! > It is very convenient to only use VS2022 for Windows from now on... > > >Diff unrelated to your patch. > > Sorry for the copysoft problem from the first version. > > >Glad to see that we should have nothing to do about locales this > >time. I have not tested, but I think that you covering all the areas > >that need a refresh here. Nice work. > > I think it is almost impossible to overestimate the value of such support > from experienced hackers to others starting their journey right now... > > I hope I can motivate you (and other experienced hackers) to give me some > more support on my real project arriving anytime soon. It addresses > hex_encoding (and more) targetting mostly pg_dump, but requires also some > deeper knowledge of general infrastructure and building (also on Windows). > Stay tuned! > > PS: Does anybody have good relations to EDB suggesting them to target > VS2022 as the build environment for the upcoming PG15 release? > That would be me... > > postgres=# select version (); > version > > PostgreSQL 14.1, compiled by Visual C++ build 1931, 64-bit > (1 row) > It's extremely unlikely that we'd shift to such a new version for PG15. We build many components aside from PostgreSQL, and need to use the same toolchain for all of them (we've had very painful experiences with mix n match CRT versions in the past) so it's not just PG that needs to support VS2022 as far as we're concerned - Perl, Python, TCL, MIT Kerberos, OpenSSL, libxml2, libxslt etc. are all built with the same toolchain for consistency. -- Dave Page Blog: https://pgsnake.blogspot.com Twitter: @pgsnake EDB: https://www.enterprisedb.com
Re: row filtering for logical replication
On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian wrote: > > Attaching a new patchset v41 which includes changes by both Peter and myself. > > Patches v40-0005 and v40-0006 have been merged to create patch > v41-0005 which reduces the patches to 6 again. Few comments: 1) I'm not sure if we will be able to throw a better error message in this case "ERROR: missing FROM-clause entry for table "t4"", if possible you could change it. + if (pri->whereClause != NULL) + { + /* Set up a pstate to parse with */ + pstate = make_parsestate(NULL); + pstate->p_sourcetext = nodeToString(pri->whereClause); + + nsitem = addRangeTableEntryForRelation(pstate, targetrel, + AccessShareLock, + NULL, false, false); + addNSItemToQuery(pstate, nsitem, false, true, true); + + whereclause = transformWhereClause(pstate, + copyObject(pri->whereClause), + EXPR_KIND_PUBLICATION_WHERE, + "PUBLICATION"); + + /* Fix up collation information */ + assign_expr_collations(pstate, whereclause); + } alter publication pub1 add table t5 where ( t4.c1 = 10); ERROR: missing FROM-clause entry for table "t4" LINE 1: alter publication pub1 add table t5 where ( t4.c1 = 10); ^ pstate->p_expr_kind is stored as EXPR_KIND_PUBLICATION_WHERE, we could differentiate using expr_kind. 2) Should '"delete" or "delete"' be '"delete" or "update"' --- a/src/backend/catalog/pg_publication.c +++ b/src/backend/catalog/pg_publication.c @@ -340,7 +340,7 @@ rowfilter_walker(Node *node, rf_context *context) * 1. Only certain simple node types are permitted in the expression. See * function rowfilter_walker for details. * - * 2. If the publish operation contains "delete" then only columns that + * 2. If the publish operation contains "delete" or "delete" then only columns that * are allowed by the REPLICA IDENTITY rules are permitted to be used in the * row-filter WHERE clause. */ @@ -352,12 +352,10 @@ rowfilter_expr_checker(Publication *pub, Relation rel, Node *rfnode) context.rel = rel; /* -* For "delete", check that filter cols are also valid replica identity +* For "delete" or "update", check that filter cols are also valid replica identity * cols. 3) Should we include row filter condition in pg_publication_tables view like in describe publication(\dRp+) , since the prqual is not easily readable in pg_publication_rel table: select * from pg_publication_tables ; pubname | schemaname | tablename -++--- pub1| public | t1 (1 row) select * from pg_publication_rel ; oid | prpubid | prrelid | prqual ---+-+-+--- - 16389 | 16388 | 16384 | {OPEXPR :opno 518 :opfuncid 144 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :va rnosyn 1 :varattnosyn 1 :location 45} {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 51 :constvalue 4 [ 0 0 0 0 0 0 0 0 ]}) :location 48} (1 row) 4) This should be included in typedefs.list, also we could add some comments for this structure +typedef struct { + Relationrel; + Bitmapset *bms_replident; +} +rf_context; 5) Few includes are not required. #include "miscadmin.h" not required in pg_publication.c, #include "executor/executor.h" not required in proto.c, #include "access/xact.h", #include "executor/executor.h" and #include "replication/logicalrelation.h" not required in pgoutput.c 6) typo "filte" should be "filter": +/* + * The row filte walker checks that the row filter expression is legal. + * + * Rules: Node-type validation + * --- + * Allow only simple or compound expressions like: + * - "(Var Op Const)" or + * - "(Var Op Const) Bool (Var Op Const)" Regards, Vignesh
AW: VS2022: Support Visual Studio 2022 on Windows
Hello Michael, thanks for your hard work and quick response! It is very convenient to only use VS2022 for Windows from now on... >Diff unrelated to your patch. Sorry for the copysoft problem from the first version. >Glad to see that we should have nothing to do about locales this >time. I have not tested, but I think that you covering all the areas >that need a refresh here. Nice work. I think it is almost impossible to overestimate the value of such support from experienced hackers to others starting their journey right now... I hope I can motivate you (and other experienced hackers) to give me some more support on my real project arriving anytime soon. It addresses hex_encoding (and more) targetting mostly pg_dump, but requires also some deeper knowledge of general infrastructure and building (also on Windows). Stay tuned! PS: Does anybody have good relations to EDB suggesting them to target VS2022 as the build environment for the upcoming PG15 release? postgres=# select version (); version PostgreSQL 14.1, compiled by Visual C++ build 1931, 64-bit (1 row) Thanks! Hans Buschmann
Re: rename SnapBuild* macros in slot.c
On Tue, Nov 23, 2021 at 8:12 AM Amit Kapila wrote: > > On Mon, Nov 22, 2021 at 7:47 PM Alvaro Herrera > wrote: > > > > On 2021-Nov-22, Amit Kapila wrote: > > > > > +1 for this change. This seems to be introduced by commit ec5896aed3 > > > [1] and I think it is just a typo to name these macros starting with > > > SnapBuildOnDisk* unless I am missing something. > > > > Yeah, it looks pretty weird. +1 for the change on consistency grounds. > > > > Okay, I will push this tomorrow unless I see any objections. > Pushed! -- With Regards, Amit Kapila.
Re: parallel vacuum comments
On Wed, Nov 24, 2021 at 12:16 PM Masahiko Sawada wrote: > > On Wed, Nov 24, 2021 at 1:34 PM Amit Kapila wrote: > > > > On Wed, Nov 24, 2021 at 7:43 AM Masahiko Sawada > > wrote: > > > > > > On Mon, Nov 22, 2021 at 1:48 PM Amit Kapila > > > wrote: > > > > > > > > On Fri, Nov 19, 2021 at 7:55 AM houzj.f...@fujitsu.com > > > > wrote: > > > > > > > > > > > > > > 4) > > > > > > > > > > Just a personal suggestion for the parallel related function name. > > > > > Since Andres > > > > > wanted a uniform naming pattern. Mabe we can rename the following > > > > > functions: > > > > > > > > > > end|begin_parallel_vacuum => parallel_vacuum_end|begin > > > > > perform_parallel_index_bulkdel|cleanup => > > > > > parallel_vacuum_index_bulkdel|cleanup > > > > > > > > > > So that all the parallel related functions' name is like > > > > > parallel_vacuum_xxx. > > > > > > > > > > > > > BTW, do we really need functions > > > > perform_parallel_index_bulkdel|cleanup? Both do some minimal > > > > assignments and then call parallel_vacuum_all_indexes() and there is > > > > just one caller of each. Isn't it better to just do those assignments > > > > in the caller and directly call parallel_vacuum_all_indexes()? > > > > > > The reason why I declare these two functions are: (1) the fields of > > > ParallelVacuumState are not exposed and (2) bulk-deletion and cleanup > > > require different arguments (estimated_count is required only by > > > cleanup). So if we expose the fields of ParallelVacuumState, the > > > caller can do those assignments and directly call > > > parallel_vacuum_all_indexes(). But I'm not sure it's good if those > > > assignments are the caller's responsibility. > > > > > > > Okay, that makes sense. However, I am still not very comfortable with > > the function naming suggested by Hou-San, do you have any thoughts on > > that? > > I personally don't disagree with the names starting with > "parallel_vacuum_*". > I don't have any strong opinion here but I prefer the name which makes more sense in the context it is being used. OTOH, I see there is an argument that it will be easier to follow and might appear consistent if we use parallel_vacuum_*. -- With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Wed, Nov 24, 2021 at 1:50 PM Masahiko Sawada wrote: > > On Wed, Nov 24, 2021 at 12:13 PM Amit Kapila wrote: > > > > On Wed, Nov 24, 2021 at 7:51 AM Masahiko Sawada > > wrote: > > > > > > On Thu, Nov 18, 2021 at 12:52 PM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada > > > > wrote: > > > > > Right. I've fixed this issue and attached an updated patch. > > > > > > > > > Hi, > > > > > > > > I have few comments for the testcases. > > > > > > > > 1) > > > > > > > > +my $appname = 'tap_sub'; > > > > +$node_subscriber->safe_psql( > > > > +'postgres', > > > > +"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr > > > > application_name=$appname' PUBLICATION tap_pub WITH (streaming = off, > > > > two_phase = on);"); > > > > +my $appname_streaming = 'tap_sub_streaming'; > > > > +$node_subscriber->safe_psql( > > > > +'postgres', > > > > +"CREATE SUBSCRIPTION tap_sub_streaming CONNECTION > > > > '$publisher_connstr application_name=$appname_streaming' PUBLICATION > > > > tap_pub_streaming WITH (streaming = on, two_phase = on);"); > > > > + > > > > > > > > I think we can remove the 'application_name=$appname', so that the > > > > command > > > > could be shorter. > > > > > > But we wait for the subscription to catch up by using > > > wait_for_catchup() with application_name, no? > > > > > > > Yeah, but you can directly use the subscription name in > > wait_for_catchup because we internally use that as > > fallback_application_name. If application_name is not specified in the > > connection string as suggested by Hou-San then > > fallback_application_name will be considered. Both ways are okay and I > > see we use both ways in the tests but it seems there are more places > > where we use the method Hou-San is suggesting in subscription tests. > > Okay, thanks! I referred to tests that set application_name. ISTM it's > better to unite them so as not to confuse them in future tests. > Agreed, but let's do this clean-up as a separate patch. Feel free to submit the patch for the same in a separate thread. -- With Regards, Amit Kapila.
Re: pg_get_publication_tables() output duplicate relid
On Mon, Nov 22, 2021 at 12:55 PM Amit Langote wrote: > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila wrote: > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote > > wrote: > > > As in, > > > do we know of any replication (initial/streaming) misbehavior caused > > > by the duplicate partition OIDs in this case or is the only problem > > > that pg_publication_tables output looks odd? > > > > The latter one but I think either we should document this or change it > > as we can't assume users will follow what subscriber-side code does. > > On second thought, I agree that de-duplicating partitions from this > view is an improvement. > Fair enough. Hou-San, Can you please submit the updated patch after fixing any pending comments including the test case? -- With Regards, Amit Kapila.
Re: pg_get_publication_tables() output duplicate relid
On Wed, Nov 24, 2021 at 12:02 PM Amit Langote wrote: > > On Tue, Nov 23, 2021 at 12:21 PM Amit Kapila wrote: > > Isn't it better to document this case and explain what > > users can expect instead of trying to design a solution around this? > > I thought about the problems you've described and it looks like I > hadn't considered many of the details which complicate implementing a > solution for this. > > So yeah, documenting the ATTACH issue as a limitation sounds like the > best course for now. I might word it as follows and add it under > Notes at https://www.postgresql.org/docs/current/sql-createpublication.html: > > "ATTACHing a table into a partition tree whose root is published using > a publication with publish_via_partition_root set to true does not > result in the table's existing contents to be replicated." > Instead of "to be", shall we use "being"? > I'm not sure there's a clean enough workaround to this that we can add > to the paragraph. > > Does that make sense? > Yeah, that sounds reasonable. I think the case with "publish_via_partition_root = false" should work but please test it once. The other thing which we need to consider about all these fixes is whether to back patch them or not. I think it is on case to case basis. I feel this limitation should be documented and backpatched, what do you think? Feel free to submit patches accordingly. > > Even if we do so the streaming after attach partition problem as > > discussed above should be fixed. > > I agree. I have reproduced the problem though haven't managed to pin > down the cause yet. > No problem, do let us know whatever be your findings at the end. I think now we know how to proceed with this attach issue, maybe we should also try to commit the fixes for other issues. -- With Regards, Amit Kapila.
Re: Post-CVE Wishlist
On 24/11/2021 07:09, Michael Paquier wrote: On Tue, Nov 23, 2021 at 02:18:30PM -0500, Tom Lane wrote: Jacob Champion writes: = Client-Side Auth Selection = The second request is for the client to stop fully trusting the server during the authentication phase. If I tell libpq to use a client certificate, for example, I don't think the server should be allowed to extract a plaintext password from my environment (at least not without my explicit opt-in). Yeah. I don't recall whether it's been discussed in public or not, but it certainly seems like libpq should be able to be configured so that (for example) it will never send a cleartext password. It's not clear to me what extent of configurability would be useful, and I don't want to overdesign it --- but that much at least would be a good thing. I recall this part being discussed in public, but I cannot put my finger on the exact thread. I think that this was around when we discussed the open items of 10 or 11 for things around channel binding and how libpq was sensitive to downgrade attacks, which would mean around 2016 or 2017. I also recall reading (writing?) a patch that introduced a new connection parameter that takes in input a comma-separated list of keywords to allow the user to choose a set of auth methods accepted, failing if the server is willing to use a method that does not match with what the user has put in his list. Perhaps this last part has never reached -hackers though :) Anyway, the closest thing I can put my finger on now is that: https://www.postgresql.org/message-id/c5cb08f4cce46ff661ad287fadaa1...@postgrespro.ru Here's a thread: https://www.postgresql.org/message-id/227015d8417f2b4fef03f8966dbfa5cbcc4f44da.camel%40j-davis.com The result of that thread was that we added the channel_binding=require/prefer/disable option. - Heikki
Re: Skipping logical replication transactions on subscriber side
On Wed, Nov 24, 2021 at 12:13 PM Amit Kapila wrote: > > On Wed, Nov 24, 2021 at 7:51 AM Masahiko Sawada wrote: > > > > On Thu, Nov 18, 2021 at 12:52 PM houzj.f...@fujitsu.com > > wrote: > > > > > > On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada > > > wrote: > > > > Right. I've fixed this issue and attached an updated patch. > > > > > > > Hi, > > > > > > I have few comments for the testcases. > > > > > > 1) > > > > > > +my $appname = 'tap_sub'; > > > +$node_subscriber->safe_psql( > > > +'postgres', > > > +"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr > > > application_name=$appname' PUBLICATION tap_pub WITH (streaming = off, > > > two_phase = on);"); > > > +my $appname_streaming = 'tap_sub_streaming'; > > > +$node_subscriber->safe_psql( > > > +'postgres', > > > +"CREATE SUBSCRIPTION tap_sub_streaming CONNECTION > > > '$publisher_connstr application_name=$appname_streaming' PUBLICATION > > > tap_pub_streaming WITH (streaming = on, two_phase = on);"); > > > + > > > > > > I think we can remove the 'application_name=$appname', so that the command > > > could be shorter. > > > > But we wait for the subscription to catch up by using > > wait_for_catchup() with application_name, no? > > > > Yeah, but you can directly use the subscription name in > wait_for_catchup because we internally use that as > fallback_application_name. If application_name is not specified in the > connection string as suggested by Hou-San then > fallback_application_name will be considered. Both ways are okay and I > see we use both ways in the tests but it seems there are more places > where we use the method Hou-San is suggesting in subscription tests. Okay, thanks! I referred to tests that set application_name. ISTM it's better to unite them so as not to confuse them in future tests. Anyway, I'll remove it in the next version patch that I'll submit soon. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/