Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro wrote:
> I won't be surprised if the answer is: if you're holding a reference,
> you have to get a pin (referring to bulk_write.c).

Ahhh, on second thoughts, I take that back: I think the original theory still works just fine. It's just that somewhere in our refactoring of that commit, when we were vacillating between different semantics for 'destroy' and 'release', I think we made a mistake: in RelationCacheInvalidate() we should now call smgrreleaseall(), not smgrdestroyall().

That satisfies the requirements for sinval queue overflow: we close md.c segments (and most importantly virtual file descriptors), so our lack of sinval records can't hurt us; we'll reopen all files as required. That's what CLOBBER_CACHE_ALWAYS is effectively testing (and more). But the smgr pointer remains valid, retaining only its "identity" (e.g. its hash table key), and that's also fine. It won't be destroyed until after the end of the transaction, which was the point, and it allows things like bulk_write.c (and streaming_read.c) to hold an smgr reference. Right?
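Concretely, I believe the fix is a one-line change in RelationCacheInvalidate(), along these lines (untested, written from memory):

-	smgrdestroyall();
+	smgrreleaseall();

That keeps closing the underlying files and virtual file descriptors at invalidation time, but leaves the SMgrRelation objects themselves alive until end of transaction, so held references like bulk_write.c's stay valid.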
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
On Sun, Mar 10, 2024 at 5:02 PM Thomas Munro wrote:
> Thanks, reproduced here (painfully slowly). Looking...

I changed that ERROR to a PANIC, and now I can see that _bt_metaversion() is failing to read a meta page (block 0), and the file is indeed of size 0 in my filesystem. Which is not cool, for a btree. Looking at btbuildempty(), we have this sequence:

	bulkstate = smgr_bulk_start_rel(index, INIT_FORKNUM);

	/* Construct metapage. */
	metabuf = smgr_bulk_get_buf(bulkstate);
	_bt_initmetapage((Page) metabuf, P_NONE, 0, allequalimage);
	smgr_bulk_write(bulkstate, BTREE_METAPAGE, metabuf, true);

	smgr_bulk_finish(bulkstate);

Ooh. One idea would be that the smgr lifetime stuff is b0rked, introducing corruption. Bulk write itself isn't pinning the smgr relation; it's relying purely on the object not being invalidated, which the theory of 21d9c3ee's commit message allowed for, but ... here it's destroyed (HASH_REMOVE'd) sooner under CLOBBER_CACHE_ALWAYS, which I think we failed to grok. If that's it, I'm surprised that things don't implode more spectacularly. Perhaps HASH_REMOVE should clobber objects in debug builds, similar to pfree?

For that hypothesis, the corruption might not be happening in the above-quoted code itself, because it doesn't seem to have an invalidation acceptance point (unless I'm missing it). Some other bulk write got mixed up? Not sure yet. I won't be surprised if the answer is: if you're holding a reference, you have to get a pin (referring to bulk_write.c).
Re: Allow non-superuser to cancel superuser tasks.
Another comment that I forgot to mention: we should also make the documentation change in doc/src/sgml/user-manag.sgml for this new predefined role.

Thanks.

--
Anthony Leung
Amazon Web Services: https://aws.amazon.com
Re: Allow non-superuser to cancel superuser tasks.
Hi, I'm new to reviewing postgres patches, but I took an interest in reviewing this patch as recommended by Nathan. I have the following comments:

> if (!superuser()) {
>     if (!OidIsValid(proc->roleId)) {
>         LocalPgBackendStatus *local_beentry;
>
>         local_beentry = pgstat_get_local_beentry_by_backend_id(proc->backendId);
>
>         if (!(local_beentry &&
>               local_beentry->backendStatus.st_backendType == B_AUTOVAC_WORKER &&
>               has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)))
>             return SIGNAL_BACKEND_NOSUPERUSER;
>     } else {
>         if (superuser_arg(proc->roleId))
>             return SIGNAL_BACKEND_NOSUPERUSER;
>
>         /* Users can signal backends they have role membership in. */
>         if (!has_privs_of_role(GetUserId(), proc->roleId) &&
>             !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
>             return SIGNAL_BACKEND_NOPERMISSION;
>     }
> }

1. I would suggest not using nested if blocks, since they are becoming harder to read (see the sketch after this message for the shape I have in mind). Also, does it make sense to have a utility function in backend_status.c that checks whether the backend with a given backend ID is of a certain backend type? And should we check whether proc->backendId is invalid?

> ALTER SYSTEM SET autovacuum_vacuum_cost_limit TO 1;
> ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO 100;
> ALTER SYSTEM SET autovacuum_naptime TO 1;

2. Could we set these parameters at the beginning of the test, before $node->start, with $node->append_conf? That way we can avoid restarting the node and doing the sleep later on.

> my $res_pid = $node_primary->safe_psql(
>     'regress',
>     "SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker' and datname = 'regress';"
> );
>
> my ($res_reg_psa_1, $stdout_reg_psa_1, $stderr_reg_psa_1) = $node_primary->psql('regress', qq[
>     SET ROLE psa_reg_role_1;
>     SELECT pg_terminate_backend($res_pid);
> ]);
>
> ok($res_reg_psa_1 != 0, "should fail for non pg_signal_autovacuum");
> like($stderr_reg_psa_1, qr/Only roles with the SUPERUSER attribute may terminate processes of roles with the SUPERUSER attribute./, "matches");
>
> my ($res_reg_psa_2, $stdout_reg_psa_2, $stderr_reg_psa_2) = $node_primary->psql('regress', qq[
>     SET ROLE psa_reg_role_2;
>     SELECT pg_terminate_backend($res_pid);
> ]");

3. Some nits on styling.

4. According to Postgres style, I believe open brackets should be on a new line.
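To illustrate point 1, the shape I have in mind is roughly this (untested, and the helper name is made up):

	/* hypothetical helper, e.g. in backend_status.c */
	bool
	pgstat_backend_has_type(BackendId backendId, BackendType backendType)
	{
		LocalPgBackendStatus *local_beentry;

		local_beentry = pgstat_get_local_beentry_by_backend_id(backendId);

		return local_beentry != NULL &&
			local_beentry->backendStatus.st_backendType == backendType;
	}

which would let the permission check read without the nesting:

	if (!superuser())
	{
		if (!OidIsValid(proc->roleId))
		{
			/* backend running as no role: only autovacuum workers expected here */
			if (!(pgstat_backend_has_type(proc->backendId, B_AUTOVAC_WORKER) &&
				  has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)))
				return SIGNAL_BACKEND_NOSUPERUSER;
		}
		else if (superuser_arg(proc->roleId))
			return SIGNAL_BACKEND_NOSUPERUSER;
		/* Users can signal backends they have role membership in. */
		else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
				 !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
			return SIGNAL_BACKEND_NOPERMISSION;
	}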
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
On Sat, Mar 9, 2024 at 9:48 AM Tomas Vondra wrote:
> I spent a bit of time investigating this today, but haven't made much
> progress due to (a) my unfamiliarity with the smgr code in general and
> the patch in particular, and (b) CLOBBER_CACHE_ALWAYS making it quite
> time consuming to iterate and experiment.
>
> However, the smallest schedule that still reproduces the issue is:
>
> ---
> test: test_setup
>
> test: create_aggregate create_function_sql create_cast constraints
> triggers select inherit typed_table vacuum drop_if_exists
> updatable_views roleattributes create_am hash_func errors infinite_recurse
> ---

Thanks, reproduced here (painfully slowly). Looking...

Huh, also look at these extra problems later in the logs of the latest trilobite and avocet runs, further down after the short read errors:

TRAP: failed Assert("j > attnum"), File: "heaptuple.c", Line: 640, PID: 15753
postgres: autovacuum worker regression(ExceptionalCondition+0x67)[0x9c5f37]
postgres: autovacuum worker regression[0x4b60c8]
postgres: autovacuum worker regression[0x5ff735]
postgres: autovacuum worker regression[0x5fe468]
postgres: autovacuum worker regression(analyze_rel+0x133)[0x5fd5e3]
postgres: autovacuum worker regression(vacuum+0x6b6)[0x683926]
postgres: autovacuum worker regression[0x7ce5e3]
postgres: autovacuum worker regression[0x7cc4f0]
postgres: autovacuum worker regression(StartAutoVacWorker+0x22)[0x7cc152]
postgres: autovacuum worker regression[0x7d57d1]
postgres: autovacuum worker regression[0x7d37bf]
postgres: autovacuum worker regression[0x6f5a4f]

Then crash recovery fails, in one case with:

2024-03-07 20:28:18.632 CET [15860:4] WARNING: will not overwrite a used ItemId
2024-03-07 20:28:18.632 CET [15860:5] CONTEXT: WAL redo at 0/FB07A48 for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610, blk 12
2024-03-07 20:28:18.632 CET [15860:6] PANIC: failed to add tuple
2024-03-07 20:28:18.632 CET [15860:7] CONTEXT: WAL redo at 0/FB07A48 for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610, blk 12

... and in another with:

2024-03-05 11:51:27.992 CET [25510:4] PANIC: invalid lp
2024-03-05 11:51:27.992 CET [25510:5] CONTEXT: WAL redo at 0/F87A8D0 for Heap/INPLACE: off: 29; blkref #0: rel 1663/16384/20581, blk 0
Re: Add checkpoint/redo LSNs to recovery errors.
On 2/29/24 16:42, Michael Paquier wrote:
> On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote:
>> This patch adds checkpoint/redo LSNs to recovery error messages where
>> they may be useful for debugging.
>
> I've scanned a bit xlogrecovery.c, and I have spotted a couple of
> messages that could gain more information.
>
>     ereport(PANIC,
>             (errmsg("invalid redo record in shutdown checkpoint")));
>     [...]
>     ereport(PANIC,
>             (errmsg("invalid redo in checkpoint record")));
>
> These two could mention CheckPointLoc and checkPoint.redo.
>
>     ereport(PANIC,
>             (errmsg("invalid next transaction ID")));
>
> Perhaps some XID information could be added here?
>
>     ereport(FATAL,
>             (errmsg("WAL ends before consistent recovery point")));
>     [...]
>     ereport(FATAL,
>             (errmsg("WAL ends before end of online backup"),
>
> These two are in xlog.c, and don't mention backupStartPoint for the
> first one. Perhaps there's a point in adding some information about
> EndOfLog and LocalMinRecoveryPoint as well?

For now I'd like to just focus on these three messages, which are related to a missing backup_label or a misconfiguration of restore_command when backup_label is present. No doubt there are many other recovery log messages that could be improved, but I'd rather not bog down the patch.

Regards,
-David
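PS: To make that concrete, the kind of change I mean for the first message looks roughly like this (an illustrative sketch only, not necessarily the exact wording used in the attached patch):

	ereport(PANIC,
			(errmsg("invalid redo record in shutdown checkpoint"),
			 errdetail("Checkpoint location: %X/%X, redo location: %X/%X.",
					   LSN_FORMAT_ARGS(CheckPointLoc),
					   LSN_FORMAT_ARGS(checkPoint.redo))));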
Re: Add recovery to pg_control and remove backup_label
On 1/29/24 12:28, David Steele wrote: On 1/28/24 19:11, Michael Paquier wrote: On Fri, Jan 26, 2024 at 06:27:30PM +0530, vignesh C wrote: Please post an updated version for the same. [1] - http://cfbot.cputube.org/patch_46_3511.log With the recent introduction of incremental backups that depend on backup_label and the rather negative feedback received, I think that it would be better to return this entry as RwF for now. What do you think? I've been thinking it makes little sense to update the patch. It would be a lot of work with all the new changes for incremental backup and since Andres and Robert appear to be very against the idea, I doubt it would be worth the effort. I've had a new idea which may revive this patch. The basic idea is to keep backup_label but also return a copy of pg_control from pg_stop_backup(). This copy of pg_control would be safe from tears and have a backupLabelRequired field set (as Andres suggested) so recovery cannot proceed without the backup label. So, everything will continue to work as it does now. But, backup software can be enhanced to write the improved pg_control that is guaranteed not to be torn and has protection against a missing backup label. Of course, pg_basebackup will write the new backupLabelRequired field into pg_control, but this way third party software can also gain advantages from the new field. Thoughts? Regards, -David
Re: Vectored I/O in bulk_write.c
Slightly better version, adjusting a few obsoleted comments, adjusting error message to distinguish write/extend, fixing a thinko in smgr_cached_nblocks maintenance. From c786f979b0c38364775e32b9403b79303507d37b Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 9 Mar 2024 16:04:21 +1300 Subject: [PATCH v2 1/2] Provide vectored variant of smgrextend(). Since mdwrite() and mdextend() were basically the same, merge them. They had different assertions, but those would surely apply to any implementation of smgr, so move them up to the smgr.c wrapper level. The other difference was the policy on segment creation, but that can be captured by having smgrwritev() and smgrextendv() call a single mdwritev() function with a new "extend" flag. --- src/backend/storage/smgr/md.c | 106 +--- src/backend/storage/smgr/smgr.c | 32 ++ src/include/storage/md.h| 3 +- src/include/storage/smgr.h | 12 +++- 4 files changed, 47 insertions(+), 106 deletions(-) diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index bf0f3ca76d..9b12584122 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -447,79 +447,10 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo) pfree(path); } -/* - * mdextend() -- Add a block to the specified relation. - * - * The semantics are nearly the same as mdwrite(): write at the - * specified position. However, this is to be used for the case of - * extending a relation (i.e., blocknum is at or beyond the current - * EOF). Note that we assume writing a block beyond current EOF - * causes intervening file space to become filled with zeroes. - */ -void -mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, - const void *buffer, bool skipFsync) -{ - off_t seekpos; - int nbytes; - MdfdVec*v; - - /* If this build supports direct I/O, the buffer must be I/O aligned. */ - if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ) - Assert((uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)); - - /* This assert is too expensive to have on normally ... */ -#ifdef CHECK_WRITE_VS_EXTEND - Assert(blocknum >= mdnblocks(reln, forknum)); -#endif - - /* - * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any - * more --- we mustn't create a block whose number actually is - * InvalidBlockNumber. (Note that this failure should be unreachable - * because of upstream checks in bufmgr.c.) 
- */ - if (blocknum == InvalidBlockNumber) - ereport(ERROR, -(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("cannot extend file \"%s\" beyond %u blocks", - relpath(reln->smgr_rlocator, forknum), - InvalidBlockNumber))); - - v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE); - - seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE)); - - Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE); - - if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_EXTEND)) != BLCKSZ) - { - if (nbytes < 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not extend file \"%s\": %m", - FilePathName(v->mdfd_vfd)), - errhint("Check free disk space."))); - /* short write: complain appropriately */ - ereport(ERROR, -(errcode(ERRCODE_DISK_FULL), - errmsg("could not extend file \"%s\": wrote only %d of %d bytes at block %u", - FilePathName(v->mdfd_vfd), - nbytes, BLCKSZ, blocknum), - errhint("Check free disk space."))); - } - - if (!skipFsync && !SmgrIsTemp(reln)) - register_dirty_segment(reln, forknum, v); - - Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE)); -} - /* * mdzeroextend() -- Add new zeroed out blocks to the specified relation. * - * Similar to mdextend(), except the relation can be extended by multiple - * blocks at once and the added blocks will be filled with zeroes. + * The added blocks will be filled with zeroes. */ void mdzeroextend(SMgrRelation reln, ForkNumber forknum, @@ -595,13 +526,7 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum, { int ret; - /* - * Even if we don't want to use fallocate, we can still extend a - * bit more efficiently than writing each 8kB block individually. - * pg_pwrite_zeros() (via FileZero()) uses pg_pwritev_with_retry() - * to avoid multiple writes or needing a zeroed buffer for the - * whole length of the extension. - */ + /* Fall back to writing out zeroes. */ ret = FileZero(v->mdfd_vfd, seekpos, (off_t) BLCKSZ * numblocks, WAIT_EVENT_DATA_FILE_EXTEND); @@ -920,19 +845,14 @@ mdreadv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, /* * mdwritev() -- Write the supplied blocks at the appropriate location. * - * This is to be used only for updating already-existing blocks of a - * relation (ie, those before the current EOF). To extend a relation, - * use mdextend(). + * Note tha
Vectored I/O in bulk_write.c
Hi, I was trying to learn enough about the new bulk_write.c to figure out what might be going wrong over at [1], and finished up doing this exercise, which is experiment quality but passes basic tests. It's a bit like v1-0013 and v1-0014's experimental vectored checkpointing from [2] (which themselves are not currently proposed, that too was in the experiment category), but this usage is a lot simpler and might be worth considering. Presumably both things would eventually finish up being done by (not yet proposed) streaming write, but could also be done directly in this simple case. This way, CREATE INDEX generates 128kB pwritev() calls instead of 8kB pwrite() calls. (There's a magic 16 in there, we'd probably need to think harder about that.) It'd be better if bulk_write.c's memory management were improved: if buffers were mostly contiguous neighbours instead of being separately palloc'd objects, you'd probably often get 128kB pwrite() instead of pwritev(), which might be marginally more efficient. This made me wonder why smgrwritev() and smgrextendv() shouldn't be backed by the same implementation, since they are essentially the same operation. The differences are some assertions which might as well be moved up to the smgr.c level as they must surely apply to any future smgr implementation too, right?, and the segment file creation policy which can be controlled with a new argument. I tried that here. An alternative would be for md.c to have a workhorse function that both mdextendv() and mdwritev() call, but I'm not sure if there's much point in that. While thinking about that I realised that an existing write-or-extend assertion in master is wrong because it doesn't add nblocks. Hmm, it's a bit weird that we have nblocks as int or BlockNumber in various places, which I think should probably be fixed. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK%2B5DOmLaBp3Z7C4S-Yv6yoROvr1UncjH2S1ZbPT8D%2BZg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com From 4611cb121bbfa787ddbba4bc0e80ac6c732345d0 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 9 Mar 2024 16:04:21 +1300 Subject: [PATCH 1/2] Provide vectored variant of smgrextend(). Since mdwrite() and mdextend() were basically the same, merge them. They had different assertions, but those would surely apply to any implementation of smgr, so move them up to the smgr.c wrapper level. The other difference was the policy on segment creation, but that can be captured by having smgrwritev() and smgrextendv() call a single mdwritev() function with a new "extend" flag. --- src/backend/storage/smgr/md.c | 91 - src/backend/storage/smgr/smgr.c | 28 ++ src/include/storage/md.h| 3 +- src/include/storage/smgr.h | 12 - 4 files changed, 40 insertions(+), 94 deletions(-) diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index bf0f3ca76d..80716f11e1 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -447,74 +447,6 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo) pfree(path); } -/* - * mdextend() -- Add a block to the specified relation. - * - * The semantics are nearly the same as mdwrite(): write at the - * specified position. However, this is to be used for the case of - * extending a relation (i.e., blocknum is at or beyond the current - * EOF). Note that we assume writing a block beyond current EOF - * causes intervening file space to become filled with zeroes. 
- */ -void -mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, - const void *buffer, bool skipFsync) -{ - off_t seekpos; - int nbytes; - MdfdVec*v; - - /* If this build supports direct I/O, the buffer must be I/O aligned. */ - if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ) - Assert((uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)); - - /* This assert is too expensive to have on normally ... */ -#ifdef CHECK_WRITE_VS_EXTEND - Assert(blocknum >= mdnblocks(reln, forknum)); -#endif - - /* - * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any - * more --- we mustn't create a block whose number actually is - * InvalidBlockNumber. (Note that this failure should be unreachable - * because of upstream checks in bufmgr.c.) - */ - if (blocknum == InvalidBlockNumber) - ereport(ERROR, -(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("cannot extend file \"%s\" beyond %u blocks", - relpath(reln->smgr_rlocator, forknum), - InvalidBlockNumber))); - - v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE); - - seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE)); - - Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE); - - if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_EXTEND)) != BLCKSZ) - { - if (nbytes
Re: Extract numeric filed in JSONB more effectively
>> But I have a different question about this patch set. This has some
>> overlap with the JSON_VALUE function that is being discussed at
>> [0][1]. For example, if I apply the patch
>> v39-0001-Add-SQL-JSON-query-functions.patch from that thread, I can run
>>
>> select count(*) from tb where json_value(a, '$.a' returning numeric) = 2;
>>
>> and I get a noticeable performance boost over
>>
>> select count(*) from tb where cast (a->'a' as numeric) = 2;
>
> Here is my test and profile about the above 2 queries.
> ..
> As we can see the patch here has the best performance (this result looks
> be different from yours?).
>
> After I check the code, I am sure both patches *don't* have the problem
> in master where it get a jsonbvalue first and convert it to jsonb and
> then cast to numeric.
>
> Then I perf the result, and find the below stuff:
> ..
> JSONB_VALUE has a much longer way to get getKeyJsonValueFromContainer,
> then I think JSON_VALUE probably is designed for some more complex path
> which need to pay extra effort which bring the above performance
> difference.

Hello Peter,

Thanks for highlighting the JSON_VALUE patch! Here is the situation as I see it now.

My patch is designed to *not* introduce any new user-facing functions, but to make some existing functions run faster. The JSON_VALUE patch is designed to follow the SQL standard more closely, so it introduces one new function with more flexibility in ERROR handling [1].

Both patches are helpful for the subject here, but my patch has better performance per my testing. I don't think I did anything better here; it's just that JSON_VALUE is designed for a more generic purpose and has to pay some extra cost for that. Even if we have some chance to improve JSON_VALUE, I don't think that should block the patch here (I'd like to learn more about this; it may take some time!).

So I think my patch here can go ahead again. What do you think?

[1] https://www.postgresql.org/message-id/CACJufxGtetrn34Hwnb9D2if5D_HOPAh235MtEZ1meVYx-BiNtg%40mail.gmail.com

--
Best Regards
Andy Fan
Re: remaining sql/json patches
jian he writes:
> On Tue, Mar 5, 2024 at 12:38 PM Andy Fan wrote:
>>
>> In the commit message of 0001, we have:
>>
>> """
>> Both JSON_VALUE() and JSON_QUERY() functions have options for
>> handling EMPTY and ERROR conditions, which can be used to specify
>> the behavior when no values are matched and when an error occurs
>> during evaluation, respectively.
>>
>> All of these functions only operate on jsonb values. The workaround
>> for now is to cast the argument to jsonb.
>> """
>>
>> which is not clear for me why we introduce JSON_VALUE() function, is it
>> for handling EMPTY or ERROR conditions? I think the existing cast
>> workaround have a similar capacity?
>
> I guess because it's in the standard.
> but I don't see individual sql standard Identifier, JSON_VALUE in
> sql_features.txt
> I do see JSON_QUERY.
> mysql also have JSON_VALUE, [1]
>
> EMPTY, ERROR: there is a standard Identifier: T825: SQL/JSON: ON EMPTY
> and ON ERROR clauses
>
> [1] https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html#function_json-value

Thank you for this information!

--
Best Regards
Andy Fan
Re: Improving contrib/tablefunc's error reporting
On 3/9/24 15:39, Tom Lane wrote: Joe Conway writes: On 3/9/24 13:07, Tom Lane wrote: BTW, while I didn't touch it here, it seems fairly bogus that connectby() checks both type OID and typmod for its output columns while crosstab() only checks type OID. I think crosstab() is in the wrong and needs to be checking typmod. That might be fit material for a separate patch though. Something like the attached what you had in mind? (applies on top of your two patches) Yeah, exactly. As far as the comment change goes: - * - attribute [1] of the sql tuple is the category; no need to check it - - * attribute [2] of the sql tuple should match attributes [1] to [natts] + * attribute [1] of the sql tuple is the category; no need to check it + * attribute [2] of the sql tuple should match attributes [1] to [natts - 1] * of the return tuple I suspect that this block looked better when originally committed but didn't survive contact with pgindent. You should check whether your version will; if not, some dashes on the /* line will help. Thanks for the review and heads up. I fiddled with it a bit to make it pgindent clean. I saw you commit your patches so I just pushed mine. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Combine Prune and Freeze records emitted by vacuum
Thanks so much for the review! On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas wrote: > > On 25/01/2024 00:49, Melanie Plageman wrote: > > > The attached patch set is broken up into many separate commits for > > ease of review. Each patch does a single thing which can be explained > > plainly in the commit message. Every commit passes tests and works on > > its own. > > About this very first change: > > > --- a/src/backend/access/heap/vacuumlazy.c > > +++ b/src/backend/access/heap/vacuumlazy.c > > @@ -1526,8 +1526,7 @@ lazy_scan_prune(LVRelState *vacrel, > >* that everyone sees it as committed? > >*/ > > xmin = HeapTupleHeaderGetXmin(htup); > > - if (!TransactionIdPrecedes(xmin, > > - > > vacrel->cutoffs.OldestXmin)) > > + if > > (!GlobalVisTestIsRemovableXid(vacrel->vistest, xmin)) > > { > > all_visible = false; > > break; > > Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly? Okay, so I thought a lot about this, and I don't understand how GlobalVisTestIsRemovableXid() would not handle FrozenTransactionId correctly. vacrel->cutoffs.OldestXmin is computed initially from GetOldestNonRemovableTransactionId() which uses ComputeXidHorizons(). GlobalVisState is updated by ComputeXidHorizons() (when it is updated). I do see that the comment above GlobalVisTestIsRemovableXid() says * It is crucial that this only gets called for xids from a source that * protects against xid wraparounds (e.g. from a table and thus protected by * relfrozenxid). and then in * Convert 32 bit argument to FullTransactionId. We can do so safely * because we know the xid has to, at the very least, be between * [oldestXid, nextXid), i.e. within 2 billion of xid. I'm not sure what oldestXid is here. It's true that I don't see GlobalVisTestIsRemovableXid() being called anywhere else with an xmin as an input. I think that hints that it is not safe with FrozenTransactionId. But I don't see what could go wrong. Maybe it has something to do with converting it to a FullTransactionId? FullTransactionIdFromU64(U64FromFullTransactionId(rel) + (int32) (xid - rel_xid)); Sorry, I couldn't quite figure it out :( > I read through all the patches in order, and aside from the above they > all look correct to me. Some comments on the set as whole: > > I don't think we need XLOG_HEAP2_FREEZE_PAGE as a separate record type > anymore. By removing that, you also get rid of the freeze-only codepath > near the end of heap_page_prune(), and the > heap_freeze_execute_prepared() function. That makes sense to me. > The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than > XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for > the case that there's no pruning, just freezing. The record format > (xl_heap_prune) looks pretty complex as it is, I think it could be made > both more compact and more clear with some refactoring. I'm happy to change up xl_heap_prune format. In its current form, according to pahole, it has no holes and just 3 bytes of padding at the end. One way we could make it smaller is by moving the isCatalogRel member to directly after snapshotConflictHorizon and then adding a flags field and defining flags to indicate whether or not other members exist at all. We could set bits for HAS_FREEZE_PLANS, HAS_REDIRECTED, HAS_UNUSED, HAS_DEAD. Then I would remove the non-optional uint16 nredirected, nunused, nplans, ndead and just put the number of redirected/unused/etc at the beginning of the arrays containing the offsets. Then I could write various macros for accessing them. 
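Here's roughly what I'm picturing for the main record, using the names from above (just a sketch; field widths and names are not at all settled):

	typedef struct xl_heap_prune
	{
		TransactionId snapshotConflictHorizon;
		bool		isCatalogRel;	/* to handle recovery conflict during
									 * logical decoding on standby */
		uint8		flags;			/* which optional parts follow the struct */
	} xl_heap_prune;

	#define HAS_FREEZE_PLANS	0x01
	#define HAS_REDIRECTED		0x02
	#define HAS_DEAD			0x04
	#define HAS_UNUSED			0x08

Each part that is present would then be stored as its own count followed by that many entries (e.g. a uint16 nredirected followed by 2 * nredirected OffsetNumbers), with macros to find the start of each array.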
That would make it smaller, but it definitely wouldn't make it less complex (IMO). > FreezeMultiXactId still takes a separate 'cutoffs' arg, but it could use > pagefrz->cutoffs now. Yep, I forgot to add a commit to do this. Thanks! > HeapPageFreeze has two "trackers", for the "freeze" and "no freeze" > cases. heap_page_prune() needs to track both, until it decides whether > to freeze or not. But it doesn't make much sense that the caller > (lazy_scan_prune()) has to initialize both, and has to choose which of > the values to use after the call depending on whether heap_page_prune() > froze or not. The two trackers should be just heap_page_prune()'s > internal business. > > HeapPageFreeze is a bit confusing in general, as it's both an input and > an output to heap_page_prune(). Not sure what exactly to do there, but I > feel that we should make heap_page_prune()'s interface more clear. > Perhaps move the output fields to PruneResult. Great point. Perhaps I just add NewRelfrozenXid and NewRelminMxid t
Re: Improving contrib/tablefunc's error reporting
Joe Conway writes: > On 3/9/24 13:07, Tom Lane wrote: >>> BTW, while I didn't touch it here, it seems fairly bogus that >>> connectby() checks both type OID and typmod for its output >>> columns while crosstab() only checks type OID. I think >>> crosstab() is in the wrong and needs to be checking typmod. >>> That might be fit material for a separate patch though. > Something like the attached what you had in mind? (applies on top of > your two patches) Yeah, exactly. As far as the comment change goes: - * - attribute [1] of the sql tuple is the category; no need to check it - - * attribute [2] of the sql tuple should match attributes [1] to [natts] + * attribute [1] of the sql tuple is the category; no need to check it + * attribute [2] of the sql tuple should match attributes [1] to [natts - 1] * of the return tuple I suspect that this block looked better when originally committed but didn't survive contact with pgindent. You should check whether your version will; if not, some dashes on the /* line will help. regards, tom lane
Re: Improving contrib/tablefunc's error reporting
On 3/9/24 13:07, Tom Lane wrote: Joe Conway writes: On 3/5/24 17:04, Tom Lane wrote: After reading the thread at [1], I could not escape the feeling that contrib/tablefunc's error reporting is very confusing. The changes all look good to me and indeed more consistent with the docs. Do you want me to push these? If so, development tip only, or backpatch? I can push that. I was just thinking HEAD, we aren't big on changing error reporting in back branches. BTW, while I didn't touch it here, it seems fairly bogus that connectby() checks both type OID and typmod for its output columns while crosstab() only checks type OID. I think crosstab() is in the wrong and needs to be checking typmod. That might be fit material for a separate patch though. I can take a look at this. Presumably this would not be for backpatching. I'm not sure whether that could produce results bad enough to be called a bug or not. But I too would lean towards not back-patching, in view of the lack of field complaints. Something like the attached what you had in mind? (applies on top of your two patches) -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com Make tablefunc crosstab() check typmod too tablefunc connectby() checks both type OID and typmod for its output columns while crosstab() only checks type OID. Fix that by makeing the crosstab() check look more like the connectby() check. --- tablefunc.c.v0002 2024-03-09 14:38:29.285393890 -0500 +++ tablefunc.c 2024-03-09 14:43:47.021399855 -0500 @@ -1527,10 +1527,10 @@ compatCrosstabTupleDescs(TupleDesc ret_tupdesc, TupleDesc sql_tupdesc) { int i; - Form_pg_attribute ret_attr; Oid ret_atttypid; - Form_pg_attribute sql_attr; Oid sql_atttypid; + int32 ret_atttypmod; + int32 sql_atttypmod; if (ret_tupdesc->natts < 2) ereport(ERROR, @@ -1539,34 +1539,39 @@ errdetail("Return row must have at least two columns."))); Assert(sql_tupdesc->natts == 3); /* already checked by caller */ - /* check the rowid types match */ + /* check the row_name types match */ ret_atttypid = TupleDescAttr(ret_tupdesc, 0)->atttypid; sql_atttypid = TupleDescAttr(sql_tupdesc, 0)->atttypid; - if (ret_atttypid != sql_atttypid) + ret_atttypmod = TupleDescAttr(ret_tupdesc, 0)->atttypmod; + sql_atttypmod = TupleDescAttr(sql_tupdesc, 0)->atttypmod; + if (ret_atttypid != sql_atttypid || + (ret_atttypmod >= 0 && ret_atttypmod != sql_atttypmod)) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("invalid crosstab return type"), errdetail("Source row_name datatype %s does not match return row_name datatype %s.", - format_type_be(sql_atttypid), - format_type_be(ret_atttypid; + format_type_with_typemod(sql_atttypid, sql_atttypmod), + format_type_with_typemod(ret_atttypid, ret_atttypmod; /* - * - attribute [1] of the sql tuple is the category; no need to check it - - * attribute [2] of the sql tuple should match attributes [1] to [natts] + * attribute [1] of the sql tuple is the category; no need to check it + * attribute [2] of the sql tuple should match attributes [1] to [natts - 1] * of the return tuple */ - sql_attr = TupleDescAttr(sql_tupdesc, 2); + sql_atttypid = TupleDescAttr(sql_tupdesc, 2)->atttypid; + sql_atttypmod = TupleDescAttr(sql_tupdesc, 2)->atttypmod; for (i = 1; i < ret_tupdesc->natts; i++) { - ret_attr = TupleDescAttr(ret_tupdesc, i); - - if (ret_attr->atttypid != sql_attr->atttypid) + ret_atttypid = TupleDescAttr(ret_tupdesc, i)->atttypid; + ret_atttypmod = TupleDescAttr(ret_tupdesc, i)->atttypmod; + if (ret_atttypid != 
sql_atttypid || + (ret_atttypmod >= 0 && ret_atttypmod != sql_atttypmod)) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("invalid crosstab return type"), errdetail("Source value datatype %s does not match return value datatype %s in column %d.", - format_type_be(sql_attr->atttypid), - format_type_be(ret_attr->atttypid), + format_type_with_typemod(sql_atttypid, sql_atttypmod), + format_type_with_typemod(ret_atttypid, ret_atttypmod), i + 1))); }
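For illustration, here is the sort of (hypothetical) call the stricter check would now reject; previously only the type OIDs were compared, so the typmod difference was silently ignored (tbl is just a made-up three-column source table):

	-- source values are varchar(10), but the column definition list says varchar(20);
	-- with the typmod check this now fails with "invalid crosstab return type"
	SELECT *
	FROM crosstab(
	  $$SELECT rowid::text, attribute, val::varchar(10) FROM tbl ORDER BY 1, 2$$
	) AS t(row_name text, cat_1 varchar(20), cat_2 varchar(20));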
Re: Improving contrib/tablefunc's error reporting
Joe Conway writes: > On 3/5/24 17:04, Tom Lane wrote: >> After reading the thread at [1], I could not escape the feeling >> that contrib/tablefunc's error reporting is very confusing. > The changes all look good to me and indeed more consistent with the docs. > Do you want me to push these? If so, development tip only, or backpatch? I can push that. I was just thinking HEAD, we aren't big on changing error reporting in back branches. >> BTW, while I didn't touch it here, it seems fairly bogus that >> connectby() checks both type OID and typmod for its output >> columns while crosstab() only checks type OID. I think >> crosstab() is in the wrong and needs to be checking typmod. >> That might be fit material for a separate patch though. > I can take a look at this. Presumably this would not be for backpatching. I'm not sure whether that could produce results bad enough to be called a bug or not. But I too would lean towards not back-patching, in view of the lack of field complaints. regards, tom lane
Re: Improving contrib/tablefunc's error reporting
On 3/5/24 17:04, Tom Lane wrote: After reading the thread at [1], I could not escape the feeling that contrib/tablefunc's error reporting is very confusing. Looking into the source code, I soon found that it is also very inconsistent, with similar error reports being phrased quite differently. The terminology for column names doesn't match the SGML docs either. And there are some places that are so confused about whether they are complaining about the calling query or the called query that the output is flat-out backwards. So at that point my nascent OCD wouldn't let me rest without fixing it. Here's a quick patch series to do that. For review purposes, I split this into two patches. 0001 simply adds some more test cases to reach currently-unexercised error reports. Then 0002 makes my proposed code changes and shows how the existing error messages change. I'm not necessarily wedded to the phrasings I used here, in case anyone has better ideas. The changes all look good to me and indeed more consistent with the docs. Do you want me to push these? If so, development tip only, or backpatch? BTW, while I didn't touch it here, it seems fairly bogus that connectby() checks both type OID and typmod for its output columns while crosstab() only checks type OID. I think crosstab() is in the wrong and needs to be checking typmod. That might be fit material for a separate patch though. I can take a look at this. Presumably this would not be for backpatching. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Fix for timestamp lag issue from emit_log_hook when GUC log_line_prefix has '%m'
Hi,

We observed a slight timestamp lag in a few log lines produced by our emit_log_hook implementation when the log_line_prefix GUC contains '%m'. Upon debugging, we found that the saved_timeval_set variable is set to true in get_formatted_log_time() but is not reset to false until the next call to send_message_to_server_log(). Because of this, saved_timeval_set is still true while emit_log_hook() runs, so the hook picks up the saved timestamp 'saved_timeval' from the previous log line (our hook implementation calls log_line_prefix()).

The attached patch sets saved_timeval_set to false before executing emit_log_hook().

Thanks,
Vinay

0001-set-saved_timeval_set-to-false-before-executing-emit.patch
Description: Binary data
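For reference, the essence of the change is just this (paraphrased from memory of elog.c; see the attached patch for the exact change), at the point in EmitErrorReport() where the hook is invoked:

	if (edata->output_to_server && emit_log_hook)
	{
		/* don't let the hook see the previous message's cached timestamp */
		saved_timeval_set = false;
		(*emit_log_hook) (edata);
	}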
Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value
On Sat, Mar 09, 2024 at 11:57:18AM +0900, Yugo NAGATA wrote: > On Fri, 8 Mar 2024 16:17:58 -0600 > Nathan Bossart wrote: >> Is this guaranteed to be TOASTed for all possible page sizes? > > Should we use block_size? > > SHOW block_size \gset > INSERT INTO test_chunk_id(v1,v2) > VALUES (repeat('x', 1), repeat('x', (:block_size / 4))); > > I think this will work in various page sizes. WFM > +SHOW block_size; \gset > + block_size > + > + 8192 > +(1 row) I think we need to remove the ';' so that the output of the query is not saved in the ".out" file. With that change, this test passes when Postgres is built with --with-blocksize=32. However, many other unrelated tests begin failing, so I guess this fix isn't tremendously important. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
On Fri, Dec 22, 2023 at 11:04 PM Alexander Korotkov wrote: > > Hi, Anton! > > On Mon, Dec 4, 2023 at 3:50 AM Anton A. Melnikov > wrote: >> >> Thanks for remarks! >> >> On 28.11.2023 21:34, Alexander Korotkov wrote: >> > After examining the second patch >> > ("v2-0001-Add-restartpoint-stats.patch"), it appears that adding >> > additional statistics as outlined in the patch is the most suitable >> > approach to address the concerns raised. This solution provides more >> > visibility into the system's behavior without altering its core >> > mechanics. >> >> Agreed. I left only this variant of the patch and rework it due to commit >> 96f05261. >> So the new counters is in the pg_stat_checkpointer view now. >> Please see the v3-0001-add-restartpoints-stats.patch attached. >> >> >> > However, it's essential that this additional functionality >> > is accompanied by comprehensive documentation to ensure clear >> > understanding and ease of use by the PostgreSQL community. >> > >> > Please consider expanding the documentation to include detailed >> > explanations of the new statistics and their implications in various >> > scenarios. >> >> In the separate v3-0002-doc-for-restartpoints-stats.patch i added the >> definitions >> of the new counters into the "28.2.15. pg_stat_checkpointer" section >> and explanation of them with examples into the "30.5.WAL Configuration" one. >> >> Would be glad for any comments and and concerns. > > > I made some grammar corrections to the docs and have written the commit > message. > > I think this patch now looks good. I'm going to push this if there are no > objections. Per the docs, the sync_time, write_time and buffers_written only apply to checkpoints, not restartpoints. Is this correct? AFAICT from a quick look at the code they include both checkpoints and restartpoints in which case I think the docs should be clarified to indicate that? (Or if I'm wrong, and it doesn't include them, then shouldn't we have separate counters for them?) //Magnus
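For reference, the columns I'm asking about are the ones below (column names as I recall them from the pg_stat_checkpointer view plus this patch's restartpoint counters, so apologies if I've misremembered one):

	SELECT num_timed, num_requested,
	       restartpoints_timed, restartpoints_req, restartpoints_done,
	       write_time, sync_time, buffers_written
	FROM pg_stat_checkpointer;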
Re: postgres_fdw test timeouts
On Sat, Mar 09, 2024 at 10:00:00AM +0300, Alexander Lakhin wrote: > I have re-run the tests and found out that the issue was fixed by > d3c5f37dd. It changed the inner of the loop "while (PQisBusy(conn))", > formerly contained in pgfdw_get_result() as follows: > /* Data available in socket? */ > if (wc & WL_SOCKET_READABLE) > { > if (!PQconsumeInput(conn)) > pgfdw_report_error(ERROR, NULL, conn, false, query); > } > -> > /* Consume whatever data is available from the socket */ > if (PQconsumeInput(conn) == 0) > { > /* trouble; expect PQgetResult() to return NULL */ > break; > } > > That is, the unconditional "if PQconsumeInput() ..." eliminates the test > timeout. Thanks for confirming! I'm assuming this just masks the underlying issue... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Support "Right Semi Join" plan shapes
On 06.03.2024 05:23, wenhui qiu wrote:
> Hi Alena Rybakina
> For merge join
> +	/*
> +	 * For now we do not support RIGHT_SEMI join in mergejoin.
> +	 */
> +	if (jointype == JOIN_RIGHT_SEMI)
> +	{
> +		*mergejoin_allowed = false;
> +		return NIL;
> +	}
> +
> Thanks

Yes, I see it, thank you. Sorry for the noise.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Improving EXPLAIN's display of SubPlan nodes
On Fri, 16 Feb 2024 at 19:39, Tom Lane wrote: > > > So now I'm thinking that we do have enough detail in the present > > proposal, and we just need to think about whether there's some > > nicer way to present it than the particular spelling I used here. > One thing that concerns me about making even greater use of "$n" is the potential for confusion with generic plan parameters. Maybe it's always possible to work out which is which from context, but still it looks messy: drop table if exists foo; create table foo(id int, x int, y int); explain (verbose, costs off, generic_plan) select row($3,$4) = (select x,y from foo where id=y) and row($1,$2) = (select min(x+y),max(x+y) from generate_series(1,3) x) from generate_series(1,3) y; QUERY PLAN --- Function Scan on pg_catalog.generate_series y Output: (($3 = $0) AND ($4 = $1) AND (ROWCOMPARE (($1 = $3) AND ($2 = $4)) FROM SubPlan 2 (returns $3,$4))) Function Call: generate_series(1, 3) InitPlan 1 (returns $0,$1) -> Seq Scan on public.foo Output: foo.x, foo.y Filter: (foo.id = foo.y) SubPlan 2 (returns $3,$4) -> Aggregate Output: min((x.x + y.y)), max((x.x + y.y)) -> Function Scan on pg_catalog.generate_series x Output: x.x Function Call: generate_series(1, 3) Another odd thing about that is the inconsistency between how the SubPlan and InitPlan expressions are displayed. I think "ROWCOMPARE" is really just an internal detail that could be omitted without losing anything. But the "FROM SubPlan ..." is useful to work out where it's coming from. Should it also output "FROM InitPlan ..."? I think that would risk making it harder to read. Another possibility is to put the SubPlan and InitPlan names inline, rather than outputting "FROM SubPlan ...". I had a go at hacking that up and this was the result: QUERY PLAN --- Function Scan on pg_catalog.generate_series y Output: (($3 = (InitPlan 1).$0) AND ($4 = (InitPlan 1).$1) AND ((($1 = (SubPlan 2).$3) AND ($2 = (SubPlan 2).$4 Function Call: generate_series(1, 3) InitPlan 1 (returns $0,$1) -> Seq Scan on public.foo Output: foo.x, foo.y Filter: (foo.id = foo.y) SubPlan 2 (returns $3,$4) -> Aggregate Output: min((x.x + y.y)), max((x.x + y.y)) -> Function Scan on pg_catalog.generate_series x Output: x.x Function Call: generate_series(1, 3) It's a little more verbose in this case, but in a lot of other cases it ended up being more compact. The code is a bit messy, but I think the regression test output (attached) is clearer and easier to interpret. SubPlans and InitPlans are displayed consistently, and it's easier to distinguish SubPlan/InitPlan outputs from external parameters. There are a few more regression test changes, corresponding to cases where InitPlans are referenced, such as: Seq Scan on document - Filter: ((dlevel <= $0) AND f_leak(dtitle)) + Filter: ((dlevel <= (InitPlan 1).$0) AND f_leak(dtitle)) InitPlan 1 (returns $0) -> Index Scan using uaccount_pkey on uaccount Index Cond: (pguser = CURRENT_USER) but I think that's useful extra clarification. 
Regards, Dean diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out new file mode 100644 index c355e8f..f0ff936 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -3021,7 +3021,7 @@ select exists(select 1 from pg_enum), su QUERY PLAN -- Foreign Scan - Output: $0, (sum(ft1.c1)) + Output: (InitPlan 1).$0, (sum(ft1.c1)) Relations: Aggregate on (public.ft1) Remote SQL: SELECT sum("C 1") FROM "S 1"."T 1" InitPlan 1 (returns $0) @@ -3039,7 +3039,7 @@ select exists(select 1 from pg_enum), su QUERY PLAN --- GroupAggregate - Output: $0, sum(ft1.c1) + Output: (InitPlan 1).$0, sum(ft1.c1) InitPlan 1 (returns $0) -> Seq Scan on pg_catalog.pg_enum -> Foreign Scan on public.ft1 @@ -3264,14 +3264,14 @@ select sum(c1) filter (where (c1 / c1) * explain (verbose, costs off) select sum(c2) filter (where c2 in (select c2 from ft1 where c2 < 5)) from ft1; -QUERY PLAN + QUERY PLAN +-
Re: Reducing the log spam
On Thu, 2024-03-07 at 08:30 +0100, Laurenz Albe wrote: > On Wed, 2024-03-06 at 17:33 -0500, Isaac Morland wrote: > > I have two questions about this: > > > > First, can it be done per role? If I have a particular application which is > > constantly throwing some particular error, I might want to suppress it, but > > not suppress the same error occasionally coming from another application. > > I see ALTER DATABASE name SET configuration_parameter … as being useful > > here, > > but often multiple applications share a database. > > > > Second, where can this setting be adjusted? Can any session turn off logging > > of arbitrary sets of sqlstates resulting from its queries? It feels to me > > like that might allow security problems to be hidden. Specifically, the > > first > > thing an SQL injection might do would be to turn off logging of important > > error states, then proceed to try various nefarious things. > > I was envisioning the parameter to be like other logging parameters, for > example "log_statement": only superusers can set the parameter or GRANT > that privilege to others. Also, a superuser could use ALTER ROLE to set > the parameter for all sessions by that role. > > > It seems to me the above questions interact; an answer to the first might be > > "ALTER ROLE role_specification SET configuration_parameter", but I think > > that > > would allow roles to change their own settings, contrary to the concern > > raised by the second question. > > If a superuser sets "log_statement" on a role, that role cannot undo or change > the setting. That's just how I plan to implement the new parameter. Here is a patch that implements this. I went with "log_suppress_errcodes", since the term "error code" is used throughout our documentation. The initial value is 23505,3D000,3F000,42601,42704,42883,42P01,57P03 Yours, Laurenz Albe From 26465dafe4bcd2aea3889360987d2a863b66 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Sat, 9 Mar 2024 13:59:55 +0100 Subject: [PATCH v1] Add parameter log_suppress_errcodes The parameter contains a list of SQLSTATEs for which FATAL and ERROR messages are not logged. This is to suppress messages that are of little interest to the database administrator, but tend to clutter the log. Author: Laurenz Albe Discussion: https://postgr.es/m/408f399e7de1416c47bab7e260327ed5ad92838c.camel%40cybertec.at --- doc/src/sgml/config.sgml | 34 + src/backend/utils/error/elog.c| 118 ++ src/backend/utils/misc/guc_tables.c | 11 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/pg_config_manual.h| 10 ++ src/include/utils/guc.h | 1 + src/include/utils/guc_hooks.h | 2 + 7 files changed, 180 insertions(+) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 43b1a132a2..f9487de06f 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -6850,6 +6850,40 @@ local0.*/var/log/postgresql + + log_suppress_errcodes (string) + + log_suppress_errcodes configuration parameter + + + + +Causes ERROR and FATAL messages +with certain error codes to be excluded from the log. +The value is a comma-separated list of five-character error codes as +listed in . An error code that +represents a class of errors (ends with three zeros) suppresses logging +of all error codes within that class. For example, the entry +08000 (connection_exception) +would suppress an error with code 08P01 +(protocol_violation). The default setting is +23505,3D000,3F000,42601,42704,42883,42P01,57P03. 
+Only superusers and users with the appropriate SET +privilege can change this setting. + + + +This setting is useful to exclude error messages from the log that are +frequent but irrelevant. That makes it easier to spot relevant +messages in the log and keeps log files from growing too big. For +example, if you have a monitoring system that regularly establishes a +TCP connection to the server without sending a correct startup packet, +you could suppress the protocol violation errors by adding error code +08P01 to the list. + + + + log_min_duration_statement (integer) diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index c9719f358b..4b38adfd4d 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -112,12 +112,16 @@ int Log_error_verbosity = PGERROR_DEFAULT; char *Log_line_prefix = NULL; /* format for extra log line info */ int Log_destination = LOG_DESTINATION_STDERR; char *Log_destination_string = NULL; +char *log_suppress_errcodes = NULL; bool syslog_sequence_numbers = true; bool syslog_spl
Re: Make query cancellation keys longer
On Fri, 8 Mar 2024 at 23:20, Heikki Linnakangas wrote: > Added some documentation. There's more work to be done there, but at > least the message type descriptions are now up-to-date. Thanks, that's very helpful. > The nice thing about relying on the message length was that we could > just redefine the CancelRequest message to have a variable length > secret, and old CancelRequest with 4-byte secret was compatible with the > new definition too. But it doesn't matter much, so I added an explicit > length field. > > FWIW I don't think that restriction makes sense. Any code that parses > the messages needs to have the message length available, and I don't > think it helps with sanity checking that much. I think the documentation > is a little anachronistic. The real reason that all the message types > include enough information to find the message end is that in protocol > version 2, there was no message length field. The only exception that > doesn't have that property is CopyData, and it's no coincidence that it > was added in protocol version 3. Hmm, looking at the current code, I do agree that not introducing a new message would simplify both client and server implementation. Now clients need to do different things depending on if the server supports 3.1 or 3.0 (sending CancelRequestExtended instead of CancelRequest and having to parse BackendKeyData differently). And I also agree that the extra length field doesn't add much in regards to sanity checking (for the CancelRequest and BackendKeyData message types at least). So, sorry for the back and forth on this, but I now agree with you that we should not add the length field. I think one reason I didn't see the benefit before was because the initial patch 0002 was still introducing a CancelRequestExtended message type. If we can get rid of this message type by not adding a length, then I think that's worth losing the sanity checking. > Unfortunately 1234,5679 already in use by NEGOTIATE_SSL_CODE. That's why > I went the other direction. If we want to add this to "the end", it > needs to be 1234,5681. But I wanted to keep the cancel requests together. Fair enough, I didn't realise that. This whole point is moot anyway if we're not introducing CancelRequestExtended > We could teach > libpq to disconnect and reconnect with minor protocol version 3.0, if > connecting with 3.1 fails, but IMHO it's better to not complicate this > and accept the break in backwards-compatibility. Yeah, implementing automatic reconnection seems a bit overkill to me too. But it might be nice to add a connection option that causes libpq to use protocol 3.0. Having to install an old libpq to connect to an old server seems quite annoying. Especially since I think that many other types of servers that implement the postgres protocol have not implemented the minor version negotiation. I at least know PgBouncer[1] and pgcat[2] have not, but probably other server implementations like CockroachDB and Google Spanner have this problem too. I'll try to add such a fallback connection option to my patchset soon. [1]: https://github.com/pgbouncer/pgbouncer/pull/1007 [2]: https://github.com/postgresml/pgcat/issues/706