Re: UUID v7
On Mon, Mar 11, 2024 at 11:27:43PM +0500, Andrey M. Borodin wrote: > Sorry for this long and vague explanation, if it still seems too > uncertain we can have a chat or something like that. I don't think > this number picking stuff deserve to be commented, because it still > is quite close to random. RFC gives us too much freedom of choice. Speaking about the RFC, I can see that there is a draft but nothing formal yet. The last one I can see is v14 from last November: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14 It does not strike me as a good idea to rush an implementation without a specification officially approved because there is always a risk of shipping something that's non-compliant into core. But perhaps I am missing something on the RFC side? -- Michael signature.asc Description: PGP signature
Re: Spurious pgstat_drop_replslot() call
On Mon, Mar 11, 2024 at 04:15:40PM +0900, Michael Paquier wrote: > That's a slight change in behavior, unfortunately, and it cannot be > called a bug as this improves the visibility of the stats after an > invalidation, so this is not something that can be backpatched. I've looked again at that and that was OK on a second look. May I suggest the attached additions with LWLockHeldByMeInMode to make sure that the stats are dropped and created while we hold ReplicationSlotAllocationLock? -- Michael diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c index c61bad1627..889e86ac5a 100644 --- a/src/backend/utils/activity/pgstat_replslot.c +++ b/src/backend/utils/activity/pgstat_replslot.c @@ -113,6 +113,8 @@ pgstat_create_replslot(ReplicationSlot *slot) PgStat_EntryRef *entry_ref; PgStatShared_ReplSlot *shstatent; + Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE)); + entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid, ReplicationSlotIndex(slot), false); shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats; @@ -153,6 +155,8 @@ pgstat_acquire_replslot(ReplicationSlot *slot) void pgstat_drop_replslot(ReplicationSlot *slot) { + Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE)); + pgstat_drop_entry(PGSTAT_KIND_REPLSLOT, InvalidOid, ReplicationSlotIndex(slot)); } signature.asc Description: PGP signature
Re: Regardign RecentFlushPtr in WalSndWaitForWal()
On Mon, Mar 11, 2024 at 4:36 PM Amit Kapila wrote: > > On Mon, Mar 11, 2024 at 4:17 PM shveta malik wrote: > > > > > > Please find the patch attached for the same. > > > > LGTM. I'll push this tomorrow unless I see any comments/objections to > this change. > Pushed. -- With Regards, Amit Kapila.
Re: Improve eviction algorithm in ReorderBuffer
On Fri, Mar 8, 2024 at 12:58 PM Peter Smith wrote: > > On Thu, Mar 7, 2024 at 2:16 PM Masahiko Sawada wrote: > > > > On Tue, Mar 5, 2024 at 3:28 PM Peter Smith wrote: > > > > > > > 4a. > > > The comment in simplehash.h says > > > * The following parameters are only relevant when SH_DEFINE is defined: > > > * - SH_KEY - ... > > > * - SH_EQUAL(table, a, b) - ... > > > * - SH_HASH_KEY(table, key) - ... > > > * - SH_STORE_HASH - ... > > > * - SH_GET_HASH(tb, a) - ... > > > > > > So maybe it is nicer to reorder the #defines in that same order? > > > > > > SUGGESTION: > > > +#define SH_PREFIX bh_nodeidx > > > +#define SH_ELEMENT_TYPE bh_nodeidx_entry > > > +#define SH_KEY_TYPE bh_node_type > > > +#define SH_SCOPE extern > > > +#ifdef FRONTEND > > > +#define SH_RAW_ALLOCATOR pg_malloc0 > > > +#endif > > > > > > +#define SH_DEFINE > > > +#define SH_KEY key > > > +#define SH_EQUAL(tb, a, b) (memcmp(, , sizeof(bh_node_type)) == 0) > > > +#define SH_HASH_KEY(tb, key) \ > > > + hash_bytes((const unsigned char *) , sizeof(bh_node_type)) > > > +#include "lib/simplehash.h" > > > > I'm really not sure it helps increase readability. For instance, for > > me it's readable if SH_DEFINE and SH_DECLARE come to the last before > > #include since it's more obvious whether we want to declare, define or > > both. Other simplehash.h users also do so. > > > > OK. > > > > 5. > > > + * > > > + * If 'indexed' is true, we create a hash table to track of each node's > > > + * index in the heap, enabling to perform some operations such as > > > removing > > > + * the node from the heap. > > > */ > > > binaryheap * > > > -binaryheap_allocate(int capacity, binaryheap_comparator compare, void > > > *arg) > > > +binaryheap_allocate(int capacity, binaryheap_comparator compare, > > > + bool indexed, void *arg) > > > > > > BEFORE > > > ... enabling to perform some operations such as removing the node from > > > the heap. > > > > > > SUGGESTION > > > ... to help make operations such as removing nodes more efficient. > > > > > > > But these operations literally require the indexed binary heap as we > > have an assertion: > > > > void > > binaryheap_remove_node_ptr(binaryheap *heap, bh_node_type d) > > { > > bh_nodeidx_entry *ent; > > > > Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property); > > Assert(heap->bh_indexed); > > > > I didn’t quite understand -- the operations mentioned are "operations > such as removing the node", but binaryheap_remove_node() also removes > a node from the heap. So I still felt the comment wording of the patch > is not quite correct. Now I understand your point. That's a valid point. > > Now, if the removal of a node from an indexed heap can *only* be done > using binaryheap_remove_node_ptr() then: > - the other removal functions (binaryheap_remove_*) probably need some > comments to make sure nobody is tempted to call them directly for an > indexed heap. > - maybe some refactoring and assertions are needed to ensure those > *cannot* be called directly for an indexed heap. > If the 'index' is true, the caller can not only use the existing functions but also newly added functions such as binaryheap_remove_node_ptr() and binaryheap_update_up() etc. How about something like below? * If 'indexed' is true, we create a hash table to track each node's * index in the heap, enabling to perform some operations such as * binaryheap_remove_node_ptr() etc. > > And, here are some review comments for v8-0002. > > == > 1. delete_nodeidx > > +/* > + * Remove the node's index from the hash table if the heap is indexed. > + */ > +static bool > +delete_nodeidx(binaryheap *heap, bh_node_type node) > +{ > + if (!binaryheap_indexed(heap)) > + return false; > + > + return bh_nodeidx_delete(heap->bh_nodeidx, node); > +} > > 1a. > In v8 this function was changed to now return bool, so, I think the > function comment should explain the meaning of that return value. > > ~ > > 1b. > I felt the function body is better expressed positively: "If this then > do that", instead of "If not this then do nothing otherwise do that" > > SUGGESTION > if (binaryheap_indexed(heap)) > return bh_nodeidx_delete(heap->bh_nodeidx, node); > > return false; > > ~~~ > > 2. > +static void > +replace_node(binaryheap *heap, int index, bh_node_type new_node) > +{ > + bool found PG_USED_FOR_ASSERTS_ONLY; > + > + /* Quick return if not necessary to move */ > + if (heap->bh_nodes[index] == new_node) > + return; > + > + /* > + * Remove overwritten node's index. The overwritten node's position must > + * have been tracked, if enabled. > + */ > + found = delete_nodeidx(heap, heap->bh_nodes[index]); > + Assert(!binaryheap_indexed(heap) || found); > + > + /* > + * Replace it with the given new node. This node's position must also be > + * tracked as we assume to replace the node by the existing node. > + */ > + found = set_node(heap, new_node, index); > + Assert(!binaryheap_indexed(heap) || found); > +} > > 2a. >
Re: [PROPOSAL] Skip test citext_utf8 on Windows
Michael Paquier писал(а) 2024-03-12 06:24: On Mon, Mar 11, 2024 at 03:21:11PM +0700, Oleg Tselebrovskiy wrote: The proposed patch for skipping test is attached Your attached patch seems to be in binary format. -- Michael Right, I had it saved in not-UTF-8 encoding. Kind of ironic Here's a fixed versiondiff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out index 5d988dcd485..6c4069f9469 100644 --- a/contrib/citext/expected/citext_utf8.out +++ b/contrib/citext/expected/citext_utf8.out @@ -10,7 +10,8 @@ SELECT getdatabaseencoding() <> 'UTF8' OR (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i' FROM pg_database -WHERE datname=current_database()) +WHERE datname=current_database()) OR + (version() ~ 'windows' OR version() ~ 'Visual C\+\+' OR version() ~ 'mingw32') AS skip_test \gset \if :skip_test \quit diff --git a/contrib/citext/expected/citext_utf8_1.out b/contrib/citext/expected/citext_utf8_1.out index 7065a5da190..d4472b1c36a 100644 --- a/contrib/citext/expected/citext_utf8_1.out +++ b/contrib/citext/expected/citext_utf8_1.out @@ -10,7 +10,8 @@ SELECT getdatabaseencoding() <> 'UTF8' OR (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i' FROM pg_database -WHERE datname=current_database()) +WHERE datname=current_database()) OR + (version() ~ 'windows' OR version() ~ 'Visual C\+\+' OR version() ~ 'mingw32') AS skip_test \gset \if :skip_test \quit diff --git a/contrib/citext/sql/citext_utf8.sql b/contrib/citext/sql/citext_utf8.sql index 34b232d64e2..53775cdcd35 100644 --- a/contrib/citext/sql/citext_utf8.sql +++ b/contrib/citext/sql/citext_utf8.sql @@ -11,7 +11,8 @@ SELECT getdatabaseencoding() <> 'UTF8' OR (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i' FROM pg_database -WHERE datname=current_database()) +WHERE datname=current_database()) OR + (version() ~ 'windows' OR version() ~ 'Visual C\+\+' OR version() ~ 'mingw32') AS skip_test \gset \if :skip_test \quit
Re: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Tue, Mar 12, 2024 at 2:59 PM vignesh C wrote: > > Thanks, I have created the following Commitfest entry for this: > https://commitfest.postgresql.org/47/4816/ > > Regards, > Vignesh > Thanks for the patch, I have verified that the fix works well by following the steps mentioned to reproduce the problem. Reviewing the patch, it seems good and is well documented. Just one minor comment I had was probably to change the name of the variable table_states_valid to table_states_validity. The current name made sense when it was a bool, but now that it is a tri-state enum, it doesn't fit well. regards, Ajin Cherian Fujitsu Australia
Re: [PATCH] LockAcquireExtended improvement
Hello Robert, On 2024/3/8 1:02, Robert Haas wrote: > > But instead of just complaining, I decided to try writing a version of > the patch that seemed acceptable to me. Here it is. I took a different > approach than you. Instead of splitting up ProcSleep(), I just passed > down the dontWait flag to WaitOnLock() and ProcSleep(). In > LockAcquireExtended(), I moved the existing code that handles giving > up in the don't-wait case from before the call to ProcSleep() to > afterward. As far as I can see, the major way this could be wrong is > if calling ProcSleep() with dontWait = true and having it fail to > acquire the lock changes the state in some way that makes the cleanup > code that I moved incorrect. I don't *think* that's the case, but I > might be wrong. > > What do you think of this version? Your version changes less code than mine by pushing the nowait flag down into ProcSleep(). This looks fine in general, except for a little advice, which I don't think there is necessary to add 'waiting' suffix to the process name in function WaitOnLock with dontwait being true, as follows: --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -1801,8 +1801,12 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait) LOCK_PRINT("WaitOnLock: sleeping on lock", locallock->lock, locallock->tag.mode); - /* adjust the process title to indicate that it's waiting */ - set_ps_display_suffix("waiting"); + if (!dontWait) + { + /* adjust the process title to indicate that it's waiting */ + set_ps_display_suffix("waiting"); + } + awaitedLock = locallock; awaitedOwner = owner; @@ -1855,9 +1859,12 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait) { /* In this path, awaitedLock remains set until LockErrorCleanup */ - /* reset ps display to remove the suffix */ - set_ps_display_remove_suffix(); - + if (!dontWait) + { + /* reset ps display to remove the suffix */ + set_ps_display_remove_suffix(); + } + /* and propagate the error */ PG_RE_THROW(); } @@ -1865,8 +1872,11 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait) awaitedLock = NULL; - /* reset ps display to remove the suffix */ - set_ps_display_remove_suffix(); + if (!dontWait) + { + /* reset ps display to remove the suffix */ + set_ps_display_remove_suffix(); + } LOCK_PRINT("WaitOnLock: wakeup on lock", locallock->lock, locallock->tag.mode); -- Jingxian Li
Re: [PROPOSAL] Skip test citext_utf8 on Windows
On Tue, Mar 12, 2024 at 2:56 PM Andrew Dunstan wrote: > On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote: > > Greetings, everyone! > > > > While running "installchecks" on databases with UTF-8 encoding the test > > citext_utf8 fails because of Turkish dotted I like this: > > > > SELECT 'i'::citext = 'İ'::citext AS t; > > t > > --- > > - t > > + f > > (1 row) > > > > I tried to replicate the test's results by hand and with any collation > > that I tried (including --locale="Turkish") this test failed > > > > Also an interesing result of my tesing. If you initialize you DB > > with -E utf-8 --locale="Turkish" and then run select LOWER('İ'); > > the output will be this: > > lower > > --- > > İ > > (1 row) > > > > Which I find strange since lower() uses collation that was passed > > (default in this case but still) > > Wouldn't we be better off finding a Windows fix for this, instead of > sweeping it under the rug? Given the sorry state of our Windows locale support, I've started wondering about deleting it and telling users to adopt our nascent built-in support or ICU[1]. This other thread [2] says the sorting is intransitive so I don't think it really meets our needs anyway. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJhV__g_TJ0jVqPbnTuqT%2B%2BM6KFv2wj%2B9AV-cABNCXN6Q%40mail.gmail.com#bc35c0b88962ff8c24c27aecc1bca72e [2] https://www.postgresql.org/message-id/flat/1407a2c0-062b-4e4c-b728-438fdff5cb07%40manitou-mail.org
Re: SQL:2011 application time
On Mon, Mar 11, 2024 at 3:46 PM Peter Eisentraut wrote: > > A few general comments on the tests: > > - In the INSERT commands, specify the column names explicitly. This > makes the tests easier to read (especially since the column order > between the PK and the FK table is sometimes different). > > - Let's try to make it so that the inserted literals match the values > shown in the various error messages, so it's easier to match them up. > So, change the int4range literals to half-open notation. And also maybe > change the date output format to ISO. > maybe just change the tsrange type to daterange, then the dot out file will be far less verbose. minor issues while reviewing v27, 0001 to 0004. transformFkeyGetPrimaryKey comments need to update, since bool pk_period also returned. +/* + * FindFKComparisonOperators - + * + * Gets the operators for pfeqopOut, ppeqopOut, and ffeqopOut. + * Sets old_check_ok if we can avoid re-validating the constraint. + * Sets old_pfeqop_item to the old pfeqop values. + */ +static void +FindFKComparisonOperators(Constraint *fkconstraint, + AlteredTableInfo *tab, + int i, + int16 *fkattnum, + bool *old_check_ok, + ListCell **old_pfeqop_item, + Oid pktype, Oid fktype, Oid opclass, + Oid *pfeqopOut, Oid *ppeqopOut, Oid *ffeqopOut) I think the above comments is `Sets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.`. + if (is_temporal) + { + if (!fkconstraint->fk_with_period) + ereport(ERROR, + (errcode(ERRCODE_INVALID_FOREIGN_KEY), + errmsg("foreign key uses PERIOD on the referenced table but not the referencing table"))); + } can be if (is_temporal && !fkconstraint->fk_with_period) ereport(ERROR, (errcode(ERRCODE_INVALID_FOREIGN_KEY), errmsg("foreign key uses PERIOD on the referenced table but not the referencing table"))); + + if (is_temporal) + { + if (!fkconstraint->pk_with_period) + /* Since we got pk_attrs, one should be a period. */ + ereport(ERROR, + (errcode(ERRCODE_INVALID_FOREIGN_KEY), + errmsg("foreign key uses PERIOD on the referencing table but not the referenced table"))); + } can be if (is_temporal && !fkconstraint->pk_with_period) /* Since we got pk_attrs, one should be a period. */ ereport(ERROR, (errcode(ERRCODE_INVALID_FOREIGN_KEY), errmsg("foreign key uses PERIOD on the referencing table but not the referenced table"))); refactor decompile_column_index_array seems unnecessary. Peter already mentioned it at [1], I have tried to fix it at [2]. @@ -12141,7 +12245,8 @@ transformFkeyGetPrimaryKey(Relation pkrel, Oid *indexOid, /* * Now build the list of PK attributes from the indkey definition (we - * assume a primary key cannot have expressional elements) + * assume a primary key cannot have expressional elements, unless it + * has a PERIOD) */ *attnamelist = NIL; for (i = 0; i < indexStruct->indnkeyatts; i++) @@ -12155,6 +12260,8 @@ transformFkeyGetPrimaryKey(Relation pkrel, Oid *indexOid, makeString(pstrdup(NameStr(*attnumAttName(pkrel, pkattno); } + *pk_period = (indexStruct->indisexclusion); I don't understand the "expression elements" in the comments, most of the tests case is like ` PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) ` + *pk_period = (indexStruct->indisexclusion); can be `+ *pk_period = indexStruct->indisexclusion;` [1] https://postgr.es/m/7be8724a-5c25-46d7-8325-1bd8be6fa...@eisentraut.org [2] https://postgr.es/m/CACJufxHVg65raNhG2zBwXgjrD6jqace4NZbePyMhP8-_Q=i...@mail.gmail.com
Re: SQL:2011 application time
+ + If the last column is marked with PERIOD, + it is treated in a special way. + While the non-PERIOD columns are treated normally + (and there must be at least one of them), + the PERIOD column is not compared for equality. + Instead the constraint is considered satisfied + if the referenced table has matching records + (based on the non-PERIOD parts of the key) + whose combined PERIOD values completely cover + the referencing record's. + In other words, the reference must have a referent for its entire duration. + Normally this column would be a range or multirange type, + although any type whose GiST opclass has a "contained by" operator + and a referenced_agg support function is allowed. + (See .) + In addition the referenced table must have a primary key + or unique constraint declared with WITHOUT PORTION. + typo "referenced_agg", in the gist-extensibility.html page is "referencedagg" WITHOUT PORTION should be WITHOUT OVERLAPS + While the non-PERIOD columns are treated normally + (and there must be at least one of them), + the PERIOD column is not compared for equality. the above sentence didn't say what is "normally"? maybe we can do the following: + While the non-PERIOD columns are treated + normally for equality + (and there must be at least one of them), + the PERIOD column is not compared for equality. + +Datum +my_range_agg_transfn(PG_FUNCTION_ARGS) +{ +MemoryContextaggContext; +Oid rngtypoid; +ArrayBuildState *state; + +if (!AggCheckCallContext(fcinfo, aggContext)) +elog(ERROR, "range_agg_transfn called in non-aggregate context"); + +rngtypoid = get_fn_expr_argtype(fcinfo-flinfo, 1); +if (!type_is_range(rngtypoid)) +elog(ERROR, "range_agg must be called with a range"); + +if (PG_ARGISNULL(0)) +state = initArrayResult(rngtypoid, aggContext, false); +else +state = (ArrayBuildState *) PG_GETARG_POINTER(0); + +/* skip NULLs */ +if (!PG_ARGISNULL(1)) +accumArrayResult(state, PG_GETARG_DATUM(1), false, rngtypoid, aggContext); + +PG_RETURN_POINTER(state); +} + +Datum +my_range_agg_finalfn(PG_FUNCTION_ARGS) +{ +MemoryContextaggContext; +Oid mltrngtypoid; +TypeCacheEntry *typcache; +ArrayBuildState *state; +int32range_count; +RangeType **ranges; +int i; + +if (!AggCheckCallContext(fcinfo, aggContext)) +elog(ERROR, "range_agg_finalfn called in non-aggregate context"); + +state = PG_ARGISNULL(0) ? NULL : (ArrayBuildState *) PG_GETARG_POINTER(0); +if (state == NULL) +/* This shouldn't be possible, but just in case */ +PG_RETURN_NULL(); + +/* Also return NULL if we had zero inputs, like other aggregates */ +range_count = state-nelems; +if (range_count == 0) +PG_RETURN_NULL(); + +mltrngtypoid = get_fn_expr_rettype(fcinfo-flinfo); +typcache = multirange_get_typcache(fcinfo, mltrngtypoid); + +ranges = palloc0(range_count * sizeof(RangeType *)); +for (i = 0; i range_count; i++) +ranges[i] = DatumGetRangeTypeP(state-dvalues[i]); + +PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache-rngtype, range_count, ranges)); +} my_range_agg_transfn error message is inconsistent? `elog(ERROR, "range_agg_transfn called in non-aggregate context");` `elog(ERROR, "range_agg must be called with a range");` maybe just `my_range_agg_transfn`, instead of mention {range_agg_transfn|range_agg} similarly my_range_agg_finalfn error is also inconsistent. my_range_agg_finalfn need `type_is_multirange(mltrngtypoid)`?
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Mar 11, 2024 at 5:13 PM Masahiko Sawada wrote: > > In the latest (v69) patch: > > - squashed v68-0005 and v68-0006 patches. > - removed most of the changes in v68-0007 patch. > - addressed above review comments in v69-0002 patch. > - v69-0003, 0004, and 0005 are miscellaneous updates. Since the v69 conflicts with the current HEAD, I've rebased them. In addition, v70-0008 is the new patch, which cleans up the vacuum integration patch. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com v70-ART.tar.gz Description: GNU Zip compressed data
Re: [PROPOSAL] Skip test citext_utf8 on Windows
On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote: Greetings, everyone! While running "installchecks" on databases with UTF-8 encoding the test citext_utf8 fails because of Turkish dotted I like this: SELECT 'i'::citext = 'İ'::citext AS t; t --- - t + f (1 row) I tried to replicate the test's results by hand and with any collation that I tried (including --locale="Turkish") this test failed Also an interesing result of my tesing. If you initialize you DB with -E utf-8 --locale="Turkish" and then run select LOWER('İ'); the output will be this: lower --- İ (1 row) Which I find strange since lower() uses collation that was passed (default in this case but still) Wouldn't we be better off finding a Windows fix for this, instead of sweeping it under the rug? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Reports on obsolete Postgres versions
On Mon, Mar 11, 2024 at 05:17:13PM -0400, Bruce Momjian wrote: > On Mon, Mar 11, 2024 at 04:12:04PM -0500, Nathan Bossart wrote: >> I've read that the use of the term "minor release" can be confusing. While >> the versioning page clearly describes what is eligible for a minor release, >> not everyone reads it, so I suspect that many folks think there are new >> features, etc. in minor releases. I think a "minor release" of Postgres is >> more similar to what other projects would call a "patch version." > > Well, we do say: > > While upgrading will always contain some level of risk, PostgreSQL > minor releases fix only frequently-encountered bugs, security issues, > and data corruption problems to reduce the risk associated with > upgrading. For minor releases, the community considers not upgrading to > be riskier than upgrading. > > but that is far down the page. Do we need to improve this? I think making that note more visible would certainly be an improvement. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Popcount optimization using AVX512
On Mon, Mar 11, 2024 at 09:59:53PM +, Amonson, Paul D wrote: > I will be splitting the request into 2 patches. I am attaching the first > patch (refactoring only) and I updated the commitfest entry to match this > patch. I have a question however: > Do I need to wait for the refactor patch to be merged before I post the > AVX portion of this feature in this thread? Thanks. There's no need to wait to post the AVX portion. I recommend using "git format-patch" to construct the patch set for the lists. >> Apologies for harping on this, but I'm still not seeing the need for these >> SIZEOF_VOID_P changes. While it's unlikely that this makes any practical >> difference, I see no reason to more strictly check SIZEOF_VOID_P here. > > I got rid of the second occurrence as I agree it is not needed but unless > you see something I don't how to know which function to call between a > 32-bit and 64-bit architecture? Maybe I am missing something obvious? > What exactly do you suggest here? I am happy to always call either > pg_popcount32() or pg_popcount64() with the understanding that it may not > be optimal, but I do need to know which to use. I'm recommending that we don't change any of the code in the pg_popcount() function (which is renamed to pg_popcount_slow() in your v6 patch). If pointers are 8 or more bytes, we'll try to process the buffer in 64-bit chunks. Else, we'll try to process it in 32-bit chunks. Any remaining bytes will be processed one-by-one. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Improve readability by using designated initializers when possible
On Fri, 08 Mar 2024 at 13:21, Michael Paquier wrote: > On Wed, Mar 06, 2024 at 08:24:09AM +0800, Japin Li wrote: >> On Wed, 06 Mar 2024 at 01:53, Jelte Fennema-Nio wrote: >>> I think if you remove the EEO_CASE(EEOP_LAST) block the warning should >>> go away. That block is clearly marked as unreachable, so it doesn't >>> really serve a purpose. >> >> Thanks! Fixed as you suggested. Please see v3 patch. > > Hmm. I am not sure if this one is a good idea. Sorry for the late reply! > This makes the code a > bit more complicated to grasp under EEO_USE_COMPUTED_GOTO with the > reverse dispatch table, to say the least. I'm not get your mind. Could you explain in more detail?
Re: Using the %m printf format more
On Mon, Mar 11, 2024 at 11:19:16AM +, Dagfinn Ilmari Mannsåker wrote: > On closer look, fprintf() is actually the only errno-clobbering function > it calls, I was just hedging my bets in that statement. This makes the whole simpler, so I'd be OK with that. I am wondering if others have opinions to offer about that. I've applied v2-0001 for now, as it is worth on its own and it shaves a bit of code. -- Michael signature.asc Description: PGP signature
Re: Reports on obsolete Postgres versions
On Mon, Mar 11, 2024 at 4:38 PM Bruce Momjian wrote: > https://www.postgresql.org/support/versioning/ > > This web page should correct the idea that "upgrades are more risky than > staying with existing versions". Is there more we can do? Should we have > a more consistent response for such reporters? > It could be helpful to remove this sentence: "Upgrading to a minor release does not normally require a dump and restore" While technically true, "not normally" is quite the understatement, as the true answer is "never" or at least "not in the last few decades". I have a hard time even imagining a scenario that would require a minor revision to do a dump and restore - surely, that in itself would warrant a major release? > It would be a crazy idea to report something in the logs if a major > version is run after a certain date, since we know the date when major > versions will become unsupported. > Could indeed be useful to spit something out at startup. Heck, even minor versions are fairly regular now. Sure would be nice to be able to point a client at the database and say "See? Even Postgres itself thinks you should upgrade from 11.3!!" (totally made up example, not at all related to an actual production system /sarcasm) Cheers, Greg
Re: Patch: Add parse_type Function
On 2024-03-09 02:42 +0100, David E. Wheeler wrote: > On Mar 7, 2024, at 23:39, Erik Wienhold wrote: > > > I think you need to swap the examples. The text mentions the error case > > first and the NULL case second. > > 臘♂️ > > Thanks, fixed in the attached patch. Thanks, LGTM. -- Erik
Re: Sequence Access Methods, round two
On Tue, Feb 27, 2024 at 10:27:13AM +0900, Michael Paquier wrote: > On Thu, Feb 22, 2024 at 05:36:00PM +0100, Tomas Vondra wrote: >> 0001 >> -- >> >> I think this bit in pg_proc.dat is not quite right: >> >> proallargtypes => '{regclass,bool,int8}', proargmodes => '{i,o,o}', >> proargnames => '{seqname,is_called,last_value}', >> >> the first argument should not be "seqname" but rather "seqid". > > Ah, right. There are not many system functions that use regclass as > arguments, but the existing ones refer more to IDs, not names. This patch set is not going to be merged for this release, so I am going to move it to the next commit fest to continue the discussion in v18~. Anyway, there is one piece of this patch set that I think has a lot of value outside of the discussion with access methods, which is to redesign pg_sequence_last_value so as it returns a (last_value, is_called) tuple rather than a (last_value). This has the benefit of switching pg_dump to use this function rather than relying on a scan of the heap table used by a sequence to retrieve the state of a sequence dumped. This is the main diff: -appendPQExpBuffer(query, - "SELECT last_value, is_called FROM %s", - fmtQualifiedDumpable(tbinfo)); +/* + * In versions 17 and up, pg_sequence_last_value() has been switched to + * return a tuple with last_value and is_called. + */ +if (fout->remoteVersion >= 17) +appendPQExpBuffer(query, + "SELECT last_value, is_called " + "FROM pg_sequence_last_value('%s')", + fmtQualifiedDumpable(tbinfo)); +else +appendPQExpBuffer(query, + "SELECT last_value, is_called FROM %s", + fmtQualifiedDumpable(tbinfo)); Are there any objections to that? pg_sequence_last_value() is something that we've only been relying on internally for the catalog pg_sequences. -- Michael signature.asc Description: PGP signature
Re: Add bump memory context type and use it for tuplesorts
On Mon, 11 Mar 2024 at 22:09, John Naylor wrote: > I ran the test function, but using 256kB and 3MB for the reset > frequency, and with 8,16,24,32 byte sizes (patched against a commit > after the recent hot/cold path separation). Images attached. I also > get a decent speedup with the bump context, but not quite as dramatic > as on your machine. It's worth noting that slab is the slowest for me. > This is an Intel i7-10750H. Thanks for trying this out. I didn't check if the performance was susceptible to the memory size before the reset. It certainly would be once the allocation crosses some critical threshold of CPU cache size, but probably it will also be to some extent regarding the number of actual mallocs that are required underneath. I see there's some discussion of bump in [1]. Do you still have a valid use case for bump for performance/memory usage reasons? The reason I ask is due to what Tom mentioned in [2] ("It's not apparent to me that the "bump context" idea is valuable enough to foreclose ever adding more context types"). So, I'm just probing to find other possible use cases that reinforce the usefulness of bump. It would be interesting to try it in a few places to see what performance gains could be had. I've not done much scouting around the codebase for other uses other than non-bounded tuplesorts. David [1] https://postgr.es/m/canwcazbxxhysytrpyz-wzbdtvrpwoete7rqm1g_+4cb8z6k...@mail.gmail.com [2] https://postgr.es/m/3537323.1708125...@sss.pgh.pa.us
Re: Add bump memory context type and use it for tuplesorts
On Tue, 12 Mar 2024 at 12:25, Tomas Vondra wrote: > (b) slab is considerably slower It would be interesting to modify SlabReset() to, instead of free()ing the blocks, push the first SLAB_MAXIMUM_EMPTY_BLOCKS of them onto the emptyblocks list. That might give us an idea of how much overhead comes from malloc/free. Having something like this as an option when creating a context might be a good idea. generation.c now keeps 1 "freeblock" which currently does not persist during context resets. Some memory context usages might suit having an option like this. Maybe something like the executor's per-tuple context, which perhaps (c|sh)ould be a generation context... However, saying that, I see you measure it to be slightly slower than aset. David
Re: [PROPOSAL] Skip test citext_utf8 on Windows
On Mon, Mar 11, 2024 at 03:21:11PM +0700, Oleg Tselebrovskiy wrote: > The proposed patch for skipping test is attached Your attached patch seems to be in binary format. -- Michael signature.asc Description: PGP signature
Re: Infinite loop in XLogPageRead() on standby
On Mon, Mar 11, 2024 at 04:43:32PM +0900, Kyotaro Horiguchi wrote: > At Wed, 6 Mar 2024 11:34:29 +0100, Alexander Kukushkin > wrote in >> Thank you for spending your time on it! > > You're welcome, but I aplogize for the delay in the work.. Thanks for spending time on this. Everybody is busy with the last commit fest, and the next minor release set is planned for May so there should still be time even after the feature freeze: https://www.postgresql.org/developer/roadmap/ I should be able to come back to this thread around the beginning of April. If you are able to do some progress in-between, that would be surely helpful. Thanks, -- Michael signature.asc Description: PGP signature
Re: Combine Prune and Freeze records emitted by vacuum
On 09/03/2024 22:41, Melanie Plageman wrote: On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas wrote: Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly? Okay, so I thought a lot about this, and I don't understand how GlobalVisTestIsRemovableXid() would not handle FrozenTransactionId correctly. vacrel->cutoffs.OldestXmin is computed initially from GetOldestNonRemovableTransactionId() which uses ComputeXidHorizons(). GlobalVisState is updated by ComputeXidHorizons() (when it is updated). I do see that the comment above GlobalVisTestIsRemovableXid() says * It is crucial that this only gets called for xids from a source that * protects against xid wraparounds (e.g. from a table and thus protected by * relfrozenxid). and then in * Convert 32 bit argument to FullTransactionId. We can do so safely * because we know the xid has to, at the very least, be between * [oldestXid, nextXid), i.e. within 2 billion of xid. I'm not sure what oldestXid is here. It's true that I don't see GlobalVisTestIsRemovableXid() being called anywhere else with an xmin as an input. I think that hints that it is not safe with FrozenTransactionId. But I don't see what could go wrong. Maybe it has something to do with converting it to a FullTransactionId? FullTransactionIdFromU64(U64FromFullTransactionId(rel) + (int32) (xid - rel_xid)); Sorry, I couldn't quite figure it out :( I just tested it. No, GlobalVisTestIsRemovableXid does not work for FrozenTransactionId. I just tested it with state->definitely_needed == {0, 40} and xid == FrozenTransactionid, and it incorrectly returned 'false'. It treats FrozenTransactionId as if was a regular xid '2'. The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for the case that there's no pruning, just freezing. The record format (xl_heap_prune) looks pretty complex as it is, I think it could be made both more compact and more clear with some refactoring. I'm happy to change up xl_heap_prune format. In its current form, according to pahole, it has no holes and just 3 bytes of padding at the end. One way we could make it smaller is by moving the isCatalogRel member to directly after snapshotConflictHorizon and then adding a flags field and defining flags to indicate whether or not other members exist at all. We could set bits for HAS_FREEZE_PLANS, HAS_REDIRECTED, HAS_UNUSED, HAS_DEAD. Then I would remove the non-optional uint16 nredirected, nunused, nplans, ndead and just put the number of redirected/unused/etc at the beginning of the arrays containing the offsets. Sounds good. Then I could write various macros for accessing them. That would make it smaller, but it definitely wouldn't make it less complex (IMO). I don't know, it might turn out not that complex. If you define the formats of each of those "sub-record types" as explicit structs, per attached sketch, you won't need so many macros. Some care is still needed with alignment though. -- Heikki Linnakangas Neon (https://neon.tech)/* * XXX: This one combined record type replaces existing * XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM, and XLOG_HEAP2_FREEZE_PAGE record types */ /* * This is what we need to know about page pruning and freezing, both during * VACUUM and during opportunistic pruning. * * If XLPH_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS or XLHP_HAS_FREEZE_PLANS is * set, acquires a full cleanup lock. Otherwise an ordinary exclusive lock is * enough (VACUUM's second heap pass generates such records). * * The data for block reference 0 contains "sub-records" depending on which * of the XLPH_HAS_* flags are set. See xlhp_* struct definitions below. */ typedef struct xl_heap_prune { TransactionId snapshotConflictHorizon; uint8 flags; } xl_heap_prune; #define XLHP_IS_CATALOG_REL 0x01 /* to handle recovery conflict during logical * decoding on standby */ #define XLHP_HAS_REDIRECTIONS 0x02 #define XLHP_HAS_DEAD_ITEMS 0x04 #define XLHP_HAS_NOW_UNUSED_ITEMS 0x08 #define XLHP_HAS_FREEZE_PLANS 0x10 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8)) /* * Sub-record contained in block reference 0 of a prune record if * XLHP_HAS_REDIRECTIONS is set */ typedef struct xl_heap_prune_redirections { uint16 nitems; struct { OffsetNumber from; OffsetNumber to; } items[FLEXIBLE_ARRAY_MEMBER]; } xl_heap_prune_redirections; /* * Sub-record contained in block reference 0 of a prune record if * XLHP_HAS_DEAD_ITEMS is set */ typedef struct xlhp_dead_items { uint16 nitems; OffsetNumber items[FLEXIBLE_ARRAY_MEMBER]; } xlhp_dead_items; /* *
RE: Popcount optimization using AVX512
> -Original Message- > From: Nathan Bossart > Sent: Thursday, March 7, 2024 1:36 PM > Subject: Re: Popcount optimization using AVX512 I will be splitting the request into 2 patches. I am attaching the first patch (refactoring only) and I updated the commitfest entry to match this patch. I have a question however: Do I need to wait for the refactor patch to be merged before I post the AVX portion of this feature in this thread? > > + PGAC_AVX512_POPCNT_INTRINSICS([-mavx512vpopcntdq -mavx512f]) > > I'm curious why we need both -mavx512vpopcntdq and -mavx512f. On my > machine, -mavx512vpopcntdq alone is enough to pass this test, so if there are > other instructions required that need -mavx512f, then we might need to > expand the test. First, nice catch on the required flags to build! When I changed my algorithm, dependence on the -mavx512f flag was no longer needed, In the second patch (AVX specific) I will fix this. > I still think it's worth breaking this change into at least 2 patches. In > particular, > I think there's an opportunity to do the refactoring into pg_popcnt_choose.c > and pg_popcnt_x86_64_accel.c prior to adding the AVX512 stuff. These > changes are likely straightforward, and getting them out of the way early > would make it easier to focus on the more interesting changes. IMHO there > are a lot of moving parts in this patch. As stated above I am doing this in 2 patches. :) > > +#undef HAVE__GET_CPUID_COUNT > > + > > +/* Define to 1 if you have immintrin. */ #undef HAVE__IMMINTRIN > > Is this missing HAVE__CPUIDEX? Yes I missed it, I will include in the second patch (AVX specific) of the 2 patches. > > uint64 > > -pg_popcount(const char *buf, int bytes) > > +pg_popcount_slow(const char *buf, int bytes) > > { > > uint64 popcnt = 0; > > > > -#if SIZEOF_VOID_P >= 8 > > +#if SIZEOF_VOID_P == 8 > > /* Process in 64-bit chunks if the buffer is aligned. */ > > if (buf == (const char *) TYPEALIGN(8, buf)) > > { > > @@ -311,7 +224,7 @@ pg_popcount(const char *buf, int bytes) > > > > buf = (const char *) words; > > } > > -#else > > +#elif SIZEOF_VOID_P == 4 > > /* Process in 32-bit chunks if the buffer is aligned. */ > > if (buf == (const char *) TYPEALIGN(4, buf)) > > { > > Apologies for harping on this, but I'm still not seeing the need for these > SIZEOF_VOID_P changes. While it's unlikely that this makes any practical > difference, I see no reason to more strictly check SIZEOF_VOID_P here. I got rid of the second occurrence as I agree it is not needed but unless you see something I don't how to know which function to call between a 32-bit and 64-bit architecture? Maybe I am missing something obvious? What exactly do you suggest here? I am happy to always call either pg_popcount32() or pg_popcount64() with the understanding that it may not be optimal, but I do need to know which to use. > > +/* Process any remaining bytes */ > > +while (bytes--) > > +popcnt += pg_number_of_ones[(unsigned char) *buf++]; > > +return popcnt; > > +#else > > +return pg_popcount_slow(buf, bytes); > > +#endif /* USE_AVX512_CODE */ > > nitpick: Could we call pg_popcount_slow() in a common section for these > "remaining bytes?" Agreed, will fix in the second patch as well. > > +#if defined(_MSC_VER) > > +pg_popcount_indirect = pg_popcount512_fast; #else > > +pg_popcount = pg_popcount512_fast; #endif > These _MSC_VER sections are interesting. I'm assuming this is the > workaround for the MSVC linking issue you mentioned above. I haven't > looked too closely, but I wonder if the CRC32C code (see > src/include/port/pg_crc32c.h) is doing something different to avoid this > issue. Using the latest master branch, I see what the needed changes are, I will implement using PGDLLIMPORT macro in the second patch. > Upthread, Alvaro suggested a benchmark [0] that might be useful. I scanned > through this thread and didn't see any recent benchmark results for the latest > form of the patch. I think it's worth verifying that we are still seeing the > expected improvements. I will get new benchmarks using the same process I used before (from Akash) so I get apples to apples. These are pending completion of the second patch which is still in progress. Just a reminder, I asked questions above about 1) multi-part dependent patches and, 2) What specifically to do about the SIZE_VOID_P checks. :) Thanks, Paul v7-0001-Refactor-POPCNT.patch Description: v7-0001-Refactor-POPCNT.patch
Re: Reports on obsolete Postgres versions
On Mon, Mar 11, 2024 at 04:12:04PM -0500, Nathan Bossart wrote: > On Mon, Mar 11, 2024 at 04:37:49PM -0400, Bruce Momjian wrote: > > https://www.postgresql.org/support/versioning/ > > > > This web page should correct the idea that "upgrades are more risky than > > staying with existing versions". Is there more we can do? Should we > > have a more consistent response for such reporters? > > I've read that the use of the term "minor release" can be confusing. While > the versioning page clearly describes what is eligible for a minor release, > not everyone reads it, so I suspect that many folks think there are new > features, etc. in minor releases. I think a "minor release" of Postgres is > more similar to what other projects would call a "patch version." Well, we do say: While upgrading will always contain some level of risk, PostgreSQL minor releases fix only frequently-encountered bugs, security issues, and data corruption problems to reduce the risk associated with upgrading. For minor releases, the community considers not upgrading to be riskier than upgrading. but that is far down the page. Do we need to improve this? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Reports on obsolete Postgres versions
On Mon, Mar 11, 2024 at 04:37:49PM -0400, Bruce Momjian wrote: > https://www.postgresql.org/support/versioning/ > > This web page should correct the idea that "upgrades are more risky than > staying with existing versions". Is there more we can do? Should we > have a more consistent response for such reporters? I've read that the use of the term "minor release" can be confusing. While the versioning page clearly describes what is eligible for a minor release, not everyone reads it, so I suspect that many folks think there are new features, etc. in minor releases. I think a "minor release" of Postgres is more similar to what other projects would call a "patch version." -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Confine vacuum skip logic to lazy_scan_skip
On Sun, Mar 10, 2024 at 11:01 PM Thomas Munro wrote: > > On Mon, Mar 11, 2024 at 5:31 AM Melanie Plageman > wrote: > > I have investigated the interaction between > > maintenance_io_concurrency, streaming reads, and the vacuum buffer > > access strategy (BAS_VACUUM). > > > > The streaming read API limits max_pinned_buffers to a pinned buffer > > multiplier (currently 4) * maintenance_io_concurrency buffers with the > > goal of constructing reads of at least MAX_BUFFERS_PER_TRANSFER size. > > > > Since the BAS_VACUUM ring buffer is size 256 kB or 32 buffers with > > default block size, that means that for a fully uncached vacuum in > > which all blocks must be vacuumed and will be dirtied, you'd have to > > set maintenance_io_concurrency at 8 or lower to see the same number of > > reuses (and shared buffer consumption) as master. > > > > Given that we allow users to specify BUFFER_USAGE_LIMIT to vacuum, it > > seems like we should force max_pinned_buffers to a value that > > guarantees the expected shared buffer usage by vacuum. But that means > > that maintenance_io_concurrency does not have a predictable impact on > > streaming read vacuum. > > > > What is the right thing to do here? > > > > At the least, the default size of the BAS_VACUUM ring buffer should be > > BLCKSZ * pinned_buffer_multiplier * default maintenance_io_concurrency > > (probably rounded up to the next power of two) bytes. > > Hmm, does the v6 look-ahead distance control algorithm mitigate that > problem? Using the ABC classifications from the streaming read > thread, I think for A it should now pin only 1, for B 16 and for C, it > depends on the size of the random 'chunks': if you have a lot of size > 1 random reads then it shouldn't go above 10 because of (default) > maintenance_io_concurrency. The only way to get up to very high > numbers would be to have a lot of random chunks triggering behaviour > C, but each made up of long runs of misses. For example one can > contrive a BHS query that happens to read pages 0-15 then 20-35 then > 40-55 etc etc so that we want to get lots of wide I/Os running > concurrently. Unless vacuum manages to do something like that, it > shouldn't be able to exceed 32 buffers very easily. > > I suspect that if we taught streaming_read.c to ask the > BufferAccessStrategy (if one is passed in) what its recommended pin > limit is (strategy->nbuffers?), we could just clamp > max_pinned_buffers, and it would be hard to find a workload where that > makes a difference, and we could think about more complicated logic > later. > > In other words, I think/hope your complaints about excessive pinning > from v5 WRT all-cached heap scans might have also already improved > this case by happy coincidence? I haven't tried it out though, I just > read your description of the problem... I've rebased the attached v10 over top of the changes to lazy_scan_heap() Heikki just committed and over the v6 streaming read patch set. I started testing them and see that you are right, we no longer pin too many buffers. However, the uncached example below is now slower with streaming read than on master -- it looks to be because it is doing twice as many WAL writes and syncs. I'm still investigating why that is. psql \ -c "create table small (a int) with (autovacuum_enabled=false, fillfactor=25);" \ -c "insert into small select generate_series(1,20) % 3;" \ -c "update small set a = 6 where a = 1;" pg_ctl stop # drop caches pg_ctl start psql -c "\timing on" -c "vacuum (verbose) small" - Melanie From 2b181d178f0cd55de45659540ec35536918a4c9d Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Mon, 11 Mar 2024 15:36:39 -0400 Subject: [PATCH v10 1/3] Streaming Read API --- src/backend/storage/Makefile | 2 +- src/backend/storage/aio/Makefile | 14 + src/backend/storage/aio/meson.build | 5 + src/backend/storage/aio/streaming_read.c | 648 +++ src/backend/storage/buffer/bufmgr.c | 641 +++--- src/backend/storage/buffer/localbuf.c| 14 +- src/backend/storage/meson.build | 1 + src/include/storage/bufmgr.h | 45 ++ src/include/storage/streaming_read.h | 52 ++ src/tools/pgindent/typedefs.list | 3 + 10 files changed, 1215 insertions(+), 210 deletions(-) create mode 100644 src/backend/storage/aio/Makefile create mode 100644 src/backend/storage/aio/meson.build create mode 100644 src/backend/storage/aio/streaming_read.c create mode 100644 src/include/storage/streaming_read.h diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile index 8376cdfca20..eec03f6f2b4 100644 --- a/src/backend/storage/Makefile +++ b/src/backend/storage/Makefile @@ -8,6 +8,6 @@ subdir = src/backend/storage top_builddir = ../../.. include $(top_builddir)/src/Makefile.global -SUBDIRS = buffer file freespace ipc large_object lmgr page smgr sync +SUBDIRS = aio buffer file freespace ipc large_object lmgr page
Re: Reducing output size of nodeToString
On Mon, 11 Mar 2024 at 14:19, Peter Eisentraut wrote: > > On 22.02.24 16:07, Matthias van de Meent wrote: > > On Thu, 22 Feb 2024 at 13:37, Matthias van de Meent > > wrote: > >> > >> On Mon, 19 Feb 2024 at 14:19, Matthias van de Meent > >> wrote: > >>> Attached the updated version of the patch on top of 5497daf3, which > >>> incorporates this last round of feedback. > >> > >> Now attached rebased on top of 93db6cbd to fix conflicts with fbc93b8b > >> and an issue in the previous patchset: I attached one too many v3-0001 > >> from a previous patch I worked on. > > > > ... and now with a fix for not overwriting newly deserialized location > > attributes with -1, which breaks test output for > > READ_WRITE_PARSE_PLAN_TREES installations. Again, no other significant > > changes since the patch of last Monday. > > * v7-0002-pg_node_tree-Don-t-store-query-text-locations-in-.patch > > This patch looks much more complicated than I was expecting. I had > suggested to model this after stringToNodeWithLocations(). This uses a > global variable to toggle the mode. Your patch creates a function > nodeToStringNoQLocs() -- why the different naming scheme? It doesn't just exclude .location fields, but also Query.len, a similar field which contains the length of the query's string. The name has been further refined to nodeToStringNoParseLocs() in the attached version, but feel free to replace the names in the patch to anything else you might want. > -- and passes > the flag down as an argument to all the output functions. I mean, in a > green field, avoiding global variables can be sensible, of course, but I > think in this limited scope here it would really be better to keep the > code for the two directions read and write the same. I'm a big fan of _not_ using magic global variables as passed context without resetting on subnormal exits... For GUCs their usage is understandable (and there is infrastructure to reset them, and you're not supposed to manually update them), but IMO here its usage should be a function-scoped variable or in a passed-by-reference context struct, not a file-local static. Regardless, attached is an adapted version with the file-local variable implementation. > Attached is a small patch that shows what I had in mind. (It doesn't > contain any callers, but your patch shows where all those would go.) Attached a revised version that does it like stringToNodeInternal's handling of restore_location_fields. > * v7-0003-gen_node_support.pl-Mark-location-fields-as-type-.patch > > This looks sensible, but maybe making Location a global type is a bit > much? Maybe something more specific like ParseLocation, or ParseLoc, to > keep it under 12 characters. I've gone with ParseLoc in the attached v8 patchset. Kind regards, Matthias van de Meent v8-0003-pg_node_tree-Don-t-store-query-text-locations-in-.patch Description: Binary data v8-0001-pg_node_tree-Omit-serialization-of-fields-with-de.patch Description: Binary data v8-0005-nodeToString-omit-serializing-NULL-datums-in-Cons.patch Description: Binary data v8-0004-gen_node_support.pl-Add-a-TypMod-type-for-signall.patch Description: Binary data v8-0002-gen_node_support.pl-Mark-location-fields-as-type-.patch Description: Binary data v8-0006-nodeToString-Apply-RLE-on-Bitmapset-and-numeric-L.patch Description: Binary data v8-0008-nodeToString-omit-serializing-0s-in-enum-typed-fi.patch Description: Binary data v8-0007-gen_node_support.pl-Optimize-serialization-of-fie.patch Description: Binary data
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
On Fri, Mar 08, 2024 at 04:03:22PM -0600, Nathan Bossart wrote: > On Fri, Mar 08, 2024 at 09:33:19AM +, Dean Rasheed wrote: >> I think this is good to go. > > Thanks. In v4, I've added a first draft of the commit messages, and I am > planning to commit this early next week. Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Confine vacuum skip logic to lazy_scan_skip
On Mon, Mar 11, 2024 at 2:47 PM Heikki Linnakangas wrote: > > > > Otherwise LGTM > > Ok, pushed! Thank you, this is much more understandable now! Cool, thanks!
Reports on obsolete Postgres versions
I am seeing an increasing number of bug/problem reports on obsolete Postgres versions, either not running a superseded minor version, or running an unsupported major version. What can we do to reduce such reports, or at least give a consistent response? It is very helpful that we have this web page, and I have made a habit of pointing reporters to that page since it has all the information they need: https://www.postgresql.org/support/versioning/ This web page should correct the idea that "upgrades are more risky than staying with existing versions". Is there more we can do? Should we have a more consistent response for such reporters? It would be a crazy idea to report something in the logs if a major version is run after a certain date, since we know the date when major versions will become unsupported. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Statistics Import and Export
> > > In which case we're failing nearly silently, yes, there is a null > returned, > > but we have no idea why there is a null returned. If I were using this > > function manually I'd want to know what I did wrong, what parameter I > > skipped, etc. > > I can see it both ways and don't feel super strongly about it ... I just > know that I've had some cases where we returned an ERROR or otherwise > were a bit noisy on NULL values getting passed into a function and it > was much more on the annoying side than on the helpful side; to the > point where we've gone back and pulled out ereport(ERROR) calls from > functions before because they were causing issues in otherwise pretty > reasonable queries (consider things like functions getting pushed down > to below WHERE clauses and such...). > I don't have strong feelings either. I think we should get more input on this. Regardless, it's easy to change...for now. > > Sure. Not a huge deal either way, was just pointing out the difference. > I do think it'd be good to match what ANALYZE does here, so checking if > the values in pg_class are different and only updating if they are, > while keeping the code for pg_statistic where it'll just always update. > I agree that mirroring ANALYZE wherever possible is the ideal. > > I like the symmetry of a consistent interface, but we've already got an > > asymmetry in that the pg_class update is done non-transactionally (like > > ANALYZE does). > > Don't know that I really consider that to be the same kind of thing when > it comes to talking about the interface as the other aspects we're > discussing ... > Fair. > > > One persistent problem is that there is no _safe equivalent to ARRAY_IN, > so > > that can always fail on us, though it should only do so if the string > > passed in wasn't a valid array input format, or the values in the array > > can't coerce to the attribute's basetype. > > That would happen before we even get to being called and there's not > much to do about it anyway. > Not sure I follow you here. the ARRAY_IN function calls happen once for every non-null stavaluesN parameter, and it's done inside the function because the result type could be the base type for a domain/array type, or could be the type itself. I suppose we could move that determination to the caller, but then we'd need to call get_base_element_type() inside a client, and that seems wrong if it's even possible. > > I should also point out that we've lost the ability to check if the > export > > values were of a type, and if the destination column is also of that > type. > > That's a non-issue in binary upgrades, but of course if a field changed > > from integers to text the histograms would now be highly misleading. > > Thoughts on adding a typname parameter that the function uses as a cheap > > validity check? > > Seems reasonable to me. > I'd like to hear what Tomas thinks about this, as he was the initial advocate for it. > > As for pg_dump, I'm currently leading toward the TOC entry having either > a > > series of commands: > > > > SELECT pg_set_relation_stats('foo.bar'::regclass, ...); > > pg_set_attribute_stats('foo.bar'::regclass, 'id'::name, ...); ... > > I'm guessing the above was intended to be SELECT ..; SELECT ..; > Yes. > > > Or one compound command > > > > SELECT pg_set_relation_stats(t.oid, ...) > > pg_set_attribute_stats(t.oid, 'id'::name, ...), > > pg_set_attribute_stats(t.oid, 'last_name'::name, ...), > > ... > > FROM (VALUES('foo.bar'::regclass)) AS t(oid); > > > > The second one has the feature that if any one attribute fails, then the > > whole update fails, except, of course, for the in-place update of > pg_class. > > This avoids having an explicit transaction block, but we could get that > > back by having restore wrap the list of commands in a transaction block > > (and adding the explicit lock commands) when it is safe to do so. > > Hm, I like this approach as it should essentially give us the > transaction block we had been talking about wanting but without needing > to explicitly do a begin/commit, which would add in some annoying > complications. This would hopefully also reduce the locking concern > mentioned previously, since we'd get the lock needed in the first > function call and then the others would be able to just see that we've > already got the lock pretty quickly. > True, we'd get the lock needed in the first function call, but wouldn't we also release that very lock before the subsequent call? Obviously we'd be shrinking the window in which another process could get in line and take a superior lock, and the universe of other processes that would even want a lock that blocks us is nil in the case of an upgrade, identical to existing behavior in the case of an FDW ANALYZE, and perfectly fine in the case of someone tinkering with stats. > > > Subject: [PATCH v8] Create pg_set_relation_stats, pg_set_attribute_stats. > > [...] > > > +Datum
Re: remaining sql/json patches
Thanka Alvaro. It works fine when quotes are used around the column name. On Mon, Mar 11, 2024 at 9:04 PM Alvaro Herrera wrote: > On 2024-Mar-11, Shruthi Gowda wrote: > > > *CASE 2:* > > -- > > SELECT * FROM JSON_TABLE(jsonb '{ > > "id" : 901, > > "age" : 30, > > "*FULL_NAME*" : "KATE DANIEL"}', > > '$' > > COLUMNS( > > FULL_NAME varchar(20), > > ID int, > > AGE int > > ) > >) as t; > > I think this is expected: when you use FULL_NAME as a SQL identifier, it > is down-cased, so it no longer matches the uppercase identifier in the > JSON data. You'd have to do it like this: > > SELECT * FROM JSON_TABLE(jsonb '{ > "id" : 901, > "age" : 30, > "*FULL_NAME*" : "KATE DANIEL"}', > '$' > COLUMNS( > "FULL_NAME" varchar(20), > ID int, > AGE int > ) >) as t; > > so that the SQL identifier is not downcased. > > -- > Álvaro Herrera PostgreSQL Developer — > https://www.EnterpriseDB.com/ >
Re: Statistics Import and Export
Greetings, * Corey Huinker (corey.huin...@gmail.com) wrote: > > Having thought about it a bit more, I generally like the idea of being > > able to just update one stat instead of having to update all of them at > > once (and therefore having to go look up what the other values currently > > are...). That said, per below, perhaps making it strict is the better > > plan. > > v8 has it as strict. Ok. > > > > Also, in some cases we allow the function to be called with a > > > > NULL but then make it a no-op rather than throwing an ERROR (eg, if the > > > > OID ends up being NULL). > > > > > > Thoughts on it emitting a WARN or NOTICE before returning false? > > > > Eh, I don't think so? > > > > Where this is coming from is that we can often end up with functions > > like these being called inside of larger queries, and having them spit > > out WARN or NOTICE will just make them noisy. > > > > That leads to my general feeling of just returning NULL if called with a > > NULL OID, as we would get with setting the function strict. > > In which case we're failing nearly silently, yes, there is a null returned, > but we have no idea why there is a null returned. If I were using this > function manually I'd want to know what I did wrong, what parameter I > skipped, etc. I can see it both ways and don't feel super strongly about it ... I just know that I've had some cases where we returned an ERROR or otherwise were a bit noisy on NULL values getting passed into a function and it was much more on the annoying side than on the helpful side; to the point where we've gone back and pulled out ereport(ERROR) calls from functions before because they were causing issues in otherwise pretty reasonable queries (consider things like functions getting pushed down to below WHERE clauses and such...). > > Well, that code is for pg_statistic while I was looking at pg_class (in > > vacuum.c:1428-1443, where we track if we're actually changing anything > > and only make the pg_class change if there's actually something > > different): > > I can do that, especially since it's only 3 tuples of known types, but my > reservations are summed up in the next comment. > > Not sure why we don't treat both the same way though ... although it's > > probably the case that it's much less likely to have an entire > > pg_statistic row be identical than the few values in pg_class. > > That would also involve comparing ANYARRAY values, yuk. Also, a matched > record will never be the case when used in primary purpose of the function > (upgrades), and not a big deal in the other future cases (if we use it in > ANALYZE on foreign tables instead of remote table samples, users > experimenting with tuning queries under hypothetical workloads). Sure. Not a huge deal either way, was just pointing out the difference. I do think it'd be good to match what ANALYZE does here, so checking if the values in pg_class are different and only updating if they are, while keeping the code for pg_statistic where it'll just always update. > > Hmm, that's a valid point, so a NULL passed in would need to set that > > value actually to NULL, presumably. Perhaps then we should have > > pg_set_relation_stats() be strict and have pg_set_attribute_stats() > > handles NULLs passed in appropriately, and return NULL if the relation > > itself or attname, or other required (not NULL'able) argument passed in > > cause the function to return NULL. > > > > That's how I have relstats done in v8, and could make it do that for attr > stats. That'd be my suggestion, at least, but as I mention above, it's not a position I hold very strongly. > > (What I'm trying to drive at here is a consistent interface for these > > functions, but one which does a no-op instead of returning an ERROR on > > values being passed in which aren't allowable; it can be quite > > frustrating trying to get a query to work where one of the functions > > decides to return ERROR instead of just ignoring things passed in which > > aren't valid.) > > I like the symmetry of a consistent interface, but we've already got an > asymmetry in that the pg_class update is done non-transactionally (like > ANALYZE does). Don't know that I really consider that to be the same kind of thing when it comes to talking about the interface as the other aspects we're discussing ... > One persistent problem is that there is no _safe equivalent to ARRAY_IN, so > that can always fail on us, though it should only do so if the string > passed in wasn't a valid array input format, or the values in the array > can't coerce to the attribute's basetype. That would happen before we even get to being called and there's not much to do about it anyway. > I should also point out that we've lost the ability to check if the export > values were of a type, and if the destination column is also of that type. > That's a non-issue in binary upgrades, but of course if a field changed > from integers to text the histograms would now be
Re: Confine vacuum skip logic to lazy_scan_skip
On 11/03/2024 18:15, Melanie Plageman wrote: On Mon, Mar 11, 2024 at 11:29:44AM +0200, Heikki Linnakangas wrote: diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index ac55ebd2ae..1757eb49b7 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c + /* - * lazy_scan_skip() -- set up range of skippable blocks using visibility map. + * heap_vac_scan_next_block() -- get next block for vacuum to process * - * lazy_scan_heap() calls here every time it needs to set up a new range of - * blocks to skip via the visibility map. Caller passes the next block in - * line. We return a next_unskippable_block for this range. When there are - * no skippable blocks we just return caller's next_block. The all-visible - * status of the returned block is set in *next_unskippable_allvis for caller, - * too. Block usually won't be all-visible (since it's unskippable), but it - * can be during aggressive VACUUMs (as well as in certain edge cases). + * lazy_scan_heap() calls here every time it needs to get the next block to + * prune and vacuum. The function uses the visibility map, vacuum options, + * and various thresholds to skip blocks which do not need to be processed and + * sets blkno to the next block that actually needs to be processed. I wonder if "need" is too strong a word since this function (heap_vac_scan_next_block()) specifically can set blkno to a block which doesn't *need* to be processed but which it chooses to process because of SKIP_PAGES_THRESHOLD. Ok yeah, there's a lot of "needs" here :-). Fixed. * - * Sets *skipping_current_range to indicate if caller should skip this range. - * Costs and benefits drive our decision. Very small ranges won't be skipped. + * The block number and visibility status of the next block to process are set + * in *blkno and *all_visible_according_to_vm. The return value is false if + * there are no further blocks to process. + * + * vacrel is an in/out parameter here; vacuum options and information about + * the relation are read, and vacrel->skippedallvis is set to ensure we don't + * advance relfrozenxid when we have skipped vacuuming all-visible blocks. It Maybe this should say when we have skipped vacuuming all-visible blocks which are not all-frozen or just blocks which are not all-frozen. Ok, rephrased. + * also holds information about the next unskippable block, as bookkeeping for + * this function. * * Note: our opinion of which blocks can be skipped can go stale immediately. * It's okay if caller "misses" a page whose all-visible or all-frozen marking Wonder if it makes sense to move this note to find_next_nunskippable_block(). Moved. @@ -1098,26 +1081,119 @@ lazy_scan_heap(LVRelState *vacrel) * older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the * choice to skip such a range is actually made, making everything safe.) */ -static BlockNumber -lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block, - bool *next_unskippable_allvis, bool *skipping_current_range) +static bool +heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno, +bool *all_visible_according_to_vm) { - BlockNumber rel_pages = vacrel->rel_pages, - next_unskippable_block = next_block, - nskippable_blocks = 0; - boolskipsallvis = false; + BlockNumber next_block; - *next_unskippable_allvis = true; - while (next_unskippable_block < rel_pages) + /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */ + next_block = vacrel->current_block + 1; + + /* Have we reached the end of the relation? */ No strong opinion on this, but I wonder if being at the end of the relation counts as a fourth state? Yeah, perhaps. But I think it makes sense to treat it as a special case. + if (next_block >= vacrel->rel_pages) + { + if (BufferIsValid(vacrel->next_unskippable_vmbuffer)) + { + ReleaseBuffer(vacrel->next_unskippable_vmbuffer); + vacrel->next_unskippable_vmbuffer = InvalidBuffer; + } + *blkno = vacrel->rel_pages; + return false; + } + + /* +* We must be in one of the three following states: +*/ + if (vacrel->next_unskippable_block == InvalidBlockNumber || + next_block > vacrel->next_unskippable_block) + { + /* +* 1. We have just processed an unskippable block (or we're at the +* beginning of the scan). Find the next unskippable block using the +* visibility map. +*/ I would reorder the options in the comment or in the if statement since they seem to be in the
Re: btree: downlink right separator/HIKEY optimization
On Fri, 8 Mar 2024 at 20:14, Peter Geoghegan wrote: > > On Thu, Feb 22, 2024 at 10:42 AM Matthias van de Meent > wrote: > > I forgot to address this in the previous patch, so here's v3 which > > fixes the issue warning. > > What benchmarking have you done here? I have benchmarked this atop various versions of master when it was part of the btree specialization patchset, where it showed a 2-9% increase in btree insert performance over the previous patch in the patchset on the various index types in that set. More recently, on an unlogged pgbench with foreign keys enabled (-s400 -j4 -c8) I can't find any obvious regressions (it gains 0-0.7% on master across 5-minute runs), while being 4.5% faster on inserting data on a table with an excessively bad index shape (single index of 10 columns of empty strings with the non-default "nl-BE-x-icu" collation followed by 1 random uuid column, inserted from a 10M row dataset. Extrapolation indicates this could indeed get over 7% improvement when the index shape is 31 nondefault -collated nonnull text columns and a single random ID index column). > Have you tried just reordering things in _bt_search() instead? If we > delay the check until after the binary search, then the result of the > binary search is usually proof enough that we cannot possibly need to > move right. That has the advantage of not requiring that we copy > anything to the stack. I've not tried that, because it would makes page-level prefix truncation more expensive by ~50%: With this patch, we need only 2 full tuple _bt_compares per page before we can establish a prefix, while without this patch (i.e. if we did a binsrch-first approach) we'd need 3 on average (assuming linearly randomly distributed accesses). Because full-key compares can be arbitrarily more expensive than normal attribute compares, I'd rather not have that 50% overhead. > > On Fri, Mar 8, 2024 at 2:14 PM Peter Geoghegan wrote: > > What benchmarking have you done here? > I think that the memcmp() test is subtly wrong: Good catch, it's been fixed in the attached version, using a new function. Kind regards, Matthias van de Meent. v3-0001-btree-optimize-_bt_moveright-using-downlink-s-rig.patch Description: Binary data
Re: UUID v7
> On 11 Mar 2024, at 20:56, Jelte Fennema-Nio wrote: > > Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004) Thanks! > Now with the added comments, one thing pops out to me: The comments > mention that we use "Monotonic Random", but when I read the spec that > explicitly recommends against using an increment of 1 when using > monotonic random. I feel like if we use an increment of 1, we're > better off going for the "Fixed-Length Dedicated Counter Bits" method > (i.e. change the code to start the counter at 0). See patch 0005 for > an example of that change. > > I'm also wondering if we really want to use the extra rand_b bits for > this. The spec says we MAY, but it does remove the amount of > randomness in our UUIDs. Method 1 is a just a Method 2 with specifically picked constants. But I'll have to use some hand-wavy wordings... UUID consists of these 128 bits: a. Mandatory 2 var and 4 ver bits. b. Flexible but strongly recommended 48 bits unix_ts_ms. These bits contribute to global sortability of values generated at frequency less than 1KHz. c. Counter bits: c1. Initialised with 0 on any time tick. c2. Initialised with randomness. c3*. bit width of a counter step (*not counted in 128 bit capacity, can be non-integral) d. Randomness bits. Method 1 is when c2=0. My implementation of method 2 uses c1=1, c2=17 Consider all UUIDs generated at any given milliseconds. Probability of a collision of two UUIDs generated at frequency less than 1KHz is p = 2^-(c2+d) Capacity of a counter has expected value of c = 2^(c1)*2^(c2-1)/2^c3 To guess next UUID you can correctly pick one of u = 2^(d+c3) First, observe that c3 contributes unguessability at exactly same scale as decreases counter capacity. There is no difference between using bits in d directly, or in c3. There is no point in non-zero c3. Every bit that could be given to c3 can equally be given to d. Second, observe that c2 bits contribute to both collision protection and counter capacity! And when the time ticks, c2 also contribute to unguessability! So, technically, we should consider using all available bits as c2 bits. How many c1 bits do we need? I've chosen one - to prevent occasional counter capacity reduction. If c1 = 1, we can distribute 73 bits between c2 and d. I've chosen c2 = 17 and d = 56 as an arbitrary compromise between capacity of one backend per ms and prevention of global collision. This compromise is mostly dictated by maximum frequency of UUID generation by one backend, I've chosen 200MHz as a sane value. This compromise is much easier when you do not have 74 spare bits, this crazy amount of information forgives almost any mistake. Imagine you have to distribute 10 bits between c2 and d. And you try to prevent collision between 10 independent devices which need capacity to generate IDs with frequency of 10KHz each and keep sortability. You would have something like c1=1, c2=3,d=6. Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that. I don't think this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us too much freedom of choice. Thanks! Best regards, Andrey Borodin.
Re: Statistics Import and Export
> > > > > Having thought about it a bit more, I generally like the idea of being > able to just update one stat instead of having to update all of them at > once (and therefore having to go look up what the other values currently > are...). That said, per below, perhaps making it strict is the better > plan. > v8 has it as strict. > > > > Also, in some cases we allow the function to be called with a > > > NULL but then make it a no-op rather than throwing an ERROR (eg, if the > > > OID ends up being NULL). > > > > Thoughts on it emitting a WARN or NOTICE before returning false? > > Eh, I don't think so? > > Where this is coming from is that we can often end up with functions > like these being called inside of larger queries, and having them spit > out WARN or NOTICE will just make them noisy. > > That leads to my general feeling of just returning NULL if called with a > NULL OID, as we would get with setting the function strict. > In which case we're failing nearly silently, yes, there is a null returned, but we have no idea why there is a null returned. If I were using this function manually I'd want to know what I did wrong, what parameter I skipped, etc. > Well, that code is for pg_statistic while I was looking at pg_class (in > vacuum.c:1428-1443, where we track if we're actually changing anything > and only make the pg_class change if there's actually something > different): > I can do that, especially since it's only 3 tuples of known types, but my reservations are summed up in the next comment. > Not sure why we don't treat both the same way though ... although it's > probably the case that it's much less likely to have an entire > pg_statistic row be identical than the few values in pg_class. > That would also involve comparing ANYARRAY values, yuk. Also, a matched record will never be the case when used in primary purpose of the function (upgrades), and not a big deal in the other future cases (if we use it in ANALYZE on foreign tables instead of remote table samples, users experimenting with tuning queries under hypothetical workloads). > Hmm, that's a valid point, so a NULL passed in would need to set that > value actually to NULL, presumably. Perhaps then we should have > pg_set_relation_stats() be strict and have pg_set_attribute_stats() > handles NULLs passed in appropriately, and return NULL if the relation > itself or attname, or other required (not NULL'able) argument passed in > cause the function to return NULL. > That's how I have relstats done in v8, and could make it do that for attr stats. (What I'm trying to drive at here is a consistent interface for these > functions, but one which does a no-op instead of returning an ERROR on > values being passed in which aren't allowable; it can be quite > frustrating trying to get a query to work where one of the functions > decides to return ERROR instead of just ignoring things passed in which > aren't valid.) > I like the symmetry of a consistent interface, but we've already got an asymmetry in that the pg_class update is done non-transactionally (like ANALYZE does). One persistent problem is that there is no _safe equivalent to ARRAY_IN, so that can always fail on us, though it should only do so if the string passed in wasn't a valid array input format, or the values in the array can't coerce to the attribute's basetype. I should also point out that we've lost the ability to check if the export values were of a type, and if the destination column is also of that type. That's a non-issue in binary upgrades, but of course if a field changed from integers to text the histograms would now be highly misleading. Thoughts on adding a typname parameter that the function uses as a cheap validity check? v8 attached, incorporating these suggestions plus those of Bharath and Bertrand. Still no pg_dump. As for pg_dump, I'm currently leading toward the TOC entry having either a series of commands: SELECT pg_set_relation_stats('foo.bar'::regclass, ...); pg_set_attribute_stats('foo.bar'::regclass, 'id'::name, ...); ... Or one compound command SELECT pg_set_relation_stats(t.oid, ...) pg_set_attribute_stats(t.oid, 'id'::name, ...), pg_set_attribute_stats(t.oid, 'last_name'::name, ...), ... FROM (VALUES('foo.bar'::regclass)) AS t(oid); The second one has the feature that if any one attribute fails, then the whole update fails, except, of course, for the in-place update of pg_class. This avoids having an explicit transaction block, but we could get that back by having restore wrap the list of commands in a transaction block (and adding the explicit lock commands) when it is safe to do so. From bdfde573f4f79770439a1455c1cb337701eb20dc Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Mon, 11 Mar 2024 14:18:39 -0400 Subject: [PATCH v8] Create pg_set_relation_stats, pg_set_attribute_stats. These functions will be used by pg_dump/restore and pg_upgrade to convey relation and attribute
Alternative SAOP support for GiST
Hi All, During my journey of designing Pg based solution at my work I was severely hit by several shortcomings in GiST. The most severe one is the lack of support for SAOP filters as it makes it difficult to have partition pruning and index (only) scans working together. To overcome the difficulties I implemented a simple extension: https://github.com/mkleczek/btree_gist_extra mkleczek/btree_gist_extra: Extra operators support for PostgreSQL btree_gist github.com Since it provides a separate operator class from btree_gist it requires re-indexing the data. So I thought maybe it would be a good idea to incorporate it into btree_gist. I am aware of two patches related to SAOP being discussed at the moment but I am not sure if SAOP support for GiST indexes is planned. Let me know if you think it is a good idea to work on a patch. More general remark: I am wondering if SAOP support in core should not be implemented by mapping SAOP operations to specialised operators and delegating all the work to index AMs. That way core could be decoupled from particular index AMs implementation details. Thanks! — Michal
Re: Confine vacuum skip logic to lazy_scan_skip
On Mon, Mar 11, 2024 at 11:29:44AM +0200, Heikki Linnakangas wrote: > > > I see why you removed my treatise-level comment that was here about > > unskipped skippable blocks. However, when I was trying to understand > > this code, I did wish there was some comment that explained to me why we > > needed all of the variables next_unskippable_block, > > next_unskippable_allvis, all_visible_according_to_vm, and current_block. > > > > The idea that we would choose not to skip a skippable block because of > > kernel readahead makes sense. The part that I had trouble wrapping my > > head around was that we want to also keep the visibility status of both > > the beginning and ending blocks of the skippable range and then use > > those to infer the visibility status of the intervening blocks without > > another VM lookup if we decide not to skip them. > > Right, I removed the comment because looked a little out of place and it > duplicated the other comments sprinkled in the function. I agree this could > still use some more comments though. > > Here's yet another attempt at making this more readable. I moved the logic > to find the next unskippable block to a separate function, and added > comments to make the states more explicit. What do you think? Oh, I like the new structure. Very cool! Just a few remarks: > From c21480e9da61e145573de3b502551dde1b8fa3f6 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas > Date: Fri, 8 Mar 2024 17:32:19 +0200 > Subject: [PATCH v9 1/2] Confine vacuum skip logic to lazy_scan_skip() > > Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more > code into the function, so that the caller doesn't need to know about > ranges or skipping anymore. heap_vac_scan_next_block() returns the > next block to process, and the logic for determining that block is all > within the function. This makes the skipping logic easier to > understand, as it's all in the same function, and makes the calling > code easier to understand as it's less cluttered. The state variables > needed to manage the skipping logic are moved to LVRelState. > > heap_vac_scan_next_block() now manages its own VM buffer separately > from the caller's vmbuffer variable. The caller's vmbuffer holds the > VM page for the current block its processing, while > heap_vac_scan_next_block() keeps a pin on the VM page for the next > unskippable block. Most of the time they are the same, so we hold two > pins on the same buffer, but it's more convenient to manage them > separately. > > For readability inside heap_vac_scan_next_block(), move the logic of > finding the next unskippable block to separate function, and add some > comments. > > This refactoring will also help future patches to switch to using a > streaming read interface, and eventually AIO > (https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com) > > Author: Melanie Plageman, with some changes by me I'd argue you earned co-authorship by now :) > Discussion: > https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com > --- > src/backend/access/heap/vacuumlazy.c | 233 +-- > 1 file changed, 146 insertions(+), 87 deletions(-) > > diff --git a/src/backend/access/heap/vacuumlazy.c > b/src/backend/access/heap/vacuumlazy.c > index ac55ebd2ae..1757eb49b7 100644 > --- a/src/backend/access/heap/vacuumlazy.c > +++ b/src/backend/access/heap/vacuumlazy.c > + > /* > - * lazy_scan_skip() -- set up range of skippable blocks using visibility > map. > + * heap_vac_scan_next_block() -- get next block for vacuum to process > * > - * lazy_scan_heap() calls here every time it needs to set up a new range of > - * blocks to skip via the visibility map. Caller passes the next block in > - * line. We return a next_unskippable_block for this range. When there are > - * no skippable blocks we just return caller's next_block. The all-visible > - * status of the returned block is set in *next_unskippable_allvis for > caller, > - * too. Block usually won't be all-visible (since it's unskippable), but it > - * can be during aggressive VACUUMs (as well as in certain edge cases). > + * lazy_scan_heap() calls here every time it needs to get the next block to > + * prune and vacuum. The function uses the visibility map, vacuum options, > + * and various thresholds to skip blocks which do not need to be processed > and I wonder if "need" is too strong a word since this function (heap_vac_scan_next_block()) specifically can set blkno to a block which doesn't *need* to be processed but which it chooses to process because of SKIP_PAGES_THRESHOLD. > + * sets blkno to the next block that actually needs to be processed. > * > - * Sets *skipping_current_range to indicate if caller should skip this range. > - * Costs and benefits drive our decision. Very small ranges won't be > skipped. > + * The block number and visibility status of the next block to process are >
Re: Avoiding inadvertent debugging mode for pgbench
On 2024-Mar-01, Euler Taveira wrote: > I don't like to break backward compatibility but in this case I suspect that > it > is ok. I don't recall the last time I saw a script that makes use of -d > option. > How often do you need a pgbench debug information? I wondered what the difference actually is, so I checked. In -i mode, the only difference is that if the tables don't exist before hand, we receive the NOTICE that it doesn't. In normal mode, the -d switch emits so much junk that I would believe if somebody told me that passing -d distorted the benchmark results; and it's hard to believe that such output is valuable for anything other than debugging pgbench itself. All in all, I support the original patch. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "I love the Postgres community. It's all about doing things _properly_. :-)" (David Garamond)
Re: UUID v7
Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004) Now with the added comments, one thing pops out to me: The comments mention that we use "Monotonic Random", but when I read the spec that explicitly recommends against using an increment of 1 when using monotonic random. I feel like if we use an increment of 1, we're better off going for the "Fixed-Length Dedicated Counter Bits" method (i.e. change the code to start the counter at 0). See patch 0005 for an example of that change. I'm also wondering if we really want to use the extra rand_b bits for this. The spec says we MAY, but it does remove the amount of randomness in our UUIDs. v21-0001-Implement-UUID-v7.patch Description: Binary data v21-0003-Run-pgindent.patch Description: Binary data v21-0005-Change-to-Fixed-Length-Dedicated-Counter-Bits.patch Description: Binary data v21-0004-Fix-comments-a-bit.patch Description: Binary data v21-0002-Fix-typos.patch Description: Binary data
Re: remaining sql/json patches
On 2024-Mar-11, Shruthi Gowda wrote: > *CASE 2:* > -- > SELECT * FROM JSON_TABLE(jsonb '{ > "id" : 901, > "age" : 30, > "*FULL_NAME*" : "KATE DANIEL"}', > '$' > COLUMNS( > FULL_NAME varchar(20), > ID int, > AGE int > ) >) as t; I think this is expected: when you use FULL_NAME as a SQL identifier, it is down-cased, so it no longer matches the uppercase identifier in the JSON data. You'd have to do it like this: SELECT * FROM JSON_TABLE(jsonb '{ "id" : 901, "age" : 30, "*FULL_NAME*" : "KATE DANIEL"}', '$' COLUMNS( "FULL_NAME" varchar(20), ID int, AGE int ) ) as t; so that the SQL identifier is not downcased. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: remaining sql/json patches
Hi, I was experimenting with the v42 patches, and I tried testing without providing the path explicitly. There is one difference between the two test cases that I have highlighted in blue. The full_name column is empty in the second test case result. Let me know if this is an issue or expected behaviour. *CASE 1:* --- SELECT * FROM JSON_TABLE(jsonb '{ "id" : 901, "age" : 30, "*full_name*" : "KATE DANIEL"}', '$' COLUMNS( FULL_NAME varchar(20), ID int, AGE int ) ) as t; *RESULT:* full_name | id | age -+-+- KATE DANIEL | 901 | 30 (1 row) *CASE 2:* -- SELECT * FROM JSON_TABLE(jsonb '{ "id" : 901, "age" : 30, "*FULL_NAME*" : "KATE DANIEL"}', '$' COLUMNS( FULL_NAME varchar(20), ID int, AGE int ) ) as t; *RESULT:* full_name | id | age ---+-+- | 901 | 30 (1 row) Thanks & Regards, Shruthi K C EnterpriseDB: http://www.enterprisedb.com
Disconnect if socket cannot be put into non-blocking mode
While self-reviewing my "Refactoring backend fork+exec code" patches, I noticed this in pq_init(): /* * In backends (as soon as forked) we operate the underlying socket in * nonblocking mode and use latches to implement blocking semantics if * needed. That allows us to provide safely interruptible reads and * writes. * * Use COMMERROR on failure, because ERROR would try to send the error to * the client, which might require changing the mode again, leading to * infinite recursion. */ #ifndef WIN32 if (!pg_set_noblock(MyProcPort->sock)) ereport(COMMERROR, (errmsg("could not set socket to nonblocking mode: %m"))); #endif #ifndef WIN32 /* Don't give the socket to any subprograms we execute. */ if (fcntl(MyProcPort->sock, F_SETFD, FD_CLOEXEC) < 0) elog(FATAL, "fcntl(F_SETFD) failed on socket: %m"); #endif Using COMMERROR here seems bogus. Firstly, if there was a problem with recursion, surely the elog(FATAL) that follows would also be wrong. But more seriously, it's not cool to continue using the connection as if everything is OK, if we fail to put it into non-blocking mode. We should disconnect. (COMMERROR merely logs the error, it does not bail out like ERROR does) Barring objections, I'll commit and backpatch the attached to fix that. -- Heikki Linnakangas Neon (https://neon.tech)From fc737586783a05672ff8eb96e245dfdeeadcd506 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Mon, 11 Mar 2024 16:36:37 +0200 Subject: [PATCH 1/1] Disconnect if socket cannot be put into non-blocking mode Commit 387da18874 moved the code to put socket into non-blocking mode from socket_set_nonblocking() into the one-time initialization function, pg_init(). In socket_set_nonblocking(), there indeed was a risk of recursion on failure like the comment said, but in pg_init(), ERROR or FATAL is fine. There's even another elog(FATAL) just after this, if setting FD_CLOEXEC fails. Note that COMMERROR merely logged the error, it did not close the connection, so if putting the socket to non-blocking mode failed we would use the connection anyway. You might not immediately notice, because most socket operations in a regular backend wait for the socket to become readable/writable anyway. But e.g. replication will be quite broken. Backpatch to all supported versions. Discussion: xx --- src/backend/libpq/pqcomm.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index c606bf34473..d9e49ca7028 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -189,14 +189,10 @@ pq_init(void) * nonblocking mode and use latches to implement blocking semantics if * needed. That allows us to provide safely interruptible reads and * writes. - * - * Use COMMERROR on failure, because ERROR would try to send the error to - * the client, which might require changing the mode again, leading to - * infinite recursion. */ #ifndef WIN32 if (!pg_set_noblock(MyProcPort->sock)) - ereport(COMMERROR, + ereport(FATAL, (errmsg("could not set socket to nonblocking mode: %m"))); #endif -- 2.39.2
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 11, 2024 at 04:09:27PM +0530, Amit Kapila wrote: > I don't see how it will be easier for the user to choose the default > value of 'max_slot_xid_age' compared to 'max_slot_wal_keep_size'. But, > I agree similar to 'max_slot_wal_keep_size', 'max_slot_xid_age' can be > another parameter to allow vacuum to proceed removing the rows which > otherwise it wouldn't have been as those would be required by some > slot. Yeah, the idea is to help prevent transaction ID wraparound, so I would expect max_slot_xid_age to ordinarily be set relatively high, i.e., 1.5B+. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: WIP Incremental JSON Parser
On Sun, Mar 10, 2024 at 11:43 PM Andrew Dunstan wrote: > I haven't managed to reproduce this. But I'm including some tests for it. If I remove the newline from the end of the new tests: > @@ -25,7 +25,7 @@ for (my $size = 64; $size > 0; $size--) > foreach my $test_string (@inlinetests) > { > my ($fh, $fname) = tempfile(); > - print $fh "$test_string\n"; > + print $fh "$test_string"; > close($fh); > > foreach my $size (1..6, 10, 20, 30) then I can reproduce the same result as my local tests. That extra whitespace appears to help the partial token logic out somehow. --Jacob
Re: Add Index-level REINDEX with multiple jobs
On Tue, 6 Feb 2024 at 09:22, Michael Paquier wrote: > > The problem may be actually trickier than that, no? Could there be > other factors to take into account for their classification, like > their sizes (typically, we'd want to process the biggest one first, I > guess)? Sorry for a late reply. Thanks for an explanation. This is sounds reasonable to me. Svetlana had addressed this in the patch v2. -- Best regards, Maxim Orlov.
Re: Reducing output size of nodeToString
On 22.02.24 16:07, Matthias van de Meent wrote: On Thu, 22 Feb 2024 at 13:37, Matthias van de Meent wrote: On Mon, 19 Feb 2024 at 14:19, Matthias van de Meent wrote: Attached the updated version of the patch on top of 5497daf3, which incorporates this last round of feedback. Now attached rebased on top of 93db6cbd to fix conflicts with fbc93b8b and an issue in the previous patchset: I attached one too many v3-0001 from a previous patch I worked on. ... and now with a fix for not overwriting newly deserialized location attributes with -1, which breaks test output for READ_WRITE_PARSE_PLAN_TREES installations. Again, no other significant changes since the patch of last Monday. * v7-0002-pg_node_tree-Don-t-store-query-text-locations-in-.patch This patch looks much more complicated than I was expecting. I had suggested to model this after stringToNodeWithLocations(). This uses a global variable to toggle the mode. Your patch creates a function nodeToStringNoQLocs() -- why the different naming scheme? -- and passes the flag down as an argument to all the output functions. I mean, in a green field, avoiding global variables can be sensible, of course, but I think in this limited scope here it would really be better to keep the code for the two directions read and write the same. Attached is a small patch that shows what I had in mind. (It doesn't contain any callers, but your patch shows where all those would go.) * v7-0003-gen_node_support.pl-Mark-location-fields-as-type-.patch This looks sensible, but maybe making Location a global type is a bit much? Maybe something more specific like ParseLocation, or ParseLoc, to keep it under 12 characters. From dee403f637f279813f0711bbc64c365cba82c18c Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 11 Mar 2024 14:07:52 +0100 Subject: [PATCH] Add nodeToStringWithoutLocations() --- src/backend/nodes/outfuncs.c | 30 +- src/include/nodes/nodes.h| 1 + 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 29cbc83bd9..ea0a6cb94d 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -25,6 +25,8 @@ #include "nodes/pg_list.h" #include "utils/datum.h" +static bool write_location_fields = true; + static void outChar(StringInfo str, char c); static void outDouble(StringInfo str, double d); @@ -88,7 +90,7 @@ static void outDouble(StringInfo str, double d); /* Write a parse location field (actually same as INT case) */ #define WRITE_LOCATION_FIELD(fldname) \ - appendStringInfo(str, " :" CppAsString(fldname) " %d", node->fldname) + appendStringInfo(str, " :" CppAsString(fldname) " %d", write_location_fields ? node->fldname : -1) /* Write a Node field */ #define WRITE_NODE_FIELD(fldname) \ @@ -770,6 +772,32 @@ nodeToString(const void *obj) return str.data; } +/* + * nodeToStringWithoutLocations - + *returns the ascii representation of the Node as a palloc'd string + * + * This form prints location fields with their default value (not the actual + * field value). This is useful for serializing node trees for storing into + * catalogs, where the location information is not useful since the original + * query string is not preserved anyway. + */ +char * +nodeToStringWithoutLocations(const void *obj) +{ + StringInfoData str; + boolsave_write_location_fields; + + save_write_location_fields = write_location_fields; + write_location_fields = false; + + initStringInfo(); + outNode(, obj); + + write_location_fields = save_write_location_fields; + + return str.data; +} + /* * bmsToString - *returns the ascii representation of the Bitmapset as a palloc'd string diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index 2969dd831b..3abe47af94 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -195,6 +195,7 @@ extern void outBitmapset(struct StringInfoData *str, extern void outDatum(struct StringInfoData *str, uintptr_t value, int typlen, bool typbyval); extern char *nodeToString(const void *obj); +extern char *nodeToStringWithoutLocations(const void *obj); extern char *bmsToString(const struct Bitmapset *bms); /* -- 2.44.0
Re: UUID v7
Hi, > Oops, CFbot expectedly found a problem... > Sorry for the noise, this version, I hope, will pass all the tests. > Thanks! > > Best regards, Andrey Borodin. I had some issues applying v19 against the current `master` branch. PFA the rebased and minorly tweaked v20. The patch LGTM. I think it could be merged unless there are any open issues left. I don't think so, but maybe I missed something. -- Best regards, Aleksander Alekseev v20-0001-Implement-UUID-v7.patch Description: Binary data
Re: POC, WIP: OR-clause support for indexes
On 11/3/2024 18:31, Alexander Korotkov wrote: I'm not convinced about this limit. The initial reason was to combine long lists of ORs into the array because such a transformation made at an early stage increases efficiency. I understand the necessity of this limit in the array decomposition routine but not in the creation one. The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because N^2 algorithms could be applied to arrays. Are you sure that's not true for our case? When you operate an array, indeed. But when we transform ORs to an array, not. Just check all the places in the optimizer and even the executor where we would pass along the list of ORs. This is why I think we should use this optimization even more intensively for huge numbers of ORs in an attempt to speed up the overall query. 3) Better save the original order of clauses by putting hash entries and untransformable clauses to the same list. A lot of differences in regression tests output have gone. I agree that reducing the number of changes in regression tests looks better. But to achieve this, you introduced a hack that increases the complexity of the code. Is it worth it? Maybe it would be better to make one-time changes in tests instead of getting this burden on board. Or have you meant something more introducing the node type? For me the reason is not just a regression test. The current code keeps the original order of quals as much as possible. The OR transformation code reorders quals even in cases when it doesn't eventually apply any optimization. I don't think that's acceptable. However, less hackery ways for this is welcome for sure. Why is it unacceptable? Can the user implement some order-dependent logic with clauses, and will it be correct? Otherwise, it is a matter of taste, and generally, this decision is up to you. We don't make array values unique. That might make query execution performance somewhat worse, and also makes selectivity estimation worse. I suggest Andrei and/or Alena should implement making array values unique. The fix Alena has made looks correct. But I urge you to think twice: The optimizer doesn't care about duplicates, so why do we do it? What's more, this optimization is intended to speed up queries with long OR lists. Using the list_append_unique() comparator on such lists could impact performance. I suggest sticking to the common rule and leaving the responsibility on the user's shoulders. I don't see why the optimizer doesn't care about duplicates for OR lists. As I showed before in an example, it successfully removes the duplicate. So, currently OR transformation clearly introduces a regression in terms of selectivity estimation. I think we should evade that. I think you are right. It is probably a better place than any other to remove duplicates in an array. I just think we should sort and remove duplicates from entry->consts in one pass. Thus, this optimisation should be applied to sortable constants. At least, we should do this optimization later, in one pass, with sorting elements before building the array. But what if we don't have a sort operator for the type? It was probably discussed before, but can we do our work later? There is a canonicalize_qual() which calls find_duplicate_ors(). This is the place where currently duplicate OR clauses are removed. Could our OR-to-ANY transformation be just another call from canonicalize_qual()? Hmm, we already tried to do it at that point. I vaguely recall some issues caused by this approach. Anyway, it should be done as quickly as possible to increase the effect of the optimization. -- regards, Andrei Lepikhov Postgres Professional
RE: Test failures of 100_bugs.pl
On Monday, April 24, 2023 5:50 PM Yu Shi (Fujitsu) wrote: > > On Fri, Apr 21, 2023 1:48 PM Yu Shi (Fujitsu) wrote: > > > > I wrote a patch to dump rel state in wait_for_subscription_sync() as > suggested. > > Please see the attached patch. > > I will try to add some debug logs in code later. > > > > Please see the attached v2 patch. > > I added some debug logs when invalidating syncing table states and updating > table_states_not_ready list. I also adjusted the message level in the tests > which > failed before. Just a reference. I think similar issue has been analyzed in other thread[1] and the reason seems clear that the table state cache invalidation got lost due to a race condition. The fix is also being discussed there. [1] https://www.postgresql.org/message-id/flat/711a6afe-edb7-1211-cc27-1bef8239eec7%40gmail.com Best Regards, Hou zj
Re: Add new error_action COPY ON_ERROR "log"
On Mon, Mar 11, 2024 at 11:16 AM Michael Paquier wrote: > > At the end of the day, this comes down to what is more helpful to the > user. And I'd agree on the side what ON_ERROR does currently, which > is what your patch relies on: on the first conversion failure, give up > and skip the rest of the row because we cannot trust its contents. > That's my way of saying that I am fine with the proposal of your > patch, and that we cannot provide the full state of a row without > making the error stack of COPY more invasive. +1. > + verbose, a NOTICE message > + containing the line number and column name for each discarded row is > + emitted. > > This should clarify that the column name refers to the attribute where > the input conversion has failed, I guess. Specifying only "column > name" without more context is a bit confusing. Done. Please see the attached v6 patch set. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v6-0001-Add-LOG_VERBOSITY-option-to-COPY-command.patch Description: Binary data v6-0002-Add-detailed-info-when-COPY-skips-soft-errors.patch Description: Binary data
Re: type cache cleanup improvements
Hi, > Yep, exacly. One time from 2^32 we reset whole cache instead of one (or > several) > entry with hash value = 0. Got it. Here is an updated patch where I added a corresponding comment. Now the patch LGTM. I'm going to change its status to RfC unless anyone wants to review it too. -- Best regards, Aleksander Alekseev v4-0001-Improve-performance-of-type-cache-cleanup.patch Description: Binary data
Re: make BuiltinTrancheNames less ugly
On 2024-Mar-01, Tristan Partin wrote: > On Fri Mar 1, 2024 at 8:00 AM CST, Alvaro Herrera wrote: > > I'm pretty disappointed of not being able to remove > > generate-lwlocknames.pl (it now generates the .h, no longer the .c > > file), but I can't find a way to do the equivalent #defines in CPP ... > > it'd require invoking the C preprocessor twice. Maybe an intermediate > > .h file would solve the problem, but I have no idea how would that work > > with Meson. I guess I'll do it in Make and let somebody suggest a Meson > > way. > > I can help you with Meson if you get the Make implementation done. Actually I realized that we need to keep the Perl script anyway because we want to be able to cross-check the wait_event_names.txt files to ensure we generate the correct documentation. Since it was simple enough, I added the Meson support already. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane) >From 6359348fafbefb391aad89e47e5e443eb8ab8e29 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 1 Mar 2024 13:03:10 +0100 Subject: [PATCH v2] Rework lwlocknames.txt to become lwlocklist.h This saves some code and some space in BuiltinTrancheNames, as foreseen in commit 74a730631065. We still have to build lwlocknames.h using Perl code, which initially I wanted to avoid, but it gives us the chance to cross-check wait_event_names.txt. Discussion: https://postgr.es/m/202401231025.gbv4nnte5fmm@alvherre.pgsql --- src/backend/Makefile | 4 +- src/backend/storage/lmgr/.gitignore | 1 - src/backend/storage/lmgr/Makefile | 9 +- .../storage/lmgr/generate-lwlocknames.pl | 60 ++--- src/backend/storage/lmgr/lwlock.c | 15 ++-- src/backend/storage/lmgr/lwlocknames.txt | 60 - src/backend/storage/lmgr/meson.build | 2 - .../utils/activity/wait_event_names.txt | 8 +- src/include/storage/lwlock.h | 4 +- src/include/storage/lwlocklist.h | 85 +++ src/include/storage/meson.build | 12 ++- src/tools/pginclude/headerscheck | 1 + 12 files changed, 136 insertions(+), 125 deletions(-) delete mode 100644 src/backend/storage/lmgr/lwlocknames.txt create mode 100644 src/include/storage/lwlocklist.h diff --git a/src/backend/Makefile b/src/backend/Makefile index d66e2a4b9f..79a12dfd1e 100644 --- a/src/backend/Makefile +++ b/src/backend/Makefile @@ -110,8 +110,8 @@ $(top_builddir)/src/port/libpgport_srv.a: | submake-libpgport parser/gram.h: parser/gram.y $(MAKE) -C parser gram.h -storage/lmgr/lwlocknames.h: storage/lmgr/generate-lwlocknames.pl storage/lmgr/lwlocknames.txt utils/activity/wait_event_names.txt - $(MAKE) -C storage/lmgr lwlocknames.h lwlocknames.c +storage/lmgr/lwlocknames.h: storage/lmgr/generate-lwlocknames.pl ../include/storage/lwlocklist.h utils/activity/wait_event_names.txt + $(MAKE) -C storage/lmgr lwlocknames.h utils/activity/wait_event_types.h: utils/activity/generate-wait_event_types.pl utils/activity/wait_event_names.txt $(MAKE) -C utils/activity wait_event_types.h pgstat_wait_event.c wait_event_funcs_data.c diff --git a/src/backend/storage/lmgr/.gitignore b/src/backend/storage/lmgr/.gitignore index dab4c3f580..8e5b734f15 100644 --- a/src/backend/storage/lmgr/.gitignore +++ b/src/backend/storage/lmgr/.gitignore @@ -1,3 +1,2 @@ -/lwlocknames.c /lwlocknames.h /s_lock_test diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile index 1aef423384..3f89548bde 100644 --- a/src/backend/storage/lmgr/Makefile +++ b/src/backend/storage/lmgr/Makefile @@ -18,7 +18,6 @@ OBJS = \ lmgr.o \ lock.o \ lwlock.o \ - lwlocknames.o \ predicate.o \ proc.o \ s_lock.o \ @@ -35,11 +34,7 @@ s_lock_test: s_lock.c $(top_builddir)/src/common/libpgcommon.a $(top_builddir)/s $(TASPATH) -L $(top_builddir)/src/common -lpgcommon \ -L $(top_builddir)/src/port -lpgport -lm -o s_lock_test -# see notes in src/backend/parser/Makefile -lwlocknames.c: lwlocknames.h - touch $@ - -lwlocknames.h: $(top_srcdir)/src/backend/storage/lmgr/lwlocknames.txt $(top_srcdir)/src/backend/utils/activity/wait_event_names.txt generate-lwlocknames.pl +lwlocknames.h: ../../../include/storage/lwlocklist.h ../../utils/activity/wait_event_names.txt generate-lwlocknames.pl $(PERL) $(srcdir)/generate-lwlocknames.pl $^ check: s_lock_test @@ -47,4 +42,4 @@ check: s_lock_test clean: rm -f s_lock_test - rm -f lwlocknames.h lwlocknames.c + rm -f lwlocknames.h diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl index 7b93ecf6c1..eaddd9d3b9 100644 --- a/src/backend/storage/lmgr/generate-lwlocknames.pl +++ b/src/backend/storage/lmgr/generate-lwlocknames.pl @@ -1,6 +1,6 @@
Re: Function and Procedure with same signature?
On Thu, Mar 7, 2024 at 5:46 PM Tom Lane wrote: > > Hannu Krosing writes: > > On Sat, Feb 10, 2024 at 12:38 AM Tom Lane wrote: > >> Worth noting perhaps that this is actually required by the SQL > >> standard: per spec, functions and procedures are both "routines" > >> and share the same namespace, > > > Can you point me to a place in the standard where it requires all > > kinds of ROUTINES to be using the same namespace ? > > [ digs around a bit... ] Well, the spec is vaguer about this than > I thought. It is clear on one thing: 11.60 > conformance rules include ... Thanks for thorough analysis of the standard. I went and looked at more what other relevant database do in this space based on their documentation Tl;DR * MS SQL Server - no overloading allowed anywhere * MySQL - no overloading * Oracle - no overloading at top level - overloading and independent namespaces for functions and procedures * Teradata - function overloading allowed - not clear from documentation if this also applies procedures - function overloading docs does not mention possible clashes with procedure names anywhere * DB2 - function overloading fully supported - procedure overloading supported, but only for distinct NUMBER OF ARGUMENTS I'll try to get access to a Teradata instance to verify the above So at high level most other Serious Databases - do support function overloading - keep functions and procedures in separated namespaces I still think that PostgreSQL having functions and procedures share the same namespace is an unneeded and unjustified restriction I plan to do some hands-on testing on Teradata and DB2 to understand it But my current thinking is that we should not be more restrictive than others unless there is a strong technical reason for it. And currently I do not see any. > It could be argued that this doesn't prohibit having both a function > and a procedure with the same data type list, only that you can't > write ROUTINE when trying to drop or alter one. But I think that's > just motivated reasoning. The paragraphs for being > FUNCTION or PROCEDURE are exactly like the above except they say > "exactly one function" or "exactly one procedure". If you argue that > this text means we must allow functions and procedures with the same > parameter lists, then you are also arguing that we must allow multiple > functions with the same parameter lists, and it's just the user's > tough luck if they need to drop one of them. The main issue is not dropping them, but inability to determine which one to call. We already have this in case of two overloaded functions with same initial argument types and the rest having defaults - when --- hannuk=# create function get(i int, out o int) begin atomic select i; end; CREATE FUNCTION hannuk=# create function get(i int, j int default 0, out o int) begin atomic select i+j; end; CREATE FUNCTION hannuk=# select get(1); ERROR: function get(integer) is not unique LINE 1: select get(1); ^ HINT: Could not choose a best candidate function. You might need to add explicit type casts. --- > A related point is that our tolerance for overloading routine > names isn't unlimited: we don't allow duplicate names with the > same list of input parameters, even if the output parameters are > different. This again has a good reason, as there would be many cases where you could not decide which one to call Not allowing overloading based on only return types is common across all OO languages. My point is that this does not apply to FUNCTION vs. PROCEDURE as it is very clear from the CALL syntax which one is meant. > This is not something that the SQL spec cares to > address, but there are good ambiguity-avoidance reasons for it. > I think limiting overloading so that a ROUTINE specification is > unambiguous is also a good thing. I think ROUTINE being unambiguous is not e very good goal. What if next version of standard introduces DROP DATABASE OBJECT ? > I remain unexcited about changing our definition of this. > "Oracle allows it" is not something that has any weight in > my view: they have made a bunch of bad design decisions > as well as good ones, and I think this is a bad one. Fully agree that "Oracle allows it" is a non-argument. My main point is that there is no strong reason to have objects which are distinct at syntax level to be in the same namespace. # Oracle is actually much more restrictive in top level object namespace - All of the following share the same namespace - [Packages, Private synonyms, Sequences, Stand-alone procedures, Stand-alone stored functions, Tables, User-defined operators, User-defined types, Views]. (I guess this makes parser easier to write) The equivalent in postgreSQL would be [extensions, schemas, tables, procedures, functions and a few more] all sharing the namespace. Where Oracle *does* allow overloading is "packaged functions and procedures" which are probably using a separate
Re: Have pg_basebackup write "dbname" in "primary_conninfo"?
On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu) wrote: > > > PSA the patch for implementing it. It is basically same as Ian's one. > > However, this patch still cannot satisfy the condition 3). > > > > `pg_basebackup -D data_N2 -d "user=postgres" -R` > > -> dbname would not be appeared in primary_conninfo. > > > > This is because `if (connection_string)` case in GetConnection() explicy > > override > > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} > > pair > > before the overriding, but it is no-op. Because The replacement of the > > dbname in > > pqConnectOptions2() would be done only for the valid (=lastly specified) > > connection options. > > Oh, this patch missed the straightforward case: > > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R > -> dbname would not be appeared in primary_conninfo. > > So I think it cannot be applied as-is. Sorry for sharing the bad item. > Can you please share the patch that can be considered for review? -- With Regards, Amit Kapila.
Re: POC, WIP: OR-clause support for indexes
Hi Andrei, Thank you for your response. On Mon, Mar 11, 2024 at 7:13 AM Andrei Lepikhov wrote: > On 7/3/2024 21:51, Alexander Korotkov wrote: > > Hi! > > > > On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov > > mailto:a.lepik...@postgrespro.ru>> wrote: > > > On 5/3/2024 12:30, Andrei Lepikhov wrote: > > > > On 4/3/2024 09:26, jian he wrote: > > > ... and the new version of the patchset is attached. > > > > I made some revisions for the patchset. > Great! > > 1) Use hash_combine() to combine hash values. > Looks better > > 2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE. > > I'm not convinced about this limit. The initial reason was to combine > long lists of ORs into the array because such a transformation made at > an early stage increases efficiency. > I understand the necessity of this limit in the array decomposition > routine but not in the creation one. The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because N^2 algorithms could be applied to arrays. Are you sure that's not true for our case? > > 3) Better save the original order of clauses by putting hash entries and > > untransformable clauses to the same list. A lot of differences in > > regression tests output have gone. > I agree that reducing the number of changes in regression tests looks > better. But to achieve this, you introduced a hack that increases the > complexity of the code. Is it worth it? Maybe it would be better to make > one-time changes in tests instead of getting this burden on board. Or > have you meant something more introducing the node type? For me the reason is not just a regression test. The current code keeps the original order of quals as much as possible. The OR transformation code reorders quals even in cases when it doesn't eventually apply any optimization. I don't think that's acceptable. However, less hackery ways for this is welcome for sure. > > We don't make array values unique. That might make query execution > > performance somewhat worse, and also makes selectivity estimation > > worse. I suggest Andrei and/or Alena should implement making array > > values unique. > The fix Alena has made looks correct. But I urge you to think twice: > The optimizer doesn't care about duplicates, so why do we do it? > What's more, this optimization is intended to speed up queries with long > OR lists. Using the list_append_unique() comparator on such lists could > impact performance. I suggest sticking to the common rule and leaving > the responsibility on the user's shoulders. I don't see why the optimizer doesn't care about duplicates for OR lists. As I showed before in an example, it successfully removes the duplicate. So, currently OR transformation clearly introduces a regression in terms of selectivity estimation. I think we should evade that. > At least, we should do this optimization later, in one pass, with > sorting elements before building the array. But what if we don't have a > sort operator for the type? It was probably discussed before, but can we do our work later? There is a canonicalize_qual() which calls find_duplicate_ors(). This is the place where currently duplicate OR clauses are removed. Could our OR-to-ANY transformation be just another call from canonicalize_qual()? -- Regards, Alexander Korotkov
Re: Using the %m printf format more
Michael Paquier writes: > On Wed, Mar 06, 2024 at 07:11:19PM +, Dagfinn Ilmari Mannsåker wrote: > >> The attached patch does so everywhere appropriate. One place where it's >> not appropriate is the TAP-emitting functions in pg_regress, since those >> call fprintf() > > I am not really following your argument with pg_regress.c and > fprintf(). d6c55de1f99a should make that possible even in the case of > emit_tap_output_v(), no? The problem isn't that emit_tap_output_v() doesn't support %m, which it does, but that it potentially calls fprintf() to output TAP protocol elements such as "\n" and "# " before it calls vprintf(…, fmt, …), and those calls might clobber errno. An option is to make it save errno at the start and restore it before the vprintf() calls, as in the second attached patch. >> and other potentially errno-modifying functions before >> evaluating the format string. > > Sure. On closer look, fprintf() is actually the only errno-clobbering function it calls, I was just hedging my bets in that statement. - ilmari >From 3727341aad84fbd275acb24f92591cc734fdd6a7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= Date: Wed, 6 Mar 2024 16:58:33 + Subject: [PATCH v2 1/2] Use %m printf format instead of strerror(errno) where appropriate Now that all live branches (12-) support %m in printf formats, let's use it everywhere appropriate. Particularly, we're _not_ converting the TAP output calls in pg_regress, since those functions call fprintf() and friends before evaluating the format string, which can clobber errno. --- src/backend/postmaster/postmaster.c | 21 ++-- src/backend/postmaster/syslogger.c| 2 +- src/backend/utils/misc/guc.c | 9 +- src/bin/initdb/findtimezone.c | 4 +- src/bin/pg_ctl/pg_ctl.c | 74 +++--- src/bin/pg_dump/compress_gzip.c | 2 +- src/bin/pg_dump/compress_none.c | 5 +- src/bin/pg_upgrade/check.c| 41 +++- src/bin/pg_upgrade/controldata.c | 6 +- src/bin/pg_upgrade/exec.c | 14 ++- src/bin/pg_upgrade/file.c | 98 +-- src/bin/pg_upgrade/function.c | 3 +- src/bin/pg_upgrade/option.c | 10 +- src/bin/pg_upgrade/parallel.c | 12 +-- src/bin/pg_upgrade/pg_upgrade.c | 4 +- src/bin/pg_upgrade/relfilenumber.c| 5 +- src/bin/pg_upgrade/tablespace.c | 4 +- src/bin/pg_upgrade/version.c | 9 +- src/common/psprintf.c | 4 +- src/interfaces/ecpg/preproc/ecpg.c| 12 +-- src/port/path.c | 3 +- src/test/isolation/isolationtester.c | 2 +- .../modules/libpq_pipeline/libpq_pipeline.c | 4 +- src/tools/ifaddrs/test_ifaddrs.c | 2 +- 24 files changed, 158 insertions(+), 192 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index f3c09e8dc0..af8a1efe66 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -1375,12 +1375,12 @@ PostmasterMain(int argc, char *argv[]) /* Make PID file world readable */ if (chmod(external_pid_file, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) != 0) -write_stderr("%s: could not change permissions of external PID file \"%s\": %s\n", - progname, external_pid_file, strerror(errno)); +write_stderr("%s: could not change permissions of external PID file \"%s\": %m\n", + progname, external_pid_file); } else - write_stderr("%s: could not write external PID file \"%s\": %s\n", - progname, external_pid_file, strerror(errno)); + write_stderr("%s: could not write external PID file \"%s\": %m\n", + progname, external_pid_file); on_proc_exit(unlink_external_pid_file, 0); } @@ -1589,8 +1589,8 @@ checkControlFile(void) { write_stderr("%s: could not find the database system\n" "Expected to find it in the directory \"%s\",\n" - "but could not open file \"%s\": %s\n", - progname, DataDir, path, strerror(errno)); + "but could not open file \"%s\": %m\n", + progname, DataDir, path); ExitPostmaster(2); } FreeFile(fp); @@ -6277,15 +6277,13 @@ read_backend_variables(char *id, Port **port, BackgroundWorker **worker) fp = AllocateFile(id, PG_BINARY_R); if (!fp) { - write_stderr("could not open backend variables file \"%s\": %s\n", - id, strerror(errno)); + write_stderr("could not open backend variables file \"%s\": %m\n", id); exit(1); } if (fread(, sizeof(param), 1, fp) != 1) { - write_stderr("could not read from backend variables file \"%s\": %s\n", - id, strerror(errno)); + write_stderr("could not read from backend variables file \"%s\": %m\n", id); exit(1); } @@ -6293,8 +6291,7 @@ read_backend_variables(char
Re: Transaction timeout
On Mon, Mar 11, 2024 at 12:53 PM Andrey M. Borodin wrote: > > On 7 Mar 2024, at 00:55, Alexander Korotkov wrote: > > > > On Wed, Mar 6, 2024 at 10:22 AM Andrey M. Borodin > > wrote: > >>> On 25 Feb 2024, at 21:50, Alexander Korotkov wrote: > >>> > >>> Thank you for the patches. I've pushed the 0001 patch to avoid > >>> further failures on buildfarm. Let 0004 wait till injections points > >>> by Mechael are committed. > >> > >> Thanks! > >> > >> All prerequisites are committed. I propose something in a line with this > >> patch. > > > > Thank you. I took a look at the patch. Should we also check the > > relevant message after the timeout is fired? We could check it in > > psql stderr or log for that. > > PFA version which checks log output. > But I could not come up with a proper use of BackgroundPsql->query_until() to > check outputs. And there are multiple possible errors. > > We can copy test from src/bin/psql/t/001_basic.pl: > > # test behavior and output on server crash > my ($ret, $out, $err) = $node->psql('postgres', > "SELECT 'before' AS running;\n" > . "SELECT pg_terminate_backend(pg_backend_pid());\n" > . "SELECT 'AFTER' AS not_running;\n"); > > is($ret, 2, 'server crash: psql exit code'); > like($out, qr/before/, 'server crash: output before crash'); > ok($out !~ qr/AFTER/, 'server crash: no output after crash'); > is( $err, > 'psql::2: FATAL: terminating connection due to administrator command > psql::2: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > psql::2: error: connection to server was lost', > 'server crash: error message’); > > But I do not see much value in this. > What do you think? I think if checking psql stderr is problematic, checking just logs is fine. Could we wait for the relevant log messages one by one with $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? -- Regards, Alexander Korotkov
Re: Reducing the log spam
On Mon, 2024-03-11 at 09:33 +0100, Jelte Fennema-Nio wrote: > - the subscriber's server log. > + the subscriber's server log if you remove 23505 from > + . > > This seems like a pretty big regression. Being able to know why your > replication got closed seems pretty critical. The actual SQLSTATEs that get suppressed are subject to discussion (an I have a gut feeling that some people will want the list empty). As far as this specific functionality is concerned, I think that the actual problem is a deficiency in PostgreSQL. The problem is that the log is the *only* place where you can get this information. That will be a problem for many people, even without "log_suppress_errcodes". I think that this isformation should be available in some statistics view. Yours, Laurenz Albe
Re: Regardign RecentFlushPtr in WalSndWaitForWal()
On Mon, Mar 11, 2024 at 4:17 PM shveta malik wrote: > > On Sat, Mar 2, 2024 at 4:44 PM Amit Kapila wrote: > > > > Right, I think the quoted code has check "if (!RecoveryInProgress())". > > > > > > > But apart from that, your > > > observation seems accurate, yes. > > > > > > > I also find the observation correct and the code has been like that > > since commit 5a991ef8 [1]. So, let's wait to see if Robert or Andres > > remembers the reason, otherwise, we should probably nuke this code. > > Please find the patch attached for the same. > LGTM. I'll push this tomorrow unless I see any comments/objections to this change. -- With Regards, Amit Kapila.
Re: Transaction timeout
> On 7 Mar 2024, at 00:55, Alexander Korotkov wrote: > > On Wed, Mar 6, 2024 at 10:22 AM Andrey M. Borodin > wrote: >>> On 25 Feb 2024, at 21:50, Alexander Korotkov wrote: >>> >>> Thank you for the patches. I've pushed the 0001 patch to avoid >>> further failures on buildfarm. Let 0004 wait till injections points >>> by Mechael are committed. >> >> Thanks! >> >> All prerequisites are committed. I propose something in a line with this >> patch. > > Thank you. I took a look at the patch. Should we also check the > relevant message after the timeout is fired? We could check it in > psql stderr or log for that. PFA version which checks log output. But I could not come up with a proper use of BackgroundPsql->query_until() to check outputs. And there are multiple possible errors. We can copy test from src/bin/psql/t/001_basic.pl: # test behavior and output on server crash my ($ret, $out, $err) = $node->psql('postgres', "SELECT 'before' AS running;\n" . "SELECT pg_terminate_backend(pg_backend_pid());\n" . "SELECT 'AFTER' AS not_running;\n"); is($ret, 2, 'server crash: psql exit code'); like($out, qr/before/, 'server crash: output before crash'); ok($out !~ qr/AFTER/, 'server crash: no output after crash'); is( $err, 'psql::2: FATAL: terminating connection due to administrator command psql::2: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. psql::2: error: connection to server was lost', 'server crash: error message’); But I do not see much value in this. What do you think? Best regards, Andrey Borodin. v3-0001-Add-timeouts-TAP-tests.patch Description: Binary data
Re: Add comment to specify timeout unit in ConditionVariableTimedSleep()
On Sat, Mar 9, 2024 at 12:19 PM Michael Paquier wrote: > > On Tue, Mar 05, 2024 at 03:20:48PM +0900, Michael Paquier wrote: > > That sounds like a good idea to me, so I'm OK with your suggestion. > > Applied this one as f160bf06f72a. Thanks. Thanks! thanks Shveta
Re: Regardign RecentFlushPtr in WalSndWaitForWal()
On Sat, Mar 2, 2024 at 4:44 PM Amit Kapila wrote: > > Right, I think the quoted code has check "if (!RecoveryInProgress())". > > > > But apart from that, your > > observation seems accurate, yes. > > > > I also find the observation correct and the code has been like that > since commit 5a991ef8 [1]. So, let's wait to see if Robert or Andres > remembers the reason, otherwise, we should probably nuke this code. Please find the patch attached for the same. thanks Shveta v1-0001-Remove-redundant-RecentFlushPtr-fetch-in-WalSndWa.patch Description: Binary data
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hi! I've decided to put my hands on this patch. On Thu, Mar 7, 2024 at 2:25 PM Amit Kapila wrote: > +1 for the second one not only because it avoids new words in grammar > but also sounds to convey the meaning. I think you can explain in docs > how this feature can be used basically how will one get the correct > LSN value to specify. I picked the second option and left only the AFTER clause for the BEGIN statement. I think this should be enough for the beginning. > As suggested previously also pick one of the approaches (I would > advocate the second one) and keep an option for the second one by > mentioning it in the commit message. I hope to see more > reviews/discussions or usage like how will users get the LSN value to > be specified on the core logic of the feature at this stage. IF > possible, state, how real-world applications could leverage this > feature. I've added a paragraph to the docs about the usage. After you made some changes on primary, you run pg_current_wal_insert_lsn(). Then connect to replica and run 'BEGIN AFTER lsn' with the just obtained LSN. Now you're guaranteed to see the changes made to the primary. Also, I've significantly reworked other aspects of the patch. The most significant changes are: 1) Waiters are now stored in the array sorted by LSN. This saves us from scanning of wholeper-backend array. 2) Waiters are removed from the array immediately once their LSNs are replayed. Otherwise, the WAL replayer will keep scanning the shared memory array till waiters wake up. 3) To clean up after errors, we now call WaitLSNCleanup() on backend shmem exit. I think this is preferable over the previous approach to remove from the queue before ProcessInterrupts(). 4) There is now condition to recheck if LSN is replayed after adding to the shared memory array. This should save from the race conditions. 5) I've renamed too generic names for functions and files. -- Regards, Alexander Korotkov v8-0001-Implement-AFTER-clause-for-BEGIN-command.patch Description: Binary data
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Wed, Mar 6, 2024 at 2:47 PM Bharath Rupireddy wrote: > > Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change. > Commit message says: "Currently postgres has the ability to invalidate inactive replication slots based on the amount of WAL (set via max_slot_wal_keep_size GUC) that will be needed for the slots in case they become active. However, choosing a default value for max_slot_wal_keep_size is tricky. Because the amount of WAL a customer generates, and their allocated storage will vary greatly in production, making it difficult to pin down a one-size-fits-all value. It is often easy for developers to set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5 billion, after which the slots get invalidated." I don't see how it will be easier for the user to choose the default value of 'max_slot_xid_age' compared to 'max_slot_wal_keep_size'. But, I agree similar to 'max_slot_wal_keep_size', 'max_slot_xid_age' can be another parameter to allow vacuum to proceed removing the rows which otherwise it wouldn't have been as those would be required by some slot. Now, if this understanding is correct, we should probably make this invalidation happen by (auto)vacuum after computing the age based on this new parameter. -- With Regards, Amit Kapila.
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Fri, Mar 8, 2024 at 10:42 PM Bharath Rupireddy wrote: > > On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila wrote: > > > > You might want to consider its interaction with sync slots on standby. > > Say, there is no activity on slots in terms of processing the changes > > for slots. Now, we won't perform sync of such slots on standby showing > > them inactive as per your new criteria where as same slots could still > > be valid on primary as the walsender is still active. This may be more > > of a theoretical point as in running system there will probably be > > some activity but I think this needs some thougths. > > I believe the xmin and catalog_xmin of the sync slots on the standby > keep advancing depending on the slots on the primary, no? If yes, the > XID age based invalidation shouldn't be a problem. > > I believe there are no walsenders started for the sync slots on the > standbys, right? If yes, the inactive timeout based invalidation also > shouldn't be a problem. Because, the inactive timeouts for a slot are > tracked only for walsenders because they are the ones that typically > hold replication slots for longer durations and for real replication > use. We did a similar thing in a recent commit [1]. > > Is my understanding right? > Yes, your understanding is correct. I wanted us to consider having new parameters like 'inactive_replication_slot_timeout' to be at slot-level instead of GUC. I think this new parameter doesn't seem to be the similar as 'max_slot_wal_keep_size' which leads to truncation of WAL at global and then invalidates the appropriate slots. OTOH, the 'inactive_replication_slot_timeout' doesn't appear to have a similar global effect. The other thing we should consider is what if the checkpoint happens at a timeout greater than 'inactive_replication_slot_timeout'? Shall, we consider doing it via some other background process or do we think checkpointer is the best we can have? > Do you still see any problems with it? > Sorry, I haven't done any detailed review yet so can't say with confidence whether there is any problem or not w.r.t sync slots. -- With Regards, Amit Kapila.
Re: Proposal to add page headers to SLRU pages
> On Mar 9, 2024, at 05:22, Jeff Davis wrote: > > External Email > > On Wed, 2024-03-06 at 12:01 +, Li, Yong wrote: >> Rebase the patch against the latest HEAD. > > The upgrade logic could use more comments explaining what's going on > and why. As I understand it, it's a one-time conversion that needs to > happen between 16 and 17. Is that right? > > Regards, >Jeff Davis > > In the new code we effectively store only one LSN per page, which I > understand is strictly worse. Maybe the idea of doing away with these > LSN groups should be reconsidered ... unless I completely misunderstand > the whole thing. > > -- > Álvaro Herrera PostgreSQL Developer — Thanks for the comments on LSN groups and pg_upgrade. I have updated the patch to address both comments: - The clog LSN group has been brought back. Now the page LSN on each clog page is used for honoring the write-ahead rule and it is always the highest LSN of all the LSN groups on the page. The LSN groups are used by TransactionIdGetStatus() as before. - New comments have been added to pg_upgrade to mention the SLRU page header change as the reason for upgrading clog files. Regards, Yong slru_page_header_v6.patch Description: slru_page_header_v6.patch
Re: Statistics Import and Export
Greetings, * Corey Huinker (corey.huin...@gmail.com) wrote: > > > +/* > > > + * Set statistics for a given pg_class entry. > > > + * > > > + * pg_set_relation_stats(relation Oid, reltuples double, relpages int) > > > + * > > > + * This does an in-place (i.e. non-transactional) update of pg_class, > > just as > > > + * is done in ANALYZE. > > > + * > > > + */ > > > +Datum > > > +pg_set_relation_stats(PG_FUNCTION_ARGS) > > > +{ > > > + const char *param_names[] = { > > > + "relation", > > > + "reltuples", > > > + "relpages", > > > + }; > > > + > > > + Oid relid; > > > + Relationrel; > > > + HeapTuple ctup; > > > + Form_pg_class pgcform; > > > + > > > + for (int i = 0; i <= 2; i++) > > > + if (PG_ARGISNULL(i)) > > > + ereport(ERROR, > > > + > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > > + errmsg("%s cannot be NULL", > > param_names[i]))); > > > > Why not just mark this function as strict..? Or perhaps we should allow > > NULLs to be passed in and just not update the current value in that > > case? > > Strict could definitely apply here, and I'm inclined to make it so. Having thought about it a bit more, I generally like the idea of being able to just update one stat instead of having to update all of them at once (and therefore having to go look up what the other values currently are...). That said, per below, perhaps making it strict is the better plan. > > Also, in some cases we allow the function to be called with a > > NULL but then make it a no-op rather than throwing an ERROR (eg, if the > > OID ends up being NULL). > > Thoughts on it emitting a WARN or NOTICE before returning false? Eh, I don't think so? Where this is coming from is that we can often end up with functions like these being called inside of larger queries, and having them spit out WARN or NOTICE will just make them noisy. That leads to my general feeling of just returning NULL if called with a NULL OID, as we would get with setting the function strict. > > Not sure if that makes sense here or not > > offhand but figured I'd mention it as something to consider. > > > > > + pgcform = (Form_pg_class) GETSTRUCT(ctup); > > > + pgcform->reltuples = PG_GETARG_FLOAT4(1); > > > + pgcform->relpages = PG_GETARG_INT32(2); > > > > Shouldn't we include relallvisible? > > Yes. No idea why I didn't have that in there from the start. Ok. > > Also, perhaps we should use the approach that we have in ANALYZE, and > > only actually do something if the values are different rather than just > > always doing an update. > > That was how it worked back in v1, more for the possibility that there was > no matching JSON to set values. > > Looking again at analyze.c (currently lines 1751-1780), we just check if > there is a row in the way, and if so we replace it. I don't see where we > compare existing values to new values. Well, that code is for pg_statistic while I was looking at pg_class (in vacuum.c:1428-1443, where we track if we're actually changing anything and only make the pg_class change if there's actually something different): vacuum.c:1531 /* If anything changed, write out the tuple. */ if (dirty) heap_inplace_update(rd, ctup); Not sure why we don't treat both the same way though ... although it's probably the case that it's much less likely to have an entire pg_statistic row be identical than the few values in pg_class. > > > +Datum > > > +pg_set_attribute_stats(PG_FUNCTION_ARGS) > > > > > + /* names of columns that cannot be null */ > > > + const char *required_param_names[] = { > > > + "relation", > > > + "attname", > > > + "stainherit", > > > + "stanullfrac", > > > + "stawidth", > > > + "stadistinct", > > > + "stakind1", > > > + "stakind2", > > > + "stakind3", > > > + "stakind4", > > > + "stakind5", > > > + }; > > > > Same comment here as above wrt NULL being passed in. > > In this case, the last 10 params (stanumbersN and stavaluesN) can be null, > and are NULL more often than not. Hmm, that's a valid point, so a NULL passed in would need to set that value actually to NULL, presumably. Perhaps then we should have pg_set_relation_stats() be strict and have pg_set_attribute_stats() handles NULLs passed in appropriately, and return NULL if the relation itself or attname, or other required (not NULL'able) argument passed in cause the function to return NULL. (What I'm trying to drive at here is a consistent interface for these functions, but one which does a no-op instead of returning an ERROR on values being passed in which aren't allowable; it can be quite frustrating trying to get a query to work where one of the functions decides to return ERROR instead of
Re: Support a wildcard in backtrace_functions
On Fri, 8 Mar 2024 at 17:17, Bharath Rupireddy wrote: > Hm, we may not want backtrace_on_internal_error to be on by default. > AFAICS, all ERRCODE_INTERNAL_ERROR are identifiable with the error > message, so it's sort of easy for one to track down the cause of it > without actually needing to log backtrace by default. While maybe all messages uniquely identify the exact error, these errors usually originate somewhere deep down the call stack in a function that is called from many other places. Having the full stack trace can thus greatly help us to find what caused this specific error to occur. I think that would be quite useful to enable by default, if only because it would make many bug reports much more actionable. > 1. Rename the new GUC backtrace_functions_min_level to backtrace_min_level. > 2. Add 'internal' to backtrace_min_level_options enum > +static const struct config_enum_entry backtrace_functions_level_options[] = { > + {"internal", INTERNAL, false}, > +{"debug5", DEBUG5, false}, > +{"debug4", DEBUG4, false}, > 3. Remove backtrace_on_internal_error GUC which is now effectively > covered by backtrace_min_level = 'internal'; > > Does anyone see a problem with it? Honestly, it seems a bit confusing to me to add INTERNAL as a level, since it's an error code not log level. Also merging it in this way, brings up certain questions: 1. How do you get the current backtrace_on_internal_error=true behaviour? Would you need to set both backtrace_functions='*' and backtrace_min_level=INTERNAL? 2. What is the default value for backtrace_min_level? 3. You still wouldn't be able to limit INTERNAL errors to FATAL level I personally think having three GUCs in this patchset make sense, especially since I think it would be good to turn backtrace_on_internal_error on by default. The only additional change that I think might be worth making is to make backtrace_on_internal_error take a level argument, so that you could configure postgres to only add stack traces to FATAL internal errors. (attached is a patch that should fix the CI issue by adding GUC_NOT_IN_SAMPLE backtrace_functions_min_level) v7-0002-Add-wildcard-support-to-backtrace_functions-GUC.patch Description: Binary data v7-0001-Add-backtrace_functions_min_level.patch Description: Binary data
Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
On Mon, Mar 11, 2024 at 5:43 AM Anton A. Melnikov wrote: > On 11.03.2024 03:39, Alexander Korotkov wrote: > > Now that we distinguish stats for checkpoints and > > restartpoints, we need to update the docs. Please, check the patch > > attached. > > Maybe bring the pg_stat_get_checkpointer_buffers_written() description in > consistent with these changes? > Like that: > > --- a/src/include/catalog/pg_proc.dat > +++ b/src/include/catalog/pg_proc.dat > @@ -5740 +5740 @@ > - descr => 'statistics: number of buffers written by the checkpointer', > + descr => 'statistics: number of buffers written during checkpoints and > restartpoints', This makes sense. I've included this into the revised patch. > And after i took a fresh look at this patch i noticed a simple way to extract > write_time and sync_time counters for restartpoints too. > > What do you think, is there a sense to do this? I'm not sure we need this. The ways we trigger checkpoints and restartpoints are different. This is why we needed separate statistics. But the process of writing buffers is the same. -- Regards, Alexander Korotkov rename_pg_stat_get_checkpointer_fields_v2.patch Description: Binary data
Re: Confine vacuum skip logic to lazy_scan_skip
On 08/03/2024 19:34, Melanie Plageman wrote: On Fri, Mar 08, 2024 at 06:07:33PM +0200, Heikki Linnakangas wrote: On 08/03/2024 02:46, Melanie Plageman wrote: On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote: On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote: However, by adding a vmbuffer to next_block_state, the callback may be able to avoid extra VM fetches from one invocation to the next. That's a good idea, holding separate VM buffer pins for the next-unskippable block and the block we're processing. I adopted that approach. Cool. It can't be avoided with streaming read vacuum, but I wonder if there would ever be adverse effects to doing it on master? Maybe if we are doing a lot of skipping and the block of the VM for the heap blocks we are processing ends up changing each time but we would have had the right block of the VM if we used the one from heap_vac_scan_next_block()? Frankly, I'm in favor of just doing it now because it makes lazy_scan_heap() less confusing. +1 I'm pretty happy with the attached patches now. The first one fixes the existing bug I mentioned in the other email (based on the on-going discussion that might not how we want to fix it though). ISTM we should still do the fix you mentioned -- seems like it has more upsides than downsides? From b68cb29c547de3c4acd10f31aad47b453d154666 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Fri, 8 Mar 2024 16:00:22 +0200 Subject: [PATCH v8 1/3] Set all_visible_according_to_vm correctly with DISABLE_PAGE_SKIPPING It's important for 'all_visible_according_to_vm' to correctly reflect whether the VM bit is set or not, even when we are not trusting the VM to skip pages, because contrary to what the comment said, lazy_scan_prune() relies on it. If it's incorrectly set to 'false', when the VM bit is in fact set, lazy_scan_prune() will try to set the VM bit again and dirty the page unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all heap pages were dirtied, even if there were no changes. We would also fail to clear any VM bits that were set incorrectly. This was broken in commit 980ae17310, so backpatch to v16. LGTM. Committed and backpatched this. From 47af1ca65cf55ca876869b43bff47f9d43f0750e Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Fri, 8 Mar 2024 17:32:19 +0200 Subject: [PATCH v8 2/3] Confine vacuum skip logic to lazy_scan_skip() --- src/backend/access/heap/vacuumlazy.c | 256 +++ 1 file changed, 141 insertions(+), 115 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index ac55ebd2ae5..0aa08762015 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -204,6 +204,12 @@ typedef struct LVRelState int64 live_tuples;/* # live tuples remaining */ int64 recently_dead_tuples; /* # dead, but not yet removable */ int64 missed_dead_tuples; /* # removable, but not removed */ Perhaps we should add a comment to the blkno member of LVRelState indicating that it is used for error reporting and logging? Well, it's already under the "/* Error reporting state */" section. I agree this is a little confusing, the name 'blkno' doesn't convey that it's supposed to be used just for error reporting. But it's a pre-existing issue so I left it alone. It can be changed with a separate patch if we come up with a good idea. I see why you removed my treatise-level comment that was here about unskipped skippable blocks. However, when I was trying to understand this code, I did wish there was some comment that explained to me why we needed all of the variables next_unskippable_block, next_unskippable_allvis, all_visible_according_to_vm, and current_block. The idea that we would choose not to skip a skippable block because of kernel readahead makes sense. The part that I had trouble wrapping my head around was that we want to also keep the visibility status of both the beginning and ending blocks of the skippable range and then use those to infer the visibility status of the intervening blocks without another VM lookup if we decide not to skip them. Right, I removed the comment because looked a little out of place and it duplicated the other comments sprinkled in the function. I agree this could still use some more comments though. Here's yet another attempt at making this more readable. I moved the logic to find the next unskippable block to a separate function, and added comments to make the states more explicit. What do you think? -- Heikki Linnakangas Neon (https://neon.tech) From c21480e9da61e145573de3b502551dde1b8fa3f6 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Fri, 8 Mar 2024 17:32:19 +0200 Subject: [PATCH v9 1/2] Confine vacuum skip logic to lazy_scan_skip() Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more code into the function, so
Re: automating RangeTblEntry node support
On 20.02.24 08:57, Peter Eisentraut wrote: On 18.02.24 00:06, Matthias van de Meent wrote: I'm not sure that the cleanup which is done when changing a RTE's rtekind is also complete enough for this purpose. Things like inline_cte_walker change the node->rtekind, which could leave residual junk data in fields that are currently dropped during serialization (as the rtekind specifically ignores those fields), but which would add overhead when the default omission is expected to handle these fields; as they could then contain junk. It looks like there is some care about zeroing now unused fields, but I haven't checked that it covers all cases and fields to the extent required so that removing this specialized serializer would have zero impact on size once the default omission patch is committed. An additional patch with a single function that for this purpose clears junk fields from RTEs that changed kind would be appreciated: it is often hand-coded at those locations the kind changes, but that's more sensitive to programmer error. Yes, interesting idea. Or maybe an assert-like function that checks an existing structure for consistency. Or maybe both. I'll try this out. In the meantime, if there are no remaining concerns, I propose to commit the first two patches Remove custom Constraint node read/write implementations Remove custom _jumbleRangeTblEntry() After a few side quests, here is an updated patch set. (I had committed the first of the two patches mentioned above, but not yet the second one.) v3-0001-Remove-obsolete-comment.patch v3-0002-Improve-comment.patch These just update a few comments around the RangeTblEntry definition. v3-0003-Reformat-some-node-comments.patch v3-0004-Remove-custom-_jumbleRangeTblEntry.patch This is pretty much the same patch as before. I have now split it up to first reformat the comments to make room for the node annotations. This patch is now also pgindent-proof. After some side quest discussions, the set of fields to jumble seems correct now, so commit message comments to the contrary have been dropped. v3-0005-Make-RangeTblEntry-dump-order-consistent.patch I separated that from the 0008 patch below. I think it useful even if we don't go ahead with 0008 now, for example in dumps from the debugger, and just in general to keep everything more consistent. v3-0006-WIP-AssertRangeTblEntryIsValid.patch This is in response to some of the discussions where there was some doubt whether all fields are always filled and cleared correctly when the RTE kind is changed. Seems correct as far as this goes. I didn't know of a good way to hook this in, so I put it into the write/read functions, which is obviously a bit weird if I'm proposing to remove them later. Consider it a proof of concept. v3-0007-Simplify-range_table_mutator_impl-and-range_table.patch v3-0008-WIP-Remove-custom-RangeTblEntry-node-read-write-i.patch At this point, I'm not too stressed about pressing forward with these last two patches. We can look at them again perhaps if we make progress on a more compact node output format. When I started this thread, I had a lot of questions about various details about the RangeTblEntry struct, and we have achieved many answers during the discussions, so I'm happy with the progress. So for PG17, I'd like to just do patches 0001..0005. From dc53b7ddfc5aa004a0d222b4084a1c580f05a296 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 11 Mar 2024 09:15:33 +0100 Subject: [PATCH v3 1/8] Remove obsolete comment The idea to use a union in the definition of RangeTblEntry is clearly not being pursued. Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca...@eisentraut.org --- src/include/nodes/parsenodes.h | 6 -- 1 file changed, 6 deletions(-) diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 2380821600..5113f97363 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -1028,12 +1028,6 @@ typedef struct RangeTblEntry RTEKind rtekind;/* see above */ - /* -* XXX the fields applicable to only some rte kinds should be merged into -* a union. I didn't do this yet because the diffs would impact a lot of -* code that is being actively worked on. FIXME someday. -*/ - /* * Fields valid for a plain relation RTE (else zero): * base-commit: af0e7deb4a1c369bb8154ac55f085d6a93fe5c35 -- 2.44.0 From 9fd2ae6a561ea72f627057d922e48d92eaafe099 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 11 Mar 2024 09:15:33 +0100 Subject: [PATCH v3 2/8] Improve comment Clarify that RangeTblEntry.lateral reflects whether LATERAL was specified in the statement (as opposed to whether lateralness is implicit). Also, the list of applicable entry types was incomplete. Discussion:
Re: remaining sql/json patches
Hi. more minor issues. by searching `elog(ERROR, "unrecognized node type: %d"` I found that generally enum is cast to int, before printing it out. I also found a related post at [1]. So I add the typecast to int, before printing it out. most of the refactored code is unlikely to be reachable, but still. I also refactored ExecPrepareJsonItemCoercion error message, to make the error message more explicit. @@ -4498,7 +4498,9 @@ ExecPrepareJsonItemCoercion(JsonbValue *item, JsonExprState *jsestate, if (throw_error) ereport(ERROR, errcode(ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE), - errmsg("SQL/JSON item cannot be cast to target type")); + errcode(ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE), + errmsg("SQL/JSON item cannot be cast to type %s", + format_type_be(jsestate->jsexpr->returning->typid))); + /* + * We abuse CaseTestExpr here as placeholder to pass the result of + * evaluating the JSON_VALUE/QUERY jsonpath expression as input to the + * coercion expression. + */ + CaseTestExpr *placeholder = makeNode(CaseTestExpr); typo in comment, should it be `JSON_VALUE/JSON_QUERY`? [1] https://stackoverflow.com/questions/8012647/can-we-typecast-a-enum-variable-in-c v42-0001-miscellaneous-fix-based-on-v42.no-cfbot Description: Binary data
Re: Add bump memory context type and use it for tuplesorts
On Tue, Mar 5, 2024 at 9:42 AM David Rowley wrote: > performance against the recently optimised aset, generation and slab > contexts. The attached graph shows the time it took in seconds to > allocate 1GB of memory performing a context reset after 1MB. The > function I ran the test on is in the attached > pg_allocate_memory_test.patch.txt file. > > The query I ran was: > > select chksz,mtype,pg_allocate_memory_test_reset(chksz, > 1024*1024,1024*1024*1024, mtype) from (values(8),(16),(32),(64)) > sizes(chksz),(values('aset'),('generation'),('slab'),('bump')) > cxt(mtype) order by mtype,chksz; I ran the test function, but using 256kB and 3MB for the reset frequency, and with 8,16,24,32 byte sizes (patched against a commit after the recent hot/cold path separation). Images attached. I also get a decent speedup with the bump context, but not quite as dramatic as on your machine. It's worth noting that slab is the slowest for me. This is an Intel i7-10750H.
Re: Statistics Import and Export
Hi, On Fri, Mar 08, 2024 at 01:35:40AM -0500, Corey Huinker wrote: > Anyway, here's v7. Eagerly awaiting feedback. Thanks! A few random comments: 1 === +The purpose of this function is to apply statistics values in an +upgrade situation that are "good enough" for system operation until Worth to add a few words about "influencing" the planner use case? 2 === +#include "catalog/pg_type.h" +#include "fmgr.h" Are those 2 needed? 3 === + if (!HeapTupleIsValid(ctup)) + elog(ERROR, "pg_class entry for relid %u vanished during statistics import", s/during statistics import/when setting statistics/? 4 === +Datum +pg_set_relation_stats(PG_FUNCTION_ARGS) +{ . . + table_close(rel, ShareUpdateExclusiveLock); + + PG_RETURN_BOOL(true); Why returning a bool? (I mean we'd throw an error or return true). 5 === + */ +Datum +pg_set_attribute_stats(PG_FUNCTION_ARGS) +{ This function is not that simple, worth to explain its logic in a comment above? 6 === + if (!HeapTupleIsValid(tuple)) + { + relation_close(rel, NoLock); + PG_RETURN_BOOL(false); + } + + attr = (Form_pg_attribute) GETSTRUCT(tuple); + if (attr->attisdropped) + { + ReleaseSysCache(tuple); + relation_close(rel, NoLock); + PG_RETURN_BOOL(false); + } Why is it returning "false" and not throwing an error? (if ok, then I think we can get rid of returning a bool). 7 === +* If this relation is an index and that index has expressions in +* it, and the attnum specified s/is an index and that index has/is an index that has/? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Add Index-level REINDEX with multiple jobs
Andrey M. Borodin писал(а) 2024-03-04 09:26: On 6 Feb 2024, at 11:21, Michael Paquier wrote: The problem may be actually trickier than that, no? Could there be other factors to take into account for their classification, like their sizes (typically, we'd want to process the biggest one first, I guess)? Maxim, what do you think about it? Best regards, Andrey Borodin. I agree that, as is the case with tables in REINDEX SCHEME, indices should be sorted accordingly to their size. Attaching the second version of patch, with indices being sorted by size. Best regards, Svetlana DerevyankoFrom 3c2f0fcbcf382cc2cb9786e26ad8027558937ea2 Mon Sep 17 00:00:00 2001 From: Svetlana Derevyanko Date: Fri, 29 Dec 2023 08:20:54 +0300 Subject: [PATCH v2] Add Index-level REINDEX with multiple jobs Author: Maxim Orlov , Svetlana Derevyanko --- doc/src/sgml/ref/reindexdb.sgml| 3 +- src/bin/scripts/reindexdb.c| 159 ++--- src/bin/scripts/t/090_reindexdb.pl | 8 +- 3 files changed, 151 insertions(+), 19 deletions(-) diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml index 8d9ced212f..dfb5e534fb 100644 --- a/doc/src/sgml/ref/reindexdb.sgml +++ b/doc/src/sgml/ref/reindexdb.sgml @@ -187,8 +187,7 @@ PostgreSQL documentation setting is high enough to accommodate all connections. -Note that this option is incompatible with the --index -and --system options. +Note that this option is incompatible with the --system option. diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c index 6ae30dff31..63d99e7441 100644 --- a/src/bin/scripts/reindexdb.c +++ b/src/bin/scripts/reindexdb.c @@ -251,14 +251,6 @@ main(int argc, char *argv[]) } else { - /* - * Index-level REINDEX is not supported with multiple jobs as we - * cannot control the concurrent processing of multiple indexes - * depending on the same relation. - */ - if (concurrentCons > 1 && indexes.head != NULL) - pg_fatal("cannot use multiple jobs to reindex indexes"); - if (dbname == NULL) { if (getenv("PGDATABASE")) @@ -279,7 +271,7 @@ main(int argc, char *argv[]) if (indexes.head != NULL) reindex_one_database(, REINDEX_INDEX, , progname, echo, verbose, - concurrently, 1, tablespace); + concurrently, concurrentCons, tablespace); if (tables.head != NULL) reindex_one_database(, REINDEX_TABLE, , @@ -299,6 +291,67 @@ main(int argc, char *argv[]) exit(0); } +static SimpleStringList * +sort_indices_by_tables(PGconn *conn, SimpleStringList *user_list, SimpleStringList *indices_tables_list, + SimpleStringList ***indices_process_list, bool echo) +{ + SimpleStringList *process_list = pg_malloc0(sizeof(SimpleStringList)); + SimpleStringListCell *idx_cell, + *idx_table, + *cell; + int ipl_len = 10, +current_tables_num = 0, +i; + + *indices_process_list = pg_malloc0(sizeof(*indices_process_list) * ipl_len); + + for (idx_cell = user_list->head, idx_table = indices_tables_list->head; + idx_cell; + idx_cell = idx_cell->next, idx_table = idx_table->next) + { + SimpleStringList *list_to_add = NULL; + + if (!current_tables_num) + { + simple_string_list_append(process_list, idx_table->val); + (*indices_process_list)[0] = pg_malloc0(sizeof(SimpleStringList)); + current_tables_num++; + } + + cell = process_list->head; + for (i = 0; i < current_tables_num; i++) + { + if (!strcmp(idx_table->val, cell->val)) + { +list_to_add = (*indices_process_list)[i]; +break; + } + + } + + if (!list_to_add) + { + simple_string_list_append(process_list, idx_table->val); + + if (current_tables_num >= ipl_len) + { +ipl_len *= 2; +*indices_process_list = pg_realloc(*indices_process_list, + sizeof(*indices_process_list) * ipl_len); + } + + (*indices_process_list)[current_tables_num] = pg_malloc0(sizeof(SimpleStringList)); + list_to_add = (*indices_process_list)[current_tables_num]; + current_tables_num++; + } + + simple_string_list_append(list_to_add, idx_cell->val); + + } + + return process_list; +} + static void reindex_one_database(ConnParams *cparams, ReindexType type, SimpleStringList *user_list, @@ -310,10 +363,13 @@ reindex_one_database(ConnParams *cparams, ReindexType type, SimpleStringListCell *cell; bool parallel = concurrentCons > 1; SimpleStringList *process_list = user_list; + SimpleStringList **indices_process_list = NULL; + SimpleStringList *indices_tables_list = NULL; ReindexType process_type = type; ParallelSlotArray *sa; bool failed = false; - int items_count = 0; + int items_count = 0, +i = 0; conn = connectDatabase(cparams, progname, echo, false, true); @@ -383,8 +439,23 @@ reindex_one_database(ConnParams *cparams, ReindexType type, return; break; - case REINDEX_SYSTEM: case REINDEX_INDEX: +
Re: Reducing the log spam
- the subscriber's server log. + the subscriber's server log if you remove 23505 from + . This seems like a pretty big regression. Being able to know why your replication got closed seems pretty critical. On Mon, 11 Mar 2024 at 03:44, Laurenz Albe wrote: > > On Sat, 2024-03-09 at 14:03 +0100, Laurenz Albe wrote: > > Here is a patch that implements this. > > And here is patch v2 that fixes a bug and passes the regression tests. > > Yours, > Laurenz Albe
Fix expecteddir argument in pg_regress
Hi all! pg_regress accepts the expecteddir argument. However, it is never used and outputdir is used instead to get the expected files paths. This patch fixes this and uses the expecteddir argument as expected. Regards, Anthonin v01-0001-pg_regress-Use-expecteddir-for-the-expected-file.patch Description: Binary data
[PROPOSAL] Skip test citext_utf8 on Windows
Greetings, everyone! While running "installchecks" on databases with UTF-8 encoding the test citext_utf8 fails because of Turkish dotted I like this: SELECT 'i'::citext = 'İ'::citext AS t; t --- - t + f (1 row) I tried to replicate the test's results by hand and with any collation that I tried (including --locale="Turkish") this test failed Also an interesing result of my tesing. If you initialize you DB with -E utf-8 --locale="Turkish" and then run select LOWER('İ'); the output will be this: lower --- İ (1 row) Which I find strange since lower() uses collation that was passed (default in this case but still) My PostgreSQL version is this: postgres=# select version(); version -- PostgreSQL 17devel on x86_64-windows, compiled by gcc-13.1.0, 64-bit The proposed patch for skipping test is attached Oleg Tselebrovskiy, Postgres Proÿþd i f f - - g i t a / c o n t r i b / c i t e x t / e x p e c t e d / c i t e x t _ u t f 8 . o u t b / c o n t r i b / c i t e x t / e x p e c t e d / c i t e x t _ u t f 8 . o u t i n d e x 5 d 9 8 8 d c d 4 8 5 . . 6 c 4 0 6 9 f 9 4 6 9 1 0 0 6 4 4 - - - a / c o n t r i b / c i t e x t / e x p e c t e d / c i t e x t _ u t f 8 . o u t + + + b / c o n t r i b / c i t e x t / e x p e c t e d / c i t e x t _ u t f 8 . o u t @ @ - 1 0 , 7 + 1 0 , 8 @ @ S E L E C T g e t d a t a b a s e e n c o d i n g ( ) <