Re: pg_rewind WAL segments deletion pitfall
At Wed, 31 Aug 2022 14:30:31 +0900 (JST), Kyotaro Horiguchi wrote in
> What do you think about that?

By the way, shouldn't you add a CF entry for this?

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: pg_rewind WAL segments deletion pitfall
At Tue, 30 Aug 2022 11:01:58 +0200, Alexander Kukushkin wrote in
> On Tue, 30 Aug 2022 at 10:27, Kyotaro Horiguchi wrote:
> >
> > Hmm. Doesn't it work to ignore tli then? All segments whose
> > segment number is equal to or larger than the checkpoint location are
> > preserved regardless of TLI?
>
> If we ignore TLI there is a chance that we may retain some unnecessary (or
> just wrong) files.

Right. I mean I don't think that's a problem, and we can rely on postgres itself for later cleanup. Theoretically some out-of-range tli or segno files are left alone, but they surely will be gone soon after the server starts.

> > > Also, we need to take into account the divergency LSN. Files after it are
> > > not required.
> >
> > They are removed at the later checkpoints. But also we can remove
> > segments that are out of the range between the last common checkpoint
> > and divergence point, ignoring TLI.
>
> Everything that is newer than last_common_checkpoint_seg could be removed (but
> it already happens automatically, because these files are missing on the
> new primary).
> WAL files that are older than last_common_checkpoint_seg could be either
> removed or at least not copied from the new primary.
..
> The current implementation relies on tracking WAL files being open while
> searching for the last common checkpoint. It automatically starts from the
> divergence_seg, automatically finishes at last_common_checkpoint_seg, and
> last but not least, automatically handles timeline changes. I don't think
> that manually written code that decides what to do from the WAL file name
> (and also takes into account TLI) could be much simpler than the current
> approach.

Yeah, I know. My expectation is to take the simplest way to the same effect. My concern was the additional hash. On second thought, I concluded that we should do that on the existing filehash. We can just add a FILE_ACTION_NONE entry to the file hash from SimpleXLogPageRead.
Since this happens before the decide_file_action() call, decide_file_action() should ignore entries with FILE_ACTION_NONE. Also, we need to call filehash_init() earlier.

> Actually, since we start doing some additional "manipulations" with files
> in pg_wal, we probably should do a symmetric action with files inside
> pg_wal/archive_status

In that sense, pg_rewind rather should place missing archive_status/*.done files for segments, including restored ones, seen while finding the checkpoint. This is analogous to the behavior of pg_basebackup and pg_receivewal. Also, we should add FILE_ACTION_NONE entries for the .done files of segments read while finding the checkpoint.

What do you think about that?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
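To sketch the bookkeeping proposed above, here is a minimal Python model. The names FILE_ACTION_NONE, decide_file_action() and SimpleXLogPageRead mirror pg_rewind's C code, but the dict-based "filehash" and both function signatures here are simplified stand-ins, not the real implementation:

```python
# Hypothetical sketch: segments seen while searching for the last common
# checkpoint get a FILE_ACTION_NONE entry in the shared file hash, and
# decide_file_action() then leaves those entries alone instead of
# scheduling a copy or a removal for them.

FILE_ACTION_NONE = "none"
FILE_ACTION_COPY = "copy"
FILE_ACTION_REMOVE = "remove"

filehash = {}

def mark_segment_read(path):
    """Called from the WAL reader (SimpleXLogPageRead in pg_rewind)."""
    # Only pre-mark if nothing has decided on this file yet.
    filehash.setdefault(path, FILE_ACTION_NONE)

def decide_file_action(path, exists_on_source):
    """Skip entries pre-marked as NONE; otherwise decide normally."""
    if filehash.get(path) == FILE_ACTION_NONE:
        return FILE_ACTION_NONE
    action = FILE_ACTION_COPY if exists_on_source else FILE_ACTION_REMOVE
    filehash[path] = action
    return action
```

The point of the sketch is the ordering constraint from the mail: the NONE entries must be in place (and the hash initialized) before decide_file_action() runs, or the segments read while finding the checkpoint would be scheduled for removal like any other target-only file.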
Re: New strategies for freezing, advancing relfrozenxid early
On Tue, Aug 30, 2022 at 9:37 PM Jeff Davis wrote:
> On Tue, 2022-08-30 at 18:50 -0700, Peter Geoghegan wrote:
> > Since VM snapshots are immutable, it should be relatively
> > easy to have the implementation make its final decision on skipping
> > only *after* lazy_scan_heap() returns.
>
> I like this idea.

I was hoping that you would. I imagine that this idea (with minor variations) could enable an approach that's much closer to what you were thinking of: one that mostly focuses on controlling the number of unfrozen pages, and not so much on advancing relfrozenxid early, just because we can and we might not get another chance for a long time. In other words, your idea of a design that can freeze more during a non-aggressive VACUUM, while still potentially skipping all-visible pages.

I said earlier on that we ought to at least have a strong bias in the direction of advancing relfrozenxid in larger tables, especially when we decide to freeze whole pages more eagerly -- we only get one chance to advance relfrozenxid per VACUUM, and those opportunities will naturally be few and far between. We cannot really justify all this extra freezing if it doesn't prevent antiwraparound autovacuums. That was more or less my objection to going in that direction.

But if we can more or less double the number of opportunities to at least ask the question "is now a good time to advance relfrozenxid?" without really paying much for keeping this option open (and I think that we can), my concern about relfrozenxid advancement becomes far less important. Just being able to ask that question is significantly less rare and precious. Plus we'll probably be able to make significantly better decisions about relfrozenxid overall with the "second phase because I changed my mind about skipping" concept in place.

--
Peter Geoghegan
Re: New strategies for freezing, advancing relfrozenxid early
On Tue, 2022-08-30 at 18:50 -0700, Peter Geoghegan wrote:
> Since VM snapshots are immutable, it should be relatively
> easy to have the implementation make its final decision on skipping
> only *after* lazy_scan_heap() returns.

I like this idea.

Regards,
Jeff Davis
Re: Schema variables - new implementation for Postgres 15
On Wed 24. 8. 2022 at 10:04, Erik Rijkers wrote:
> On 24-08-2022 at 08:37, Pavel Stehule wrote:
> >>
> >
> > I fixed these.
> >
> > [v20220824-1-*.patch]
>
> Hi Pavel,
>
> I noticed just now that variable assignment (i.e., LET) unexpectedly
> (for me anyway) casts the type of the input value. Surely that's wrong?
> The documentation says clearly enough:
>
> 'The result must be of the same data type as the session variable.'
>
> Example:
>
> create variable x integer;
> let x=1.5;
> select x, pg_typeof(x);
>  x | pg_typeof
> ---+-----------
>  2 | integer
> (1 row)
>
> Is this correct?
>
> If such casts (there are several) are intended then the text of the
> documentation should be changed.

I changed this:

@@ -58,8 +58,9 @@
 LET session_variable = DEFAULT
 sql_expression
-      An SQL expression, in parentheses. The result must be of the same data type as the session
-      variable.
+      An SQL expression (can be a subquery in parentheses). The result must
+      be castable to the same data type as the session variable (in an
+      implicit or assignment context).

Is it ok?

Regards

Pavel

> Thanks,
>
> Erik
Re: [PATCH] Optimize json_lex_string by batching character copying
On Wed, Aug 31, 2022 at 10:50:39AM +0700, John Naylor wrote:
> Here's the final piece. I debated how many tests to add and decided it
> was probably enough to add one each for checking quotes and
> backslashes in the fast path. There is one cosmetic change in the
> code: Before, the vectorized less-equal check compared to 0x1F, but
> the byte-wise path did so with < 32. I made them both "less-equal 31"
> for consistency. I'll commit this by the end of the week unless anyone
> has a better idea about testing.

LGTM

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: effective_multixact_freeze_max_age issue
On Tue, Aug 30, 2022 at 05:24:17PM -0700, Peter Geoghegan wrote:
> Attached is v2, which cleans up the structure of
> vacuum_set_xid_limits() a bit more. The overall idea was to improve
> the overall flow/readability of the function by moving the WARNINGs
> into their own "code stanza", just after the logic for establishing
> freeze cutoffs and just before the logic for deciding on
> aggressiveness. That is now the more logical approach (group the
> stanzas by functionality), since we can't sensibly group the code
> based on whether it deals with XIDs or with Multis anymore (not since
> the function was taught to deal with the question of whether caller's
> VACUUM will be aggressive).
>
> Going to push this in the next day or so.

LGTM

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Optimize json_lex_string by batching character copying
On Tue, Aug 23, 2022 at 1:03 PM John Naylor wrote:
>
> LGTM overall. My plan is to split out the json piece, adding tests for
> that, and commit the infrastructure for it fairly soon.

Here's the final piece. I debated how many tests to add and decided it was probably enough to add one each for checking quotes and backslashes in the fast path. There is one cosmetic change in the code: Before, the vectorized less-equal check compared to 0x1F, but the byte-wise path did so with < 32. I made them both "less-equal 31" for consistency. I'll commit this by the end of the week unless anyone has a better idea about testing.

--
John Naylor
EDB: http://www.enterprisedb.com

From f1159dcc2044edb107e0dfeae5e8f3c7feb10cd2 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Wed, 31 Aug 2022 10:39:17 +0700
Subject: [PATCH v10] Optimize JSON lexing of long strings

Use optimized linear search when looking ahead for end quotes,
backslashes, and non-printable characters. This results in nearly 40%
faster JSON parsing on x86-64 when most values are long strings, and
all platforms should see some improvement.
Reviewed by Andres Freund and Nathan Bossart
Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com
---
 src/common/jsonapi.c               | 13 +++++++++---
 src/test/regress/expected/json.out | 13 +++++++++++++
 src/test/regress/sql/json.sql      |  5 +++++
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index fefd1d24d9..cfc025749c 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -19,6 +19,7 @@

 #include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+#include "port/pg_lfind.h"

 #ifndef FRONTEND
 #include "miscadmin.h"
@@ -844,7 +845,7 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else
 		{
-			char	   *p;
+			char	   *p = s;

 			if (hi_surrogate != -1)
 				return JSON_UNICODE_LOW_SURROGATE;
@@ -853,11 +854,17 @@ json_lex_string(JsonLexContext *lex)
 			 * Skip to the first byte that requires special handling, so we
 			 * can batch calls to appendBinaryStringInfo.
 			 */
-			for (p = s; p < end; p++)
+			while (p < end - sizeof(Vector8) &&
+				   !pg_lfind8('\\', (uint8 *) p, sizeof(Vector8)) &&
+				   !pg_lfind8('"', (uint8 *) p, sizeof(Vector8)) &&
+				   !pg_lfind8_le(31, (uint8 *) p, sizeof(Vector8)))
+				p += sizeof(Vector8);
+
+			for (; p < end; p++)
 			{
 				if (*p == '\\' || *p == '"')
 					break;
-				else if ((unsigned char) *p < 32)
+				else if ((unsigned char) *p <= 31)
 				{
 					/* Per RFC4627, these characters MUST be escaped. */
 					/*
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index e9d6e9faf2..cb181226e9 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -42,6 +42,19 @@ LINE 1: SELECT '"\v"'::json;
                      ^
 DETAIL:  Escape sequence "\v" is invalid.
 CONTEXT:  JSON data, line 1: "\v...
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+       json
+-------------------
+ "............abc"
+(1 row)
+
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+        json
+---------------------
+ "............abc\n"
+(1 row)
+
 -- see json_encoding test for input with unicode escapes
 -- Numbers.
 SELECT '1'::json;				-- OK
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
index e366c6f51b..589e0cea36 100644
--- a/src/test/regress/sql/json.sql
+++ b/src/test/regress/sql/json.sql
@@ -7,6 +7,11 @@ SELECT '"abc
 def"'::json;					-- ERROR, unescaped newline in string constant
 SELECT '"\n\"\\"'::json;		-- OK, legal escapes
 SELECT '"\v"'::json;			-- ERROR, not a valid JSON escape
+
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+
 -- see json_encoding test for input with unicode escapes
 -- Numbers.
--
2.36.1
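The fast path in the patch above can be illustrated with a short Python sketch: skip a 16-byte chunk at a time unless it contains a quote, a backslash, or a control byte (<= 31), then fall back to byte-wise scanning. The chunked loop below emulates what pg_lfind8()/pg_lfind8_le() do with SIMD in C; everything else is a simplified stand-in, not the real lexer:

```python
CHUNK = 16  # stand-in for sizeof(Vector8)

def skip_plain_bytes(buf, start, end):
    """Return the index of the first byte needing special handling."""
    p = start
    # Fast path: advance a whole chunk at a time as long as the chunk
    # contains no backslash, no double quote, and no control byte.
    while p <= end - CHUNK:
        chunk = buf[p:p + CHUNK]
        if b'\\' in chunk or b'"' in chunk or any(c <= 31 for c in chunk):
            break
        p += CHUNK
    # Slow path: byte-at-a-time for the tail, or for the chunk that
    # contained a special byte somewhere inside it.
    while p < end:
        c = buf[p]
        if c == ord('\\') or c == ord('"') or c <= 31:
            break
        p += 1
    return p
```

As in the C version, a chunk that trips any of the three checks is simply re-scanned byte-wise, so correctness never depends on where inside the chunk the special byte sits.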
Re: Stack overflow issue
On Wed, Aug 31, 2022 at 6:57 AM Tom Lane wrote:
> I wrote:
> > The upstream recommendation, which seems pretty sane to me, is to
> > simply reject any string exceeding some threshold length as not
> > possibly being a word. Apparently it's common to use thresholds
> > as small as 64 bytes, but in the attached I used 1000 bytes.
>
> On further thought: that coding treats anything longer than 1000
> bytes as a stopword, but maybe we should just accept it unmodified.
> The manual says "A Snowball dictionary recognizes everything, whether
> or not it is able to simplify the word". While "recognizes" formally
> includes the case of "recognizes as a stopword", people might find
> this behavior surprising. We could alternatively do it as attached,
> which accepts overlength words but does nothing to them except
> case-fold. This is closer to the pre-patch behavior, but gives up
> the opportunity to avoid useless downstream processing of long words.

This patch looks good to me. It avoids overly-long words (> 1000 bytes) going through the stemmer, so the stack overflow issue in the Turkish stemmer should not exist any more.

Thanks
Richard
Re: New strategies for freezing, advancing relfrozenxid early
On Tue, Aug 30, 2022 at 1:45 PM Peter Geoghegan wrote:
> > d. Can you help give me a sense of scale of the problems solved by
> > visibilitymap snapshots and the cost model? Do those need to be in v1?
>
> I'm not sure. I think that having certainty that we'll be able to scan
> only so many pages up-front is very broadly useful, though. Plus it
> removes the SKIP_PAGES_THRESHOLD stuff, which was intended to enable
> relfrozenxid advancement in non-aggressive VACUUMs, but does so in a
> way that results in scanning many more pages needlessly. See commit
> bf136cf6, which added the SKIP_PAGES_THRESHOLD stuff back in 2009,
> shortly after the visibility map first appeared.

Here is a better example:

Right now the second patch adds both VM snapshots and the skipping strategy stuff. The VM snapshot is used in the second patch, as a source of reliable information about how we need to process the table, in terms of the total number of scanned_pages -- which drives our choice of strategy. Importantly, we can assess the question of which skipping strategy to take (in non-aggressive VACUUM) based on 100% accurate information about how many *extra* pages we'll have to scan in the event of being eager (i.e. in the event that we prioritize early relfrozenxid advancement over skipping some pages). Importantly, that cannot change later on, since VM snapshots are immutable -- everything is locked in.

That already seems quite valuable to me. This general concept could be pushed a lot further without great difficulty. Since VM snapshots are immutable, it should be relatively easy to have the implementation make its final decision on skipping only *after* lazy_scan_heap() returns. We could allow VACUUM to "change its mind about skipping" in cases where it initially thought that skipping was the best strategy, only to discover much later on that that was the wrong choice after all. A huge amount of new, reliable information will come to light from scanning the heap rel.
In particular, the current value of vacrel->NewRelfrozenXid seems like it would be particularly interesting when the time came to consider if a second scan made sense -- if NewRelfrozenXid is a recent-ish value already, then that argues for finishing off the all-visible pages in a second heap pass, with the aim of setting relfrozenxid to a similarly recent value when it happens to be cheap to do so.

The actual process of scanning precisely those all-visible pages that were skipped the first time around during a second call to lazy_scan_heap() can be implemented in the obvious way: by teaching the VM snapshot infrastructure/lazy_scan_skip() to treat pages that were skipped the first time around to get scanned during the second pass over the heap instead. Also, those pages that were scanned the first time around can/must be skipped on our second pass (excluding all-frozen pages, which won't be scanned in either heap pass).

I've used the term "second heap pass" here, but that term is slightly misleading. The final outcome of this whole process is that every heap page that the vmsnap says VACUUM will need to scan in order for it to be able to safely advance relfrozenxid will be scanned, precisely once. The overall order that the heap pages are scanned in will of course differ from the simple case, but I don't think that it makes very much difference. In reality there will have only been one heap pass, consisting of two distinct phases. No individual heap page will ever be considered for pruning/freezing more than once, no matter what. This is just a case of *reordering* work. Immutability makes reordering work easy in general.

--
Peter Geoghegan
Re: [PATCH] Add native windows on arm64 support
Michael Paquier writes:
> Hence, I am wondering if the solution here
> would be to do one xmlXPathNodeSetSort(xpathobj->nodesetval) after
> compiling the path with xmlXPathCompiledEval() in XmlTableGetValue().

Hmm, I see that function declared in <libxml/xpathInternals.h>, which sure doesn't give me a warm feeling that we're supposed to call it directly. But maybe there's an approved way to get the result.

Or perhaps this test case is wrong, and instead of "node()" we need to write something that specifies a sorted result?

regards, tom lane
Re: effective_multixact_freeze_max_age issue
On Mon, Aug 29, 2022 at 3:40 PM Nathan Bossart wrote:
> Agreed.

Attached is v2, which cleans up the structure of vacuum_set_xid_limits() a bit more. The overall idea was to improve the overall flow/readability of the function by moving the WARNINGs into their own "code stanza", just after the logic for establishing freeze cutoffs and just before the logic for deciding on aggressiveness. That is now the more logical approach (group the stanzas by functionality), since we can't sensibly group the code based on whether it deals with XIDs or with Multis anymore (not since the function was taught to deal with the question of whether caller's VACUUM will be aggressive).

Going to push this in the next day or so.

I also removed some local variables that seem to make the function a lot harder to follow in v2. Consider code like this:

-	freezemin = freeze_min_age;
-	if (freezemin < 0)
-		freezemin = vacuum_freeze_min_age;
-	freezemin = Min(freezemin, autovacuum_freeze_max_age / 2);
-	Assert(freezemin >= 0);
+	if (freeze_min_age < 0)
+		freeze_min_age = vacuum_freeze_min_age;
+	freeze_min_age = Min(freeze_min_age, autovacuum_freeze_max_age / 2);
+	Assert(freeze_min_age >= 0);

Why have this freezemin temp variable? Why not just use the freeze_min_age function parameter directly instead? That is a better representation of what's going on at the conceptual level. We now assign vacuum_freeze_min_age to the freeze_min_age arg (not to the freezemin variable) when our VACUUM caller passes us a value of -1 for that arg. -1 effectively means "whatever the value of the vacuum_freeze_min_age GUC is'', which is clearer without the superfluous freezemin variable.

--
Peter Geoghegan

Attachment: v2-0001-Derive-freeze-cutoff-from-nextXID-not-OldestXmin.patch
Re: [PATCH] Add native windows on arm64 support
On Tue, Aug 30, 2022 at 08:07:27PM -0400, Tom Lane wrote:
> It seems like maybe we're relying on an ordering we should not.
> Yet surely the ordering of the pieces of this output is meaningful?
> Are we using the wrong libxml API to create this result?
>
> Anyway, I'm now suspicious that we've accidentally exposed a logic
> bug in the XMLTABLE code, rather than anything wrong with the
> ASLR stuff.

Funny. I have reached basically the same conclusion as you a couple of minutes ago, but I also think that I have found what we need to do here to ensure the ordering of the nodes generated by the libxml code.

--
Michael
Re: [PATCH] Add native windows on arm64 support
On Wed, Aug 31, 2022 at 08:36:01AM +0900, Michael Paquier wrote:
> There have been more failures, always switching the input from
> "predeeppost" to "predeeppost".
>
> Using a PATH of node() influences the output. I am not versed in
> XMLTABLE, but could it be an issue where each node is parsed and we
> have something like a qsort() applied on the pointer addresses for
> each part or something like that, causing the output to become
> unstable?

Hmm. I think that I may have an idea here after looking at our xml.c and at xpath.c in libxml2/. From what I understand, we process the PATH through XmlTableGetValue(), which builds an XML node path in xmlXPathCompiledEval(). The interesting part comes from libxml2's xmlXPathCompiledEvalInternal(), where I think we don't apply a sort on the contents generated.

Hence, I am wondering if the solution here would be to do one xmlXPathNodeSetSort(xpathobj->nodesetval) after compiling the path with xmlXPathCompiledEval() in XmlTableGetValue(). This should ensure that the items are ordered even if ASLR mixes up the pointer positions. A complete solution would involve more code paths, but we'd need only one change in XmlTableGetValue() for the current regression tests to work.

I don't have an environment where I can reproduce that, so that would be up to the buildfarm to stress this solution..

Thoughts?

--
Michael
Re: [PATCH] Add native windows on arm64 support
Michael Paquier writes:
> Using a PATH of node() influences the output. I am not versed in
> XMLTABLE, but could it be an issue where each node is parsed and we
> have something like a qsort() applied on the pointer addresses for
> each part or something like that, causing the output to become
> unstable?

I'm not sure. I dug into this enough to find that the output from this query is generated in xml.c lines 4635ff:

	/* Concatenate serialized values */
	initStringInfo(&str);
	for (int i = 0; i < count; i++)
	{
		textstr = xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
									   xtCxt->xmlerrcxt);
		appendStringInfoText(&str, textstr);
	}
	cstr = str.data;

So evidently, the problem occurs because the elements of the nodesetval->nodeTab[] array are in a different order than we expect. I looked into the libxml headers and found the definition of the "nodesetval" struct, and the comments are eye-opening to say the least:

	/*
	 * A node-set (an unordered collection of nodes without duplicates).
	 */
	typedef struct _xmlNodeSet xmlNodeSet;
	typedef xmlNodeSet *xmlNodeSetPtr;
	struct _xmlNodeSet {
		int			nodeNr;		/* number of nodes in the set */
		int			nodeMax;	/* size of the array as allocated */
		xmlNodePtr *nodeTab;	/* array of nodes in no particular order */
		/* @@ with_ns to check wether namespace nodes should be looked at @@ */
	};

It seems like maybe we're relying on an ordering we should not. Yet surely the ordering of the pieces of this output is meaningful? Are we using the wrong libxml API to create this result?

Anyway, I'm now suspicious that we've accidentally exposed a logic bug in the XMLTABLE code, rather than anything wrong with the ASLR stuff.

regards, tom lane
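The diagnosis above can be modeled in a few lines: if the serializer walks nodeTab[] in whatever order the nodes happen to be stored (an order that can track pointer values, and hence vary under ASLR), the concatenated output is unstable, while sorting by document position first makes it deterministic. This is a hypothetical model, not libxml2 code; "addr" stands in for a pointer value:

```python
from dataclasses import dataclass

@dataclass
class Node:
    doc_order: int   # position of the node in the document
    addr: int        # simulated pointer value (varies run-to-run under ASLR)
    text: str        # serialized form of the node

def serialize(node_tab, sort_by_doc_order):
    """Concatenate node texts, optionally sorting by document order first."""
    if sort_by_doc_order:
        node_tab = sorted(node_tab, key=lambda n: n.doc_order)
    return "".join(n.text for n in node_tab)

# Two runs of "the same query": identical nodes, but the node-set array
# ends up in a different (address-dependent) order the second time.
run1 = [Node(0, 0x1000, "<pre/>"), Node(1, 0x2000, "<deep/>"), Node(2, 0x3000, "<post/>")]
run2 = [Node(1, 0x0500, "<deep/>"), Node(0, 0x0900, "<pre/>"), Node(2, 0x0100, "<post/>")]
```

Without the sort the two runs concatenate in different orders; with it they both yield the document-order result, which is the effect the proposed xmlXPathNodeSetSort() call would have.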
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Here are some review comments for the patch v9-0001:

======
1. Commit message

1a.
With this patch, I'm proposing the following change: If there is an
index on the subscriber, use the index as long as the planner
sub-modules picks any index over sequential scan. The index should be
a btree index, not a partital index. Finally, the index should have at
least one column reference (e.g., cannot consists of only expressions).

SUGGESTION
With this patch, I'm proposing the following change: If there is any
index on the subscriber, let the planner sub-modules compare the costs
of index versus sequential scan and choose the cheapest. The index
should be a btree index, not a partial index, and it should have at
least one column reference (e.g., cannot consist of only expressions).

~

1b.
The Majority of the logic on the subscriber side exists in the code.

"exists" -> "already exists"

~

1c.
psql -c "truncate pgbench_accounts;" -p 9700 postgres

"truncate" -> "TRUNCATE"

~

1d.
Try to wrap this message text at 80 char width.

======
2. src/backend/replication/logical/relation.c - logicalrep_rel_open

+	/*
+	 * Finding a usable index is an infrequent task. It is performed
+	 * when an operation is first performed on the relation, or after
+	 * invalidation of the relation cache entry (e.g., such as ANALYZE).
+	 */
+	entry->usableIndexOid = LogicalRepUsableIndex(entry->localrel, remoterel);

Seemed a bit odd to say "performed" 2x in the same sentence.

"It is performed when..." -> "It occurs when..." (?)

~~~

3. src/backend/replication/logical/relation.c - logicalrep_partition_open

+	/*
+	 * Finding a usable index is an infrequent task. It is performed
+	 * when an operation is first performed on the relation, or after
+	 * invalidation of the relation cache entry (e.g., such as ANALYZE).
+	 */
+	part_entry->relmapentry.usableIndexOid =
+		LogicalRepUsableIndex(partrel, remoterel);

3a.
Same as comment #2 above.

~

3b.
The jumping between 'part_entry' and 'entry' is confusing. Since
'entry' is already assigned to be &part_entry->relmapentry, can't you
use that here?

SUGGESTION
entry->usableIndexOid = LogicalRepUsableIndex(partrel, remoterel);

~~~

4. src/backend/replication/logical/relation.c - GetIndexOidFromPath

+/*
+ * Returns a valid index oid if the input path is an index path.
+ * Otherwise, return invalid oid.
+ */
+static Oid
+GetIndexOidFromPath(Path *path)

Perhaps make this function comment more consistent with others (like
GetRelationIdentityOrPK, LogicalRepUsableIndex) and refer to
InvalidOid.

SUGGESTION
/*
 * Returns a valid index oid if the input path is an index path.
 *
 * Otherwise, returns InvalidOid.
 */

~~~

5. src/backend/replication/logical/relation.c - IndexOnlyOnExpression

+bool
+IndexOnlyOnExpression(IndexInfo *indexInfo)
+{
+	int			i;
+	for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+	{
+		AttrNumber	attnum = indexInfo->ii_IndexAttrNumbers[i];
+		if (AttributeNumberIsValid(attnum))
+			return false;
+	}
+
+	return true;
+}

5a.
Add a blank line after those declarations.

~

5b.
AFAIK the C99 style for loop declarations should be OK [1] for new
code, so declaring like below would be cleaner:

for (int i = 0; ...

~~~

6. src/backend/replication/logical/relation.c - FilterOutNotSuitablePathsForReplIdentFull

+/*
+ * Iterates over the input path list and returns another path list
+ * where paths with non-btree indexes, partial indexes or
+ * indexes on only expressions are eliminated from the list.
+ */
+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)

"are eliminated from the list." -> "have been removed."

~~~

7.

+	foreach(lc, pathlist)
+	{
+		Path	   *path = (Path *) lfirst(lc);
+		Oid			indexOid = GetIndexOidFromPath(path);
+		Relation	indexRelation;
+		IndexInfo  *indexInfo;
+		bool		is_btree;
+		bool		is_partial;
+		bool		is_only_on_expression;
+
+		if (!OidIsValid(indexOid))
+		{
+			/* Unrelated Path, skip */
+			suitableIndexList = lappend(suitableIndexList, path);
+		}
+		else
+		{
+			indexRelation = index_open(indexOid, AccessShareLock);
+			indexInfo = BuildIndexInfo(indexRelation);
+			is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+			is_partial = (indexInfo->ii_Predicate != NIL);
+			is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+			index_close(indexRelation, NoLock);
+
+			if (is_btree && !is_partial && !is_only_on_expression)
+				suitableIndexList = lappend(suitableIndexList, path);
+		}
+	}

I think most of those variables are only used in the "else" block so
maybe it's better to declare them at that scope.

+			Relation	indexRelation;
+			IndexInfo  *indexInfo;
+			bool		is_btree;
+			bool		is_partial;
+			bool		is_only_on_expression;

~~~

8. src/backend/replication/logical/relation.c - GetCheapestReplicaIdentityFullPath

+	 * Indexes that consists of only expressions (e.g.,
+	 * no simple column references on the index) are also
+	 * eliminated with a similar reasoning.

"consists" -> "consist"
"with a similar reasoning" -> "with similar reasoning"

~~~

9.

+ * We also eliminate
Re: [PATCH] Add native windows on arm64 support
On Mon, Aug 29, 2022 at 10:00:10PM -0400, Tom Lane wrote:
> Michael Paquier writes:
> > On Mon, Aug 29, 2022 at 05:33:56PM -0700, Andres Freund wrote:
> > > The weirdest part is that it only happens as part of the pg_upgrade
> > > test.
> >
> > make check has just failed:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-30%2001%3A15%3A13
>
> So it *is* probabilistic, which is pretty much what you'd expect
> if ASLR triggers it. That brings us no closer to understanding
> what the mechanism is, though.

There have been more failures, always switching the input from "predeeppost" to "predeeppost".

Using a PATH of node() influences the output. I am not versed in XMLTABLE, but could it be an issue where each node is parsed and we have something like a qsort() applied on the pointer addresses for each part or something like that, causing the output to become unstable?

--
Michael
Re: Stack overflow issue
I wrote:
> The upstream recommendation, which seems pretty sane to me, is to
> simply reject any string exceeding some threshold length as not
> possibly being a word. Apparently it's common to use thresholds
> as small as 64 bytes, but in the attached I used 1000 bytes.

On further thought: that coding treats anything longer than 1000 bytes as a stopword, but maybe we should just accept it unmodified. The manual says "A Snowball dictionary recognizes everything, whether or not it is able to simplify the word". While "recognizes" formally includes the case of "recognizes as a stopword", people might find this behavior surprising. We could alternatively do it as attached, which accepts overlength words but does nothing to them except case-fold. This is closer to the pre-patch behavior, but gives up the opportunity to avoid useless downstream processing of long words.

regards, tom lane

diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c
index 68c9213f69..1d5dfff5a0 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -275,8 +275,24 @@ dsnowball_lexize(PG_FUNCTION_ARGS)
 	char	   *txt = lowerstr_with_len(in, len);
 	TSLexeme   *res = palloc0(sizeof(TSLexeme) * 2);

-	if (*txt == '\0' || searchstoplist(&(d->stoplist), txt))
+	/*
+	 * Do not pass strings exceeding 1000 bytes to the stemmer, as they're
+	 * surely not words in any human language. This restriction avoids
+	 * wasting cycles on stuff like base64-encoded data, and it protects us
+	 * against possible inefficiency or misbehavior in the stemmer. (For
+	 * example, the Turkish stemmer has an indefinite recursion, so it can
+	 * crash on long-enough strings.) However, Snowball dictionaries are
+	 * defined to recognize all strings, so we can't reject the string as an
+	 * unknown word.
+	 */
+	if (len > 1000)
+	{
+		/* return the lexeme lowercased, but otherwise unmodified */
+		res->lexeme = txt;
+	}
+	else if (*txt == '\0' || searchstoplist(&(d->stoplist), txt))
 	{
+		/* empty or stopword, so report as stopword */
 		pfree(txt);
 	}
 	else
Re: Reducing the chunk header sizes on all memory context types
On Wed, 31 Aug 2022 at 02:17, Tom Lane wrote:
>
> I wrote:
> > So maybe we should revisit the question. It'd be worth collecting some
> > stats about how much extra space would be needed if we force there
> > to be room for a sentinel.
>
> Actually, after ingesting more caffeine, the problem with this for aset.c
> is that the only way to add space for a sentinel that didn't fit already
> is to double the space allocation. That's a little daunting, especially
> remembering how many places deliberately allocate power-of-2-sized
> arrays.

I decided to try and quantify that by logging the size, MAXALIGN(size) and the power-of-2 size during AllocSetAlloc and GenerationAlloc. I made the pow2_size 0 in GenerationAlloc and in AllocSetAlloc when size > allocChunkLimit.

After running make installcheck, grabbing the records out of the log and loading them into Postgres, I see that if we did double the pow2_size when there's no space for the sentinel byte then we'd go from allocating a total of 10.2GB all the way to 16.4GB (!) of non-dedicated block aset.c allocations.

select round(sum(pow2_size)::numeric/1024/1024/1024,3) as pow2_size,
       round(sum(case when maxalign_size=pow2_size then pow2_size*2
                      else pow2_size end)::numeric/1024/1024/1024,3) as method1,
       round(sum(case when maxalign_size=pow2_size then pow2_size+8
                      else pow2_size end)::numeric/1024/1024/1024,3) as method2
from memstats
where pow2_size > 0;

 pow2_size | method1 | method2
-----------+---------+---------
    10.194 |  16.382 |  10.463

If we did just add on an extra 8 bytes (or MAXALIGN(size+1) at least), then that would take the size up to 10.5GB.

David
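The arithmetic behind the two methods compared in the query above can be restated in a small sketch. The function names, the next_pow2() helper, and the maxalign=8 assumption are illustrative only, not aset.c code:

```python
def next_pow2(n):
    """Round n up to the next power of two (chunk sizing in aset.c)."""
    p = 1
    while p < n:
        p *= 2
    return p

def chunk_sizes(request, maxalign=8):
    """Return (pow2 chunk, method-1 size, method-2 size) for a request."""
    size = -(-request // maxalign) * maxalign  # MAXALIGN(request)
    pow2 = next_pow2(size)
    # Method 1: when the request exactly fills its power-of-2 chunk,
    # there is no slack for a sentinel byte, so double the chunk.
    method1 = pow2 * 2 if size == pow2 else pow2
    # Method 2: instead add just one MAXALIGN quantum (8 bytes here).
    method2 = pow2 + maxalign if size == pow2 else pow2
    return pow2, method1, method2
```

Power-of-2-sized requests are exactly the ones penalized: a 64-byte request doubles to 128 bytes under method 1 but only grows to 72 under method 2, while a 50-byte request already has slack and costs nothing extra either way, which is why the aggregate jumps from 10.2GB to 16.4GB under method 1 but only to 10.5GB under method 2.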
Re: Comma-separated predicates in simple CASE expressions (f263)
> On 31 Aug 2022, at 00:20, Tom Lane wrote: > > Daniel Gustafsson writes: >> I was looking at F263 from the SQL standard, Comma-separated predicates in >> simple CASE expression, and thinking if we could support this within the >> framework we already have at a minimal added cost. The attached sketch diff >> turns each predicate in the list into a CaseWhen node and uses the location >> from parsing for grouping in errorhandling for searched case. > >> Is this a viable approach or am I missing something obvious? Thanks for looking! > I don't particularly like duplicating the THEN clause multiple times. > I think if we're going to do this we should do it right, and that > means a substantially larger patch to propagate the notion of multiple > comparison values all the way down. Fair enough, I think that's doable without splitting the simple and searched case in the parser which I think would be a good thing to avoid. I'll take a stab at it. > I also don't care for the bit in transformCaseExpr where you seem > to be relying on subexpression location fields to make semantic > decisions. Surely there's a better way. If we group the predicates such a single node contains the full list then we'll have all the info we need at that point. -- Daniel Gustafsson https://vmware.com/
Re: introduce bufmgr hooks
Thanks for taking a look. On Tue, Aug 30, 2022 at 01:02:20PM +0900, Kyotaro Horiguchi wrote: > smgr is an abstract interface originally intended to allow to choose > one implementation among several (though cannot dynamically). Even > though the patch intends to replace specific (but most of all) uses of > the smgrread/write, still it sounds somewhat strange to me to add > hooks to replace smgr functions in that respect. I'm not sure whether > we still regard smgr as just an interface, though.. I suspect that it's probably still worthwhile to provide such hooks so that you don't have to write an entire smgr implementation. But I think you bring up a good point. > As for the names, bufmgr_read_hook looks like as if it is additionally > called when the normal operation performed by smgrread completes, or > just before. (planner_hook already doesn't sounds so for me, though:p) > "bufmgr_alt_smgrread" works for me but I'm not sure it is following > the project policy. Yeah, the intent is for this hook to replace the smgrread() call (although it might end up calling smgrread()). I debated having this hook return whether smgrread() needs to be called. Would that address your concern? > I think that the INSTR_* section should enclose the hook call as it is > still an I/O operation in the view of the core. Okay, will do. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Comma-separated predicates in simple CASE expressions (f263)
Daniel Gustafsson writes: > I was looking at F263 from the SQL standard, Comma-separated predicates in > simple CASE expression, and thinking if we could support this within the > framework we already have at a minimal added cost. The attached sketch diff > turns each predicate in the list into a CaseWhen node and uses the location > from parsing for grouping in errorhandling for searched case. > Is this a viable approach or am I missing something obvious? I don't particularly like duplicating the THEN clause multiple times. I think if we're going to do this we should do it right, and that means a substantially larger patch to propagate the notion of multiple comparison values all the way down. I also don't care for the bit in transformCaseExpr where you seem to be relying on subexpression location fields to make semantic decisions. Surely there's a better way. regards, tom lane
Comma-separated predicates in simple CASE expressions (f263)
I was looking at F263 from the SQL standard, Comma-separated predicates in simple CASE expression, and thinking if we could support this within the framework we already have at a minimal added cost. The attached sketch diff turns each predicate in the list into a CaseWhen node and uses the location from parsing for grouping in errorhandling for searched case. Is this a viable approach or am I missing something obvious? -- Daniel Gustafsson https://vmware.com/ f263_case_list.diff Description: Binary data
Re: replacing role-level NOINHERIT with a grant-level option
On Tue, Aug 30, 2022 at 08:33:46AM -0400, Robert Haas wrote: > Third try. Now that I have this grantor stuff fresh in my mind, this patch looks good to me. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: replacing role-level NOINHERIT with a grant-level option
On Tue, Aug 30, 2022 at 03:24:56PM -0400, Robert Haas wrote: > - william => charles => elizabeth => uk is OK because the first two > hops have ADMIN and the last hop has INHERIT Don't you mean the first two hops have INHERIT and the last has ADMIN? > - william => charles => elizabeth => uk => parliament is probably not > OK because although the first two hops have ADMIN and the last has > INHERIT, the third hop probably lacks INHERIT Same here. I don't mean to be pedantic, I just want to make sure I'm thinking of this correctly. > I hope this makes it clearer. select_best_grantor() can't completely > disregard links without INHERIT, because if it does, it can't pay any > attention to the last grant in the chain, which only needs to have > ADMIN, not INHERIT. But it must pay some attention to them, because > every earlier link in the chain does need to have INHERIT. By moving > the if-test up, the patch makes it behave just that way, or at least, > I think it does. Yes, this is very helpful. I always appreciate your detailed examples. I think what you are describing matches the mental model I was beginning to form. Okay, now to take a closer look at the patch... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Slight refactoring of state check in pg_upgrade check_ function
On Sun, Aug 28, 2022 at 03:06:09PM -0700, Nathan Bossart wrote:
> On Sun, Aug 28, 2022 at 10:42:24PM +0200, Daniel Gustafsson wrote:
>> I noticed that the pg_upgrade check_ functions were determining failures
>> found in a few different ways.  Some keep a boolean flag variable, and
>> some (like check_for_incompatible_polymorphics) check the state of the
>> script filehandle which is guaranteed to be set (with the error message
>> referring to the path of said file).  Others like check_loadable_libraries
>> only check the flag variable and fclose the handle assuming it was opened.
>>
>> The attached diff changes the functions to do it consistently in one way,
>> by checking the state of the filehandle.  Since we are referring to the
>> file by path in the printed error message it seemed the cleanest approach,
>> and it saves a few lines of code without IMO reducing readability.
>>
>> There is no change in functionality, just code consistency.
>
> The patch looks reasonable to me.

+1. Those checks have accumulated over time with different authors, hence the stylistic differences.

-- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: New strategies for freezing, advancing relfrozenxid early
On Tue, Aug 30, 2022 at 11:11 AM Jeff Davis wrote: > The solution involves more changes to the philosophy and mechanics of > vacuum than I would expect, though. For instance, VM snapshotting, > page-level-freezing, and a cost model all might make sense, but I don't > see why they are critical for solving the problem above. I certainly wouldn't say that they're critical. I tend to doubt that I can be perfectly crisp about what the exact relationship is between each component in isolation and how it contributes towards addressing the problems we're concerned with. > I think I'm > still missing something. My mental model is closer to the bgwriter and > checkpoint_completion_target. That's not a bad starting point. The main thing that that mental model is missing is how the timeframes work with VACUUM, and the fact that there are multiple timeframes involved (maybe the system's vacuuming work could be seen as having one timeframe at the highest level, but it's more of a fractal picture overall). Checkpoints just don't take that long, and checkpoint duration has a fairly low variance (barring pathological performance problems). You only have so many buffers that you can dirty, too -- it's a self-limiting process. This is even true when (for whatever reason) the checkpoint_completion_target logic just doesn't do what it's supposed to do. There is more or less a natural floor on how bad things can get, so you don't have to invent a synthetic floor at all. LSM-based DB systems like the MyRocks storage engine for MySQL don't use checkpoints at all -- the closest analog is compaction, which is closer to a hybrid of VACUUM and checkpointing than anything else. The LSM compaction model necessitates adding artificial throttling to keep the system stable over time [1]. There is a disconnect between the initial ingest of data, and the compaction process. 
And so top-down modelling of costs and benefits with compaction is more natural with an LSM [2] -- and not a million miles from the strategy stuff I'm proposing. > Allow me to make a naive counter-proposal (not a real proposal, just so > I can better understand the contrast with your proposal): > I know there would still be some problem cases, but to me it seems like > we solve 80% of the problem in a couple dozen lines of code. It's not that this statement is wrong, exactly. It's that I believe that it is all but mandatory for me to ameliorate the downside that goes with more eager freezing, for example by not doing it at all when it doesn't seem to make sense. I want to solve the big problem of freeze debt, without creating any new problems. And if I should also make things in adjacent areas better too, so much the better. Why stop at a couple of dozens of lines of code? Why not just change the default of vacuum_freeze_min_age and vacuum_multixact_freeze_min_age to 0? > a. Can you clarify some of the problem cases, and why it's worth > spending more code to fix them? For one thing if we're going to do a lot of extra freezing, we really want to "get credit" for it afterwards, by updating relfrozenxid to reflect the new oldest extant XID, and so avoid getting an antiwraparound VACUUM early, in the near future. That isn't strictly true, of course. But I think that we at least ought to have a strong bias in the direction of updating relfrozenxid, having decided to do significantly more freezing in some particular VACUUM operation. > b. How much of your effort is groundwork for related future > improvements? If it's a substantial part, can you explain in that > larger context? Hard to say. It's true that the idea of VM snapshots is quite general, and could have been introduced in a number of different ways. But I don't think that that should count against it. 
It's also not something that seems contrived or artificial -- it's at least as good a reason to add VM snapshots as any other I can think of. Does it really matter if this project is the freeze debt project, or the VM snapshot project? Do we even need to decide which one it is right now?

> c. Can some of your patches be separated into independent discussions?
> For instance, patch 1 has been discussed in other threads and seems
> independently useful, and I don't see the current work as dependent on
> it.

I simply don't know if I can usefully split it up just yet.

> Patch 4 also seems largely independent.

Patch 4 directly compensates for a problem created by the earlier patches. The patch series as a whole isn't supposed to ameliorate the problem of MultiXacts being allocated in VACUUM. It only needs to avoid making the situation any worse than it is today IMV (I suspect that the real fix is to make the VACUUM FREEZE command not tune vacuum_freeze_min_age).

> d. Can you help give me a sense of scale of the problems solved by
> visibilitymap snapshots and the cost model? Do those need to be in v1?

I'm not sure. I think that having certainty that we'll be able to scan only so many pages
Re: First draft of the PG 15 release notes
On Sat, Aug 27, 2022 at 04:03:02PM +0200, Matthias van de Meent wrote:
> Hi,
>
> I noticed a stray "DETAILS?" marker while going through the release
> notes for 15. Is that subsection still under construction or review?
>
>> Record and check the collation of each database (Peter Eisentraut)
>> [...]
>> to match the operating system collation version. DETAILS?

Good question --- the full text is:

	Record and check the collation of each database (Peter Eisentraut)

	This is designed to detect collation mismatches to avoid data
	corruption.  Function pg_database_collation_actual_version() reports
	the underlying operating system collation version, and ALTER DATABASE
	... REFRESH sets the database to match the operating system collation
	version.  DETAILS?

I just can't figure out what the user needs to understand about this, and I understand very little of it.

-- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
PostgreSQL 15 release announcement draft
Hi,

Please see the first draft for the PostgreSQL 15 release announcement. This is the announcement that goes out when we ship 15.0.

A few notes on the first draft:

1. I have not put in any links yet -- I want to ensure the document is close to being static before I add those in.

2. I have left in a blurb about SQL/JSON while awaiting the decision on if the feature is included in v15.

Please provide feedback no later than 2022-09-10 0:00 AoE[1]. After this date, we will begin assembling the presskit that includes the translations.

Thanks,

Jonathan

[1] https://en.wikipedia.org/wiki/Anywhere_on_Earth

The PostgreSQL Global Development Group today announced the release of [PostgreSQL 15](https://www.postgresql.org/docs/15/release-15.html), the latest version of the world’s [most advanced open source database](https://www.postgresql.org/). PostgreSQL 15 builds on the performance improvements of recent releases with noticeable gains for managing workloads in both local and distributed deployments, including improved sorting. This release improves the developer experience with the addition of the popular `MERGE` command, and adds more capabilities for observing the state of the database.

[PostgreSQL](https://www.postgresql.org), an innovative data management system known for its reliability and robustness, benefits from over 25 years of open source development from a [global developer community](https://www.postgresql.org/community/) and has become the preferred open source relational database for organizations of all sizes.

### Improved Sort Performance and Compression

In this latest release, PostgreSQL improves on its in-memory and on-disk sorting algorithms, with benchmarks showing speedups of 25% - 400% depending on the sort type. Using `row_number()`, `rank()`, and `count()` as window functions also has performance benefits in PostgreSQL 15, and queries using `SELECT DISTINCT` can now be executed in parallel.
Building on work from the previous PostgreSQL release for allowing async remote queries, the PostgreSQL foreign data wrapper, `postgres_fdw`, can now commit transactions in parallel.

The performance improvements in PostgreSQL 15 extend to its archiving and backup facilities. PostgreSQL 15 adds support for LZ4 and Zstandard (zstd) compression to write-ahead log (WAL) files, which can have both space and performance benefits for certain workloads. On certain operating systems, PostgreSQL 15 supports the ability to prefetch WAL file contents and speed up recovery times. PostgreSQL's built-in backup command, `pg_basebackup`, now supports server-side compression of backup files with a choice of gzip, LZ4, and zstd. PostgreSQL 15 includes the ability to use custom modules for archiving, which eliminates the overhead of using a shell command.

### Expressive Developer Features

PostgreSQL 15 includes the SQL standard `MERGE` command. `MERGE` lets you write conditional SQL statements that include `INSERT`, `UPDATE`, and `DELETE` actions within a single statement.

PostgreSQL 15 expands on its support for the SQL/JSON standard, including syntax for supporting JSON constructors, introspection functions, and the ability to convert JSON data into a table using the `JSON_TABLE` function.

This latest release adds new functions for using regular expressions to inspect strings: `regexp_count()`, `regexp_instr()`, `regexp_like()`, and `regexp_substr()`. PostgreSQL 15 also extends the `range_agg` function to aggregate `multirange` data types, which were introduced in the previous release.

PostgreSQL 15 lets users create views that query data using the permissions of the caller, not the view creator. This option, called `security_invoker`, adds an additional layer of protection to ensure view callers have the correct permissions for working with the underlying data.

### More Options with Logical Replication

PostgreSQL 15 provides more flexibility for managing logical replication.
This release introduces row and column filtering for publishers, letting users choose to replicate a subset of data from a table. PostgreSQL 15 adds features to simplify conflict management, including the ability to skip replaying a conflicting transaction and to automatically disable a subscription if an error is detected. This release also includes support for using two-phase commit (2PC) with logical replication.

### Logging and Configuration Enhancements

PostgreSQL 15 introduces a new logging format: `jsonlog`. This new format outputs log data using a defined JSON structure, which allows PostgreSQL logs to be processed in structured logging systems. This release also adds a new built-in extension, `pg_walinspect`, that lets users inspect the contents of write-ahead log files directly from a SQL interface.

PostgreSQL 15 gives database administrators more flexibility in how users can manage PostgreSQL configuration, adding the ability to grant users permission to alter server-level configuration parameters.
Re: replacing role-level NOINHERIT with a grant-level option
On Tue, Aug 30, 2022 at 1:30 PM Nathan Bossart wrote:
> I've been staring at this stuff for a while, and I'm still rather confused.
>
> * You mention that you modelled the "new" function select_best_grantor() on
> is_admin_of_role(), but it looks like that function was introduced in 2005.
> IIUC you meant select_best_admin(), which was added much more recently.

Well, a combination of the two. It's modeled after is_admin_of_role() in the sense that it also calls roles_is_member_of() with a non-InvalidOid third argument and a non-NULL fourth argument. All other callers pass InvalidOid/NULL.

> * I don't see why we should ignore inheritance until after checking admin
> option.  Prior to your commit (e3ce2de), we skipped checking for admin
> option if the role had [NO]INHERIT, and AFAICT your commit aimed to retain
> that behavior.  The comment above check_role_grantor() even mentions
> "inheriting the privileges of a role which does have ADMIN OPTION."  Within
> the function, I see the following comment:
>
>	 * Otherwise, the grantor must either have ADMIN OPTION on the role or
>	 * inherit the privileges of a role which does.  In the former case,
>	 * record the grantor as the current user; in the latter, pick one of
>	 * the roles that is "most directly" inherited by the current role
>	 * (i.e. fewest "hops").
>
> So, IIUC, the intent is for ADMIN OPTION to apply even for non-inherited
> grants, but we still choose to recurse in roles_is_member_of() based on
> inheritance.  This means that you can inherit the ADMIN OPTION to some
> degree, but if you have membership WITH INHERIT FALSE to a role that has
> ADMIN OPTION on another role, the current role effectively does not have
> ADMIN OPTION on the other role.  Is this correct?

I think it's easiest if I give an example. Suppose role 'william' inherits 'charles' and 'diana' and role 'charles' in turn inherits from roles 'elizabeth' and 'philip'.
Furthermore, 'william' has ADMIN OPTION on 'cambridge', 'charles' on 'wales', and 'elizabeth' on 'uk'.

Now, because 'william' has ADMIN OPTION on role 'cambridge', he can add members to that role and remove the ones he added. He can also grant admin option to others and later remove it (and all dependent grants that those others have made). When he does this, he is acting as himself, on his own authority.

Because he inherits the privileges of charles directly and of elizabeth via charles, he can also add members to wales and uk. But if he does this, he is not acting as himself, because he himself does not possess the ADMIN OPTION on either of those roles. Rather, he is acting on authority delegated from some other role which does possess admin option on those roles. Therefore, if 'william' adds a member to 'uk', the grantor MUST be recorded as 'elizabeth', not 'william', because the grantor of record must possess admin option on the role.

Now let's add another layer of complexity. Suppose that the 'uk' role possesses ADMIN OPTION on some other role, let's say 'parliament'. Can 'william' use the fact that he inherits from 'elizabeth' to grant membership in 'parliament'? That all depends on whether 'uk' was granted to 'elizabeth' WITH INHERIT TRUE or WITH INHERIT FALSE. If it was granted WITH INHERIT TRUE, then elizabeth freely exercises all the powers of parliament, william freely exercises all of elizabeth's powers, and thus william can grant membership in parliament. Such a membership will be recorded as having been granted by uk, the role that actually holds ADMIN OPTION on parliament.

However, if elizabeth was granted uk WITH INHERIT FALSE, then she can't mess with the membership of parliament, and thus neither can william. At least, not without first executing "SET ROLE uk", which in this scenario, either could do, but perhaps the use of SET ROLE is logged, audited, and very carefully scrutinized.
So, what does all of this mean for select_best_grantor() and its helper function roles_is_member_of()? Well, first, we have to recurse. william can act on his own behalf when administering cambridge, and on behalf of charles or elizabeth when administering wales or uk respectively, and there's no way to get that behavior without recursing.

Second, we have to stop the recursion when we reach a grant that is not inherited. Otherwise, william might be able to act on behalf of uk and administer membership in parliament even if uk is granted to elizabeth WITH INHERIT FALSE.

Third, and this is where it gets tricky, we have to stop recursion *only after checking whether we've found a valid grantor*. william is allowed to administer membership in cambridge whether or not it is granted to william WITH INHERIT TRUE or WITH INHERIT FALSE. Either way, he possesses ADMIN OPTION on that role, and may administer membership in it. Likewise, he can administer membership in wales or uk as charles or elizabeth, respectively, because he
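Robert's three rules can be condensed into a toy breadth-first search over the example roles. To be clear, this is an illustrative sketch with invented names and data structures, not the actual roles_is_member_of()/select_best_grantor() implementation:

```c
#include <assert.h>

#define MAX_ROLES 16
#define MAX_GRANTS 32
#define NO_ROLE (-1)

enum {WILLIAM, CHARLES, ELIZABETH, CAMBRIDGE, WALES, UK, PARLIAMENT};

/* One role membership: "member" has been granted "role". */
typedef struct
{
	int			member;
	int			role;
	int			inherit;		/* granted WITH INHERIT? */
	int			admin;			/* granted WITH ADMIN OPTION? */
} ToyGrant;

static ToyGrant grants[MAX_GRANTS];
static int ngrants;

static void
add_grant(int member, int role, int inherit, int admin)
{
	ToyGrant	g = {member, role, inherit, admin};

	grants[ngrants++] = g;
}

/*
 * Breadth-first search for the best grantor: every hop before the last must
 * be WITH INHERIT, but the final ADMIN OPTION grant itself need not be.
 * Crucially, each visited role is checked for ADMIN OPTION on the target
 * *before* we decide whether its own memberships allow further recursion.
 */
static int
best_grantor(int user, int target)
{
	int			queue[MAX_ROLES];
	int			seen[MAX_ROLES] = {0};
	int			head = 0, tail = 0;

	queue[tail++] = user;
	seen[user] = 1;

	while (head < tail)
	{
		int			r = queue[head++];

		/* fewest hops wins: BFS guarantees closer roles are tested first */
		for (int i = 0; i < ngrants; i++)
			if (grants[i].member == r && grants[i].role == target && grants[i].admin)
				return r;

		/* only INHERIT grants let us keep acting on another role's behalf */
		for (int i = 0; i < ngrants; i++)
			if (grants[i].member == r && grants[i].inherit && !seen[grants[i].role])
			{
				seen[grants[i].role] = 1;
				queue[tail++] = grants[i].role;
			}
	}
	return NO_ROLE;
}

/* Build the example from the mail; uk_inherit toggles elizabeth's uk grant. */
static int
example_grantor(int target, int uk_inherit)
{
	ngrants = 0;
	add_grant(WILLIAM, CHARLES, 1, 0);
	add_grant(CHARLES, ELIZABETH, 1, 0);
	add_grant(WILLIAM, CAMBRIDGE, 0, 1);	/* admin, but INHERIT FALSE */
	add_grant(CHARLES, WALES, 0, 1);
	add_grant(ELIZABETH, UK, uk_inherit, 1);
	add_grant(UK, PARLIAMENT, 0, 1);
	return best_grantor(WILLIAM, target);
}
```

Note how the ADMIN OPTION check runs before the INHERIT filter, which is exactly the "check before stopping recursion" subtlety: cambridge's grantor is william even though that grant is WITH INHERIT FALSE, while parliament is only reachable when elizabeth holds uk WITH INHERIT TRUE.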
Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)
On 2022-08-30 Tu 10:55, Pavel Stehule wrote:
> On Tue, 30 Aug 2022 at 16:49, Pavel Stehule wrote:
>> On Tue, 30 Aug 2022 at 16:36, Christoph Berg wrote:
>>> Re: Pavel Stehule
>>>> pspg requires all lines to have the same width. It can do some
>>>> corrections - but it is hard to detect wanted differences or just
>>>> plain text format.
>>>>
>>>> can be nice to have the first invisible row with some information
>>>> about used formatting. pspg does some heuristic but this code is not
>>>> nice and it is fragile.
>>>
>>> I like pspg and use it myself, but I don't think a tool that does the
>>> right thing by hiding a full screen of from the user should hinder
>>> making the same progress in psql with a simple pager.
>>
>> ASCII allows setting some metadata that should be invisible in all
>> correctly implemented pagers.
>
> or these parameters can be sent by the pager's command line or via some
> environment variable. Currently there are only two pagers in the world
> that support tabular format, and both are created against psql (pspg
> and ov), so we can define our own protocol. Surely pspg will have
> heuristics forever, because I want to support psql, mysql and many
> others. But it can be fine to switch to some more robust mode. It can
> be interesting for continuous load via pipe.

I'm somewhat sympathetic to Christoph's position. Surely pspg could itself issue \pset xheader_width full at the start of a session.

cheers

andrew

-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Postmaster self-deadlock due to PLT linkage resolution
Hi, On 2022-08-30 14:32:26 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2022-08-30 14:07:41 -0400, Tom Lane wrote: > >> Do we want to install this just for NetBSD, or more widely? > >> I think we'd better back-patch it for NetBSD, so I'm inclined > >> to be conservative about the change. > > > It's likely a good idea to enable it everywhere applicable, but I agree that > > we shouldn't unnecessarily do so in the backbranches. So I'd be inclined to > > add it to the netbsd template for the backbranches. > > > For HEAD I can see putting it into all the applicable templates, adding an > > AC_LINK_IFELSE() test, or just putting it into the meson stuff. > > For the moment I'll stick it into the netbsd template. Cool. > I'm not on board with having the meson stuff generating different > executables than the Makefiles do, so if someone wants to propose applying > this widely, they'll need to fix both. Seems like that is a good thing to > consider after the meson patches land. We don't need unnecessary churn in > that area before that. Yea, I didn't like that idea either, hence listing it last... Greetings, Andres Freund
Re: Tracking last scan time
On Fri, Aug 26, 2022 at 02:05:36PM +0100, Dave Page wrote: > On Thu, 25 Aug 2022 at 01:44, David Rowley wrote: > I don't have a particular opinion about the patch, I'm just pointing > out that there are other ways. Even just writing down the numbers on a > post-it note and coming back in a month to see if they've changed is > enough to tell if the table or index has been used. > > > There are usually other ways to perform monitoring tasks, but there is > something to be said for the convenience of having functionality built in and > not having to rely on tools, scripts, or post-it notes :-) Should we consider using something cheaper like time() so we don't need a GUC to enable this? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
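To frame Bruce's question, here is a hypothetical sketch of what second-granularity tracking with time(3) might look like, as opposed to calling GetCurrentTimestamp() on every scan. The struct and function names below are invented for illustration, not pgstat code:

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical per-relation stats entry: last_scan is tracked at one-second
 * granularity, so recording it costs a time(3) call rather than a full
 * gettimeofday()/GetCurrentTimestamp() -- cheap enough that it might not
 * need a GUC to switch it off.
 */
typedef struct
{
	long		scans;			/* number of scans of this relation */
	time_t		last_scan;		/* coarse time of the most recent scan */
} RelScanStats;

static void
count_scan(RelScanStats *stats)
{
	stats->scans++;
	stats->last_scan = time(NULL);	/* seconds since the epoch */
}
```

The trade-off is that consumers only learn the last scan time to within a second, which seems plenty for "has this index been used in the last month" style monitoring.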
Re: Hash index build performance tweak from sorting
> It's a shame you only see 3%, but that's still worth it.

Hi,

I ran this test here:

DROP TABLE hash_speed;
CREATE unlogged TABLE hash_speed (x integer);
INSERT INTO hash_speed SELECT random()*1000 FROM generate_series(1,1000) x;
VACUUM
Timing is on.
CREATE INDEX ON hash_speed USING hash (x);

head:
Time: 20526,490 ms (00:20,526)

attached patch (v3):
Time: 18810,777 ms (00:18,811)

I can see 9% with the patch (v3) attached.

Wouldn't this optimization also apply to _hash_pgaddmultitup?

regards,
Ranier Vilela

diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index c361509d68..a68eb40b2b 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -39,6 +39,7 @@ typedef struct
 	HSpool	   *spool;			/* NULL if not using spooling */
 	double		indtuples;		/* # tuples accepted into index */
 	Relation	heapRel;		/* heap relation descriptor */
+	HashInsertState istate;		/* insert state */
 } HashBuildState;
 
 static void hashbuildCallback(Relation index,
@@ -118,6 +119,7 @@ hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	uint32		num_buckets;
 	long		sort_threshold;
 	HashBuildState buildstate;
+	HashInsertStateData insertstate;
 
 	/*
 	 * We expect to be called exactly once for any index relation.  If that's
@@ -158,13 +160,20 @@ hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 		sort_threshold = Min(sort_threshold, NLocBuffer);
 
 	if (num_buckets >= (uint32) sort_threshold)
-		buildstate.spool = _h_spoolinit(heap, index, num_buckets);
+	{
+		insertstate.sorted = true;
+		buildstate.spool = _h_spoolinit(heap, index, num_buckets, (HashInsertState) &insertstate);
+	}
 	else
+	{
+		insertstate.sorted = false;
 		buildstate.spool = NULL;
+	}
 
 	/* prepare to build the index */
-	buildstate.indtuples = 0;
 	buildstate.heapRel = heap;
+	buildstate.indtuples = 0;
+	buildstate.istate = (HashInsertState) &insertstate;
 
 	/* do the heap scan */
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
@@ -231,7 +240,7 @@ hashbuildCallback(Relation index,
 	itup = index_form_tuple(RelationGetDescr(index),
 							index_values, index_isnull);
 	itup->t_tid = *tid;
-	_hash_doinsert(index, itup, buildstate->heapRel);
+	_hash_doinsert(index, itup, buildstate->heapRel, buildstate->istate);
 	pfree(itup);
 }
 
@@ -254,6 +263,7 @@ hashinsert(Relation rel, Datum *values, bool *isnull,
 	Datum		index_values[1];
 	bool		index_isnull[1];
 	IndexTuple	itup;
+	HashInsertStateData istate;
 
 	/* convert data to a hash key; on failure, do not insert anything */
 	if (!_hash_convert_tuple(rel,
@@ -265,7 +275,9 @@ hashinsert(Relation rel, Datum *values, bool *isnull,
 	itup = index_form_tuple(RelationGetDescr(rel), index_values, index_isnull);
 	itup->t_tid = *ht_ctid;
 
-	_hash_doinsert(rel, itup, heapRel);
+	istate.sorted = false;
+
+	_hash_doinsert(rel, itup, heapRel, (HashInsertState) &istate);
 
 	pfree(itup);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 4f2fecb908..4d17c841c0 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -34,7 +34,7 @@ static void _hash_vacuum_one_page(Relation rel, Relation hrel,
  * and hashinsert.  By here, itup is completely filled in.
  */
 void
-_hash_doinsert(Relation rel, IndexTuple itup, Relation heapRel)
+_hash_doinsert(Relation rel, IndexTuple itup, Relation heapRel, HashInsertState istate)
 {
 	Buffer		buf = InvalidBuffer;
 	Buffer		bucket_buf;
@@ -198,7 +198,7 @@ restart_insert:
 	START_CRIT_SECTION();
 
 	/* found page with enough space, so add the item here */
-	itup_off = _hash_pgaddtup(rel, buf, itemsz, itup);
+	itup_off = _hash_pgaddtup(rel, buf, itemsz, itup, istate->sorted);
 	MarkBufferDirty(buf);
 
 	/* metapage operations */
@@ -266,18 +266,28 @@
  * page are sorted by hashkey value.
  */
 OffsetNumber
-_hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, IndexTuple itup)
+_hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, IndexTuple itup, bool sorted)
 {
 	OffsetNumber itup_off;
 	Page		page;
-	uint32		hashkey;
 
 	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
 	page = BufferGetPage(buf);
 
-	/* Find where to insert the tuple (preserving page's hashkey ordering) */
-	hashkey = _hash_get_indextuple_hashkey(itup);
-	itup_off = _hash_binsearch(page, hashkey);
+	/*
+	 * Find where to insert the tuple (preserving page's hashkey ordering).
+	 * If the input is already sorted by hashkey, then we can avoid the
+	 * binary search and just add it to the end of the page.
+	 */
+	if (sorted)
+		itup_off = PageGetMaxOffsetNumber(page) + 1;
+	else
+	{
+		uint32		hashkey;
+
+		hashkey = _hash_get_indextuple_hashkey(itup);
+		itup_off = _hash_binsearch(page, hashkey);
+	}
 
 	if (PageAddItem(page, (Item) itup, itemsize, itup_off, false, false)
 		== InvalidOffsetNumber)
@@ -300,7 +310,6 @@
 void
 _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups,
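For what it's worth, the invariant the fast path relies on can be demonstrated with a toy model (illustrative C with invented names, not the patch's code): when tuples arrive in ascending hashkey order, appending at PageGetMaxOffsetNumber + 1 yields exactly the offset the binary search would have computed.

```c
#include <assert.h>

#define PAGE_CAP 64

/*
 * Toy "page": hash keys kept in ascending order, with 1-based offsets in
 * the style of PostgreSQL's OffsetNumber.  Not PostgreSQL code.
 */
typedef struct
{
	unsigned	keys[PAGE_CAP];
	int			nkeys;
} ToyPage;

/* Binary search for the first 1-based offset whose key is >= hashkey. */
static int
binsearch_offset(const ToyPage *page, unsigned hashkey)
{
	int			lo = 0, hi = page->nkeys;

	while (lo < hi)
	{
		int			mid = (lo + hi) / 2;

		if (page->keys[mid] < hashkey)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo + 1;				/* convert to 1-based offset */
}

/* Shift items right and place the new key at a given 1-based offset. */
static void
page_add_item(ToyPage *page, unsigned hashkey, int off)
{
	for (int i = page->nkeys; i >= off; i--)
		page->keys[i] = page->keys[i - 1];
	page->keys[off - 1] = hashkey;
	page->nkeys++;
}
```

With strictly increasing keys, binsearch_offset always returns nkeys + 1, so the sorted build path can skip the search entirely; an out-of-order key still needs the search, which is why the unsorted insert path keeps it.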
Re: Postmaster self-deadlock due to PLT linkage resolution
Andres Freund writes: > On 2022-08-30 14:07:41 -0400, Tom Lane wrote: >> Do we want to install this just for NetBSD, or more widely? >> I think we'd better back-patch it for NetBSD, so I'm inclined >> to be conservative about the change. > It's likely a good idea to enable it everywhere applicable, but I agree that > we shouldn't unnecessarily do so in the backbranches. So I'd be inclined to > add it to the netbsd template for the backbranches. > For HEAD I can see putting it into all the applicable templates, adding an > AC_LINK_IFELSE() test, or just putting it into the meson stuff. For the moment I'll stick it into the netbsd template. I'm not on board with having the meson stuff generating different executables than the Makefiles do, so if someone wants to propose applying this widely, they'll need to fix both. Seems like that is a good thing to consider after the meson patches land. We don't need unnecessary churn in that area before that. regards, tom lane
Re: Postmaster self-deadlock due to PLT linkage resolution
Hi, On 2022-08-30 14:07:41 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2022-08-30 13:24:39 -0400, Tom Lane wrote: > >> Andres Freund writes: > >>> Perhaps it'd be saner to default to building with -Wl,-z,now? That should > >>> fix > >>> the problem too, right (and if we combine it with relro, it'd be a > >>> security > >>> improvement to boot). > > >> Hm. Not sure if that works on NetBSD, but I'll check it out. > > > FWIW, it's a decently (well over 10 years) old thing I think. And it's > > documented in > > the netbsd ld manpage and their packaging guide (albeit indirectly, with > > their > > tooling doing the work of specifying the flags): > > https://www.netbsd.org/docs/pkgsrc/hardening.html#hardening.audit.relrofull > > It does appear that they use GNU ld, and I've just finished confirming > that each of those switches has the expected effects on my PPC box. > So yeah, this looks like a better answer. Cool. > Do we want to install this just for NetBSD, or more widely? > I think we'd better back-patch it for NetBSD, so I'm inclined > to be conservative about the change. It's likely a good idea to enable it everywhere applicable, but I agree that we shouldn't unnecessarily do so in the backbranches. So I'd be inclined to add it to the netbsd template for the backbranches. For HEAD I can see putting it into all the applicable templates, adding an AC_LINK_IFELSE() test, or just putting it into the meson stuff. Greetings, Andres Freund
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Hi hackers, > Here is v3 with silenced compiler warnings. Some more warnings were reported by cfbot, so here is v4. Apologies for the noise. -- Best regards, Aleksander Alekseev v4-0001-Fix-incorrect-uses-of-Datum-conversion-macros.patch Description: Binary data v4-0002-Convert-GetDatum-and-DatumGet-macros-to-inline-fu.patch Description: Binary data
Re: New strategies for freezing, advancing relfrozenxid early
On Thu, 2022-08-25 at 14:21 -0700, Peter Geoghegan wrote:
> Attached patch series is a completely overhauled version of earlier
> work on freezing. Related work from the Postgres 15 cycle became
> commits 0b018fab, f3c15cbe, and 44fa8488.
>
> Recap
> =
>
> The main high level goal of this work is to avoid painful, disruptive
> antiwraparound autovacuums (and other aggressive VACUUMs) that do way
> too much "catch up" freezing, all at once

I agree with the motivation: that keeping around a lot of deferred work (unfrozen pages) is risky, and that administrators would want a way to control that risk.

The solution involves more changes to the philosophy and mechanics of vacuum than I would expect, though. For instance, VM snapshotting, page-level freezing, and a cost model all might make sense, but I don't see why they are critical for solving the problem above. I think I'm still missing something. My mental model is closer to the bgwriter and checkpoint_completion_target.

Allow me to make a naive counter-proposal (not a real proposal, just so I can better understand the contrast with your proposal):

* introduce a reloption unfrozen_pages_target (default -1, meaning infinity, which is the current behavior)

* introduce two fields to LVRelState: n_pages_frozen and delay_skip_count, both initialized to zero

* when marking a page frozen: n_pages_frozen++

* when vacuum begins:

  if (unfrozen_pages_target >= 0 &&
      current_unfrozen_page_count > unfrozen_pages_target)
  {
      vacrel->delay_skip_count = current_unfrozen_page_count -
          unfrozen_pages_target;
      /* ?also use more aggressive freezing thresholds? */
  }

* in lazy_scan_skip(), have a final check:

  if (vacrel->n_pages_frozen < vacrel->delay_skip_count)
  {
      break;
  }

I know there would still be some problem cases, but to me it seems like we solve 80% of the problem in a couple dozen lines of code.

a. Can you clarify some of the problem cases, and why it's worth spending more code to fix them?

b. How much of your effort is groundwork for related future improvements? If it's a substantial part, can you explain it in that larger context?

c. Can some of your patches be separated into independent discussions? For instance, patch 1 has been discussed in other threads and seems independently useful, and I don't see the current work as dependent on it. Patch 4 also seems largely independent.

d. Can you help give me a sense of scale of the problems solved by visibilitymap snapshots and the cost model? Do those need to be in v1?

Regards, Jeff Davis
Re: Postmaster self-deadlock due to PLT linkage resolution
Andres Freund writes: > On 2022-08-30 13:24:39 -0400, Tom Lane wrote: >> Andres Freund writes: >>> Perhaps it'd be saner to default to building with -Wl,-z,now? That should >>> fix >>> the problem too, right (and if we combine it with relro, it'd be a security >>> improvement to boot). >> Hm. Not sure if that works on NetBSD, but I'll check it out. > FWIW, it's a decently (well over 10 years) old thing I think. And it's > documented in > the netbsd ld manpage and their packaging guide (albeit indirectly, with their > tooling doing the work of specifying the flags): > https://www.netbsd.org/docs/pkgsrc/hardening.html#hardening.audit.relrofull It does appear that they use GNU ld, and I've just finished confirming that each of those switches has the expected effects on my PPC box. So yeah, this looks like a better answer. Do we want to install this just for NetBSD, or more widely? I think we'd better back-patch it for NetBSD, so I'm inclined to be conservative about the change. regards, tom lane
Re: Postmaster self-deadlock due to PLT linkage resolution
Hi, On 2022-08-30 13:24:39 -0400, Tom Lane wrote: > Andres Freund writes: > > Perhaps it'd be saner to default to building with -Wl,-z,now? That should > > fix > > the problem too, right (and if we combine it with relro, it'd be a security > > improvement to boot). > > Hm. Not sure if that works on NetBSD, but I'll check it out. FWIW, it's a decently (well over 10 years) old thing I think. And it's documented in the netbsd ld manpage and their packaging guide (albeit indirectly, with their tooling doing the work of specifying the flags): https://www.netbsd.org/docs/pkgsrc/hardening.html#hardening.audit.relrofull Greetings, Andres Freund
Re: replacing role-level NOINHERIT with a grant-level option
On Mon, Aug 29, 2022 at 03:38:57PM -0400, Robert Haas wrote: > Argh, I found a bug, and one that I should have caught during testing, > too. I modelled the new function select_best_grantor() on > is_admin_of_role(), but it differs in that it calls > roles_is_member_of() with ROLERECURSE_PRIVS rather than > ROLERECURSE_MEMBERS. Sadly, roles_is_member_of() handles > ROLERECURSE_PRIVS by completely ignoring non-inherited grants, which > is wrong, because then calls to select_best_grantor() treat a member > of a role with INHERIT FALSE, ADMIN TRUE as if they were not an admin > at all, which is incorrect. I've been staring at this stuff for a while, and I'm still rather confused. * You mention that you modelled the "new" function select_best_grantor() on is_admin_of_role(), but it looks like that function was introduced in 2005. IIUC you meant select_best_admin(), which was added much more recently. * I don't see why we should ignore inheritance until after checking admin option. Prior to your commit (e3ce2de), we skipped checking for admin option if the role had [NO]INHERIT, and AFAICT your commit aimed to retain that behavior. The comment above check_role_grantor() even mentions "inheriting the privileges of a role which does have ADMIN OPTION." Within the function, I see the following comment: * Otherwise, the grantor must either have ADMIN OPTION on the role or * inherit the privileges of a role which does. In the former case, * record the grantor as the current user; in the latter, pick one of * the roles that is "most directly" inherited by the current role * (i.e. fewest "hops"). So, IIUC, the intent is for ADMIN OPTION to apply even for non-inherited grants, but we still choose to recurse in roles_is_member_of() based on inheritance. 
This means that you can inherit the ADMIN OPTION to some degree, but if you have membership WITH INHERIT FALSE to a role that has ADMIN OPTION on another role, the current role effectively does not have ADMIN OPTION on the other role. Is this correct? As I'm writing this down, I think it's beginning to make more sense to me, so this might just be a matter of under-caffeination. But perhaps some extra comments or documentation wouldn't be out of the question... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postmaster self-deadlock due to PLT linkage resolution
Andres Freund writes: > On 2022-08-29 15:43:55 -0400, Tom Lane wrote: >> The attached patch seems to fix the problem, by forcing resolution of >> the PLT link before we unblock signals. It depends on the assumption >> that another select() call appearing within postmaster.c will share >> the same PLT link, which seems pretty safe. > Hm, what stops the same problem from occurring with other functions? These few lines are the only part of the postmaster that runs with signals enabled and unblocked. > Perhaps it'd be saner to default to building with -Wl,-z,now? That should fix > the problem too, right (and if we combine it with relro, it'd be a security > improvement to boot). Hm. Not sure if that works on NetBSD, but I'll check it out. regards, tom lane
Re: Postmaster self-deadlock due to PLT linkage resolution
Hi, On 2022-08-29 15:43:55 -0400, Tom Lane wrote: > Buildfarm member mamba (NetBSD-current on prairiedog's former hardware) > has failed repeatedly since I set it up. I have now run the cause of > that to ground [1], and here's what's happening: if the postmaster > receives a signal just before it first waits at the select() in > ServerLoop, it can self-deadlock. During the postmaster's first use of > select(), the dynamic loader needs to resolve the PLT branch table entry > that the core executable uses to reach select() in libc.so, and it locks > the loader's internal data structures while doing that. If we enter > a signal handler while the lock is held, and the handler needs to do > anything that also requires the lock, the postmaster is frozen. Ick. > The attached patch seems to fix the problem, by forcing resolution of > the PLT link before we unblock signals. It depends on the assumption > that another select() call appearing within postmaster.c will share > the same PLT link, which seems pretty safe. Hm, what stops the same problem from occurring with other functions? Perhaps it'd be saner to default to building with -Wl,-z,now? That should fix the problem too, right (and if we combine it with relro, it'd be a security improvement to boot). Greetings, Andres Freund
Re: Handle infinite recursion in logical replication setup
On Mon, 29 Aug 2022 at 12:01, Peter Smith wrote: > > Here are some review comments for patch v42-0002: > > == > > 1. doc/src/sgml/logical-replication.sgml > > copy_data = true > > There are a couple of these tags where there is a trailing space > before the . I guess it is doing no harm, but it is doing no > good either, so IMO better to get rid of the space. Modified > ~~ > > 2. doc/src/sgml/logical-replication.sgml - 31.3.1. Setting replication > between two primaries > > User need to ensure that no data changes happen on table t1 on > primary1 and primary2 until the setup is completed. > > SUGGESTION > The user must ensure that no data changes happen on table t1 of > primary1 and primary2 until the setup is completed. Modified > ~~~ > > 3. doc/src/sgml/logical-replication.sgml - 31.3.2. Adding a new > primary when there is no table data on any of the primaries > > User need to ensure that no data changes happen on table t1 on all the > primaries primary1, primary2 and primary3 until the setup is > completed. > > SUGGESTION > The user must ensure that no data changes happen on table t1 of all > the primaries primary1, primary2 and primary3 until the setup is > completed. Modified > ~~~ > > 4. doc/src/sgml/logical-replication.sgml - 31.3.3. Adding a new > primary when table data is present on the existing primaries > > User need to ensure that no operations should be performed on table t1 > on primary2 and primary3 until the setup is completed. > > SUGGESTION > The user must ensure that no operations are performed on table t1 of > primary2 and primary3 until the setup is completed. > > Here also you changed the wording - "no data changes happen" versus > "no operations should be performed". Was that a deliberate difference? > Maybe they should all be consistent wording? If they are > interchangeable IMO the "no operations are performed..." wording was > the better of the two. Modified, it was not a deliberate change. 
I have changed it to "no operations are performed" in other places too. > ~~~ > > 5. doc/src/sgml/logical-replication.sgml - 31.3.4. Generic steps for > adding a new primary to an existing set of primaries > > Step-2: User need to ensure that no changes happen on the required > tables of the new primary until the setup is complete. > > SUGGESTION > Step-2: The user must ensure that no changes happen on the required > tables of the new primary until the setup is complete. Modified > ~~~ > > 6. > > Step-4. User need to ensure that no changes happen on the required > tables of the existing primaries except the first primary until the > setup is complete. > > SUGGESTION > Step-4. The user must ensure that no changes happen on the required > tables of the existing primaries except the first primary until the > setup is complete. Modified > ~~~ > > 7. (the example) > > 7a. > Step-2. User need to ensure that no changes happen on table t1 on > primary4 until the setup is completed. > > SUGGESTION > Step-2. The user must ensure that no changes happen on table t1 of > primary4 until the setup is completed. Modified > ~ > > 7b. > Step-4. User need to ensure that no changes happen on table t1 on > primary2 and primary3 until the setup is completed. > > SUGGESTION > Step-4. The user must ensure that no changes happen on table t1 of > primary2 and primary3 until the setup is completed. Modified Thanks for the comments, the changes for the same are available in the v43 patch attached at [1]. [1] - https://www.postgresql.org/message-id/CALDaNm1V%2BAvzPsUcq%3DmNYG%2BJfAyovTWBe4vWKTknOgH_ko1E0Q%40mail.gmail.com Regards, Vignesh
Re: Handle infinite recursion in logical replication setup
On Mon, 29 Aug 2022 at 11:59, Peter Smith wrote: > > Here are some review comments for patch v42-0001: > > == > > 1. Commit message > > A later review comment below suggests some changes to the WARNING > message so if those changes are made then the example in this commit > message also needs to be modified. Modified > == > > 2. doc/src/sgml/ref/alter_subscription.sgml - copy_data > > Refer to the Notes for the usage of true for copy_data parameter and > its interaction with the origin parameter. > > I think saying "usage of true" sounded a bit strange. > > SUGGESTION > Refer to the Notes about how copy_data = true can interact with the > origin parameter. Modified > == > > 3. doc/src/sgml/ref/create_subscription.sgml - copy data > > Refer to the Notes for the usage of true for copy_data parameter and > its interaction with the origin parameter. > > SUGGESTION > (same as #2) Modified > ~~ > > 4. doc/src/sgml/ref/create_subscription.sgml - origin > > Refer to the Notes for the usage of true for copy_data parameter and > its interaction with the origin parameter. > > SUGGESTION > (same as #2) Modified > ~~~ > > 5. doc/src/sgml/ref/create_subscription.sgml - Notes > > + > + If the subscription is created with origin = none and > + copy_data = true, it will check if the publisher has > + subscribed to the same table from other publishers and, if so, throw a > + warning to notify user to check the publisher tables. The user can ensure > > "throw a warning to notify user" -> "log a warning to notify the user" Modified > ~~ > > 6. > > + warning to notify user to check the publisher tables. The user can ensure > + that publisher tables do not have data which has an origin associated > before > + continuing with any other operations to prevent inconsistent data being > + replicated. > + > > 6a. > > I'm not so sure about this. IMO the warning is not about the > replication – really it is about the COPY which has already happened > anyway. 
So the user can't really prevent anything from going wrong; > instead, they have to take some action to clean up if anything did go > wrong. > > SUGGESTION > Before continuing with other operations the user should check that > publisher tables did not have data with different origins, otherwise > inconsistent data may have been copied. Modified based on the suggestion provided by Amit. > ~ > > 6b. > > I am also wondering what can the user do now. Assuming there was bad > COPY then the subscriber table has already got unwanted stuff copied > into it. Is there any advice we can give to help users fix this mess? There is nothing much a user can do in this case. The only option would be to take a backup before the operation and restore it, and then recreate the replication setup. I was not sure if we should document these steps, as similar inconsistent data could get created without the origin option. Thoughts? > == > > 7. src/backend/commands/subscriptioncmds.c - AlterSubscription_refresh > > + /* Check whether we can allow copy of newly added relations. */ > + check_pub_table_subscribed(wrconn, sub->publications, copy_data, > +sub->origin, subrel_local_oids, > +subrel_count); > > "whether we can allow" seems not quite the right wording here anymore, > because now there is no ERROR stopping this - so if there was unwanted > data the COPY will proceed to copy it anyhow... Removed the comment as the function details the necessary things. > ~~~ > > 8. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed > > @@ -1781,6 +1794,121 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId) > table_close(rel, RowExclusiveLock); > } > > +/* > + * Check and throw a warning if the publisher has subscribed to the same > table > + * from some other publisher. This check is required only if "copy_data = > true" > + * and "origin = none" for CREATE SUBSCRIPTION and > + * ALTER SUBSCRIPTION ... 
REFRESH statements to notify user that data having > + * origin might have been copied. > > "throw a warning" -> "log a warning" > > "to notify user" -> "to notify the user" ? Modified > ~~~ > > 9. > > + if (copydata == false || !origin || > + (pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0)) > + return; > > SUGGESTION > if (!copydata || !origin || > (pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0)) Modified > ~~~ > > 10. (Question) > > I can't tell just by reading the code if FOR ALL TABLES case is > handled – e.g. will this recognise the case were the publisher might > have table data from other origins because it has a subscription on > some other node that was publishing "FOR ALL TABLES"? Yes it handles it. > ~~~ > > 11. > > + ereport(WARNING, > + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("publisher has subscribed table \"%s.%s\" from some other publisher", > +nspname, relname), > + errdetail("Publisher might have subscribed one or more tables from > some other publisher."), > +
Re: SUBTRANS: Minimizing calls to SubTransSetParent()
On Thu, 11 Aug 2022 at 06:32, Dilip Kumar wrote:
> > > So I still think some adjustment is required in XidInMVCCSnapshot()
> >
> > That is one way to resolve the issue, but not the only one. I can also
> > change AssignTransactionId() to recursively register parent xids for
> > all of a subxid's parents.
> >
> > I will add in a test case and resolve the dependency in my next patch.
>
> Okay, thanks, I will look into the updated patch after you submit that.

PFA two patches, replacing earlier work:
001_new_isolation_tests_for_subxids.v3.patch
002_minimize_calls_to_SubTransSetParent.v8.patch

001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically addressing the two areas of code that are not tested by existing tests. This gives us a baseline from which we can do test driven development. I'm hoping this can be reviewed and committed fairly smoothly.

002_minimize_calls_to_SubTransSetParent.v8.patch
Reduces the number of calls to subtrans below 1% for the first 64 subxids, so overall it will substantially reduce subtrans contention on master for the typical case, as well as smoothing the overflow case. Some discussion is needed on this; there are various options. This combines the work originally posted here with another patch posted on the thread "Smoothing the subtrans performance catastrophe".

I will do some performance testing also, but more is welcome.

--
Simon Riggs
http://www.EnterpriseDB.com/

002_minimize_calls_to_SubTransSetParent.v8.patch
Description: Binary data

001_new_isolation_tests_for_subxids.v3.patch
Description: Binary data
Re: archive modules
On Tue, Aug 30, 2022 at 09:49:20AM +0200, Benoit Lobréau wrote: > Ok done, https://commitfest.postgresql.org/39/3856/ (is that fine if I > added you as a reviewer ?) Of course. I've marked it as ready-for-committer. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Hash index build performance tweak from sorting
On Fri, 5 Aug 2022 at 20:46, David Zhang wrote: > > On 2022-08-01 8:37 a.m., Simon Riggs wrote: > > Using the above test case, I'm getting a further 4-7% improvement on > > already committed code with the attached patch, which follows your > > proposal. > > I ran two test cases: for committed patch `hash_sort_by_hash.v3.patch`, I can > see about 6 ~ 7% improvement; and after applied patch > `hash_inserted_sorted.v2.patch`, I see about ~3% improvement. All the test > results are based on 10 times average on two different machines. Thanks for testing David. It's a shame you only see 3%, but that's still worth it. -- Simon Riggs http://www.EnterpriseDB.com/
Re: Reducing the chunk header sizes on all memory context types
David Rowley writes:
> Here's a patch which adds a comment to MemoryContextMethodID to Robert's patch.

OK, but while looking at that I noticed the adjacent

#define MEMORY_CONTEXT_METHODID_MASK \
	UINT64CONST((1 << MEMORY_CONTEXT_METHODID_BITS) - 1)

I'm rather astonished that that compiles; UINT64CONST was only ever meant to be applied to *literals*. I think what it's expanding to is

	((1 << MEMORY_CONTEXT_METHODID_BITS) - 1UL)

(or on some machines 1ULL) which only accidentally does approximately what you want. It'd be all right perhaps to write

	((UINT64CONST(1) << MEMORY_CONTEXT_METHODID_BITS) - 1)

but you might as well avoid the Postgres-ism and just write

	((uint64) ((1 << MEMORY_CONTEXT_METHODID_BITS) - 1))

Nobody's ever going to make MEMORY_CONTEXT_METHODID_BITS large enough for the shift to overflow in int arithmetic.

regards, tom lane
Re: Reducing the chunk header sizes on all memory context types
On Tue, Aug 30, 2022 at 11:39 AM David Rowley wrote: > On Wed, 31 Aug 2022 at 03:31, David Rowley wrote: > > I've no objections to adding a comment to the enum to > > explain to future devs. My vote would be for that and add the (int) > > cast as proposed by Robert. > > Here's a patch which adds a comment to MemoryContextMethodID to Robert's > patch. LGTM. -- Robert Haas EDB: http://www.enterprisedb.com
Re: making relfilenodes 56 bits
On Tue, Aug 30, 2022 at 8:45 AM Dilip Kumar wrote: > I have found one more issue with this approach of rewriting the > conflicting table. Earlier I thought we could do the conflict > checking and rewriting inside create_new_objects() right before the > restore command. But after implementing (while testing) this I > realized that we DROP and CREATE the database while restoring the dump > that means it will again generate the conflicting system tables. So > theoretically the rewriting should go in between the CREATE DATABASE > and restoring the object but as of now both create database and > restoring other objects are part of a single dump file. I haven't yet > analyzed how feasible it is to generate the dump in two parts, first > part just to create the database and in second part restore the rest > of the object. > > Thoughts? Well, that's very awkward. It doesn't seem like it would be very difficult to teach pg_upgrade to call pg_restore without --clean and just do the drop database itself, but that doesn't really help, because pg_restore will in any event be creating the new database. That doesn't seem like something we can practically refactor out, because only pg_dump knows what properties to use when creating the new database. What we could do is have the dump include a command like SELECT pg_binary_upgrade_move_things_out_of_the_way(some_arguments_here), but that doesn't really help very much, because passing the whole list of relfilenode values from the old database seems pretty certain to be a bad idea. The whole idea here was that we'd be able to build a hash table on the new database's system table OIDs, and it seems like that's not going to work. We could try to salvage some portion of the idea by making pg_binary_upgrade_move_things_out_of_the_way() take a more restricted set of arguments, like the smallest and largest relfilenode values from the old database, and then we'd just need to move things that overlap. 
But that feels pretty hit-or-miss to me as to whether it actually avoids any work, and pg_binary_upgrade_move_things_out_of_the_way() might also be annoying to write. So perhaps we have to go back to the drawing board here. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Reducing the chunk header sizes on all memory context types
On Wed, 31 Aug 2022 at 03:31, David Rowley wrote:
> I've no objections to adding a comment to the enum to
> explain to future devs. My vote would be for that and add the (int)
> cast as proposed by Robert.

Here's a patch which adds a comment to MemoryContextMethodID to Robert's patch.

David

diff --git a/src/include/utils/memutils_internal.h b/src/include/utils/memutils_internal.h
index 9a9f52ef16..f348282a07 100644
--- a/src/include/utils/memutils_internal.h
+++ b/src/include/utils/memutils_internal.h
@@ -74,6 +74,9 @@ extern void SlabCheck(MemoryContext context);
  * MemoryContextMethodID
  *		A unique identifier for each MemoryContext implementation which
  *		indicates the index into the mcxt_methods[] array. See mcxt.c.
+ *		The maximum value for this enum is constrained by
+ *		MEMORY_CONTEXT_METHODID_MASK. If an enum value higher than that is
+ *		required then MEMORY_CONTEXT_METHODID_BITS will need to be increased.
  */
 typedef enum MemoryContextMethodID
 {
diff --git a/src/include/utils/memutils_memorychunk.h b/src/include/utils/memutils_memorychunk.h
index 685c177b68..2eefc138e3 100644
--- a/src/include/utils/memutils_memorychunk.h
+++ b/src/include/utils/memutils_memorychunk.h
@@ -159,7 +159,7 @@ MemoryChunkSetHdrMask(MemoryChunk *chunk, void *block,
 	Assert((char *) chunk > (char *) block);
 	Assert(blockoffset <= MEMORYCHUNK_MAX_BLOCKOFFSET);
 	Assert(value <= MEMORYCHUNK_MAX_VALUE);
-	Assert(methodid <= MEMORY_CONTEXT_METHODID_MASK);
+	Assert((int) methodid <= MEMORY_CONTEXT_METHODID_MASK);
 
 	chunk->hdrmask = (((uint64) blockoffset) << MEMORYCHUNK_BLOCKOFFSET_BASEBIT) |
 		(((uint64) value) << MEMORYCHUNK_VALUE_BASEBIT) |
@@ -175,7 +175,7 @@ static inline void
 MemoryChunkSetHdrMaskExternal(MemoryChunk *chunk,
 							  MemoryContextMethodID methodid)
 {
-	Assert(methodid <= MEMORY_CONTEXT_METHODID_MASK);
+	Assert((int) methodid <= MEMORY_CONTEXT_METHODID_MASK);
 
 	chunk->hdrmask = MEMORYCHUNK_MAGIC | (((uint64) 1) << MEMORYCHUNK_EXTERNAL_BASEBIT) |
 		methodid;
Re: Reducing the chunk header sizes on all memory context types
On Wed, 31 Aug 2022 at 01:36, Tom Lane wrote: > > David Rowley writes: > > I think the Assert is useful as if we were ever to add an enum member > > with the value of 8 and forgot to adjust MEMORY_CONTEXT_METHODID_BITS > > then bad things would happen inside MemoryChunkSetHdrMask() and > > MemoryChunkSetHdrMaskExternal(). I think it's unlikely we'll ever get > > that many MemoryContext types, but I don't know for sure and would > > rather the person who adds the 9th one get alerted to the lack of bit > > space in MemoryChunk as soon as possible. > > I think that's a weak argument, so I don't mind dropping this Assert. > What would be far more useful is a comment inside the > MemoryContextMethodID enum pointing out that we can support at most > 8 values because XYZ. I'd just sleep better knowing that we have some test coverage to ensure that MemoryChunkSetHdrMask() and MemoryChunkSetHdrMaskExternal() have some verification that we don't end up with future code that will cause the hdrmask to be invalid. I tried to make those functions as lightweight as possible. Without the Assert, I just feel that there's a bit too much trust that none of the bits overlap. I've no objections to adding a comment to the enum to explain to future devs. My vote would be for that and add the (int) cast as proposed by Robert. David
Re: Reducing the chunk header sizes on all memory context types
On Wed, 31 Aug 2022 at 03:00, Robert Haas wrote: > > On Tue, Aug 30, 2022 at 3:14 AM David Rowley wrote: > > I think the Assert is useful as if we were ever to add an enum member > > with the value of 8 and forgot to adjust MEMORY_CONTEXT_METHODID_BITS > > then bad things would happen inside MemoryChunkSetHdrMask() and > > MemoryChunkSetHdrMaskExternal(). I think it's unlikely we'll ever get > > that many MemoryContext types, but I don't know for sure and would > > rather the person who adds the 9th one get alerted to the lack of bit > > space in MemoryChunk as soon as possible. > > Well I don't have a problem with that, but I think we should try to do > it without causing compiler warnings. The attached patch fixes it for > me. I'm fine with adding the int cast. Seems like a good idea. > > As much as I'm not a fan of adding new warnings for compiler options > > that are not part of our standard set, I feel like if there are > > warning flags out there that are as giving us false warnings such as > > this one, then we shouldn't trouble ourselves trying to get rid of > > them, especially so when they force us to remove something which might > > catch a future bug. > > For me the point is that, at least on the compiler that I'm using, the > warning suggests that the compiler will optimize the test away > completely, and therefore it wouldn't catch a future bug. Could there > be compilers where no warning is generated but the assertion is still > optimized away? I'd not considered that the compiler might optimise it away. My suspicions had been more along the lines of that clang removed the enum out of range warnings because they were annoying and wrong as it's pretty easy to set an enum variable to something out of range of the defined enum values. Looking at [1], it seems like 5.0.2 is producing the correct code and it's just producing a warning. The 2nd compiler window has -Werror and shows that it does fail to compile. 
If I change that to use clang 6.0.0 then it works. It seems to fail all the way back to clang 3.1. clang 3.0.0 works. David [1] https://godbolt.org/z/Gx388z5Ej
Re: Stack overflow issue
I wrote:
>> I think most likely we should report this to Snowball upstream
>> and see what they think is an appropriate fix.

> Done at [1], and I pushed the other fixes. Thanks again for the report!

The upstream recommendation, which seems pretty sane to me, is to simply reject any string exceeding some threshold length as not possibly being a word. Apparently it's common to use thresholds as small as 64 bytes, but in the attached I used 1000 bytes.

regards, tom lane

diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c
index 68c9213f69..aaf4ff72b6 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -272,11 +272,25 @@ dsnowball_lexize(PG_FUNCTION_ARGS)
 	DictSnowball *d = (DictSnowball *) PG_GETARG_POINTER(0);
 	char	   *in = (char *) PG_GETARG_POINTER(1);
 	int32		len = PG_GETARG_INT32(2);
-	char	   *txt = lowerstr_with_len(in, len);
 	TSLexeme   *res = palloc0(sizeof(TSLexeme) * 2);
+	char	   *txt;
 
+	/*
+	 * Reject strings exceeding 1000 bytes, as they're surely not words in any
+	 * human language. This restriction avoids wasting cycles on stuff like
+	 * base64-encoded data, and it protects us against possible inefficiency
+	 * or misbehavior in the stemmers (for example, the Turkish stemmer has an
+	 * indefinite recursion so it can crash on long-enough strings).
+	 */
+	if (len <= 0 || len > 1000)
+		PG_RETURN_POINTER(res);
+
+	txt = lowerstr_with_len(in, len);
+
+	/* txt is probably not zero-length now, but we'll check anyway */
 	if (*txt == '\0' || searchstoplist(&(d->stoplist), txt))
 	{
+		/* empty or stopword, so reject */
 		pfree(txt);
 	}
 	else
Re: Reducing the chunk header sizes on all memory context types
On Tue, Aug 30, 2022 at 3:14 AM David Rowley wrote: > I'm not really sure either. I tried compiling with clang 12.0.1 with > -Wtautological-constant-out-of-range-compare and don't get this > warning. I have a much older clang version, it seems. clang -v reports 5.0.2. I use -Wall and -Werror as a matter of habit. It looks like 5.0.2 was released in May 2018, installed by me in November of 2019, and I just haven't had a reason to upgrade. > I think the Assert is useful as if we were ever to add an enum member > with the value of 8 and forgot to adjust MEMORY_CONTEXT_METHODID_BITS > then bad things would happen inside MemoryChunkSetHdrMask() and > MemoryChunkSetHdrMaskExternal(). I think it's unlikely we'll ever get > that many MemoryContext types, but I don't know for sure and would > rather the person who adds the 9th one get alerted to the lack of bit > space in MemoryChunk as soon as possible. Well I don't have a problem with that, but I think we should try to do it without causing compiler warnings. The attached patch fixes it for me. > As much as I'm not a fan of adding new warnings for compiler options > that are not part of our standard set, I feel like if there are > warning flags out there that are giving us false warnings such as > this one, then we shouldn't trouble ourselves trying to get rid of > them, especially so when they force us to remove something which might > catch a future bug. For me the point is that, at least on the compiler that I'm using, the warning suggests that the compiler will optimize the test away completely, and therefore it wouldn't catch a future bug. Could there be compilers where no warning is generated but the assertion is still optimized away? I don't know, but I don't think a 4-year-old compiler is such a fossil that we shouldn't care whether it produces warnings. 
We worry about operating systems and PostgreSQL versions that are almost extinct in the wild, so saying we're not going to worry about failing to update the compiler regularly enough within the lifetime of one off-the-shelf MacBook does not really make sense to me. -- Robert Haas EDB: http://www.enterprisedb.com fix-memorychunk-warnings.patch Description: Binary data
Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)
On Tue, 30 Aug 2022 at 16:49, Pavel Stehule wrote: > > > On Tue, 30 Aug 2022 at 16:36, Christoph Berg wrote: > >> Re: Pavel Stehule >> > pspg requires all lines to have the same width. It can do some >> corrections >> > - but it is hard to detect wanted differences or just plain text format. >> > >> > It can be nice to have the first invisible row with some information about >> > used formatting. pspg does some heuristics but this code is not nice and >> it >> > is fragile. >> >> I like pspg and use it myself, but I don't think a tool that does the >> right thing by hiding a full screen of dashes from the user should >> hinder making the same progress in psql with a simple pager. >> > > ASCII allows setting some metadata that should be invisible in all > correctly implemented pagers. > Or these parameters can be sent via the pager's command line or some environment variable. Currently there are only two pagers in the world that support tabular format, and both were created against psql (pspg and ov), so we can define our own protocol. Surely, pspg will have heuristics forever, because I want to support psql, mysql and many others. But it can be fine to switch to some more robust mode. It can be interesting for continuous load via pipe. Regards Pavel > > >> >> Christoph >>
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Hi hackers, > Cfbot is making up its mind at the moment of writing. Here is v3 with silenced compiler warnings. -- Best regards, Aleksander Alekseev v3-0002-Convert-GetDatum-and-DatumGet-macros-to-inline-fu.patch Description: Binary data v3-0001-Fix-incorrect-uses-of-Datum-conversion-macros.patch Description: Binary data
Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)
On Tue, 30 Aug 2022 at 16:36, Christoph Berg wrote: > Re: Pavel Stehule > > pspg requires all lines to have the same width. It can do some > corrections > > - but it is hard to detect wanted differences or just plain text format. > > > > It can be nice to have the first invisible row with some information about > > used formatting. pspg does some heuristics but this code is not nice and > it > > is fragile. > > I like pspg and use it myself, but I don't think a tool that does the > right thing by hiding a full screen of dashes from the user should > hinder making the same progress in psql with a simple pager. > ASCII allows setting some metadata that should be invisible in all correctly implemented pagers. > > Christoph >
Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)
Re: Pavel Stehule > pspg requires all lines to have the same width. It can do some corrections > - but it is hard to detect wanted differences or just plain text format. > > It can be nice to have the first invisible row with some information about > used formatting. pspg does some heuristics but this code is not nice and it > is fragile. I like pspg and use it myself, but I don't think a tool that does the right thing by hiding a full screen of dashes from the user should hinder making the same progress in psql with a simple pager. Christoph
Re: SQL/JSON features for v15
On 8/30/22 9:16 AM, Andrew Dunstan wrote: On 2022-08-30 Tu 06:29, Amit Langote wrote: On Tue, Aug 30, 2022 at 6:19 PM Alvaro Herrera wrote: On 2022-Aug-30, Amit Langote wrote: Patches 0001-0006: Yeah, these add the overhead of an extra function call (typin() -> typin_opt_error()) in possibly very common paths. Other than refactoring *all* places that call typin() to use the new API, the only other option seems to be to leave the typin() functions alone and duplicate their code in typin_opt_error() versions for all the types that this patch cares about. Though maybe, that's not necessarily a better compromise than accepting the extra function call overhead. I think another possibility is to create a static inline function in the corresponding .c module (say boolin_impl() in bool.c), which is called by both the opt_error variant as well as the regular one. This would avoid the duplicate code as well as the added function-call overhead. +1 Makes plenty of sense, I'll try to come up with replacements for these forthwith. The RMT had its weekly meeting today to discuss open items. As stated last week, to keep the v15 release within a late Sept / early Oct timeframe, we need to make a decision about the inclusion of SQL/JSON this week. First, we appreciate all of the effort and work that has gone into incorporating community feedback into the patches. We did note that folks working on this made a lot of progress over the past week. The RMT still has a few concerns, summarized as: 1. There is not yet consensus on the current patch proposals as we approach the end of the major release cycle 2. There is a lack of general feedback from folks who raised concerns about the implementation The RMT is still inclined to revert, but will give folks until Sep 1 0:00 AoE[1] to reach consensus on if SQL/JSON can be included in v15. This matches up to Andrew's availability timeline for a revert, and gives enough time to get through the buildfarm prior to the Beta 4 release[2]. 
After the deadline, if there is no consensus on how to proceed, the RMT will request that the patches are reverted. While noting that this RMT has no decision making over v16, in the event of a revert we do hope this recent work can be the basis of the feature in v16. Again, we appreciate the efforts that have gone into addressing the community feedback. Sincerely, John, Jonathan, Michael [1] https://en.wikipedia.org/wiki/Anywhere_on_Earth [2] https://www.postgresql.org/message-id/9d251aec-cea2-bc1a-5ed8-46ef0bcf6...@postgresql.org OpenPGP_signature Description: OpenPGP digital signature
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Aleksander Alekseev writes: > Maybe we should consider backporting at least 0001 patch, partially > perhaps? I believe it fixes pretty cursed pieces of code, e.g: Certainly if there are any parts of it that fix actual bugs, we ought to backport those. I'm not in a big hurry to backport cosmetic fixes though. regards, tom lane
Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)
On Tue, 30 Aug 2022 at 15:49, Christoph Berg wrote: > Re: Andrew Dunstan > > >> + pg_log_error("\\pset: allowed xheader_width values > are full > > >> (default), column, page, or an number specifying exact width."); > > > Committed. There were a couple of bits missing, which I filled in, and I > > changed the behaviour when the option is given without an argument, so > > that instead of resetting it the current value is shown, similarly to > > how most pset options work. > > Thanks for implementing this! This has been #1 on my PG annoyances > list since forever. I had even tried to patch it 10½ years ago, but > never managed to get that patch to submittable quality. > > Now, is there a chance that we could make this the default? Having > psql print a whole screen of minus characters doesn't > seem very useful as default behavior. > > I see upthread that Pavel was objecting to that because pspg doesn't > understand it. But since the expanded format is a one-column output, > and the only thing pspg needs to know is the maximum line length, I'd > think pspg could just look at each line, and update the maximum as it > reads the input anyway. > pspg requires all lines to have the same width. It can do some corrections - but it is hard to detect wanted differences or just plain text format. It can be nice to have the first invisible row with some information about used formatting. pspg does some heuristics but this code is not nice and it is fragile. Regards Pavel > Can we set 'page' as default? > > Christoph > > >
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
On Tue, Aug 30, 2022 at 10:25 AM Aleksander Alekseev wrote: > > Yeah, I don't see a reason to back-patch a change like this > > Maybe we should consider backporting at least 0001 patch, partially > perhaps? I believe it fixes pretty cursed pieces of code, e.g: > > ``` > pg_cryptohash_ctx *context = > -(pg_cryptohash_ctx *) PointerGetDatum(foundres); > +(pg_cryptohash_ctx *) DatumGetPointer(foundres); > ``` Sure, back-porting the bug fixes would make sense to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Tom, Robert, > Yeah, I don't see a reason to back-patch a change like this Maybe we should consider backporting at least 0001 patch, partially perhaps? I believe it fixes pretty cursed pieces of code, e.g: ``` pg_cryptohash_ctx *context = -(pg_cryptohash_ctx *) PointerGetDatum(foundres); +(pg_cryptohash_ctx *) DatumGetPointer(foundres); ``` This being said, personally I don't have a strong opinion here. After all, the code works and passes the tests. Maybe I'm just being a perfectionist here. -- Best regards, Aleksander Alekseev
Re: Reducing the chunk header sizes on all memory context types
I wrote: > So maybe we should revisit the question. It'd be worth collecting some > stats about how much extra space would be needed if we force there > to be room for a sentinel. Actually, after ingesting more caffeine, the problem with this for aset.c is that the only way to add space for a sentinel that didn't fit already is to double the space allocation. That's a little daunting, especially remembering how many places deliberately allocate power-of-2-sized arrays. You could imagine deciding that the space classifications are not power-of-2 but power-of-2-plus-one, or something like that. But that would be very invasive to the logic, and I doubt it's a good idea. regards, tom lane
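The tradeoff Tom describes follows directly from aset.c's power-of-2 size classes: a sentinel byte comes for free only when the request doesn't already fill its chunk exactly, and forcing room for one in an exact power-of-2 request would double the allocation. A toy illustration of that arithmetic (not the real allocator code, which also has a minimum chunk size and large-chunk handling):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round a request up to the next power of 2, as aset.c's size classes do. */
static size_t
pow2_chunk_size(size_t request)
{
	size_t		size = 8;		/* assume an 8-byte minimum chunk */

	while (size < request)
		size <<= 1;
	return size;
}

/*
 * A sentinel byte is "free" when the rounded-up chunk already has slack.
 * For an exact power-of-2 request (common: 2^n-sized arrays), adding room
 * for a sentinel would bump the request into the next size class, i.e.
 * double the allocation.
 */
static bool
sentinel_fits_for_free(size_t request)
{
	return pow2_chunk_size(request) > request;
}
```

For example, a 24-byte request lands in a 32-byte chunk with 8 spare bytes for a sentinel, while a 64-byte request fills its chunk exactly and would have to grow to 128 bytes.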
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
On Tue, Aug 30, 2022 at 10:13 AM Tom Lane wrote: > Aleksander Alekseev writes: > > Just to clarify, a break in this case is going to be the fact that we > > are adding new functions, although inlined, correct? Or maybe > > something else? I'm sorry this is the first time I encounter the > > question of ABI compatibility in the context of Postgres, so I would > > appreciate it if you could elaborate a bit. > > After absorbing a bit more caffeine, I suppose that replacing a > macro with a "static inline" function would not be an ABI break, > at least not with most modern compilers, because the code should > end up the same. I'd still vote against back-patching though. > I don't think the risk-reward ratio is good, especially not for > the pre-C99 branches which don't necessarily have "inline". Yeah, I don't see a reason to back-patch a change like this, certainly not right away. If over time it turns out that the different definitions on different branches cause too many headaches, we could reconsider. However, I'm not sure that will happen, because the whole point is that the static inline functions are intended to behave in the same way as the macros, just with better type-checking. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Aleksander Alekseev writes: > Just to clarify, a break in this case is going to be the fact that we > are adding new functions, although inlined, correct? Or maybe > something else? I'm sorry this is the first time I encounter the > question of ABI compatibility in the context of Postgres, so I would > appreciate it if you could elaborate a bit. After absorbing a bit more caffeine, I suppose that replacing a macro with a "static inline" function would not be an ABI break, at least not with most modern compilers, because the code should end up the same. I'd still vote against back-patching though. I don't think the risk-reward ratio is good, especially not for the pre-C99 branches which don't necessarily have "inline". regards, tom lane
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Hi Tom, > Do you think this should be backported? >> Impossible, it's an ABI break. OK, got it. Just to clarify, a break in this case is going to be the fact that we are adding new functions, although inlined, correct? Or maybe something else? I'm sorry this is the first time I encounter the question of ABI compatibility in the context of Postgres, so I would appreciate it if you could elaborate a bit. -- Best regards, Aleksander Alekseev
\pset xheader_width page as default? (Re: very long record lines in expanded psql output)
Re: Andrew Dunstan > >> + pg_log_error("\\pset: allowed xheader_width values are full > >> (default), column, page, or an number specifying exact width."); > Committed. There were a couple of bits missing, which I filled in, and I > changed the behaviour when the option is given without an argument, so > that instead of resetting it the current value is shown, similarly to > how most pset options work. Thanks for implementing this! This has been #1 on my PG annoyances list since forever. I had even tried to patch it 10½ years ago, but never managed to get that patch to submittable quality. Now, is there a chance that we could make this the default? Having psql print a whole screen of minus characters doesn't seem very useful as default behavior. I see upthread that Pavel was objecting to that because pspg doesn't understand it. But since the expanded format is a one-column output, and the only thing pspg needs to know is the maximum line length, I'd think pspg could just look at each line, and update the maximum as it reads the input anyway. Can we set 'page' as default? Christoph
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Aleksander Alekseev writes: > Do you think this should be backported? Impossible, it's an ABI break. regards, tom lane
Re: Reducing the chunk header sizes on all memory context types
David Rowley writes: > On Tue, 30 Aug 2022 at 03:16, Robert Haas wrote: >> ../../../../src/include/utils/memutils_memorychunk.h:186:18: error: >> comparison of constant 7 with expression of type >> 'MemoryContextMethodID' (aka 'enum MemoryContextMethodID') is always >> true [-Werror,-Wtautological-constant-out-of-range-compare] >> Assert(methodid <= MEMORY_CONTEXT_METHODID_MASK); >> ^~~~ > I think the Assert is useful as if we were ever to add an enum member > with the value of 8 and forgot to adjust MEMORY_CONTEXT_METHODID_BITS > then bad things would happen inside MemoryChunkSetHdrMask() and > MemoryChunkSetHdrMaskExternal(). I think it's unlikely we'll ever get > that many MemoryContext types, but I don't know for sure and would > rather the person who adds the 9th one get alerted to the lack of bit > space in MemoryChunk as soon as possible. I think that's a weak argument, so I don't mind dropping this Assert. What would be far more useful is a comment inside the MemoryContextMethodID enum pointing out that we can support at most 8 values because XYZ. However, I'm still wondering why Robert sees this when the rest of us don't. regards, tom lane
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Hi Peter, > To address this, I converted these macros to inline functions This is a great change! I encountered passing the wrong arguments to these macros many times, and this is indeed pretty annoying. I wish we could forbid doing other stupid things as well, e.g. comparing two Datum's directly, which for Timestamps works just fine but only on 64-bit platforms. Although this is certainly out of scope of this thread. The patch looks good to me, I merely added a link to the discussion. I added it to the CF application. Cfbot is making up its mind at the moment of writing. Do you think this should be backported? -- Best regards, Aleksander Alekseev v2-0001-Fix-incorrect-uses-of-Datum-conversion-macros.patch Description: Binary data v2-0002-Convert-GetDatum-and-DatumGet-macros-to-inline-fu.patch Description: Binary data
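The difference between the two styles can be sketched like this (simplified stand-ins, not the actual postgres.h definitions): a macro performs a blind cast, so swapping PointerGetDatum for DatumGetPointer — as in the pg_cryptohash hunk quoted elsewhere in this digest — compiles silently, while an inline function fixes the parameter type and lets the compiler flag the mix-up:

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Datum;		/* as in postgres.h */

/*
 * Old style: the macro casts whatever it is given, so passing a Datum where
 * a pointer is expected (or vice versa) is accepted without complaint.
 */
#define PointerGetDatumMacro(X) ((Datum) (X))

/*
 * New style: the inline function declares its parameter type, so handing it
 * the wrong kind of argument draws a compiler diagnostic instead of
 * compiling cleanly.  The generated code is the same as the macro's.
 */
static inline Datum
PointerGetDatumInline(const void *X)
{
	return (Datum) X;
}

static inline void *
DatumGetPointerInline(Datum X)
{
	return (void *) X;
}
```

Round-tripping a pointer through the inline pair is an identity, which is also why (as discussed downthread) the conversion is not an ABI break: the emitted code doesn't change, only the compile-time checking does.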
Re: Strip -mmacosx-version-min options from plperl build
On 2022-08-26 Fr 16:25, Andres Freund wrote: > Hi, > > On 2022-08-26 16:00:31 -0400, Tom Lane wrote: >> Andrew Dunstan writes: >>> On 2022-08-26 Fr 12:11, Tom Lane wrote: And if that doesn't help, try -Wl,--export-all-symbols >>> worked > Except that it's only happening for plperl, I'd wonder if it's possibly > related to our magic symbols being prefixed with _. I noticed that the > underscore prefix e.g. changes the behaviour of gcc's "collect2" on AIX, which > is responsible for exporting symbols etc. > > >> Hmph. Hard to see how that isn't a linker bug. > Agreed, given that this is only happening with plperl, and not with any of the > other extensions... > > >> As a stopgap to get the farm green again, I propose adding something like >> >> ifeq ($(PORTNAME), cygwin) >> SHLIB_LINK += -Wl,--export-all-symbols >> endif >> >> to plperl's makefile. > :( > It doesn't make me very happy either, but nobody seems to have a better idea. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Postmaster self-deadlock due to PLT linkage resolution
On Wed, Aug 31, 2022 at 12:26 AM Robert Haas wrote: > On Tue, Aug 30, 2022 at 8:17 AM Thomas Munro wrote: > > FWIW I suspect FreeBSD can't break like this in a program linked with > > libthr, because it has a scheme for deferring signals while the > > runtime linker holds locks. _rtld_bind calls _thr_rtld_rlock_acquire, > > which uses the THR_CRITICAL_ENTER mechanism to cause thr_sighandler to > > defer until release. For a non-thread program, I'm not entirely sure, > > but I don't think the fork() problem exists there. (Could be wrong, > > based on a quick look.) > > Well that seems a bit ironic, considering that Tom has worried in the > past that linking with threading libraries would break stuff. Hah. To clarify, non-thread builds don't have that exact fork() problem, but it turns out they do have a related state clobbering problem elsewhere, which I've reported.
Re: Reducing the chunk header sizes on all memory context types
Tomas Vondra writes: > I guess the idea was to add a sentinel only when there already is space > for it, but perhaps that's a bad tradeoff limiting the benefits. Either > we add the sentinel fairly often (and then why not just add it all the > time - it'll need a bit more space), or we do it only very rarely (and > then it's a matter of luck if it catches an issue). I'm fairly sure that when we made that decision originally, a top-of-mind case was ListCells, which are plentiful, small, power-of-2-sized, and not used in a way likely to have buffer overruns. But since the List rewrite a couple years back we no longer palloc individual ListCells. So maybe we should revisit the question. It'd be worth collecting some stats about how much extra space would be needed if we force there to be room for a sentinel. regards, tom lane
Re: SQL/JSON features for v15
On 2022-08-30 Tu 06:29, Amit Langote wrote: > On Tue, Aug 30, 2022 at 6:19 PM Alvaro Herrera > wrote: >> On 2022-Aug-30, Amit Langote wrote: >> >>> Patches 0001-0006: >>> >>> Yeah, these add the overhead of an extra function call (typin() -> >>> typin_opt_error()) in possibly very common paths. Other than >>> refactoring *all* places that call typin() to use the new API, the >>> only other option seems to be to leave the typin() functions alone and >>> duplicate their code in typin_opt_error() versions for all the types >>> that this patch cares about. Though maybe, that's not necessarily a >>> better compromise than accepting the extra function call overhead. >> I think another possibility is to create a static inline function in the >> corresponding .c module (say boolin_impl() in bool.c), which is called >> by both the opt_error variant as well as the regular one. This would >> avoid the duplicate code as well as the added function-call overhead. > +1 > Makes plenty of sense, I'll try to come up with replacements for these forthwith. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
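Alvaro's suggestion amounts to something like the following sketch — a shared static inline worker called by both the regular and the opt_error variants, so neither path pays an extra out-of-line call. Names and the parsing logic here are illustrative only; the real boolin accepts more spellings and reports errors via ereport:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Shared worker: parse "true"/"false" (an illustrative subset of what the
 * real boolin accepts).  Returns false on bad input rather than erroring,
 * so each caller below decides how to react.  Being static inline, the
 * compiler can fold it into both callers.
 */
static inline bool
boolin_impl(const char *str, bool *result)
{
	if (strcmp(str, "true") == 0)
	{
		*result = true;
		return true;
	}
	if (strcmp(str, "false") == 0)
	{
		*result = false;
		return true;
	}
	return false;
}

/* Regular variant: the real code would raise an error on bad input. */
static bool
demo_boolin(const char *str)
{
	bool		result;

	if (!boolin_impl(str, &result))
		result = false;			/* stand-in for ereport(ERROR, ...) */
	return result;
}

/* Soft-error variant: reports failure through *have_error instead. */
static bool
demo_boolin_opt_error(const char *str, bool *have_error)
{
	bool		result = false;

	*have_error = !boolin_impl(str, &result);
	return result;
}
```

The point is that the common, non-error path through demo_boolin compiles down to the same code as before the refactoring, while the SQL/JSON code gets its no-throw entry point without duplicating the parser.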
plpgsql-trigger.html: Format TG_ variables as table (patch)
Hi, I found the list of TG_ variables on https://www.postgresql.org/docs/current/plpgsql-trigger.html#PLPGSQL-DML-TRIGGER hard to read for several reasons: too much whitespace, all the lines start with "Data type", and even after that, the actual content is hiding behind some extra "variable that..." boilerplate. The attached patch formats the list as a table, and removes some of the clutter from the text. I reused the catalog_table_entry table machinery, that is probably not quite the correct thing, but I didn't find a better variant, and the result looks ok. Thanks to ilmari for the idea and some initial reviews. Christoph >From 708c05277e6e2a93e9ceb1e52fda5991cef8b545 Mon Sep 17 00:00:00 2001 From: Christoph Berg Date: Tue, 30 Aug 2022 15:09:39 +0200 Subject: [PATCH] plpgsql-trigger.html: Format TG_ variables as table Format list as table, and remove redundant clutter ("viarable holding...") that made the list hard to skim --- doc/src/sgml/plpgsql.sgml | 221 -- 1 file changed, 114 insertions(+), 107 deletions(-) diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index cf387dfc3f..c73e81eff0 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -4028,142 +4028,149 @@ ASSERT condition , When a PL/pgSQL function is called as a trigger, several special variables are created automatically in the - top-level block. They are: + top-level block. They are listed in . + - - - NEW - + + TG_ special trigger function variables + + + + + Variable Type + - Data type RECORD; variable holding the new - database row for INSERT/UPDATE operations in row-level - triggers. This variable is null in statement-level triggers - and for DELETE operations. + Description + + + + + + + + NEW record - - - - - OLD - - Data type RECORD; variable holding the old - database row for UPDATE/DELETE operations in row-level + new database row for INSERT/UPDATE operations in row-level triggers. 
This variable is null in statement-level triggers - and for INSERT operations. - - - + and for DELETE operations. + + - - TG_NAME - + + + OLD record + - Data type name; variable that contains the name of the trigger actually - fired. + old database row for UPDATE/DELETE operations in row-level + triggers. This variable is null in statement-level triggers and for INSERT + operations. + + + + + + TG_NAME name - - + + name of the trigger fired. + + - - TG_WHEN - + + + TG_WHEN text + - Data type text; a string of - BEFORE, AFTER, or + string BEFORE, AFTER, or INSTEAD OF, depending on the trigger's definition. - - - + + - - TG_LEVEL - + + + TG_LEVEL text + - Data type text; a string of either - ROW or STATEMENT - depending on the trigger's definition. + string ROW or STATEMENT depending + on the trigger's definition. + + + + + + TG_OP text - - - - - TG_OP - - Data type text; a string of - INSERT, UPDATE, + string INSERT, UPDATE, DELETE, or TRUNCATE telling for which operation the trigger was fired. - - - + + - - TG_RELID - - - Data type oid; the object ID of the table that caused the - trigger invocation. + + + TG_RELID oid + (references pg_class.oid) - - - - - TG_RELNAME - - Data type name; the name of the table that caused the trigger - invocation. This is now deprecated, and could disappear in a future - release. Use TG_TABLE_NAME instead. - - - + object ID of the table that caused the trigger invocation. + + - - TG_TABLE_NAME - - - Data type name; the name of the table that - caused the trigger invocation. + + + TG_RELNAME name - - - - - TG_TABLE_SCHEMA - - Data type name; the name of the schema of the - table that caused the trigger invocation. + name of the table that caused the trigger invocation. This is now + deprecated, and could disappear in a future release. Use + TG_TABLE_NAME instead. + + + + + + TG_TABLE_NAME name - - - - - TG_NARGS - - Data type integer; the number of arguments given to the trigger - function in the CREATE TRIGGER statement. +
Re: POC: GROUP BY optimization
On 8/18/22 3:29 AM, Tomas Vondra wrote: On 8/18/22 03:32, David Rowley wrote: Here are a couple of patches to demo the idea. Yeah, that's an option too. I should have mentioned it along with the cpu_operator_cost. BTW would you mind taking a look at the costing? I think it's fine, but it would be good if someone not involved in the patch takes a look. With RMT hat on, just a friendly reminder that this is still on the open items list[1] -- we have the Beta 4 release next week and it would be great if we can get this resolved prior to it. Thanks! Jonathan [1] https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items [2] https://www.postgresql.org/message-id/9d251aec-cea2-bc1a-5ed8-46ef0bcf6...@postgresql.org OpenPGP_signature Description: OpenPGP digital signature
Re: Removing dead code in pgcrypto
Hi again, > I'm pretty sure this change is fine too, but I added the patch to the > CF application in order to play it safe. Let's see what cfbot will > tell us. I see a little race condition happen :) Sorry for this. -- Best regards, Aleksander Alekseev
Re: ICU for global collation
On Wed, Aug 24, 2022 at 08:28:24PM +0200, Peter Eisentraut wrote: > Committed, thanks. > > (This should conclude all the issues discussed in this thread recently.) Please note that this open item was still listed as open. I have closed it now. -- Michael signature.asc Description: PGP signature
Setting restrictedtoken in pg_regress
In pg_regress we set restrictedToken when calling CreateRestrictedProcess, but we never seem to use that anywhere. Not being well versed in Windows I might be missing something, but is it needed or is it a copy/pasteo from fa1e5afa8a2 which does that in restricted_token.c? If not needed, removing it makes the code more readable IMO. -- Daniel Gustafsson https://vmware.com/ pg_regress_restrictedtoken.diff Description: Binary data
Re: Removing dead code in pgcrypto
Hi Daniel, > Thanks for looking! On closer inspection, I found another function which was > never used and which doesn't turn up when searching extensions. The attached > removes pgp_get_cipher_name as well. I'm pretty sure this change is fine too, but I added the patch to the CF application in order to play it safe. Let's see what cfbot will tell us. -- Best regards, Aleksander Alekseev
Re: Removing dead code in pgcrypto
> On 30 Aug 2022, at 14:39, Aleksander Alekseev > wrote: > > Hi Daniel, > >> it seems we can consider retiring them in v16. > > Looks good to me. A link to the discussion was added to the patch. Thanks for looking! On closer inspection, I found another function which was never used and which doesn't turn up when searching extensions. The attached removes pgp_get_cipher_name as well. -- Daniel Gustafsson https://vmware.com/ v3-0001-pgcrypto-Remove-unused-code.patch Description: Binary data
Re: making relfilenodes 56 bits
On Fri, Aug 26, 2022 at 9:33 PM Robert Haas wrote: > > On Fri, Aug 26, 2022 at 7:01 AM Dilip Kumar wrote: > > While working on this solution I noticed one issue. Basically, the > > problem is that during binary upgrade when we try to rewrite a heap we > > would expect that “binary_upgrade_next_heap_pg_class_oid” and > > “binary_upgrade_next_heap_pg_class_relfilenumber” are already set for > > creating a new heap. But we are not preserving anything so we don't > > have those values. One option to this problem is that we can first > > start the postmaster in non-binary upgrade mode perform all conflict > > checking and rewrite and stop the postmaster. Then start postmaster > > again and perform the restore as we are doing now. Although we will > > have to start/stop the postmaster one extra time we have a solution. > > Yeah, that seems OK. Or we could add a new function, like > binary_upgrade_allow_relation_oid_and_relfilenode_assignment(bool). > Not sure which way is better. I have found one more issue with this approach of rewriting the conflicting table. Earlier I thought we could do the conflict checking and rewriting inside create_new_objects() right before the restore command. But after implementing (while testing) this I realized that we DROP and CREATE the database while restoring the dump, which means it will again generate the conflicting system tables. So theoretically the rewriting should go in between the CREATE DATABASE and restoring the objects, but as of now both creating the database and restoring the other objects are part of a single dump file. I haven't yet analyzed how feasible it is to generate the dump in two parts, the first part just to create the database and the second part to restore the rest of the objects. Thoughts? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Removing dead code in pgcrypto
Hi Daniel, > it seems we can consider retiring them in v16. Looks good to me. A link to the discussion was added to the patch. -- Best regards, Aleksander Alekseev v2-0001-pgcrypto-Remove-unused-mbuf-code.patch Description: Binary data
Re: replacing role-level NOINHERIT with a grant-level option
On Tue, Aug 30, 2022 at 8:24 AM Robert Haas wrote: > On Mon, Aug 29, 2022 at 10:16 PM Robert Haas wrote: > > I'll figure out a fix tomorrow when I'm less tired. Thanks for catching it. > > Second try. Third try. -- Robert Haas EDB: http://www.enterprisedb.com v3-0001-Fix-a-bug-in-roles_is_member_of.patch Description: Binary data
Re: Postmaster self-deadlock due to PLT linkage resolution
On Tue, Aug 30, 2022 at 8:17 AM Thomas Munro wrote: > FWIW I suspect FreeBSD can't break like this in a program linked with > libthr, because it has a scheme for deferring signals while the > runtime linker holds locks. _rtld_bind calls _thr_rtld_rlock_acquire, > which uses the THR_CRITICAL_ENTER mechanism to cause thr_sighandler to > defer until release. For a non-thread program, I'm not entirely sure, > but I don't think the fork() problem exists there. (Could be wrong, > based on a quick look.) Well that seems a bit ironic, considering that Tom has worried in the past that linking with threading libraries would break stuff. -- Robert Haas EDB: http://www.enterprisedb.com
Re: replacing role-level NOINHERIT with a grant-level option
On Mon, Aug 29, 2022 at 10:16 PM Robert Haas wrote: > I'll figure out a fix tomorrow when I'm less tired. Thanks for catching it. Second try. -- Robert Haas EDB: http://www.enterprisedb.com v2-0001-Fix-a-bug-in-roles_is_member_of.patch Description: Binary data
Re: Postmaster self-deadlock due to PLT linkage resolution
On Tue, Aug 30, 2022 at 7:44 AM Tom Lane wrote: > Buildfarm member mamba (NetBSD-current on prairiedog's former hardware) > has failed repeatedly since I set it up. I have now run the cause of > that to ground [1], and here's what's happening: if the postmaster > receives a signal just before it first waits at the select() in > ServerLoop, it can self-deadlock. During the postmaster's first use of > select(), the dynamic loader needs to resolve the PLT branch table entry > that the core executable uses to reach select() in libc.so, and it locks > the loader's internal data structures while doing that. If we enter > a signal handler while the lock is held, and the handler needs to do > anything that also requires the lock, the postmaster is frozen. . o O ( pselect() wouldn't have this problem, but it's slightly too new for the back branches that didn't yet require SUSv3... drat ) > I'd originally intended to make this code "#ifdef __NetBSD__", > but on looking into the FreeBSD sources I find much the same locking > logic in their dynamic loader, and now I'm wondering if such behavior > isn't pretty standard. The added calls should have negligible cost, > so it doesn't seem unreasonable to do them everywhere. FWIW I suspect FreeBSD can't break like this in a program linked with libthr, because it has a scheme for deferring signals while the runtime linker holds locks. _rtld_bind calls _thr_rtld_rlock_acquire, which uses the THR_CRITICAL_ENTER mechanism to cause thr_sighandler to defer until release. For a non-thread program, I'm not entirely sure, but I don't think the fork() problem exists there. (Could be wrong, based on a quick look.) > (Of course, a much better answer is to get out of the business of > doing nontrivial stuff in signal handlers. But even if we get that > done soon, we'd surely not back-patch it.) +1
Re: Transparent column encryption
On 27.07.22 01:19, Jacob Champion wrote: Now, if we don't have a padding system built into the feature, then that does put even more on the user; it's hard to argue with that. Right. If they can even fix it at all. Having a well-documented padding feature would not only help mitigate that, it would conveniently hang a big sign on the caveats that exist. I would be interested in learning more about such padding systems. I have done a lot of reading for this development project, and I have never come across a cryptographic approach to hide length differences by padding. Of course, padding to the block cipher's block size is already part of the process, but that is done out of necessity, not because you want to disguise the length. Are there any other methods? I'm interested to learn more.
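For what it's worth, the usual approach short of fixed-length encoding is length bucketing: pad every plaintext up to the next multiple of a fixed bucket size before encryption, so the ciphertext length reveals only which bucket the value falls in, not its exact length. A hypothetical sketch of just the padding layer — the bucket size, the 4-byte length-prefix framing, and all names here are invented for illustration and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET 128				/* pad every plaintext to a multiple of this */

/*
 * Frame the value as: 4-byte little-endian length + data + zero fill, rounded
 * up to a multiple of BUCKET.  The caller encrypts the returned buffer, so
 * every ciphertext length is a bucket multiple.
 */
static uint8_t *
pad_for_encryption(const uint8_t *data, size_t len, size_t *padded_len)
{
	size_t		total = 4 + len;
	uint8_t    *buf;

	*padded_len = ((total + BUCKET - 1) / BUCKET) * BUCKET;
	buf = calloc(1, *padded_len);	/* zero fill doubles as the padding */
	buf[0] = len & 0xff;
	buf[1] = (len >> 8) & 0xff;
	buf[2] = (len >> 16) & 0xff;
	buf[3] = (len >> 24) & 0xff;
	memcpy(buf + 4, data, len);
	return buf;
}

/* Recover the original value from a decrypted, padded buffer. */
static size_t
unpad_after_decryption(const uint8_t *buf, uint8_t **data_out)
{
	size_t		len = buf[0] |
		((size_t) buf[1] << 8) |
		((size_t) buf[2] << 16) |
		((size_t) buf[3] << 24);

	*data_out = malloc(len);
	memcpy(*data_out, buf + 4, len);
	return len;
}
```

The trade-off is space for indistinguishability: all values up to 124 bytes become indistinguishable by length, the next bucket covers up to 252, and so on. Picking the bucket size (or a per-column fixed length) is exactly the kind of decision a documented padding feature would push to the user.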
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Aug 30, 2022 at 12:12 PM Amit Kapila wrote: > > Few other comments on v25-0001* > > Some more comments on v25-0001*: = 1. +static void +apply_handle_stream_abort(StringInfo s) ... ... + else if (apply_action == TA_SEND_TO_PARALLEL_WORKER) + { + if (subxid == xid) + parallel_apply_replorigin_reset(); + + /* Send STREAM ABORT message to the apply parallel worker. */ + parallel_apply_send_data(winfo, s->len, s->data); + + /* + * After sending the data to the apply parallel worker, wait for + * that worker to finish. This is necessary to maintain commit + * order which avoids failures due to transaction dependencies and + * deadlocks. + */ + if (subxid == xid) + { + parallel_apply_wait_for_free(winfo); ... ... >From this code, it appears that we are waiting for rollbacks to finish but not doing the same in the rollback to savepoint cases. Is there a reason for the same? I think we need to wait for rollbacks to avoid transaction dependency and deadlock issues. Consider the below case: Consider table t1 (c1 primary key, c2, c3) has a row (1, 2, 3) on both publisher and subscriber. Publisher Session-1 == Begin; ... Delete from t1 where c1 = 1; Session-2 Begin; ... insert into t1 values(1, 4, 5); --This will wait for Session-1's Delete to finish. Session-1 Rollback; Session-2 -- The wait will be finished and the insert will be successful. Commit; Now, assume both these transactions get streamed and if we didn't wait for rollback/rollback to savepoint, it is possible that the insert gets executed before and leads to a constraint violation. This won't happen in non-parallel mode, so we should wait for rollbacks to finish. 2. I think we don't need to wait at Rollback Prepared/Commit Prepared because we wait for prepare to finish in *_stream_prepare function. That will ensure all the operations in that transaction have happened in the subscriber, so no concurrent transaction can create deadlock or transaction dependency issues. 
If so, I think it is better to explain this in the comments. 3. +/* What action to take for the transaction. */ +typedef enum { - LogicalRepMsgType command; /* 0 if invalid */ - LogicalRepRelMapEntry *rel; + /* The action for non-streaming transactions. */ + TA_APPLY_IN_LEADER_WORKER, - /* Remote node information */ - int remote_attnum; /* -1 if invalid */ - TransactionId remote_xid; - XLogRecPtr finish_lsn; - char*origin_name; -} ApplyErrorCallbackArg; + /* Actions for streaming transactions. */ + TA_SERIALIZE_TO_FILE, + TA_APPLY_IN_PARALLEL_WORKER, + TA_SEND_TO_PARALLEL_WORKER +} TransactionApplyAction; I think each action needs explanation atop this enum typedef. 4. @@ -1149,24 +1315,14 @@ static void apply_handle_stream_start(StringInfo s) { ... + else if (apply_action == TA_SERIALIZE_TO_FILE) + { + /* + * For the first stream start, check if there is any free apply + * parallel worker we can use to process this transaction. + */ + if (first_segment) + winfo = parallel_apply_start_worker(stream_xid); - /* open the spool file for this transaction */ - stream_open_file(MyLogicalRepWorker->subid, stream_xid, first_segment); + if (winfo) + { + /* + * If we have found a free worker, then we pass the data to that + * worker. + */ + parallel_apply_send_data(winfo, s->len, s->data); - /* if this is not the first segment, open existing subxact file */ - if (!first_segment) - subxact_info_read(MyLogicalRepWorker->subid, stream_xid); + nchanges = 0; - pgstat_report_activity(STATE_RUNNING, NULL); + /* Cache the apply parallel worker for this transaction. */ + stream_apply_worker = winfo; + } ... This looks odd to me in the sense that even if the action is TA_SERIALIZE_TO_FILE, we still send the information to the parallel worker. Won't it be better if we call parallel_apply_start_worker() for first_segment before checking apply_action with get_transaction_apply_action(). That way we can avoid this special case handling. 5. 
+/* + * Struct for sharing information between apply leader apply worker and apply + * parallel workers. + */ +typedef struct ApplyParallelWorkerShared +{ + slock_t mutex; + + bool in_use; + + /* Logical protocol version. */ + uint32 proto_version; + + TransactionId stream_xid; Are we using stream_xid passed by the leader in parallel worker? If so, how? If not, then can we do without this? 6. +void +HandleParallelApplyMessages(void) { ... + /* OK to process messages. Reset the flag saying there are more to do. */ + ParallelApplyMessagePending = false; I don't understand the meaning of the second part of the comment. Shouldn't we say: "Reset the flag saying there is nothing more to do."? I know you have copied from the other part of the code but there also I am not sure if it is correct. 7. +static List *ApplyParallelWorkersFreeList = NIL; +static List *ApplyParallelWorkersList = NIL; Do we really need to maintain two different workers' lists? If so, what is the advantage? I think
Re: Transparent column encryption
On 20.07.22 08:12, Masahiko Sawada wrote: --- Regarding the documentation, I'd like to have a page that describes the generic information of the transparent column encryption for users, such as what this feature actually does, what can be achieved by this feature, CMK rotation, and its known limitations. The patch has a "Transparent Column Encryption" section in protocol.sgml but it seems to be more internal information. I have added more documentation in the v6 patch. --- In datatype.sgml, it says "Thus, clients that don't support transparent column encryption or have disabled it will see the encrypted values as byte arrays." but I got an error rather than encrypted values when I tried to connect to the server using clients that don't support the encryption: postgres(1:6040)=# select * from tbl; no CMK lookup found for realm "" This has now been improved in v6. The protocol changes need to be activated explicitly at connection time, so if you use a client that doesn't support it or doesn't activate it, you get the described behavior. --- In single-user mode, the user cannot decrypt the encrypted value, but that's probably fine in practice. Yes, there is nothing really to do about that. --- Regarding the column master key rotation, would it be useful if we provided a tool for that? For example, it takes the old and new CMK as input, re-encrypts all CEKs related to the CMK, and registers them with the server. I imagine users using a variety of key management systems, so I don't see how a single tool would work. But it's something we can think about in the future. --- Is there any convenient way to load a large amount of test data into the encrypted columns? I tried to use generate_series() but it seems not to work, as it generates the data on the server side: No, that doesn't work, by design. You'd have to write a client program to generate the data. 
I've also tried to load the data from a file on the client by using the \copy command, but it seems not to work: postgres(1:80556)=# copy (select generate_series(1, 1000)::text) to '/tmp/tmp.dat'; COPY 1000 postgres(1:80556)=# \copy a from '/tmp/tmp.dat' COPY 1000 postgres(1:80556)=# select * from a; out of memory This was a bug that I have fixed. --- I got SEGV in the following two situations: I have fixed these.
Removing dead code in pgcrypto
mbuf_tell and mbuf_rewind functions were introduced in commit e94dd6ab91 but were seemingly never used, so it seems we can consider retiring them in v16. -- Daniel Gustafsson https://vmware.com/ v1-0001-pgcrypto-Remove-unused-mbuf-code.patch Description: Binary data
Re: SQL/JSON features for v15
On Tue, Aug 30, 2022 at 6:19 PM Alvaro Herrera wrote: > On 2022-Aug-30, Amit Langote wrote: > > > Patches 0001-0006: > > > > Yeah, these add the overhead of an extra function call (typin() -> > > typin_opt_error()) in possibly very common paths. Other than > > refactoring *all* places that call typin() to use the new API, the > > only other option seems to be to leave the typin() functions alone and > > duplicate their code in typin_opt_error() versions for all the types > > that this patch cares about. Though maybe, that's not necessarily a > > better compromise than accepting the extra function call overhead. > > I think another possibility is to create a static inline function in the > corresponding .c module (say boolin_impl() in bool.c), which is called > by both the opt_error variant as well as the regular one. This would > avoid the duplicate code as well as the added function-call overhead. +1 -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Onder, Since you ask me several questions [1], this post is just for answering those. I have looked again at the latest v9 patch, but I will post my review comments for that separately. On Thu, Aug 25, 2022 at 7:09 PM Önder Kalacı wrote: > >> 1d. >> In above text, what was meant by "catches up around ~5 seconds"? >> e.g. Did it mean *improves* by ~5 seconds, or *takes* ~5 seconds? >> > > It "takes" 5 seconds to replicate all the changes. To be specific, I execute > 'SELECT sum(abalance) FROM pgbench_accounts' on the subscriber, and try to > measure the time until when all the changes are replicated. I do use the same > query on the publisher to check what the query result should be when > replication is done. > > I updated the relevant text, does that look better? Yes. >> 2. GENERAL >> >> 2a. >> There are lots of single-line comments that start lowercase, but by >> convention, I think they should start uppercase. >> >> e.g. + /* we should always use at least one attribute for the index scan */ >> e.g. + /* we might not need this if the index is unique */ >> e.g. + /* avoid expensive equality check if index is unique */ >> e.g. + /* unrelated Path, skip */ >> e.g. + /* simple case, we already have an identity or pkey */ >> e.g. + /* indexscans are disabled, use seq. scan */ >> e.g. + /* target is a regular table */ >> >> ~~ > > > Thanks for noting this, I didn't realize that there is a strict requirement > on this. Updated all of your suggestions, and realized one more such case. > > Is there documentation where such conventions are listed? I couldn't find any. I don’t know of any strict requirements, but I did think it was the more common practice to make the comments look like proper sentences. However, when I tried to prove that by counting the single-line comments in PG code it seems to be split almost 50:50 lowercase/uppercase, so I guess you should just do whatever is most sensible or is most consistent with the surrounding code …. 
Counts for single line /* */ comments: regex ^\s*\/\*\s[a-z]+.*\*\/$ = 18222 results regex ^\s*\/\*\s[A-Z]+.*\*\/$ = 20252 results >> 3. src/backend/executor/execReplication.c - build_replindex_scan_key >> >> - int attoff; >> + int index_attoff; >> + int scankey_attoff; >> bool isnull; >> Datum indclassDatum; >> oidvector *opclass; >> int2vector *indkey = >rd_index->indkey; >> - bool hasnulls = false; >> - >> - Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) || >> -RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel)); >> >> indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple, >> Anum_pg_index_indclass, ); >> Assert(!isnull); >> opclass = (oidvector *) DatumGetPointer(indclassDatum); >> + scankey_attoff = 0; >> >> Maybe just assign scankey_attoff = 0 at the declaration? >> > > Again, lack of coding convention knowledge :/ My observation is that it is > often not assigned during the declaration. But, changed this one. > I don’t know of any convention. Probably this is just my own preference to keep the simple default assignments with the declaration to reduce the LOC. YMMV. >> >> 6. >> >> - int pkattno = attoff + 1; >> ... >> /* Initialize the scankey. */ >> - ScanKeyInit([attoff], >> - pkattno, >> + ScanKeyInit([scankey_attoff], >> + index_attoff + 1, >> BTEqualStrategyNumber, >> Wondering if it would have been simpler if you just did: >> int pkattno = index_attoff + 1; > > > > The index is not necessarily the primary key at this point, that's why I > removed it. > > There are already 3 variables in the same function index_attoff, > scankey_attoff and table_attno, which are hard to avoid. But, this one seemed > ok to avoid, mostly to simplify the readability. Do you think it is better > with the additional variable? Still, I think we need a better name as "pk" is > not relevant anymore. > Your code is fine. Leave it as-is. >> 8. 
src/backend/executor/execReplication.c - RelationFindReplTupleByIndex >> >> @@ -128,28 +171,44 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid, >> TransactionId xwait; >> Relation idxrel; >> bool found; >> + TypeCacheEntry **eq; >> + bool indisunique; >> + int scankey_attoff; >> >> /* Open the index. */ >> idxrel = index_open(idxoid, RowExclusiveLock); >> + indisunique = idxrel->rd_index->indisunique; >> + >> + /* we might not need this if the index is unique */ >> + eq = NULL; >> >> Maybe just default assign eq = NULL in the declaration? >> > > Again, I wasn't sure if it is OK regarding the coding convention to assign > during the declaration. Changed now. > Same as #3. >> 10. >> >> + /* we only need to allocate once */ >> + if (eq == NULL) >> + eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts); >> >> But shouldn't you also free this 'eq' before the function returns, to >> prevent leaking memory? >> > > Two notes here. First, this is allocated inside ApplyMessageContext, which > seems to be reset per tuple change. So,
Re: SQL/JSON features for v15
On 2022-Aug-30, Amit Langote wrote: > Patches 0001-0006: > > Yeah, these add the overhead of an extra function call (typin() -> > typin_opt_error()) in possibly very common paths. Other than > refactoring *all* places that call typin() to use the new API, the > only other option seems to be to leave the typin() functions alone and > duplicate their code in typin_opt_error() versions for all the types > that this patch cares about. Though maybe, that's not necessarily a > better compromise than accepting the extra function call overhead. I think another possibility is to create a static inline function in the corresponding .c module (say boolin_impl() in bool.c), which is called by both the opt_error variant as well as the regular one. This would avoid the duplicate code as well as the added function-call overhead. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Syntax error: function hell() needs an argument. Please choose what hell you want to involve.
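Álvaro's suggestion can be sketched with a toy parser. The names follow the ones used in the thread, but the accepted spellings and the error reporting are heavily simplified — the real boolin() accepts more spellings and raises errors via ereport(), not stderr:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Shared worker: parse a boolean, reporting success via the return value.
 * Being static inline in the same .c file, both callers below get the parse
 * logic inlined with no extra function-call overhead and no duplicated code.
 */
static inline bool
boolin_impl(const char *s, bool *result)
{
	if (strcmp(s, "t") == 0 || strcmp(s, "true") == 0)
	{
		*result = true;
		return true;
	}
	if (strcmp(s, "f") == 0 || strcmp(s, "false") == 0)
	{
		*result = false;
		return true;
	}
	return false;
}

/* Regular variant: hard error on bad input, like the existing boolin(). */
static bool
boolin(const char *s)
{
	bool		result = false;

	if (!boolin_impl(s, &result))
	{
		/* in the real code this would be ereport(ERROR, ...) */
		fprintf(stderr, "invalid input syntax for type boolean: \"%s\"\n", s);
	}
	return result;
}

/* Soft-error variant: reports failure to the caller instead of raising. */
static bool
boolin_opt_error(const char *s, bool *result)
{
	return boolin_impl(s, result);
}
```

The existing call sites keep calling boolin() unchanged, while the SQL/JSON code calls boolin_opt_error(); neither path pays for an extra out-of-line call.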
Re: Letter case of "admin option"
On 2022-Aug-26, Robert Haas wrote: > Here's a patch changing all occurrences of "admin option" in error > messages to "ADMIN OPTION". > > Two of these five messages also exist in previous releases; the other > three are new. > > I'm not sure if this is our final conclusion on what we want to do > here, so please speak up if you don't agree. Thanks -- this is my personal preference, as well as speaking on behalf of a few people who considered the matter from a user's point of view. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Reducing the chunk header sizes on all memory context types
On 8/30/22 04:31, David Rowley wrote: > On Tue, 30 Aug 2022 at 13:55, Tom Lane wrote: >> I wonder if slab ought to artificially bump up such requests when >> MEMORY_CONTEXT_CHECKING is enabled, so there's room for a sentinel. >> I think it's okay for aset.c to not do that, because its power-of-2 >> behavior means there usually is room for a sentinel; but slab's >> policy makes it much more likely that there won't be. > > I think it's fairly likely that small allocations are a power of 2, > and I think most of our allocations are small, so I imagine that if we > didn't do that for aset.c, we'd miss out on most of the benefits. > Yeah. I think we have a fair number of "larger" allocations (once you get to ~100B it probably won't be a 2^N), but we may easily miss whole sections of allocations. I guess the idea was to add a sentinel only when there already is space for it, but perhaps that's a bad tradeoff limiting the benefits. Either we add the sentinel fairly often (and then why not just add it all the time - it'll need a bit more space), or we do it only very rarely (and then it's a matter of luck if it catches an issue). Considering we only do this with asserts, I doubt the extra bytes / CPU is a major issue, and a (more) reliable detection of issues seems worth it. But maybe I underestimate the costs. The only alternative seems to be valgrind, and that's way costlier, though. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
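The sentinel under discussion is just one guard byte written immediately past the requested chunk size and verified when the chunk is freed or the context is reset, which catches off-by-one overruns cheaply. A stand-alone illustration of the idea — the helper names are hypothetical; the real checks live in aset.c/slab.c behind MEMORY_CONTEXT_CHECKING:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SENTINEL_BYTE 0x7E

/*
 * Reserve one extra byte past the requested size and stamp it.  This is the
 * "artificially bump up the request" trade-off Tom mentions: one byte per
 * chunk buys detection of any write that runs one byte past the request.
 */
static void *
alloc_with_sentinel(size_t size)
{
	uint8_t    *chunk = malloc(size + 1);

	chunk[size] = SENTINEL_BYTE;
	return chunk;
}

/* Check at free/reset time: a clobbered sentinel means a buffer overrun. */
static int
sentinel_ok(const void *p, size_t size)
{
	return ((const uint8_t *) p)[size] == SENTINEL_BYTE;
}
```

For aset.c the extra byte often comes for free because the power-of-2 rounding already leaves slack; for slab.c, whose chunks are exactly the requested size, the byte must be reserved explicitly — which is the tradeoff weighed above.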
Re: pg_rewind WAL segments deletion pitfall
On Tue, 30 Aug 2022 at 10:27, Kyotaro Horiguchi wrote: > > Hmm. Doesn't it work to ignore the TLI then? All segments whose > segment number is equal to or larger than the checkpoint location are > preserved regardless of TLI? > If we ignore TLI there is a chance that we may retain some unnecessary (or just wrong) files. > > > Also, we need to take into account the divergence LSN. Files after it are > > not required. > > They are removed at the later checkpoints. But also we can remove > segments that are out of the range between the last common checkpoint > and divergence point, ignoring TLI. Everything that is newer than last_common_checkpoint_seg could be removed (but it already happens automatically, because these files are missing on the new primary). WAL files that are older than last_common_checkpoint_seg could be either removed or at least not copied from the new primary. > the divergence point is also > compared? > > > if (file_segno >= last_common_checkpoint_seg && > > file_segno <= divergence_seg) > > ; > The current implementation relies on tracking WAL files being open while searching for the last common checkpoint. It automatically starts from the divergence_seg, automatically finishes at last_common_checkpoint_seg, and last but not least, automatically handles timeline changes. I don't think that manually written code that decides what to do from the WAL file name (and also takes into account the TLI) could be much simpler than the current approach. Actually, since we have started doing some additional "manipulations" with files in pg_wal, we probably should do a symmetric action with the files inside pg_wal/archive_status. Regards, -- Alexander Kukushkin
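For reference, the file-name-based alternative Kyotaro floats boils down to deriving each file's segment number and keeping it iff it falls between the last common checkpoint and the divergence point. A toy version of that range check — assuming the default 16 MB segment size; the names are illustrative, and it deliberately ignores the TLI component of the file name, which is exactly the trade-off being debated:

```c
#include <assert.h>
#include <stdint.h>

#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)	/* default 16 MB */

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/* Map an LSN to the number of the segment containing it. */
static XLogSegNo
lsn_to_segno(XLogRecPtr lsn)
{
	return lsn / WAL_SEGMENT_SIZE;
}

/*
 * Keep a source-cluster WAL file iff its segment number lies between the
 * last common checkpoint and the divergence point.  Segments outside the
 * range (any TLI) are left for the server's normal recycling to clean up.
 */
static int
keep_wal_segment(XLogSegNo file_segno,
				 XLogSegNo last_common_checkpoint_seg,
				 XLogSegNo divergence_seg)
{
	return file_segno >= last_common_checkpoint_seg &&
		file_segno <= divergence_seg;
}
```

Alexander's counter-argument is that pg_rewind already visits exactly this range of files (with correct timeline handling) while hunting for the last common checkpoint, so a name-parsing pass like the above adds code without adding precision.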