Re: pg_rewind WAL segments deletion pitfall

2022-08-30 Thread Kyotaro Horiguchi
At Wed, 31 Aug 2022 14:30:31 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> What do you think about that?

By the way, shouldn't you add a CF entry for this?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_rewind WAL segments deletion pitfall

2022-08-30 Thread Kyotaro Horiguchi
At Tue, 30 Aug 2022 11:01:58 +0200, Alexander Kukushkin  
wrote in 
> On Tue, 30 Aug 2022 at 10:27, Kyotaro Horiguchi 
> wrote:
> 
> >
> > Hmm. Doesn't it work to ignore TLI then? All segments whose
> > segment number is equal to or larger than the checkpoint location are
> > preserved regardless of TLI?
> >
> 
> If we ignore TLI there is a chance that we may retain some unnecessary (or
> just wrong) files.

Right. I mean that I don't think that's a problem, and we can rely on
postgres itself for later cleanup. Theoretically some files with an
out-of-range TLI or segment number are left behind, but they will surely
be gone soon after the server starts.

> > > Also, we need to take into account the divergence LSN. Files after it
> > > are not required.
> >
> > They are removed by later checkpoints. But we can also remove
> > segments that are outside the range between the last common checkpoint
> > and the divergence point, ignoring TLI.
> 
> 
> Everything that is newer than last_common_checkpoint_seg could be removed
> (but it already happens automatically, because these files are missing on
> the new primary).
> WAL files that are older than last_common_checkpoint_seg could be either
> removed or at least not copied from the new primary.
..
> The current implementation relies on tracking WAL files being open while
> searching for the last common checkpoint. It automatically starts from the
> divergence_seg, automatically finishes at last_common_checkpoint_seg, and
> last but not least, automatically handles timeline changes. I don't think
> that manually written code that decides what to do from the WAL file name
> (and also takes into account TLI) could be much simpler than the current
> approach.

Yeah, I know.  My hope is to take the simplest approach that achieves the
same effect. My concern was the additional hash. On second thought, I
concluded that we should do that using the existing filehash.

We can just add a FILE_ACTION_NONE entry to the file hash from
SimpleXLogPageRead.  Since this happens before the decide_file_action()
call, decide_file_action() should ignore entries marked
FILE_ACTION_NONE. Also, we need to call filehash_init() earlier.

> Actually, since we start doing some additional "manipulations" with files
> in pg_wal, we probably should do a symmetric action with files inside
> pg_wal/archive_status

In that sense, pg_rewind should instead create the missing
archive_status/*.done files for segments, including restored ones, seen
while finding the checkpoint.  This is analogous to the behavior of
pg_basebackup and pg_receivewal. We should also add FILE_ACTION_NONE
entries for the .done files of segments read while finding the checkpoint.

What do you think about that?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: New strategies for freezing, advancing relfrozenxid early

2022-08-30 Thread Peter Geoghegan
On Tue, Aug 30, 2022 at 9:37 PM Jeff Davis  wrote:
> On Tue, 2022-08-30 at 18:50 -0700, Peter Geoghegan wrote:
> > Since VM snapshots are immutable, it should be relatively
> > easy to have the implementation make its final decision on skipping
> > only *after* lazy_scan_heap() returns.
>
> I like this idea.

I was hoping that you would. I imagine that this idea (with minor
variations) could enable an approach that's much closer to what you
were thinking of: one that mostly focuses on controlling the number of
unfrozen pages, and not so much on advancing relfrozenxid early just
because we can and might not get another chance for a long time. In
other words, it could enable your idea of a design that freezes more
during a non-aggressive VACUUM, while still potentially skipping
all-visible pages.

I said earlier on that we ought to at least have a strong bias in the
direction of advancing relfrozenxid in larger tables, especially when
we decide to freeze whole pages more eagerly -- we only get one chance
to advance relfrozenxid per VACUUM, and those opportunities will
naturally be few and far between. We cannot really justify all this
extra freezing if it doesn't prevent antiwraparound autovacuums. That
was more or less my objection to going in that direction.

But if we can more or less double the number of opportunities to at
least ask the question "is now a good time to advance relfrozenxid?"
without really paying much for keeping this option open (and I think
that we can), my concern about relfrozenxid advancement becomes far
less important. Just being able to ask that question is significantly
less rare and precious. Plus we'll probably be able to make
significantly better decisions about relfrozenxid overall with the
"second phase because I changed my mind about skipping" concept in
place.

-- 
Peter Geoghegan




Re: New strategies for freezing, advancing relfrozenxid early

2022-08-30 Thread Jeff Davis
On Tue, 2022-08-30 at 18:50 -0700, Peter Geoghegan wrote:
> Since VM snapshots are immutable, it should be relatively
> easy to have the implementation make its final decision on skipping
> only *after* lazy_scan_heap() returns.

I like this idea.

Regards,
Jeff Davis





Re: Schema variables - new implementation for Postgres 15

2022-08-30 Thread Pavel Stehule
On Wed, 24 Aug 2022 at 10:04, Erik Rijkers wrote:

> On 24-08-2022 at 08:37, Pavel Stehule wrote:
> >>
> >
> > I fixed these.
> >
>
>  > [v20220824-1-*.patch]
>
> Hi Pavel,
>
> I noticed just now that variable assignment (i.e., LET) unexpectedly
> (for me anyway) casts the input value to the variable's type. Surely
> that's wrong?
> The documentation says clearly enough:
>
> 'The result must be of the same data type as the session variable.'
>
>
> Example:
>
> create variable x integer;
> let x=1.5;
> select x, pg_typeof(x);
>   x | pg_typeof
> ---+-----------
>   2 | integer
> (1 row)
>
>
> Is this correct?
>
> If such casts (there are several) are intended then the text of the
> documentation should be changed.
>


I changed it to this:

 @@ -58,8 +58,9 @@ LET session_variable = DEFAULT
 sql_expression
 
  
-  An SQL expression, in parentheses. The result must be of the same
-  data type as the session variable.
+  An SQL expression (can be a subquery in parentheses). The result must
+  be castable to the data type of the session variable (using an
+  implicit or assignment cast).
  
 


Is it OK?

Regards

Pavel


> Thanks,
>
> Erik
>
>


Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-30 Thread Nathan Bossart
On Wed, Aug 31, 2022 at 10:50:39AM +0700, John Naylor wrote:
> Here's the final piece. I debated how many tests to add and decided it
> was probably enough to add one each for checking quotes and
> backslashes in the fast path. There is one cosmetic change in the
> code: Before, the vectorized less-equal check compared to 0x1F, but
> the byte-wise path did so with < 32. I made them both "less-equal 31"
> for consistency. I'll commit this by the end of the week unless anyone
> has a better idea about testing.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: effective_multixact_freeze_max_age issue

2022-08-30 Thread Nathan Bossart
On Tue, Aug 30, 2022 at 05:24:17PM -0700, Peter Geoghegan wrote:
> Attached is v2, which cleans up the structure of
> vacuum_set_xid_limits() a bit more. The overall idea was to improve
> the overall flow/readability of the function by moving the WARNINGs
> into their own "code stanza", just after the logic for establishing
> freeze cutoffs and just before the logic for deciding on
> aggressiveness. That is now the more logical approach (group the
> stanzas by functionality), since we can't sensibly group the code
> based on whether it deals with XIDs or with Multis anymore (not since
> the function was taught to deal with the question of whether caller's
> VACUUM will be aggressive).
> 
> Going to push this in the next day or so.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-30 Thread John Naylor
On Tue, Aug 23, 2022 at 1:03 PM John Naylor
 wrote:
>
> LGTM overall. My plan is to split out the json piece, adding tests for
> that, and commit the infrastructure for it fairly soon.

Here's the final piece. I debated how many tests to add and decided it
was probably enough to add one each for checking quotes and
backslashes in the fast path. There is one cosmetic change in the
code: Before, the vectorized less-equal check compared to 0x1F, but
the byte-wise path did so with < 32. I made them both "less-equal 31"
for consistency. I'll commit this by the end of the week unless anyone
has a better idea about testing.

-- 
John Naylor
EDB: http://www.enterprisedb.com
From f1159dcc2044edb107e0dfeae5e8f3c7feb10cd2 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 31 Aug 2022 10:39:17 +0700
Subject: [PATCH v10] Optimize JSON lexing of long strings

Use optimized linear search when looking ahead for end quotes,
backslashes, and non-printable characters. This results in nearly 40%
faster JSON parsing on x86-64 when most values are long strings, and
all platforms should see some improvement.

Reviewed by Andres Freund and Nathan Bossart
Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com
---
 src/common/jsonapi.c   | 13 ++---
 src/test/regress/expected/json.out | 13 +
 src/test/regress/sql/json.sql  |  5 +
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index fefd1d24d9..cfc025749c 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -19,6 +19,7 @@
 
 #include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+#include "port/pg_lfind.h"
 
 #ifndef FRONTEND
 #include "miscadmin.h"
@@ -844,7 +845,7 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else
 		{
-			char	   *p;
+			char	   *p = s;
 
 			if (hi_surrogate != -1)
 return JSON_UNICODE_LOW_SURROGATE;
@@ -853,11 +854,17 @@ json_lex_string(JsonLexContext *lex)
 			 * Skip to the first byte that requires special handling, so we
 			 * can batch calls to appendBinaryStringInfo.
 			 */
-			for (p = s; p < end; p++)
+			while (p < end - sizeof(Vector8) &&
+   !pg_lfind8('\\', (uint8 *) p, sizeof(Vector8)) &&
+   !pg_lfind8('"', (uint8 *) p, sizeof(Vector8)) &&
+   !pg_lfind8_le(31, (uint8 *) p, sizeof(Vector8)))
+p += sizeof(Vector8);
+
+			for (; p < end; p++)
 			{
 if (*p == '\\' || *p == '"')
 	break;
-else if ((unsigned char) *p < 32)
+else if ((unsigned char) *p <= 31)
 {
 	/* Per RFC4627, these characters MUST be escaped. */
 	/*
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index e9d6e9faf2..cb181226e9 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -42,6 +42,19 @@ LINE 1: SELECT '"\v"'::json;
^
 DETAIL:  Escape sequence "\v" is invalid.
 CONTEXT:  JSON data, line 1: "\v...
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+   json
+---
+ "abc"
+(1 row)
+
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+json 
+-
+ "abc\n"
+(1 row)
+
 -- see json_encoding test for input with unicode escapes
 -- Numbers.
 SELECT '1'::json;-- OK
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
index e366c6f51b..589e0cea36 100644
--- a/src/test/regress/sql/json.sql
+++ b/src/test/regress/sql/json.sql
@@ -7,6 +7,11 @@ SELECT '"abc
 def"'::json;	-- ERROR, unescaped newline in string constant
 SELECT '"\n\"\\"'::json;		-- OK, legal escapes
 SELECT '"\v"'::json;			-- ERROR, not a valid JSON escape
+
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+
 -- see json_encoding test for input with unicode escapes
 
 -- Numbers.
-- 
2.36.1



Re: Stack overflow issue

2022-08-30 Thread Richard Guo
On Wed, Aug 31, 2022 at 6:57 AM Tom Lane  wrote:

> I wrote:
> > The upstream recommendation, which seems pretty sane to me, is to
> > simply reject any string exceeding some threshold length as not
> > possibly being a word.  Apparently it's common to use thresholds
> > as small as 64 bytes, but in the attached I used 1000 bytes.
>
> On further thought: that coding treats anything longer than 1000
> bytes as a stopword, but maybe we should just accept it unmodified.
> The manual says "A Snowball dictionary recognizes everything, whether
> or not it is able to simplify the word".  While "recognizes" formally
> includes the case of "recognizes as a stopword", people might find
> this behavior surprising.  We could alternatively do it as attached,
> which accepts overlength words but does nothing to them except
> case-fold.  This is closer to the pre-patch behavior, but gives up
> the opportunity to avoid useless downstream processing of long words.


This patch looks good to me. It prevents overly-long words (> 1000 bytes)
from going through the stemmer, so the stack overflow issue in the Turkish
stemmer should no longer exist.

Thanks
Richard


Re: New strategies for freezing, advancing relfrozenxid early

2022-08-30 Thread Peter Geoghegan
On Tue, Aug 30, 2022 at 1:45 PM Peter Geoghegan  wrote:
> > d. Can you help give me a sense of scale of the problems solved by
> > visibilitymap snapshots and the cost model? Do those need to be in v1?
>
> I'm not sure. I think that having certainty that we'll be able to scan
> only so many pages up-front is very broadly useful, though. Plus it
> removes the SKIP_PAGES_THRESHOLD stuff, which was intended to enable
> relfrozenxid advancement in non-aggressive VACUUMs, but does so in a
> way that results in scanning many more pages needlessly. See commit
> bf136cf6, which added the SKIP_PAGES_THRESHOLD stuff back in 2009,
> shortly after the visibility map first appeared.

Here is a better example:

Right now the second patch adds both VM snapshots and the skipping
strategy stuff. The VM snapshot is used in the second patch, as a
source of reliable information about how we need to process the table,
in terms of the total number of scanned_pages -- which drives our
choice of strategy. Importantly, we can assess the question of which
skipping strategy to take (in non-aggressive VACUUM) based on 100%
accurate information about how many *extra* pages we'll have to scan
in the event of being eager (i.e. in the event that we prioritize
early relfrozenxid advancement over skipping some pages). Importantly,
that cannot change later on, since VM snapshots are immutable --
everything is locked in. That already seems quite valuable to me.

This general concept could be pushed a lot further without great
difficulty. Since VM snapshots are immutable, it should be relatively
easy to have the implementation make its final decision on skipping
only *after* lazy_scan_heap() returns. We could allow VACUUM to
"change its mind about skipping" in cases where it initially thought
that skipping was the best strategy, only to discover much later on
that that was the wrong choice after all.

A huge amount of new, reliable information will come to light from
scanning the heap rel. In particular, the current value of
vacrel->NewRelfrozenXid seems like it would be particularly
interesting when the time came to consider if a second scan made sense
-- if NewRelfrozenXid is a recent-ish value already, then that argues
for finishing off the all-visible pages in a second heap pass, with
the aim of setting relfrozenxid to a similarly recent value when it
happens to be cheap to do so.

The actual process of scanning precisely those all-visible pages that
were skipped the first time around during a second call to
lazy_scan_heap() can be implemented in the obvious way: by teaching
the VM snapshot infrastructure/lazy_scan_skip() to treat pages that
were skipped the first time around to get scanned during the second
pass over the heap instead. Also, those pages that were scanned the
first time around can/must be skipped on our second pass (excluding
all-frozen pages, which won't be scanned in either heap pass).

I've used the term "second heap pass" here, but that term is slightly
misleading. The final outcome of this whole process is that every heap
page that the vmsnap says VACUUM will need to scan in order for it to
be able to safely advance relfrozenxid will be scanned, precisely
once. The overall order that the heap pages are scanned in will of
course differ from the simple case, but I don't think that it makes
very much difference. In reality there will have only been one heap
pass, consisting of two distinct phases. No individual heap page will
ever be considered for pruning/freezing more than once, no matter
what. This is just a case of *reordering* work. Immutability makes
reordering work easy in general.

--
Peter Geoghegan




Re: [PATCH] Add native windows on arm64 support

2022-08-30 Thread Tom Lane
Michael Paquier  writes:
> Hence, I am wondering if the solution here
> would be to do one xmlXPathNodeSetSort(xpathobj->nodesetval) after
> compiling the path with xmlXPathCompiledEval() in XmlTableGetValue().

Hmm, I see that function declared in <libxml/xpathInternals.h>, which
sure doesn't give me a warm feeling that we're supposed to call it
directly.  But maybe there's an approved way to get the result.

Or perhaps this test case is wrong, and instead of "node()" we
need to write something that specifies a sorted result?

regards, tom lane




Re: effective_multixact_freeze_max_age issue

2022-08-30 Thread Peter Geoghegan
On Mon, Aug 29, 2022 at 3:40 PM Nathan Bossart  wrote:
> Agreed.

Attached is v2, which cleans up the structure of
vacuum_set_xid_limits() a bit more. The overall idea was to improve
the overall flow/readability of the function by moving the WARNINGs
into their own "code stanza", just after the logic for establishing
freeze cutoffs and just before the logic for deciding on
aggressiveness. That is now the more logical approach (group the
stanzas by functionality), since we can't sensibly group the code
based on whether it deals with XIDs or with Multis anymore (not since
the function was taught to deal with the question of whether caller's
VACUUM will be aggressive).

Going to push this in the next day or so.

I also removed some local variables that seem to make the function a
lot harder to follow in v2. Consider code like this:

-   freezemin = freeze_min_age;
-   if (freezemin < 0)
-   freezemin = vacuum_freeze_min_age;
-   freezemin = Min(freezemin, autovacuum_freeze_max_age / 2);
-   Assert(freezemin >= 0);
+   if (freeze_min_age < 0)
+   freeze_min_age = vacuum_freeze_min_age;
+   freeze_min_age = Min(freeze_min_age, autovacuum_freeze_max_age / 2);
+   Assert(freeze_min_age >= 0);

Why have this freezemin temp variable? Why not just use the
vacuum_freeze_min_age function parameter directly instead? That is a
better representation of what's going on at the conceptual level. We
now assign vacuum_freeze_min_age to the vacuum_freeze_min_age arg (not
to the freezemin variable) when our VACUUM caller passes us a value of
-1 for that arg. -1 effectively means "whatever the value of the
vacuum_freeze_min_age GUC is'', which is clearer without the
superfluous freezemin variable.

-- 
Peter Geoghegan


v2-0001-Derive-freeze-cutoff-from-nextXID-not-OldestXmin.patch
Description: Binary data


Re: [PATCH] Add native windows on arm64 support

2022-08-30 Thread Michael Paquier
On Tue, Aug 30, 2022 at 08:07:27PM -0400, Tom Lane wrote:
> It seems like maybe we're relying on an ordering we should not.
> Yet surely the ordering of the pieces of this output is meaningful?
> Are we using the wrong libxml API to create this result?
> 
> Anyway, I'm now suspicious that we've accidentally exposed a logic
> bug in the XMLTABLE code, rather than anything wrong with the
> ASLR stuff.

Funny.  I have reached basically the same conclusion as you a couple
of minutes ago, but I also think that I have found what we need to do
here to ensure the ordering of the nodes generated by the libxml
code.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Add native windows on arm64 support

2022-08-30 Thread Michael Paquier
On Wed, Aug 31, 2022 at 08:36:01AM +0900, Michael Paquier wrote:
> There have been more failures, always switching the input from
> "predeeppost"
> to "predeeppost".
> 
> Using a PATH of node() influences the output.  I am not well versed in
> XMLTABLE, but could it be an issue where each node is parsed and we
> have something like a qsort() applied on the pointer addresses for
> each part or something like that, causing the output to become
> unstable?

Hmm.  I think that I may have an idea here after looking at our xml.c
and xpath.c in libxml2/.  From what I understand, we process the PATH
through XmlTableGetValue() that builds a XML node path in
xmlXPathCompiledEval().  The interesting part comes from libxml2's
xmlXPathCompiledEvalInternal(), where I think we don't apply a sort on
the contents generated.  Hence, I am wondering if the solution here
would be to do one xmlXPathNodeSetSort(xpathobj->nodesetval) after
compiling the path with xmlXPathCompiledEval() in XmlTableGetValue().
This should ensure that the items are ordered even if ASLR mixes up
the pointer positions.
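
If that idea works, the fix might be as small as this (a sketch, not
tested; I'm assuming the exact field names around the
xmlXPathCompiledEval() call in XmlTableGetValue()):

    /* Evaluate the column's compiled XPath expression, as now */
    xpathobj = xmlXPathCompiledEval(xtCxt->xpathscomp[colnum],
                                    xtCxt->xpathcxt);

    /* New: force document order, so the result no longer depends on
     * node pointer addresses under ASLR.  xmlXPathNodeSetSort() is
     * declared in <libxml/xpathInternals.h>. */
    if (xpathobj != NULL &&
        xpathobj->type == XPATH_NODESET &&
        xpathobj->nodesetval != NULL)
        xmlXPathNodeSetSort(xpathobj->nodesetval);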

A complete solution would involve more code paths, but we'd need only
one change in XmlTableGetValue() for the current regression tests to
work.  I don't have an environment where I can reproduce that, so it
would be up to the buildfarm to stress this solution.

Thoughts?
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Add native windows on arm64 support

2022-08-30 Thread Tom Lane
Michael Paquier  writes:
> Using a PATH of node() influences the output.  I am not well versed in
> XMLTABLE, but could it be an issue where each node is parsed and we
> have something like a qsort() applied on the pointer addresses for
> each part or something like that, causing the output to become
> unstable?

I'm not sure.  I dug into this enough to find that the output from
this query is generated in xml.c lines 4635ff:

    /* Concatenate serialized values */
    initStringInfo(&str);
    for (int i = 0; i < count; i++)
    {
        textstr =
            xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
                                 xtCxt->xmlerrcxt);

        appendStringInfoText(&str, textstr);
    }
    cstr = str.data;

So evidently, the problem occurs because the elements of the
nodesetval->nodeTab[] array are in a different order than we expect.

I looked into <libxml/xpath.h> and found the definition of the "nodesetval"
struct, and the comments are eye-opening to say the least:

/*
 * A node-set (an unordered collection of nodes without duplicates).
 */
typedef struct _xmlNodeSet xmlNodeSet;
typedef xmlNodeSet *xmlNodeSetPtr;
struct _xmlNodeSet {
    int nodeNr;              /* number of nodes in the set */
    int nodeMax;             /* size of the array as allocated */
    xmlNodePtr *nodeTab;     /* array of nodes in no particular order */
    /* @@ with_ns to check whether namespace nodes should be looked at @@ */
};

It seems like maybe we're relying on an ordering we should not.
Yet surely the ordering of the pieces of this output is meaningful?
Are we using the wrong libxml API to create this result?

Anyway, I'm now suspicious that we've accidentally exposed a logic
bug in the XMLTABLE code, rather than anything wrong with the
ASLR stuff.

regards, tom lane




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-30 Thread Peter Smith
Here are some review comments for the patch v9-0001:

==

1. Commit message

1a.
With this patch, I'm proposing the following change: If there is an
index on the subscriber, use the index as long as the planner
sub-modules picks any index over sequential scan. The index should be
a btree index, not a partital index. Finally, the index should have at
least one column reference (e.g., cannot consists of only
expressions).

SUGGESTION
With this patch, I'm proposing the following change: If there is any
index on the subscriber, let the planner sub-modules compare the costs
of index versus sequential scan and choose the cheapest. The index
should be a btree index, not a partial index, and it should have at
least one column reference (e.g., cannot consist of only expressions).

~

1b.
The Majority of the logic on the subscriber side exists in the code.

"exists" -> "already exists"

~

1c.
psql -c "truncate pgbench_accounts;" -p 9700 postgres

"truncate" -> "TRUNCATE"

~

1d.
Try to wrap this message text at 80 char width.

==

2. src/backend/replication/logical/relation.c - logicalrep_rel_open

+ /*
+ * Finding a usable index is an infrequent task. It is performed
+ * when an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = LogicalRepUsableIndex(entry->localrel, remoterel);

Seemed a bit odd to say "performed" 2x in the same sentence.

"It is performed when..." -> "It occurs when...” (?)

~~~

3. src/backend/replication/logical/relation.c - logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It is performed
+ * when an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ part_entry->relmapentry.usableIndexOid =
+ LogicalRepUsableIndex(partrel, remoterel);

3a.
Same as comment #2 above.

~

3b.
The jumping between 'part_entry' and 'entry' is confusing. Since
'entry' is already assigned to be &part_entry->relmapentry, can't you
use that here?

SUGGESTION
entry->usableIndexOid = LogicalRepUsableIndex(partrel, remoterel);

~~~

4. src/backend/replication/logical/relation.c - GetIndexOidFromPath

+/*
+ * Returns a valid index oid if the input path is an index path.
+ * Otherwise, return invalid oid.
+ */
+static Oid
+GetIndexOidFromPath(Path *path)

Perhaps make this function comment more consistent with others (like
GetRelationIdentityOrPK, LogicalRepUsableIndex) and refer to
InvalidOid.

SUGGESTION
/*
 * Returns a valid index oid if the input path is an index path.
 *
 * Otherwise, returns InvalidOid.
 */

~~~

5. src/backend/replication/logical/relation.c - IndexOnlyOnExpression

+bool
+IndexOnlyOnExpression(IndexInfo *indexInfo)
+{
+ int i;
+ for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_IndexAttrNumbers[i];
+ if (AttributeNumberIsValid(attnum))
+ return false;
+ }
+
+ return true;
+}

5a.
Add a blank line after those declarations.

~

5b.
AFAIK the C99 style for loop declarations should be OK [1] for new
code, so declaring like below would be cleaner:

for (int i = 0; ...

~~~

6. src/backend/replication/logical/relation.c -
FilterOutNotSuitablePathsForReplIdentFull

+/*
+ * Iterates over the input path list and returns another path list
+ * where paths with non-btree indexes, partial indexes or
+ * indexes on only expressions are eliminated from the list.
+ */
+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)

"are eliminated from the list." -> "have been removed."

~~~

7.

+ foreach(lc, pathlist)
+ {
+ Path*path = (Path *) lfirst(lc);
+ Oid indexOid = GetIndexOidFromPath(path);
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;
+
+ if (!OidIsValid(indexOid))
+ {
+ /* Unrelated Path, skip */
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
+ else
+ {
+ indexRelation = index_open(indexOid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial = (indexInfo->ii_Predicate != NIL);
+ is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, NoLock);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
+ }

I think most of those variables are only used in the "else" block so
maybe it's better to declare them at that scope.

+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;

~~~

8. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+ * Indexes that consists of only expressions (e.g.,
+ * no simple column references on the index) are also
+ * eliminated with a similar reasoning.

"consists" -> "consist"

"with a similar reasoning" -> "with similar reasoning"

~~~

9.

+ * We also eliminate 

Re: [PATCH] Add native windows on arm64 support

2022-08-30 Thread Michael Paquier
On Mon, Aug 29, 2022 at 10:00:10PM -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Mon, Aug 29, 2022 at 05:33:56PM -0700, Andres Freund wrote:
> >> The weirdest part is that it only happens as part of the the pg_upgrade 
> >> test.
> 
> > make check has just failed:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-30%2001%3A15%3A13
> 
> So it *is* probabilistic, which is pretty much what you'd expect
> if ASLR triggers it.  That brings us no closer to understanding
> what the mechanism is, though.

There have been more failures, always switching the input from
"predeeppost"
to "predeeppost".

Using a PATH of node() influences the output.  I am not well versed in
XMLTABLE, but could it be an issue where each node is parsed and we
have something like a qsort() applied on the pointer addresses for
each part or something like that, causing the output to become
unstable?
--
Michael


signature.asc
Description: PGP signature


Re: Stack overflow issue

2022-08-30 Thread Tom Lane
I wrote:
> The upstream recommendation, which seems pretty sane to me, is to
> simply reject any string exceeding some threshold length as not
> possibly being a word.  Apparently it's common to use thresholds
> as small as 64 bytes, but in the attached I used 1000 bytes.

On further thought: that coding treats anything longer than 1000
bytes as a stopword, but maybe we should just accept it unmodified.
The manual says "A Snowball dictionary recognizes everything, whether
or not it is able to simplify the word".  While "recognizes" formally
includes the case of "recognizes as a stopword", people might find
this behavior surprising.  We could alternatively do it as attached,
which accepts overlength words but does nothing to them except
case-fold.  This is closer to the pre-patch behavior, but gives up
the opportunity to avoid useless downstream processing of long words.

regards, tom lane

diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c
index 68c9213f69..1d5dfff5a0 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -275,8 +275,24 @@ dsnowball_lexize(PG_FUNCTION_ARGS)
 	char	   *txt = lowerstr_with_len(in, len);
 	TSLexeme   *res = palloc0(sizeof(TSLexeme) * 2);
 
-	if (*txt == '\0' || searchstoplist(&(d->stoplist), txt))
+	/*
+	 * Do not pass strings exceeding 1000 bytes to the stemmer, as they're
+	 * surely not words in any human language.  This restriction avoids
+	 * wasting cycles on stuff like base64-encoded data, and it protects us
+	 * against possible inefficiency or misbehavior in the stemmer.  (For
+	 * example, the Turkish stemmer has an indefinite recursion, so it can
+	 * crash on long-enough strings.)  However, Snowball dictionaries are
+	 * defined to recognize all strings, so we can't reject the string as an
+	 * unknown word.
+	 */
+	if (len > 1000)
+	{
+		/* return the lexeme lowercased, but otherwise unmodified */
+		res->lexeme = txt;
+	}
+	else if (*txt == '\0' || searchstoplist(&(d->stoplist), txt))
 	{
+		/* empty or stopword, so report as stopword */
 		pfree(txt);
 	}
 	else


Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread David Rowley
On Wed, 31 Aug 2022 at 02:17, Tom Lane  wrote:
>
> I wrote:
> > So maybe we should revisit the question.  It'd be worth collecting some
> > stats about how much extra space would be needed if we force there
> > to be room for a sentinel.
>
> Actually, after ingesting more caffeine, the problem with this for aset.c
> is that the only way to add space for a sentinel that didn't fit already
> is to double the space allocation.  That's a little daunting, especially
> remembering how many places deliberately allocate power-of-2-sized
> arrays.

I decided to try and quantify that by logging the size, MAXALIGN(size)
and the power-of-2 size during AllocSetAlloc and GenerationAlloc. I
made the pow2_size 0 in GenerationAlloc and in AllocSetAlloc when
size > allocChunkLimit.

After running make installcheck, grabbing the records out the log and
loading them into Postgres, I see that if we did double the pow2_size
when there's no space for the sentinel byte then we'd go from
allocating a total of 10.2GB all the way to 16.4GB (!) of
non-dedicated block aset.c allocations.

select
  round(sum(pow2_size)::numeric/1024/1024/1024,3) as pow2_size,
  round(sum(case when maxalign_size=pow2_size then pow2_size*2
                 else pow2_size end)::numeric/1024/1024/1024,3) as method1,
  round(sum(case when maxalign_size=pow2_size then pow2_size+8
                 else pow2_size end)::numeric/1024/1024/1024,3) as method2
from memstats
where pow2_size > 0;

 pow2_size | method1 | method2
-----------+---------+---------
    10.194 |  16.382 |  10.463

If we just added an extra 8 bytes (or MAXALIGN(size+1), at least),
that would take the size up to only 10.5GB.

David




Re: Comma-separated predicates in simple CASE expressions (f263)

2022-08-30 Thread Daniel Gustafsson
> On 31 Aug 2022, at 00:20, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> I was looking at F263 from the SQL standard, Comma-separated predicates in
>> simple CASE expression, and wondering if we could support this within the
>> framework we already have at minimal added cost.  The attached sketch diff
>> turns each predicate in the list into a CaseWhen node and uses the parse
>> location for grouping in error handling for the searched case.
> 
>> Is this a viable approach or am I missing something obvious?

Thanks for looking!

> I don't particularly like duplicating the THEN clause multiple times.
> I think if we're going to do this we should do it right, and that
> means a substantially larger patch to propagate the notion of multiple
> comparison values all the way down.

Fair enough. I think that's doable without splitting the simple and searched
case in the parser, which I think would be a good thing to avoid.  I'll take
a stab at it.

> I also don't care for the bit in transformCaseExpr where you seem
> to be relying on subexpression location fields to make semantic
> decisions.  Surely there's a better way.

If we group the predicates such that a single node contains the full list,
then we'll have all the info we need at that point.

--
Daniel Gustafsson   https://vmware.com/





Re: introduce bufmgr hooks

2022-08-30 Thread Nathan Bossart
Thanks for taking a look.

On Tue, Aug 30, 2022 at 01:02:20PM +0900, Kyotaro Horiguchi wrote:
> smgr is an abstract interface originally intended to allow choosing
> one implementation among several (though not dynamically). Even
> though the patch intends to replace specific (but most) uses of
> smgrread/smgrwrite, it still sounds somewhat strange to me to add
> hooks that replace smgr functions in that respect.  I'm not sure whether
> we still regard smgr as just an interface, though..

I suspect that it's probably still worthwhile to provide such hooks so that
you don't have to write an entire smgr implementation.  But I think you
bring up a good point.

> As for the names, bufmgr_read_hook looks as if it is additionally
> called when the normal operation performed by smgrread completes, or
> just before. (planner_hook doesn't sound like that either, though :p)
> "bufmgr_alt_smgrread" works for me, but I'm not sure it follows
> project policy.

Yeah, the intent is for this hook to replace the smgrread() call (although
it might end up calling smgrread()).  I debated having this hook return
whether smgrread() needs to be called.  Would that address your concern?
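
Concretely, the variant I have in mind would look something like this (a
sketch, not the posted patch; the hook name and signature here are just
for illustration):

    /* Hook returns true if it fully handled the read; bufmgr then skips
     * its own smgrread() call. */
    typedef bool (*bufmgr_read_hook_type) (SMgrRelation smgr,
                                           ForkNumber forkNum,
                                           BlockNumber blockNum,
                                           char *buffer);
    extern PGDLLIMPORT bufmgr_read_hook_type bufmgr_read_hook;

    /* at the existing read site in ReadBuffer_common(): */
    if (bufmgr_read_hook == NULL ||
        !bufmgr_read_hook(smgr, forkNum, blockNum, (char *) bufBlock))
        smgrread(smgr, forkNum, blockNum, (char *) bufBlock);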

> I think that the INSTR_* section should enclose the hook call as it is
> still an I/O operation in the view of the core.

Okay, will do.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Comma-separated predicates in simple CASE expressions (f263)

2022-08-30 Thread Tom Lane
Daniel Gustafsson  writes:
> I was looking at F263 from the SQL standard, Comma-separated predicates in
> simple CASE expression, and wondering if we could support this within the
> framework we already have at minimal added cost.  The attached sketch diff
> turns each predicate in the list into a CaseWhen node and uses the parse
> location for grouping in error handling for the searched case.

> Is this a viable approach or am I missing something obvious?

I don't particularly like duplicating the THEN clause multiple times.
I think if we're going to do this we should do it right, and that
means a substantially larger patch to propagate the notion of multiple
comparison values all the way down.

I also don't care for the bit in transformCaseExpr where you seem
to be relying on subexpression location fields to make semantic
decisions.  Surely there's a better way.

regards, tom lane




Comma-separated predicates in simple CASE expressions (f263)

2022-08-30 Thread Daniel Gustafsson
I was looking at F263 from the SQL standard, Comma-separated predicates in
simple CASE expression, and wondering if we could support this within the
framework we already have at minimal added cost.  The attached sketch diff
turns each predicate in the list into a CaseWhen node and uses the parse
location for grouping in error handling for the searched case.

Is this a viable approach or am I missing something obvious?
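
For reference, the syntax F263 permits, which the sketch targets (an
illustrative query, not taken from the patch):

    -- Several comparison values share a single THEN clause
    SELECT CASE grade
               WHEN 'A', 'B' THEN 'pass'
               WHEN 'C', 'D' THEN 'marginal'
               ELSE 'fail'
           END
    FROM (VALUES ('A'), ('C'), ('F')) AS t(grade);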

--
Daniel Gustafsson   https://vmware.com/



f263_case_list.diff
Description: Binary data


Re: replacing role-level NOINHERIT with a grant-level option

2022-08-30 Thread Nathan Bossart
On Tue, Aug 30, 2022 at 08:33:46AM -0400, Robert Haas wrote:
> Third try.

Now that I have this grantor stuff fresh in my mind, this patch looks good
to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: replacing role-level NOINHERIT with a grant-level option

2022-08-30 Thread Nathan Bossart
On Tue, Aug 30, 2022 at 03:24:56PM -0400, Robert Haas wrote:
> - william => charles => elizabeth => uk is OK because the first two
> hops have ADMIN and the last hop has INHERIT

Don't you mean the first two hops have INHERIT and the last has ADMIN?

> - william => charles => elizabeth => uk => parliament is probably not
> OK because although the first two hops have ADMIN and the last has
> INHERIT, the third hop probably lacks INHERIT

Same here.  I don't mean to be pedantic, I just want to make sure I'm
thinking of this correctly.

> I hope this makes it clearer. select_best_grantor() can't completely
> disregard links without INHERIT, because if it does, it can't pay any
> attention to the last grant in the chain, which only needs to have
> ADMIN, not INHERIT. But it must pay some attention to them, because
> every earlier link in the chain does need to have INHERIT. By moving
> the if-test up, the patch makes it behave just that way, or at least,
> I think it does.

Yes, this is very helpful.  I always appreciate your detailed examples.  I
think what you are describing matches the mental model I was beginning to
form.

Okay, now to take a closer look at the patch...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Slight refactoring of state check in pg_upgrade check_ function

2022-08-30 Thread Bruce Momjian
On Sun, Aug 28, 2022 at 03:06:09PM -0700, Nathan Bossart wrote:
> On Sun, Aug 28, 2022 at 10:42:24PM +0200, Daniel Gustafsson wrote:
> > I noticed that the pg_upgrade check_ functions were determining failures 
> > found
> > in a few different ways.  Some keep a boolean flag variable, and some (like
> > check_for_incompatible_polymorphics) check the state of the script 
> > filehandle
> > which is guaranteed to be set (with the error message referring to the path 
> > of
> > said file).  Others like check_loadable_libraries only check the flag 
> > variable
> > and fclose the handle assuming it was opened.
> > 
> > The attached diff changes the functions to do it consistently in one way, by
> > checking the state of the filehandle.  Since we are referring to the file by
> > path in the printed error message it seemed the cleanest approach, and it 
> > saves
> > a few lines of code without IMO reducing readability.
> > 
> > There is no change in functionality, just code consistency.
> 
> The patch looks reasonable to me.

+1.  Those checks have accumulated over time with different authors,
hence the stylistic differences.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: New strategies for freezing, advancing relfrozenxid early

2022-08-30 Thread Peter Geoghegan
On Tue, Aug 30, 2022 at 11:11 AM Jeff Davis  wrote:
> The solution involves more changes to the philosophy and mechanics of
> vacuum than I would expect, though. For instance, VM snapshotting,
> page-level-freezing, and a cost model all might make sense, but I don't
> see why they are critical for solving the problem above.

I certainly wouldn't say that they're critical. I tend to doubt that I
can be perfectly crisp about what the exact relationship is between
each component in isolation and how it contributes towards addressing
the problems we're concerned with.

> I think I'm
> still missing something. My mental model is closer to the bgwriter and
> checkpoint_completion_target.

That's not a bad starting point. The main thing that that mental model
is missing is how the timeframes work with VACUUM, and the fact that
there are multiple timeframes involved (maybe the system's vacuuming
work could be seen as having one timeframe at the highest level, but
it's more of a fractal picture overall). Checkpoints just don't take
that long, and checkpoint duration has a fairly low variance (barring
pathological performance problems).

You only have so many buffers that you can dirty, too -- it's a
self-limiting process. This is even true when (for whatever reason)
the checkpoint_completion_target logic just doesn't do what it's
supposed to do. There is more or less a natural floor on how bad
things can get, so you don't have to invent a synthetic floor at all.
LSM-based DB systems like the MyRocks storage engine for MySQL don't
use checkpoints at all -- the closest analog is compaction, which is
closer to a hybrid of VACUUM and checkpointing than anything else.

The LSM compaction model necessitates adding artificial throttling to
keep the system stable over time [1]. There is a disconnect between
the initial ingest of data, and the compaction process. And so
top-down modelling of costs and benefits with compaction is more
natural with an LSM [2] -- and not a million miles from the strategy
stuff I'm proposing.

> Allow me to make a naive counter-proposal (not a real proposal, just so
> I can better understand the contrast with your proposal):

> I know there would still be some problem cases, but to me it seems like
> we solve 80% of the problem in a couple dozen lines of code.

It's not that this statement is wrong, exactly. It's that I believe
that it is all but mandatory for me to ameliorate the downside that
goes with more eager freezing, for example by not doing it at all when
it doesn't seem to make sense. I want to solve the big problem of
freeze debt, without creating any new problems. And if I should also
make things in adjacent areas better too, so much the better.

Why stop at a couple dozen lines of code? Why not just change
the default of vacuum_freeze_min_age and
vacuum_multixact_freeze_min_age to 0?

> a. Can you clarify some of the problem cases, and why it's worth
> spending more code to fix them?

For one thing if we're going to do a lot of extra freezing, we really
want to "get credit" for it afterwards, by updating relfrozenxid to
reflect the new oldest extant XID, and so avoid getting an
antiwraparound VACUUM early, in the near future.

That isn't strictly true, of course. But I think that we at least
ought to have a strong bias in the direction of updating relfrozenxid,
having decided to do significantly more freezing in some particular
VACUUM operation.

> b. How much of your effort is groundwork for related future
> improvements? If it's a substantial part, can you explain in that
> larger context?

Hard to say. It's true that the idea of VM snapshots is quite general,
and could have been introduced in a number of different ways. But I
don't think that that should count against it. It's also not something
that seems contrived or artificial -- it's at least as good of a
reason to add VM snapshots as any other I can think of.

Does it really matter if this project is the freeze debt project, or
the VM snapshot project? Do we even need to decide which one it is
right now?

> c. Can some of your patches be separated into independent discussions?
> For instance, patch 1 has been discussed in other threads and seems
> independently useful, and I don't see the current work as dependent on
> it.

I simply don't know if I can usefully split it up just yet.

> Patch 4 also seems largerly independent.

Patch 4 directly compensates for a problem created by the earlier
patches. The patch series as a whole isn't supposed to ameliorate the
problem of MultiXacts being allocated in VACUUM. It only needs to
avoid making the situation any worse than it is today IMV (I suspect
that the real fix is to make the VACUUM FREEZE command not tune
vacuum_freeze_min_age).

> d. Can you help give me a sense of scale of the problems solved by
> visibilitymap snapshots and the cost model? Do those need to be in v1?

I'm not sure. I think that having certainty that we'll be able to scan
only so many pages 

Re: First draft of the PG 15 release notes

2022-08-30 Thread Bruce Momjian
On Sat, Aug 27, 2022 at 04:03:02PM +0200, Matthias van de Meent wrote:
> Hi,
> 
> I noticed a stray "DETAILS?" marker while going through the release
> notes for 15. Is that subsection still under construction or review?
> 
> > <listitem>
> >  <para>
> >   Record and check the collation of each <link
> >   linkend="sql-createdatabase">database</link> (Peter Eisentraut)
> > [...]
> >  to match the operating system collation version.  DETAILS?
> >  </para>
> > </listitem>

Good question --- the full text is:

 <listitem>
  <para>
   Record and check the collation of each <link
   linkend="sql-createdatabase">database</link> (Peter Eisentraut)
  </para>

  <para>
   This is designed to detect collation
   mismatches to avoid data corruption.  Function
   pg_database_collation_actual_version()
   reports the underlying operating system collation version, and
   ALTER DATABASE ... REFRESH sets the database
   to match the operating system collation version.  DETAILS?
  </para>
 </listitem>

I just can't figure out what the user needs to understand about this,
and I understand very little of it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





PostgreSQL 15 release announcement draft

2022-08-30 Thread Jonathan S. Katz

Hi,

Please see the first draft for the PostgreSQL 15 release announcement. 
This is the announcement that goes out when we ship 15.0.


A few notes on the first draft:

1. I have not put in any links yet -- I want to ensure the document is 
close to being static before I add those in.


2. I have left in a blurb about SQL/JSON while awaiting the decision on
whether the feature is included in v15.


Please provide feedback no later than 2022-09-10 0:00 AoE[1]. After this 
date, we will begin assembling the presskit that includes the translations.


Thanks,

Jonathan

[1] https://en.wikipedia.org/wiki/Anywhere_on_Earth
The PostgreSQL Global Development Group today announced the release of
[PostgreSQL 15](https://www.postgresql.org/docs/15/release-15.html), the latest
version of the world’s [most advanced open source 
database](https://www.postgresql.org/).

PostgreSQL 15 builds on the performance improvements of recent releases with
noticeable gains for managing workloads in both local and distributed
deployments, including improved sorting. This release improves the developer
experience with the addition of the popular `MERGE` command, and adds more
capabilities for observing the state of the database.



[PostgreSQL](https://www.postgresql.org), an innovative data management system
known for its reliability and robustness, benefits from over 25 years of open
source development from a [global developer 
community](https://www.postgresql.org/community/)
and has become the preferred open source relational database for organizations
of all sizes.

### Improved Sort Performance and Compression

In this latest release, PostgreSQL improves on its in-memory and on-disk sorting
algorithms, with benchmarks showing speedups of 25% - 400% based on sort types.
Using `row_number()`, `rank()`, and `count()` as window functions also has
performance benefits in PostgreSQL 15, and queries using `SELECT DISTINCT` can
now be executed in parallel.

Building on work from the previous PostgreSQL release for allowing async remote
queries, the PostgreSQL foreign data wrapper, `postgres_fdw`, can now commit
transactions in parallel.

The performance improvements in PostgreSQL 15 extend to its archiving and backup
facilities. PostgreSQL 15 adds support for LZ4 and Zstandard (zstd) compression
to write-ahead log (WAL) files, which can have both space and performance
benefits for certain workloads. On certain operating systems, PostgreSQL 15
supports the ability to prefetch WAL file contents and speed up recovery times.
PostgreSQL's built-in backup command, `pg_basebackup`, now supports server-side
compression of backup files with a choice of gzip, LZ4, and zstd. PostgreSQL 15
includes the ability to use custom modules for archiving, which eliminates the
overhead of using a shell command.

### Expressive Developer Features

PostgreSQL 15 includes the SQL standard `MERGE` command. `MERGE` lets you write
conditional SQL statements that include `INSERT`, `UPDATE`, and `DELETE` actions
within a single statement.
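
For example, a single `MERGE` statement can combine insert-or-update logic
that previously required multiple statements (illustrative table names):

```sql
MERGE INTO customer_account ca
USING recent_transactions t
    ON t.customer_id = ca.customer_id
WHEN MATCHED THEN
    UPDATE SET balance = ca.balance + t.transaction_value
WHEN NOT MATCHED THEN
    INSERT (customer_id, balance)
    VALUES (t.customer_id, t.transaction_value);
```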

PostgreSQL 15 expands on its support for the SQL/JSON standard, including syntax
for supporting JSON constructors, introspection functions, and the ability to
convert JSON data into a table using the `JSON_TABLE` function.

This latest release adds new functions for using regular expressions to inspect
strings: `regexp_count()`, `regexp_instr()`, `regexp_like()`, and
`regexp_substr()`. PostgreSQL 15 also extends the `range_agg` function to
aggregate `multirange` data types, which were introduced in the previous
release.

PostgreSQL 15 lets users create views that query data using the permissions of
the caller, not the view creator. This option, called `security_invoker`, adds
an additional layer of protection to ensure view callers have the correct
permissions for working with the underlying data.

### More Options with Logical Replication

PostgreSQL 15 provides more flexibility for managing logical replication. This
release introduces row and column filtering for publishers, letting users choose
to replicate a subset of data from a table. PostgreSQL 15 adds features to
simplify conflict management, including the ability to skip replaying a
conflicting transaction and to automatically disable a subscription if an error
is detected. This release also includes support for using two-phase commit (2PC)
with logical replication.

### Logging and Configuration Enhancements

PostgreSQL 15 introduces a new logging format: `jsonlog`. This new format
outputs log data using a defined JSON structure, which allows PostgreSQL logs to
be processed in structured logging systems. This release also adds a new
built-in extension, `pg_walinspect`, that lets users inspect the contents of
write-ahead log files directly from a SQL interface.

PostgreSQL 15 gives database administrators more flexibility in how users can
manage PostgreSQL configuration, adding the ability to grant users permission to
alter server-level configuration parameters. 

Re: replacing role-level NOINHERIT with a grant-level option

2022-08-30 Thread Robert Haas
On Tue, Aug 30, 2022 at 1:30 PM Nathan Bossart  wrote:
> I've been staring at this stuff for a while, and I'm still rather confused.
>
> * You mention that you modelled the "new" function select_best_grantor() on
> is_admin_of_role(), but it looks like that function was introduced in 2005.
> IIUC you meant select_best_admin(), which was added much more recently.

Well, a combination of the two. It's modeled after is_admin_of_role()
in the sense that it also calls roles_is_member_of() with a
non-InvalidOid third argument and a non-NULL fourth argument. All
other callers pass InvalidOid/NULL.

> * I don't see why we should ignore inheritance until after checking admin
> option.  Prior to your commit (e3ce2de), we skipped checking for admin
> option if the role had [NO]INHERIT, and AFAICT your commit aimed to retain
> that behavior.  The comment above check_role_grantor() even mentions
> "inheriting the privileges of a role which does have ADMIN OPTION."  Within
> the function, I see the following comment:
>
>  * Otherwise, the grantor must either have ADMIN OPTION on 
> the role or
>  * inherit the privileges of a role which does. In the former 
> case,
>  * record the grantor as the current user; in the latter, 
> pick one of
>  * the roles that is "most directly" inherited by the current 
> role
>  * (i.e. fewest "hops").
>
> So, IIUC, the intent is for ADMIN OPTION to apply even for non-inherited
> grants, but we still choose to recurse in roles_is_member_of() based on
> inheritance.  This means that you can inherit the ADMIN OPTION to some
> degree, but if you have membership WITH INHERIT FALSE to a role that has
> ADMIN OPTION on another role, the current role effectively does not have
> ADMIN OPTION on the other role.  Is this correct?

I think it's easiest if I give an example. Suppose role 'william'
inherits 'charles' and 'diana' and role 'charles' in turn inherits
from roles 'elizabeth' and 'philip'. Furthermore, 'william' has ADMIN
OPTION on 'cambridge', 'charles' on 'wales', and 'elizabeth' on 'uk'.
Now, because 'william' has ADMIN OPTION on role 'cambridge', he can
add members to that role and remove the ones he added. He can also
grant ADMIN OPTION to others and later revoke it (and all
dependent grants that those others have made). When he does this, he
is acting as himself, on his own authority. Because he inherits the
privileges of charles directly and of elizabeth via charles, he can
also add members to wales and uk. But if he does this, he is not acting
as himself, because he himself does not possess the ADMIN OPTION on
either of those roles. Rather, he is acting on authority delegated
from some other role which does possess admin option on those roles.
Therefore, if 'william' adds a member to 'uk', the grantor MUST be
recorded as 'elizabeth', not 'william', because the grantor of record
must possess admin option on the role.

Now let's add another layer of complexity. Suppose that the 'uk' role
possesses ADMIN OPTION on some other role, let's say 'parliament'. Can
'william' use the fact that he inherits from 'elizabeth' to grant
membership in 'parliament'? That all depends on whether 'uk' was
granted to 'elizabeth' WITH INHERIT TRUE or WITH INHERIT FALSE. If it
was granted WITH INHERIT TRUE, then elizabeth freely exercises all the
powers of parliament, william freely exercises all of elizabeth's
powers, and thus william can grant membership in parliament. Such a
membership will be recorded as having been granted by uk, the role
that actually holds ADMIN OPTION on parliament. However, if elizabeth
was granted uk WITH INHERIT FALSE, then she can't mess with the
membership of parliament, and thus neither can william. At least, not
without first executing "SET ROLE uk", which, in this scenario, either of
them could do, though perhaps the use of SET ROLE is logged, audited, and
very carefully scrutinized.
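
In SQL terms, the setup looks roughly like this (role names from the
example above; WITH INHERIT is the grant-level option this patch series
adds, so the exact spelling here is illustrative):

    GRANT charles   TO william WITH INHERIT TRUE;
    GRANT diana     TO william WITH INHERIT TRUE;
    GRANT elizabeth TO charles WITH INHERIT TRUE;
    GRANT philip    TO charles WITH INHERIT TRUE;

    GRANT cambridge TO william   WITH ADMIN OPTION;
    GRANT wales     TO charles   WITH ADMIN OPTION;
    GRANT uk        TO elizabeth WITH ADMIN OPTION;  -- INHERIT TRUE or FALSE
                                                     -- on this grant decides
                                                     -- the parliament case
    GRANT parliament TO uk WITH ADMIN OPTION;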

So, what does all of this mean for select_best_grantor() and its
helper function roles_is_member_of()? Well, first, we have to recurse.
william can act on his own behalf when administering cambridge, and on
behalf of charles or elizabeth when administering wales or uk
respectively, and there's no way to get that behavior without
recursing. Second, we have to stop the recursion when we reach a grant
that is not inherited. Otherwise, william might be able to act on
behalf of uk and administer membership in parliament even if uk is
granted to elizabeth WITH INHERIT FALSE. Third, and this is where it
gets tricky, we have to stop recursion *only after checking whether
we've found a valid grantor*. william is allowed to administer
membership in cambridge whether it is granted to william WITH
INHERIT TRUE or WITH INHERIT FALSE. Either way, he possesses ADMIN
OPTION on that role, and may administer membership in it. Likewise, he
can administer membership in wales or uk as charles or elizabeth,
respectively, because he inherits the privileges of those roles, which
do hold ADMIN OPTION.
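
As a concrete rendering of the example above, here is a minimal SQL
sketch of the role graph (the grant syntax follows the grant-level
options being discussed in this thread and is approximate, not the
committed spelling):

    -- Roles from the example; all names as in the prose above.
    CREATE ROLE elizabeth;
    CREATE ROLE philip;
    CREATE ROLE charles;
    CREATE ROLE diana;
    CREATE ROLE william;
    CREATE ROLE cambridge;
    CREATE ROLE wales;
    CREATE ROLE uk;

    GRANT elizabeth, philip TO charles;   -- charles inherits both
    GRANT charles, diana TO william;      -- william inherits transitively

    GRANT cambridge TO william WITH ADMIN OPTION;
    GRANT wales TO charles WITH ADMIN OPTION;
    GRANT uk TO elizabeth WITH ADMIN OPTION;

    SET ROLE william;
    GRANT cambridge TO diana;  -- grantor recorded as william himself
    GRANT uk TO diana;         -- grantor must be recorded as elizabeth,
                               -- the role that holds ADMIN OPTION on uk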

Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)

2022-08-30 Thread Andrew Dunstan


On 2022-08-30 Tu 10:55, Pavel Stehule wrote:
>
>
> On Tue, Aug 30, 2022 at 16:49, Pavel Stehule wrote:
>
>
>
> On Tue, Aug 30, 2022 at 16:36, Christoph Berg wrote:
>
> Re: Pavel Stehule
> > pspg requires all lines to have the same width. It can do some
> > corrections, but it is hard to distinguish intentional differences
> > from plain text format.
> >
> > It would be nice to have a first invisible row with some information
> > about the formatting used. pspg applies some heuristics, but this
> > code is not nice and it is fragile.
>
> I like pspg and use it myself, but I don't think a tool that does the
> right thing by hiding a full screen of minus characters from the user
> should hinder making the same progress in psql with a simple pager.
>
>
> ASCII allows setting some metadata that should be invisible in all
> correctly implemented pagers.
>
>
> or these parameters can be sent via the pager's command line or via some
> environment variable. Currently there are only two pagers in the world
> that support the tabular format, and both were created against psql (pspg
> and ov), so we can define our own protocol. Surely pspg will keep its
> heuristics forever, because I want to support psql, mysql and many
> others. But it could be fine to switch to some more robust mode. It could
> be interesting for continuous load via a pipe.
>

I'm somewhat sympathetic to Christoph's position.

Surely pspg could itself issue

   \pset xheader_width full

at the start of a session.


cheers


andrew

-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Andres Freund
Hi,

On 2022-08-30 14:32:26 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-08-30 14:07:41 -0400, Tom Lane wrote:
> >> Do we want to install this just for NetBSD, or more widely?
> >> I think we'd better back-patch it for NetBSD, so I'm inclined
> >> to be conservative about the change.
> 
> > It's likely a good idea to enable it everywhere applicable, but I agree that
> > we shouldn't unnecessarily do so in the backbranches. So I'd be inclined to
> > add it to the netbsd template for the backbranches.
> 
> > For HEAD I can see putting it into all the applicable templates, adding an
> > AC_LINK_IFELSE() test, or just putting it into the meson stuff.
> 
> For the moment I'll stick it into the netbsd template.

Cool.


> I'm not on board with having the meson stuff generating different
> executables than the Makefiles do, so if someone wants to propose applying
> this widely, they'll need to fix both.  Seems like that is a good thing to
> consider after the meson patches land.  We don't need unnecessary churn in
> that area before that.

Yea, I didn't like that idea either, hence listing it last...

Greetings,

Andres Freund




Re: Tracking last scan time

2022-08-30 Thread Bruce Momjian
On Fri, Aug 26, 2022 at 02:05:36PM +0100, Dave Page wrote:
> On Thu, 25 Aug 2022 at 01:44, David Rowley  wrote:
> I don't have a particular opinion about the patch, I'm just pointing
> out that there are other ways. Even just writing down the numbers on a
> post-it note and coming back in a month to see if they've changed is
> enough to tell if the table or index has been used.
> 
> 
> There are usually other ways to perform monitoring tasks, but there is
> something to be said for the convenience of having functionality built in and
> not having to rely on tools, scripts, or post-it notes :-)

Should we consider using something cheaper like time() so we don't need
a GUC to enable this?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson
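
For scale, a hedged sketch of the trade-off Bruce is suggesting; the
struct and field names are invented for illustration and are not the
patch's API:

    #include <time.h>

    typedef struct IllustrativeTabEntry
    {
        long long   last_scan;  /* invented stand-in for a pgstat field */
    } IllustrativeTabEntry;

    static void
    record_scan(IllustrativeTabEntry *entry)
    {
        /*
         * time(2) carries only second resolution, but on common platforms
         * it is serviced cheaply (often via the vDSO), which is presumably
         * why it could be tracked unconditionally, without a GUC to switch
         * it off.
         */
        entry->last_scan = (long long) time(NULL);
    }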





Re: Hash index build performance tweak from sorting

2022-08-30 Thread Ranier Vilela
> It's a shame you only see 3%, but that's still worth it.
Hi,

I ran this test here:

DROP TABLE hash_speed;
CREATE unlogged TABLE hash_speed (x integer);
INSERT INTO hash_speed SELECT random()*1000 FROM
generate_series(1,1000) x;
VACUUM
Timing is on.
CREATE INDEX ON hash_speed USING hash (x);

head:
Time: 20526,490 ms (00:20,526)

attached patch (v3):
Time: 18810,777 ms (00:18,811)

I can see 9%, with the patch (v3) attached.

Wouldn't this optimization also apply to _hash_pgaddmultitup?

regards,
Ranier Vilela
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index c361509d68..a68eb40b2b 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -39,6 +39,7 @@ typedef struct
 	HSpool	   *spool;			/* NULL if not using spooling */
 	double		indtuples;		/* # tuples accepted into index */
 	Relation	heapRel;		/* heap relation descriptor */
+	HashInsertState istate;		/* insert state */
 } HashBuildState;
 
 static void hashbuildCallback(Relation index,
@@ -118,6 +119,7 @@ hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	uint32		num_buckets;
 	long		sort_threshold;
 	HashBuildState buildstate;
+	HashInsertStateData insertstate;
 
 	/*
 	 * We expect to be called exactly once for any index relation. If that's
@@ -158,13 +160,20 @@ hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 		sort_threshold = Min(sort_threshold, NLocBuffer);
 
 	if (num_buckets >= (uint32) sort_threshold)
-		buildstate.spool = _h_spoolinit(heap, index, num_buckets);
+	{
+		insertstate.sorted = true;
+		buildstate.spool = _h_spoolinit(heap, index, num_buckets, (HashInsertState) &insertstate);
+	}
 	else
+	{
+		insertstate.sorted = false;
 		buildstate.spool = NULL;
+	}
 
 	/* prepare to build the index */
-	buildstate.indtuples = 0;
 	buildstate.heapRel = heap;
+	buildstate.indtuples = 0;
+	buildstate.istate = (HashInsertState) &insertstate;
 
 	/* do the heap scan */
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
@@ -231,7 +240,7 @@ hashbuildCallback(Relation index,
 		itup = index_form_tuple(RelationGetDescr(index),
 index_values, index_isnull);
 		itup->t_tid = *tid;
-		_hash_doinsert(index, itup, buildstate->heapRel);
+		_hash_doinsert(index, itup, buildstate->heapRel, buildstate->istate);
 		pfree(itup);
 	}
 
@@ -254,6 +263,7 @@ hashinsert(Relation rel, Datum *values, bool *isnull,
 	Datum		index_values[1];
 	bool		index_isnull[1];
 	IndexTuple	itup;
+	HashInsertStateData istate;
 
 	/* convert data to a hash key; on failure, do not insert anything */
 	if (!_hash_convert_tuple(rel,
@@ -265,7 +275,9 @@ hashinsert(Relation rel, Datum *values, bool *isnull,
 	itup = index_form_tuple(RelationGetDescr(rel), index_values, index_isnull);
 	itup->t_tid = *ht_ctid;
 
-	_hash_doinsert(rel, itup, heapRel);
+	istate.sorted = false;
+
+	_hash_doinsert(rel, itup, heapRel, (HashInsertState) &istate);
 
 	pfree(itup);
 
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 4f2fecb908..4d17c841c0 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -34,7 +34,7 @@ static void _hash_vacuum_one_page(Relation rel, Relation hrel,
  *		and hashinsert.  By here, itup is completely filled in.
  */
 void
-_hash_doinsert(Relation rel, IndexTuple itup, Relation heapRel)
+_hash_doinsert(Relation rel, IndexTuple itup, Relation heapRel, HashInsertState istate)
 {
 	Buffer		buf = InvalidBuffer;
 	Buffer		bucket_buf;
@@ -198,7 +198,7 @@ restart_insert:
 	START_CRIT_SECTION();
 
 	/* found page with enough space, so add the item here */
-	itup_off = _hash_pgaddtup(rel, buf, itemsz, itup);
+	itup_off = _hash_pgaddtup(rel, buf, itemsz, itup, istate->sorted);
 	MarkBufferDirty(buf);
 
 	/* metapage operations */
@@ -266,18 +266,28 @@ restart_insert:
  * page are sorted by hashkey value.
  */
 OffsetNumber
-_hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, IndexTuple itup)
+_hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, IndexTuple itup, bool sorted)
 {
 	OffsetNumber itup_off;
 	Page		page;
-	uint32		hashkey;
 
 	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
 	page = BufferGetPage(buf);
 
-	/* Find where to insert the tuple (preserving page's hashkey ordering) */
-	hashkey = _hash_get_indextuple_hashkey(itup);
-	itup_off = _hash_binsearch(page, hashkey);
+	/*
+	 * Find where to insert the tuple (preserving page's hashkey ordering).
+	 * If the input is already sorted by hashkey, then we can avoid the
+	 * binary search and just add it to the end of the page.
+	 */
+	if (sorted)
+		itup_off = PageGetMaxOffsetNumber(page) + 1;
+	else
+	{
+		uint32		hashkey;
+
+		hashkey = _hash_get_indextuple_hashkey(itup);
+		itup_off = _hash_binsearch(page, hashkey);
+	}
 
 	if (PageAddItem(page, (Item) itup, itemsize, itup_off, false, false)
 		== InvalidOffsetNumber)
@@ -300,7 +310,6 @@ void
 _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups,
 	

Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Tom Lane
Andres Freund  writes:
> On 2022-08-30 14:07:41 -0400, Tom Lane wrote:
>> Do we want to install this just for NetBSD, or more widely?
>> I think we'd better back-patch it for NetBSD, so I'm inclined
>> to be conservative about the change.

> It's likely a good idea to enable it everywhere applicable, but I agree that
> we shouldn't unnecessarily do so in the backbranches. So I'd be inclined to
> add it to the netbsd template for the backbranches.

> For HEAD I can see putting it into all the applicable templates, adding an
> AC_LINK_IFELSE() test, or just putting it into the meson stuff.

For the moment I'll stick it into the netbsd template.  I'm not on
board with having the meson stuff generating different executables
than the Makefiles do, so if someone wants to propose applying
this widely, they'll need to fix both.  Seems like that is a good
thing to consider after the meson patches land.  We don't need
unnecessary churn in that area before that.

regards, tom lane




Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Andres Freund
Hi,

On 2022-08-30 14:07:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-08-30 13:24:39 -0400, Tom Lane wrote:
> >> Andres Freund  writes:
> >>> Perhaps it'd be saner to default to building with -Wl,-z,now? That
> >>> should fix the problem too, right (and if we combine it with relro,
> >>> it'd be a security improvement to boot).
> 
> >> Hm.  Not sure if that works on NetBSD, but I'll check it out.
> 
> > FWIW, it's a decently (well over 10 years) old thing I think. And it's
> > documented in the netbsd ld manpage and their packaging guide (albeit
> > indirectly, with their tooling doing the work of specifying the flags):
> > https://www.netbsd.org/docs/pkgsrc/hardening.html#hardening.audit.relrofull
> 
> It does appear that they use GNU ld, and I've just finished confirming
> that each of those switches has the expected effects on my PPC box.
> So yeah, this looks like a better answer.

Cool.


> Do we want to install this just for NetBSD, or more widely?
> I think we'd better back-patch it for NetBSD, so I'm inclined
> to be conservative about the change.

It's likely a good idea to enable it everywhere applicable, but I agree that
we shouldn't unnecessarily do so in the backbranches. So I'd be inclined to
add it to the netbsd template for the backbranches.

For HEAD I can see putting it into all the applicable templates, adding an
AC_LINK_IFELSE() test, or just putting it into the meson stuff.

Greetings,

Andres Freund




Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Aleksander Alekseev
Hi hackers,

> Here is v3 with silenced compiler warnings.

Some more warnings were reported by cfbot, so here is v4. Apologies
for the noise.

-- 
Best regards,
Aleksander Alekseev


v4-0001-Fix-incorrect-uses-of-Datum-conversion-macros.patch
Description: Binary data


v4-0002-Convert-GetDatum-and-DatumGet-macros-to-inline-fu.patch
Description: Binary data


Re: New strategies for freezing, advancing relfrozenxid early

2022-08-30 Thread Jeff Davis
On Thu, 2022-08-25 at 14:21 -0700, Peter Geoghegan wrote:
> Attached patch series is a completely overhauled version of earlier
> work on freezing. Related work from the Postgres 15 cycle became
> commits 0b018fab, f3c15cbe, and 44fa8488.
> 
> Recap
> =
> 
> The main high level goal of this work is to avoid painful, disruptive
> antiwraparound autovacuums (and other aggressive VACUUMs) that do way
> too much "catch up" freezing, all at once

I agree with the motivation: that keeping around a lot of deferred work
(unfrozen pages) is risky, and that administrators would want a way to
control that risk.

The solution involves more changes to the philosophy and mechanics of
vacuum than I would expect, though. For instance, VM snapshotting,
page-level-freezing, and a cost model all might make sense, but I don't
see why they are critical for solving the problem above. I think I'm
still missing something. My mental model is closer to the bgwriter and
checkpoint_completion_target.

Allow me to make a naive counter-proposal (not a real proposal, just so
I can better understand the contrast with your proposal):

  * introduce a reloption unfrozen_pages_target (default -1, meaning
infinity, which is the current behavior)
  * introduce two fields to LVRelState: n_pages_frozen and
delay_skip_count, both initialized to zero
  * when marking a page frozen: n_pages_frozen++
  * when vacuum begins:
  if (unfrozen_pages_target >= 0 &&
  current_unfrozen_page_count > unfrozen_pages_target)
  {
vacrel->delay_skip_count = current_unfrozen_page_count -
  unfrozen_pages_target;
/* ?also use more aggressive freezing thresholds? */
  }
  * in lazy_scan_skip(), have a final check:
  if (vacrel->n_pages_frozen < vacrel->delay_skip_count)
  {
 break;
  }
  
I know there would still be some problem cases, but to me it seems like
we solve 80% of the problem in a couple dozen lines of code.

a. Can you clarify some of the problem cases, and why it's worth
spending more code to fix them?

b. How much of your effort is groundwork for related future
improvements? If it's a substantial part, can you explain in that
larger context?

c. Can some of your patches be separated into independent discussions?
For instance, patch 1 has been discussed in other threads and seems
independently useful, and I don't see the current work as dependent on
it. Patch 4 also seems largely independent.

d. Can you help give me a sense of scale of the problems solved by
visibilitymap snapshots and the cost model? Do those need to be in v1?

Regards,
Jeff Davis





Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Tom Lane
Andres Freund  writes:
> On 2022-08-30 13:24:39 -0400, Tom Lane wrote:
>> Andres Freund  writes:
> >> Perhaps it'd be saner to default to building with -Wl,-z,now? That
> >> should fix the problem too, right (and if we combine it with relro,
> >> it'd be a security improvement to boot).

>> Hm.  Not sure if that works on NetBSD, but I'll check it out.

> FWIW, it's a decently (well over 10 years) old thing I think. And it's
> documented in the netbsd ld manpage and their packaging guide (albeit
> indirectly, with their tooling doing the work of specifying the flags):
> https://www.netbsd.org/docs/pkgsrc/hardening.html#hardening.audit.relrofull

It does appear that they use GNU ld, and I've just finished confirming
that each of those switches has the expected effects on my PPC box.
So yeah, this looks like a better answer.

Do we want to install this just for NetBSD, or more widely?
I think we'd better back-patch it for NetBSD, so I'm inclined
to be conservative about the change.

regards, tom lane




Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Andres Freund
Hi,

On 2022-08-30 13:24:39 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Perhaps it'd be saner to default to building with -Wl,-z,now? That
> > should fix the problem too, right (and if we combine it with relro,
> > it'd be a security improvement to boot).
> 
> Hm.  Not sure if that works on NetBSD, but I'll check it out.

FWIW, it's a decently (well over 10 years) old thing I think. And it's
documented in the netbsd ld manpage and their packaging guide (albeit
indirectly, with their tooling doing the work of specifying the flags):
https://www.netbsd.org/docs/pkgsrc/hardening.html#hardening.audit.relrofull

Greetings,

Andres Freund
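
For reference, the conservative back-branch change being discussed would
presumably amount to something like this in src/template/netbsd (a
sketch; the committed change may spell it differently):

    # Force eager PLT binding so the dynamic loader never takes its
    # internal lock inside a signal handler; relro additionally makes the
    # resolved relocation tables read-only.
    LDFLAGS="$LDFLAGS -Wl,-z,relro -Wl,-z,now"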




Re: replacing role-level NOINHERIT with a grant-level option

2022-08-30 Thread Nathan Bossart
On Mon, Aug 29, 2022 at 03:38:57PM -0400, Robert Haas wrote:
> Argh, I found a bug, and one that I should have caught during testing,
> too. I modelled the new function select_best_grantor() on
> is_admin_of_role(), but it differs in that it calls
> roles_is_member_of() with ROLERECURSE_PRIVS rather than
> ROLERECURSE_MEMBERS. Sadly, roles_is_member_of() handles
> ROLERECURSE_PRIVS by completely ignoring non-inherited grants, which
> is wrong, because then calls to select_best_grantor() treat a member
> of a role with INHERIT FALSE, ADMIN TRUE as if they were not an admin
> at all, which is incorrect.

I've been staring at this stuff for a while, and I'm still rather confused.

* You mention that you modelled the "new" function select_best_grantor() on
is_admin_of_role(), but it looks like that function was introduced in 2005.
IIUC you meant select_best_admin(), which was added much more recently.

* I don't see why we should ignore inheritance until after checking admin
option.  Prior to your commit (e3ce2de), we skipped checking for admin
option if the role had [NO]INHERIT, and AFAICT your commit aimed to retain
that behavior.  The comment above check_role_grantor() even mentions
"inheriting the privileges of a role which does have ADMIN OPTION."  Within
the function, I see the following comment:

 * Otherwise, the grantor must either have ADMIN OPTION on the role or
 * inherit the privileges of a role which does. In the former case,
 * record the grantor as the current user; in the latter, pick one of
 * the roles that is "most directly" inherited by the current role
 * (i.e. fewest "hops").

So, IIUC, the intent is for ADMIN OPTION to apply even for non-inherited
grants, but we still choose to recurse in roles_is_member_of() based on
inheritance.  This means that you can inherit the ADMIN OPTION to some
degree, but if you have membership WITH INHERIT FALSE to a role that has
ADMIN OPTION on another role, the current role effectively does not have
ADMIN OPTION on the other role.  Is this correct?

As I'm writing this down, I think it's beginning to make more sense to me,
so this might just be a matter of under-caffeination.  But perhaps some
extra comments or documentation wouldn't be out of the question...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
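
A minimal SQL sketch of the corner case described above (role names
invented; the option syntax approximates the grant-level INHERIT patch
and may not match the committed spelling):

    CREATE ROLE admins;
    CREATE ROLE target_role;
    CREATE ROLE alice;
    CREATE ROLE bob;

    GRANT target_role TO admins WITH ADMIN OPTION;
    GRANT admins TO alice WITH INHERIT FALSE;  -- membership, no inheritance

    SET ROLE alice;
    GRANT target_role TO bob;  -- expected to fail: alice does not inherit
                               -- admins' ADMIN OPTION on target_role
    RESET ROLE;
    SET ROLE admins;           -- but a member of admins can SET ROLE to it,
    GRANT target_role TO bob;  -- and then the grant is allowed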




Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Tom Lane
Andres Freund  writes:
> On 2022-08-29 15:43:55 -0400, Tom Lane wrote:
>> The attached patch seems to fix the problem, by forcing resolution of
>> the PLT link before we unblock signals.  It depends on the assumption
>> that another select() call appearing within postmaster.c will share
>> the same PLT link, which seems pretty safe.

> Hm, what stops the same problem from occuring with other functions?

These few lines are the only part of the postmaster that runs with
signals enabled and unblocked.

> Perhaps it'd be saner to default to building with -Wl,-z,now? That should fix
> the problem too, right (and if we combine it with relro, it'd be a security
> improvement to boot).

Hm.  Not sure if that works on NetBSD, but I'll check it out.

regards, tom lane




Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Andres Freund
Hi,

On 2022-08-29 15:43:55 -0400, Tom Lane wrote:
> Buildfarm member mamba (NetBSD-current on prairiedog's former hardware)
> has failed repeatedly since I set it up.  I have now run the cause of
> that to ground [1], and here's what's happening: if the postmaster
> receives a signal just before it first waits at the select() in
> ServerLoop, it can self-deadlock.  During the postmaster's first use of
> select(), the dynamic loader needs to resolve the PLT branch table entry
> that the core executable uses to reach select() in libc.so, and it locks
> the loader's internal data structures while doing that.  If we enter
> a signal handler while the lock is held, and the handler needs to do
> anything that also requires the lock, the postmaster is frozen.

Ick.


> The attached patch seems to fix the problem, by forcing resolution of
> the PLT link before we unblock signals.  It depends on the assumption
> that another select() call appearing within postmaster.c will share
> the same PLT link, which seems pretty safe.

Hm, what stops the same problem from occuring with other functions?

Perhaps it'd be saner to default to building with -Wl,-z,now? That should fix
the problem too, right (and if we combine it with relro, it'd be a security
improvement to boot).

Greetings,

Andres Freund
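
For illustration, the shape of the fix described above, as a hedged C
sketch (not the committed patch; the real code lives in postmaster.c and
may differ in detail):

    #include <stddef.h>
    #include <sys/select.h>

    /*
     * Call select() once with a zero timeout while signals are still
     * blocked, so the dynamic loader resolves the PLT entry here and now.
     * Later select() calls in ServerLoop then share the already-resolved
     * PLT slot, and no loader lock can be taken inside a signal handler.
     */
    static void
    prime_select_plt(void)
    {
        struct timeval timeout = {0, 0};

        (void) select(0, NULL, NULL, NULL, &timeout);
    }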




Re: Handle infinite recursion in logical replication setup

2022-08-30 Thread vignesh C
On Mon, 29 Aug 2022 at 12:01, Peter Smith  wrote:
>
> Here are some review comments for patch v42-0002:
>
> ==
>
> 1. doc/src/sgml/logical-replication.sgml
>
> copy_data = true 
>
> There are a couple of these tags where there is a trailing space
> before the closing tag. I guess it is doing no harm, but it is doing no
> good either, so IMO better to get rid of the space.

Modified

> ~~
>
> 2. doc/src/sgml/logical-replication.sgml - 31.3.1. Setting replication
> between two primaries
>
> User need to ensure that no data changes happen on table t1 on
> primary1 and primary2 until the setup is completed.
>
> SUGGESTION
> The user must ensure that no data changes happen on table t1 of
> primary1 and primary2 until the setup is completed.

Modified

> ~~~
>
> 3. doc/src/sgml/logical-replication.sgml - 31.3.2. Adding a new
> primary when there is no table data on any of the primaries
>
> User need to ensure that no data changes happen on table t1 on all the
> primaries primary1, primary2 and primary3 until the setup is
> completed.
>
> SUGGESTION
> The user must ensure that no data changes happen on table t1 of all
> the primaries primary1, primary2 and primary3 until the setup is
> completed.

Modified

> ~~~
>
> 4. doc/src/sgml/logical-replication.sgml - 31.3.3. Adding a new
> primary when table data is present on the existing primaries
>
> User need to ensure that no operations should be performed on table t1
> on primary2 and primary3 until the setup is completed.
>
> SUGGESTION
> The user must ensure that no operations are performed on table t1 of
> primary2 and primary3 until the setup is completed.
>
> Here also you changed the wording - "no data changes happen" versus
> "no operations should be performed". Was that a deliberate difference?
> Maybe they should all be consistent wording? If they are
> interchangeable IMO the "no operations are performed..." wording was
> the better of the two.

Modified, it was not a deliberate change. I have changed it to "no
operations are performed" in other places too.

> ~~~
>
> 5. doc/src/sgml/logical-replication.sgml - 31.3.4. Generic steps for
> adding a new primary to an existing set of primaries
>
> Step-2: User need to ensure that no changes happen on the required
> tables of the new primary until the setup is complete.
>
> SUGGESTION
> Step-2: The user must ensure that no changes happen on the required
> tables of the new primary until the setup is complete.

Modified

> ~~~
>
> 6.
>
> Step-4. User need to ensure that no changes happen on the required
> tables of the existing primaries except the first primary until the
> setup is complete.
>
> SUGGESTION
> Step-4. The user must ensure that no changes happen on the required
> tables of the existing primaries except the first primary until the
> setup is complete.

Modified

> ~~~
>
> 7. (the example)
>
> 7a.
> Step-2. User need to ensure that no changes happen on table t1 on
> primary4 until the setup is completed.
>
> SUGGESTION
> Step-2. The user must ensure that no changes happen on table t1 of
> primary4 until the setup is completed.

Modified

> ~
>
> 7b.
> Step-4. User need to ensure that no changes happen on table t1 on
> primary2 and primary3 until the setup is completed.
>
> SUGGESTION
> Step-4. The user must ensure that no changes happen on table t1 of
> primary2 and primary3 until the setup is completed.

Modified

Thanks for the comments, the changes for the same are available in the
v43 patch attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm1V%2BAvzPsUcq%3DmNYG%2BJfAyovTWBe4vWKTknOgH_ko1E0Q%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-08-30 Thread vignesh C
On Mon, 29 Aug 2022 at 11:59, Peter Smith  wrote:
>
> Here are some review comments for patch v42-0001:
>
> ==
>
> 1. Commit message
>
> A later review comment below suggests some changes to the WARNING
> message so if those changes are made then the example in this commit
> message also needs to be modified.

Modified

> ==
>
> 2. doc/src/sgml/ref/alter_subscription.sgml - copy_data
>
> Refer to the Notes for the usage of true for copy_data parameter and
> its interaction with the origin parameter.
>
> I think saying "usage of true" sounded a bit strange.
>
> SUGGESTION
> Refer to the Notes about how copy_data = true can interact with the
> origin parameter.

Modified

> ==
>
> 3. doc/src/sgml/ref/create_subscription.sgml - copy data
>
> Refer to the Notes for the usage of true for copy_data parameter and
> its interaction with the origin parameter.
>
> SUGGESTION
> (same as #2)

Modified

> ~~
>
> 4. doc/src/sgml/ref/create_subscription.sgml - origin
>
> Refer to the Notes for the usage of true for copy_data parameter and
> its interaction with the origin parameter.
>
> SUGGESTION
> (same as #2)

Modified

> ~~~
>
> 5. doc/src/sgml/ref/create_subscription.sgml - Notes
>
> +  
> +   If the subscription is created with origin = none and
> +   copy_data = true, it will check if the publisher has
> +   subscribed to the same table from other publishers and, if so, throw a
> +   warning to notify user to check the publisher tables. The user can ensure
>
> "throw a warning to notify user" -> "log a warning to notify the user"

Modified

> ~~
>
> 6.
>
> +   warning to notify user to check the publisher tables. The user can ensure
> +   that publisher tables do not have data which has an origin associated 
> before
> +   continuing with any other operations to prevent inconsistent data being
> +   replicated.
> +  
>
> 6a.
>
> I'm not so sure about this. IMO the warning is not about the
> replication – really it is about the COPY which has already happened
> anyway. So the user can't really prevent anything from going wrong;
> instead, they have to take some action to clean up if anything did go
> wrong.
>
> SUGGESTION
> Before continuing with other operations the user should check that
> publisher tables did not have data with different origins, otherwise
> inconsistent data may have been copied.

Modified based on the suggestion provided by Amit.

> ~
>
> 6b.
>
> I am also wondering what can the user do now. Assuming there was bad
> COPY then the subscriber table has already got unwanted stuff copied
> into it. Is there any advice we can give to help users fix this mess?

There is not much a user can do in this case. The only option would be
to take a backup before the operation, restore it, and then recreate
the replication setup. I was not sure if we should document these
steps, as similar inconsistent data could get created without the
origin option. Thoughts?

> ==
>
> 7. src/backend/commands/subscriptioncmds.c - AlterSubscription_refresh
>
> + /* Check whether we can allow copy of newly added relations. */
> + check_pub_table_subscribed(wrconn, sub->publications, copy_data,
> +sub->origin, subrel_local_oids,
> +subrel_count);
>
> "whether we can allow" seems not quite the right wording here anymore,
> because now there is no ERROR stopping this - so if there was unwanted
> data the COPY will proceed to copy it anyhow...

Removed the comment as the function details the necessary things.

> ~~~
>
> 8. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> @@ -1781,6 +1794,121 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
>   table_close(rel, RowExclusiveLock);
>  }
>
> +/*
> + * Check and throw a warning if the publisher has subscribed to the same
> + * table from some other publisher. This check is required only if
> + * "copy_data = true" and "origin = none" for CREATE SUBSCRIPTION and
> + * ALTER SUBSCRIPTION ... REFRESH statements to notify user that data having
> + * origin might have been copied.
>
> "throw a warning" -> "log a warning"
>
> "to notify user" -> "to notify the user" ?

Modified

> ~~~
>
> 9.
>
> + if (copydata == false || !origin ||
> + (pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0))
> + return;
>
> SUGGESTION
> if (!copydata || !origin ||
> (pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0))

Modified

> ~~~
>
> 10. (Question)
>
> I can't tell just by reading the code if FOR ALL TABLES case is
> handled – e.g. will this recognise the case were the publisher might
> have table data from other origins because it has a subscription on
> some other node that was publishing "FOR ALL TABLES"?

Yes it handles it.

> ~~~
>
> 11.
>
> + ereport(WARNING,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("publisher has subscribed table \"%s.%s\" from some other publisher",
> +nspname, relname),
> + errdetail("Publisher might have subscribed one or more tables from some other publisher."),
> + 

Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-08-30 Thread Simon Riggs
On Thu, 11 Aug 2022 at 06:32, Dilip Kumar  wrote:

> > > So I still think some adjustment is required in XidInMVCCSnapdhot()
> >
> > That is one way to resolve the issue, but not the only one. I can also
> > change AssignTransactionId() to recursively register parent xids for
> > all of a subxid's parents.
> >
> > I will add in a test case and resolve the dependency in my next patch.
>
> Okay, thanks, I will look into the updated patch after you submit that.

PFA two patches, replacing earlier work
001_new_isolation_tests_for_subxids.v3.patch
002_minimize_calls_to_SubTransSetParent.v8.patch

001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically
addressing the two areas of code that are not tested by existing tests.
This gives us a baseline from which we can do test driven development.
I'm hoping this can be reviewed and committed fairly smoothly.

002_minimize_calls_to_SubTransSetParent.v8.patch
Reduces the number of calls to subtrans below 1% for the first 64 subxids,
so overall it will substantially reduce subtrans contention on master for the
typical case, as well as smoothing the overflow case.
Some discussion needed on this; there are various options.
This combines the work originally posted here with another patch posted on the
thread "Smoothing the subtrans performance catastrophe".

I will do some performance testing as well, but more is welcome.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


002_minimize_calls_to_SubTransSetParent.v8.patch
Description: Binary data


001_new_isolation_tests_for_subxids.v3.patch
Description: Binary data


Re: archive modules

2022-08-30 Thread Nathan Bossart
On Tue, Aug 30, 2022 at 09:49:20AM +0200, Benoit Lobréau wrote:
> Ok done, https://commitfest.postgresql.org/39/3856/ (is that fine if I
> added you as a reviewer?)

Of course.  I've marked it as ready-for-committer.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Hash index build performance tweak from sorting

2022-08-30 Thread Simon Riggs
On Fri, 5 Aug 2022 at 20:46, David Zhang  wrote:
>
> On 2022-08-01 8:37 a.m., Simon Riggs wrote:
> > Using the above test case, I'm getting a further 4-7% improvement on
> > already committed code with the attached patch, which follows your
> > proposal.
>
> I ran two test cases: for the committed patch `hash_sort_by_hash.v3.patch`,
> I can see about a 6~7% improvement; and after applying the patch
> `hash_inserted_sorted.v2.patch`, I see about a 3% improvement. All the
> test results are averages of 10 runs on two different machines.

Thanks for testing David.

It's a shame you only see 3%, but that's still worth it.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread Tom Lane
David Rowley  writes:
> Here's a patch which adds a comment to MemoryContextMethodID to Robert's 
> patch.

OK, but while looking at that I noticed the adjacent

#define MEMORY_CONTEXT_METHODID_MASK \
	UINT64CONST((1 << MEMORY_CONTEXT_METHODID_BITS) - 1)

I'm rather astonished that that compiles; UINT64CONST was only ever
meant to be applied to *literals*.  I think what it's expanding to
is

((1 << MEMORY_CONTEXT_METHODID_BITS) - 1UL)

(or on some machines 1ULL) which only accidentally does approximately
what you want.  It'd be all right perhaps to write

((UINT64CONST(1) << MEMORY_CONTEXT_METHODID_BITS) - 1)

but you might as well avoid the Postgres-ism and just write

((uint64) ((1 << MEMORY_CONTEXT_METHODID_BITS) - 1))

Nobody's ever going to make MEMORY_CONTEXT_METHODID_BITS large
enough for the shift to overflow in int arithmetic.

regards, tom lane
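
To spell out the expansion Tom describes, a sketch assuming PG's usual
LP64 definition of the macro (the exact suffix varies by platform):

    #include <stdint.h>

    typedef uint64_t uint64;            /* stand-in for the PG typedef */

    #define UINT64CONST(x) (x##UL)      /* assumed LP64 spelling */
    #define MEMORY_CONTEXT_METHODID_BITS 3

    /*
     * UINT64CONST((1 << MEMORY_CONTEXT_METHODID_BITS) - 1) pastes the UL
     * suffix onto the final literal token only, expanding to
     * ((1 << MEMORY_CONTEXT_METHODID_BITS) - 1UL); the mask is computed
     * in int arithmetic and only widened at the subtraction.  The
     * explicit cast form avoids relying on that accident:
     */
    #define MEMORY_CONTEXT_METHODID_MASK \
        ((uint64) ((1 << MEMORY_CONTEXT_METHODID_BITS) - 1))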




Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread Robert Haas
On Tue, Aug 30, 2022 at 11:39 AM David Rowley  wrote:
> On Wed, 31 Aug 2022 at 03:31, David Rowley  wrote:
> > I've no objections to adding a comment to the enum to
> > explain to future devs. My vote would be for that and add the (int)
> > cast as proposed by Robert.
>
> Here's a patch which adds a comment to MemoryContextMethodID to Robert's 
> patch.

LGTM.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: making relfilenodes 56 bits

2022-08-30 Thread Robert Haas
On Tue, Aug 30, 2022 at 8:45 AM Dilip Kumar  wrote:
> I have found one more issue with this approach of rewriting the
> conflicting table.  Earlier I thought we could do the conflict
> checking and rewriting inside create_new_objects() right before the
> restore command.  But after implementing (while testing) this I
> realized that we DROP and CREATE the database while restoring the dump,
> which means it will again generate the conflicting system tables.  So
> theoretically the rewriting should go in between the CREATE DATABASE
> and restoring the objects, but as of now both creating the database and
> restoring the other objects are part of a single dump file.  I haven't
> yet analyzed how feasible it is to generate the dump in two parts, the
> first part just to create the database and the second part to restore
> the rest of the objects.
>
> Thoughts?

Well, that's very awkward. It doesn't seem like it would be very
difficult to teach pg_upgrade to call pg_restore without --clean and
just do the drop database itself, but that doesn't really help,
because pg_restore will in any event be creating the new database.
That doesn't seem like something we can practically refactor out,
because only pg_dump knows what properties to use when creating the
new database. What we could do is have the dump include a command like
SELECT pg_binary_upgrade_move_things_out_of_the_way(some_arguments_here),
but that doesn't really help very much, because passing the whole list
of relfilenode values from the old database seems pretty certain to be
a bad idea. The whole idea here was that we'd be able to build a hash
table on the new database's system table OIDs, and it seems like
that's not going to work.

We could try to salvage some portion of the idea by making
pg_binary_upgrade_move_things_out_of_the_way() take a more restricted
set of arguments, like the smallest and largest relfilenode values
from the old database, and then we'd just need to move things that
overlap. But that feels pretty hit-or-miss to me as to whether it
actually avoids any work, and
pg_binary_upgrade_move_things_out_of_the_way() might also be annoying
to write. So perhaps we have to go back to the drawing board here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread David Rowley
On Wed, 31 Aug 2022 at 03:31, David Rowley  wrote:
> I've no objections to adding a comment to the enum to
> explain to future devs. My vote would be for that and add the (int)
> cast as proposed by Robert.

Here's a patch which adds a comment to MemoryContextMethodID to Robert's patch.

David
diff --git a/src/include/utils/memutils_internal.h b/src/include/utils/memutils_internal.h
index 9a9f52ef16..f348282a07 100644
--- a/src/include/utils/memutils_internal.h
+++ b/src/include/utils/memutils_internal.h
@@ -74,6 +74,9 @@ extern void SlabCheck(MemoryContext context);
  * MemoryContextMethodID
  * A unique identifier for each MemoryContext implementation which
  * indicates the index into the mcxt_methods[] array. See mcxt.c.
+ * The maximum value for this enum is constrained by
+ * MEMORY_CONTEXT_METHODID_MASK.  If an enum value higher than that is
+ * required then MEMORY_CONTEXT_METHODID_BITS will need to be increased.
  */
 typedef enum MemoryContextMethodID
 {
diff --git a/src/include/utils/memutils_memorychunk.h b/src/include/utils/memutils_memorychunk.h
index 685c177b68..2eefc138e3 100644
--- a/src/include/utils/memutils_memorychunk.h
+++ b/src/include/utils/memutils_memorychunk.h
@@ -159,7 +159,7 @@ MemoryChunkSetHdrMask(MemoryChunk *chunk, void *block,
Assert((char *) chunk > (char *) block);
Assert(blockoffset <= MEMORYCHUNK_MAX_BLOCKOFFSET);
Assert(value <= MEMORYCHUNK_MAX_VALUE);
-   Assert(methodid <= MEMORY_CONTEXT_METHODID_MASK);
+   Assert((int) methodid <= MEMORY_CONTEXT_METHODID_MASK);
 
	chunk->hdrmask = (((uint64) blockoffset) << MEMORYCHUNK_BLOCKOFFSET_BASEBIT) |
		(((uint64) value) << MEMORYCHUNK_VALUE_BASEBIT) |
@@ -175,7 +175,7 @@ static inline void
 MemoryChunkSetHdrMaskExternal(MemoryChunk *chunk,
							  MemoryContextMethodID methodid)
 {
-   Assert(methodid <= MEMORY_CONTEXT_METHODID_MASK);
+   Assert((int) methodid <= MEMORY_CONTEXT_METHODID_MASK);
 
	chunk->hdrmask = MEMORYCHUNK_MAGIC | (((uint64) 1) << MEMORYCHUNK_EXTERNAL_BASEBIT) |
methodid;


Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread David Rowley
On Wed, 31 Aug 2022 at 01:36, Tom Lane  wrote:
>
> David Rowley  writes:
> > I think the Assert is useful as if we were ever to add an enum member
> > with the value of 8 and forgot to adjust MEMORY_CONTEXT_METHODID_BITS
> > then bad things would happen inside MemoryChunkSetHdrMask() and
> > MemoryChunkSetHdrMaskExternal().  I think it's unlikely we'll ever get
> > that many MemoryContext types, but I don't know for sure and would
> > rather the person who adds the 9th one get alerted to the lack of bit
> > space in MemoryChunk as soon as possible.
>
> I think that's a weak argument, so I don't mind dropping this Assert.
> What would be far more useful is a comment inside the
> MemoryContextMethodID enum pointing out that we can support at most
> 8 values because XYZ.

I'd just sleep better knowing that we have some test coverage to
ensure that MemoryChunkSetHdrMask() and
MemoryChunkSetHdrMaskExternal() have some verification that we don't
end up with future code that will cause the hdrmask to be invalid.  I
tried to make those functions as lightweight as possible. Without the
Assert, I just feel that there's a bit too much trust that none of the
bits overlap.  I've no objections to adding a comment to the enum to
explain to future devs. My vote would be for that and add the (int)
cast as proposed by Robert.

David




Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread David Rowley
On Wed, 31 Aug 2022 at 03:00, Robert Haas  wrote:
>
> On Tue, Aug 30, 2022 at 3:14 AM David Rowley  wrote:
> > I think the Assert is useful as if we were ever to add an enum member
> > with the value of 8 and forgot to adjust MEMORY_CONTEXT_METHODID_BITS
> > then bad things would happen inside MemoryChunkSetHdrMask() and
> > MemoryChunkSetHdrMaskExternal().  I think it's unlikely we'll ever get
> > that many MemoryContext types, but I don't know for sure and would
> > rather the person who adds the 9th one get alerted to the lack of bit
> > space in MemoryChunk as soon as possible.
>
> Well I don't have a problem with that, but I think we should try to do
> it without causing compiler warnings. The attached patch fixes it for
> me.

I'm fine with adding the int cast. Seems like a good idea.

> > As much as I'm not a fan of adding new warnings for compiler options
> > that are not part of our standard set, I feel like if there are
> > warning flags out there that are giving us false warnings such as
> > this one, then we shouldn't trouble ourselves trying to get rid of
> > them, especially so when they force us to remove something which might
> > catch a future bug.
>
> For me the point is that, at least on the compiler that I'm using, the
> warning suggests that the compiler will optimize the test away
> completely, and therefore it wouldn't catch a future bug. Could there
> be compilers where no warning is generated but the assertion is still
> optimized away?

I'd not considered that the compiler might optimise it away.  My
suspicions had been more along the lines that clang removed the
enum out-of-range warnings because they were annoying and wrong, as
it's pretty easy to set an enum variable to something out of range of
the defined enum values.

Looking at [1], it seems like 5.0.2 is producing the correct code and
it's just producing a warning. The 2nd compiler window has -Werror and
shows that it does fail to compile. If I change that to use clang
6.0.0 then it works. It seems to fail all the way back to clang 3.1.
clang 3.0.0 works.

David

[1] https://godbolt.org/z/Gx388z5Ej




Re: Stack overflow issue

2022-08-30 Thread Tom Lane
I wrote:
>> I think most likely we should report this to Snowball upstream
>> and see what they think is an appropriate fix.

> Done at [1], and I pushed the other fixes.  Thanks again for the report!

The upstream recommendation, which seems pretty sane to me, is to
simply reject any string exceeding some threshold length as not
possibly being a word.  Apparently it's common to use thresholds
as small as 64 bytes, but in the attached I used 1000 bytes.

regards, tom lane

diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c
index 68c9213f69..aaf4ff72b6 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -272,11 +272,25 @@ dsnowball_lexize(PG_FUNCTION_ARGS)
 	DictSnowball *d = (DictSnowball *) PG_GETARG_POINTER(0);
 	char	   *in = (char *) PG_GETARG_POINTER(1);
 	int32		len = PG_GETARG_INT32(2);
-	char	   *txt = lowerstr_with_len(in, len);
 	TSLexeme   *res = palloc0(sizeof(TSLexeme) * 2);
+	char	   *txt;
 
+	/*
+	 * Reject strings exceeding 1000 bytes, as they're surely not words in any
+	 * human language.  This restriction avoids wasting cycles on stuff like
+	 * base64-encoded data, and it protects us against possible inefficiency
+	 * or misbehavior in the stemmers (for example, the Turkish stemmer has an
+	 * indefinite recursion so it can crash on long-enough strings).
+	 */
+	if (len <= 0 || len > 1000)
+		PG_RETURN_POINTER(res);
+
+	txt = lowerstr_with_len(in, len);
+
+	/* txt is probably not zero-length now, but we'll check anyway */
 	if (*txt == '\0' || searchstoplist(&(d->stoplist), txt))
 	{
+		/* empty or stopword, so reject */
 		pfree(txt);
 	}
 	else
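
The user-visible effect can be checked with something like the following
(a hedged example; the exact result depends on the dictionary):

    -- An absurdly long token is now rejected up front, like a stopword,
    -- instead of being handed to the stemmer:
    SELECT ts_lexize('english_stem', repeat('a', 2000));
    -- expected: an empty result, with no stemmer cycles spent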


Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread Robert Haas
On Tue, Aug 30, 2022 at 3:14 AM David Rowley  wrote:
> I'm not really sure either. I tried compiling with clang 12.0.1 with
> -Wtautological-constant-out-of-range-compare and don't get this
> warning.

I have a much older clang version, it seems. clang -v reports 5.0.2. I
use -Wall and -Werror as a matter of habit. It looks like 5.0.2 was
released in May 2018, installed by me in November of 2019, and I just
haven't had a reason to upgrade.

> I think the Assert is useful as if we were ever to add an enum member
> with the value of 8 and forgot to adjust MEMORY_CONTEXT_METHODID_BITS
> then bad things would happen inside MemoryChunkSetHdrMask() and
> MemoryChunkSetHdrMaskExternal().  I think it's unlikely we'll ever get
> that many MemoryContext types, but I don't know for sure and would
> rather the person who adds the 9th one get alerted to the lack of bit
> space in MemoryChunk as soon as possible.

Well I don't have a problem with that, but I think we should try to do
it without causing compiler warnings. The attached patch fixes it for
me.

> As much as I'm not a fan of adding new warnings for compiler options
> that are not part of our standard set, I feel like if there are
> warning flags out there that are giving us false warnings such as
> this one, then we shouldn't trouble ourselves trying to get rid of
> them, especially so when they force us to remove something which might
> catch a future bug.

For me the point is that, at least on the compiler that I'm using, the
warning suggests that the compiler will optimize the test away
completely, and therefore it wouldn't catch a future bug. Could there
be compilers where no warning is generated but the assertion is still
optimized away?

I don't know, but I don't think a 4-year-old compiler is such a fossil
that we shouldn't care whether it produces warnings. We worry about
operating systems and PostgreSQL versions that are almost extinct in
the wild, so saying we're not going to worry about failing to update
the compiler regularly enough within the lifetime of one off-the-shelf
MacBook does not really make sense to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


fix-memorychunk-warnings.patch
Description: Binary data


Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)

2022-08-30 Thread Pavel Stehule
On Tue, Aug 30, 2022 at 16:49, Pavel Stehule wrote:

>
>
> On Tue, Aug 30, 2022 at 16:36, Christoph Berg wrote:
>
>> Re: Pavel Stehule
>> > pspg requires all lines to have the same width. It can do some
>> > corrections, but it is hard to distinguish intentional differences
>> > from plain text format.
>> >
>> > It would be nice to have a first invisible row with some information
>> > about the formatting used. pspg applies some heuristics, but this
>> > code is not nice and it is fragile.
>>
>> I like pspg and use it myself, but I don't think a tool that does the
>> right thing by hiding a full screen of minus characters from the user
>> should hinder making the same progress in psql with a simple pager.
>>
>
> ASCII allows setting some metadata that should be invisible in all
> correctly implemented pagers.
>

or these parameters can be sent via the pager's command line or via some
environment variable. Currently there are only two pagers in the world
that support the tabular format, and both were created against psql (pspg
and ov), so we can define our own protocol. Surely pspg will keep its
heuristics forever, because I want to support psql, mysql and many
others. But it could be fine to switch to some more robust mode. It could
be interesting for continuous load via a pipe.

Regards

Pavel



>
>
>>
>> Christoph
>>
>


Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Aleksander Alekseev
Hi hackers,

> Cfbot is making its mind at the moment of writing.

Here is v3 with silenced compiler warnings.

-- 
Best regards,
Aleksander Alekseev


v3-0002-Convert-GetDatum-and-DatumGet-macros-to-inline-fu.patch
Description: Binary data


v3-0001-Fix-incorrect-uses-of-Datum-conversion-macros.patch
Description: Binary data


Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)

2022-08-30 Thread Pavel Stehule
On Tue, Aug 30, 2022 at 16:36, Christoph Berg wrote:

> Re: Pavel Stehule
> > pspg requires all lines to have the same width. It can do some
> > corrections, but it is hard to distinguish intentional differences
> > from plain text format.
> >
> > It would be nice to have a first invisible row with some information
> > about the formatting used. pspg applies some heuristics, but this
> > code is not nice and it is fragile.
>
> I like pspg and use it myself, but I don't think a tool that does the
> right thing by hiding a full screen of minus characters from the user
> should hinder making the same progress in psql with a simple pager.
>

ASCII allows setting some metadata that should be invisible in all
correctly implemented pagers.



>
> Christoph
>


Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)

2022-08-30 Thread Christoph Berg
Re: Pavel Stehule
> pspg requires all lines to have the same width. It can do some
> corrections, but it is hard to distinguish intentional differences from
> plain text format.
> 
> It would be nice to have a first invisible row with some information
> about the formatting used. pspg applies some heuristics, but this code
> is not nice and it is fragile.

I like pspg and use it myself, but I don't think a tool that does the
right thing by hiding a full screen of minus characters from the user
should hinder making the same progress in psql with a simple pager.

Christoph




Re: SQL/JSON features for v15

2022-08-30 Thread Jonathan S. Katz

On 8/30/22 9:16 AM, Andrew Dunstan wrote:


On 2022-08-30 Tu 06:29, Amit Langote wrote:

On Tue, Aug 30, 2022 at 6:19 PM Alvaro Herrera  wrote:

On 2022-Aug-30, Amit Langote wrote:


Patches 0001-0006:

Yeah, these add the overhead of an extra function call (typin() ->
typin_opt_error()) in possibly very common paths.  Other than
refactoring *all* places that call typin() to use the new API, the
only other option seems to be to leave the typin() functions alone and
duplicate their code in typin_opt_error() versions for all the types
that this patch cares about.  Though maybe, that's not necessarily a
better compromise than accepting the extra function call overhead.

I think another possibility is to create a static inline function in the
corresponding .c module (say boolin_impl() in bool.c), which is called
by both the opt_error variant as well as the regular one.  This would
avoid the duplicate code as well as the added function-call overhead.

+1



Makes plenty of sense, I'll try to come up with replacements for these
forthwith.
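
As an illustration of the boolin_impl() idea quoted above, a hedged
sketch of the pattern; the function names and error-signalling
convention here are invented, not taken from the forthcoming patch:

    #include <stdbool.h>
    #include <string.h>

    /* Shared worker, inlined into both entry points; returns false
     * instead of erroring so each caller decides how to report bad
     * input. */
    static inline bool
    boolin_impl(const char *in_str, bool *result)
    {
        if (strcmp(in_str, "t") == 0 || strcmp(in_str, "true") == 0)
        {
            *result = true;
            return true;
        }
        if (strcmp(in_str, "f") == 0 || strcmp(in_str, "false") == 0)
        {
            *result = false;
            return true;
        }
        return false;
    }

    /* classic entry point: hard error on bad input */
    bool
    boolin_classic(const char *in_str)
    {
        bool        result;

        if (!boolin_impl(in_str, &result))
            result = false;     /* the real function would ereport(ERROR) */
        return result;
    }

    /* opt_error entry point: failure is reported through a flag */
    bool
    boolin_opt_error(const char *in_str, bool *have_error)
    {
        bool        result = false;

        *have_error = !boolin_impl(in_str, &result);
        return result;
    }

Because boolin_impl() is a static inline in the same translation unit,
neither entry point pays an extra function call, which is the point of
the suggestion.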


The RMT had its weekly meeting today to discuss open items. As stated 
last week, to keep the v15 release within a late Sept / early Oct 
timeframe, we need to make a decision about the inclusion of SQL/JSON 
this week.


First, we appreciate all of the effort and work that has gone into 
incorporating community feedback into the patches. We did note that 
folks working on this made a lot of progress over the past week.


The RMT still has a few concerns, summarized as:

1. There is not yet consensus on the current patch proposals as we 
approach the end of the major release cycle
2. There is a lack of general feedback from folks who raised concerns 
about the implementation


The RMT is still inclined to revert, but will give folks until Sep 1 
0:00 AoE[1] to reach consensus on if SQL/JSON can be included in v15. 
This matches up to Andrew's availability timeline for a revert, and 
gives enough time to get through the buildfarm prior to the Beta 4 
release[2].


After the deadline, if there is no consensus on how to proceed, the RMT 
will request that the patches are reverted.


While noting that this RMT has no decision making over v16, in the event 
of a revert we do hope this recent work can be the basis of the feature 
in v16.


Again, we appreciate the efforts that have gone into addressing the 
community feedback.


Sincerely,

John, Jonathan, Michael

[1] https://en.wikipedia.org/wiki/Anywhere_on_Earth
[2] 
https://www.postgresql.org/message-id/9d251aec-cea2-bc1a-5ed8-46ef0bcf6...@postgresql.org


OpenPGP_signature
Description: OpenPGP digital signature


Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Tom Lane
Aleksander Alekseev  writes:
> Maybe we should consider backporting at least 0001 patch, partially
> perhaps? I believe it fixes pretty cursed pieces of code, e.g.:

Certainly if there are any parts of it that fix actual bugs,
we ought to backport those.  I'm not in a big hurry to backport
cosmetic fixes though.

regards, tom lane




Re: \pset xheader_width page as default? (Re: very long record lines in expanded psql output)

2022-08-30 Thread Pavel Stehule
On Tue, Aug 30, 2022 at 15:49, Christoph Berg wrote:

> Re: Andrew Dunstan
> > >> +   pg_log_error("\\pset: allowed xheader_width values are full
> > >> (default), column, page, or a number specifying exact width.");
>
> > Committed. There were a couple of bits missing, which I filled in, and I
> > changed the behaviour when the option is given without an argument, so
> > that instead of resetting it the current value is shown, similarly to
> > how most pset options work.
>
> Thanks for implementing this! This has been #1 on my PG annoyances
> list since forever. I had even tried to patch it 10½ years ago, but
> never managed to get that patch to submittable quality.
>
> Now, is there a chance that we could make this the default? Having
> psql print a whole screen of minus characters doesn't
> seem very useful as default behavior.
>
> I see upthread that Pavel was objecting to that because pspg doesn't
> understand it. But since the expanded format is a one-column output,
> and the only thing pspg needs to know is the maximum line length, I'd
> think pspg could just look at each line, and update the maximum as it
> reads the input anyway.
>

pspg requires all lines to have the same width. It can do some
corrections, but it is hard to distinguish intentional differences from
plain text format.

It would be nice to have a first invisible row with some information
about the formatting used. pspg applies some heuristics, but this code
is not nice and it is fragile.

Regards

Pavel


> Can we set 'page' as default?
>
> Christoph
>
>
>


Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Robert Haas
On Tue, Aug 30, 2022 at 10:25 AM Aleksander Alekseev
 wrote:
> > Yeah, I don't see a reason to back-patch a change like this
>
> Maybe we should consider backporting at least 0001 patch, partially
> perhaps? I believe it fixes pretty cursed pieces of code, e.g.:
>
> ```
>  pg_cryptohash_ctx *context =
> -(pg_cryptohash_ctx *) PointerGetDatum(foundres);
> +(pg_cryptohash_ctx *) DatumGetPointer(foundres);
> ```

Sure, back-porting the bug fixes would make sense to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Aleksander Alekseev
Tom, Robert,

> Yeah, I don't see a reason to back-patch a change like this

Maybe we should consider backporting at least 0001 patch, partially
perhaps? I believe it fixes pretty cursed pieces of code, e.g.:

```
 pg_cryptohash_ctx *context =
-(pg_cryptohash_ctx *) PointerGetDatum(foundres);
+(pg_cryptohash_ctx *) DatumGetPointer(foundres);
```

This being said, personally I don't have a strong opinion here. After
all, the code works and passes the tests. Maybe I'm just being a
perfectionist here.

-- 
Best regards,
Aleksander Alekseev
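
For readers following along, the shape of the conversion being
discussed, sketched in isolation (the committed definitions may differ
in const-ness and casts):

    #include <stdint.h>

    typedef uintptr_t Datum;            /* stand-in for PG's Datum */

    /*
     * As macros, PointerGetDatum() and DatumGetPointer() accept
     * arguments of either kind without complaint, which is how a swapped
     * call like the one fixed in 0001 could hide.  As inline functions,
     * the compiler checks the conversion direction:
     */
    static inline Datum
    PointerGetDatum(const void *X)
    {
        return (Datum) X;
    }

    static inline void *
    DatumGetPointer(Datum X)
    {
        return (void *) X;
    }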




Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread Tom Lane
I wrote:
> So maybe we should revisit the question.  It'd be worth collecting some
> stats about how much extra space would be needed if we force there
> to be room for a sentinel.

Actually, after ingesting more caffeine, the problem with this for aset.c
is that the only way to add space for a sentinel that didn't fit already
is to double the space allocation.  That's a little daunting, especially
remembering how many places deliberately allocate power-of-2-sized
arrays.
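
For illustration, a quick standalone sketch of the tradeoff (the chunk
sizing is simplified here; the real rounding lives in aset.c):

```
#include <stdio.h>

/* round a request up to an aset.c-style power-of-2 chunk (simplified) */
static size_t
pow2_chunk(size_t request)
{
	size_t		size = 8;

	while (size < request)
		size <<= 1;
	return size;
}

int
main(void)
{
	size_t		requests[] = {24, 32, 100, 128};

	for (int i = 0; i < 4; i++)
	{
		size_t		chunk = pow2_chunk(requests[i]);

		printf("request %3zu -> chunk %3zu, spare byte for sentinel: %s\n",
			   requests[i], chunk,
			   chunk > requests[i] ? "yes" : "no (must double)");
	}
	return 0;
}
```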

You could imagine deciding that the space classifications are not
power-of-2 but power-of-2-plus-one, or something like that.  But that
would be very invasive to the logic, and I doubt it's a good idea.

regards, tom lane




Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Robert Haas
On Tue, Aug 30, 2022 at 10:13 AM Tom Lane  wrote:
> Aleksander Alekseev  writes:
> > Just to clarify, a break in this case is going to be the fact that we
> > are adding new functions, although inlined, correct? Or maybe
> > something else? I'm sorry, this is the first time I've encountered the
> > question of ABI compatibility in the context of Postgres, so I would
> > appreciate it if you could elaborate a bit.
>
> After absorbing a bit more caffeine, I suppose that replacing a
> macro with a "static inline" function would not be an ABI break,
> at least not with most modern compilers, because the code should
> end up the same.  I'd still vote against back-patching though.
> I don't think the risk-reward ratio is good, especially not for
> the pre-C99 branches which don't necessarily have "inline".

Yeah, I don't see a reason to back-patch a change like this, certainly
not right away. If over time it turns out that the different
definitions on different branches cause too many headaches, we could
reconsider. However, I'm not sure that will happen, because the whole
point is that the static inline functions are intended to behave in
the same way as the macros, just with better type-checking.
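
For illustration, a minimal standalone sketch (with simplified stand-in
types; the real Datum and Pointer definitions live in postgres.h):

```
#include <stdint.h>

typedef uintptr_t Datum;		/* simplified stand-in */
typedef char *Pointer;			/* simplified stand-in */

/* macro version: any integer-ish expression is silently accepted */
#define PointerGetDatumMacro(X) ((Datum) (X))

/*
 * inline version: the argument must really be a Pointer, so passing a
 * Datum where a Pointer belongs now draws a compiler diagnostic
 * instead of compiling cleanly
 */
static inline Datum
PointerGetDatumInline(Pointer X)
{
	return (Datum) X;
}
```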

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Tom Lane
Aleksander Alekseev  writes:
> Just to clarify, a break in this case is going to be the fact that we
> are adding new functions, although inlined, correct? Or maybe
> something else? I'm sorry, this is the first time I've encountered the
> question of ABI compatibility in the context of Postgres, so I would
> appreciate it if you could elaborate a bit.

After absorbing a bit more caffeine, I suppose that replacing a
macro with a "static inline" function would not be an ABI break,
at least not with most modern compilers, because the code should
end up the same.  I'd still vote against back-patching though.
I don't think the risk-reward ratio is good, especially not for
the pre-C99 branches which don't necessarily have "inline".

regards, tom lane




Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Aleksander Alekseev
Hi Tom,

> Do you think this should be backported?
>> Impossible, it's an ABI break.

OK, got it.

Just to clarify, a break in this case is going to be the fact that we
are adding new functions, although inlined, correct? Or maybe
something else? I'm sorry, this is the first time I've encountered the
question of ABI compatibility in the context of Postgres, so I would
appreciate it if you could elaborate a bit.

-- 
Best regards,
Aleksander Alekseev




\pset xheader_width page as default? (Re: very long record lines in expanded psql output)

2022-08-30 Thread Christoph Berg
Re: Andrew Dunstan
> >> +   pg_log_error("\\pset: allowed xheader_width values are full
> >> (default), column, page, or a number specifying the exact width.");

> Committed. There were a couple of bits missing, which I filled in, and I
> changed the behaviour when the option is given without an argument, so
> that instead of resetting it the current value is shown, similarly to
> how most pset options work.

Thanks for implementing this! This has been #1 on my PG annoyances
list since forever. I had even tried to patch it 10½ years ago, but
never managed to get that patch to submittable quality.

Now, is there a chance that we could make this the default? Having
psql print a whole screen of minus characters doesn't
seem very useful as default behavior.

I see upthread that Pavel was objecting to that because pspg doesn't
understand it. But since the expanded format is a one-column output,
and the only thing pspg needs to know is the maximum line length, I'd
think pspg could just look at each line, and update the maximum as it
reads the input anyway.

Can we set 'page' as default?

Christoph




Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Tom Lane
Aleksander Alekseev  writes:
> Do you think this should be backported?

Impossible, it's an ABI break.

regards, tom lane




Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread Tom Lane
David Rowley  writes:
> On Tue, 30 Aug 2022 at 03:16, Robert Haas  wrote:
>> ../../../../src/include/utils/memutils_memorychunk.h:186:18: error:
>> comparison of constant 7 with expression of type
>> 'MemoryContextMethodID' (aka 'enum MemoryContextMethodID') is always
>> true [-Werror,-Wtautological-constant-out-of-range-compare]
>> Assert(methodid <= MEMORY_CONTEXT_METHODID_MASK);
>> ^~~~

> I think the Assert is useful as if we were ever to add an enum member
> with the value of 8 and forgot to adjust MEMORY_CONTEXT_METHODID_BITS
> then bad things would happen inside MemoryChunkSetHdrMask() and
> MemoryChunkSetHdrMaskExternal().  I think it's unlikely we'll ever get
> that many MemoryContext types, but I don't know for sure and would
> rather the person who adds the 9th one get alerted to the lack of bit
> space in MemoryChunk as soon as possible.

I think that's a weak argument, so I don't mind dropping this Assert.
What would be far more useful is a comment inside the
MemoryContextMethodID enum pointing out that we can support at most
8 values because XYZ.
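
Perhaps something like this (a sketch only; the wording would need
checking against the actual bit-packing in memutils_memorychunk.h):

```
typedef enum MemoryContextMethodID
{
	/*
	 * We can support at most 1 << MEMORY_CONTEXT_METHODID_BITS
	 * (currently 8) values here, because the method ID is packed into
	 * that many bits of every MemoryChunk header.
	 */
	MCTX_ASET_ID,
	MCTX_GENERATION_ID,
	MCTX_SLAB_ID
} MemoryContextMethodID;
```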

However, I'm still wondering why Robert sees this when the rest of us
don't.

regards, tom lane




Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-08-30 Thread Aleksander Alekseev
Hi Peter,

> To address this, I converted these macros to inline functions

This is a great change!

I have encountered passing the wrong arguments to these macros many times,
and it is indeed pretty annoying. I wish we could forbid doing other
stupid things as well, e.g. comparing two Datums directly, which for
Timestamps works just fine, but only on 64-bit platforms. Although this
is certainly out of the scope of this thread.

The patch looks good to me; I merely added a link to the discussion. I
added it to the CF application. Cfbot is making up its mind at the moment
of writing.

Do you think this should be backported?

-- 
Best regards,
Aleksander Alekseev


v2-0001-Fix-incorrect-uses-of-Datum-conversion-macros.patch
Description: Binary data


v2-0002-Convert-GetDatum-and-DatumGet-macros-to-inline-fu.patch
Description: Binary data


Re: Strip -mmacosx-version-min options from plperl build

2022-08-30 Thread Andrew Dunstan


On 2022-08-26 Fr 16:25, Andres Freund wrote:
> Hi,
>
> On 2022-08-26 16:00:31 -0400, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> On 2022-08-26 Fr 12:11, Tom Lane wrote:
 And if that doesn't help, try -Wl,--export-all-symbols
>>> worked
> Except that it's only happening for plperl, I'd wonder if it's possibly
> related to our magic symbols being prefixed with _. I noticed that the
> underscore prefix e.g. changes the behaviour of gcc's "collect2" on AIX, which
> is responsible for exporting symbols etc.
>
>
>> Hmph.  Hard to see how that isn't a linker bug.
> Agreed, given that this is only happening with plperl, and not with any of the
> other extensions...
>
>
>> As a stopgap to get the farm green again, I propose adding something like
>>
>> ifeq ($(PORTNAME), cygwin)
>> SHLIB_LINK += -Wl,--export-all-symbols
>> endif
>>
>> to plperl's makefile.
> :(
>

It doesn't make me very happy either, but nobody seems to have a better
idea.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Thomas Munro
On Wed, Aug 31, 2022 at 12:26 AM Robert Haas  wrote:
> On Tue, Aug 30, 2022 at 8:17 AM Thomas Munro  wrote:
> > FWIW I suspect FreeBSD can't break like this in a program linked with
> > libthr, because it has a scheme for deferring signals while the
> > runtime linker holds locks.  _rtld_bind calls _thr_rtld_rlock_acquire,
> > which uses the THR_CRITICAL_ENTER mechanism to cause thr_sighandler to
> > defer until release.  For a non-thread program, I'm not entirely sure,
> > but I don't think the fork() problem exists there.  (Could be wrong,
> > based on a quick look.)
>
> Well that seems a bit ironic, considering that Tom has worried in the
> past that linking with threading libraries would break stuff.

Hah.  To clarify, non-thread builds don't have that exact fork()
problem, but it turns out they do have a related state clobbering
problem elsewhere, which I've reported.




Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread Tom Lane
Tomas Vondra  writes:
> I guess the idea was to add a sentinel only when there already is space
> for it, but perhaps that's a bad tradeoff limiting the benefits. Either
> we add the sentinel fairly often (and then why not just add it all the
> time - it'll need a bit more space), or we do it only very rarely (and
> then it's a matter of luck if it catches an issue).

I'm fairly sure that when we made that decision originally, a top-of-mind
case was ListCells, which are plentiful, small, power-of-2-sized, and
not used in a way likely to have buffer overruns.  But since the List
rewrite a couple years back we no longer palloc individual ListCells.
So maybe we should revisit the question.  It'd be worth collecting some
stats about how much extra space would be needed if we force there
to be room for a sentinel.

regards, tom lane




Re: SQL/JSON features for v15

2022-08-30 Thread Andrew Dunstan


On 2022-08-30 Tu 06:29, Amit Langote wrote:
> On Tue, Aug 30, 2022 at 6:19 PM Alvaro Herrera  
> wrote:
>> On 2022-Aug-30, Amit Langote wrote:
>>
>>> Patches 0001-0006:
>>>
>>> Yeah, these add the overhead of an extra function call (typin() ->
>>> typin_opt_error()) in possibly very common paths.  Other than
>>> refactoring *all* places that call typin() to use the new API, the
>>> only other option seems to be to leave the typin() functions alone and
>>> duplicate their code in typin_opt_error() versions for all the types
>>> that this patch cares about.  Though maybe, that's not necessarily a
>>> better compromise than accepting the extra function call overhead.
>> I think another possibility is to create a static inline function in the
>> corresponding .c module (say boolin_impl() in bool.c), which is called
>> by both the opt_error variant as well as the regular one.  This would
>> avoid the duplicate code as well as the added function-call overhead.
> +1
>


Makes plenty of sense, I'll try to come up with replacements for these
forthwith.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





plpgsql-trigger.html: Format TG_ variables as table (patch)

2022-08-30 Thread Christoph Berg
Hi,

I found the list of TG_ variables on
https://www.postgresql.org/docs/current/plpgsql-trigger.html#PLPGSQL-DML-TRIGGER
hard to read for several reasons: too much whitespace, all the lines
start with "Data type", and even after that, the actual content is
hiding behind some extra "variable that..." boilerplate.

The attached patch formats the list as a table, and removes some of
the clutter from the text.

I reused the catalog_table_entry table machinery, which is probably not
quite the correct thing, but I didn't find a better variant, and the
result looks OK.

Thanks to ilmari for the idea and some initial reviews.

Christoph
From 708c05277e6e2a93e9ceb1e52fda5991cef8b545 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Tue, 30 Aug 2022 15:09:39 +0200
Subject: [PATCH] plpgsql-trigger.html: Format TG_ variables as table

Format list as table, and remove redundant clutter ("variable
holding...") that made the list hard to skim
---
 doc/src/sgml/plpgsql.sgml | 221 --
 1 file changed, 114 insertions(+), 107 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index cf387dfc3f..c73e81eff0 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -4028,142 +4028,149 @@ ASSERT condition  , 
When a PL/pgSQL function is called as a
trigger, several special variables are created automatically in the
-   top-level block. They are:
+   top-level block. They are listed in .
+  
 
-   
-
- NEW
- 
+  
+   TG_ special trigger function variables
+   
+
+ 
+  
+   Variable Type
+  
   
-   Data type RECORD; variable holding the new
-   database row for INSERT/UPDATE operations in row-level
-   triggers. This variable is null in statement-level triggers
-   and for DELETE operations.
+   Description
+  
+ 
+
+
+
+ 
+  
+   NEW record
   
- 
-
-
-
- OLD
- 
   
-   Data type RECORD; variable holding the old
-   database row for UPDATE/DELETE operations in row-level
+   new database row for INSERT/UPDATE operations in row-level
triggers. This variable is null in statement-level triggers
-   and for INSERT operations.
-  
- 
-
+   and for DELETE operations.
+  
+ 
 
-
- TG_NAME
- 
+ 
+  
+   OLD record
+  
   
-   Data type name; variable that contains the name of the trigger actually
-   fired.
+   old database row for UPDATE/DELETE operations in row-level
+   triggers. This variable is null in statement-level triggers and for INSERT
+   operations.
+  
+ 
+
+ 
+  
+   TG_NAME name
   
- 
-
+  
+   name of the trigger fired.
+  
+ 
 
-
- TG_WHEN
- 
+ 
+  
+   TG_WHEN text
+  
   
-   Data type text; a string of
-   BEFORE, AFTER, or
+   string BEFORE, AFTER, or
INSTEAD OF, depending on the trigger's definition.
-  
- 
-
+  
+ 
 
-
- TG_LEVEL
- 
+ 
+  
+   TG_LEVEL text
+  
   
-   Data type text; a string of either
-   ROW or STATEMENT
-   depending on the trigger's definition.
+   string ROW or STATEMENT depending
+   on the trigger's definition.
+  
+ 
+
+ 
+  
+   TG_OP text
   
- 
-
-
-
- TG_OP
- 
   
-   Data type text; a string of
-   INSERT, UPDATE,
+   string INSERT, UPDATE,
DELETE, or TRUNCATE
telling for which operation the trigger was fired.
-  
- 
-
+  
+ 
 
-
- TG_RELID
- 
-  
-   Data type oid; the object ID of the table that caused the
-   trigger invocation.
+ 
+  
+   TG_RELID oid
+   (references pg_class.oid)
   
- 
-
-
-
- TG_RELNAME
- 
   
-   Data type name; the name of the table that caused the trigger
-   invocation. This is now deprecated, and could disappear in a future
-   release. Use TG_TABLE_NAME instead.
-  
- 
-
+   object ID of the table that caused the trigger invocation.
+  
+ 
 
-
- TG_TABLE_NAME
- 
-  
-   Data type name; the name of the table that
-   caused the trigger invocation.
+ 
+  
+   TG_RELNAME name
   
- 
-
-
-
- TG_TABLE_SCHEMA
- 
   
-   Data type name; the name of the schema of the
-   table that caused the trigger invocation.
+   name of the table that caused the trigger invocation. This is now
+   deprecated, and could disappear in a future release. Use
+   TG_TABLE_NAME instead.
+  
+ 
+
+ 
+  
+   TG_TABLE_NAME name
   
- 
-
-
-
- TG_NARGS
- 
   
-   Data type integer; the number of arguments given to the trigger
-   function in the CREATE TRIGGER statement.
+   

Re: POC: GROUP BY optimization

2022-08-30 Thread Jonathan S. Katz

On 8/18/22 3:29 AM, Tomas Vondra wrote:



On 8/18/22 03:32, David Rowley wrote:



Here are a couple of patches to demo the idea.



Yeah, that's an option too. I should have mentioned it along with the
cpu_operator_cost.

BTW would you mind taking a look at the costing? I think it's fine, but
it would be good if someone not involved in the patch takes a look.


With RMT hat on, just a friendly reminder that this is still on the open 
items list[1] -- we have the Beta 4 release next week and it would be 
great if we can get this resolved prior to it.


Thanks!

Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items
[2] 
https://www.postgresql.org/message-id/9d251aec-cea2-bc1a-5ed8-46ef0bcf6...@postgresql.org


OpenPGP_signature
Description: OpenPGP digital signature


Re: Removing dead code in pgcrypto

2022-08-30 Thread Aleksander Alekseev
Hi again,

> I'm pretty sure this change is fine too, but I added the patch to the
> CF application in order to play it safe. Let's see what cfbot will
> tell us.

I see a little race condition happened :) Sorry for that.

-- 
Best regards,
Aleksander Alekseev




Re: ICU for global collation

2022-08-30 Thread Michael Paquier
On Wed, Aug 24, 2022 at 08:28:24PM +0200, Peter Eisentraut wrote:
> Committed, thanks.
> 
> (This should conclude all the issues discussed in this thread recently.)

Please note that this open item was still listed as open.  I have
closed it now.
--
Michael


signature.asc
Description: PGP signature


Setting restrictedtoken in pg_regress

2022-08-30 Thread Daniel Gustafsson
In pg_regress we set restrictedToken when calling CreateRestrictedProcess, but
we never seem to use it anywhere.  Not being well versed in Windows, I might
be missing something, but is it needed or is it a copy/pasteo from fa1e5afa8a2
which does that in restricted_token.c?  If not needed, removing it makes the
code more readable IMO.

--
Daniel Gustafsson   https://vmware.com/



pg_regress_restrictedtoken.diff
Description: Binary data


Re: Removing dead code in pgcrypto

2022-08-30 Thread Aleksander Alekseev
Hi Daniel,

> Thanks for looking!  On closer inspection, I found another function which was
> never used and which doesn't turn up when searching extensions.  The attached
> removes pgp_get_cipher_name as well.

I'm pretty sure this change is fine too, but I added the patch to the
CF application in order to play it safe. Let's see what cfbot will
tell us.

-- 
Best regards,
Aleksander Alekseev




Re: Removing dead code in pgcrypto

2022-08-30 Thread Daniel Gustafsson
> On 30 Aug 2022, at 14:39, Aleksander Alekseev  
> wrote:
> 
> Hi Daniel,
> 
>> it seems we can consider retiring them in v16.
> 
> Looks good to me. A link to the discussion was added to the patch.

Thanks for looking!  On closer inspection, I found another function which was
never used and which doesn't turn up when searching extensions.  The attached
removes pgp_get_cipher_name as well.

--
Daniel Gustafsson   https://vmware.com/



v3-0001-pgcrypto-Remove-unused-code.patch
Description: Binary data


Re: making relfilenodes 56 bits

2022-08-30 Thread Dilip Kumar
On Fri, Aug 26, 2022 at 9:33 PM Robert Haas  wrote:
>
> On Fri, Aug 26, 2022 at 7:01 AM Dilip Kumar  wrote:
> > While working on this solution I noticed one issue. Basically, the
> > problem is that during binary upgrade when we try to rewrite a heap we
> > would expect that “binary_upgrade_next_heap_pg_class_oid” and
> > “binary_upgrade_next_heap_pg_class_relfilenumber” are already set for
> > creating a new heap. But we are not preserving anything so we don't
> > have those values. One option for this problem is that we can first
> > start the postmaster in non-binary-upgrade mode, perform all conflict
> > checking and rewriting, and stop the postmaster.  Then start the
> > postmaster again and perform the restore as we are doing now.  Although
> > we will have to start/stop the postmaster one extra time, we have a
> > solution.
>
> Yeah, that seems OK. Or we could add a new function, like
> binary_upgrade_allow_relation_oid_and_relfilenode_assignment(bool).
> Not sure which way is better.

I have found one more issue with this approach of rewriting the
conflicting table.  Earlier I thought we could do the conflict
checking and rewriting inside create_new_objects() right before the
restore command.  But after implementing (and testing) this I
realized that we DROP and CREATE the database while restoring the dump,
which means it will again generate the conflicting system tables.  So
theoretically the rewriting should go in between the CREATE DATABASE
and the restore of the objects, but as of now both creating the database
and restoring the other objects are part of a single dump file.  I
haven't yet analyzed how feasible it is to generate the dump in two
parts, the first part just to create the database and the second part
to restore the rest of the objects.

Thoughts?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Removing dead code in pgcrypto

2022-08-30 Thread Aleksander Alekseev
Hi Daniel,

> it seems we can consider retiring them in v16.

Looks good to me. A link to the discussion was added to the patch.

-- 
Best regards,
Aleksander Alekseev


v2-0001-pgcrypto-Remove-unused-mbuf-code.patch
Description: Binary data


Re: replacing role-level NOINHERIT with a grant-level option

2022-08-30 Thread Robert Haas
On Tue, Aug 30, 2022 at 8:24 AM Robert Haas  wrote:
> On Mon, Aug 29, 2022 at 10:16 PM Robert Haas  wrote:
> > I'll figure out a fix tomorrow when I'm less tired. Thanks for catching it.
>
> Second try.

Third try.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v3-0001-Fix-a-bug-in-roles_is_member_of.patch
Description: Binary data


Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Robert Haas
On Tue, Aug 30, 2022 at 8:17 AM Thomas Munro  wrote:
> FWIW I suspect FreeBSD can't break like this in a program linked with
> libthr, because it has a scheme for deferring signals while the
> runtime linker holds locks.  _rtld_bind calls _thr_rtld_rlock_acquire,
> which uses the THR_CRITICAL_ENTER mechanism to cause thr_sighandler to
> defer until release.  For a non-thread program, I'm not entirely sure,
> but I don't think the fork() problem exists there.  (Could be wrong,
> based on a quick look.)

Well that seems a bit ironic, considering that Tom has worried in the
past that linking with threading libraries would break stuff.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: replacing role-level NOINHERIT with a grant-level option

2022-08-30 Thread Robert Haas
On Mon, Aug 29, 2022 at 10:16 PM Robert Haas  wrote:
> I'll figure out a fix tomorrow when I'm less tired. Thanks for catching it.

Second try.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Fix-a-bug-in-roles_is_member_of.patch
Description: Binary data


Re: Postmaster self-deadlock due to PLT linkage resolution

2022-08-30 Thread Thomas Munro
On Tue, Aug 30, 2022 at 7:44 AM Tom Lane  wrote:
> Buildfarm member mamba (NetBSD-current on prairiedog's former hardware)
> has failed repeatedly since I set it up.  I have now run the cause of
> that to ground [1], and here's what's happening: if the postmaster
> receives a signal just before it first waits at the select() in
> ServerLoop, it can self-deadlock.  During the postmaster's first use of
> select(), the dynamic loader needs to resolve the PLT branch table entry
> that the core executable uses to reach select() in libc.so, and it locks
> the loader's internal data structures while doing that.  If we enter
> a signal handler while the lock is held, and the handler needs to do
> anything that also requires the lock, the postmaster is frozen.

. o O ( pselect() wouldn't have this problem, but it's slightly too
new for the back branches that didn't yet require SUSv3... drat )

> I'd originally intended to make this code "#ifdef __NetBSD__",
> but on looking into the FreeBSD sources I find much the same locking
> logic in their dynamic loader, and now I'm wondering if such behavior
> isn't pretty standard.  The added calls should have negligible cost,
> so it doesn't seem unreasonable to do them everywhere.

FWIW I suspect FreeBSD can't break like this in a program linked with
libthr, because it has a scheme for deferring signals while the
runtime linker holds locks.  _rtld_bind calls _thr_rtld_rlock_acquire,
which uses the THR_CRITICAL_ENTER mechanism to cause thr_sighandler to
defer until release.  For a non-thread program, I'm not entirely sure,
but I don't think the fork() problem exists there.  (Could be wrong,
based on a quick look.)

> (Of course, a much better answer is to get out of the business of
> doing nontrivial stuff in signal handlers.  But even if we get that
> done soon, we'd surely not back-patch it.)

+1
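
For reference, the pre-resolution idea quoted above amounts to
something like this (a sketch; the function name is invented and the
real patch's placement may differ):

```
#include <stddef.h>
#include <sys/select.h>

/*
 * Force the dynamic loader to resolve select()'s PLT entry now, before
 * any signal handler is installed, so a signal can never arrive while
 * the loader's internal lock is held.
 */
static void
prebind_select(void)
{
	struct timeval tv = {0, 0};

	/* a zero-timeout select() on empty fd sets returns immediately */
	(void) select(0, NULL, NULL, NULL, &tv);
}
```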




Re: Transparent column encryption

2022-08-30 Thread Peter Eisentraut

On 27.07.22 01:19, Jacob Champion wrote:

Now, if we don't have a padding system
built into the feature, then that does put even more on the user; it's
hard to argue with that.

Right. If they can even fix it at all. Having a well-documented padding
feature would not only help mitigate that, it would conveniently hang a
big sign on the caveats that exist.


I would be interested in learning more about such padding systems.  I 
have done a lot of reading for this development project, and I have 
never come across a cryptographic approach to hide length differences by 
padding.  Of course, padding to the block cipher's block size is already 
part of the process, but that is done out of necessity, not because you 
want to disguise the length.  Are there any other methods?  I'm 
interested to learn more.
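
For concreteness, I suppose such a padding system would amount to
something like bucket padding: pad every value up to the next multiple
of a fixed per-column bucket before encryption, so the ciphertext
length reveals only the bucket, not the exact length. A sketch
(PAD_BUCKET and bucket_pad are invented names, not anything in the
patch):

```
#include <stdlib.h>
#include <string.h>

#define PAD_BUCKET 64			/* assumed per-column bucket, <= 255 */

/*
 * Append n bytes of value n, where n brings the total length to the
 * next multiple of PAD_BUCKET (1 <= n <= PAD_BUCKET, so n fits in one
 * byte and unpadding can read it back from the final byte).
 */
static unsigned char *
bucket_pad(const unsigned char *pt, size_t len, size_t *padded_len)
{
	size_t		n = PAD_BUCKET - (len % PAD_BUCKET);
	unsigned char *buf = malloc(len + n);

	if (buf == NULL)
		return NULL;
	memcpy(buf, pt, len);
	memset(buf + len, (int) n, n);
	*padded_len = len + n;
	return buf;
}
```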





Re: Perform streaming logical transactions by background workers and parallel apply

2022-08-30 Thread Amit Kapila
On Tue, Aug 30, 2022 at 12:12 PM Amit Kapila  wrote:
>
> Few other comments on v25-0001*
> 
>

Some more comments on v25-0001*:
=
1.
+static void
+apply_handle_stream_abort(StringInfo s)
...
...
+ else if (apply_action == TA_SEND_TO_PARALLEL_WORKER)
+ {
+ if (subxid == xid)
+ parallel_apply_replorigin_reset();
+
+ /* Send STREAM ABORT message to the apply parallel worker. */
+ parallel_apply_send_data(winfo, s->len, s->data);
+
+ /*
+ * After sending the data to the apply parallel worker, wait for
+ * that worker to finish. This is necessary to maintain commit
+ * order which avoids failures due to transaction dependencies and
+ * deadlocks.
+ */
+ if (subxid == xid)
+ {
+ parallel_apply_wait_for_free(winfo);
...
...

From this code, it appears that we are waiting for rollbacks to finish
but not doing the same in the rollback-to-savepoint cases. Is there a
reason for that? I think we need to wait for rollbacks to avoid
transaction dependency and deadlock issues. Consider the case below:

Consider table t1 (c1 primary key, c2, c3) has a row (1, 2, 3) on both
publisher and subscriber.

Publisher
Session-1
==
Begin;
...
Delete from t1 where c1 = 1;

Session-2
Begin;
...
insert into t1 values(1, 4, 5); --This will wait for Session-1's
Delete to finish.

Session-1
Rollback;

Session-2
-- The wait will be finished and the insert will be successful.
Commit;

Now, assume both these transactions get streamed and if we didn't wait
for rollback/rollback to savepoint, it is possible that the insert
gets executed before and leads to a constraint violation. This won't
happen in non-parallel mode, so we should wait for rollbacks to
finish.

2. I think we don't need to wait at Rollback Prepared/Commit Prepared
because we wait for prepare to finish in *_stream_prepare function.
That will ensure all the operations in that transaction have happened
in the subscriber, so no concurrent transaction can create deadlock or
transaction dependency issues. If so, I think it is better to explain
this in the comments.

3.
+/* What action to take for the transaction. */
+typedef enum
 {
- LogicalRepMsgType command; /* 0 if invalid */
- LogicalRepRelMapEntry *rel;
+ /* The action for non-streaming transactions. */
+ TA_APPLY_IN_LEADER_WORKER,

- /* Remote node information */
- int remote_attnum; /* -1 if invalid */
- TransactionId remote_xid;
- XLogRecPtr finish_lsn;
- char*origin_name;
-} ApplyErrorCallbackArg;
+ /* Actions for streaming transactions. */
+ TA_SERIALIZE_TO_FILE,
+ TA_APPLY_IN_PARALLEL_WORKER,
+ TA_SEND_TO_PARALLEL_WORKER
+} TransactionApplyAction;

I think each action needs explanation atop this enum typedef.
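
Something along these lines, maybe (comment wording invented; member
names from the patch):

```
/* What action to take for the transaction. */
typedef enum
{
	/* Non-streaming transaction: apply it directly in the leader. */
	TA_APPLY_IN_LEADER_WORKER,

	/* Streaming, no free parallel worker: serialize changes to a file. */
	TA_SERIALIZE_TO_FILE,

	/* We are a parallel apply worker: apply the changes ourselves. */
	TA_APPLY_IN_PARALLEL_WORKER,

	/* Leader with a parallel worker attached: send the changes to it. */
	TA_SEND_TO_PARALLEL_WORKER
} TransactionApplyAction;
```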

4.
@@ -1149,24 +1315,14 @@ static void
 apply_handle_stream_start(StringInfo s)
{
...
+ else if (apply_action == TA_SERIALIZE_TO_FILE)
+ {
+ /*
+ * For the first stream start, check if there is any free apply
+ * parallel worker we can use to process this transaction.
+ */
+ if (first_segment)
+ winfo = parallel_apply_start_worker(stream_xid);

- /* open the spool file for this transaction */
- stream_open_file(MyLogicalRepWorker->subid, stream_xid, first_segment);
+ if (winfo)
+ {
+ /*
+ * If we have found a free worker, then we pass the data to that
+ * worker.
+ */
+ parallel_apply_send_data(winfo, s->len, s->data);

- /* if this is not the first segment, open existing subxact file */
- if (!first_segment)
- subxact_info_read(MyLogicalRepWorker->subid, stream_xid);
+ nchanges = 0;

- pgstat_report_activity(STATE_RUNNING, NULL);
+ /* Cache the apply parallel worker for this transaction. */
+ stream_apply_worker = winfo;
+ }
...

This looks odd to me in the sense that even if the action is
TA_SERIALIZE_TO_FILE, we still send the information to the parallel
worker. Wouldn't it be better to call parallel_apply_start_worker()
for first_segment before checking apply_action with
get_transaction_apply_action()? That way we can avoid this
special-case handling.

5.
+/*
+ * Struct for sharing information between apply leader apply worker and apply
+ * parallel workers.
+ */
+typedef struct ApplyParallelWorkerShared
+{
+ slock_t mutex;
+
+ bool in_use;
+
+ /* Logical protocol version. */
+ uint32 proto_version;
+
+ TransactionId stream_xid;

Are we using the stream_xid passed by the leader in the parallel
worker? If so, how? If not, can we do without it?

6.
+void
+HandleParallelApplyMessages(void)
{
...
+ /* OK to process messages.  Reset the flag saying there are more to do. */
+ ParallelApplyMessagePending = false;

I don't understand the meaning of the second part of the comment.
Shouldn't we say: "Reset the flag saying there is nothing more to
do."? I know you have copied from the other part of the code but there
also I am not sure if it is correct.

7.
+static List *ApplyParallelWorkersFreeList = NIL;
+static List *ApplyParallelWorkersList = NIL;

Do we really need to maintain two different workers' lists? If so,
what is the advantage? I think 

Re: Transparent column encryption

2022-08-30 Thread Peter Eisentraut

On 20.07.22 08:12, Masahiko Sawada wrote:

---
Regarding the documentation, I'd like to have a page that describes
the generic information of the transparent column encryption for users
such as what this feature actually does, what can be achieved by this
feature, CMK rotation, and its known limitations. The patch has
"Transparent Column Encryption" section in protocol.sgml but it seems
to be more internal information.


I have added more documentation in the v6 patch.


---
In datatype.sgml, it says "Thus, clients that don't support
transparent column encryption or have disabled it will see the
encrypted values as byte arrays." but I got an error rather than
encrypted values when I tried to connect to the server using by
clients that don't support the encryption:

postgres(1:6040)=# select * from tbl;
no CMK lookup found for realm ""


This has now been improved in v6.  The protocol changes need to be 
activated explicitly at connection time, so if you use a client that 
doesn't support it or doesn't activate it, you get the described behavior.



---
In single-user mode, the user cannot decrypt the encrypted value but
probably it's fine in practice.


Yes, there is nothing really to do about that.


---
Regarding the column master key rotation, would it be useful if we
provide a tool for that? For example, it takes old and new CMK as
input, re-encrypt all CEKs realted to the CMK, and registers them to
the server.


I imagine users using a variety of key management systems, so I don't 
see how a single tool would work.  But it's something we can think about 
in the future.



---
Is there any convenient way to load a large amount of test data into
the encrypted columns? I tried to use generate_series(), but it seems
not to work, as it generates the data on the server side:


No, that doesn't work, by design.  You'd have to write a client program 
to generate the data.



I've also tried to load the data from a file on the client by using
the \copy command, but it seems not to work:

postgres(1:80556)=# copy (select generate_series(1, 1000)::text) to
'/tmp/tmp.dat';
COPY 1000
postgres(1:80556)=# \copy a from '/tmp/tmp.dat'
COPY 1000
postgres(1:80556)=# select * from a;
out of memory


This was a bug that I have fixed.


---
I got SEGV in the following two situations:


I have fixed these.




Removing dead code in pgcrypto

2022-08-30 Thread Daniel Gustafsson
mbuf_tell and mbuf_rewind functions were introduced in commit e94dd6ab91 but
were seemingly never used, so it seems we can consider retiring them in v16.

--
Daniel Gustafsson   https://vmware.com/



v1-0001-pgcrypto-Remove-unused-mbuf-code.patch
Description: Binary data


Re: SQL/JSON features for v15

2022-08-30 Thread Amit Langote
On Tue, Aug 30, 2022 at 6:19 PM Alvaro Herrera  wrote:
> On 2022-Aug-30, Amit Langote wrote:
>
> > Patches 0001-0006:
> >
> > Yeah, these add the overhead of an extra function call (typin() ->
> > typin_opt_error()) in possibly very common paths.  Other than
> > refactoring *all* places that call typin() to use the new API, the
> > only other option seems to be to leave the typin() functions alone and
> > duplicate their code in typin_opt_error() versions for all the types
> > that this patch cares about.  Though maybe, that's not necessarily a
> > better compromise than accepting the extra function call overhead.
>
> I think another possibility is to create a static inline function in the
> corresponding .c module (say boolin_impl() in bool.c), which is called
> by both the opt_error variant as well as the regular one.  This would
> avoid the duplicate code as well as the added function-call overhead.

+1

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-30 Thread Peter Smith
Hi Onder,

Since you ask me several questions [1], this post is just for answering those.

I have looked again at the latest v9 patch, but I will post my review
comments for that separately.


On Thu, Aug 25, 2022 at 7:09 PM Önder Kalacı  wrote:
>
>> 1d.
>> In above text, what was meant by "catches up around ~5 seconds"?
>> e.g. Did it mean *improves* by ~5 seconds, or *takes* ~5 seconds?
>>
>
> It "takes" 5 seconds to replicate all the changes. To be specific, I execute 
> 'SELECT sum(abalance) FROM pgbench_accounts' on the subscriber, and try to 
> measure the time until all the changes are replicated. I use the same
> query on the publisher to check what the query result should be when 
> replication is done.
>
> I updated the relevant text, does that look better?

Yes.

>> 2. GENERAL
>>
>> 2a.
>> There are lots of single-line comments that start lowercase, but by
>> convention, I think they should start uppercase.
>>
>> e.g. + /* we should always use at least one attribute for the index scan */
>> e.g. + /* we might not need this if the index is unique */
>> e.g. + /* avoid expensive equality check if index is unique */
>> e.g. + /* unrelated Path, skip */
>> e.g. + /* simple case, we already have an identity or pkey */
>> e.g. + /* indexscans are disabled, use seq. scan */
>> e.g. + /* target is a regular table */
>>
>> ~~
>
>
> Thanks for noting this, I didn't realize that there is a strict requirement 
> on this. Updated all of your suggestions, and realized one more such case.
>
> Is there documentation where such conventions are listed? I couldn't find any.

I don’t know of any strict requirements, but I did think it was the
more common practice to make the comments look like proper sentences.
However, when I tried to prove that by counting the single-line
comments in the PG code, it seems to be split almost 50:50
lowercase/uppercase, so I guess you should just do whatever is most
sensible or most consistent with the surrounding code...

Counts for single line /* */ comments:
regex ^\s*\/\*\s[a-z]+.*\*\/$  = 18222 results
regex ^\s*\/\*\s[A-Z]+.*\*\/$ = 20252 results

>> 3. src/backend/executor/execReplication.c - build_replindex_scan_key
>>
>> - int attoff;
>> + int index_attoff;
>> + int scankey_attoff;
>>   bool isnull;
>>   Datum indclassDatum;
>>   oidvector  *opclass;
>>   int2vector *indkey = >rd_index->indkey;
>> - bool hasnulls = false;
>> -
>> - Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
>> -RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));
>>
>>   indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple,
>>   Anum_pg_index_indclass, &isnull);
>>   Assert(!isnull);
>>   opclass = (oidvector *) DatumGetPointer(indclassDatum);
>> + scankey_attoff = 0;
>>
>> Maybe just assign scankey_attoff = 0 at the declaration?
>>
>
> Again, lack of coding convention knowledge :/ My observation is that it is 
> often not assigned during the declaration. But, changed this one.
>

I don’t know of any convention. Probably this is just my own
preference to keep the simple default assignments with the declaration
to reduce the LOC. YMMV.

>>
>> 6.
>>
>> - int pkattno = attoff + 1;
>> ...
>>   /* Initialize the scankey. */
>> - ScanKeyInit(&skey[attoff],
>> - pkattno,
>> + ScanKeyInit(&skey[scankey_attoff],
>> + index_attoff + 1,
>>   BTEqualStrategyNumber,
>> Wondering if it would have been simpler if you just did:
>> int pkattno = index_attoff + 1;
>
>
>
> The index is not necessarily the primary key at this point, that's why I 
> removed it.
>
> There are already 3 variables in the same function (index_attoff,
> scankey_attoff and table_attno), which are hard to avoid. But this one seemed
> OK to avoid, mostly to simplify readability. Do you think it is better
> with the additional variable? Still, I think we need a better name, as "pk" is
> not relevant anymore.
>

Your code is fine. Leave it as-is.

>> 8. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex
>>
>> @@ -128,28 +171,44 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
>>   TransactionId xwait;
>>   Relation idxrel;
>>   bool found;
>> + TypeCacheEntry **eq;
>> + bool indisunique;
>> + int scankey_attoff;
>>
>>   /* Open the index. */
>>   idxrel = index_open(idxoid, RowExclusiveLock);
>> + indisunique = idxrel->rd_index->indisunique;
>> +
>> + /* we might not need this if the index is unique */
>> + eq = NULL;
>>
>> Maybe just default assign eq = NULL in the declaration?
>>
>
> Again, I wasn't sure if it is OK regarding the coding convention to assign 
> during the declaration. Changed now.
>

Same as #3.

>> 10.
>>
>> + /* we only need to allocate once */
>> + if (eq == NULL)
>> + eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);
>>
>> But shouldn't you also free this 'eq' before the function returns, to
>> prevent leaking memory?
>>
>
> Two notes here. First, this is allocated inside ApplyMessageContext, which 
> seems to be reset per tuple change. So, 

Re: SQL/JSON features for v15

2022-08-30 Thread Alvaro Herrera
On 2022-Aug-30, Amit Langote wrote:

> Patches 0001-0006:
> 
> Yeah, these add the overhead of an extra function call (typin() ->
> typin_opt_error()) in possibly very common paths.  Other than
> refactoring *all* places that call typin() to use the new API, the
> only other option seems to be to leave the typin() functions alone and
> duplicate their code in typin_opt_error() versions for all the types
> that this patch cares about.  Though maybe, that's not necessarily a
> better compromise than accepting the extra function call overhead.

I think another possibility is to create a static inline function in the
corresponding .c module (say boolin_impl() in bool.c), which is called
by both the opt_error variant as well as the regular one.  This would
avoid the duplicate code as well as the added function-call overhead.
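
A minimal sketch of that shape, assuming a boolin()-style pair (the
parsing is simplified; the real logic lives in bool.c):

```
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* shared implementation, inlined into both callers */
static inline bool
boolin_impl(const char *in, bool *result)
{
	if (strcmp(in, "t") == 0 || strcmp(in, "true") == 0)
		*result = true;
	else if (strcmp(in, "f") == 0 || strcmp(in, "false") == 0)
		*result = false;
	else
		return false;
	return true;
}

/* regular variant: hard error on bad input */
bool
boolin(const char *in)
{
	bool		result;

	if (!boolin_impl(in, &result))
	{
		/* stands in for ereport(ERROR, ...) */
		fprintf(stderr, "invalid input for type boolean: \"%s\"\n", in);
		exit(1);
	}
	return result;
}

/* opt_error variant: hand the failure back to the caller */
bool
boolin_opt_error(const char *in, bool *result)
{
	return boolin_impl(in, result);
}
```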

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.




Re: Letter case of "admin option"

2022-08-30 Thread Alvaro Herrera
On 2022-Aug-26, Robert Haas wrote:

> Here's a patch changing all occurrences of "admin option" in error
> messages to "ADMIN OPTION".
> 
> Two of these five messages also exist in previous releases; the other
> three are new.
> 
> I'm not sure if this is our final conclusion on what we want to do
> here, so please speak up if you don't agree.

Thanks -- this is my personal preference, as well as speaking on behalf
of a few people who considered the matter from a user's point of view.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Reducing the chunk header sizes on all memory context types

2022-08-30 Thread Tomas Vondra



On 8/30/22 04:31, David Rowley wrote:
> On Tue, 30 Aug 2022 at 13:55, Tom Lane  wrote:
>> I wonder if slab ought to artificially bump up such requests when
>> MEMORY_CONTEXT_CHECKING is enabled, so there's room for a sentinel.
>> I think it's okay for aset.c to not do that, because its power-of-2
>> behavior means there usually is room for a sentinel; but slab's
>> policy makes it much more likely that there won't be.
> 
> I think it's fairly likely that small allocations are a power of 2,
> and I think most of our allocates are small, so I imagine that if we
> didn't do that for aset.c, we'd miss out on most of the benefits.
> 

Yeah. I think we have a fair number of "larger" allocations (once you
get to ~100B it probably won't be a 2^N), but we may easily miss whole
sections of allocations.

I guess the idea was to add a sentinel only when there already is space
for it, but perhaps that's a bad tradeoff limiting the benefits. Either
we add the sentinel fairly often (and then why not just add it all the
time - it'll need a bit more space), or we do it only very rarely (and
then it's a matter of luck if it catches an issue). Considering we only
do this with asserts, I doubt the extra bytes / CPU is a major issue,
and a (more) reliable detection of issues seems worth it. But maybe I
underestimate the costs. The only alternative seems to be valgrind, and
that's way costlier, though.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg_rewind WAL segments deletion pitfall

2022-08-30 Thread Alexander Kukushkin
On Tue, 30 Aug 2022 at 10:27, Kyotaro Horiguchi 
wrote:

>
> Hmm. Doesn't it work to ignore the TLI then? All segments whose
> segment number is equal to or larger than the checkpoint location are
> preserved regardless of TLI?
>

If we ignore TLI there is a chance that we may retain some unnecessary (or
just wrong) files.


>
> > Also, we need to take into account the divergency LSN. Files after it are
> > not required.
>
> They are removed at the later checkpoints. But also we can remove
> segments that are out of the range between the last common checkpoint
> and divergence point ignoring TLI.


Everything that is newer than last_common_checkpoint_seg could be removed (but
it already happens automatically, because these files are missing on the
new primary).
WAL files that are older than last_common_checkpoint_seg could be either
removed or at least not copied from the new primary.



> the divergence point is also
> compared?
>
> > if (file_segno >= last_common_checkpoint_seg &&
> > file_segno <= divergence_seg)
> > ;
>

The current implementation relies on tracking the WAL files being opened while
searching for the last common checkpoint. It automatically starts from the
divergence_seg, automatically finishes at last_common_checkpoint_seg, and
last but not least, automatically handles timeline changes. I don't think
that manually written code that decides what to do from the WAL file name
(and also takes into account TLI) could be much simpler than the current
approach.
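
(For concreteness, that name-based alternative would look roughly like
this; the boundary-variable names are invented, while XLogFromFileName()
is the existing helper from xlog_internal.h:)

```
#include "postgres_fe.h"
#include "access/xlog_internal.h"

/*
 * Sketch only: decide purely from a WAL file name whether pg_rewind
 * should preserve the segment, ignoring the TLI entirely.
 */
static bool
keep_wal_segment(const char *fname,
				 XLogSegNo last_common_checkpoint_seg,
				 XLogSegNo divergence_seg,
				 int wal_segment_size)
{
	TimeLineID	tli;
	XLogSegNo	file_segno;

	XLogFromFileName(fname, &tli, &file_segno, wal_segment_size);

	/* keep everything in the range, regardless of TLI */
	return (file_segno >= last_common_checkpoint_seg &&
			file_segno <= divergence_seg);
}
```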


Actually, since we start doing some additional "manipulations" with files
in pg_wal, we probably should do a symmetric action with files inside
pg_wal/archive_status

Regards,
--
Alexander Kukushkin


  1   2   >