Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-14 Thread Amit Kapila
On Thu, Jul 13, 2023 at 10:55 AM Masahiko Sawada  wrote:
>
> On Wed, Jul 12, 2023 at 11:15 PM Masahiko Sawada  
> wrote:
> >
> > > I think this is a valid concern. Can't we move all the checks
> > > (including the remote attrs check) inside
> > > IsIndexUsableForReplicaIdentityFull() and then call it from both
> > > places? Won't we have attrmap information available in the callers of
> > > FindReplTupleInLocalRel() via ApplyExecutionData?
> >
> > You mean to pass ApplyExecutionData or attrmap down to
> > RelationFindReplTupleByIndex()? I think it would be better to call it
> > from FindReplTupleInLocalRel() instead.
>
> I've attached a draft patch for this idea.
>

Looks reasonable to me. However, I am not very sure if we need to
change the prototype of RelationFindReplTupleByIndex(). A few other
minor comments:

1.
- * has been implemented as a tri-state with values DISABLED, PENDING, and
+n * has been implemented as a tri-state with values DISABLED, PENDING, and
  * ENABLED.
  *
The above seems like a spurious change.

2.
+ /* And must reference the remote relation column */
+ if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol) ||
+ attrmap->attnums[AttrNumberGetAttrOffset(keycol)] < 0)
+ return false;
+

I think we should specify the reason for this. I see that in
commit fd48a86c62 [1], the reason for this was removed. Shouldn't we
retain it in some form?

[1] -
- * We also skip indexes if the remote relation does not contain the leftmost
- * column of the index. This is because in most such cases sequential scan is
- * favorable over index scan.
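
For illustration, a hedged sketch (not the actual patch) of how that rationale
could be kept as a comment right above the attrmap-based check, assuming the
check ends up inside IsIndexUsableForReplicaIdentityFull() and that indexInfo
and attrmap are its IndexInfo and AttrMap arguments:

	AttrNumber	keycol = indexInfo->ii_IndexAttrNumbers[0];

	/*
	 * Skip the index if the remote relation does not contain the leftmost
	 * column of the index; without it we cannot build a useful scan key,
	 * and in most such cases a sequential scan is favorable over an index
	 * scan anyway.
	 */
	if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol) ||
		attrmap->attnums[AttrNumberGetAttrOffset(keycol)] < 0)
		return false;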

-- 
With Regards,
Amit Kapila.




Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

2023-07-14 Thread Amit Kapila
On Fri, Jul 14, 2023 at 2:15 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > > I think it's appropriate to add on the restrictions page. (But mentioning 
> > > that this
> > restriction is only for subscriber)
> > >
> > > If the list were larger, then the restrictions page could be divided into 
> > > publisher
> > and subscriber restrictions. But not for one very specific restriction.
> > >
> >
> > Okay, how about something like: "The UPDATE and DELETE operations
> > cannot be applied on the subscriber for the published tables that
> > specify REPLICA IDENTITY FULL when the table has attributes with
> > datatypes (e.g point or box) that don't have a default operator class
> > for Btree or Hash. This won't be a problem if the table has a primary
> > key or replica identity defined for it."?
>
> Thanks for discussing and giving suggestions. But it seems that the first
> sentence is difficult to read for me. How about attached?
>

The last line seems repetitive to me, so I have removed it. Apart
from that, the patch looks good to me. Sergie, Peter, and others, any
thoughts?

-- 
With Regards,
Amit Kapila.


v5-0001-Doc-Update-the-logical-replication-restriction-w..patch
Description: Binary data


Re: Should we remove db_user_namespace?

2023-07-14 Thread Nathan Bossart
On Mon, Jul 10, 2023 at 03:43:07PM +0900, Michael Paquier wrote:
> On Wed, Jul 05, 2023 at 08:49:26PM -0700, Nathan Bossart wrote:
>> On Thu, Jul 06, 2023 at 08:21:18AM +0900, Michael Paquier wrote:
>>> Removing the GUC from this table is kind of annoying.  Cannot this be
>>> handled like default_with_oids or ssl_renegotiation_limit to avoid any
>>> kind of issues with the reload of dump files and the kind?
>> 
>> Ah, good catch.
> 
> Thanks.  Reading through the patch, this version should be able to
> handle the dump reloads.

Hm.  Do we actually need to worry about this?  It's a PGC_SIGHUP GUC, so it
can only be set at postmaster start or via a configuration file.  Any dump
files that are trying to set it or clients that are trying to add it to
startup packets are already broken.  I guess keeping the GUC around would
avoid breaking any configuration files or startup scripts that happen to be
setting it to false, but I don't know if that's really worth worrying
about.
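
For reference, the default_with_oids-style handling mentioned upthread boils
down to keeping a stub boolean GUC whose check hook rejects the value that is
no longer supported; a rough sketch (the hook name and message are assumptions,
not taken from the patch):

static bool
check_db_user_namespace(bool *newval, void **extra, GucSource source)
{
	/* accept only "off"; the feature itself is gone */
	if (*newval)
	{
		GUC_check_errcode(ERRCODE_FEATURE_NOT_SUPPORTED);
		GUC_check_errmsg("db_user_namespace is no longer supported");
		return false;
	}

	return true;
}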

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Chapman Flack

On 2023-07-14 18:22, David G. Johnston wrote:
For PostgreSQL this is even moreso (i.e, huge means count > 1) since the
order of rows in the returning clause is not promised to be related to the
order of the rows as seen in the supplied insert command.  A manual insert
returning should ask for not only any auto-generated column but also the
set of columns that provide the unique natural key.


Yikes!

That sounds like something that (if it's feasible) the driver's
rewriting for RETURN_GENERATED_KEYS should try to do ... the
driver is already expected to be smart enough to know which
columns the generated keys are ... should it also try to rewrite
the query in some way to get a meaningful order of the results?

But then ... the apidoc for getGeneratedKeys is completely
silent on the ordering anyway. I get the feeling this whole
corner of the JDBC API may have been thought out only as far
as issuing a single-row INSERT at a time and getting its
assigned keys back.

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread David G. Johnston
On Fri, Jul 14, 2023 at 3:12 PM Chapman Flack  wrote:

> If someone really does want to do a huge INSERT and get the generated
> values back in increments, it might be clearer to write an explicit
> INSERT RETURNING and issue it with executeQuery, where everything will
> work as expected.
>
>
For PostgreSQL this is even moreso (i.e, huge means count > 1) since the
order of rows in the returning clause is not promised to be related to the
order of the rows as seen in the supplied insert command.  A manual insert
returning should ask for not only any auto-generated column but also the
set of columns that provide the unique natural key.

David J.


Re: pg_dump needs SELECT privileges on irrelevant extension table

2023-07-14 Thread Jacob Champion
Thanks for the reviews, both of you!

On Fri, Jul 14, 2023 at 5:04 AM Andrew Dunstan  wrote:
> Seems reasonable at first glance. Isn't it going to save some work anyway 
> later on, so the performance hit could end up negative?

Theoretically it could, if the OID list sent during getPolicies()
shrinks enough. I tried a quick test against the regression database,
but it's too noisy on my machine to know whether the difference is
really meaningful.

--Jacob




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Chapman Flack

On 2023-07-14 17:31, Chapman Flack wrote:

So when getGeneratedKeys was later added, a way of getting a ResultSet
after an executeUpdate, did they consciously intend it to come under
the jurisdiction of existing apidoc that concerned the fetch size of
a ResultSet you wanted from executeQuery?
...
Moreover, the apidoc does say the fetch size is "a hint", and also that
it applies "when more rows are needed" from the ResultSet.

So it's technically not a misbehavior to disregard the hint, and you're
not even disregarding the hint if you fetch all the rows at once, because
then more rows can't be needed. :)


... and just to complete the thought, the apidoc for executeUpdate leaves
no wiggle room for what that method returns: for DML, it has to be the
row count.

So if the only way to get the accurate row count is to fetch all the
RETURN_GENERATED_KEYS rows at once, either to count them locally or
to find the count in the completion message that follows them, that
mandate seems stronger than any hint from setFetchSize.

If someone really does want to do a huge INSERT and get the generated
values back in increments, it might be clearer to write an explicit
INSERT RETURNING and issue it with executeQuery, where everything will
work as expected.

I am also thinking someone might possibly allocate one Statement to
use for some number of executeQuery and executeUpdate calls, and might
call setFetchSize as a hint for the queries, but not expect it to have
effects spilling over to executeUpdate.

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread David G. Johnston
On Fri, Jul 14, 2023 at 12:51 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > I agree that the documented contract of the insert command tag says it
> > reports the size of the entire tuple store maintained by the server
> during
> > the transaction instead of just the most recent count on subsequent
> fetches.
>
> Where do you see that documented, exactly?  I looked in the protocol
> chapter and didn't find anything definite either way.
>

On successful completion, an INSERT command returns a command tag of the
form

INSERT oid count
The count is the number of rows inserted or updated.

https://www.postgresql.org/docs/current/sql-insert.html

It doesn't, nor should, have any qualifications about not applying to the
returning case and definitely shouldn't change based upon use of FETCH on
the unnamed portal.

I'm quite prepared to believe there are bugs here, since this whole
> set of behaviors is unreachable via libpq: you can't get it to send
> an Execute with a count other than zero (ie, fetch all), nor is it
> prepared to deal with the PortalSuspended messages it'd get if it did.
>
> I think that the behavior is arising from this bit in PortalRun:
>
> if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
> {
> CopyQueryCompletion(qc, &portal->qc);
> -->>>   qc->nprocessed = nprocessed;
> }
>

I came to the same conclusion.  The original introduction of that line
replaced string munging "SELECT " + nprocessed; so this code never even
considered INSERT as being in scope and indeed wanted to return the per-run
value in a fetch-list context.

https://github.com/postgres/postgres/commit/2f9661311b83dc481fc19f6e3bda015392010a40#diff-f66f9adc3dfc98f2ede2e96691843b75128689a8cb9b79ae68d53dc749c3b22bL781

I think I see why simple removal of that line is sufficient as the
copied-from &portal->qc already has the result of executing the underlying
insert query.  That said, the paranoid approach would seem to be to keep
the assignment but only use it when we aren't dealing with the returning
case.

diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..5e75141f0b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -775,7 +775,8 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
 				if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
 				{
 					CopyQueryCompletion(qc, &portal->qc);
-					qc->nprocessed = nprocessed;
+					if (portal->strategy != PORTAL_ONE_RETURNING)
+						qc->nprocessed = nprocessed;
 				}
 
 				/* Mark portal not active */


> It seems plausible to me that we should just remove that marked line.
> Not sure about the compatibility implications, though.
>
>
I believe it is a bug worth fixing: it saves driver writers from spending
processing time computing a count that the command tag is supposed to provide
to them, using compute time the server has already spent anyway.

David J.


Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Chapman Flack

On 2023-07-14 17:02, Dave Cramer wrote:

The fly in the ointment here is when they setFetchSize and we decide to
use a Portal under the covers.


A person might language-lawyer about whether setFetchSize even applies
to the kind of thing done with executeUpdate.

Hmm ... the apidoc for setFetchSize says "Gives the JDBC driver a hint
as to the number of rows that should be fetched from the database when
more rows are needed for ResultSet objects generated by this Statement."

So ... it appears to apply to any "ResultSet objects generated by this
Statement", and getGeneratedKeys returns a ResultSet, so maybe
setFetchSize should apply to it.

OTOH, setFetchSize has @since 1.2, and getGeneratedKeys @since 1.4.
At the time setFetchSize was born, the only way you could get a ResultSet
was from the kind of thing you'd use executeQuery for.

So when getGeneratedKeys was later added, a way of getting a ResultSet
after an executeUpdate, did they consciously intend it to come under
the jurisdiction of existing apidoc that concerned the fetch size of
a ResultSet you wanted from executeQuery?

Full employment for language lawyers.

Moreover, the apidoc does say the fetch size is "a hint", and also that
it applies "when more rows are needed" from the ResultSet.

So it's technically not a misbehavior to disregard the hint, and you're
not even disregarding the hint if you fetch all the rows at once, because
then more rows can't be needed. :)

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Dave Cramer
On Fri, 14 Jul 2023 at 16:32,  wrote:

> On 2023-07-14 15:49, Dave Cramer wrote:
> > On Fri, 14 Jul 2023 at 15:40,  wrote:
> >> Perhaps an easy rule would be, if the driver itself adds RETURNING
> >> because of a RETURN_GENERATED_KEYS option, it should also force the
> >> fetch count to zero and collect all the returned rows before
> >> executeUpdate returns, and then it will have the right count
> >> to return?
> >
> > Well that kind of negates the whole point of using a cursor in the case
> > where you have a large result set.
> >
> > The rows are subsequently fetched in rs.next()
>
> I guess it comes down, again, to the question of what kind of thing
> the API client thinks it is doing, when it issues an INSERT with
> the RETURN_GENERATED_KEYS option.
>
> An API client issuing a plain INSERT knows it is the kind of thing
> for which executeUpdate() is appropriate, and the complete success
> or failure outcome is known when that returns.
>
> An API client issuing its own explicit INSERT RETURNING knows it
> is the kind of thing for which executeQuery() is appropriate, and
> that some processing (and possibly ereporting) may be left to
> occur while working through the ResultSet.
>
> But now how about this odd hybrid, where the API client wrote
> a plain INSERT, but added the RETURN_GENERATED_KEYS option?
> The rewritten query is the kind of thing the server and the
> driver need to treat as a query, but to the API client it still
> appears the kind of thing for which executeUpdate() is suited.
> The generated keys can then be examined--in the form of a
> ResultSet--but one obtained with the special method
> getGeneratedKeys(), rather than the usual way of getting the
> ResultSet from a query.
>
> Should the client then assume the semantics of executeUpdate
> have changed for this case, the outcome isn't fully known yet,
> and errors could come to light while examining the returned
> keys? Or should it still assume that executeUpdate has its
> usual meaning, all the work is done by that point, and the
> ResultSet from getGeneratedKeys() is simply a convenient,
> familiar interface for examining what came back?
>

The fly in the ointment here is when they setFetchSize and we decide to use
a Portal under the covers.

I'm willing to document around this since it looks like changing the
behaviour in the server is a non-starter.


>
> I'm not sure if there's a firm answer to that one way or the
> other in the formal JDBC spec, but the latter seems perhaps
> safer to me.
>

I'll leave the user to decide their own fate.

Dave

>
> Regards,
> -Chap
>


Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

2023-07-14 Thread David Zhang
I believe that before, users could make a backup using pg_basebackup and then
start the backup as an independent Primary server for whatever reason.
Now, if this is still allowed, then users need to be aware that the
backup_label must be manually deleted; otherwise, the backup won't be
able to start as a Primary.


The current message below doesn't provide such a hint.

+	if (!ArchiveRecoveryRequested)
+		ereport(FATAL,
+				(errmsg("could not find recovery.signal or standby.signal when recovering with backup_label"),
+				 errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" or \"%s/standby.signal\" and add required recovery options.",
+						 DataDir, DataDir)));

On 2023-03-12 6:06 p.m., Michael Paquier wrote:

On Fri, Mar 10, 2023 at 03:59:04PM +0900, Michael Paquier wrote:

My apologies for the long message, but this deserves some attention,
IMHO.

Note: A CF entry has been added as of [1], and I have added an item in
the list of live issues on the open item page for 16.

[1]:https://commitfest.postgresql.org/43/4244/
[2]:https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items#Live_issues
--
Michael


Best regards,

David


Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-14 15:49, Dave Cramer wrote:

On Fri, 14 Jul 2023 at 15:40,  wrote:

Perhaps an easy rule would be, if the driver itself adds RETURNING
because of a RETURN_GENERATED_KEYS option, it should also force the
fetch count to zero and collect all the returned rows before
executeUpdate returns, and then it will have the right count
to return?


Well that kind of negates the whole point of using a cursor in the case
where you have a large result set.

The rows are subsequently fetched in rs.next()


I guess it comes down, again, to the question of what kind of thing
the API client thinks it is doing, when it issues an INSERT with
the RETURN_GENERATED_KEYS option.

An API client issuing a plain INSERT knows it is the kind of thing
for which executeUpdate() is appropriate, and the complete success
or failure outcome is known when that returns.

An API client issuing its own explicit INSERT RETURNING knows it
is the kind of thing for which executeQuery() is appropriate, and
that some processing (and possibly ereporting) may be left to
occur while working through the ResultSet.

But now how about this odd hybrid, where the API client wrote
a plain INSERT, but added the RETURN_GENERATED_KEYS option?
The rewritten query is the kind of thing the server and the
driver need to treat as a query, but to the API client it still
appears the kind of thing for which executeUpdate() is suited.
The generated keys can then be examined--in the form of a
ResultSet--but one obtained with the special method
getGeneratedKeys(), rather than the usual way of getting the
ResultSet from a query.

Should the client then assume the semantics of executeUpdate
have changed for this case, the outcome isn't fully known yet,
and errors could come to light while examining the returned
keys? Or should it still assume that executeUpdate has its
usual meaning, all the work is done by that point, and the
ResultSet from getGeneratedKeys() is simply a convenient,
familiar interface for examining what came back?

I'm not sure if there's a firm answer to that one way or the
other in the formal JDBC spec, but the latter seems perhaps
safer to me.

Regards,
-Chap




Re: index prefetching

2023-07-14 Thread Tomas Vondra
Here's a v5 of the patch, rebased to current master and fixing a couple
compiler warnings reported by cfbot (%lu vs. UINT64_FORMAT in some debug
messages). No other changes compared to v4.
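
The fix is just the usual portable-format idiom, roughly like this (variable
name assumed for illustration):

	elog(DEBUG1, "index prefetch requests: " UINT64_FORMAT, prefetch_requests);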

cfbot also reported a failure on Windows in pg_dump [1], but it seems
pretty strange:

[11:42:48.708] - 8<
-
[11:42:48.708] stderr:
[11:42:48.708] #   Failed test 'connecting to an invalid database: matches'

The patch does nothing related to pg_dump, and the test works perfectly
fine for me (I don't have a Windows machine, but 32-bit and 64-bit Linux
work fine for me).


regards


[1] https://cirrus-ci.com/task/6398095366291456

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index e2c9b5f069..9045c6eb7a 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -678,7 +678,6 @@ gistgettuple(IndexScanDesc scan, ScanDirection dir)
 	scan->xs_hitup = so->pageData[so->curPageData].recontup;
 
 so->curPageData++;
-
 return true;
 			}
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 5a17112c91..0b6c920ebd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -44,6 +44,7 @@
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
+#include "utils/spccache.h"
 
 static void reform_and_rewrite_tuple(HeapTuple tuple,
 	 Relation OldHeap, Relation NewHeap,
@@ -751,6 +752,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			PROGRESS_CLUSTER_INDEX_RELID
 		};
 		int64		ci_val[2];
+		int			prefetch_target;
+
+		prefetch_target = get_tablespace_io_concurrency(OldHeap->rd_rel->reltablespace);
 
 		/* Set phase and OIDOldIndex to columns */
 		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
@@ -759,7 +763,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0,
+	prefetch_target, prefetch_target);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 722927aeba..264ebe1d8e 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -126,6 +126,9 @@ RelationGetIndexScan(Relation indexRelation, int nkeys, int norderbys)
 	scan->xs_hitup = NULL;
 	scan->xs_hitupdesc = NULL;
 
+	/* set in each AM when applicable */
+	scan->xs_prefetch = NULL;
+
 	return scan;
 }
 
@@ -440,8 +443,9 @@ systable_beginscan(Relation heapRelation,
 elog(ERROR, "column is not in index");
 		}
 
+		/* no index prefetch for system catalogs */
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-		 snapshot, nkeys, 0);
+		 snapshot, nkeys, 0, 0, 0);
 		index_rescan(sysscan->iscan, key, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 	}
@@ -696,8 +700,9 @@ systable_beginscan_ordered(Relation heapRelation,
 			elog(ERROR, "column is not in index");
 	}
 
+	/* no index prefetch for system catalogs */
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-	 snapshot, nkeys, 0);
+	 snapshot, nkeys, 0, 0, 0);
 	index_rescan(sysscan->iscan, key, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b25b03f7ab..0b8f136f04 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -54,11 +54,13 @@
 #include "catalog/pg_amproc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "common/hashfn.h"
 #include "nodes/makefuncs.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "utils/lsyscache.h"
 #include "utils/ruleutils.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -106,7 +108,10 @@ do { \
 
 static IndexScanDesc index_beginscan_internal(Relation indexRelation,
 			  int nkeys, int norderbys, Snapshot snapshot,
-			  ParallelIndexScanDesc pscan, bool temp_snap);
+			  ParallelIndexScanDesc pscan, bool temp_snap,
+			  int prefetch_target, int prefetch_reset);
+
+static void index_prefetch(IndexScanDesc scan, ItemPointer tid);
 
 
 /* 
@@ -200,18 +205,36 @@ index_insert(Relation indexRelation,
  * index_beginscan - start a scan of an index with amgettuple
  *
  * Caller must be holding suitable locks on the heap and the index.
+ *
+ * prefetch_target determines if prefetching is requested for this index scan.
+ * We need to be able to disable this for two reasons. Firstly, we don't want
+ * to do 

Re: Bytea PL/Perl transform

2023-07-14 Thread Tom Lane
Ivan Panchenko  writes:
> Thursday, July 6, 2023, 14:48 +03:00 from Peter Eisentraut < pe...@eisentraut.org >:
>> If the transform deals with a built-in type, then they should just be
>> added to the respective pl extension directly.

> The new extension bytea_plperl can be easily moved into plperl now, but what 
> should we do with the existing ones, namely jsonb_plperl and bool_plperl?
> If we leave them where they are, it would be hard to explain why some 
> transforms are inside plperl while other ones live separately. If we move 
> them into plperl also, wouldn’t it break some compatibility?

It's kind of a mess, indeed.  But I think we could make plperl 1.1
contain the additional transforms and just tell people they have
to drop the obsolete extensions before they upgrade to 1.1.
Fortunately, it doesn't look like functions using a transform
have any hard dependency on the transform, so "drop extension
jsonb_plperl" followed by "alter extension plperl update" should
work without cascading to all your plperl functions.

Having said that, we'd still be in the position of having to
explain why some transforms are packaged with plperl and others
not.  The distinction between built-in and contrib types might
be obvious to us hackers, but I bet a lot of users see it as
pretty artificial.  So maybe the existing packaging design is
fine and we should just look for a way to reduce the code
duplication.

regards, tom lane




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Tom Lane
"David G. Johnston"  writes:
> I agree that the documented contract of the insert command tag says it
> reports the size of the entire tuple store maintained by the server during
> the transaction instead of just the most recent count on subsequent fetches.

Where do you see that documented, exactly?  I looked in the protocol
chapter and didn't find anything definite either way.

I'm quite prepared to believe there are bugs here, since this whole
set of behaviors is unreachable via libpq: you can't get it to send
an Execute with a count other than zero (ie, fetch all), nor is it
prepared to deal with the PortalSuspended messages it'd get if it did.

I think that the behavior is arising from this bit in PortalRun:

switch (portal->strategy)
{
...
case PORTAL_ONE_RETURNING:
...

/*
 * If we have not yet run the command, do so, storing its
 * results in the portal's tuplestore.  But we don't do that
 * for the PORTAL_ONE_SELECT case.
 */
if (portal->strategy != PORTAL_ONE_SELECT && !portal->holdStore)
FillPortalStore(portal, isTopLevel);

/*
 * Now fetch desired portion of results.
 */
nprocessed = PortalRunSelect(portal, true, count, dest);

/*
 * If the portal result contains a command tag and the caller
 * gave us a pointer to store it, copy it and update the
 * rowcount.
 */
if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
{
CopyQueryCompletion(qc, &portal->qc);
-->>>   qc->nprocessed = nprocessed;
}

/* Mark portal not active */
portal->status = PORTAL_READY;

/*
 * Since it's a forward fetch, say DONE iff atEnd is now true.
 */
result = portal->atEnd;

The marked line is, seemingly intentionally, overriding the portal's
rowcount (which ought to count the whole query result) with the number
of rows processed in the current fetch.  Chap's theory that that's
always zero when this is being driven by JDBC seems plausible,
since the query status won't be returned to the client unless we
detect end-of-portal (otherwise we just send PortalSuspended).

It seems plausible to me that we should just remove that marked line.
Not sure about the compatibility implications, though.

regards, tom lane




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Dave Cramer
On Fri, 14 Jul 2023 at 15:40,  wrote:

> On 2023-07-14 14:19, David G. Johnston wrote:
> > Because of the returning they all need a portal so far as the server is
> > concerned and the server will obligingly send the contents of the
> > portal
> > back to the client.
>
> Dave's pcap file, for the fetch count 0 case, does not show any
> portal name used in the bind, describe, or execute messages, or
> any portal close message issued by the client afterward. The server
> may be using a portal in that case, but it seems more transparent
> to the client when fetch count is zero.
>
>
There is no portal for fetch count 0.


> Perhaps an easy rule would be, if the driver itself adds RETURNING
> because of a RETURN_GENERATED_KEYS option, it should also force the
> fetch count to zero and collect all the returned rows before
> executeUpdate returns, and then it will have the right count
> to return?
>
Well that kind of negates the whole point of using a cursor in the case
where you have a large result set.

The rows are subsequently fetched in rs.next()

Solves one problem, but creates another.

Dave

>
>


Re: In Postgres 16 BETA, should the ParseNamespaceItem have the same index as it's RangeTableEntry?

2023-07-14 Thread Farias de Oliveira
I believe I have found something interesting that might be the root of the
problem with RTEPermissionInfo. But I do not know how to fix it exactly. In
AGE's code, the execution of it goes through a function called
analyze_cypher_clause() which does the following:

static Query *analyze_cypher_clause(transform_method transform,
                                    cypher_clause *clause,
                                    cypher_parsestate *parent_cpstate)
{
    cypher_parsestate *cpstate;
    Query *query;
    ParseState *parent_pstate = (ParseState*)parent_cpstate;
    ParseState *pstate;

    cpstate = make_cypher_parsestate(parent_cpstate);
    pstate = (ParseState*)cpstate;

    /* copy the expr_kind down to the child */
    pstate->p_expr_kind = parent_pstate->p_expr_kind;

    query = transform(cpstate, clause);

    advance_transform_entities_to_next_clause(cpstate->entities);

    parent_cpstate->entities = list_concat(parent_cpstate->entities,
                                           cpstate->entities);

    free_cypher_parsestate(cpstate);

    return query;
}

the free_cypher_parsestate() function calls the free_parsestate() function:

void free_cypher_parsestate(cypher_parsestate *cpstate)
{
    free_parsestate((ParseState *)cpstate);
}

So, before that happens the cpstate struct contains the following data:

{pstate = {parentParseState = 0x2b06ab0, p_sourcetext = 0x2b06ef0 "MATCH
(n) SET n.i = 3", p_rtable = 0x2bdb370,
p_rteperminfos = 0x2bdb320, p_joinexprs = 0x0, p_nullingrels = 0x0,
p_joinlist = 0x2bdb478, p_namespace = 0x2bdb4c8,
p_lateral_active = false, p_ctenamespace = 0x0, p_future_ctes = 0x0,
p_parent_cte = 0x0, p_target_relation = 0x0,
p_target_nsitem = 0x0, p_is_insert = false, p_windowdefs = 0x0,
p_expr_kind = EXPR_KIND_FROM_SUBSELECT, p_next_resno = 2,
p_multiassign_exprs = 0x0, p_locking_clause = 0x0, p_locked_from_parent
= false, p_resolve_unknowns = true,
p_queryEnv = 0x0, p_hasAggs = false, p_hasWindowFuncs = false,
p_hasTargetSRFs = false, p_hasSubLinks = false,
p_hasModifyingCTE = false, p_last_srf = 0x0, p_pre_columnref_hook =
0x0, p_post_columnref_hook = 0x0,
p_paramref_hook = 0x0, p_coerce_param_hook = 0x0, p_ref_hook_state =
0x0}, graph_name = 0x2b06e50 "cypher_set",
  graph_oid = 16942, params = 0x0, default_alias_num = 0, entities =
0x2c6e158, property_constraint_quals = 0x0,
  exprHasAgg = false, p_opt_match = false}

And then after that the pstate gets all wiped out:

{pstate = {parentParseState = 0x0, p_sourcetext = 0x2b06ef0 "MATCH (n) SET
n.i = 3", p_rtable = 0x0,
p_rteperminfos = 0x0, p_joinexprs = 0x0, p_nullingrels = 0x0,
p_joinlist = 0x0, p_namespace = 0x0,
p_lateral_active = false, p_ctenamespace = 0x0, p_future_ctes = 0x0,
p_parent_cte = 0x0, p_target_relation = 0x0,
p_target_nsitem = 0x0, p_is_insert = false, p_windowdefs = 0x0,
p_expr_kind = EXPR_KIND_FROM_SUBSELECT, p_next_resno = 1,
p_multiassign_exprs = 0x0, p_locking_clause = 0x0, p_locked_from_parent
= false, p_resolve_unknowns = true,
p_queryEnv = 0x0, p_hasAggs = false, p_hasWindowFuncs = false,
p_hasTargetSRFs = false, p_hasSubLinks = false,
p_hasModifyingCTE = false, p_last_srf = 0x0, p_pre_columnref_hook =
0x0, p_post_columnref_hook = 0x0,
p_paramref_hook = 0x0, p_coerce_param_hook = 0x0, p_ref_hook_state =
0x0}, graph_name = 0x2b06e50 "cypher_set",
  graph_oid = 16942, params = 0x0, default_alias_num = 0, entities =
0x2c6e228, property_constraint_quals = 0x0,
  exprHasAgg = false, p_opt_match = false}

But in transform_cypher_clause_as_subquery(), we use the same pstate. And
when we assign

pnsi = addRangeTableEntryForSubquery(pstate, query, alias, lateral, true);

The pstate changes, adding a value to p_rtable but nothing in p_rteperminfos.

Then after that addNSItemToQuery(pstate, pnsi, true, false, true) is
called, changing the pstate to add values to p_joinlist and p_namespace.

It ends up going inside other functions and changing it a bit more, but at
the end of one of these functions it assigns values to some members of the
query:

query->targetList = lappend(query->targetList, tle);
query->rtable = pstate->p_rtable;
query->jointree = makeFromExpr(pstate->p_joinlist, NULL);

I assume that what is missing here is the assignment of query->rteperminfos to
the same value as pstate->p_rteperminfos, but the pstate has the following values:

{pstate = {parentParseState = 0x0, p_sourcetext = 0x2b06ef0 "MATCH (n) SET
n.i = 3", p_rtable = 0x2c6e590,
p_rteperminfos = 0x0, p_joinexprs = 0x0, p_nullingrels = 0x0,
p_joinlist = 0x2c6e678, p_namespace = 0x2c6e6c8,
p_lateral_active = false, p_ctenamespace = 0x0, p_future_ctes = 0x0,
p_parent_cte = 0x0, p_target_relation = 0x0,
p_target_nsitem = 0x0, p_is_insert = false, p_windowdefs = 0x0,
p_expr_kind = EXPR_KIND_NONE, p_next_resno = 3,
p_multiassign_exprs = 0x0, p_locking_clause = 0x0, p_locked_from_parent
= false, p_resolve_unknowns = true,
p_queryEnv = 0x0, p_hasAggs = false, 

Re: add non-option reordering to in-tree getopt_long

2023-07-14 Thread Nathan Bossart
On Fri, Jul 14, 2023 at 02:02:28PM +0900, Michael Paquier wrote:
> Objection withdrawn.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-14 14:19, David G. Johnston wrote:

Because of the returning they all need a portal so far as the server is
concerned and the server will obligingly send the contents of the portal
back to the client.


Dave's pcap file, for the fetch count 0 case, does not show any
portal name used in the bind, describe, or execute messages, or
any portal close message issued by the client afterward. The server
may be using a portal in that case, but it seems more transparent
to the client when fetch count is zero.

Perhaps an easy rule would be, if the driver itself adds RETURNING
because of a RETURN_GENERATED_KEYS option, it should also force the
fetch count to zero and collect all the returned rows before
executeUpdate returns, and then it will have the right count
to return?

It seems that any approach leaving any of the rows unfetched at
the time executeUpdate returns might violate a caller's expectation
that the whole outcome is known by that point.

Regards,
-Chap




Re: PG 16 draft release notes ready

2023-07-14 Thread Erik Rijkers

On 7/4/23 at 23:32, Bruce Momjian wrote:

https://momjian.us/pgsql_docs/release-16.html


I noticed these:

'new new RULES'  should be
'new RULES'

'Perform apply of large transactions'  should be
'Performs apply of large transactions'
(I think)

'SQL JSON paths'  should be
'SQL/JSON paths'

Erik Rijkers







Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread David G. Johnston
On Fri, Jul 14, 2023 at 11:34 AM  wrote:

> On 2023-07-12 21:30, David G. Johnston wrote:
> > Right, and executeUpdate is the wrong API method to use, in the
> > PostgreSQL
> > world, when executing insert/update/delete with the non-SQL-standard
> > returning clause. ... ISTM that you are trying to make user-error less
> > painful.
>
> In Dave's Java reproducer, no user-error has been made, because the user
> supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
> RETURNING clause has been added by the JDBC driver. So the user expects
> executeUpdate to be the right method, and return the row count, and
> getGeneratedKeys() to then return the rows.
>
>
That makes more sense, though I don't understand how the original desire of
having the count appear before the actual rows would materially
benefit that feature.

I agree that the documented contract of the insert command tag says it
reports the size of the entire tuple store maintained by the server during
the transaction instead of just the most recent count on subsequent fetches.

David J.


Re: sslinfo extension - add notbefore and notafter timestamps

2023-07-14 Thread Daniel Gustafsson
> On 14 Jul 2023, at 20:41, Cary Huang  wrote:

> Perhaps calling "tm2timestamp(_time, 0, NULL, )" without checking the 
> return code would be just fine. I see some other usages of tm2timstamp() in 
> other code areas also skip checking the return code.

I think we want to know about any failures, but we can probably make it into an
elog() instead, as it should never fail.
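
Roughly something like this, perhaps (variable names assumed, not taken from
the patch):

	Timestamp	ts;

	if (tm2timestamp(&tm_time, 0, NULL, &ts) != 0)
		elog(ERROR, "could not convert certificate time to timestamp");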

--
Daniel Gustafsson





Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Dave Cramer
On Fri, 14 Jul 2023 at 14:34,  wrote:

> On 2023-07-12 21:30, David G. Johnston wrote:
> > Right, and executeUpdate is the wrong API method to use, in the
> > PostgreSQL
> > world, when executing insert/update/delete with the non-SQL-standard
> > returning clause. ... ISTM that you are trying to make user-error less
> > painful.
>
> In Dave's Java reproducer, no user-error has been made, because the user
> supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
> RETURNING clause has been added by the JDBC driver. So the user expects
> executeUpdate to be the right method, and return the row count, and
> getGeneratedKeys() to then return the rows.
>
> I've seen a possibly even more interesting result using pgjdbc-ng with
> protocol.trace=true:
>
> FetchSize=0
>  > 1.>t.>T$>Z*
>  > 2.>D.>D.>C.>Z*
> executeUpdate result: 2
> ids: 1 2
>
> FetchSize=1
>  > 2.>D.>s*
> executeUpdate result: -1
> ids: 3  > D.>s*
> 4  > C*
>  > 3.>Z*
>
> FetchSize=2
>  > 2.>D.>D.>s*
> executeUpdate result: -1
> ids: 5 6  > C*
>  > 3.>Z*
>
> FetchSize=3
>  > 2.>D.>D.>C*
>  > 3.>Z*
> executeUpdate result: 2
> ids: 7 8
>
>
> Unless there's some interleaving of trace and stdout messages happening
> here, I think pgjdbc-ng is not even collecting all the returned rows
> in the suspended-cursor case before executeUpdate returns, but keeping
> the cursor around for getGeneratedKeys() to use, so executeUpdate
> returns -1 before even having seen the later command complete, and would
> still do that even if the command complete message had the right count.
>

My guess is that pgjdbc-ng sees the -1 and doesn't bother looking any
further

Either way pgjdbc-ng is a dead project so I'm not so concerned about it.

Dave

>
> Regards,
> -Chap
>


Re: sslinfo extension - add notbefore and notafter timestamps

2023-07-14 Thread Cary Huang
Hello 

 > The way we typically ship extensions in contrib/ is to not create a new base
 > version .sql file for smaller changes like adding a few functions.  For this
 > patch we should keep --1.2.sql and instead supply a 1.2--1.3.sql with the new
 > functions.

Thank you for pointing this out. It makes sense to me.


 > +errmsg("failed to convert tm to timestamp")));
 > 
 > I think this error is too obscure for the user to act on, what we use 
 > elsewhere
 > is "timestamp out of range" and I think thats more helpful.  I do wonder if
 > there is ever a legitimate case when this can fail while still having a
 > authenticated client connection?

My bad here, you are right. "timestamp out of range" is a much better error 
message. However, in an authenticated client connection, there should not be a 
legitimate case where the not before and not after can fall out of range. The 
"not before" and "not after" timestamps in an X509 certificate are normally 
represented by ASN1, which has a maximum timestamp of December 31, 9999. The 
timestamp data structure in PostgreSQL on the other hand can support year up to 
June 3, 5874898. Assuming the X509 certificate is generated correctly and no 
data corruption is happening (which is unlikely), the conversion from ASN1 to 
timestamp shall not result in an out-of-range error.

Perhaps calling "tm2timestamp(_time, 0, NULL, )" without checking the 
return code would be just fine. I see some other usages of tm2timestamp() in 
other code areas that also skip checking the return code.

 > I have addressed the issues above in a new v5 patchset which includes a new
 > patch for setting stable validity on the test certificates (the notBefore 
 > time
 > was arbitrarily chosen to match the date of opening up the tree for v17 - we
 > just need a date in the past).  Your two patches are rolled into a single one
 > with a commit message added to get started on that part as well.

thank you so much for addressing the ssl tests to make "not before" and "not 
after" timestamps static in the test certificate and also adjusting 
003_sslinfo.pl to expect the new static timestamps in the v5 patches. I am able 
to apply both and all tests are passing. I did not know this test certificate 
could be changed by `cd src/test/ssl && make -f sslfiles.mk`, but now I know, 
thanks to you :p.

Best regards

Cary Huang
-
HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca






Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-14 14:19, David G. Johnston wrote:
Is there some magic set of arguments I should be using besides: tcpdump -Ar
filename ?


I opened it with Wireshark, which has a pgsql protocol decoder.

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-12 21:30, David G. Johnston wrote:
Right, and executeUpdate is the wrong API method to use, in the PostgreSQL
world, when executing insert/update/delete with the non-SQL-standard
returning clause. ... ISTM that you are trying to make user-error less
painful.


In Dave's Java reproducer, no user-error has been made, because the user
supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
RETURNING clause has been added by the JDBC driver. So the user expects
executeUpdate to be the right method, and return the row count, and
getGeneratedKeys() to then return the rows.

I've seen a possibly even more interesting result using pgjdbc-ng with
protocol.trace=true:

FetchSize=0

1.>t.>T$>Z*


2.>D.>D.>C.>Z*

executeUpdate result: 2
ids: 1 2

FetchSize=1

2.>D.>s*

executeUpdate result: -1
ids: 3 
D.>s*

4 
C*


3.>Z*


FetchSize=2

2.>D.>D.>s*

executeUpdate result: -1
ids: 5 6 
C*


3.>Z*


FetchSize=3

2.>D.>D.>C*


3.>Z*

executeUpdate result: 2
ids: 7 8


Unless there's some interleaving of trace and stdout messages happening
here, I think pgjdbc-ng is not even collecting all the returned rows
in the suspended-cursor case before executeUpdate returns, but keeping
the cursor around for getGeneratedKeys() to use, so executeUpdate
returns -1 before even having seen the later command complete, and would
still do that even if the command complete message had the right count.

Regards,
-Chap




Re: PG 16 draft release notes ready

2023-07-14 Thread Laurenz Albe
On Thu, 2023-05-18 at 16:49 -0400, Bruce Momjian wrote:
> I have completed the first draft of the PG 16 release notes.

The release notes still have:

- Have initdb use ICU by default if ICU is enabled in the binary (Jeff Davis)

  Option --locale-provider=libc can be used to disable ICU.


But this was reverted in 2535c74b1a6190cc42e13f6b6b55d94bff4b7dd6.

Yours,
Laurenz Albe




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread David G. Johnston
On Fri, Jul 14, 2023 at 10:39 AM  wrote:

> On 2023-07-14 12:58, Dave Cramer wrote:
> > See attached pcap file
>
> So if the fetch count is zero and no portal is needed,
> or if the fetch count exceeds the row count and the command
> completion follows directly with no suspension of the portal, then
> it comes with the correct count, but if the portal gets suspended,
> then the later command completion reports a zero count?
>
>
I cannot really understand that output other than to confirm that all
queries had returning and one of them showed INSERT 0 0

Is there some magic set of arguments I should be using besides: tcpdump -Ar
filename ?

Because of the returning they all need a portal so far as the server is
concerned and the server will obligingly send the contents of the portal
back to the client.

I can definitely believe a bug exists in the intersection of a non-SELECT
query and a less-than-complete fetch count specification.  There doesn't
seem to be any place in the core testing framework to actually test out the
interaction though...I even tried using plpgsql, which lets me open/execute
a plpgsql cursor with insert returning (which SQL prohibits) but we can't
get access to the command tag itself there.  The ROW_COUNT variable likely
tracks actual rows fetched or seen (in the case of MOVE).

What I kinda suspect might be happening with a portal suspend is that the
suspension loop only ends when the final fetch attempt finds zero rows to
return and thus the final count ends up being zero instead of the
cumulative sum over all portal scans.

David J.


Re: PG 16 draft release notes ready

2023-07-14 Thread Laurenz Albe
On Thu, 2023-05-18 at 16:49 -0400, Bruce Momjian wrote:
> I have completed the first draft of the PG 16 release notes.

The release notes say:

- Prevent \df+ from showing function source code (Isaac Morland)

  Function bodies are more easily viewed with \ev and \ef.


That should be \sf, not \ev or \ef, right?

Yours,
Laurenz Albe




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Dave Cramer
On Fri, 14 Jul 2023 at 13:39,  wrote:

> On 2023-07-14 12:58, Dave Cramer wrote:
> > See attached pcap file
>
> So if the fetch count is zero and no portal is needed,
> or if the fetch count exceeds the row count and the command
> completion follows directly with no suspension of the portal, then
> it comes with the correct count, but if the portal gets suspended,
> then the later command completion reports a zero count?
>
>
Seems so, yes.

Dave


Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-14 12:58, Dave Cramer wrote:

See attached pcap file


So if the fetch count is zero and no portal is needed,
or if the fetch count exceeds the row count and the command
completion follows directly with no suspension of the portal, then
it comes with the correct count, but if the portal gets suspended,
then the later command completion reports a zero count?

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Dave Cramer
See attached pcap file

after the execute of the portal it returns INSERT 0 0
Dave Cramer


On Fri, 14 Jul 2023 at 12:57, David G. Johnston 
wrote:

> On Fri, Jul 14, 2023 at 9:50 AM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>>
>> Fixing that test in some manner and recompiling psql seems like it should
>> be the easiest way to produce a core-only test case.
>>
>>
> Apparently not - since it (ExecQueryUsingCursor) literally wraps the query
> in a DECLARE CURSOR SQL Command which prohibits INSERT...
>
> I suppose we'd have to write a psql equivalent of ExecQueryUsingPortal
> that iterates over via fetch to make this work...probably more than I'm
> willing to try.
>
> David J.
>


cursor.pcap
Description: Binary data


Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread David G. Johnston
On Fri, Jul 14, 2023 at 9:50 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
> Fixing that test in some manner and recompiling psql seems like it should
> be the easiest way to produce a core-only test case.
>
>
Apparently not - since it (ExecQueryUsingCursor) literally wraps the query
in a DECLARE CURSOR SQL Command which prohibits INSERT...

I suppose we'd have to write a psql equivalent of ExecQueryUsingPortal that
iterates over via fetch to make this work...probably more than I'm willing
to try.

David J.


Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread David G. Johnston
On Fri, Jul 14, 2023 at 9:30 AM Dave Cramer  wrote:

> David,
>
> I will try to get a tcpdump file. Doing this in libpq seems challenging as
> I'm not aware of how to create a portal in psql.
>

Yeah, apparently psql does something special (like ignoring it...) with its
FETCH_COUNT variable (set to 2 below as evidenced by the first query) for
the insert returning case.  As documented, since the command itself is not
SELECT or VALUES, the test in is_select_command() returns false and the branch:

 else if (pset.fetch_count <= 0 || pset.gexec_flag ||
pset.crosstab_flag || !is_select_command(query))
{
/* Default fetch-it-all-and-print mode */

Is chosen.

Fixing that test in some manner and recompiling psql seems like it should
be the easiest way to produce a core-only test case.

postgres=# select * from (Values (1),(2),(3),(4)) vals (v);
 v
---
 1
 2
 3
 4
(4 rows)

postgres=# \bind 5 6 7 8
postgres=# insert into ins values ($1),($2),($3),($4) returning id;
  id
---
 5
 6
 7
 8
(4 rows)

INSERT 0 4

I was hoping to see the INSERT RETURNING query have a 4 width header
instead of 7.

David J.


Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread Dave Cramer
David,

I will try to get a tcpdump file. Doing this in libpq seems challenging as
I'm not aware of how to create a portal in psql.

Chap,

The only difference is that one instance uses a portal to fetch the results; the
other (correct one) is a normal insert where all of the rows are returned
immediately.

this is a reproducer in Java

conn.prepareStatement("DROP TABLE IF EXISTS test_table").execute();
conn.prepareStatement("CREATE TABLE IF NOT EXISTS test_table (id SERIAL PRIMARY KEY, cnt INT NOT NULL)").execute();

for (var fetchSize : List.of(0, 1, 2, 3)) {
    System.out.println("FetchSize=" + fetchSize);

    try (var stmt = conn.prepareStatement("INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id", RETURN_GENERATED_KEYS)) {
        stmt.setFetchSize(fetchSize);

        var ret = stmt.executeUpdate();
        System.out.println("executeUpdate result: " + ret);

        var rs = stmt.getGeneratedKeys();
        System.out.print("ids: ");
        while (rs.next()) {
            System.out.print(rs.getInt(1) + " ");
        }
        System.out.print("\n\n");
    }
}

Dave

On Fri, 14 Jul 2023 at 12:07,  wrote:

> On 2023-07-12 20:57, Dave Cramer wrote:
> > Without a cursor it returns right away as all of the results are
> > returned
> > by the server. However with cursor you have to wait until you fetch the
> > rows before you can get the CommandComplete message which btw is wrong
> > as
> > it returns INSERT 0 0 instead of INSERT 2 0
>
> To make sure I am following, was this describing a comparison of
> two different ways in Java, using JDBC, to perform the same operation,
> one of which behaves as desired while the other doesn't? If so, for
> my curiosity, what do both ways look like in Java?
>
> Or was it a comparison of two different operations, say one
> an INSERT RETURNING and the other something else?
>
> Regards,
> -Chap
>


Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-12 20:57, Dave Cramer wrote:
Without a cursor it returns right away as all of the results are returned
by the server. However with cursor you have to wait until you fetch the
rows before you can get the CommandComplete message which btw is wrong as
it returns INSERT 0 0 instead of INSERT 2 0


To make sure I am following, was this describing a comparison of
two different ways in Java, using JDBC, to perform the same operation,
one of which behaves as desired while the other doesn't? If so, for
my curiosity, what do both ways look like in Java?

Or was it a comparison of two different operations, say one
an INSERT RETURNING and the other something else?

Regards,
-Chap




Re: Allowing parallel-safe initplans

2023-07-14 Thread Tom Lane
Richard Guo  writes:
> So +1 to 0001 patch.
> Also +1 to 0002 patch.

Pushed, thanks for looking at it!

regards, tom lane




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread David G. Johnston
On Thu, Jul 13, 2023 at 6:07 PM Dave Cramer  wrote:

> On Thu, 13 Jul 2023 at 10:24, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Thursday, July 13, 2023, Dave Cramer  wrote:
>>
>>>
>>> Any comment on why the CommandComplete is incorrect ?
>>> It returns INSERT 0 0  if a cursor is used
>>>
>>
>>  Looking at DECLARE it is surprising that what you describe is even
>> possible.  Can you share a psql reproducer?
>>
>
> apologies, we are using a portal, not a cursor.
>
>
Still the same basic request of providing a reproducer - ideally in psql.

IIUC a portal has to be used for a prepared (extended query mode) result
set returning query.  v16 can now handle parameter binding so:

postgres=# \bind 4
postgres=# insert into ins values ($1) returning id;
 id

  4
(1 row)

INSERT 0 1

Which gives the expected non-zero command tag row count result.

David J.


Re: PATCH: Using BRIN indexes for sorted output

2023-07-14 Thread Tomas Vondra



On 7/14/23 16:42, Matthias van de Meent wrote:
> On Fri, 14 Jul 2023 at 16:21, Tomas Vondra
>  wrote:
>>
>>
>>
>>
>> On 7/11/23 13:20, Matthias van de Meent wrote:
>>> On Mon, 10 Jul 2023 at 22:04, Tomas Vondra
>>>  wrote:
 Approximation in what sense? My understanding was we'd get a range of
 distances that we know covers all rows in that range. So the results
 should be accurate, no?
>>>
>>> The distance is going to be accurate only to the degree that the
>>> summary can produce accurate distances for the datapoints it
>>> represents. That can be quite imprecise due to the nature of the
>>> contained datapoints: a summary of the points (-1, -1) and (1, 1) will
>>> have a minimum distance of 0 to the origin, where the summary (-1, 0)
>>> and (-1, 0.5) would have a much more accurate distance of 1.
>>
>> Ummm, I'm probably missing something, or maybe my mental model of this
>> is just wrong, but why would the distance for the second summary be more
>> accurate? Or what does "more accurate" mean?
>>
>> Is that about the range of distances for the summary? For the first
>> range the summary is a bounding box [(-1,-1), (1,1)] so all we know is that the
>> points may have distance in range [0, sqrt(2)]. While for the second
>> summary it's [1, sqrt(1.25)].
> 
> Yes; I was trying to refer to the difference between what results you
> get from the summary vs what results you get from the actual
> datapoints: In this case, for finding points which are closest to the
> origin, the first bounding box has a less accurate estimate than the
> second.
> 

OK. I think regular minmax indexes have a similar issue with
non-distance ordering, because we don't know if the min/max values are
still in the page range (or deleted/updated).

>>> The point I was making is that the summary can only approximate the
>>> distance, and that approximation is fine w.r.t. the BRIN sort
>>> algoritm.
>>>
>>
>> I think as long as the approximation (whatever it means) does not cause
>> differences in results (compared to not using an index), it's OK.
> 

I haven't written any code yet, but I think if we don't try to find the
exact min/max distances for the summary (e.g. by calculating the closest
point exactly) but instead use "estimates" that are guaranteed to bound the
actual min/max, that's good enough for the sorting.

For the max, this probably is not an issue, as we can just calculate
distance for the corners and use a maximum of that. At least with
reasonable euclidean distance ... in 2D I'm imagining the bounding box
summary as a rectangle, with the "max distance" being a minimum radius
of a circle containing it (the rectangle).

For min we're looking for the largest radius not intersecting with the
box, which seems harder to calculate I think.
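
To make that concrete, here is a small standalone sketch (not part of the
patch; the type and function names are made up) of both bounds for a 2D
bounding-box summary and a query point: the minimum comes from clamping the
point into the box, the maximum from the farthest corner.

#include <math.h>

typedef struct BoundingBox2D
{
	double		xmin;
	double		ymin;
	double		xmax;
	double		ymax;
} BoundingBox2D;

/* smallest possible distance from (px, py) to any point in the box */
static double
bbox_min_distance(const BoundingBox2D *box, double px, double py)
{
	double		cx = fmin(fmax(px, box->xmin), box->xmax);
	double		cy = fmin(fmax(py, box->ymin), box->ymax);

	return hypot(px - cx, py - cy);	/* 0 if the point is inside the box */
}

/* largest possible distance from (px, py) to any point in the box */
static double
bbox_max_distance(const BoundingBox2D *box, double px, double py)
{
	double		dx = fmax(fabs(px - box->xmin), fabs(px - box->xmax));
	double		dy = fmax(fabs(py - box->ymin), fabs(py - box->ymax));

	return hypot(dx, dy);			/* distance to the farthest corner */
}

Every point summarized by the box then has its true distance within
[bbox_min_distance, bbox_max_distance], which is all the sort needs.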

However, now that I'm thinking about it - don't (SP-)GiST indexes
already do pretty much exactly this?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




XLogSaveBufferForHint() correctness and more

2023-07-14 Thread Andres Freund
Hi,

While looking at [1] I started to wonder why it is safe that
CreateCheckPoint() updates XLogCtl->RedoRecPtr after releasing the WAL
insertion lock:

/*
 * Now we can release the WAL insertion locks, allowing other xacts to
 * proceed while we are flushing disk buffers.
 */
WALInsertLockRelease();

/* Update the info_lck-protected copy of RedoRecPtr as well */
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->RedoRecPtr = checkPoint.redo;
SpinLockRelease(&XLogCtl->info_lck);

The most important user of that is GetRedoRecPtr().

Right now I'm a bit confused why it's ok that

/* Update the info_lck-protected copy of RedoRecPtr as well */
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->RedoRecPtr = checkPoint.redo;
SpinLockRelease(&XLogCtl->info_lck);

happens after WALInsertLockRelease().


But then I started to wonder: even if that weren't the case, how come
XLogSaveBufferForHint() and other uses of GetRedoRecPtr() aren't racy as
hell?

The reason XLogInsertRecord() can safely check if an FPW is needed is that it
holds a WAL insertion lock; the redo pointer cannot change until the insertion
lock is released.

But there's *zero* interlock in XLogSaveBufferForHint() from what I can tell?
A checkpoint could easily start between the GetRedoRecPtr() and the
check whether this buffer needs to be WAL logged?
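
To spell out the interleaving, the pattern boils down to something like this
(heavily simplified sketch, hypothetical helper name, not the actual code):

static XLogRecPtr
save_hint_fpi_sketch(Buffer buffer, bool buffer_std)
{
    XLogRecPtr  lsn = BufferGetLSNAtomic(buffer);
    XLogRecPtr  redo = GetRedoRecPtr();     /* no WAL insertion lock held */

    /*
     * <-- a checkpoint can compute and publish a newer redo pointer right
     *     here, so the comparison below may be made against a stale value
     */
    if (lsn <= redo)
        return emit_fpi_for_hint(buffer, buffer_std);   /* hypothetical */

    return InvalidXLogRecPtr;   /* FPI skipped -- possibly wrongly */
}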


While XLogSaveBufferForHint() makes no note of this, its sole caller,
MarkBufferDirtyHint(), tries to deal with some related concerns to some
degree:

/*
 * If the block is already dirty because we either made a change
 * or set a hint already, then we don't need to write a full page
 * image.  Note that aggressive cleaning of blocks dirtied by hint
 * bit setting would increase the call rate. Bulk setting of hint
 * bits would reduce the call rate...
 *
 * We must issue the WAL record before we mark the buffer dirty.
 * Otherwise we might write the page before we write the WAL. That
 * causes a race condition, since a checkpoint might occur between
 * writing the WAL record and marking the buffer dirty. We solve
 * that with a kluge, but one that is already in use during
 * transaction commit to prevent race conditions. Basically, we
 * simply prevent the checkpoint WAL record from being written
 * until we have marked the buffer dirty. We don't start the
 * checkpoint flush until we have marked dirty, so our checkpoint
 * must flush the change to disk successfully or the checkpoint
 * never gets written, so crash recovery will fix.
 *
 * It's possible we may enter here without an xid, so it is
 * essential that CreateCheckPoint waits for virtual transactions
 * rather than full transactionids.
 */
Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
MyProc->delayChkptFlags |= DELAY_CHKPT_START;
delayChkptFlags = true;
lsn = XLogSaveBufferForHint(buffer, buffer_std);

but I don't think that really does all that much, because the
DELAY_CHKPT_START handling in CreateCheckPoint() happens after we determine
the redo pointer.  This code isn't even reached if we wrongly skipped due to
the if (lsn <= RedoRecPtr).


I seriously doubt this can correctly be implemented outside of xlog*.c /
without the use of a WALInsertLock?

I feel like I must be missing something here; this isn't a particularly narrow race?


It looks to me like the use of GetRedoRecPtr() in nextval_internal() is also
wrong. I think the uses in slot.c, snapbuild.c, rewriteheap.c are fine.

Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20230714151626.rhgae7taigk2xrq7%40awork3.anarazel.de




Re: Bytea PL/Perl transform

2023-07-14 Thread Ivan Panchenko

>Thursday, 6 July 2023, 14:48 +03:00 from Peter Eisentraut < pe...@eisentraut.org 
>>:
> 
>On 22.06.23 22:56, Greg Sabino Mullane wrote:
>> * Do all of these transforms need to be their own contrib modules? So
>> much duplicated code across contrib/*_plperl already (and *plpython too
>> for that matter) ...
>The reason the first transform modules were separate extensions is that
>they interfaced between one extension (plpython, plperl) and another
>extension (ltree, hstore), so it wasn't clear where to put them without
>creating an additional dependency for one of them.
>
>If the transform deals with a built-in type, then they should just be
>added to the respective pl extension directly.
Looks reasonable. 
The new extension bytea_plperl can be easily moved into plperl now, but what 
should we do with the existing ones, namely jsonb_plperl and bool_plperl?
If we leave them where they are, it would be hard to explain why some 
transforms are inside plperl while other ones live separately. If we move them 
into plperl also, wouldn’t it break some compatibility?
>
> 
 

Re: In Postgres 16 BETA, should the ParseNamespaceItem have the same index as it's RangeTableEntry?

2023-07-14 Thread Tom Lane
Farias de Oliveira  writes:
> 3905  elog(ERROR, "invalid perminfoindex %u in RTE with relid 
> %u",
> (gdb) bt
> #0  getRTEPermissionInfo (rteperminfos=0x138a500, rte=0x138a6b0) at 
> parse_relation.c:3905
> #1  0x00676e29 in GetResultRTEPermissionInfo 
> (relinfo=relinfo@entry=0x13b8f50, estate=estate@entry=0x138ce48)
> at execUtils.c:1412
> #2  0x00677c30 in ExecGetUpdatedCols 
> (relinfo=relinfo@entry=0x13b8f50, estate=estate@entry=0x138ce48)
> at execUtils.c:1321
> #3  0x00677cd7 in ExecGetAllUpdatedCols 
> (relinfo=relinfo@entry=0x13b8f50, estate=estate@entry=0x138ce48)
> at execUtils.c:1362
> #4  0x0066b9bf in ExecUpdateLockMode (estate=estate@entry=0x138ce48, 
> relinfo=relinfo@entry=0x13b8f50) at execMain.c:2385
> #5  0x7f197fb19a8d in update_entity_tuple (resultRelInfo=<optimized out>, 
> resultRelInfo@entry=0x13b8f50, 
> elemTupleSlot=elemTupleSlot@entry=0x13b9730, 
> estate=estate@entry=0x138ce48, old_tuple=0x13bae80)
> at src/backend/executor/cypher_set.c:120
> #6  0x7f197fb1a2ff in process_update_list (node=node@entry=0x138d0c8) at 
> src/backend/executor/cypher_set.c:595
> #7  0x7f197fb1a348 in process_all_tuples (node=node@entry=0x138d0c8) at 
> src/backend/executor/cypher_set.c:212
> #8  0x7f197fb1a455 in exec_cypher_set (node=0x138d0c8) at 
> src/backend/executor/cypher_set.c:641

So apparently, what we have here is a result-relation RTE that has
no permissions info associated.  It's not clear to me whether that
is a bug in building the parse tree, or an expectable situation
that GetResultRTEPermissionInfo ought to be coping with.  I'm
inclined to bet on the former though, and to guess that AGE is
missing dealing with the new RTEPermissionInfo structs in someplace
or other.  I'm afraid that allowing GetResultRTEPermissionInfo to
let this pass without comment would mask actual bugs, so fixing
it at that end doesn't seem attractive.
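
If that guess is right, the missing piece on the AGE side would be something
along these lines wherever it builds that RTE (sketch only, assuming the
usual ParseState is at hand; the exact required permission depends on the
command):

    RangeTblEntry *rte = makeNode(RangeTblEntry);
    RTEPermissionInfo *perminfo;

    /* ... fill in relid, relkind, rellockmode, eref, etc. as before ... */
    pstate->p_rtable = lappend(pstate->p_rtable, rte);

    /* new in v16: each RTE_RELATION needs a matching RTEPermissionInfo */
    perminfo = addRTEPermissionInfo(&pstate->p_rteperminfos, rte);
    perminfo->requiredPerms = ACL_UPDATE;   /* or whatever the command needs */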

regards, tom lane




Re: New WAL record to detect the checkpoint redo location

2023-07-14 Thread Andres Freund
Hi,

As I think I mentioned before, I like this idea. However, I don't like the
implementation too much.

On 2023-06-15 13:11:57 +0530, Dilip Kumar wrote:
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index b2430f617c..a025fb91e2 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -744,6 +744,7 @@ XLogInsertRecord(XLogRecData *rdata,
>   XLogRecPtr  StartPos;
>   XLogRecPtr  EndPos;
>   boolprevDoPageWrites = doPageWrites;
> + boolcallerHoldingExlock = holdingAllLocks;
>   TimeLineID  insertTLI;
>  
>   /* we assume that all of the record header is in the first chunk */
> @@ -792,10 +793,18 @@ XLogInsertRecord(XLogRecData *rdata,
>*--
>*/
>   START_CRIT_SECTION();
> - if (isLogSwitch)
> - WALInsertLockAcquireExclusive();
> - else
> - WALInsertLockAcquire();
> +
> + /*
> +  * Acquire wal insertion lock, nothing to do if the caller is already
> +  * holding the exclusive lock.
> +  */
> + if (!callerHoldingExlock)
> + {
> + if (isLogSwitch)
> + WALInsertLockAcquireExclusive();
> + else
> + WALInsertLockAcquire();
> + }
>  
>   /*
>* Check to see if my copy of RedoRecPtr is out of date. If so, may have

This might work right now, but doesn't really seem maintainable. Nor do I like
adding branches into this code a whole lot.


> @@ -6597,6 +6612,32 @@ CreateCheckPoint(int flags)

I think the commentary above this function would need a fair bit of
revising...

>*/
>   RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
>  
> + /*
> +  * Insert a special purpose CHECKPOINT_REDO record as the first record 
> at
> +  * checkpoint redo lsn.  Although we have the checkpoint record that
> +  * contains the exact redo lsn, there might have been some other records
> +  * those got inserted between the redo lsn and the actual checkpoint
> +  * record.  So when processing the wal, we cannot rely on the checkpoint
> +  * record if we want to stop at the checkpoint-redo LSN.
> +  *
> +  * This special record, however, is not required when we doing a 
> shutdown
> +  * checkpoint, as there will be no concurrent wal insertions during that
> +  * time.  So, the shutdown checkpoint LSN will be the same as
> +  * checkpoint-redo LSN.
> +  *
> +  * This record is guaranteed to be the first record at checkpoint redo 
> lsn
> +  * because we are inserting this while holding the exclusive wal 
> insertion
> +  * lock.
> +  */
> + if (!shutdown)
> + {
> + int dummy = 0;
> +
> + XLogBeginInsert();
> + XLogRegisterData((char *) &dummy, sizeof(dummy));
> + recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO);
> + }

It seems to me that we should be able to do better than this.

I suspect we might be able to get rid of the need for exclusive inserts
here. If we got rid of that, we could determine the redo location based on the
LSN determined by XLogInsert().

Alternatively, I think we should split XLogInsertRecord() so that the part
with the insertion locks held is a separate function, that we could use here.
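
Roughly what that split could look like, as a sketch only (invented names,
almost all details elided):

/* sketch: the part of XLogInsertRecord() that runs under the insertion lock */
static XLogRecPtr XLogInsertRecordLocked(XLogRecData *rdata,
                                         XLogRecPtr fpw_lsn,
                                         uint8 flags,
                                         int num_fpi,
                                         bool topxid_included);

XLogRecPtr
XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn, uint8 flags,
                 int num_fpi, bool topxid_included)
{
    XLogRecPtr  EndPos;

    START_CRIT_SECTION();

    if (record_is_log_switch(rdata))    /* hypothetical predicate */
        WALInsertLockAcquireExclusive();
    else
        WALInsertLockAcquire();

    EndPos = XLogInsertRecordLocked(rdata, fpw_lsn, flags, num_fpi,
                                    topxid_included);

    WALInsertLockRelease();
    END_CRIT_SECTION();

    return EndPos;
}

CreateCheckPoint() could then reach the locked part while it is already
holding all the insertion locks, instead of special-casing that inside
XLogInsertRecord().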

Greetings,

Andres Freund




Re: Parallel CREATE INDEX for BRIN indexes

2023-07-14 Thread Matthias van de Meent
On Fri, 14 Jul 2023 at 15:57, Tomas Vondra
 wrote:
>
> On 7/11/23 23:11, Matthias van de Meent wrote:
>> On Thu, 6 Jul 2023 at 16:13, Tomas Vondra  
>> wrote:
>>>
>>> One thing I was wondering about is whether it might be better to allow
>>> the workers to process overlapping ranges, and then let the leader to
>>> merge the summaries. That would mean we might not need the tableam.c
>>> changes at all, but the leader would need to do more work (although the
>>> BRIN indexes tend to be fairly small). The main reason that got me
>>> thinking about this is that we have pretty much no tests for the union
>>> procedures, because triggering that is really difficult. But for
>>> parallel index builds that'd be much more common.
>>
>> Hmm, that's a good point. I don't mind either way, but it would add
>> overhead in the leader to do all of that merging - especially when you
>> configure pages_per_range > PARALLEL_SEQSCAN_MAX_CHUNK_SIZE as we'd
>> need to merge up to parallel_workers tuples. That could be a
>> significant overhead.
>>
>> ... thinks a bit.
>>
>> Hmm, but with the current P_S_M_C_S of 8192 blocks that's quite
>> unlikely to be a serious problem - the per-backend IO saved of such
>> large ranges on a single backend has presumably much more impact than
>> the merging of n_parallel_tasks max-sized brin tuples. So, seems fine
>> with me.
>>
>
> As for PARALLEL_SEQSCAN_MAX_CHUNK_SIZE, the last patch actually
> considers the chunk_factor (i.e. pages_per_range) *after* doing
>
> pbscanwork->phsw_chunk_size = Min(pbscanwork->phsw_chunk_size,
>   PARALLEL_SEQSCAN_MAX_CHUNK_SIZE);
>
> so even with (pages_per_range > PARALLEL_SEQSCAN_MAX_CHUNK_SIZE) it
> would not need to merge anything.
>
> Now, that might have been a bad idea and PARALLEL_SEQSCAN_MAX_CHUNK_SIZE
> should be considered. In which case this *has* to do the union, even if
> only for the rare corner case.
>
> But I don't think that's a major issue - it's pretty sure summarizing
> the tuples is way more expensive than merging the summaries. Which is
> what matters for Amdahl's law ...

Agreed.

>>> +BrinSpool  *bs_spool;
>>> +BrinSpool  *bs_spool_out;
>>
>> Are both used? If so, could you add comments why we have two spools
>> here, instead of only one?
>>
>
> OK, I admit I'm not sure both are actually necessary. I was struggling
> getting it working with just one spool, because when the leader
> participates as a worker, it needs to both summarize some of the chunks
> (and put the tuples somewhere). And then it also needs to consume the
> final output.
>
> Maybe it's just a case of cargo cult programming - I was mostly copying
> stuff from the btree index build, trying to make it work, and then with
> two spools it started working.

Two spools seem to be necessary in a participating leader, but both
spools have non-overlapping lifetimes. In the btree code, two pairs of
spools are actually used (in unique indexes): you can see the
pairs being allocated in both _bt_leader_participate_as_worker (called
from _bt_begin_parallel, from _bt_spools_heapscan) and in
_bt_spools_heapscan.

> >> diff --git a/src/backend/utils/sort/tuplesortvariants.c 
> >> b/src/backend/utils/sort/tuplesortvariants.c
> >> [...]
> >> + * Computing BrinTuple size with only the tuple is difficult, so we want 
> >> to track
> >> + * the length for r referenced by SortTuple. That's what BrinSortTuple is 
> >> meant
> >> + * to do - it's essentially a BrinTuple prefixed by length. We only write 
> >> the
> >> + * BrinTuple to the logtapes, though.
> >
> > Why don't we write the full BrinSortTuple to disk? Doesn't that make more 
> > sense?
> >
>
> Not sure I understand. We ultimately do, because we write
>
> (length + BrinTuple)
>
> and BrinSortTuple is exactly that. But if we write BrinSortTuple, we
> would end up writing length for that too, no?
>
> Or maybe I just don't understand how would that make things simpler.

I don't quite understand the intricacies of the tape storage format
quite yet (specifically, I'm continuously getting confused by the len
-= sizeof(int)), so you might well be correct.

My comment was written based on just the comment's contents, which
claims "we can't easily recompute the length, so we store it with the
tuple in memory. However, we don't store the length when we write it
to the tape", which seems self-contradictory.

> >> +tuplesort_puttuple_common(state, ,
> >> +  base->sortKeys &&
> >> +  base->sortKeys->abbrev_converter &&
> >> +  !stup.isnull1);
> >
> > Can't this last argument just be inlined, based on knowledge that we
> > don't use sortKeys in brin?
> >
>
> What does "inlined" mean for an argument? But yeah, I guess it might be
> just set to false. And we should probably have an assert that there are
> no sortKeys.

"inlined", "precomputed", "const-ified"? I'm not super good at
vocabulary. But, 

Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

2023-07-14 Thread Tomas Vondra
On 7/9/23 23:44, Tomas Vondra wrote:
> ...
>>> Yes, my previous message was mostly about backwards compatibility, and
>>> this may seem a bit like an argument against it. But that message was
>>> more a question "If we do this, is it actually backwards compatible the
>>> way we want/need?")
>>>
>>> Anyway, I think the BrinDesc scratch space is a neat idea, I'll try
>>> doing it that way and report back in a couple days.
>>
>> Cool. In 0005-Support-SK_SEARCHARRAY-in-BRIN-bloom-20230702.patch, you
>> used the preprocess function to pre-calculate the scankey's hash, even
>> for scalars. You could use the scratch space in BrinDesc for that,
>> before doing anything with SEARCHARRAYs.
>>
> 
> Yeah, that's a good idea.
> 

I started looking at this (the scratch space in BrinDesc), and it's not
as straightforward. The trouble is BrinDesc is "per attribute" but the
scratch space is "per scankey" (because we'd like to sort values from
the scankey array).

With the "new" consistent functions (that get all scan keys at once)
this probably is not an issue, because we know which scan key we're
processing and so we can map it to the scratch space. But with the old
consistent function that's not the case. Maybe we should support this
only with the "new" consistent function variant?

This would however conflict with the idea to have a separate consistent
function for arrays, which "splits" the scankeys into multiple groups
again. There could be multiple SAOP scan keys, and then what?

I wonder if the scratch space should be in the ScanKey instead?
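
To make that variant a bit more concrete, the preprocess step could wrap
whatever it computed into a per-scankey struct and hang it off the key, so
the consistent function gets it back no matter how many SAOP keys there are.
A sketch (hypothetical struct, names invented, not part of the patch):

typedef struct BrinPreprocessedKey
{
    Datum       orig_argument;  /* the original sk_argument */
    int         nvalues;        /* 1 for scalars, N for SAOP arrays */
    Datum      *values;         /* e.g. sorted/deduplicated array elements */
    uint64     *hashes;         /* e.g. precomputed hashes for bloom */
} BrinPreprocessedKey;

/* in the preprocess step, for each scankey */
BrinPreprocessedKey *pre = palloc0(sizeof(BrinPreprocessedKey));

pre->orig_argument = key->sk_argument;
/* ... fill in values/hashes from the scalar or the SAOP array ... */
key->sk_argument = PointerGetDatum(pre);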


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PATCH: Using BRIN indexes for sorted output

2023-07-14 Thread Matthias van de Meent
On Fri, 14 Jul 2023 at 16:21, Tomas Vondra
 wrote:
>
>
>
>
> On 7/11/23 13:20, Matthias van de Meent wrote:
>> On Mon, 10 Jul 2023 at 22:04, Tomas Vondra
>>  wrote:
>>> Approximation in what sense? My understanding was we'd get a range of
>>> distances that we know covers all rows in that range. So the results
>>> should be accurate, no?
>>
>> The distance is going to be accurate only to the degree that the
>> summary can produce accurate distances for the datapoints it
>> represents. That can be quite imprecise due to the nature of the
>> contained datapoints: a summary of the points (-1, -1) and (1, 1) will
>> have a minimum distance of 0 to the origin, where the summary (-1, 0)
>> and (-1, 0.5) would have a much more accurate distance of 1.
>
> Ummm, I'm probably missing something, or maybe my mental model of this
> is just wrong, but why would the distance for the second summary be more
> accurate? Or what does "more accurate" mean?
>
> Is that about the range of distances for the summary? For the first
> range the summary is a bounding box [(-1,-1), (1,1)], so all we know is that
> the points may have distance in range [0, sqrt(2)]. While for the second
> summary it's [1, sqrt(1.25)].

Yes; I was trying to refer to the difference between what results you
get from the summary vs what results you get from the actual
datapoints: In this case, for finding points which are closest to the
origin, the first bounding box has a less accurate estimate than the
second.

> > The point I was making is that the summary can only approximate the
> > distance, and that approximation is fine w.r.t. the BRIN sort
> > algorithm.
> >
>
> I think as long as the approximation (whatever it means) does not cause
> differences in results (compared to not using an index), it's OK.

Agreed.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: logical decoding and replication of sequences, take 2

2023-07-14 Thread Tomas Vondra
On 7/14/23 16:02, Ashutosh Bapat wrote:
> ...
>

>>>> Hmmmm, that might work. I feel a bit uneasy about having to keep all
>>>> relfilenodes, not just sequences ...
>>>
>>> From relfilenode it should be easy to get to rel and then see if it's
>>> a sequence. Only add relfilenodes for the sequence.
>>>
>>
>> Will try.
>>
> 
> Actually, adding all relfilenodes to hash may not be that bad. There
> shouldn't be many of those. So the extra step to lookup reltype may
> not be necessary. What's your reason for uneasiness? But yeah, there's
> a way to avoid that as well.
> 
> Should I wait for this before the second round of review?
> 

I don't think you have to wait - just ignore the part that changes the
WAL record, which is a pretty tiny bit of the patch.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: In Postgres 16 BETA, should the ParseNamespaceItem have the same index as it's RangeTableEntry?

2023-07-14 Thread Farias de Oliveira
Thanks Amit and Tom for the quick response. I have attached a file that
contains the execution of the code via GDB and also what the backtrace
command shows when it gets the error. If I forgot to add something or if it
is necessary to add anything else, please let me know.

Thank you,
Matheus Farias

Em sex., 14 de jul. de 2023 às 00:05, Amit Langote 
escreveu:

> On Fri, Jul 14, 2023 at 7:12 AM Tom Lane  wrote:
> > Farias de Oliveira  writes:
> > > With further inspection in AGE's code, after executing the SET query,
> > > it goes inside transform_cypher_clause_as_subquery() function and the
> > > ParseNamespaceItem has the following values:
> >
> > >  {p_names = 0x1205638, p_rte = 0x11edb70, p_rtindex = 1, p_perminfo =
> > > 0x7f7f7f7f7f7f7f7f,
> > >   p_nscolumns = 0x1205848, p_rel_visible = true, p_cols_visible =
> > > true, p_lateral_only = false,
> > >   p_lateral_ok = true}
> >
> > Hmm, that uninitialized value for p_perminfo is pretty suspicious.
> > I see that transformFromClauseItem and buildNSItemFromLists both
> > create ParseNamespaceItems without bothering to fill p_perminfo,
> > while buildNSItemFromTupleDesc fills it per the caller and
> > addRangeTableEntryForJoin always sets it to NULL.  I think we
> > ought to make the first two set it to NULL as well, because
> > uninitialized fields are invariably a bad idea (even though the
> > lack of valgrind complaints says that the core code is managing
> > to avoid touching those fields).
>
> Agreed, I'll go ahead and fix that.
>
> > If we do that, is it sufficient to resolve your problem?
>
> Hmm, I'm afraid maybe not, because if the above were the root issue,
> we'd have seen a segfault and not the error the OP mentioned?  I'm
> thinking the issue is that their code appears to be consing up an RTE
> that the core code (getRTEPermissionInfo() most likely via
> markRTEForSelectPriv()) is not expecting to be called with?  I would
> be helpful to see a backtrace when the error occurs to be sure.
>
> --
> Thanks, Amit Langote
> EDB: http://www.enterprisedb.com
>
(gdb) b getRTEPermissionInfo 
Breakpoint 1 at 0x5bc188: file parse_relation.c, line 3900.
(gdb) c
Continuing.

Breakpoint 1, getRTEPermissionInfo (rteperminfos=0x1375d78, rte=0x135d5c8) at 
parse_relation.c:3900
3900{
(gdb) n
3903if (rte->perminfoindex == 0 ||
(gdb) 
3904rte->perminfoindex > list_length(rteperminfos))
(gdb) 
3907perminfo = list_nth_node(RTEPermissionInfo, rteperminfos,
(gdb) 
3909if (perminfo->relid != rte->relid)
(gdb) 
markRTEForSelectPriv (pstate=<optimized out>, rtindex=1, col=<optimized out>) 
at parse_relation.c:1076
1076perminfo->requiredPerms |= ACL_SELECT;
(gdb) 
1078perminfo->selectedCols =
(gdb) 
scanNSItemForColumn (pstate=pstate@entry=0x1375448, 
nsitem=nsitem@entry=0x1375e68, sublevels_up=sublevels_up@entry=0, 
colname=colname@entry=0x7f197fb5df1b "id", location=location@entry=-1) at 
parse_relation.c:772
772 return (Node *) var;
(gdb) 
make_vertex_expr (cpstate=cpstate@entry=0x1375448, pnsi=pnsi@entry=0x1375e68, 
label=<optimized out>)
at src/backend/parser/cypher_clause.c:4561

4561label_name_func_oid = get_ag_func_oid("_label_name", 2, OIDOID,
(gdb) 

254 return (Datum) X;
(gdb) 
4568label_name_args = list_make2(graph_oid_const, id);
(gdb) 
4570label_name_func_expr = makeFuncExpr(label_name_func_oid, CSTRINGOID,
(gdb) 
4573label_name_func_expr->location = -1;
(gdb) 
4575props = scanNSItemForColumn(pstate, pnsi, 0, 
AG_VERTEX_COLNAME_PROPERTIES, -1);
(gdb) 

Breakpoint 1, getRTEPermissionInfo (rteperminfos=0x1375d78, rte=0x135d5c8) at 
parse_relation.c:3900
3900{
(gdb) 
3903if (rte->perminfoindex == 0 ||
(gdb) 
3904rte->perminfoindex > list_length(rteperminfos))
(gdb) 
3907perminfo = list_nth_node(RTEPermissionInfo, rteperminfos,
(gdb) 
3909if (perminfo->relid != rte->relid)
(gdb) 
markRTEForSelectPriv (pstate=<optimized out>, rtindex=1, col=<optimized out>) 
at parse_relation.c:1076
1076perminfo->requiredPerms |= ACL_SELECT;
(gdb) 
1078perminfo->selectedCols =
(gdb) 
scanNSItemForColumn (pstate=pstate@entry=0x1375448, 
nsitem=nsitem@entry=0x1375e68, sublevels_up=sublevels_up@entry=0, 
colname=colname@entry=0x7f197fb5defb "properties", 
location=location@entry=-1) at parse_relation.c:772
772 return (Node *) var;
(gdb) 
make_vertex_expr (cpstate=cpstate@entry=0x1375448, pnsi=pnsi@entry=0x1375e68, 
label=<optimized out>)
at src/backend/parser/cypher_clause.c:4577
4577args = list_make3(id, label_name_func_expr, props);
(gdb) 
4579func_expr = makeFuncExpr(func_oid, AGTYPEOID, args, InvalidOid, 
InvalidOid,
(gdb) 
4581func_expr->location = -1;
(gdb) 
4583return (Node *)func_expr;
(gdb) 
transform_cypher_node (cpstate=cpstate@entry=0x1375448, 
node=node@entry=0x135edf0, target_list=target_list@entry=0x135d740, 

Re: PATCH: Using BRIN indexes for sorted output

2023-07-14 Thread Tomas Vondra




On 7/11/23 13:20, Matthias van de Meent wrote:
> On Mon, 10 Jul 2023 at 22:04, Tomas Vondra
>  wrote:
>> On 7/10/23 18:18, Matthias van de Meent wrote:
>>> On Mon, 10 Jul 2023 at 17:09, Tomas Vondra
>>>  wrote:
>>>> On 7/10/23 14:38, Matthias van de Meent wrote:
>>>>>> I haven't really thought about geometric types, just about minmax and
>>>>>> minmax-multi. It's not clear to me what the benefit for these types be.
>>>>>> I mean, we can probably sort points lexicographically, but is anyone
>>>>>> doing that in queries? It seems useless for order by distance.
>>>>>
>>>>> Yes, that's why you would sort them by distance, where the distance is
>>>>> generated by the opclass as min/max distance between the summary and
>>>>> the distance's origin, and then inserted into the tuplesort.
>>>>>
>>>>
>>>> OK, so the query says "order by distance from point X" and we calculate
>>>> the min/max distance of values in a given page range.
>>>
>>> Yes, and because it's BRIN that's an approximation, which should
>>> generally be fine.
>>>
>>
>> Approximation in what sense? My understanding was we'd get a range of
>> distances that we know covers all rows in that range. So the results
>> should be accurate, no?
> 
> The distance is going to be accurate only to the degree that the
> summary can produce accurate distances for the datapoints it
> represents. That can be quite imprecise due to the nature of the
> contained datapoints: a summary of the points (-1, -1) and (1, 1) will
> have a minimum distance of 0 to the origin, where the summary (-1, 0)
> and (-1, 0.5) would have a much more accurate distance of 1.

Ummm, I'm probably missing something, or maybe my mental model of this
is just wrong, but why would the distance for the second summary be more
accurate? Or what does "more accurate" mean?

Is that about the range of distances for the summary? For the first
range the summary is a bounding box [(-1,-1), (1,1)], so all we know is that
the points may have distance in range [0, sqrt(2)]. While for the second
summary it's [1, sqrt(1.25)].

> The point I was making is that the summary can only approximate the
> distance, and that approximation is fine w.r.t. the BRIN sort
> algorithm.
> 

I think as long as the approximation (whatever it means) does not cause
differences in results (compared to not using an index), it's OK.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding and replication of sequences, take 2

2023-07-14 Thread Tomas Vondra



On 7/14/23 15:50, Ashutosh Bapat wrote:
> On Fri, Jul 14, 2023 at 3:59 PM Tomas Vondra
>  wrote:
> 
>>

>>>> The new patch detects that, and triggers ERROR on the publisher. And I
>>>> think that's the correct thing to do.
>>>
>>> With this behaviour users will never be able to setup logical
>>> replication between old and new servers considering almost every setup
>>> has sequences.
>>>
>>
>> That's not true.
>>
>> Replication to older versions works fine as long as the publication does
>> not include sequences (which need to be added explicitly). If you have a
>> publication with sequences, you clearly want to replicate them, ignoring
>> it is just confusing "magic".
> 
> I was looking at it from a different angle. Publishers publish what
> they want, subscribers choose what they want and what gets replicated
> is intersection of these two sets. Both live happily.
> 
> But I am fine with that too. It's just that users need to create more
> publications.
> 

I think you might make essentially the same argument about replicating
just some of the tables in the publication. That is, the publication has
tables t1 and t2, but subscriber only has t1. That will fail too, we
don't allow the subscriber to ignore changes for t2.

I think it'd be rather weird (and confusing) to do this differently for
different types of replicated objects.

>>
>> If you have a publication with sequences and still want to replicate to
>> an older server, create a new publication without sequences.
>>
> 
> I tested the current patches with subscriber at PG 14 and publisher at
> master + these patches. I created one table and a sequence on both
> publisher and subscriber. I created two publications, one with
> sequence and other without it. Both have the table in it. When the
> subscriber subscribes to the publication with sequence, following
> ERROR is repeated in the subscriber logs and nothing gets replicated
> ```
> [2023-07-14 18:55:41.307 IST] [916293] [] [] [3/30:0] LOG:  00000:
> logical replication apply worker for subscription "sub5433" has
> started
> [2023-07-14 18:55:41.307 IST] [916293] [] [] [3/30:0] LOCATION:
> ApplyWorkerMain, worker.c:3169
> [2023-07-14 18:55:41.322 IST] [916293] [] [] [3/0:0] ERROR:  08P01:
> could not receive data from WAL stream: ERROR:  protocol version does
> not support sequence replication
> CONTEXT:  slot "sub5433", output plugin "pgoutput", in the
> sequence callback, associated LSN 0/1513718
> [2023-07-14 18:55:41.322 IST] [916293] [] [] [3/0:0] LOCATION:
> libpqrcv_receive, libpqwalreceiver.c:818
> [2023-07-14 18:55:41.325 IST] [916213] [] [] [:0] LOG:  00000:
> background worker "logical replication worker" (PID 916293) exited
> with exit code 1
> [2023-07-14 18:55:41.325 IST] [916213] [] [] [:0] LOCATION:
> LogChildExit, postmaster.c:3737
> ```
> 
> When the subscriber subscribes to the publication without sequence,
> things work normally.
> 
> The cross-version replication is working as expected then.
> 

Thanks for testing / confirming this! So, do we agree this behavior is
reasonable?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding and replication of sequences, take 2

2023-07-14 Thread Ashutosh Bapat
On Fri, Jul 14, 2023 at 4:10 PM Tomas Vondra
 wrote:
>
> I don't think that's true - this will create 1 record with
> "created=true" (the one right after the CREATE SEQUENCE) and the rest
> will have "created=false".

I may have misread the code.

>
> I realized I haven't modified seq_desc to show this flag, so I did that
> in the updated patch version, which makes this easy to see.

Now I see it. Thanks for the clarification.

> >
> > Am I missing something here?
> >
>
> You're missing the fact that pg_upgrade does not copy replication slots,
> so the restart_lsn does not matter.
>
> (Yes, this is pretty annoying consequence of using pg_upgrade. And maybe
> we'll improve that in the future - but I'm pretty sure we won't allow
> decoding old WAL.)

Ah, I see. Thanks for correcting me.

> >>>
> >>
> >> Hmmmm, that might work. I feel a bit uneasy about having to keep all
> >> relfilenodes, not just sequences ...
> >
> > From relfilenode it should be easy to get to rel and then see if it's
> > a sequence. Only add relfilenodes for the sequence.
> >
>
> Will try.
>

Actually, adding all relfilenodes to hash may not be that bad. There
shouldn't be many of those. So the extra step to lookup reltype may
not be necessary. What's your reason for uneasiness? But yeah, there's
a way to avoid that as well.
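
The filtering would be something along these lines (sketch only; variable
names assumed, and assuming a catalog lookup is safe at that point):

/*
 * Sketch only: remember a newly-created relfilenode just when it belongs
 * to a sequence.  The surrounding variables are assumed to exist.
 */
Oid         reloid = RelidByRelfilenumber(rlocator.spcOid, rlocator.relNumber);

if (OidIsValid(reloid) && get_rel_relkind(reloid) == RELKIND_SEQUENCE)
    (void) hash_search(created_relfilenodes, &rlocator, HASH_ENTER, NULL);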

Should I wait for this before the second round of review?

-- 
Best Wishes,
Ashutosh Bapat




Re: Parallel CREATE INDEX for BRIN indexes

2023-07-14 Thread Tomas Vondra



On 7/11/23 23:11, Matthias van de Meent wrote:
> On Thu, 6 Jul 2023 at 16:13, Tomas Vondra  
> wrote:
>>
>> On 7/5/23 16:33, Matthias van de Meent wrote:
>>> ...
>>>
 Maybe. I wasn't that familiar with what parallel tuplesort can and can't
 do, and the little I knew I managed to forget since I wrote this patch.
 Which similar features do you have in mind?
>>>
>>> I was referring to the feature that is "emitting a single sorted run
>>> of tuples at the leader backend based on data gathered in parallel
>>> worker backends". It manages the sort state, on-disk runs etc. so that
>>> you don't have to manage that yourself.
>>>
>>> Adding a new storage format for what is effectively a logical tape
>>> (logtape.{c,h}) and manually merging it seems like a lot of changes if
>>> that functionality is readily available, standardized and optimized in
>>> sortsupport; and adds an additional place to manually go through for
>>> disk-related changes like TDE.
>>>
>>
>> Here's a new version of the patch, with three main changes:
> 
> Thanks! I've done a review on the patch, and most looks good. Some
> places need cleanup and polish, some others more documentations, and
> there are some other issues, but so far it's looking OK.
> 
>> One thing I was wondering about is whether it might be better to allow
>> the workers to process overlapping ranges, and then let the leader to
>> merge the summaries. That would mean we might not need the tableam.c
>> changes at all, but the leader would need to do more work (although the
>> BRIN indexes tend to be fairly small). The main reason that got me
>> thinking about this is that we have pretty much no tests for the union
>> procedures, because triggering that is really difficult. But for
>> parallel index builds that'd be much more common.
> 
> Hmm, that's a good point. I don't mind either way, but it would add
> overhead in the leader to do all of that merging - especially when you
> configure pages_per_range > PARALLEL_SEQSCAN_MAX_CHUNK_SIZE as we'd
> need to merge up to parallel_workers tuples. That could be a
> significant overhead.
> 
> ... thinks a bit.
> 
> Hmm, but with the current P_S_M_C_S of 8192 blocks that's quite
> unlikely to be a serious problem - the per-backend IO saved of such
> large ranges on a single backend has presumably much more impact than
> the merging of n_parallel_tasks max-sized brin tuples. So, seems fine
> with me.
> 

As for PARALLEL_SEQSCAN_MAX_CHUNK_SIZE, the last patch actually
considers the chunk_factor (i.e. pages_per_range) *after* doing

pbscanwork->phsw_chunk_size = Min(pbscanwork->phsw_chunk_size,
  PARALLEL_SEQSCAN_MAX_CHUNK_SIZE);

so even with (pages_per_range > PARALLEL_SEQSCAN_MAX_CHUNK_SIZE) it
would not need to merge anything.

Now, that might have been a bad idea and PARALLEL_SEQSCAN_MAX_CHUNK_SIZE
should be considered. In which case this *has* to do the union, even if
only for the rare corner case.

But I don't think that's a major issue - it's pretty sure summarizing
the tuples is way more expensive than merging the summaries. Which is
what matters for Amdahl's law ...


> Review follows below.
> 
> Kind regards,
> 
> Matthias van de Meent
> Neon (https://neon.tech/)
> 
> ---
> 
>> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> 
>> +BrinShared   *brinshared;
> 
> Needs some indentation fixes.
> 
>> +intbs_reltuples;
>> [...]
>> +state->bs_reltuples += reltuples;
> 
> My IDE warns me that reltuples is a double. Looking deeper into the
> value, it contains the number of live tuples in the table, so this
> conversion may not result in a meaningful value for tables with >=2^31
> live tuples. Tables > 56GB could begin to get affected by this.
> 
>> +intbs_worker_id;
> 
> This variable seems to be unused.
> 
>> +BrinSpool  *bs_spool;
>> +BrinSpool  *bs_spool_out;
> 
> Are both used? If so, could you add comments why we have two spools
> here, instead of only one?
> 

OK, I admit I'm not sure both are actually necessary. I was struggling
getting it working with just one spool, because when the leader
participates as a worker, it needs to both summarize some of the chunks
(and put the tuples somewhere). And then it also needs to consume the
final output.

Maybe it's just a case of cargo cult programming - I was mostly copying
stuff from the btree index build, trying to make it work, and then with
two spools it started working.

>> +/*
>> + * A version of the callback, used by parallel index builds. The main 
>> difference
>> + * is that instead of writing the BRIN tuples into the index, we write them 
>> into
>> + * a shared tuplestore file, and leave the insertion up to the leader 
>> (which may
> 
> + ... shared tuplesort, and ...
> 
>> brinbuildCallbackParallel(...)
>> +while (thisblock > state->bs_currRangeStart + state->bs_pagesPerRange - 
>> 1)
> 
> shouldn't this be an 'if' now?
> 


Re: logical decoding and replication of sequences, take 2

2023-07-14 Thread Ashutosh Bapat
On Fri, Jul 14, 2023 at 3:59 PM Tomas Vondra
 wrote:

>
> >>
> >> The new patch detects that, and triggers ERROR on the publisher. And I
> >> think that's the correct thing to do.
> >
> > With this behaviour users will never be able to setup logical
> > replication between old and new servers considering almost every setup
> > has sequences.
> >
>
> That's not true.
>
> Replication to older versions works fine as long as the publication does
> not include sequences (which need to be added explicitly). If you have a
> publication with sequences, you clearly want to replicate them, ignoring
> it is just confusing "magic".

I was looking at it from a different angle. Publishers publish what
they want, subscribers choose what they want and what gets replicated
is intersection of these two sets. Both live happily.

But I am fine with that too. It's just that users need to create more
publications.

>
> If you have a publication with sequences and still want to replicate to
> an older server, create a new publication without sequences.
>

I tested the current patches with subscriber at PG 14 and publisher at
master + these patches. I created one table and a sequence on both
publisher and subscriber. I created two publications, one with
sequence and other without it. Both have the table in it. When the
subscriber subscribes to the publication with sequence, following
ERROR is repeated in the subscriber logs and nothing gets replicated
```
[2023-07-14 18:55:41.307 IST] [916293] [] [] [3/30:0] LOG:  00000:
logical replication apply worker for subscription "sub5433" has
started
[2023-07-14 18:55:41.307 IST] [916293] [] [] [3/30:0] LOCATION:
ApplyWorkerMain, worker.c:3169
[2023-07-14 18:55:41.322 IST] [916293] [] [] [3/0:0] ERROR:  08P01:
could not receive data from WAL stream: ERROR:  protocol version does
not support sequence replication
CONTEXT:  slot "sub5433", output plugin "pgoutput", in the
sequence callback, associated LSN 0/1513718
[2023-07-14 18:55:41.322 IST] [916293] [] [] [3/0:0] LOCATION:
libpqrcv_receive, libpqwalreceiver.c:818
[2023-07-14 18:55:41.325 IST] [916213] [] [] [:0] LOG:  00000:
background worker "logical replication worker" (PID 916293) exited
with exit code 1
[2023-07-14 18:55:41.325 IST] [916213] [] [] [:0] LOCATION:
LogChildExit, postmaster.c:3737
```

When the subscriber subscribes to the publication without sequence,
things work normally.

The cross-version replication is working as expected then.

-- 
Best Wishes,
Ashutosh Bapat




Re: Partial aggregates pushdown

2023-07-14 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote on 2023-07-10 10:35:

I have modified the program except for the point "if the version of
the remote server is less than PG17".
Instead, we have addressed the following.
"If check_partial_aggregate_support is true and the remote server
version is older than the local server
version, postgres_fdw does not assume that the partial aggregate
function is on the remote server unless
the partial aggregate function and the aggregate function match."
The reason for this is to maintain compatibility with any aggregate
function that does not support partial
aggregate in one version of V1 (V1 is PG17 or higher), even if the
next version supports partial aggregate.
For example, string_agg does not support partial aggregation in PG15,
but it will support partial aggregation
in PG16.



Hi.

1) In foreign_join_ok() should we set fpinfo->user if 
fpinfo->check_partial_aggregate_support is set like it's done for 
fpinfo->use_remote_estimate? It seems we can end up with fpinfo->user = 
NULL if use_remote_estimate is not set.
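
Concretely, I mean something like this in foreign_join_ok() (sketch only,
written from memory of the surrounding code, with the field name taken from
the patch):

/* sketch: also pick up the user mapping when we may need to probe the
 * remote server for partial aggregate support */
if (fpinfo->use_remote_estimate || fpinfo->check_partial_aggregate_support)
    fpinfo->user = fpinfo_o->user;
else
    fpinfo->user = NULL;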


2) It seems we found an additional issue with the original patch, which is 
present in the current one. I'm attaching a patch which seems to fix it, but 
I'm not quite sure about it.




We have not been able to add a test for the case where the remote
server version is older than the
local server version to the regression test. Is there any way to add
such tests to the existing regression
tests?



--
Best regards,
Alexander Pyhalov,
Postgres Professional
From 187d15185200aabc22c5219bbe636bc950670a78 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Fri, 14 Jul 2023 16:34:02 +0300
Subject: [PATCH] For partial aggregation we can't rely on the fact that every
 var is a part of some GROUP BY expression

---
 .../postgres_fdw/expected/postgres_fdw.out| 120 ++
 contrib/postgres_fdw/postgres_fdw.c   |  31 +++--
 contrib/postgres_fdw/sql/postgres_fdw.sql |   9 ++
 3 files changed, 152 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 8d80ba0a6be..31ee461045c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9955,6 +9955,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1;
  49 | 19. |  29 |60
 (50 rows)
 
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+   QUERY PLAN   
+
+ Sort
+   Output: ((pagg_tab.b / 2)), (avg(pagg_tab.a)), (max(pagg_tab.a)), (count(*))
+   Sort Key: ((pagg_tab.b / 2))
+   ->  Finalize HashAggregate
+ Output: ((pagg_tab.b / 2)), avg(pagg_tab.a), max(pagg_tab.a), count(*)
+ Group Key: ((pagg_tab.b / 2))
+ ->  Append
+   ->  Foreign Scan
+ Output: ((pagg_tab.b / 2)), (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*))
+ Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab)
+ Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p1 GROUP BY 1
+   ->  Foreign Scan
+ Output: ((pagg_tab_1.b / 2)), (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*))
+ Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1)
+ Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p2 GROUP BY 1
+   ->  Foreign Scan
+ Output: ((pagg_tab_2.b / 2)), (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*))
+ Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab_2)
+ Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p3 GROUP BY 1
+(19 rows)
+
+SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+ ?column? | avg | max | count 
+--+-+-+---
+0 | 10.5000 |  21 |   120
+1 | 12.5000 |  23 |   120
+2 | 14.5000 |  25 |   120
+3 | 16.5000 |  27 |   120
+4 | 18.5000 |  29 |   120
+5 | 10.5000 |  21 |   120
+6 | 12.5000 |  23 |   120
+7 | 14.5000 |  25 |   120
+8 | 16.5000 |  27 |   120
+9 | 18.5000 |  29 |   120
+   10 | 10.5000 |  21 |   120
+   11 | 12.5000 |  23 |   120
+   12 | 14.5000 |  25 |   120
+   13 | 16.5000 |  27 |   120
+  

Re: pg_dump needs SELECT privileges on irrelevant extension table

2023-07-14 Thread Andrew Dunstan


On 2023-06-29 Th 12:24, Jacob Champion wrote:

On 3/20/23 10:43, Tom Lane wrote:

I'd be more willing to consider the proposed patch if it weren't such
a hack --- as you say, it doesn't fix the problem when the table has
policies, so it's hardly a general-purpose solution.  I fear that it's
also fairly expensive: adding sub-selects to the query we must do
before we can lock any tables is not appetizing, because making that
window wider adds to the risk of deadlocks, dump failures, etc.

(moving to -hackers and July CF)

= Recap for Potential Reviewers =

The timescaledb extension has an internal table that's owned by the
superuser. It's not dumpable, and other users can only access its
contents through a filtered view. For our cloud deployments, we
additionally have that common trope where the most privileged users
aren't actually superusers, but we still want them to be able to perform
a subset of maintenance tasks, including pg_dumping their data.

When cloud users try to dump that data, pg_dump sees that this internal
table is an extension member and plans to dump ACLs, security labels,
and RLS policies for it. (This behavior cannot be overridden with
--exclude-table. pg_dump ignores that flag for extension members.)
Dumping policies requires the use of pg_get_expr() on the backend; this,
in turn, requires a lock on the table with ACCESS SHARE.

So pg_dump tries to lock a table, with no policies, that it's not going
to dump the schema or data for anyway, and it fails because our users
don't have (and shouldn't need) SELECT access to it. For an example of
this in action, I've attached a test case in v2-0001.

= Proposal =

Since this is affecting users on released Postgres versions, my end goal
is to find a fix that's backportable.

This situation looks very similar to [1], where non-superusers couldn't
perform a dump because we were eagerly grabbing table locks to read
(non-existent) ACLs. But that was solved with the realization that ACLs
don't need locks anyway, which is unfortunately not applicable to policies.

My initial patch to -bugs was a riff on a related performance fix [2],
which figured out which tables had interesting ACLs and skipped that
part if nothing was found. I added the same kind of subselect for RLS
policies as well, but that had nasty corner cases where it would perform
terribly, as Tom alluded to above. (In a cluster of 200k tables, where
one single table had 10M policies, the query ground to a halt.)

So v2-0002 is instead inspired by Tom's rewrite of that ACL dump logic
[3]. It scans pg_policy separately, stores the tables it finds into the
catalog map on the client side, and then again skips the policy dump
(and therefore the lock) if no policies exist. The performance hit now
scales with the size of pg_policy alone.
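
For illustration, the client-side scan amounts to something like this (sketch
only, not the exact query or code in v2-0002; the PQExpBuffer and Archive
variables are assumed):

/*
 * Sketch: one pass over pg_policy tells us which tables have any policies;
 * tables that never show up here can skip the per-table policy dump later,
 * and with it the ACCESS SHARE lock it would require.
 */
appendPQExpBufferStr(query,
                     "SELECT DISTINCT polrelid "
                     "FROM pg_catalog.pg_policy");
res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);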

This is a bit more invasive than the subselect, but hopefully still
straightforward enough to be applicable to the back branches' old
catalog map strategy. It's still not a general-purpose fix, as Tom
pointed out above, but that was true of the discussion in [1] as well,
so I'm optimistic.

WDYT?

--Jacob

[1]
https://postgr.es/m/CAGPqQf3Uzo-yU1suYyoZR83h6QTxXxkGTtEyeMV7EAVBqn%3DPcQ%40mail.gmail.com
[2]https://git.postgresql.org/cgit/postgresql.git/commit/?id=5d589993
[3]https://git.postgresql.org/cgit/postgresql.git/commit/?id=0c9d8442



Seems reasonable at first glance. Isn't it going to save some work 
anyway later on, so the performance hit could end up negative?



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: pgbnech: allow to cancel queries during benchmark

2023-07-14 Thread Yugo NAGATA
Hello Fabien,

Thank you for your review!

On Mon, 3 Jul 2023 20:39:23 +0200 (CEST)
Fabien COELHO  wrote:

> 
> Yugo-san,
> 
> Some feedback about v1 of this patch.
> 
> Patch applies cleanly, compiles.
> 
> There are no tests, could there be one? ISTM that one could be done with a 
> "SELECT pg_sleep(...)" script??

Agreed. I will add the test.

> The global name "all_state" is quite ambiguous, what about "client_states" 
> instead? Or maybe it could be avoided, see below.
> 
> Instead of renaming "state" to "all_state" (or client_states as suggested 
> above), I'd suggest to minimize the patch by letting "state" inside the 
> main and adding a "client_states = state;" just after the allocation, or 
> another approach, see below.

Ok, I'll fix it to add a global variable "client_states" and make it point to
"state" instead of changing "state" to a global.
 
> Should PQfreeCancel be called on deconnections, in finishCon? I think that 
> there may be a memory leak with the current implementation??

Agreed. I'll fix.
 
> Maybe it should check that cancel is not NULL before calling PQcancel?

I think this is already checked as below, but am I missing something?

+   if (all_state[i].cancel != NULL)
+   (void) PQcancel(all_state[i].cancel, errbuf, sizeof(errbuf));

> After looking at the code, I'm very unclear whether they may be some 
> underlying race conditions, or not, depending on when the cancel is 
> triggered. I think that some race conditions are still likely given the 
> current thread 0 implementation, and dealing with them with a barrier or 
> whatever is not desirable at all.
> 
> In order to work around this issue, ISTM that we should go away from the 
> simple and straightforward thread 0 approach, and the only clean way is 
> that the cancelation should be managed by each thread for its own client.
> 
> I'd suggest to have the advanceState to call PQcancel when CancelRequested 
> is set and switch to CSTATE_ABORTED to end properly. This means that there 
> would be no need for the global client states pointer, so the patch should 
> be smaller and simpler. Possibly there would be some shortcuts added here 
> and there to avoid lingering after the control-C, in threadRun.

I am not sure this approach is simpler than mine. 

With multiple threads, only one thread catches the signal and the other threads
continue to run. Therefore, if Ctrl+C is pressed while threads are waiting for
responses from the backend in wait_on_socket_set, only one thread can be
interrupted and return, but the other threads will continue to wait and cannot
check CancelRequested. So, to implement your suggestion, we would need some hack
to make all threads return from wait_on_socket_set when the event occurs, but
I don't have an idea how to do this in a simpler way.

In my patch, all threads can return from wait_on_socket_set at Ctrl+C
because when thread #0 cancels all connections, the following error is
sent to all sessions:

 ERROR:  canceling statement due to user request

and all threads will receive the response from the backend.
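
For reference, the thread #0 side of this is roughly the following (simplified
sketch, not the exact patch code):

/* simplified sketch: on Ctrl+C, thread #0 cancels every client's query */
static void
cancel_all_clients(CState *client_states, int nclients)
{
    char        errbuf[256];

    for (int i = 0; i < nclients; i++)
    {
        if (client_states[i].cancel != NULL)
            (void) PQcancel(client_states[i].cancel, errbuf, sizeof(errbuf));
    }
}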

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Support logical replication of DDLs

2023-07-14 Thread Amit Kapila
On Tue, Jul 11, 2023 at 4:31 PM Zhijie Hou (Fujitsu)
 wrote:
>
> We have been researching how to create a test that detects failures resulting
> from future syntax changes, where the deparser fails to update properly.
>
> The basic idea comes from what Robert Haas suggested in [1]: when running the
> regression test(tests in parallel_schedule), we replace the executing ddl
> statement with its deparsed version and execute the deparsed statement, so
> that we can run all the regression with the deparsed statement and can expect
> the output to be the same as the existing expected/*.out. As developers
> typically add new regression tests to test new syntax, so we expect this test
> can automatically identify most of the new syntax changes.
>
> One thing to note is that when entering the event trigger(where the deparsing
> happen), the ddl have already been executed. So we need to get the deparsed
> statement before the execution and replace the current statement with it. To
> achieve this, the current approach is to create another database with deparser
> trigger and in the ProcessUtility hook(e.g. tdeparser_ProcessUtility in the
> patch) we redirect the local statement to that remote database, then the
> statement will be deparsed and stored somewhere, we can query the remote
> database to get the deparsed statement and use it to replace the original
> statement.
>
> The process looks like:
> 1) create another database and deparser event trigger in it before running 
> the regression test.
> 2) run the regression test (the statement will be redirect to the remote 
> database and get the deparsed statement)
> 3) compare the output file with the expected ones.
>
> Attach the POC patch(0004) for this approach. Note that, the new test module 
> only
> test test_setup.sql, create_index.sql, create_table.sql and alter_table.sql as
> we only support deparsing TABLE related command for now. And the current test
> cannot pass because of some bugs in deparser, we will fix these bugs soon.
>
>
> TO IMPROVE:
>
> 1. The current approach needs to handle the ERRORs, WARNINGs, and NOTICEs from
> the remote database. Currently it will directly rethrow any ERRORs encountered
> in the remote database. However, for WARNINGs and NOTICEs, we will only 
> rethrow
> them along with the ERRORs. This is done to prevent duplicate messages in the
> output file during local statement execution, which would be inconsistent with
> the existing expected output. Note that this approach may potentially miss 
> some
> bugs, as there could be additional WARNINGs or NOTICEs caused by the deparser
> in the remote database.
>
> 2. The variable reference and assignment (xxx /gset and select :var_name) will
> not be sent to the server(only the qualified value will be sent), but it's
> possible the variable in another session should be set to a different value, 
> so
> this can cause inconsistent output.
>
> 3 .CREATE INDEX CONCURRENTLY will create an invalid index internally even if 
> it
> reports an ERROR later. But since we will directly rethrow the remote ERROR in
> the main database, we won't execute the "CREATE INDEX CONCURRENTLY" in the 
> main
> database. This means that we cannot see the invalid index in the main 
> database.
>
> To improve the above points, another variety is: run the regression test 
> twice.
> The first run is solely intended to collect all the deparsed statements. We 
> can
> dump these statements from the database and then reload them in the second
> regression run. This allows us to utilize the deparsed statements to replace
> the local statements in the second regression run. This approach does not need
> to handle any remote messages and client variable stuff during execution,
> although it could take more time to finish the test.
>

I agree that this second approach can take more time but it would be
good to avoid special-purpose code the first approach needs. BTW, can
we try to evaluate the time difference between both approaches?
Anyway, in the first approach also, we need to run the test statement
twice.

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences, take 2

2023-07-14 Thread Tomas Vondra



On 7/14/23 09:34, Ashutosh Bapat wrote:
> On Thu, Jul 13, 2023 at 9:47 PM Tomas Vondra
>  wrote:
> 
>>

>>>> 6) simplified protocol versioning
>>>
>>> I had tested the cross-version logical replication with older set of
>>> patches. Didn't see any unexpected behaviour then. I will test again.

>>
>> I think the question is what's the expected behavior. What behavior did
>> you expect/observe?
> 
> Let me run my test again and respond.
> 
>>
>> IIRC with the previous version of the patch, if you connected an old
>> subscriber (without sequence replication), it just ignored/skipped the
>> sequence increments and replicated the other changes.
> 
> I liked that.
> 

I liked that too, initially (which is why I did it that way). But I
changed my mind, because it's likely to cause more harm than good.

>>
>> The new patch detects that, and triggers ERROR on the publisher. And I
>> think that's the correct thing to do.
> 
> With this behaviour users will never be able to setup logical
> replication between old and new servers considering almost every setup
> has sequences.
> 

That's not true.

Replication to older versions works fine as long as the publication does
not include sequences (which need to be added explicitly). If you have a
publication with sequences, you clearly want to replicate them, ignoring
it is just confusing "magic".

If you have a publication with sequences and still want to replicate to
an older server, create a new publication without sequences.

>>
>> There was a lengthy discussion about making this more flexible (by not
>> tying this to "linear" protocol version) and/or permissive. I tried
>> doing that by doing similar thing to decoding of 2PC, which allows
>> choosing when creating a subscription.
>>
>> But ultimately that just chooses where to throw an error - whether on
>> the publisher (in the output plugin callback) or on apply side (when
>> trying to apply change to non-existent sequence).
> 
> I had some comments on throwing error in [1], esp. towards the end.
> 

Yes. You said:

If a given output plugin doesn't implement the callbacks but
subscription specifies sequences, the code will throw an error
whether or not publication is publishing sequences.

This refers to the situation where the subscriber says "sequences" when
opening the connection. And this happens *in the plugin* which also
defines the callbacks, so I don't see how we could not have the
callbacks defined ...

Furthermore, the simplified protocol versioning does away with the
"sequences" option, so in that case this can't even happen.

>>
>> I still think it might be useful to have these "capabilities" orthogonal
>> to the protocol version, but it's a matter for a separate patch. It's
>> enough not to fail with "unknown message" on the subscriber.
> 
> Yes, We should avoid breaking replication with "unknown message".
> 
> I also agree that improving things in this area can be done in a
> separate patch, but as far as possible in this release itself.
> 
>>> If the DDL replication takes care of replicating and applying sequence
>>> changes, I think we don't need the changes tracking "transactional"
>>> sequence changes in this patch-set. That also makes a case for not
>>> adding a new field to WAL which may not be used.
>>>
>>
>> Maybe, but the DDL replication patch is not there yet, and I'm not sure
>> it's a good idea to make this patch wait for a much larger/complex
>> patch. If the DDL replication patch gets committed, it may ditch this
>> part (assuming it happens in the same development cycle).
>>
>> However, my impression was DDL replication would be optional. In which
>> case we still need to handle the transactional case, to support sequence
>> replication without DDL replication enabled.
> 
> As I said before, I don't think this patchset needs to wait for DDL
> replication patch. Let's hope that the latter lands in the same release
> and straightens out the protocol instead of carrying it forever.
> 

OK, I agree with that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Remove distprep

2023-07-14 Thread Tom Lane
Peter Eisentraut  writes:
> Ah, there was a reason.  The headerscheck and cpluspluscheck targets 
> need jsonpath_gram.h to be built first.  Which previously happened 
> indirectly somehow?  I have fixed this in the new patch version.  I also 
> fixed the issue that Álvaro reported nearby.

Have we yet run this concept past the packagers list?  I'm still
wondering if they will raise any objection to getting rid of all
the prebuilt files.

Also, I personally don't like the fact that you have removed the
distinction between distclean and maintainer-clean.  I use
distclean-and-reconfigure quite a lot to avoid having to rebuild
bison/flex outputs.  This patch seems to have destroyed that
workflow optimization in return for not much.

regards, tom lane




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-14 Thread Melih Mutlu
Hi,

Amit Kapila wrote the following on Fri, 14 Jul 2023 at 11:11:

> Yeah, it is quite surprising that Design#2 is worse than master. I
> suspect there is something wrong going on with your Design#2 patch.
> One area to check is whether apply worker is able to quickly assign
> the new relations to tablesync workers. Note that currently after the
> first time assigning the tables to workers, the apply worker may wait
> before processing the next set of tables in the main loop of
> LogicalRepApplyLoop(). The other minor point about design#2
> implementation is that you may want to first assign the allocated
> tablesync workers before trying to launch a new worker.
>

It's not actually worse than master all the time. It seems like it's just
unreliable.
Here are some consecutive runs for both designs and master.

design#1 = 1621,527 ms, 1788,533 ms, 1645,618 ms, 1702,068 ms, 1745,753 ms
design#2 = 2089,077 ms, 1864,571 ms, 4574,799 ms, 5422,217 ms, 1905,944 ms
master = 2815,138 ms, 2481,954 ms , 2594,413 ms, 2620,690 ms, 2489,323 ms

And the apply worker was not busy applying anything during these
experiments since there were no writes to the publisher. I'm not sure
how the performance would be affected if there were writes.

Thanks,
-- 
Melih Mutlu
Microsoft


Re: 16beta2 SQL parser: different defaults on absent_on_null

2023-07-14 Thread Martin Butter

Hello Daniel,

Thanks for the explanation, it sounds reasonable. I'm glad it is not a bug.

Regards,
Martin.

On 14/07/2023 10:29, Daniel Gustafsson wrote:

On 14 Jul 2023, at 07:53, Martin Butter  wrote:
While adapting a Java implementation of the SQL parser, I noticed that in 
structures JsonArrayAgg, JsonArrayConstructor, JsonArrayQueryConstructor and 
JsonObjectConstructor, the absent_on_null field defaults to TRUE.
But in JsonObjectAgg, absent_on_null defaults to FALSE.
Is that intentional?

I would say so, an empty NULL|ABSENT ON NULL clause for arrays is defined as
true, while for objects it's defined as false (which is shared between both
json_object() and json_objectagg()).

--
Daniel Gustafsson


--
Martin Butter
Developer

Splendid Data Nederland B.V.
Binnenhof 62A
1412 LC  NAARDEN

T: +31 (0)85 773 19 99
M: +31 (0)6 226 946 62
E: martin.but...@splendiddata.com

http://www.splendiddata.com/

Re: Remove distprep

2023-07-14 Thread Peter Eisentraut

On 14.07.23 09:54, Peter Eisentraut wrote:
diff --git a/src/tools/pginclude/cpluspluscheck 
b/src/tools/pginclude/cpluspluscheck

index 4e09c4686b..287395887c 100755
--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -134,6 +134,9 @@ do
  test "$f" = src/interfaces/ecpg/preproc/preproc.h && continue
  test "$f" = src/test/isolation/specparse.h && continue

+    # FIXME
+    test "$f" = src/backend/utils/adt/jsonpath_internal.h && continue
+
  # ppport.h is not under our control, so we can't make it 
standalone.

  test "$f" = src/pl/plperl/ppport.h && continue


Hm, what's that about?


Don't remember ... ;-)  I removed this.


Ah, there was a reason.  The headerscheck and cpluspluscheck targets 
need jsonpath_gram.h to be built first.  Which previously happened 
indirectly somehow?  I have fixed this in the new patch version.  I also 
fixed the issue that Álvaro reported nearby.
From 5b5e46ea28f4911408eb40936e814b4a05281baa Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 14 Jul 2023 10:53:40 +0200
Subject: [PATCH v3] Remove distprep

A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, as well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distclean.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl
---
 GNUmakefile.in|  6 +-
 config/perl.m4|  9 +-
 config/programs.m4| 21 +
 configure | 62 ++
 contrib/cube/Makefile |  7 +-
 contrib/fuzzystrmatch/Makefile|  9 +-
 contrib/seg/Makefile  |  7 +-
 doc/Makefile  |  2 +-
 doc/src/Makefile  |  2 +-
 doc/src/sgml/Makefile |  9 +-
 doc/src/sgml/installation.sgml| 83 ---
 meson.build   |  2 +-
 src/Makefile  |  5 +-
 src/Makefile.global.in| 30 +++
 src/backend/Makefile  | 65 ++-
 src/backend/bootstrap/Makefile|  6 +-
 src/backend/catalog/Makefile  | 14 +---
 src/backend/jit/llvm/Makefile |  2 +-
 src/backend/nls.mk|  2 +-
 src/backend/nodes/Makefile| 12 +--
 

Re: MERGE ... RETURNING

2023-07-14 Thread Dean Rasheed
On Thu, 13 Jul 2023 at 20:14, Jeff Davis  wrote:
>
> On Thu, 2023-07-13 at 18:01 +0100, Dean Rasheed wrote:
> > For some use cases, I can imagine allowing OLD/NEW.colname would mean
> > you wouldn't need pg_merge_action() (if the column was NOT NULL), so
> > I
> > think the features should work well together.
>
> For use cases where a user could do it either way, which would you
> expect to be the "typical" way (assuming we supported the new/old)?
>
>   MERGE ... RETURNING pg_merge_action(), id, val;
>
> or
>
>   MERGE ... RETURNING id, OLD.val, NEW.val;
>
> ?
>

I think it might depend on whether OLD.val and NEW.val were actually
required, but I think I would still probably use pg_merge_action() to
get the action, since it doesn't rely on specific table columns being
NOT NULL. It's a little like writing a trigger function that handles
multiple command types. You could use OLD and NEW to deduce whether it
was an INSERT, UPDATE or DELETE, or you could use TG_OP. I tend to use
TG_OP, but maybe there are situations where using OLD and NEW is more
natural.
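
For concreteness, a rough sketch of how I'd expect the first form to be used
(pg_merge_action() is the function proposed by this patch, and the table and
column names are invented):

  MERGE INTO target t
  USING source s ON t.id = s.id
  WHEN MATCHED THEN
      UPDATE SET val = s.val
  WHEN NOT MATCHED THEN
      INSERT (id, val) VALUES (s.id, s.val)
  RETURNING pg_merge_action(), t.id, t.val;

The action then comes back alongside each returned row, without having to
infer it from which columns happen to be NULL.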

I found a 10-year-old thread discussing adding support for OLD/NEW to
RETURNING [1], but it doesn't look like anything close to a
committable solution was developed, or even a design that might lead
to one. That's a shame, because there seemed to be a lot of demand for
the feature, but it's not clear how much effort it would be to
implement.

> I am still bothered that pg_merge_action() is so context-sensitive.
> "SELECT pg_merge_action()" by itself doesn't make any sense, but it's
> allowed in the v8 patch. We could make that a runtime error, which
> would be better, but it feels like it's structurally wrong. This is not
> an objection, but it's just making me think harder about alternatives.
>
> Maybe instead of a function it could be a special table reference like:
>
>   MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val?
>

Well, that's a little more concise, but I'm not sure that it really
buys us that much, to be worth the extra complication. Presumably
something in the planner would turn that into something the executor
could handle, which might just end up being the existing functions
anyway.

Regards,
Dean

[1] https://www.postgresql.org/message-id/flat/51822C0F.5030807%40gmail.com




RE: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

2023-07-14 Thread Hayato Kuroda (Fujitsu)
Dear Amit, Sergei,

> > I think it's appropriate to add on the restrictions page. (But mentioning 
> > that this
> restriction is only for subscriber)
> >
> > If the list were larger, then the restrictions page could be divided into 
> > publisher
> and subscriber restrictions. But not for one very specific restriction.
> >
> 
> Okay, how about something like: "The UPDATE and DELETE operations
> cannot be applied on the subscriber for the published tables that
> specify REPLICA IDENTITY FULL when the table has attributes with
> datatypes (e.g point or box) that don't have a default operator class
> for Btree or Hash. This won't be a problem if the table has a primary
> key or replica identity defined for it."?

Thanks for the discussion and suggestions. But the first sentence seems
difficult for me to read. How about the attached?
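
To make the restriction concrete, here is a minimal example of a table that
hits it (the names are made up; point has no default operator class for
btree or hash):

  CREATE TABLE geo_points (id int, location point);
  ALTER TABLE geo_points REPLICA IDENTITY FULL;
  -- UPDATE and DELETE on this table cannot be applied on the subscriber,
  -- because rows cannot be matched using the location column.

  -- Using a primary key (the default replica identity) instead avoids it:
  -- ALTER TABLE geo_points ADD PRIMARY KEY (id);
  -- ALTER TABLE geo_points REPLICA IDENTITY DEFAULT;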

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v4_add_description.patch
Description: v4_add_description.patch


Re: DROP DATABASE is interruptible

2023-07-14 Thread Daniel Gustafsson
> On 13 Jul 2023, at 22:52, Andres Freund  wrote:
> On 2023-07-12 11:54:18 +0200, Daniel Gustafsson wrote:

>> Looking more at this I wonder if we in HEAD should make this a bit nicer by
>> extending the --check phase to catch this?  I did a quick hack along these
>> lines in the 0003 commit attached here (0001 and 0002 are your unchanged
>> patches, just added for consistency and to be CFBot compatible).  If done it
>> could be a separate commit to make the 0002 patch backport cleaner of course.
> 
> I don't really have an opinion on that, tbh...

Fair enough.  Thinking more on it I think it has merits, so I will submit that
patch in its own thread on -hackers.

--
Daniel Gustafsson





Re: 16beta2 SQL parser: different defaults on absent_on_null

2023-07-14 Thread Daniel Gustafsson
> On 14 Jul 2023, at 07:53, Martin Butter  
> wrote:

> While adapting a Java implementation of the SQL parser, I noticed that in 
> structures JsonArrayAgg, JsonArrayConstructor, JsonArrayQueryConstructor and 
> JsonObjectConstructor, the absent_on_null field defaults to TRUE.
> But in JsonObjectAgg, absent_on_null defaults to FALSE.
> Is that intentional?

I would say so, an empty NULL|ABSENT ON NULL clause for arrays is defined as
true, while for objects it's defined as false (which is shared between both
json_object() and json_objectagg()).
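
A quick illustration of the difference, assuming a 16beta2 server:

  SELECT json_array(1, NULL, 2);
  -- [1, 2]                 NULL dropped: ABSENT ON NULL is the default
  SELECT json_array(1, NULL, 2 NULL ON NULL);
  -- [1, null, 2]
  SELECT json_object('a': 1, 'b': NULL);
  -- {"a" : 1, "b" : null}  NULL kept: NULL ON NULL is the default
  SELECT json_object('a': 1, 'b': NULL ABSENT ON NULL);
  -- {"a" : 1}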

--
Daniel Gustafsson





Re: Remove distprep

2023-07-14 Thread Alvaro Herrera
On 2023-Jul-14, Peter Eisentraut wrote:

> diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile
> index 9f1c4022bb..3d33b082f2 100644
> --- a/src/backend/parser/Makefile
> +++ b/src/backend/parser/Makefile
> @@ -64,8 +64,8 @@ scan.c: FLEX_FIX_WARNING=yes
>  # Force these dependencies to be known even without dependency info built:
>  gram.o scan.o parser.o: gram.h
>  
> -
> -# gram.c, gram.h, and scan.c are in the distribution tarball, so they
> -# are not cleaned here.
> -clean distclean maintainer-clean:
> +clean:
> + rm -f parser/gram.c \
> +   parser/gram.h \
> +   parser/scan.c
>   rm -f lex.backup

Hmm, this hunk and the equivalents in src/backend/bootstrap and
src/backend/replication are wrong: you moved the rule from the parent
directory's makefile to the directory where the files reside, but didn't
remove the directory name from the command arguments, so the files
aren't actually deleted.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El destino baraja y nosotros jugamos" (A. Schopenhauer)




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-14 Thread Peter Smith
Hi Kuroda-san.

Here are some review comments for the v17-0003 patch. They are all minor.

==
Commit message

1.
Previously tablesync workers establish new connections when it changes
the syncing
table, but this might have additional overhead. This patch allows to
reuse connections
instead.

~

/This patch allows to reuse connections instead./This patch allows the
existing connection to be reused./

~~~

2.
As for the publisher node, this patch allows to reuse logical
walsender processes
after the streaming is done once.

~

Is this paragraph even needed? Since the connection is reused, that
already implies the other end (the walsender) is being reused, right?

==
src/backend/replication/logical/tablesync.c

3.
+ * FIXME: set appropriate application_name. Previously, the slot name was used
+ * because the lifetime of the tablesync worker was same as that, but now the
+ * tablesync worker handles many slots during the synchronization so that it is
+ * not suitable. So what should be? Note that if the tablesync worker starts to
+ * reuse the replication slot during synchronization, we should use the slot
+ * name as application_name again.
+ */
+static void
+ApplicationNameForTablesync(Oid suboid, int worker_slot,
+ char *application_name, Size szapp)

3a.
I felt that most of this FIXME comment belongs with the calling code,
not here.

3b.
Also, maybe it needs some rewording -- I didn't understand exactly
what it is trying to say.


~~~

4.
- /*
- * Here we use the slot name instead of the subscription name as the
- * application_name, so that it is different from the leader apply worker,
- * so that synchronous replication can distinguish them.
- */
- LogRepWorkerWalRcvConn =
- walrcv_connect(MySubscription->conninfo, true,
-must_use_password,
-slotname, );
+ /* Connect to the publisher if haven't done so already. */
+ if (LogRepWorkerWalRcvConn == NULL)
+ {
+ char application_name[NAMEDATALEN];
+
+ /*
+ * The application_name must be also different from the leader apply
+ * worker because synchronous replication must distinguish them.
+ */
+ ApplicationNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->worker_slot,
+ application_name,
+ NAMEDATALEN);
+ LogRepWorkerWalRcvConn =
+ walrcv_connect(MySubscription->conninfo, true,
+must_use_password,
+application_name, );
+ }
+

Should the comment mention the "subscription name" as it did before?

SUGGESTION
The application_name must differ from the subscription name (used by
the leader apply worker) because synchronous replication has to be
able to distinguish this worker from the leader apply worker.
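
(For background: synchronous replication matches the entries in
synchronous_standby_names against application_name, and the leader apply
worker connects using the subscription name. So, hypothetically, on a
publisher whose subscriber created a subscription named 'mysub':

  ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (mysub)';
  SELECT pg_reload_conf();

makes commits wait on the leader apply worker's connection, which is why the
tablesync workers must connect under a different application_name.)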

==
src/backend/replication/logical/worker.c

5.
-start_table_sync(XLogRecPtr *origin_startpos, char **myslotname)
+start_table_sync(XLogRecPtr *origin_startpos,
+ char **myslotname)

This is a wrapping change only. It looks like an unnecessary hangover
from a previous version of 0003.

==
src/backend/replication/walsender.c

6. exec_replication_command

+
  if (cmd->kind == REPLICATION_KIND_PHYSICAL)
  StartReplication(cmd);
~

The extra blank line does not belong in this patch.

==
src/include/replication/worker_internal.h

+ /* Indicates the slot number which corresponds to this LogicalRepWorker. */
+ int worker_slot;
+

6a
I think this field is very fundamental, so IMO it should be defined at
the top of the struct, maybe nearby the other 'in_use' and
'generation' fields.

~

6b.
Also, since this is already a "worker" struct so there is no need to
have "worker" in the field name again -- just "slot_number" or
"slotnum" might be a better name.

And then the comment can also be simplified.

SUGGESTION
/* Slot number of this worker. */
int slotnum;

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-14 Thread Amit Kapila
On Thu, Jul 13, 2023 at 8:07 PM Önder Kalacı  wrote:
>
> Looks good to me. I have also done some testing, which works as 
> documented/expected.
>

Thanks, I have pushed this after minor changes in the comments.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-14 Thread Amit Kapila
On Fri, Jul 14, 2023 at 1:58 AM Melih Mutlu  wrote:
>
> Here are some quick numbers with 100 empty tables.
>
> +--++++
> |  | 2 sync workers | 4 sync workers | 8 sync workers |
> +--++++
> | POC design#1 | 1909.873 ms| 986.261 ms | 552.404 ms |
> +--++++
> | POC design#2 | 4962.208 ms| 1240.503 ms| 1165.405 ms|
> +--++++
> | master   | 2666.008 ms| 1462.012 ms| 986.848 ms |
> +--++++
>
> Seems like design#1 is better than both design#2 and master overall. It's 
> surprising to see that even master beats design#2 in some cases though. Not 
> sure if that is expected or there are some places to improve design#2 even 
> more.
>

Yeah, it is quite surprising that Design#2 is worse than master. I
suspect there is something wrong going on with your Design#2 patch.
One area to check is whether apply worker is able to quickly assign
the new relations to tablesync workers. Note that currently after the
first time assigning the tables to workers, the apply worker may wait
before processing the next set of tables in the main loop of
LogicalRepApplyLoop(). The other minor point about design#2
implementation is that you may want to first assign the allocated
tablesync workers before trying to launch a new worker.

>
> PS: I only attached the related patches and not the whole patch set. 0001 and 
> 0002 may contain some of your earlier reviews, but I'll send a proper updated 
> set soon.
>

Yeah, that would be helpful.

-- 
With Regards,
Amit Kapila.




Re: Remove distprep

2023-07-14 Thread Peter Eisentraut

On 13.07.23 01:19, Andres Freund wrote:

One thing in particular that isn't clear right now is how "make world"
should behave if the documentation tools are not found.  Maybe we should
make a build option, like "--with-docs", to mirror the meson behavior.


Isn't that somewhat unrelated to distprep?  I see that you removed missing,
but I don't really understand why as part of this commit?


Ok, I put the docs stuff back the way it was and put "missing" back in.


-# If there are any files in the source directory that we also generate in the
-# build directory, they might get preferred over the newly generated files,
-# e.g. because of a #include "file", which always will search in the current
-# directory first.
-message('checking for file conflicts between source and build directory')


You're thinking this can be removed because distclean is now reliable? There
were some pretty annoying to debug issues early on, where people switched from
an in-tree autoconf build to meson, with some files left over from the source
build, causing problems at a *later* time (when the files should have changed,
but the wrong ones were picked up).  That's not really related distprep etc,
so I'd change this separately, if we want to change it.


Ok, I kept it in.


diff --git a/src/tools/pginclude/cpluspluscheck 
b/src/tools/pginclude/cpluspluscheck
index 4e09c4686b..287395887c 100755
--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -134,6 +134,9 @@ do
test "$f" = src/interfaces/ecpg/preproc/preproc.h && continue
test "$f" = src/test/isolation/specparse.h && continue

+   # FIXME
+   test "$f" = src/backend/utils/adt/jsonpath_internal.h && continue
+
# ppport.h is not under our control, so we can't make it standalone.
test "$f" = src/pl/plperl/ppport.h && continue


Hm, what's that about?


Don't remember ... ;-)  I removed this.

Attached is a new version with the above changes, also updated for the 
recently added generate-wait_event_types.pl, and I have adjusted all the 
header file linking to use relative paths consistently.  This addresses 
all issues known to me.
From 260d055b0428130d9db96bed2298495ce7e93505 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 14 Jul 2023 09:33:09 +0200
Subject: [PATCH v2] Remove distprep

A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, as well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distclean.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were 

Re: pg_dump needs SELECT privileges on irrelevant extension table

2023-07-14 Thread Akshat Jaimini
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Passes the default cases; Does not make any trivial changes to the codebase

Re: Allowing parallel-safe initplans

2023-07-14 Thread Richard Guo
On Fri, Jul 14, 2023 at 5:44 AM Tom Lane  wrote:

> I tried both of those and concluded they'd be too messy for a patch
> that we might find ourselves having to back-patch.  So 0001 attached
> fixes it by teaching SS_finalize_plan to treat optimized MIN()/MAX()
> aggregates as if they were already Params.  It's slightly annoying
> to have knowledge of that optimization metastasizing into another
> place, but the alternatives are even less palatable.


I tried with the 0001 patch and can confirm that the wrong result issue
shown in [1] is fixed.

explain (costs off, verbose) select min(i) from a;
QUERY PLAN
---
 Gather
   Output: ($0)
   Workers Planned: 1
   Params Evaluated: $0   <-- initplan params
   Single Copy: true
   InitPlan 1 (returns $0)
 ->  Limit
   Output: a.i
   ->  Index Only Scan using a_i_j_idx on public.a
 Output: a.i
 Index Cond: (a.i IS NOT NULL)
   ->  Result
 Output: $0
(13 rows)

Now the Gather.initParam is filled and e89a71fb4 does its work to
transmit the Params to workers.

So +1 to 0001 patch.


> I'm still resistant to the idea of kluging EXPLAIN to the extent
> of hiding the EXPLAIN output changes.  It wouldn't be that hard
> to do really, but I worry that such a kluge might hide real problems
> in future.  So what I did in 0002 was to allow initPlans for an
> injected Gather only if debug_parallel_query = on, so that there
> will be a place for EXPLAIN to show them.  Other than the changes
> in that area, 0002 is the same as the previous patch.


Also +1 to 0002 patch.

[1]
https://www.postgresql.org/message-id/CAMbWs48p-WpnLdR9ZQ4QsHZP_a-P0rktAYo4Z3uOHUAkH3fjQg%40mail.gmail.com

Thanks
Richard


Re: logical decoding and replication of sequences, take 2

2023-07-14 Thread Ashutosh Bapat
On Thu, Jul 13, 2023 at 9:47 PM Tomas Vondra
 wrote:

>
> >>
> >> 6) simplified protocol versioning
> >
> > I had tested the cross-version logical replication with older set of
> > patches. Didn't see any unexpected behaviour then. I will test again.
> >>
>
> I think the question is what's the expected behavior. What behavior did
> you expect/observe?

Let me run my test again and respond.

>
> IIRC with the previous version of the patch, if you connected an old
> subscriber (without sequence replication), it just ignored/skipped the
> sequence increments and replicated the other changes.

I liked that.

>
> The new patch detects that, and triggers ERROR on the publisher. And I
> think that's the correct thing to do.

With this behaviour users will never be able to set up logical
replication between old and new servers, considering almost every setup
has sequences.

>
> There was a lengthy discussion about making this more flexible (by not
> tying this to "linear" protocol version) and/or permissive. I tried
> doing that by doing similar thing to decoding of 2PC, which allows
> choosing when creating a subscription.
>
> But ultimately that just chooses where to throw an error - whether on
> the publisher (in the output plugin callback) or on apply side (when
> trying to apply change to non-existent sequence).

I had some comments on throwing error in [1], esp. towards the end.

>
> I still think it might be useful to have these "capabilities" orthogonal
> to the protocol version, but it's a matter for a separate patch. It's
> enough not to fail with "unknown message" on the subscriber.

Yes, We should avoid breaking replication with "unknown message".

I also agree that improving things in this area can be done in a
separate patch, but as far as possible in this release itself.

> > If the DDL replication takes care of replicating and applying sequence
> > changes, I think we don't need the changes tracking "transactional"
> > sequence changes in this patch-set. That also makes a case for not
> > adding a new field to WAL which may not be used.
> >
>
> Maybe, but the DDL replication patch is not there yet, and I'm not sure
> it's a good idea to make this patch wait for a much larger/complex
> patch. If the DDL replication patch gets committed, it may ditch this
> part (assuming it happens in the same development cycle).
>
> However, my impression was DDL replication would be optional. In which
> case we still need to handle the transactional case, to support sequence
> replication without DDL replication enabled.

As I said before, I don't think this patchset needs to wait for DDL
replication patch. Let's hope that the latter lands in the same release
and straightens out the protocol instead of carrying it forever.

[1] 
https://www.postgresql.org/message-id/CAExHW5vScYKKb0RZoiNEPfbaQ60hihfuWeLuZF4JKrwPJXPcUw%40mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat




Re: Changing types of block and chunk sizes in memory contexts

2023-07-14 Thread Melih Mutlu
Hi David,

David Rowley wrote the following on Thu, 13 Jul 2023 at 08:04:

> I looked at your v2 patch. The only thing that really looked wrong
> were the (Size) casts in the context creation functions.  These should
> have been casts to uint32 rather than Size. Basically, all the casts
> do is say to the compiler "Yes, I know this could cause truncation due
> to assigning to a size smaller than the source type's size". Some
> compilers will likely warn without that and the cast will stop them.
> We know there can't be any truncation due to the Asserts. There's also
> the fundamental limitation that MemoryChunk can't store block offsets
> larger than 1GBs anyway, so things will go bad if we tried to have
> blocks bigger than 1GB.
>

Right! I don't know why I cast them to Size. Thanks for the fix.

Best,
-- 
Melih Mutlu
Microsoft


Re: logical decoding and replication of sequences, take 2

2023-07-14 Thread Ashutosh Bapat
On Thu, Jul 13, 2023 at 8:29 PM Tomas Vondra
 wrote:
>
> On 6/23/23 15:18, Ashutosh Bapat wrote:
> > ...
> >
> > I reviewed 0001 and related parts of 0004 and 0008 in detail.
> >
> > I have only one major change request, about
> > typedef struct xl_seq_rec
> > {
> > RelFileLocator locator;
> > + bool created; /* creates a new relfilenode (CREATE/ALTER) */
> >
> > I am not sure what are the repercussions of adding a member to an existing 
> > WAL
> > record. I didn't see any code which handles the old WAL format which doesn't
> > contain the "created" flag. IIUC, the logical decoding may come across
> > a WAL record written in the old format after upgrade and restart. Is
> > that not possible?
> >
>
> I don't understand why would adding a new field to xl_seq_rec be an
> issue, considering it's done in a new major version. Sure, if you
> generate WAL with old build, and start with a patched version, that
> would break things. But that's true for many other patches, and it's
> irrelevant for releases.

There are two issues
1. the name of the field "created" - what does created mean in a
"sequence status" WAL record? Consider following sequence of events
Begin;
Create sequence ('s');
select nextval('s') from generate_series(1, 1000);

...
commit

This is going to create 1000/32 WAL records with "created" = true. But
only the first one created the relfilenode. We might fix this little
annoyance by changing the name to "transactional".

2. Consider the following scenario:
v15 running logical decoding has restart_lsn before a "sequence
change" WAL record written in the old format
stop the server
upgrade to v16
logical decoding will start from restart_lsn pointing to a WAL record
written by v15. When it tries to read the "sequence change" WAL record, it
won't be able to get the "created" flag.

Am I missing something here?

>
> > But I don't think it's necessary. We can add a
> > decoding routine for RM_SMGR_ID. The decoding routine will add 
> > relfilelocator
> > in XLOG_SMGR_CREATE record to txn->sequences hash. Rest of the logic will 
> > work
> > as is. Of course we will add non-sequence relfilelocators as well but that
> > should be fine. Creating a new relfilelocator shouldn't be a frequent
> > operation. If at all we are worried about that, we can add only the
> > relfilenodes associated with sequences to the hash table.
> >
>
> Hmm, that might work. I feel a bit uneasy about having to keep all
> relfilenodes, not just sequences ...

From relfilenode it should be easy to get to the rel and then see if it's
a sequence. Only add relfilenodes for sequences.

--
Best Wishes,
Ashutosh Bapat