Re: Perform streaming logical transactions by background workers and parallel apply

2022-05-04 Thread Peter Smith
Here are my review comments for v5-0001.

I will take a look at the v5-0002 (TAP) patch another time.

==

1. Commit message

The message still refers to "apply background". Should that say "apply
background worker"?

Other parts just call this the "worker". Personally, I think it might
be better to coin some new term for this thing (e.g. "apply-bgworker"
or something like that of your choosing) so that you can just
concisely *always* refer to it everywhere without any ambiguity. The
same applies to every comment and every message in this patch; they
should all use identical terminology (e.g. "apply-bgworker").

~~~

2. Commit message

"We also need to allow stream_stop to complete by the apply background
to finish it to..."

Wording: ???

~~~

3. Commit message

This patch also extends the subscription streaming option so that user
can control whether apply the streaming transaction in a apply
background or spill the change to disk.

Wording: "user" -> "the user"
Typo: "whether apply" -> "whether to apply"
Typo: "a apply" -> "an apply"

~~~

4. Commit message

User can set the streaming option to 'on/off', 'apply'. For now,
'apply' means the streaming will be applied via a apply background if
available. 'on' means the streaming transaction will be spilled to
disk.


I think "apply" might not be the best choice of values for this
meaning, but I think Hou-san already said [1] that this was being
reconsidered.

~~~

5. doc/src/sgml/catalogs.sgml - formatting

@@ -7863,11 +7863,15 @@ SCRAM-SHA-256$iteration
count:

  
   
-   substream bool
+   substream char
   
   
-   If true, the subscription will allow streaming of in-progress
-   transactions
+   Controls how to handle the streaming of in-progress transactions.
+   f = disallow streaming of in-progress transactions
+   o = spill the changes of in-progress transactions to
+   disk and apply at once after the transaction is committed on the
+   publisher.
+   a = apply changes directly using a background worker
   
  

Needs to be consistent with other value lists on this page.

5a. The first sentence to end with ":"

5b. List items to end with ","

~~~

6. doc/src/sgml/ref/create_subscription.sgml

+ 
+  If set to apply incoming
+  changes are directly applied via one of the background worker, if
+  available. If no background worker is free to handle streaming
+  transaction then the changes are written to a file and applied after
+  the transaction is committed. Note that if error happen when applying
+  changes in background worker, it might not report the finish LSN of
+  the remote transaction in server log.
  

6a. Typo: "one of the background worker," -> "one of the background workers,"

6b. Wording
BEFORE
Note that if error happen when applying changes in background worker,
it might not report the finish LSN of the remote transaction in server
log.
SUGGESTION
Note that if an error happens when applying changes in a background
worker, it might not report the finish LSN of the remote transaction
in the server log.

~~~

7. src/backend/commands/subscriptioncmds.c - defGetStreamingMode

+static char
+defGetStreamingMode(DefElem *def)
+{
+ /*
+ * If no parameter given, assume "true" is meant.
+ */
+ if (def->arg == NULL)
+ return SUBSTREAM_ON;

But is that right? IIUC all the docs said that the default is OFF.

~~~

8. src/backend/commands/subscriptioncmds.c - defGetStreamingMode

+ /*
+ * The set of strings accepted here should match up with the
+ * grammar's opt_boolean_or_string production.
+ */
+ if (pg_strcasecmp(sval, "true") == 0 ||
+ pg_strcasecmp(sval, "on") == 0)
+ return SUBSTREAM_ON;
+ if (pg_strcasecmp(sval, "apply") == 0)
+ return SUBSTREAM_APPLY;
+ if (pg_strcasecmp(sval, "false") == 0 ||
+ pg_strcasecmp(sval, "off") == 0)
+ return SUBSTREAM_OFF;

Perhaps should re-order these OFF/ON/APPLY to be consistent with the
T_Integer case above here.
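
For illustration, the same checks in that suggested order might look like:

if (pg_strcasecmp(sval, "false") == 0 ||
    pg_strcasecmp(sval, "off") == 0)
    return SUBSTREAM_OFF;
if (pg_strcasecmp(sval, "true") == 0 ||
    pg_strcasecmp(sval, "on") == 0)
    return SUBSTREAM_ON;
if (pg_strcasecmp(sval, "apply") == 0)
    return SUBSTREAM_APPLY;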

~~~

9. src/backend/replication/logical/launcher.c - logicalrep_worker_launch

The "start new apply background worker ..." function comment feels a
bit misleading now that seems what you are calling this new kind of
worker. E.g. this is also called to start the sync worker. And also
for the apply worker (which we are not really calling a "background
worker" in other places). This comment is the same as [PSv4] #19.

~~~

10. src/backend/replication/logical/launcher.c - logicalrep_worker_launch

@@ -275,6 +280,9 @@ logicalrep_worker_launch(Oid dbid, Oid subid,
const char *subname, Oid userid,
  int nsyncworkers;
  TimestampTz now;

+ /* We don't support table sync in subworker */
+ Assert(!((subworker_dsm != DSM_HANDLE_INVALID) && OidIsValid(relid)));

I think you should declare a new variable like:
bool is_subworker = subworker_dsm != DSM_HANDLE_INVALID;

Then this Assert can be simplified, and also you can re-use the
'is_subworker' later multiple times in 
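
A minimal sketch of the suggested simplification (variable name as above,
rest of logicalrep_worker_launch() elided):

    bool    is_subworker = (subworker_dsm != DSM_HANDLE_INVALID);

    /* We don't support table sync in subworker */
    Assert(!(is_subworker && OidIsValid(relid)));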

Re: Skipping schema changes in publication

2022-05-04 Thread Amit Kapila
On Thu, May 5, 2022 at 9:20 AM Amit Kapila  wrote:
>
> On Wed, May 4, 2022 at 7:05 PM Peter Eisentraut
>  wrote:
> >
> > On 14.04.22 15:47, Peter Eisentraut wrote:
> > > That said, I'm not sure this feature is worth the trouble.  If this is
> > > useful, what about "whole database except these schemas"?  What about
> > > "create this database from this template except these schemas".  This
> > > could get out of hand.  I think we should encourage users to group their
> > > object the way they want and not offer these complicated negative
> > > selection mechanisms.
> >
> > Another problem in general with this "all except these" way of
> > specifying things is that you need to track negative dependencies.
> >
> > For example, assume you can't add a table to a publication unless it has
> > a replica identity.  Now, if you have a publication p1 that says
> > includes "all tables except t1", you now have to check p1 whenever a new
> > table is created, even though the new table has no direct dependency
> > link with p1.  So in more general cases, you would have to check all
> > existing objects to see whether their specification is in conflict with
> > the new object being created.
> >
>
> Yes, I think we should avoid adding such negative dependencies. We
> have carefully avoided such dependencies during row filter, column
> list work where we don't try to perform DDL time verification.
> However, it is not clear to me how this proposal is related to this
> example or in general about tracking negative dependencies?
>

I mean to say that even if we have such a restriction, it would apply
to "for all tables" or other publications as well. In your example,
consider one wants to Alter a table and remove its replica identity,
we have to check whether the table is part of any publication similar
to what we are doing for relation persistence in
ATPrepChangePersistence.

> AFAIR, we
> currently have such a check while changing persistence of logged table
> (logged to unlogged, see ATPrepChangePersistence) where we cannot
> allow changing persistence if that relation is part of some
> publication. But as per my understanding, this feature shouldn't add
> any such new dependencies. I agree that we have to ensure that
> existing checks shouldn't break due to this feature.
>
> > Now publications don't actually work that way, so it's not a real
> > problem right now, but similar things could work like that.  So I think
> > it's worth thinking this through a bit.
> >
>
> This is a good point and I agree that we should be careful to not add
> some new negative dependencies unless it is really required but I
> can't see how this proposal will make it more prone to such checks.
>

-- 
With Regards,
Amit Kapila.




Re: Proposal for internal Numeric to Uint64 conversion function.

2022-05-04 Thread Amul Sul
On Tue, May 3, 2022 at 8:04 PM Peter Eisentraut
 wrote:
>
> On 03.05.22 08:50, Amul Sul wrote:
> >> Do you have any data that supports removing DirectionFunctionCall()
> >> invocations?  I suppose some performance benefit could be expected, or
> >> what do you have in mind?
> >>
> > Not really, the suggestion to avoid DirectionFunctionCall() is from Tom.
> >
> > For a trial, I have called the money(numeric) function 10M times to
> > see the difference with and without patch but there is not much.
> > (I don't have any knowledge of micro profiling/benchmarking).
>
> Ok.  I have lost track of what you are trying to achieve with this patch
> set.  It's apparently not for performance, and in terms of refactoring
> you end up with more lines of code than before, so that doesn't seem too
> appealing either.  So I'm not sure what the end goal is.
>

Well, that is still worth it, IMHO.

Refactoring allows us to place numeric_pg_lsn in the correct file,
where it should be. This refactoring also gives us a function that
converts numeric to uint64. The same function is used in other routines
of pg_lsn.c, which otherwise use DirectFunctionCall().

Refactoring numeric_int8() gives us a function to convert numeric
to int64 which can be used in many places without using
DirectFunctionCall(), and that is not the sole reason for it -- we have
a function that converts int64 to numeric, but the opposite
conversion needs to use DirectFunctionCall(), which doesn't make
sense.

Along with this refactoring, there are various routines calling
functions for the numeric arithmetic operations through
DirectFunctionCall(), which isn't necessary since the required
arithmetic functions are already accessible.

The last benefit, which is not completely worthless, is that avoiding
DirectFunctionCall() means one less function call on the stack and one
less piece of work the system has to do for the same task.

Regards,
Amul




Re: Skipping schema changes in publication

2022-05-04 Thread Amit Kapila
On Wed, May 4, 2022 at 7:05 PM Peter Eisentraut
 wrote:
>
> On 14.04.22 15:47, Peter Eisentraut wrote:
> > That said, I'm not sure this feature is worth the trouble.  If this is
> > useful, what about "whole database except these schemas"?  What about
> > "create this database from this template except these schemas".  This
> > could get out of hand.  I think we should encourage users to group their
> > object the way they want and not offer these complicated negative
> > selection mechanisms.
>
> Another problem in general with this "all except these" way of
> specifying things is that you need to track negative dependencies.
>
> For example, assume you can't add a table to a publication unless it has
> a replica identity.  Now, if you have a publication p1 that says
> includes "all tables except t1", you now have to check p1 whenever a new
> table is created, even though the new table has no direct dependency
> link with p1.  So in more general cases, you would have to check all
> existing objects to see whether their specification is in conflict with
> the new object being created.
>

Yes, I think we should avoid adding such negative dependencies. We
have carefully avoided such dependencies during row filter, column
list work where we don't try to perform DDL time verification.
However, it is not clear to me how this proposal is related to this
example or in general about tracking negative dependencies? AFAIR, we
currently have such a check while changing persistence of logged table
(logged to unlogged, see ATPrepChangePersistence) where we cannot
allow changing persistence if that relation is part of some
publication. But as per my understanding, this feature shouldn't add
any such new dependencies. I agree that we have to ensure that
existing checks shouldn't break due to this feature.

> Now publications don't actually work that way, so it's not a real
> problem right now, but similar things could work like that.  So I think
> it's worth thinking this through a bit.
>

This is a good point and I agree that we should be careful to not add
some new negative dependencies unless it is really required but I
can't see how this proposal will make it more prone to such checks.

-- 
With Regards,
Amit Kapila.




Re: pg_stat_statements

2022-05-04 Thread Julien Rouhaud
Hi,

On Tue, May 03, 2022 at 01:30:32PM +, Godfrin, Philippe E wrote:
>
> I wasn't exactly clear about the queries. The values clauses themselves are 
> not long -
> We are using repeated values clauses:
>
> INSERT INTO timeseries.dvc_104 (tag_id, event_ts, bool_val, float_val, 
> int_val, string_val, last_updt_id)
> VALUES 
> ($1,$2,$3,$4,$5,$6,$7),($8,$9,$10,$11,$12,$13,$14),($15,$16,$17,$18,$19,$20,$21),
> ($22,$23,$24,$25,$26,$27,$28),($29,$30,$31,$32,$33,$34,$35),($36,$37,$38,$39,$40,$41,$42),
> ($43,$44,$45,$46,$47,$48,$49),($50,$51,$52,$53,$54,$55,$56),($57,$58,$59,$60,$61,$62,$63),
> ($64,$65,$66,$67,$68,$69,$70),($71,$72,$73,$74,$75,$76,$77),($78,$79,$80,$81,$82,$83,$84),
> ($85,$86,$87,$88,$89,$90,$91),($92,$93,$94,$95,$96,$97,$98)
>
> This one's not long, but some 'load statements' have 10,000 values clauses, 
> others add up to 10,000 more
> in an ON CONFLICT clause. I've checked the external Query file and it's 
> currently not large
> at all. But I will keep an eye on that. When I had The settings at 1000 
> statements
> the file was indeed over 1GB. For the record, development is reducing those 
> statement
> lengths.
> [...]
> The first observation is how long a simple query took:
>
> # select count(*) from pg_stat_statements;
> count
> ---
>971
> Time: 6457.985 ms (00:06.458)
>
> MORE than six seconds for a mere 971 rows! Furthermore, when removing the 
> long queries:
> # select count(*) from pg_stat_statements(showtext:=false);
> count
> ---
>970
> Time: 10.644 ms
>
> Only 10ms...

Well, 10ms is still quite slow.

You're not removing the long query texts, you're removing all query texts.
I don't know if the overhead comes from processing at least some long
statements or is mostly due to having to retrieve the query file.  Do you get
the same times if you run the query twice?  Maybe you're short on RAM and have
somewhat slow disks, and the text file has to be read from disk rather than OS
cache?

Also I don't know what you mean by "not large at all", so it's hard to compare
or try to reproduce.  FWIW on some instance I have around, I have a 140kB file
and querying pg_stat_statements *with* the query text file only takes a few ms.

You could try to query that view with some unprivileged user.  This way you
will still retrieve the query text file but will only emit "<insufficient
privilege>" rather than processing the query texts, which may narrow down the
problem.  Or better, if you could run perf [1] to see where the overhead really
is.

> Second, we have Datadog installed. Datadoq queries the pg_stat_statements 
> table
> every 10 seconds. The real pain point is querying the pg_stat_statements seems
> to have an impact on running queries, specifically inserts in my case.

I think this is a side effect of having a very low pg_stat_statements.max, if
of course you have more queries than the current value.

If the extra time is due to loading the query text file and if it's loaded
after acquiring the lightweight lock, then you will prevent evicting or
creating new entries for a long time, which means that the query execution for
those queries will be blocked until the query on pg_stat_statements ends.

There are unfortunately *a lot* of unknowns here, so I can't do anything apart
from guessing.

> I believe this is an actual impact that needs a solution.

First, if you have an OLTP workload you have to make sure that
pg_stat_statements.max is high enough so that you don't have to evict entries,
or at least not often.  Then, I think that querying pg_stat_statements every
10s is *really* aggressive, that's always going to have some noticeable
overhead.  For the rest, we need more information to understand where the
slowdown is coming from.

[1] https://wiki.postgresql.org/wiki/Profiling_with_perf




Re: Atomic GetFreeIndexPage()?

2022-05-04 Thread Peter Geoghegan
On Wed, May 4, 2022 at 12:16 PM Chris Cleveland
 wrote:
> Similar code is repeated in a bunch of places. Each access method has to 
> explicitly write something into a freed page that identifies it as ok to use.

I wouldn't say that that's what this code is doing, though I do see
what you mean. The reason why the ultimate consumer of the free page
writes to it is because... well, it wouldn't have asked for a page if
it didn't have something to write.

Code like GinPageIsRecyclable() is generally concerned with something
called recycle safety, typically using something called the drain
technique. That's what this code is primarily concerned about. The
definition of recycle safety is documented in the nbtree README.

> Otherwise, you could have two processes get the same page, then one locks it, 
> writes it, and unlocks it, then the second one does the same clobbering the 
> first.

But you could have that anyway, since the FSM isn't crash safe. It's
inherently not completely trustworthy for that reason.

Actually, the FSM shouldn't really contain a page that is !
GinPageIsRecyclable() to begin with, but that is still possible for a
variety of reasons. All of which boil down to "the current FSM design
cannot be totally trusted, so we verify redundantly".

-- 
Peter Geoghegan




Re: Costing elided SubqueryScans more nearly correctly

2022-05-04 Thread Tom Lane
I wrote:
> I instrumented the code in setrefs.c, and found that during the
> core regression tests this patch estimates correctly in 2103
> places while guessing wrongly in 54, so that seems like a pretty
> good step forward.

On second thought, that's not a terribly helpful summary.  Breaking
things down to the next level, there were

   1088 places where we correctly guessed a subquery isn't trivial
(so no change from current behavior, which is correct)

   1015 places where we correctly guessed a subquery is trivial
(hence, improving the cost estimate from before)

 40 places where we incorrectly guessed a subquery isn't trivial
(so no change from current behavior, although that's wrong)

 14 places where we incorrectly guessed a subquery is trivial
(hence, incorrectly charging zero for the SubqueryScan)

1015 improvements to 14 disimprovements isn't a bad score.  I'm
a bit surprised there are that many removable SubqueryScans TBH;
maybe that's an artifact of all the "SELECT *" queries.

regards, tom lane




Costing elided SubqueryScans more nearly correctly

2022-05-04 Thread Tom Lane
In [1] I complained about how SubqueryScans that get deleted from
a plan tree by setrefs.c nonetheless contribute cost increments
that might cause the planner to make odd choices.  That turned
out not to be the proximate cause of that particular issue, but
it still seems like it might be a good idea to do something about
it.  Here's a little patch to improve matters.

It turns out to be hard to predict perfectly whether setrefs.c will
remove a SubqueryScan, because createplan.c plays some games with
moving tlist evaluations around and/or inserting "physical"
(i.e. trivial) tlists, which can falsify any earlier estimate of
whether a SubqueryScan is trivial.  I'm not especially interested in
redesigning those mechanisms, so the best we can hope for is an
approximate determination.  (Those same behaviors also make a lot of
other path cost estimates a bit incorrect, so it doesn't seem like
we need to feel too awful about not getting SubqueryScan perfect.)

Given that ground rule, however, it's not very hard to determine
whether a SubqueryScanPath looks like it will be trivial and change
its costing accordingly.  The attached draft patch does that.

I instrumented the code in setrefs.c, and found that during the
core regression tests this patch estimates correctly in 2103
places while guessing wrongly in 54, so that seems like a pretty
good step forward.

Perhaps I overcomplicated matters by making the callers responsible
for determining triviality of the paths' targets.  We could just
make cost_subqueryscan check that for itself (using logic similar
to what I wrote in set_subquery_pathlist), but that'd result in
duplicative calculations anytime we make more than one Path for a
subquery.  On the other hand, said calculations wouldn't be that
expensive, so perhaps a more localized patch would be better.

I also notice that setrefs.c can elide Append and MergeAppend nodes
too in some cases, but AFAICS costing of those node types doesn't
take that into account.

Anyway, I'll stick this in the queue for v16.

regards, tom lane

[1] https://www.postgresql.org/message-id/328872.1651247595%40sss.pgh.pa.us

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d84f66a81b..08f4d125e1 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2439,6 +2439,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 {
 	Query	   *parse = root->parse;
 	Query	   *subquery = rte->subquery;
+	bool		trivial_pathtarget;
 	Relids		required_outer;
 	pushdown_safety_info safetyInfo;
 	double		tuple_fraction;
@@ -2597,6 +2598,36 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	 */
 	set_subquery_size_estimates(root, rel);
 
+	/*
+	 * Also detect whether the reltarget is trivial, so that we can pass that
+	 * info to cost_subqueryscan (rather than re-deriving it multiple times).
+	 * It's trivial if it fetches all the subplan output columns in order.
+	 */
+	if (list_length(rel->reltarget->exprs) != list_length(subquery->targetList))
+		trivial_pathtarget = false;
+	else
+	{
+		trivial_pathtarget = true;
+		foreach(lc, rel->reltarget->exprs)
+		{
+			Node	   *node = (Node *) lfirst(lc);
+			Var		   *var;
+
+			if (!IsA(node, Var))
+			{
+trivial_pathtarget = false;
+break;
+			}
+			var = (Var *) node;
+			if (var->varno != rti ||
+var->varattno != foreach_current_index(lc) + 1)
+			{
+trivial_pathtarget = false;
+break;
+			}
+		}
+	}
+
 	/*
 	 * For each Path that subquery_planner produced, make a SubqueryScanPath
 	 * in the outer query.
@@ -2615,6 +2646,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		/* Generate outer path using this subpath */
 		add_path(rel, (Path *)
  create_subqueryscan_path(root, rel, subpath,
+		  trivial_pathtarget,
 		  pathkeys, required_outer));
 	}
 
@@ -2640,6 +2672,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 			/* Generate outer path using this subpath */
 			add_partial_path(rel, (Path *)
 			 create_subqueryscan_path(root, rel, subpath,
+	  trivial_pathtarget,
 	  pathkeys,
 	  required_outer));
 		}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 6673d271c2..efb58f083a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1388,10 +1388,12 @@ cost_tidrangescan(Path *path, PlannerInfo *root,
  *
  * 'baserel' is the relation to be scanned
  * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'trivial_pathtarget' is true if the pathtarget is believed to be trivial.
  */
 void
 cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
-  RelOptInfo *baserel, ParamPathInfo *param_info)
+  RelOptInfo *baserel, ParamPathInfo *param_info,
+  bool trivial_pathtarget)
 {
 	Cost		startup_cost;
 	Cost		run_cost;
@@ -1431,6 

Re: SQL/JSON: FOR ORDINALITY bug

2022-05-04 Thread David G. Johnston
On Wed, May 4, 2022 at 1:43 PM David G. Johnston 
wrote:

> On Wed, May 4, 2022 at 1:09 PM Erik Rijkers  wrote:
>
>> Op 04-05-2022 om 21:12 schreef Andrew Dunstan:
>> >
>> 
>>  I don't see how rowseq can be anything but 1.  Each invocation of
>> >>
>> >>
>> >> After some further experimentation, I now think you must be right,
>> David.
>> >>
>> >> Also, looking at the DB2 docs:
>> >>https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
>> >>  (see especially under 'Handling nested information')
>>
>> You've probably noticed then that on that same page under 'Sibling
>> Nesting' is a statement that gives a 13-row resultset on DB2 whereas in
>> 15devel that statement yields just 10 rows.  I don't know which is
>> correct.
>>
>>
> There should be 12 results (minimum would be 8 - 5 of which are used for
> real matches, plus 4 new row producing matches).
>
> Our result seems internally inconsistent; conceptually there are two kinds
> of nulls here and we cannot collapse them.
>


> null-val: we are outputting the record from the nested path but there is
> no actual value to output so we output null-val
> null-union: we are not outputting the record for the nested path (we are
> doing a different one) but we need to output something for this column so
> we output null-union.
>
>
Thinking this over - I think the difference is we implemented a FULL OUTER
JOIN to combine the siblings - including the behavior of that construct and
the absence of rows.  DB2 took the word "UNION" for the plan modifier
literally and unioned (actually union all) the two subpaths together using
the null concepts above (though somehow ensuring that at least one row was
produced from each subpath...).

Thus we are indeed back to seeing whether the standard defines sibling
combining as union or join, or some other special construct.  I'm now
leaning toward what we've done as at least being the more sane option.

Even if our outer join process is correct, the existing wording is odd.

"Use FULL OUTER JOIN ON FALSE, so that both parent and child rows are
included into the output, with NULL values inserted into both child and
parent columns for all missing values."

I don't think it helps to mention the parent here.  This aspect of the plan
doesn't concern itself with the final output, only the output of the subplan,
which is then combined with the parent using a join.  I would probably want
to phrase the default more like:

"This is the default option for joining the combined child rows to the
parent."

David J.


Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-05-04 Thread David Zhang

Thanks a lot for the patch update.

On 2022-05-02 1:25 a.m., Etsuro Fujita wrote:

Hi,

On Wed, Apr 20, 2022 at 4:55 AM David Zhang  wrote:

I tried to apply the patch to master and plan to run some tests, but got
below errors due to other commits.

I rebased the patch against HEAD.  Attached is an updated patch.


Applied the v8 patch to the master branch today, and `make check` is OK.
I also repeated the previous performance tests on three virtual Ubuntu
18.04 machines, and the performance improvement of parallel abort,
averaged over 10 runs, is more consistent.


before:

    abort.1 = 2.6344 ms
    abort.2 = 4.2799 ms

after:

    abort.1 = 1.4105 ms
    abort.2 = 2.2075 ms


+ * remote server in parallel at (sub)transaction end.

Here, I think the comment above could potentially apply to multiple
remote server(s).

I agree on that point, but I think it's correct to say "the remote
server" here, because we determine this for the given remote server.
Maybe I'm missing something, so could you elaborate on it?
Your understanding is correct. I was thinking `remote server(s)` would 
be easier for end users to understand, but this is a comment in source code, 
so either way is fine with me.

Not sure if there is a way to avoid repeated comments? For example, the
same comment below appears in two places (line 231 and line 296).

+/*
+ * If requested, consume whatever data is available from the socket.
+ * (Note that if all data is available, this allows
+ * pgfdw_get_cleanup_result to call PQgetResult without forcing the
+ * overhead of WaitLatchOrSocket, which would be large compared to the
+ * overhead of PQconsumeInput.)
+ */

I think it's OK to have this in multiple places, because 1) IMO it
wouldn't be that long, and 2) we already duplicated comments like this
in the same file in v14 and earlier.  Here is such an example in
pgfdw_xact_callback() and pgfdw_subxact_callback() in that file in
those versions:

 /*
  * If a command has been submitted to the remote server by
  * using an asynchronous execution function, the command
  * might not have yet completed.  Check to see if a
  * command is still being processed by the remote server,
  * and if so, request cancellation of the command.
  */

Thanks for reviewing!  Sorry for the delay.

Best regards,
Etsuro Fujita

--
Best regards,
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: SQL/JSON: FOR ORDINALITY bug

2022-05-04 Thread David G. Johnston
On Wed, May 4, 2022 at 1:09 PM Erik Rijkers  wrote:

> Op 04-05-2022 om 21:12 schreef Andrew Dunstan:
> >
> 
>  I don't see how rowseq can be anything but 1.  Each invocation of
> >>
> >>
> >> After some further experimentation, I now think you must be right,
> David.
> >>
> >> Also, looking at the DB2 docs:
> >>https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
> >>  (see especially under 'Handling nested information')
> >>
> >> There, I gathered some example data + statements where one is the case
> >> at hand.  I also made them runnable under postgres (attached).
> >>
> >> I thought that was an instructive example, with those
> >> 'outer_ordinality' and 'inner_ordinality' columns.
> >>
> >>
> >
> > Yeah, I just reviewed the latest version of that page (7.5) and the
> > example seems fairly plain that we are doing the right thing, or if not
> > we're in pretty good company, so I guess this is probably a false alarm.
> > Looks like ordinality is for the number of the element produced by the
> > path expression. So a path of 'lax $' should just produce ordinality of
> > 1 in each case, while a path of 'lax $[*]' will produce increasing
> > ordinality for each element of the root array.
>
> Agreed.
>
> You've probably noticed then that on that same page under 'Sibling
> Nesting' is a statement that gives a 13-row resultset on DB2 whereas in
> 15devel that statement yields just 10 rows.  I don't know which is correct.
>
>
There should be 12 results (minimum would be 8 - 5 of which are used for
real matches, plus 4 new row producing matches).

Our result seems internally inconsistent; conceptually there are two kinds
of nulls here and we cannot collapse them.

null-val: we are outputting the record from the nested path but there is no
actual value to output so we output null-val
null-union: we are not outputting the record for the nested path (we are
doing a different one) but we need to output something for this column so
we output null-union.

Sally, null-val, null-union
Sally, null-union, null-val

We only have one Sally but need both (11)

We are also missing:

Mary, null-union, null-val (12)

The fact that we agree on John means that we at least agree on UNION
meaning we output a pair of rows when there are two nested paths.

I point to relative comparisons for fear of reading the specification
here...

David J.


Re: SQL/JSON: FOR ORDINALITY bug

2022-05-04 Thread Erik Rijkers

Op 04-05-2022 om 21:12 schreef Andrew Dunstan:




I don't see how rowseq can be anything but 1.  Each invocation of



After some further experimentation, I now think you must be right, David.

Also, looking at the DB2 docs:
   https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
     (see especially under 'Handling nested information')

There, I gathered some example data + statements where one is the case
at hand.  I also made them runnable under postgres (attached).

I thought that was an instructive example, with those
'outer_ordinality' and 'inner_ordinality' columns.




Yeah, I just reviewed the latest version of that page (7.5) and the
example seems fairly plain that we are doing the right thing, or if not
we're in pretty good company, so I guess this is probably a false alarm.
Looks like ordinality is for the number of the element produced by the
path expression. So a path of 'lax $' should just produce ordinality of
1 in each case, while a path of 'lax $[*]' will produce increasing
ordinality for each element of the root array.


Agreed.

You've probably noticed then that on that same page under 'Sibling 
Nesting' is a statement that gives a 13-row resultset on DB2 whereas in 
15devel that statement yields just 10 rows.  I don't know which is correct.



Erik





cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com






Re: Did we intend to change whether PUBLIC can create tables in the public schema by default?

2022-05-04 Thread David G. Johnston
On Wed, May 4, 2022 at 12:42 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> Hey,
>
> For the following sequence of commands, on a newly initdb v15devel and
> mostly clean v13 I get a failure and a created table respectively.
>
>
Apparently I didn't search commit history well enough the first time...

https://github.com/postgres/postgres/commit/b073c3ccd06e4cb845e121387a43faa8c68a7b62

Sorry for the noise.

David J.


Did we intend to change whether PUBLIC can create tables in the public schema by default?

2022-05-04 Thread David G. Johnston
Hey,

For the following sequence of commands, on a newly initdb v15devel and
mostly clean v13 I get a failure and a created table respectively.

Showing v15devel:

postgres=# create database testdb;
CREATE DATABASE
postgres=# create role testrole;
CREATE ROLE
postgres=# \c testdb
You are now connected to database "testdb" as user "vagrant".
testdb=# set session authorization testrole;
SET
testdb=> create table public.testtable(id int);
ERROR:  permission denied for schema public
LINE 1: create table public.testtable(id int);
testdb=> select version();
 version
--
 PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)


===
v13.6  (I also have a report this is the behavior of v14)

postgres=# create database testdb;
CREATE DATABASE
postgres=# create role testrole;
CREATE ROLE
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# select version();
 version
--
 PostgreSQL 13.6 (Ubuntu 13.6-1.pgdg20.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
(1 row)

testdb=# set session authorization testrole;
SET
testdb=> create table public.testtable (id int);
CREATE TABLE


David J.


Re: Atomic GetFreeIndexPage()?

2022-05-04 Thread Tom Lane
Chris Cleveland  writes:
> Would it make sense to make GetFreeIndexPage() atomic?

I strongly doubt it.  The loss of concurrency would outweigh any
code-simplicity benefits.

regards, tom lane




Re: Query generates infinite loop

2022-05-04 Thread Tom Lane
Jeff Janes  writes:
> The regression test you added for this change causes an infinite loop when
> run against an unpatched server with --install-check.  That is a bit
> unpleasant.  Is there something we can and should do about that?  I was
> expecting regression test failures of course but not an infinite loop
> leading towards disk exhaustion.

We very often add regression test cases that will cause unpleasant
failures on unpatched code.  I categorically reject the idea that
that's not a good thing, and question why you think that running
known-broken code against a regression suite is an important use case.

regards, tom lane




Atomic GetFreeIndexPage()?

2022-05-04 Thread Chris Cleveland
Would it make sense to make GetFreeIndexPage() atomic?

Internally, GetFreeIndexPage() calls GetPageWithFreeSpace() and then
RecordUsedIndexPage() in two separate operations. It's possible for two
different processes to get the same free page at the same time.

To guard against this, there are several index access methods that do
something like this:

/* First, try to get a page from FSM */
for (;;)
{
    BlockNumber blkno = GetFreeIndexPage(index);

    if (blkno == InvalidBlockNumber)
        break;

    buffer = ReadBuffer(index, blkno);

    /*
     * We have to guard against the possibility that someone else already
     * recycled this page; the buffer may be locked if so.
     */
    if (ConditionalLockBuffer(buffer))
    {
        if (GinPageIsRecyclable(BufferGetPage(buffer)))
            return buffer;      /* OK to use */

        LockBuffer(buffer, GIN_UNLOCK);
    }

    /* Can't use it, so release buffer and try again */
    ReleaseBuffer(buffer);
}

Similar code is repeated in a bunch of places. Each access method has to
explicitly write something into a freed page that identifies it as ok to
use. Otherwise, you could have two processes get the same page, then one
locks it, writes it, and unlocks it, then the second one does the same
clobbering the first.

All this logic goes away if GetFreeIndexPage() is atomic, such that finding
and marking a page as not-free always happens in one go. No longer would
the index pages themselves need to be modified to identify them as
available.

I'm thinking of surrounding the code that calls GetFreeIndexPage() with a
lightweight lock, although I wonder if that would harm performance. Perhaps
there is a more performant way to do this deep down in the FSM code.
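
For illustration only, a minimal sketch of that idea; the lock
("IndexFSMLock") is hypothetical -- no such lock exists today -- and a real
patch would more likely push the locking down into the FSM code itself:

#include "postgres.h"

#include "storage/indexfsm.h"
#include "storage/lwlock.h"
#include "utils/rel.h"

/* Hypothetical lock, used only for this sketch */
extern LWLock *IndexFSMLock;

/*
 * Find a free index page and mark it used while holding one lock, so that
 * no two backends can be handed the same block by the FSM.
 */
static BlockNumber
GetFreeIndexPageAtomic(Relation index)
{
    BlockNumber blkno;

    LWLockAcquire(IndexFSMLock, LW_EXCLUSIVE);
    /* GetFreeIndexPage() = GetPageWithFreeSpace() + RecordUsedIndexPage() */
    blkno = GetFreeIndexPage(index);
    LWLockRelease(IndexFSMLock);

    return blkno;
}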

Thoughts?

-- 
Chris Cleveland
312-339-2677 mobile


Re: JSON Functions and Operators Docs for v15

2022-05-04 Thread Andrew Dunstan


On 2022-05-04 We 11:39, Tom Lane wrote:
> "David G. Johnston"  writes:
>> Is there a thread I'm not finding where the upcoming JSON function
>> documentation is being made reasonably usable after doubling its size with
>> all the new JSON Table features that we've added?  If nothing else, the
>> table of contents at the top of the page needs to be greatly expanded to
>> make seeing and navigating to all that is available a possibility.
> The entire structure of that text needs to be rethought, IMO, as it
> has been written with precisely no concern for fitting into our
> hard-won structure for func.sgml.  Andrew muttered something about
> rewriting it awhile ago, but I don't know what progress he's made.
>

Yes, I've been clearing the decks a bit, but I'm working on it now,
should have something within the next week.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: SQL/JSON: FOR ORDINALITY bug

2022-05-04 Thread Andrew Dunstan


On 2022-05-04 We 10:39, Erik Rijkers wrote:
> Op 04-05-2022 om 13:55 schreef Andrew Dunstan:
>>
>> On 2022-05-03 Tu 20:39, David G. Johnston wrote:
>>> On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan 
>>> wrote:
>>>
>>>
>>>  On 2022-05-03 Tu 11:19, Erik Rijkers wrote:
>>>  > Hi
>>>  >
>>>  > I've copied some statements from the .pdf called:
>>>  > "TECHNICAL REPORT ISO/IEC TR 19075-6   First edition 2017-03
>>>  > Part SQL Notation support 6: (JSON) for JavaScript Object"
>>>  > (not available anymore although there should be a similar
>>>  replacement
>>>  > file)
>>>  >
>>>  > In that pdf I found the data and statement (called 'table 15'
>>> in the
>>>  > .pdf) as in the attached bash file.  But the result is
>>> different: as
>>>  > implemented by 15devel, the column rowseq is always 1.  It seems
>>>  to me
>>>  > that that is wrong; it should count 1, 2, 3 as indeed the
>>>  > example-result column in that pdf shows.
>>>  >
>>>  > What do you think?
>>>  >
>>>  >
>>>
>>>  Possibly.
>>>
>>>
>>> I don't see how rowseq can be anything but 1.  Each invocation of
>
>
> After some further experimentation, I now think you must be right, David.
>
> Also, looking at the DB2 docs:
>   https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
>     (see especially under 'Handling nested information')
>
> There, I gathered some example data + statements where one is the case
> at hand.  I also made them runnable under postgres (attached).
>
> I thought that was an instructive example, with those
> 'outer_ordinality' and 'inner_ordinality' columns.
>
>

Yeah, I just reviewed the latest version of that page (7.5) and the
example seems fairly plain that we are doing the right thing, or if not
we're in pretty good company, so I guess this is probably a false alarm.
Looks like ordinality is for the number of the element produced by the
path expression. So a path of 'lax $' should just produce ordinality of
1 in each case, while a path of 'lax $[*]' will produce increasing
ordinality for each element of the root array.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Query generates infinite loop

2022-05-04 Thread Jeff Janes
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane  wrote:

> I wrote:
> > it's true that infinities as generate_series endpoints are going
> > to work pretty oddly, so I agree with the idea of forbidding 'em.
>
> > Numeric has infinity as of late, so the numeric variant would
> > need to do this too.
>
> Oh --- looks like numeric generate_series() already throws error for
> this, so we should just make the timestamp variants do the same.
>

The regression test you added for this change causes an infinite loop when
run against an unpatched server with --install-check.  That is a bit
unpleasant.  Is there something we can and should do about that?  I was
expecting regression test failures of course but not an infinite loop
leading towards disk exhaustion.

Cheers,

Jeff


Re: fix cost subqueryscan wrong parallel cost

2022-05-04 Thread Tom Lane
Robert Haas  writes:
> On Tue, May 3, 2022 at 2:13 PM Tom Lane  wrote:
>> In any case, fundamental redesign of what EXPLAIN prints is a job
>> for v16 or later.  Are you okay with the proposed patch as a v15 fix?

> Yes. I can't really vouch for it, but I don't object to it.

I re-read the patch and noticed that I'd missed one additional change
needed:

-   run_cost = cpu_per_tuple * baserel->tuples;
+   run_cost = cpu_per_tuple * path->subpath->rows;

That is, we should estimate the number of qual evaluations and
cpu_tuple_cost charges based on the subpath row count not the
non-parallel-aware relation size.  So this reduces the cost of
the SubqueryScan itself, as well as its row count, in partial paths.
I don't see any further change in regression test results though.

Pushed with that change.

regards, tom lane




Re: strange slow query - lost lot of time somewhere

2022-05-04 Thread Pavel Stehule
On Wed, May 4, 2022 at 16:08, Jakub Wartak wrote:

>
> Additional three ways to figure that one (all are IMHO production safe):
> a) already mentioned perf with --call-graph dwarf -p PID
> b) strace -p PID -e 'mmap' # verify if mmap() NULL is not having
> MAP_ANONYMOUS flag, size of mmap() request will somehow match work_mem
> sizing
> c) gdb -p PID and then breakpoint for mmap and verify each mmap() # check
> MAP_ANONYMOUS as above
>
>
I don't have debug symbols, so I don't have more details right now

Breakpoint 1 at 0x7f557f0c16c0
(gdb) c
Continuing.

Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
#4  0x00932134 in AllocSetAlloc ()
#5  0x009376cf in MemoryContextAllocExtended ()
#6  0x006ad915 in ExecInitMemoize ()
#7  0x0068dc02 in ExecInitNode ()
#8  0x006b37ff in ExecInitNestLoop ()
#9  0x0068dc56 in ExecInitNode ()
#10 0x006b37ff in ExecInitNestLoop ()
#11 0x0068dc56 in ExecInitNode ()
#12 0x006b37de in ExecInitNestLoop ()
#13 0x0068dc56 in ExecInitNode ()
#14 0x006b37de in ExecInitNestLoop ()
#15 0x0068dc56 in ExecInitNode ()
#16 0x00687e4d in standard_ExecutorStart ()
#17 0x7f5579b5ad2d in pgss_ExecutorStart () from
/usr/pgsql-14/lib/pg_stat_statements.so
#18 0x0062e643 in ExplainOnePlan ()
#19 0x0062e83d in ExplainOneQuery ()
#20 0x0062ee6f in ExplainQuery ()
#21 0x007f9b15 in standard_ProcessUtility ()
#22 0x7f5579b599c7 in pgss_ProcessUtility () from
/usr/pgsql-14/lib/pg_stat_statements.so
#23 0x007f7fef in PortalRunUtility ()
#24 0x007f83af in FillPortalStore ()
#25 0x007f86dd in PortalRun ()
#26 0x007f48cb in exec_simple_query ()
#27 0x007f610e in PostgresMain ()
#28 0x00776b8a in ServerLoop ()
#29 0x00777a03 in PostmasterMain ()
#30 0x004fe413 in main ()
(gdb) p
The history is empty.
(gdb) c
Continuing.

Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0  0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1  0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2  0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3  0x7f557f04fb1e in malloc () from /lib64/libc.so.6
#4  0x00932134 in AllocSetAlloc ()
#5  0x009376cf in MemoryContextAllocExtended ()
#6  0x006ad915 in ExecInitMemoize ()
#7  0x0068dc02 in ExecInitNode ()
#8  0x006b37ff in ExecInitNestLoop ()
#9  0x0068dc56 in ExecInitNode ()
#10 0x006b37ff in ExecInitNestLoop ()
#11 0x0068dc56 in ExecInitNode ()
#12 0x006b37de in ExecInitNestLoop ()
#13 0x0068dc56 in ExecInitNode ()
#14 0x00687e4d in standard_ExecutorStart ()
#15 0x7f5579b5ad2d in pgss_ExecutorStart () from
/usr/pgsql-14/lib/pg_stat_statements.so
#16 0x0062e643 in ExplainOnePlan ()
#17 0x0062e83d in ExplainOneQuery ()
#18 0x0062ee6f in ExplainQuery ()
#19 0x007f9b15 in standard_ProcessUtility ()
#20 0x7f5579b599c7 in pgss_ProcessUtility () from
/usr/pgsql-14/lib/pg_stat_statements.so
#21 0x007f7fef in PortalRunUtility ()
#22 0x007f83af in FillPortalStore ()
#23 0x007f86dd in PortalRun ()
#24 0x007f48cb in exec_simple_query ()
#25 0x007f610e in PostgresMain ()
#26 0x00776b8a in ServerLoop ()
#27 0x00777a03 in PostmasterMain ()
#28 0x004fe413 in main ()
(gdb) c
Continuing.

there were 2 hits of mmap

Regards

Pavel




> [1] -
> https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com
>
> -J.
>


Re: strange slow query - lost lot of time somewhere

2022-05-04 Thread Pavel Stehule
On Wed, May 4, 2022 at 2:15, David Rowley wrote:

> On Tue, 3 May 2022 at 17:02, Pavel Stehule 
> wrote:
> > On Tue, May 3, 2022 at 6:57, Tom Lane wrote:
> >> You sure there's not something taking an exclusive lock on one of these
> >> tables every so often?
> >
> > I am almost sure, I can see this issue only every time when I set a
> higher work mem. I don't see this issue in other cases.
>
> hmm, I don't think the query being blocked on a table lock would cause
> this behaviour. As far as I know, all table locks should be obtained
> before the timer starts for the "Execution Time" timer in EXPLAIN
> ANALYZE.  However, locks are obtained on indexes at executor startup,
> so if there was some delay in obtaining a lock on the index it would
> show up this way.  I just don't know of anything that obtains a
> conflicting lock on an index without the same conflicting lock on the
> table that the index belongs to.
>
> I do agree that the perf report does indicate that the extra time is
> taken due to some large amount of memory being allocated. I just can't
> quite see how that would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate as the number of
> input rows, which is 91 in both cases in your problem query.
>
> Are you able to run the Memoize query in psql with \watch 0.1 for a
> few seconds while you do:
>
> perf record --call-graph dwarf --pid  sleep 2
>
> then send along the perf report.
>
> I locally hacked build_hash_table() in nodeMemoize.c to make the
> hashtable 100 million elements and I see my perf report for a trivial
> Memoize query come up as:
>
> +   99.98% 0.00%  postgres  postgres   [.] _start
> +   99.98% 0.00%  postgres  libc.so.6  [.]
> __libc_start_main_alias_2 (inlined)
> +   99.98% 0.00%  postgres  libc.so.6  [.]
> __libc_start_call_main
> +   99.98% 0.00%  postgres  postgres   [.] main
> +   99.98% 0.00%  postgres  postgres   [.] PostmasterMain
> +   99.98% 0.00%  postgres  postgres   [.] ServerLoop
> +   99.98% 0.00%  postgres  postgres   [.] BackendStartup
> (inlined)
> +   99.98% 0.00%  postgres  postgres   [.] BackendRun (inlined)
> +   99.98% 0.00%  postgres  postgres   [.] PostgresMain
> +   99.98% 0.00%  postgres  postgres   [.] exec_simple_query
> +   99.98% 0.00%  postgres  postgres   [.] PortalRun
> +   99.98% 0.00%  postgres  postgres   [.] FillPortalStore
> +   99.98% 0.00%  postgres  postgres   [.] PortalRunUtility
> +   99.98% 0.00%  postgres  postgres   [.]
> standard_ProcessUtility
> +   99.98% 0.00%  postgres  postgres   [.] ExplainQuery
> +   99.98% 0.00%  postgres  postgres   [.] ExplainOneQuery
> +   99.95% 0.00%  postgres  postgres   [.] ExplainOnePlan
> +   87.87% 0.00%  postgres  postgres   [.]
> standard_ExecutorStart
> +   87.87% 0.00%  postgres  postgres   [.] InitPlan (inlined)
> +   87.87% 0.00%  postgres  postgres   [.] ExecInitNode
> +   87.87% 0.00%  postgres  postgres   [.] ExecInitNestLoop
> +   87.87% 0.00%  postgres  postgres   [.] ExecInitMemoize
> +   87.87% 0.00%  postgres  postgres   [.]
> build_hash_table (inlined) <
> +   87.87% 0.00%  postgres  postgres   [.] memoize_create
> (inlined)
> +   87.87% 0.00%  postgres  postgres   [.]
> memoize_allocate (inlined)
> +   87.87% 0.00%  postgres  postgres   [.]
> MemoryContextAllocExtended
> +   87.87% 0.00%  postgres  postgres   [.] memset (inlined)
>
> Failing that, are you able to pg_dump these tables and load them into
> a PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.
>

+   71,98%14,36%  postmaster  [kernel.kallsyms]  [k] page_fault
 ▒
+   70,19% 6,59%  postmaster  libc-2.28.so   [.]
__memset_avx2_erms  ▒
+   68,20% 0,00%  postmaster  postgres   [.] ExecInitNode
 ▒
+   68,20% 0,00%  postmaster  postgres   [.]
ExecInitNestLoop▒
+   68,13% 0,00%  postmaster  postgres   [.]
ExecInitMemoize ▒
+   68,13% 0,00%  postmaster  postgres   [.]
MemoryContextAllocExtended  ▒
+   63,20% 0,00%  postmaster  postgres   [.]
0x00776b89  ▒
+   63,20% 0,00%  postmaster  postgres   [.] PostgresMain
 ◆
+   63,03% 0,00%  postmaster 

TAP test fail: we don't always detect backend crashes

2022-05-04 Thread Tom Lane
I discovered while poking at an LDAP problem that our TAP tests are
100% reproducibly capable of ignoring server crashes and reporting
success anyway.  This problem occurs if the postmaster doesn't get
the child failure report until after shutdown has been initiated,
in which case you find something like this in the postmaster log:

2022-05-04 12:01:33.946 EDT [57945] [unknown] LOG:  connection received: 
host=[local]
2022-05-04 12:01:33.995 EDT [57945] [unknown] LOG:  connection authenticated: 
identity="uid=test1,dc=example,dc=net" method=ldap 
(/Users/tgl/pgsql/src/test/ldap/tmp_check/t_001_auth_node_data/pgdata/pg_hba.conf:1)
2022-05-04 12:01:33.995 EDT [57945] [unknown] LOG:  connection authorized: 
user=test1 database=postgres application_name=001_auth.pl
2022-05-04 12:01:33.998 EDT [57945] 001_auth.pl LOG:  statement: SELECT 
$$connected with user=test1$$
2022-05-04 12:01:34.003 EDT [57937] LOG:  received fast shutdown request
2022-05-04 12:01:34.003 EDT [57937] LOG:  aborting any active transactions
2022-05-04 12:01:34.003 EDT [57937] LOG:  background worker "logical 
replication launcher" (PID 57943) exited with exit code 1
2022-05-04 12:01:35.750 EDT [57937] LOG:  server process (PID 57945) was 
terminated by signal 11: Segmentation fault: 11
2022-05-04 12:01:35.751 EDT [57937] LOG:  terminating any other active server 
processes
2022-05-04 12:01:35.751 EDT [57937] LOG:  abnormal database system shutdown
2022-05-04 12:01:35.751 EDT [57937] LOG:  database system is shut down

Our TAP scripts don't notice the "abnormal database system shutdown",
so it looks like things have passed.  It's even worse when a script
demands an immediate shutdown, because then the postmaster won't wait
around for the child status.

If you have core dumps enabled, that adds some time before the child
exit status is delivered, making this scenario extremely probable.
I'm finding that src/test/ldap reports success 100% reproducibly
after doing "ulimit -c unlimited", even though four backend core dumps
are produced during the run.

I think that (a) at least by default, node->stop() ought to check for
normal shutdown status, and (b) immediate shutdown should only be used
in cases where we've already decided that the test failed.

regards, tom lane




Re: automatically generating node support functions

2022-05-04 Thread Alvaro Herrera
On 2022-May-04, Peter Eisentraut wrote:

> I have committed your change to the JsonTableColumnType enum and the removal
> of JsonPathSpec.

Thanks!

> Other than that and some whitespace changes, I didn't find anything in
> your 0002 patch that was different from my last submitted patch.  Did
> I miss anything?

No, I had just fixed one simple conflict IIRC, but I had made no other
changes.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Porque francamente, si para saber manejarse a uno mismo hubiera que
rendir examen... ¿Quién es el machito que tendría carnet?"  (Mafalda)




Re: automatically generating node support functions

2022-05-04 Thread Peter Eisentraut

On 19.04.22 13:40, Alvaro Herrera wrote:

I rebased this mostly out of curiosity.  I fixed some smallish
conflicts and fixed a typedef problem new in JSON support; however, even
with these fixes it doesn't compile, because JsonPathSpec uses a novel
typedef pattern that apparently will need bespoke handling in the
gen_nodes_support.pl script.  It seemed better to post this even without
that, though.


I have committed your change to the JsonTableColumnType enum and the 
removal of JsonPathSpec.  Other than that and some whitespace changes, I 
didn't find anything in your 0002 patch that was different from my last 
submitted patch.  Did I miss anything?





Re: JSON Functions and Operators Docs for v15

2022-05-04 Thread David G. Johnston
On Wed, May 4, 2022 at 8:39 AM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > Is there a thread I'm not finding where the upcoming JSON function
> > documentation is being made reasonably usable after doubling its size
> with
> > all the new JSON Table features that we've added?  If nothing else, the
> > table of contents at the top of the page needs to be greatly expanded to
> > make seeing and navigating to all that is available a possibility.
>
> The entire structure of that text needs to be rethought, IMO, as it
> has been written with precisely no concern for fitting into our
> hard-won structure for func.sgml.  Andrew muttered something about
> rewriting it awhile ago, but I don't know what progress he's made.
>
>
I suppose regardless of the answer, or which thread is used for the patch,
the question at hand is whether this is problematic enough to warrant an
open item.  I would lean toward yes, we can decide how much reworking is
considered sufficient to clear the open item separately.

David J.


Re: JSON Functions and Operators Docs for v15

2022-05-04 Thread Justin Pryzby
On Wed, May 04, 2022 at 08:32:51AM -0700, David G. Johnston wrote:
> Hey,
> 
> Is there a thread I'm not finding where the upcoming JSON function
> documentation is being made reasonably usable after doubling its size with
> all the new JSON Table features that we've added?  If nothing else, the
> table of contents at the top of the page needs to be greatly expanded to
> make seeing and navigating to all that is available a possibility.

https://www.postgresql.org/message-id/20220411160905.gh26...@telsasoft.com




Re: JSON Functions and Operators Docs for v15

2022-05-04 Thread Tom Lane
"David G. Johnston"  writes:
> Is there a thread I'm not finding where the upcoming JSON function
> documentation is being made reasonably usable after doubling its size with
> all the new JSON Table features that we've added?  If nothing else, the
> table of contents at the top of the page needs to be greatly expanded to
> make seeing and navigating to all that is available a possibility.

The entire structure of that text needs to be rethought, IMO, as it
has been written with precisely no concern for fitting into our
hard-won structure for func.sgml.  Andrew muttered something about
rewriting it awhile ago, but I don't know what progress he's made.

regards, tom lane




JSON Functions and Operators Docs for v15

2022-05-04 Thread David G. Johnston
Hey,

Is there a thread I'm not finding where the upcoming JSON function
documentation is being made reasonably usable after doubling its size with
all the new JSON Table features that we've added?  If nothing else, the
table of contents at the top of the page needs to be greatly expanded to
make seeing and navigating to all that is available a possibility.

David J.


Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-05-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 28.04.22 18:50, Przemysław Sztoch wrote:
>> The current unaccent dictionary does not include many popular numeric symbols,
>> for example: "m²" -> "m2"

> Seems reasonable.

It kinda feels like this is outside the charter of an "unaccent"
dictionary.  I don't object to having these conversions available
but it seems like it ought to be a separate feature.

regards, tom lane




Re: configure openldap crash warning

2022-05-04 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> I tried building with Homebrew-supplied openldap.  What ends up 
>> happening is that the postgres binary is indeed linked with openldap, 
>> but libpq still is linked against the OS-supplied LDAP framework. 
>> (Checked with "otool -L" in each case.)  Can someone else reproduce 
>> this, too?

> [ it works with MacPorts ]

Oh, I have a theory about this: I bet your Homebrew installation
has a recent OpenLDAP version that only supplies libldap not libldap_r.
In that case, configure will still find libldap_r available and will
bind libpq to it, and you get the observed result.  The configure
check is not sophisticated enough to realize that it's finding chunks
of two different OpenLDAP installations.

Not sure about a good fix.  If we had a way to detect which library
file AC_CHECK_LIB finds, we could verify that libldap and libldap_r
come from the same directory ... but I don't think we have that.

regards, tom lane




Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-05-04 Thread Peter Eisentraut

On 28.04.22 18:50, Przemysław Sztoch wrote:

Current unaccent dictionary does not include many popular numeric symbols,
for example: "m²" -> "m2"


Seems reasonable.

Can you explain what your patch does to achieve this?




Re: configure openldap crash warning

2022-05-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 02.05.22 16:03, Tom Lane wrote:
>> I'm not that excited about getting rid of this warning, because to the
>> extent that anyone notices it at all, it'll motivate them to get OpenLDAP
>> from Homebrew or MacPorts, which seems like a good thing.

> I tried building with Homebrew-supplied openldap.  What ends up 
> happening is that the postgres binary is indeed linked with openldap, 
> but libpq still is linked against the OS-supplied LDAP framework. 
> (Checked with "otool -L" in each case.)  Can someone else reproduce 
> this, too?

Hmm, I just tried it with up-to-date MacPorts, and it was a *complete*
fail: I got all the deprecation warnings (so the system include headers
were used), and both postgres and libpq.dylib still ended up linked
against /System/Library/Frameworks/LDAP.framework/Versions/A/LDAP.

But then I went "doh!" and added
  --with-includes=/opt/local/include --with-libraries=/opt/local/lib
to the configure call, and everything built the way I expected.
I'm not sure offhand if the docs include a reminder to do that when
using stuff out of MacPorts, or the equivalent for Homebrew.

We still have a bit of work to do, because this setup isn't getting
all the way through src/test/ldap/:

2022-05-04 11:01:33.407 EDT [21312] [unknown] LOG:  connection received: 
host=[local]
2022-05-04 11:01:33.457 EDT [21312] [unknown] LOG:  could not start LDAP TLS 
session: Operations error
2022-05-04 11:01:33.457 EDT [21312] [unknown] DETAIL:  LDAP diagnostics: TLS 
already started
2022-05-04 11:01:33.457 EDT [21312] [unknown] FATAL:  LDAP authentication 
failed for user "test1"
2022-05-04 11:01:33.457 EDT [21312] [unknown] DETAIL:  Connection matched 
pg_hba.conf line 1: "local all all ldap 
ldapurl="ldaps://localhost:51335/dc=example,dc=net??sub?(uid=$username)" 
ldaptls=1"
2022-05-04 11:01:33.459 EDT [21304] LOG:  server process (PID 21312) was 
terminated by signal 11: Segmentation fault: 11

Many of the test cases pass, but it looks like ldaps-related ones don't.
The stack trace isn't very helpful:

(lldb) bt
* thread #1, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x0)
  * frame #0: 0x0001b5bfc628 libsystem_pthread.dylib`pthread_rwlock_rdlock
frame #1: 0x0001054a74c4 libcrypto.3.dylib`CRYPTO_THREAD_read_lock + 12

regards, tom lane




Re: SQL/JSON: FOR ORDINALITY bug

2022-05-04 Thread Erik Rijkers

On 04-05-2022 at 13:55, Andrew Dunstan wrote:


On 2022-05-03 Tu 20:39, David G. Johnston wrote:

On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan  wrote:


 On 2022-05-03 Tu 11:19, Erik Rijkers wrote:
 > Hi
 >
 > I've copied some statements from the .pdf called:
 > "TECHNICAL REPORT ISO/IEC TR 19075-6   First edition 2017-03
 > Part 6: SQL support for JavaScript Object Notation (JSON)"
 > (not available anymore although there should be a similar
 replacement
 > file)
 >
 > In that pdf I found the data and statement (called 'table 15' in the
 > .pdf) as in the attached bash file.  But the result is different: as
 > implemented by 15devel, the column rowseq is always 1.  It seems
 to me
 > that that is wrong; it should count 1, 2, 3 as indeed the
 > example-result column in that pdf shows.
 >
 > What do you think?
 >
 >

 Possibly.


I don't see how rowseq can be anything but 1.  Each invocation of
json_table is given a single jsonb record via the lateral reference to
bookclub.jcol.  It produces one result, having a rowseq 1.  It does
this for all three outer lateral reference tuples and thus produces
three output rows each with one match numbered rowseq 1.


After some further experimentation, I now think you must be right, David.

Also, looking at the DB2 docs:
  https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
(see especially under 'Handling nested information')

There, I gathered some example data + statements, one of which is the case 
at hand.  I also made them runnable under postgres (attached).


I thought that was an instructive example, with those 'outer_ordinality' 
and 'inner_ordinality' columns.


Erik



I imagine we could overcome that by stashing the sequence counter
somewhere it would survive across calls. The question really is what is
the right thing to do? I'm also a bit worried about how correct ordinal
numbering is with nested paths, e.g. (from the regression tests):


select
     jt.*
from
     jsonb_table_test jtt,
     json_table (
     jtt.js,'strict $[*]' as p
     columns (
     n for ordinality,
     a int path 'lax $.a' default -1 on empty,
     nested path 'strict $.b[*]' as pb columns ( b int path '$' ),
     nested path 'strict $.c[*]' as pc columns ( c int path '$' )
     )
     ) jt;
  n | a  | b | c
---+----+---+----
  1 |  1 |   |
  2 |  2 | 1 |
  2 |  2 | 2 |
  2 |  2 | 3 |
  2 |  2 |   | 10
  2 |  2 |   |
  2 |  2 |   | 20
  3 |  3 | 1 |
  3 |  3 | 2 |
  4 | -1 | 1 |
  4 | -1 | 2 |


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


json_ordinality_db2.sh
Description: application/shellscript


RE: strange slow query - lost lot of time somewhere

2022-05-04 Thread Jakub Wartak
> I do agree that the perf report does indicate that the extra time is taken 
> due to
> some large amount of memory being allocated. I just can't quite see how that
> would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate as the number of input
> rows, which is 91 in both cases in your problem query.
> 
> Are you able to run the Memoize query in psql with \watch 0.1 for a few 
> seconds
> while you do:
> 
> perf record --call-graph dwarf --pid  sleep 2
> 
> then send along the perf report.
> 
> I locally hacked build_hash_table() in nodeMemoize.c to make the hashtable 100
> million elements and I see my perf report for a trivial Memoize query come up
> as:
> 
[..]
> 
> Failing that, are you able to pg_dump these tables and load them into a
> PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.
> 

+1 to what David says, we need a reproducer. In [1] Pavel wrote that he's 
seeing a lot of clear_page_erms(), so maybe this will help a little: I 
recall hitting a similar issue with a lot of minor page faults and high %sys 
when raising work_mem. For me it was a different issue some time ago, but it 
was something like build_hash_table() being used by recursive UNION calls -> 
BuildTupleHashTable() -> .. malloc() -> mmap64().  When mmap() is issued with 
MAP_ANONYMOUS, the kernel will zero out the memory (more memory -> potentially 
bigger CPU waste visible as minor page faults; erms stands for "Enhanced REP 
MOVSB/STOSB"; this is on the kernel side). The culprit was the planner 
allocating something that wouldn't be used later.

Three additional ways to figure that out (all are IMHO production-safe):
a) the already mentioned perf with --call-graph dwarf -p PID
b) strace -p PID -e 'mmap' # verify whether the mmap(NULL, ...) calls carry the 
MAP_ANONYMOUS flag; the size of the mmap() request will roughly match the 
work_mem sizing
c) gdb -p PID, then set a breakpoint on mmap and inspect each mmap() # check 
MAP_ANONYMOUS as above

[1] - 
https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com

-J.


Re: configure openldap crash warning

2022-05-04 Thread Peter Eisentraut

On 02.05.22 16:03, Tom Lane wrote:

I'm not that excited about getting rid of this warning, because to the
extent that anyone notices it at all, it'll motivate them to get OpenLDAP
from Homebrew or MacPorts, which seems like a good thing.


I tried building with Homebrew-supplied openldap.  What ends up 
happening is that the postgres binary is indeed linked with openldap, 
but libpq still is linked against the OS-supplied LDAP framework. 
(Checked with "otool -L" in each case.)  Can someone else reproduce 
this, too?






Re: Add a new function and a document page to get/show all the server hooks

2022-05-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 04.05.22 12:54, Bharath Rupireddy wrote:
>> One problem is that the new function and doc page create an extra
>> burden of keeping them up to date with the hooks modifications and new
>> hook additions, but I think that can be taken care of in the review
>> phases.

> I think this has been proposed a number of times and rejected.

The most recent such discussion was here:

https://www.postgresql.org/message-id/flat/20201231032813.GQ13234%40fetter.org

The basic point was that there's a pretty low bar to creating a hook,
but the more infrastructure you want to have around hooks the harder
it will be to add any ... and the less likely that the infrastructure
will be kept up-to-date.

My takeaway from that thread was that there could be support for
minimalistic documentation, along the lines of a README file listing
all the hooks.  I'm not in favor of trying to do more than that
--- in particular, the cost/benefit ratio for the function proposed
here seems to approach infinity.

regards, tom lane




Re: bogus: logical replication rows/cols combinations

2022-05-04 Thread Peter Eisentraut



On 03.05.22 21:40, Tomas Vondra wrote:

So what's wrong with merging the column lists as implemented in the v2
patch, posted a couple days ago?


Merging the column lists is ok if all other publication attributes 
match.  Otherwise, I think not.
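
The kind of combination under discussion looks roughly like this (table,
publication, column and filter names are all invented here):

CREATE PUBLICATION p1 FOR TABLE t (a, b) WHERE (a > 0);
CREATE PUBLICATION p2 FOR TABLE t (a, c) WHERE (a < 0);

A single subscription to both p1 and p2 then has to decide how the two
column lists (and row filters) combine for table t.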



I don't think triggers are a suitable alternative, as it executes on the
subscriber node. So you have to first copy the data to the remote node,
where it gets filtered. With column filters the data gets redacted on
the publisher.


Right, triggers are not currently a solution.  But you could imagine a 
redaction filter system that runs on the publisher that modifies rows 
before they are sent out.





Re: [PATCH] Log details for client certificate failures

2022-05-04 Thread Peter Eisentraut



On 04.05.22 01:05, Jacob Champion wrote:

On Tue, 2022-05-03 at 21:06 +0200, Peter Eisentraut wrote:

The information in pg_stat_ssl is limited to NAMEDATALEN (see struct
PgBackendSSLStatus).

It might make sense to align what your patch prints to identify
certificates with what is shown in that view.


Sure, a max length should be easy enough to do. Is there a reason to
limit to NAMEDATALEN specifically? I was under the impression that we
would rather not have had that limitation in the stats framework, if we
could have avoided it. (In particular I think NAMEDATALEN will cut off
the longest possible Common Name by just five bytes.)


Just saying that cutting it off appears to be acceptable.  A bit more 
than 63 bytes should be okay for the log.


In terms of aligning what is printed, I meant that pg_stat_ssl uses the 
issuer plus serial number to identify the certificate unambiguously.
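
For reference, the identifying fields that view already exposes can be
pulled with something like:

SELECT pid, client_dn, client_serial, issuer_dn
FROM pg_stat_ssl
WHERE ssl;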





Re: Add pg_strtoupper and pg_strtolower functions

2022-05-04 Thread Tom Lane
Alvaro Herrera  writes:
> Currently, pg_toupper/pg_tolower are used in very limited situations.
> Are they really always safe enough to run in arbitrary situations,
> enough to create this new layer on top of them?

They are not, and we should absolutely not be encouraging additional uses
of them.  The existing multi-character str_toupper/str_tolower functions
should be used instead.  (Perhaps those should be relocated to someplace
more prominent?)

> Reading the comment on
> pg_tolower, "the whole thing is a bit bogus for multibyte charsets", I
> worry that we might create security holes, either now or in future
> callsites that use these new functions.

I doubt that they are security holes, but they do give unexpected
answers in some locales.

regards, tom lane




Re: Add a new function and a document page to get/show all the server hooks

2022-05-04 Thread Peter Eisentraut

On 04.05.22 12:54, Bharath Rupireddy wrote:

One problem is that the new function and doc page create an extra
burden of keeping them up to date with the hooks modifications and new
hook additions, but I think that can be taken care of in the review
phases.

Thoughts?


I think this has been proposed a number of times and rejected.





Re: Skipping schema changes in publication

2022-05-04 Thread Peter Eisentraut

On 14.04.22 15:47, Peter Eisentraut wrote:
That said, I'm not sure this feature is worth the trouble.  If this is 
useful, what about "whole database except these schemas"?  What about 
"create this database from this template except these schemas".  This 
could get out of hand.  I think we should encourage users to group their 
objects the way they want and not offer these complicated negative 
selection mechanisms.


Another problem in general with this "all except these" way of 
specifying things is that you need to track negative dependencies.


For example, assume you can't add a table to a publication unless it has 
a replica identity.  Now, if you have a publication p1 that says it 
includes "all tables except t1", you now have to check p1 whenever a new 
table is created, even though the new table has no direct dependency 
link with p1.  So in more general cases, you would have to check all 
existing objects to see whether their specification is in conflict with 
the new object being created.


Now publications don't actually work that way, so it's not a real 
problem right now, but similar things could work like that.  So I think 
it's worth thinking this through a bit.






Re: Add pg_strtoupper and pg_strtolower functions

2022-05-04 Thread Alvaro Herrera
On 2022-May-02, Bharath Rupireddy wrote:

> Hi,
> 
> I came across pg_toupper and pg_tolower functions, converting a single
> character, are being used in loops to convert an entire
> null-terminated string. The cost of calling these character-based
> conversion functions (even though small) can be avoided if we have two
> new functions pg_strtoupper and pg_strtolower.

Currently, pg_toupper/pg_tolower are used in very limited situations.
Are they really always safe enough to run in arbitrary situations,
enough to create this new layer on top of them?  Reading the comment on
pg_tolower, "the whole thing is a bit bogus for multibyte charsets", I
worry that we might create security holes, either now or in future
callsites that use these new functions.

Consider that in the Turkish locale you lowercase an I (single-byte
ASCII character) with a dotless-i (two bytes).  So overwriting the input
string is not a great solution.
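
A quick way to see that effect from SQL (this assumes an ICU-enabled
build, which provides the predefined "tr-TR-x-icu" collation):

SELECT lower('I' COLLATE "tr-TR-x-icu") AS lowered,
       octet_length(lower('I' COLLATE "tr-TR-x-icu")) AS octets;

 lowered | octets
---------+--------
 ı       |      2

The one-byte input comes back as a two-byte string, so an in-place
conversion of the input buffer cannot be correct in general.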

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)




Re: testclient.exe installed under MSVC

2022-05-04 Thread Daniel Gustafsson
> On 4 May 2022, at 02:34, Andrew Dunstan  wrote:

> I think we should make the standard MSVC install look as much like the
> standard Unix/msys install as possible.

+1

--
Daniel Gustafsson   https://vmware.com/





Re: testclient.exe installed under MSVC

2022-05-04 Thread Daniel Gustafsson
> On 3 May 2022, at 16:50, Daniel Gustafsson  wrote:
> 
>> On 3 May 2022, at 15:58, Alvaro Herrera  wrote:
>> 
>> On 2022-May-03, Noah Misch wrote:
>> 
>>> Michael Paquier recommended s/-/_/ for uri-regress, and I agree with that.
>>> What do you think?
>> 
>> libpq_uri-regress is horrible, so +1 for that.
> 
> Agreed, I'll do that before pushing.

Done that way.

--
Daniel Gustafsson   https://vmware.com/





Re: SQL/JSON: FOR ORDINALITY bug

2022-05-04 Thread Andrew Dunstan


On 2022-05-03 Tu 20:39, David G. Johnston wrote:
> On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan  wrote:
>
>
> On 2022-05-03 Tu 11:19, Erik Rijkers wrote:
> > Hi
> >
> > I've copied some statements from the .pdf called:
> > "TECHNICAL REPORT ISO/IEC TR 19075-6   First edition 2017-03
> > Part 6: SQL support for JavaScript Object Notation (JSON)"
> > (not available anymore although there should be a similar
> replacement
> > file)
> >
> > In that pdf I found the data and statement (called 'table 15' in the
> > .pdf) as in the attached bash file.  But the result is different: as
> > implemented by 15devel, the column rowseq is always 1.  It seems
> to me
> > that that is wrong; it should count 1, 2, 3 as indeed the
> > example-result column in that pdf shows.
> >
> > What do you think?
> >
> >
>
> Possibly. 
>
>
> I don't see how rowseq can be anything but 1.  Each invocation of
> json_table is given a single jsonb record via the lateral reference to
> bookclub.jcol.  It produces one result, having a rowseq 1.  It does
> this for all three outer lateral reference tuples and thus produces
> three output rows each with one match numbered rowseq 1.
>
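
To restate that with a minimal, self-contained sketch (the JSON values
here are invented; only the bookclub/jcol names come from the example
under discussion):

CREATE TABLE bookclub (jcol jsonb);
INSERT INTO bookclub
    VALUES ('{"title": "A"}'), ('{"title": "B"}'), ('{"title": "C"}');

SELECT jt.*
FROM bookclub,
     json_table(bookclub.jcol, '$'
         COLUMNS (rowseq FOR ORDINALITY,
                  title text PATH '$.title')) AS jt;

Each of the three lateral invocations numbers its own single-row result
from scratch, so rowseq comes out as 1 in all three output rows.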

I imagine we could overcome that by stashing the sequence counter
somewhere it would survive across calls. The question really is what is
the right thing to do? I'm also a bit worried about how correct ordinal
numbering is with nested paths, e.g. (from the regression tests):


select
    jt.*
from
    jsonb_table_test jtt,
    json_table (
    jtt.js,'strict $[*]' as p
    columns (
    n for ordinality,
    a int path 'lax $.a' default -1 on empty,
    nested path 'strict $.b[*]' as pb columns ( b int path '$' ),
    nested path 'strict $.c[*]' as pc columns ( c int path '$' )
    )
    ) jt;
 n | a  | b | c  
---+----+---+----
 1 |  1 |   |   
 2 |  2 | 1 |   
 2 |  2 | 2 |   
 2 |  2 | 3 |   
 2 |  2 |   | 10
 2 |  2 |   |   
 2 |  2 |   | 20
 3 |  3 | 1 |   
 3 |  3 | 2 |   
 4 | -1 | 1 |   
 4 | -1 | 2 |   


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [RFC] building postgres with meson -v8

2022-05-04 Thread Peter Eisentraut

More patches:

0001-meson-Assorted-compiler-test-tweaks.patch

I was going through a diff of pg_config.h between old and new build and 
found a few omissions and small differences.


Some of the

blah ? 1 : false

is of course annoying and can be removed eventually, but it's useful 
when analyzing the diff, and since it's already done in other places it 
seems reasonable to apply it consistently.


Of course there is some more work left for some of the more complicated 
tests; this isn't meant to be complete.



0002-meson-Add-pg_walinspect.patch

This was added more recently and was not ported yet.  Nothing too 
interesting here.



0003-meson-Install-all-server-headers.patch

With this, all the server headers installed by a makefile-based build 
are installed.  I tried to strike a balance between using 
install_subdir() with exclude list versus listing things explicitly. 
Different variations might be possible, but this looked pretty sensible 
to me.



With these patches, the lists of files installed with make and with meson 
match up, except for known open items (postmaster symlink, some library 
naming differences, pkgconfig, pgxs, test modules installed, documentation).

From c3f4f4f8002e473284587d21f89cb66364365a26 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 4 May 2022 09:54:36 +0200
Subject: [PATCH 1/3] meson: Assorted compiler test tweaks

---
 meson.build | 46 +-
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/meson.build b/meson.build
index 80af8e13da..27bc9dcd48 100644
--- a/meson.build
+++ b/meson.build
@@ -222,7 +222,7 @@ meson_bin = find_program(meson_binpath, native: true)
 # Option Handling
 ###
 
-cdata.set('USE_ASSERT_CHECKING', get_option('cassert'))
+cdata.set('USE_ASSERT_CHECKING', get_option('cassert') ? 1 : false)
 
 cdata.set('BLCKSZ', 8192, description: '''
  Size of a disk block --- this also limits the size of a tuple.  You
@@ -241,7 +241,7 @@ cdata.set('BLCKSZ', 8192, description: '''
 cdata.set('XLOG_BLCKSZ', get_option('wal-blocksize') * 1024)
 cdata.set('RELSEG_SIZE', get_option('segsize') * 131072)
 cdata.set('DEF_PGPORT', get_option('pgport'))
-cdata.set_quoted('DEF_PGPORT_STR', '5432')
+cdata.set_quoted('DEF_PGPORT_STR', get_option('pgport'))
 cdata.set_quoted('PG_KRB_SRVNAM', 'postgres')
 
 
@@ -870,8 +870,6 @@ if get_option('ssl') == 'openssl'
 ssl_int = [ssl]
   endif
 
-  cdata.set_quoted('WITH_SSL', get_option('ssl'))
-
   check_funcs = [
 ['CRYPTO_new_ex_data', {'required': true}],
 ['SSL_new', {'required': true}],
@@ -1381,7 +1379,7 @@ int main(void)
 cdata.set(check['name'],
   cc.links(test,
 name: check['desc'],
-args: g_c_args + ['-DINT64=@0@'.format(cdata.get('PG_INT64_TYPE'))])
+args: g_c_args + ['-DINT64=@0@'.format(cdata.get('PG_INT64_TYPE'))]) ? 
1 : false
 )
   endforeach
 
@@ -1609,7 +1607,8 @@ endif
 
 if cc.has_member('struct sockaddr_storage', 'ss_family',
 args: g_c_args, include_directories: g_c_inc,
-prefix: '''#include 
+prefix: '''
+#include 
 #include ''')
   cdata.set('HAVE_STRUCT_SOCKADDR_STORAGE_SS_FAMILY', 1)
 endif
@@ -1622,6 +1621,30 @@ if cc.has_member('struct sockaddr_storage', 
'__ss_family',
   cdata.set('HAVE_STRUCT_SOCKADDR_STORAGE___SS_FAMILY', 1)
 endif
 
+if cc.has_member('struct sockaddr_storage', 'ss_len',
+args: g_c_args, include_directories: g_c_inc,
+prefix: '''
+#include 
+#include ''')
+  cdata.set('HAVE_STRUCT_SOCKADDR_STORAGE_SS_LEN', 1)
+endif
+
+if cc.has_member('struct sockaddr_storage', '__ss_len',
+args: g_c_args, include_directories: g_c_inc,
+prefix: '''
+#include 
+#include ''')
+  cdata.set('HAVE_STRUCT_SOCKADDR_STORAGE___SS_LEN', 1)
+endif
+
+if cc.has_member('struct sockaddr', 'sa_len',
+args: g_c_args, include_directories: g_c_inc,
+prefix: '''
+#include 
+#include ''')
+  cdata.set('HAVE_STRUCT_SOCKADDR_SA_LEN', 1)
+endif
+
 if cc.has_type('struct sockaddr_un',
 args: g_c_args, include_directories: g_c_inc,
 prefix: '''
@@ -1701,10 +1724,10 @@ endif
 # needs xlocale.h; standard is locale.h, but glibc also has an
 # xlocale.h file that we should not use.
 if cc.has_type('locale_t', prefix: '#include ')
-  cdata.set('HAVE_LOCALE_T', true)
+  cdata.set('HAVE_LOCALE_T', 1)
 elif cc.has_type('locale_t', prefix: '#include ')
-  cdata.set('HAVE_LOCALE_T', true)
-  cdata.set('LOCALE_T_IN_XLOCALE', true)
+  cdata.set('HAVE_LOCALE_T', 1)
+  cdata.set('LOCALE_T_IN_XLOCALE', 1)
 endif
 
 # MSVC doesn't cope well with defining restrict to __restrict, the
@@ -1781,6 +1804,7 @@ func_checks = [
   ['getrusage'],
   ['gettimeofday'], # XXX: This seems to be in the autoconf case
   ['inet_aton'],
+  ['inet_pton'],
   ['kqueue'],
   ['link'],
   ['mbstowcs_l'],
@@ -1852,6 +1876,10 @@ foreach c : func_checks
 endforeach
 
 
+if cc.has_function('syslog', args: g_c_args) and cc.check_header('syslog.h', 

Re: Configuration Parameter/GUC value validation hook

2022-05-04 Thread Bharath Rupireddy
On Tue, May 3, 2022 at 10:43 PM Robert Haas  wrote:
>
> On Tue, May 3, 2022 at 11:45 AM Tom Lane  wrote:
> > Robert Haas  writes:
> > > I have some desire here to see us solve this problem not just for
> > > service providers, but for users in general. You don't have to be a
> > > service provider to want to disallow SET work_mem = '1TB' -- you just
> > > need to be a DBA on a system where such a setting will cause bad
> > > things to happen. But, if you are a DBA on some random system, you
> > > won't likely find a hook to be a particularly useful way of
> > > controlling this sort of thing.
> >
> > Yeah, I think this is a more realistic point.  I too am not sure what
> > a good facility would look like.  I guess an argument in favor of
> > providing a hook is that we could then leave it to extension authors
> > to try to devise a facility that's useful to end users, rather than
> > having to write an in-core feature.
>
> RIght. The counter-argument is that if we just do that, then what will
> likely happen is that people who buy PostgreSQL services from
> Microsoft, Amazon, EDB, Crunchy, etc. will end up with reasonable
> options in this area, and people who download the source code from the
> Internet probably won't. As an open-source project, we might hope to
> avoid a scenario where it doesn't work unless you buy something. On
> the third hand, half a loaf is better than nothing.

Thanks Tom and Robert for your responses.

How about we provide a sample extension (limiting some important
parameters, say shared_buffers, work_mem and so on, to some
"reasonable/recommended" limits) in core along with the
set_config_option_hook? This way, all the people using open source
postgres out-of-the-box will benefit, and whoever wants can modify
that sample extension to suit their needs. The sample extension can
also serve as an example of how to implement set_config_option_hook.
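
For context, what a DBA can do today only sets defaults and doesn't
enforce a ceiling, e.g. (the role name here is just illustrative):

ALTER ROLE app_user SET work_mem = '64MB';
-- nothing stops the session from overriding it later:
SET work_mem = '1TB';

A sample extension using the hook could reject the second statement.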

Thoughts?

Regards,
Bharath Rupireddy.




Re: Add pg_strtoupper and pg_strtolower functions

2022-05-04 Thread Bharath Rupireddy
On Mon, May 2, 2022 at 6:43 PM Ashutosh Bapat
 wrote:
>
> On Mon, May 2, 2022 at 6:21 PM Bharath Rupireddy
>  wrote:
> >
> > Hi,
> >
> > I came across pg_toupper and pg_tolower functions, converting a single
> > character, are being used in loops to convert an entire
> > null-terminated string. The cost of calling these character-based
> > conversion functions (even though small) can be avoided if we have two
> > new functions pg_strtoupper and pg_strtolower.
>
> Have we measured the saving in cost? Let's say for a million character
> long string?

I didn't spend time figuring out the use-cases that hit all the code
areas; even if I did, the function-call cost savings might not be
impressive most of the time, and the argument of saving function-call
cost then becomes pointless.

> > Attaching a patch with these new two functions and their usage in most
> > of the possible places in the code.
>
> Converting pg_toupper and pg_tolower to "inline" might save cost
> similarly and also avoid code duplication?

I think most of the modern compilers do inline small functions. But,
inlining isn't always good as it increases the size of the code. With
the proposed helper functions, the code looks cleaner (at least IMO,
others may have different opinions though).

Regards,
Bharath Rupireddy.




Add a new function and a document page to get/show all the server hooks

2022-05-04 Thread Bharath Rupireddy
Hi,

As we have many hooks in postgres that extend the postgres
functionality, I'm wondering if it's a good idea to add a new
function, say, pg_get_all_server_hooks, returning the hook name, hook
declaration and its current value (if there's any external module
loaded implementing the hook). This basically helps to know, at any
given point in time, which hooks are installed and which modules
implement them. Imagine using this new function on a production
server with many supported extensions installed, which might implement
some of the hooks.

Also, a dedicated page in the documentation could list all the hooks,
their declarations and a short description. This will help
developers/users discover the available hooks.

One problem is that the new function and doc page create an extra
burden of keeping them up to date with the hooks modifications and new
hook additions, but I think that can be taken care of in the review
phases.

Thoughts?

Regards,
Bharath Rupireddy.




Re: Hash index build performance tweak from sorting

2022-05-04 Thread Amit Kapila
On Mon, May 2, 2022 at 9:28 PM Simon Riggs  wrote:
>
> On Sat, 30 Apr 2022 at 12:12, Amit Kapila  wrote:
> >
> > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs
> >  wrote:
> > >
> > > Hash index pages are stored in sorted order, but we don't prepare the
> > > data correctly.
> > >
> > > We sort the data as the first step of a hash index build, but we
> > > forget to sort the data by hash as well as by hash bucket.
> > >
> >
> > I was looking into the nearby comments (Fetch hash keys and mask off
> > bits we don't want to sort by.) and it sounds like we purposefully
> > don't want to sort by the hash key. I see that this comment was
> > originally introduced in the below commit:
> >
> > commit 4adc2f72a4ccd6e55e594aca837f09130a6af62b
> > Author: Tom Lane 
> > Date:   Mon Sep 15 18:43:41 2008 +
> >
> > Change hash indexes to store only the hash code rather than the
> > whole indexed
> > value.
> >
> > But even before that, we seem to mask off the bits before comparison.
> > Is it that we are doing so because we want to keep the order of hash
> > keys in a particular bucket so such masking was required?
>
> We need to sort by both hash bucket and hash value.
>
> Hash bucket id so we can identify the correct hash bucket to insert into.
>
> But then on each bucket/overflow page we store it sorted by hash value
> to make lookup faster, so inserts go faster if they are also sorted.
>

I also think so. So, we should go with this unless someone else sees
any flaw here.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-05-04 Thread Amit Kapila
On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada  wrote:
>
> On Mon, May 2, 2022 at 11:32 AM Amit Kapila  wrote:
> >
> >
> > So, shall we go back to the previous approach of using a separate
> > function update_replication_progress?
>
> Ok, agreed.
>

Attached, please find the updated patch accordingly. Currently, I have
prepared it for HEAD, if you don't see any problem with this, we can
prepare the back-branch patches based on this.

-- 
With Regards,
Amit Kapila.


v20-0001-Fix-the-logical-replication-timeout-during-large.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-05-04 Thread vignesh C
On Mon, May 2, 2022 at 5:49 AM Peter Smith  wrote:
>
> Thanks for updating the patches for all my prior feedback.
>
> For v12* I have only minor feedback for the docs.
>
> ==
> V12-0001
>
> no comments
>
> ==
> V12-0002
>
> 1. Commit message
>
> In another thread using the terminology "multi-master" seems to be
> causing some problems. Maybe you should choose different wording in
> this commit message to avoid having the same problems?

I felt we can leave this to the committer.

> ~~~
>
> 2. doc/src/sgml/logical-replication.sgml
>
> These are all similar minor wording suggestions to split the sentences.
> Wording: ", here copy_data is specified as XXX ..." -> ". Use
> copy_data specified as XXX ..."
>
> Also:
> Wording: "... to subscribe the changes from nodeXXX..."
> -> "... to subscribe to the changes from nodeXXX... " (add "to")
> -> "... to subscribe to nodeXXX."  (or I preferred just remove the
> whole "changes" part)
>
> 2a.
> Create a subscription in node3 to subscribe the changes from node1,
> here copy_data is specified as force so that the existing table data
> is copied during initial sync:
>
> SUGGESTION
> Create a subscription in node3 to subscribe to node1. Use copy_data
> specified as force so that the existing table data is copied during
> initial sync:

Modified

> 2b.
> Create a subscription in node1 to subscribe the changes from node3,
> here copy_data is specified as force so that the existing table data
> is copied during initial sync:
>
> SUGGESTION
> Create a subscription in node1 to subscribe to node3. Use copy_data
> specified as force so that the existing table data is copied during
> initial sync:

Modified

> 2c.
> Create a subscription in node2 to subscribe the changes from node3,
> here copy_data is specified as force so that the existing table data
> is copied during initial sync:
>
> SUGGESTION
> Create a subscription in node2 to subscribe to node3. Use copy_data
> specified as force so that the existing table data is copied during
> initial sync:

Modified

> 2d.
> Create a subscription in node3 to subscribe the changes from node1,
> here copy_data is specified as force when creating a subscription to
> node1 so that the existing table data is copied during initial sync:
>
> SUGGESTION
> Create a subscription in node3 to subscribe to node1. Use copy_data
> specified as force when creating a subscription to node1 so that the
> existing table data is copied during initial sync:

Modified

> 2e.
> Create a subscription in node3 to subscribe the changes from node2,
> here copy_data is specified as off as the initial table data would
> have been copied in the earlier step:
>
> SUGGESTION (this one has a bit more re-wording than the others)
> Create a subscription in node3 to subscribe to node2. Use copy_data
> specified as off because the initial table data would have been
> already copied in the previous step:

Modified
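
For reference, each of those steps boils down to a statement of roughly
this shape (node, publication and connection values are illustrative;
copy_data = force is the option proposed by this patch set):

CREATE SUBSCRIPTION sub_node3_node1
    CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION pub_node1
    WITH (copy_data = force);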

> ~~~
>
> 3. doc/src/sgml/logical-replication.sgml
>
> 31.11.2. Adding new node when there is no data in any of the nodes
> 31.11.3. Adding new node when data is present in the existing nodes
> 31.11.4. Adding new node when data is present in the new node
>
> Minor change to the above heading?
>
> Wording: "Adding new node ..." -> "Adding a new node ..."

Modified

Thanks for the comments, the attached v13 patch has the changes for the same.

Regards,
Vignesh
From 657265a56dfc2eee8f4b6cd2137e6c9b8c26d6a6 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Fri, 8 Apr 2022 11:10:05 +0530
Subject: [PATCH v13 1/2] Skip replication of non local data.

This patch adds a new SUBSCRIPTION boolean option
"local_only". The default is false. When a SUBSCRIPTION is
created with this option enabled, the publisher will only publish data
that originated at the publisher node.
Usage:
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port='
PUBLICATION pub1 with (local_only = true);
---
 doc/src/sgml/ref/alter_subscription.sgml  |   5 +-
 doc/src/sgml/ref/create_subscription.sgml |  12 ++
 src/backend/catalog/pg_subscription.c |   1 +
 src/backend/catalog/system_views.sql  |   4 +-
 src/backend/commands/subscriptioncmds.c   |  26 ++-
 .../libpqwalreceiver/libpqwalreceiver.c   |   5 +
 src/backend/replication/logical/worker.c  |   2 +
 src/backend/replication/pgoutput/pgoutput.c   |  23 +++
 src/bin/pg_dump/pg_dump.c |  17 +-
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/psql/describe.c   |   8 +-
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/pg_subscription.h |   3 +
 src/include/replication/logicalproto.h|  10 +-
 src/include/replication/pgoutput.h|   1 +
 src/include/replication/walreceiver.h |   1 +
 src/test/regress/expected/subscription.out| 142 ---
 src/test/regress/sql/subscription.sql |  10 ++
 src/test/subscription/t/032_localonly.pl  | 162 ++
 19 files