Re: Perform streaming logical transactions by background workers and parallel apply
Here are my review comments for v5-0001. I will take a look at the v5-0002 (TAP) patch another time.

==

1. Commit message

The message still refers to "apply background". Should that say "apply background worker"? Other parts just call this the "worker".

Personally, I think it might be better to coin some new term for this thing (e.g. "apply-bgworker" or something like that of your choosing) so then you can just concisely *always* refer to that everywhere without any ambiguity.

e.g. the same applies to every comment and every message in this patch. They should all use identical terminology (e.g. "apply-bgworker").

~~~

2. Commit message

"We also need to allow stream_stop to complete by the apply background to finish it to..."

Wording: ???

~~~

3. Commit message

This patch also extends the subscription streaming option so that user can control whether apply the streaming transaction in a apply background or spill the change to disk.

Wording: "user" -> "the user"
Typo: "whether apply" -> "whether to apply"
Typo: "a apply" -> "an apply"

~~~

4. Commit message

User can set the streaming option to 'on/off', 'apply'. For now, 'apply' means the streaming will be applied via a apply background if available. 'on' means the streaming transaction will be spilled to disk.

I think "apply" might not be the best choice of values for this meaning, but I think Hou-san already said [1] that this was being reconsidered.

~~~

5. doc/src/sgml/catalogs.sgml - formatting

@@ -7863,11 +7863,15 @@ SCRAM-SHA-256$iteration count:
-      substream bool
+      substream char

-      If true, the subscription will allow streaming of in-progress
-      transactions
+      Controls how to handle the streaming of in-progress transactions.
+      f = disallow streaming of in-progress transactions
+      o = spill the changes of in-progress transactions to
+      disk and apply at once after the transaction is committed on the
+      publisher.
+      a = apply changes directly using a background worker

Needs to be consistent with other value lists on this page.

5a. The first sentence to end with ":"

5b. List items to end with ","

~~~

6. doc/src/sgml/ref/create_subscription.sgml

+         If set to apply incoming
+         changes are directly applied via one of the background worker, if
+         available. If no background worker is free to handle streaming
+         transaction then the changes are written to a file and applied after
+         the transaction is committed. Note that if error happen when applying
+         changes in background worker, it might not report the finish LSN of
+         the remote transaction in server log.

6a. Typo: "one of the background worker," -> "one of the background workers,"

6b. Wording

BEFORE
Note that if error happen when applying changes in background worker, it might not report the finish LSN of the remote transaction in server log.

SUGGESTION
Note that if an error happens when applying changes in a background worker, it might not report the finish LSN of the remote transaction in the server log.

~~~

7. src/backend/commands/subscriptioncmds.c - defGetStreamingMode

+static char
+defGetStreamingMode(DefElem *def)
+{
+	/*
+	 * If no parameter given, assume "true" is meant.
+	 */
+	if (def->arg == NULL)
+		return SUBSTREAM_ON;

But is that right? IIUC all the docs said that the default is OFF.

~~~

8. src/backend/commands/subscriptioncmds.c - defGetStreamingMode

+		/*
+		 * The set of strings accepted here should match up with the
+		 * grammar's opt_boolean_or_string production.
+		 */
+		if (pg_strcasecmp(sval, "true") == 0 ||
+			pg_strcasecmp(sval, "on") == 0)
+			return SUBSTREAM_ON;
+		if (pg_strcasecmp(sval, "apply") == 0)
+			return SUBSTREAM_APPLY;
+		if (pg_strcasecmp(sval, "false") == 0 ||
+			pg_strcasecmp(sval, "off") == 0)
+			return SUBSTREAM_OFF;

Perhaps these should be re-ordered OFF/ON/APPLY to be consistent with the T_Integer case above here.

~~~

9. src/backend/replication/logical/launcher.c - logicalrep_worker_launch

The "start new apply background worker ..." function comment feels a bit misleading, since that now seems to be what you are calling this new kind of worker. E.g. this is also called to start the sync worker. And also for the apply worker (which we are not really calling a "background worker" in other places).

This comment is the same as [PSv4] #19.

~~~

10. src/backend/replication/logical/launcher.c - logicalrep_worker_launch

@@ -275,6 +280,9 @@ logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
 	int			nsyncworkers;
 	TimestampTz now;

+	/* We don't support table sync in subworker */
+	Assert(!((subworker_dsm != DSM_HANDLE_INVALID) && OidIsValid(relid)));

I think you should declare a new variable like:

bool is_subworker = subworker_dsm != DSM_HANDLE_INVALID;

Then this Assert can be simplified, and also you can re-use the 'is_subworker' later multiple times in
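Review items 7 and 8 above are easier to see side by side in a small model. The sketch below is Python, purely for illustration -- the real code is C in subscriptioncmds.c -- showing the suggested OFF/ON/APPLY check ordering; the 'f'/'o'/'a' values come from the catalogs.sgml hunk in item 5, and the bare-option default of "on" is exactly the behavior questioned in item 7.

```python
# Toy model of defGetStreamingMode() value parsing (the real code is C and
# uses pg_strcasecmp).  Checks are ordered OFF/ON/APPLY per review comment
# #8; the default when no value is given mirrors the patch's behavior,
# which comment #7 questions.
SUBSTREAM_OFF, SUBSTREAM_ON, SUBSTREAM_APPLY = "f", "o", "a"

def get_streaming_mode(value):
    # A bare "streaming" option (no value) is treated like a boolean
    # option given without a value, i.e. as "true"/"on".
    if value is None:
        return SUBSTREAM_ON
    sval = value.lower()
    # Ordered OFF, ON, APPLY for consistency with the integer case.
    if sval in ("false", "off"):
        return SUBSTREAM_OFF
    if sval in ("true", "on"):
        return SUBSTREAM_ON
    if sval == "apply":
        return SUBSTREAM_APPLY
    raise ValueError(f'streaming requires a Boolean value or "apply": {value}')
```

Reordering the string comparisons this way changes nothing observable; it only keeps the three branches in the same OFF/ON/APPLY order everywhere the modes are enumerated.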
Re: Skipping schema changes in publication
On Thu, May 5, 2022 at 9:20 AM Amit Kapila wrote: > > On Wed, May 4, 2022 at 7:05 PM Peter Eisentraut > wrote: > > > > On 14.04.22 15:47, Peter Eisentraut wrote: > > > That said, I'm not sure this feature is worth the trouble. If this is > > > useful, what about "whole database except these schemas"? What about > > > "create this database from this template except these schemas". This > > > could get out of hand. I think we should encourage users to group their > > > object the way they want and not offer these complicated negative > > > selection mechanisms. > > > > Another problem in general with this "all except these" way of > > specifying things is that you need to track negative dependencies. > > > > For example, assume you can't add a table to a publication unless it has > > a replica identity. Now, if you have a publication p1 that says > > includes "all tables except t1", you now have to check p1 whenever a new > > table is created, even though the new table has no direct dependency > > link with p1. So in more general cases, you would have to check all > > existing objects to see whether their specification is in conflict with > > the new object being created. > > > > Yes, I think we should avoid adding such negative dependencies. We > have carefully avoided such dependencies during row filter, column > list work where we don't try to perform DDL time verification. > However, it is not clear to me how this proposal is related to this > example or in general about tracking negative dependencies? > I mean to say that even if we have such a restriction, it would apply to "for all tables" or other publications as well. In your example, consider one wants to Alter a table and remove its replica identity, we have to check whether the table is part of any publication similar to what we are doing for relation persistence in ATPrepChangePersistence. 
> AFAIR, we > currently have such a check while changing persistence of logged table > (logged to unlogged, see ATPrepChangePersistence) where we cannot > allow changing persistence if that relation is part of some > publication. But as per my understanding, this feature shouldn't add > any such new dependencies. I agree that we have to ensure that > existing checks shouldn't break due to this feature. > > > Now publications don't actually work that way, so it's not a real > > problem right now, but similar things could work like that. So I think > > it's worth thinking this through a bit. > > > > This is a good point and I agree that we should be careful to not add > some new negative dependencies unless it is really required but I > can't see how this proposal will make it more prone to such checks. > -- With Regards, Amit Kapila.
Re: Proposal for internal Numeric to Uint64 conversion function.
On Tue, May 3, 2022 at 8:04 PM Peter Eisentraut wrote:
>
> On 03.05.22 08:50, Amul Sul wrote:
> >> Do you have any data that supports removing DirectFunctionCall()
> >> invocations? I suppose some performance benefit could be expected, or
> >> what do you have in mind?
> >>
> > Not really, the suggestion to avoid DirectFunctionCall() is from Tom.
> >
> > For a trial, I have called the money(numeric) function 10M times to
> > see the difference with and without patch but there is not much.
> > (I don't have any knowledge of micro profiling/benchmarking).
>
> Ok. I have lost track of what you are trying to achieve with this patch
> set. It's apparently not for performance, and in terms of refactoring
> you end up with more lines of code than before, so that doesn't seem too
> appealing either. So I'm not sure what the end goal is.
>

Well, it is still worth it, IMHO. Refactoring allows us to place numeric_pg_lsn in the correct file where it should be. This refactoring gives a function that converts numeric to uint64; the same function is used in other routines of pg_lsn.c which otherwise use DirectFunctionCall().

Refactoring of numeric_int8() gives a function to convert numeric to int64 which can be used in many places without using DirectFunctionCall(), and that is not the sole reason for it -- we have a function that converts int64 to numeric, but the opposite conversion needs to use DirectFunctionCall(), which doesn't make sense.

Along with this refactoring, there are different routines calling functions for numeric arithmetic operations through DirectFunctionCall(), which isn't necessary since the required arithmetic functions are already accessible.

The last benefit, which is not completely worthless, is avoiding DirectFunctionCall(): one less function call on the stack and less work for the system to do the same task.

Regards, Amul
Re: Skipping schema changes in publication
On Wed, May 4, 2022 at 7:05 PM Peter Eisentraut wrote: > > On 14.04.22 15:47, Peter Eisentraut wrote: > > That said, I'm not sure this feature is worth the trouble. If this is > > useful, what about "whole database except these schemas"? What about > > "create this database from this template except these schemas". This > > could get out of hand. I think we should encourage users to group their > > object the way they want and not offer these complicated negative > > selection mechanisms. > > Another problem in general with this "all except these" way of > specifying things is that you need to track negative dependencies. > > For example, assume you can't add a table to a publication unless it has > a replica identity. Now, if you have a publication p1 that says > includes "all tables except t1", you now have to check p1 whenever a new > table is created, even though the new table has no direct dependency > link with p1. So in more general cases, you would have to check all > existing objects to see whether their specification is in conflict with > the new object being created. > Yes, I think we should avoid adding such negative dependencies. We have carefully avoided such dependencies during row filter, column list work where we don't try to perform DDL time verification. However, it is not clear to me how this proposal is related to this example or in general about tracking negative dependencies? AFAIR, we currently have such a check while changing persistence of logged table (logged to unlogged, see ATPrepChangePersistence) where we cannot allow changing persistence if that relation is part of some publication. But as per my understanding, this feature shouldn't add any such new dependencies. I agree that we have to ensure that existing checks shouldn't break due to this feature. > Now publications don't actually work that way, so it's not a real > problem right now, but similar things could work like that. So I think > it's worth thinking this through a bit. 
> This is a good point and I agree that we should be careful to not add some new negative dependencies unless it is really required but I can't see how this proposal will make it more prone to such checks. -- With Regards, Amit Kapila.
Re: pg_stat_statements
Hi,

On Tue, May 03, 2022 at 01:30:32PM +, Godfrin, Philippe E wrote:
>
> I wasn't exactly clear about the queries. The values clauses themselves are not long -
> We are using repeated values clauses:
>
> INSERT INTO timeseries.dvc_104 (tag_id, event_ts, bool_val, float_val,
> int_val, string_val, last_updt_id)
> VALUES
> ($1,$2,$3,$4,$5,$6,$7),($8,$9,$10,$11,$12,$13,$14),($15,$16,$17,$18,$19,$20,$21),
> ($22,$23,$24,$25,$26,$27,$28),($29,$30,$31,$32,$33,$34,$35),($36,$37,$38,$39,$40,$41,$42),
> ($43,$44,$45,$46,$47,$48,$49),($50,$51,$52,$53,$54,$55,$56),($57,$58,$59,$60,$61,$62,$63),
> ($64,$65,$66,$67,$68,$69,$70),($71,$72,$73,$74,$75,$76,$77),($78,$79,$80,$81,$82,$83,$84),
> ($85,$86,$87,$88,$89,$90,$91),($92,$93,$94,$95,$96,$97,$98)
>
> This one's not long, but some 'load statements' have 10,000 values clauses, others add up to 10,000 more
> in an ON CONFLICT clause. I've checked the external Query file and it's currently not large
> at all. But I will keep an eye on that. When I had the settings at 1000 statements
> the file was indeed over 1GB. For the record, development is reducing those statement
> lengths.
> [...]
> The first observation is how long a simple query took:
>
> # select count(*) from pg_stat_statements;
>  count
> ---
>    971
> Time: 6457.985 ms (00:06.458)
>
> MORE than six seconds for a mere 971 rows! Furthermore, when removing the long queries:
>
> # select count(*) from pg_stat_statements(showtext:=false);
>  count
> ---
>    970
> Time: 10.644 ms
>
> Only 10ms...

Well, 10ms is still quite slow.

You're not removing the long query texts, you're removing all query texts. I don't know if the overhead comes from processing at least some long statements or is mostly due to having to retrieve the query file.

Do you get the same times if you run the query twice? Maybe you're short on RAM and have somewhat slow disks, and the text file has to be read from disk rather than OS cache?
Also I don't know what you mean by "not large at all", so it's hard to compare or try to reproduce. FWIW on some instance I have around, I have a 140kB file and querying pg_stat_statements *with* the query text file only takes a few ms.

You could try to query that view with some unprivileged user. This way you will still retrieve the query text file but will only emit "<insufficient privilege>" rather than processing the query texts; this may narrow down the problem. Or better, if you could run perf [1] to see where the overhead really is.

> Second, we have Datadog installed. Datadog queries the pg_stat_statements table
> every 10 seconds. The real pain point is querying the pg_stat_statements seems
> to have an impact on running queries, specifically inserts in my case.

I think this is a side effect of having a very low pg_stat_statements.max, if of course you have more queries than the current value. If the extra time is due to loading the query text file and if it's loaded after acquiring the lightweight lock, then you will prevent evicting or creating new entries for a long time, which means that the query execution for those queries will be blocked until the query on pg_stat_statements ends. There are unfortunately *a lot* of unknowns here, so I can't do anything apart from guessing.

> I believe this is an actual impact that needs a solution.

First, if you have an OLTP workload you have to make sure that pg_stat_statements.max is high enough so that you don't have to evict entries, or at least not often. Then, I think that querying pg_stat_statements every 10s is *really* aggressive; that's always going to have some noticeable overhead. For the rest, we need more information to understand where the slowdown is coming from.

[1] https://wiki.postgresql.org/wiki/Profiling_with_perf
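The pg_stat_statements.max advice above can be illustrated with a toy cache model (Python, purely illustrative -- the extension's real eviction policy is more sophisticated than this FIFO, and the numbers here are made up): once the number of distinct statements exceeds the cap, every new statement forces an eviction, and in the real extension that eviction work happens under an exclusive lightweight lock.

```python
from collections import OrderedDict

def count_evictions(statements, max_entries):
    """FIFO toy cache: count how many entries get evicted while
    processing a stream of statement keys."""
    entries = OrderedDict()
    evictions = 0
    for stmt in statements:
        if stmt in entries:
            continue  # hit: the real extension just updates counters
        if len(entries) >= max_entries:
            entries.popitem(last=False)  # evict the oldest entry
            evictions += 1
        entries[stmt] = True
    return entries, evictions

# Hypothetical workload: 5000 distinct statements cycling through a
# 1000-entry cache -- after warmup, every single execution evicts.
workload = [f"query-{i % 5000}" for i in range(20000)]
```

With max_entries raised above the number of distinct statements, the eviction count drops to zero, which is the point of sizing pg_stat_statements.max generously for OLTP workloads.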
Re: Atomic GetFreeIndexPage()?
On Wed, May 4, 2022 at 12:16 PM Chris Cleveland wrote:

> Similar code is repeated in a bunch of places. Each access method has to
> explicitly write something into a freed page that identifies it as ok to use.

I wouldn't say that that's what this code is doing, though I do see what you mean. The reason why the ultimate consumer of the free page writes to it is because, well, it wouldn't have asked for a page if it didn't have something to write.

Code like GinPageIsRecyclable() is generally concerned with something called recycle safety, typically using something called the drain technique. That's what this code is primarily concerned about. The definition of recycle safety is documented in the nbtree README.

> Otherwise, you could have two processes get the same page, then one locks it,
> writes it, and unlocks it, then the second one does the same clobbering the
> first.

But you could have that anyway, since the FSM isn't crash safe. It's inherently not completely trustworthy for that reason. Actually, the FSM shouldn't really contain a page that is ! GinPageIsRecyclable() to begin with, but that is still possible for a variety of reasons. All of which boil down to "the current FSM design cannot be totally trusted, so we verify redundantly".

--
Peter Geoghegan
Re: Costing elided SubqueryScans more nearly correctly
I wrote:
> I instrumented the code in setrefs.c, and found that during the
> core regression tests this patch estimates correctly in 2103
> places while guessing wrongly in 54, so that seems like a pretty
> good step forward.

On second thought, that's not a terribly helpful summary. Breaking things down to the next level, there were

1088 places where we correctly guessed a subquery isn't trivial (so no change from current behavior, which is correct)
1015 places where we correctly guessed a subquery is trivial (hence, improving the cost estimate from before)
40 places where we incorrectly guessed a subquery isn't trivial (so no change from current behavior, although that's wrong)
14 places where we incorrectly guessed a subquery is trivial (hence, incorrectly charging zero for the SubqueryScan)

1015 improvements to 14 disimprovements isn't a bad score. I'm a bit surprised there are that many removable SubqueryScans TBH; maybe that's an artifact of all the "SELECT *" queries.

regards, tom lane
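For the record, the quoted totals can be cross-checked from the four buckets (a trivial arithmetic sanity check, nothing more):

```python
# Cross-check of the instrumentation counts from the core regression tests.
correct_nontrivial = 1088  # correctly guessed not trivial
correct_trivial    = 1015  # correctly guessed trivial
wrong_nontrivial   = 40    # guessed not trivial, actually trivial
wrong_trivial      = 14    # guessed trivial, actually not

correct = correct_nontrivial + correct_trivial  # should be 2103
wrong = wrong_nontrivial + wrong_trivial        # should be 54
accuracy = correct / (correct + wrong)          # about 97.5%
```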
Costing elided SubqueryScans more nearly correctly
In [1] I complained about how SubqueryScans that get deleted from a plan tree by setrefs.c nonetheless contribute cost increments that might cause the planner to make odd choices. That turned out not to be the proximate cause of that particular issue, but it still seems like it might be a good idea to do something about it. Here's a little patch to improve matters. It turns out to be hard to predict perfectly whether setrefs.c will remove a SubqueryScan, because createplan.c plays some games with moving tlist evaluations around and/or inserting "physical" (i.e. trivial) tlists, which can falsify any earlier estimate of whether a SubqueryScan is trivial. I'm not especially interested in redesigning those mechanisms, so the best we can hope for is an approximate determination. (Those same behaviors also make a lot of other path cost estimates a bit incorrect, so it doesn't seem like we need to feel too awful about not getting SubqueryScan perfect.) Given that ground rule, however, it's not very hard to determine whether a SubqueryScanPath looks like it will be trivial and change its costing accordingly. The attached draft patch does that. I instrumented the code in setrefs.c, and found that during the core regression tests this patch estimates correctly in 2103 places while guessing wrongly in 54, so that seems like a pretty good step forward. Perhaps I overcomplicated matters by making the callers responsible for determining triviality of the paths' targets. We could just make cost_subqueryscan check that for itself (using logic similar to what I wrote in set_subquery_pathlist), but that'd result in duplicative calculations anytime we make more than one Path for a subquery. On the other hand, said calculations wouldn't be that expensive, so perhaps a more localized patch would be better. I also notice that setrefs.c can elide Append and MergeAppend nodes too in some cases, but AFAICS costing of those node types doesn't take that into account. 
Anyway, I'll stick this in the queue for v16.

regards, tom lane

[1] https://www.postgresql.org/message-id/328872.1651247595%40sss.pgh.pa.us

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d84f66a81b..08f4d125e1 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2439,6 +2439,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 {
 	Query	   *parse = root->parse;
 	Query	   *subquery = rte->subquery;
+	bool		trivial_pathtarget;
 	Relids		required_outer;
 	pushdown_safety_info safetyInfo;
 	double		tuple_fraction;
@@ -2597,6 +2598,36 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	 */
 	set_subquery_size_estimates(root, rel);

+	/*
+	 * Also detect whether the reltarget is trivial, so that we can pass that
+	 * info to cost_subqueryscan (rather than re-deriving it multiple times).
+	 * It's trivial if it fetches all the subplan output columns in order.
+	 */
+	if (list_length(rel->reltarget->exprs) != list_length(subquery->targetList))
+		trivial_pathtarget = false;
+	else
+	{
+		trivial_pathtarget = true;
+		foreach(lc, rel->reltarget->exprs)
+		{
+			Node	   *node = (Node *) lfirst(lc);
+			Var		   *var;
+
+			if (!IsA(node, Var))
+			{
+				trivial_pathtarget = false;
+				break;
+			}
+			var = (Var *) node;
+			if (var->varno != rti ||
+				var->varattno != foreach_current_index(lc) + 1)
+			{
+				trivial_pathtarget = false;
+				break;
+			}
+		}
+	}
+
 	/*
 	 * For each Path that subquery_planner produced, make a SubqueryScanPath
 	 * in the outer query.
@@ -2615,6 +2646,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		/* Generate outer path using this subpath */
 		add_path(rel, (Path *)
 				 create_subqueryscan_path(root, rel, subpath,
+										  trivial_pathtarget,
 										  pathkeys, required_outer));
 	}
@@ -2640,6 +2672,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		/* Generate outer path using this subpath */
 		add_partial_path(rel, (Path *)
						 create_subqueryscan_path(root, rel, subpath,
+												  trivial_pathtarget,
												  pathkeys, required_outer));
 	}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 6673d271c2..efb58f083a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1388,10 +1388,12 @@ cost_tidrangescan(Path *path, PlannerInfo *root,
  *
  * 'baserel' is the relation to be scanned
  * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'trivial_pathtarget' is true if the pathtarget is believed to be trivial.
  */
 void
 cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
-				  RelOptInfo *baserel, ParamPathInfo *param_info)
+				  RelOptInfo *baserel, ParamPathInfo *param_info,
+				  bool trivial_pathtarget)
 {
 	Cost		startup_cost;
 	Cost		run_cost;
@@ -1431,6
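The triviality test added in the allpaths.c hunk can be restated compactly. The following is a Python model purely for illustration (the real code is the C foreach loop above); a Var is represented here as a (varno, varattno) tuple and any other expression as a non-tuple:

```python
def is_trivial_pathtarget(rti, reltarget_exprs, subquery_tlist_len):
    """Model of the check in set_subquery_pathlist(): the pathtarget is
    trivial iff it has one entry per subquery output column and entry i
    is a plain Var referencing column i+1 of the subquery relation (rti),
    i.e. it fetches all the subplan output columns in order."""
    if len(reltarget_exprs) != subquery_tlist_len:
        return False
    for i, expr in enumerate(reltarget_exprs):
        if not (isinstance(expr, tuple) and len(expr) == 2):
            return False  # not a plain Var
        varno, varattno = expr
        if varno != rti or varattno != i + 1:
            return False
    return True
```

Anything that reorders, drops, or computes over columns makes the target non-trivial, so the SubqueryScan cannot be assumed removable and keeps its per-row cost.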
Re: SQL/JSON: FOR ORDINALITY bug
On Wed, May 4, 2022 at 1:43 PM David G. Johnston wrote: > On Wed, May 4, 2022 at 1:09 PM Erik Rijkers wrote: > >> Op 04-05-2022 om 21:12 schreef Andrew Dunstan: >> > >> >> I don't see how rowseq can be anything but 1. Each invocation of >> >> >> >> >> >> After some further experimentation, I now think you must be right, >> David. >> >> >> >> Also, looking at the DB2 docs: >> >>https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table >> >> (see especially under 'Handling nested information') >> >> You've probably noticed then that on that same page under 'Sibling >> Nesting' is a statement that gives a 13-row resultset on DB2 whereas in >> 15devel that statement yields just 10 rows. I don't know which is >> correct. >> >> > There should be 12 results (minimum would be 8 - 5 of which are used for > real matches, plus 4 new row producing matches). > > Our result seems internally inconsistent; conceptually there are two kinds > of nulls here and we cannot collapse them. > > null-val: we are outputting the record from the nested path but there is > no actual value to output so we output null-val > null-union: we are not outputting the record for the nested path (we are > doing a different one) but we need to output something for this column so > we output null-union. > > Thinking this over - I think the difference is we implemented a FULL OUTER JOIN to combine the siblings - including the behavior of that construct and the absence of rows. DB2 took the word "UNION" for the plan modifier literally and unioned (actually union all) the two subpaths together using the null concepts above (though somehow ensuring that at least one row was produced from each subpath...). Thus we are indeed back to seeing whether the standard defines sibling combining as union or join, or some other special construct. I'm now leaning toward what we've done as at least being the more sane option. Even if our outer join process is correct the existing wording is odd. 
"Use FULL OUTER JOIN ON FALSE, so that both parent and child rows are included into the output, with NULL values inserted into both child and parent columns for all missing values." I don't think it helps to mention parent here. This aspect of plan doesn't concern itself with the final output, only the output of the subplan which is then combined with the parent using a join. I would probably want to phrase the default more like: "This is the default option for joining the combined child rows to the parent." David J.
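The row-combination rule under discussion -- FULL OUTER JOIN ON FALSE between the two sibling subplans -- can be pinned down with a small model (Python, illustrative only; it says nothing about which reading of the standard is correct): since the join condition is always false, no pair of rows ever matches, so each side's rows come out exactly once, NULL-padded on the other side.

```python
def full_outer_join_on_false(left_rows, right_rows):
    """FULL OUTER JOIN ... ON FALSE: no row pairs match, so every left
    row is emitted with NULLs for the right columns, and every right row
    with NULLs for the left columns (None plays the role of SQL NULL)."""
    out = []
    for l in left_rows:
        out.append((l, None))
    for r in right_rows:
        out.append((None, r))
    return out
```

Note that an empty side still lets the other side's rows through, but if both nested paths produce no rows the join itself produces none; whether the parent row survives then depends on how this result is joined back to the parent, which is where the counting disagreements in this thread come from.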
Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
Thanks a lot for the patch update.

On 2022-05-02 1:25 a.m., Etsuro Fujita wrote:

Hi,

On Wed, Apr 20, 2022 at 4:55 AM David Zhang wrote:

I tried to apply the patch to master and plan to run some tests, but got below errors due to other commits.

I rebased the patch against HEAD. Attached is an updated patch.

Applied the patch v8 to master branch today, and the `make check` is OK. I also repeated previous performance tests on three virtual Ubuntu 18.04 machines, and the performance improvement of parallel abort, averaged over 10 runs, is more consistent.

before:
abort.1 = 2.6344 ms
abort.2 = 4.2799 ms

after:
abort.1 = 1.4105 ms
abort.2 = 2.2075 ms

+ * remote server in parallel at (sub)transaction end.

Here, I think the comment above could potentially apply to multiple remote server(s).

I agree on that point, but I think it's correct to say "the remote server" here, because we determine this for the given remote server. Maybe I'm missing something, so could you elaborate on it?

Your understanding is correct. I was thinking `remote server(s)` would be easy for end users to understand, but this is a comment in source code, so either way is fine for me.

Not sure if there is a way to avoid repeated comments? For example, the same comment below appears in two places (line 231 and line 296).

+/*
+ * If requested, consume whatever data is available from the socket.
+ * (Note that if all data is available, this allows
+ * pgfdw_get_cleanup_result to call PQgetResult without forcing the
+ * overhead of WaitLatchOrSocket, which would be large compared to the
+ * overhead of PQconsumeInput.)
+ */

IMO I think it's OK to have this in multiple places, because 1) IMO it wouldn't be that long, and 2) we already duplicated comments like this in the same file in v14 and earlier.
Here is such an example in pgfdw_xact_callback() and pgfdw_subxact_callback() in that file in those versions:

/*
 * If a command has been submitted to the remote server by
 * using an asynchronous execution function, the command
 * might not have yet completed. Check to see if a
 * command is still being processed by the remote server,
 * and if so, request cancellation of the command.
 */

Thanks for reviewing! Sorry for the delay.

Best regards, Etsuro Fujita

--
Best regards,
David
Software Engineer Highgo Software Inc. (Canada) www.highgo.ca
Re: SQL/JSON: FOR ORDINALITY bug
On Wed, May 4, 2022 at 1:09 PM Erik Rijkers wrote: > Op 04-05-2022 om 21:12 schreef Andrew Dunstan: > > > > I don't see how rowseq can be anything but 1. Each invocation of > >> > >> > >> After some further experimentation, I now think you must be right, > David. > >> > >> Also, looking at the DB2 docs: > >>https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table > >> (see especially under 'Handling nested information') > >> > >> There, I gathered some example data + statements where one is the case > >> at hand. I also made them runnable under postgres (attached). > >> > >> I thought that was an instructive example, with those > >> 'outer_ordinality' and 'inner_ordinality' columns. > >> > >> > > > > Yeah, I just reviewed the latest version of that page (7.5) and the > > example seems fairly plain that we are doing the right thing, or if not > > we're in pretty good company, so I guess this is probably a false alarm. > > Looks like ordinality is for the number of the element produced by the > > path expression. So a path of 'lax $' should just produce ordinality of > > 1 in each case, while a path of 'lax $[*]' will produce increasing > > ordinality for each element of the root array. > > Agreed. > > You've probably noticed then that on that same page under 'Sibling > Nesting' is a statement that gives a 13-row resultset on DB2 whereas in > 15devel that statement yields just 10 rows. I don't know which is correct. > > There should be 12 results (minimum would be 8 - 5 of which are used for real matches, plus 4 new row producing matches). Our result seems internally inconsistent; conceptually there are two kinds of nulls here and we cannot collapse them. null-val: we are outputting the record from the nested path but there is no actual value to output so we output null-val null-union: we are not outputting the record for the nested path (we are doing a different one) but we need to output something for this column so we output null-union. 
Sally, null-val, null-union
Sally, null-union, null-val

We only have one Sally but need both (11)

We are also missing:

Mary, null-union, null-val (12)

The fact that we agree on John means that we at least agree on UNION meaning we output a pair of rows when there are two nested paths. I point to relative comparisons for fear of reading the specification here...

David J.
Re: SQL/JSON: FOR ORDINALITY bug
Op 04-05-2022 om 21:12 schreef Andrew Dunstan: I don't see how rowseq can be anything but 1. Each invocation of After some further experimentation, I now think you must be right, David. Also, looking at the DB2 docs: https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table (see especially under 'Handling nested information') There, I gathered some example data + statements where one is the case at hand. I also made them runnable under postgres (attached). I thought that was an instructive example, with those 'outer_ordinality' and 'inner_ordinality' columns. Yeah, I just reviewed the latest version of that page (7.5) and the example seems fairly plain that we are doing the right thing, or if not we're in pretty good company, so I guess this is probably a false alarm. Looks like ordinality is for the number of the element produced by the path expression. So a path of 'lax $' should just produce ordinality of 1 in each case, while a path of 'lax $[*]' will produce increasing ordinality for each element of the root array. Agreed. You've probably noticed then that on that same page under 'Sibling Nesting' is a statement that gives a 13-row resultset on DB2 whereas in 15devel that statement yields just 10 rows. I don't know which is correct. Erik cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Did we intend to change whether PUBLIC can create tables in the public schema by default?
On Wed, May 4, 2022 at 12:42 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > Hey, > > For the following sequence of commands, on a newly initdb v15devel and > mostly clean v13 I get a failure and a created table respectively. > > Apparently I didn't search commit history well enough the first time... https://github.com/postgres/postgres/commit/b073c3ccd06e4cb845e121387a43faa8c68a7b62 Sorry for the noise. David J.
Did we intend to change whether PUBLIC can create tables in the public schema by default?
Hey, For the following sequence of commands, on a newly initdb v15devel and mostly clean v13 I get a failure and a created table respectively. Showing v15devel: postgres=# create database testdb; CREATE DATABASE postgres=# create role testrole; CREATE ROLE postgres=# \c testdb You are now connected to database "testdb" as user "vagrant". testdb=# set session authorization testrole; SET testdb=> create table public.testtable(id int); ERROR: permission denied for schema public LINE 1: create table public.testtable(id int); testdb=> select version(); version -- PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit (1 row) === v13.6 (I also have a report this is the behavior of v14) postgres=# create database testdb; crCREATE DATABASE postgres=# create role testrole; CREATE ROLE postgres=# \c testdb You are now connected to database "testdb" as user "postgres". testdb=# select version(); version -- PostgreSQL 13.6 (Ubuntu 13.6-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit (1 row) testdb=# set session authorization testrole; SET testdb=> create table public.testtable (id int); CREATE TABLE David J.
Re: Atomic GetFreeIndexPage()?
Chris Cleveland writes: > Would it make sense to make GetFreeIndexPage() atomic? I strongly doubt it. The loss of concurrency would outweigh any code-simplicity benefits. regards, tom lane
Re: Query generates infinite loop
Jeff Janes writes: > The regression test you added for this change causes an infinite loop when > run against an unpatched server with --install-check. That is a bit > unpleasant. Is there something we can and should do about that? I was > expecting regression test failures of course but not an infinite loop > leading towards disk exhaustion. We very often add regression test cases that will cause unpleasant failures on unpatched code. I categorically reject the idea that that's not a good thing, and question why you think that running known-broken code against a regression suite is an important use case. regards, tom lane
Atomic GetFreeIndexPage()?
Would it make sense to make GetFreeIndexPage() atomic? Internally, GetFreeIndexPage() calls GetPageWithFreeSpace() and then RecordUsedIndexPage() in two separate operations. It's possible for two different processes to get the same free page at the same time. To guard against this, there are several index access methods that do something like this:

    /* First, try to get a page from FSM */
    for (;;)
    {
        BlockNumber blkno = GetFreeIndexPage(index);

        if (blkno == InvalidBlockNumber)
            break;
        buffer = ReadBuffer(index, blkno);

        /*
         * We have to guard against the possibility that someone else already
         * recycled this page; the buffer may be locked if so.
         */
        if (ConditionalLockBuffer(buffer))
        {
            if (GinPageIsRecyclable(BufferGetPage(buffer)))
                return buffer;          /* OK to use */
            LockBuffer(buffer, GIN_UNLOCK);
        }

        /* Can't use it, so release buffer and try again */
        ReleaseBuffer(buffer);
    }

Similar code is repeated in a bunch of places. Each access method has to explicitly write something into a freed page that identifies it as ok to use. Otherwise, you could have two processes get the same page, then one locks it, writes it, and unlocks it, then the second one does the same, clobbering the first. All this logic goes away if GetFreeIndexPage() is atomic, such that finding and marking a page as not-free always happens in one go. No longer would the index pages themselves need to be modified to identify them as available. I'm thinking of surrounding the code that calls GetFreeIndexPage() with a lightweight lock, although I wonder if that would harm performance. Perhaps there is a more performant way to do this deep down in the FSM code. Thoughts?

--
Chris Cleveland
312-339-2677 mobile
Re: JSON Functions and Operators Docs for v15
On 2022-05-04 We 11:39, Tom Lane wrote: > "David G. Johnston" writes: >> Is there a thread I'm not finding where the upcoming JSON function >> documentation is being made reasonably usable after doubling its size with >> all the new JSON Table features that we've added? If nothing else, the >> table of contents at the top of the page needs to be greatly expanded to >> make seeing and navigating to all that is available a possibility. > The entire structure of that text needs to be rethought, IMO, as it > has been written with precisely no concern for fitting into our > hard-won structure for func.sgml. Andrew muttered something about > rewriting it awhile ago, but I don't know what progress he's made. > Yes, I've been clearing the decks a bit, but I'm working on it now, should have something within the next week. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: SQL/JSON: FOR ORDINALITY bug
On 2022-05-04 We 10:39, Erik Rijkers wrote: > Op 04-05-2022 om 13:55 schreef Andrew Dunstan: >> >> On 2022-05-03 Tu 20:39, David G. Johnston wrote: >>> On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan >>> wrote: >>> >>> >>> On 2022-05-03 Tu 11:19, Erik Rijkers wrote: >>> > Hi >>> > >>> > I've copied some statements from the .pdf called: >>> > "TECHNICAL REPORT ISO/IEC TR 19075-6 First edition 2017-03 >>> > Part SQL Notation support 6: (JSON) for JavaScript Object" >>> > (not available anymore although there should be a similar >>> replacement >>> > file) >>> > >>> > In that pdf I found the data and statement (called 'table 15' >>> in the >>> > .pdf) as in the attached bash file. But the result is >>> different: as >>> > implemented by 15devel, the column rowseq is always 1. It seems >>> to me >>> > that that is wrong; it should count 1, 2, 3 as indeed the >>> > example-result column in that pdf shows. >>> > >>> > What do you think? >>> > >>> > >>> >>> Possibly. >>> >>> >>> I don't see how rowseq can be anything but 1. Each invocation of > > > After some further experimentation, I now think you must be right, David. > > Also, looking at the DB2 docs: > https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table > (see especially under 'Handling nested information') > > There, I gathered some example data + statements where one is the case > at hand. I also made them runnable under postgres (attached). > > I thought that was an instructive example, with those > 'outer_ordinality' and 'inner_ordinality' columns. > > Yeah, I just reviewed the latest version of that page (7.5) and the example seems fairly plain that we are doing the right thing, or if not we're in pretty good company, so I guess this is probably a false alarm. Looks like ordinality is for the number of the element produced by the path expression. 
So a path of 'lax $' should just produce ordinality of 1 in each case, while a path of 'lax $[*]' will produce increasing ordinality for each element of the root array. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Query generates infinite loop
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane wrote: > I wrote: > > it's true that infinities as generate_series endpoints are going > > to work pretty oddly, so I agree with the idea of forbidding 'em. > > > Numeric has infinity as of late, so the numeric variant would > > need to do this too. > > Oh --- looks like numeric generate_series() already throws error for > this, so we should just make the timestamp variants do the same. > The regression test you added for this change causes an infinite loop when run against an unpatched server with --install-check. That is a bit unpleasant. Is there something we can and should do about that? I was expecting regression test failures of course but not an infinite loop leading towards disk exhaustion. Cheers, Jeff
Re: fix cost subqueryscan wrong parallel cost
Robert Haas writes:
> On Tue, May 3, 2022 at 2:13 PM Tom Lane wrote:
>> In any case, fundamental redesign of what EXPLAIN prints is a job
>> for v16 or later. Are you okay with the proposed patch as a v15 fix?

> Yes. I can't really vouch for it, but I don't object to it.

I re-read the patch and noticed that I'd missed one additional change needed:

-       run_cost = cpu_per_tuple * baserel->tuples;
+       run_cost = cpu_per_tuple * path->subpath->rows;

That is, we should estimate the number of qual evaluations and cpu_tuple_cost charges based on the subpath row count, not the non-parallel-aware relation size. So this reduces the cost of the SubqueryScan itself, as well as its row count, in partial paths. I don't see any further change in regression test results though. Pushed with that change.

regards, tom lane
Re: strange slow query - lost lot of time somewhere
On Wed, May 4, 2022 at 16:08, Jakub Wartak wrote:
>
> Additional three ways to figure that one (all are IMHO production safe):
> a) already mentioned perf with --call-graph dwarf -p PID
> b) strace -p PID -e 'mmap' # verify if mmap() NULL is not having MAP_ANONYMOUS flag, size of mmap() request will somehow match work_mem sizing
> c) gdb -p PID and then breakpoint for mmap and verify each mmap() # check MAP_ANONYMOUS as above
>

I don't have debug symbols, so I don't have more details now.

Breakpoint 1 at 0x7f557f0c16c0
(gdb) c
Continuing.

Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1 0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2 0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3 0x7f557f04fb1e in malloc () from /lib64/libc.so.6
#4 0x00932134 in AllocSetAlloc ()
#5 0x009376cf in MemoryContextAllocExtended ()
#6 0x006ad915 in ExecInitMemoize ()
#7 0x0068dc02 in ExecInitNode ()
#8 0x006b37ff in ExecInitNestLoop ()
#9 0x0068dc56 in ExecInitNode ()
#10 0x006b37ff in ExecInitNestLoop ()
#11 0x0068dc56 in ExecInitNode ()
#12 0x006b37de in ExecInitNestLoop ()
#13 0x0068dc56 in ExecInitNode ()
#14 0x006b37de in ExecInitNestLoop ()
#15 0x0068dc56 in ExecInitNode ()
#16 0x00687e4d in standard_ExecutorStart ()
#17 0x7f5579b5ad2d in pgss_ExecutorStart () from /usr/pgsql-14/lib/pg_stat_statements.so
#18 0x0062e643 in ExplainOnePlan ()
#19 0x0062e83d in ExplainOneQuery ()
#20 0x0062ee6f in ExplainQuery ()
#21 0x007f9b15 in standard_ProcessUtility ()
#22 0x7f5579b599c7 in pgss_ProcessUtility () from /usr/pgsql-14/lib/pg_stat_statements.so
#23 0x007f7fef in PortalRunUtility ()
#24 0x007f83af in FillPortalStore ()
#25 0x007f86dd in PortalRun ()
#26 0x007f48cb in exec_simple_query ()
#27 0x007f610e in PostgresMain ()
#28 0x00776b8a in ServerLoop ()
#29 0x00777a03 in PostmasterMain ()
#30 0x004fe413 in main ()
(gdb) p
The history is empty.
(gdb) c
Continuing.

Breakpoint 1, 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0 0x7f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1 0x7f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2 0x7f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3 0x7f557f04fb1e in malloc () from /lib64/libc.so.6
#4 0x00932134 in AllocSetAlloc ()
#5 0x009376cf in MemoryContextAllocExtended ()
#6 0x006ad915 in ExecInitMemoize ()
#7 0x0068dc02 in ExecInitNode ()
#8 0x006b37ff in ExecInitNestLoop ()
#9 0x0068dc56 in ExecInitNode ()
#10 0x006b37ff in ExecInitNestLoop ()
#11 0x0068dc56 in ExecInitNode ()
#12 0x006b37de in ExecInitNestLoop ()
#13 0x0068dc56 in ExecInitNode ()
#14 0x00687e4d in standard_ExecutorStart ()
#15 0x7f5579b5ad2d in pgss_ExecutorStart () from /usr/pgsql-14/lib/pg_stat_statements.so
#16 0x0062e643 in ExplainOnePlan ()
#17 0x0062e83d in ExplainOneQuery ()
#18 0x0062ee6f in ExplainQuery ()
#19 0x007f9b15 in standard_ProcessUtility ()
#20 0x7f5579b599c7 in pgss_ProcessUtility () from /usr/pgsql-14/lib/pg_stat_statements.so
#21 0x007f7fef in PortalRunUtility ()
#22 0x007f83af in FillPortalStore ()
#23 0x007f86dd in PortalRun ()
#24 0x007f48cb in exec_simple_query ()
#25 0x007f610e in PostgresMain ()
#26 0x00776b8a in ServerLoop ()
#27 0x00777a03 in PostmasterMain ()
#28 0x004fe413 in main ()
(gdb) c
Continuing.

There were 2 hits of mmap.

Regards

Pavel

> [1] - https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com
>
> -J.
Re: strange slow query - lost lot of time somewhere
On Wed, May 4, 2022 at 2:15, David Rowley wrote:
> On Tue, 3 May 2022 at 17:02, Pavel Stehule wrote:
> > On Tue, May 3, 2022 at 6:57, Tom Lane wrote:
> >> You sure there's not something taking an exclusive lock on one of these
> >> tables every so often?
> >
> > I am almost sure, I can see this issue only every time when I set a higher work mem. I don't see this issue in other cases.
>
> hmm, I don't think the query being blocked on a table lock would cause
> this behaviour. As far as I know, all table locks should be obtained
> before the timer starts for the "Execution Time" timer in EXPLAIN
> ANALYZE. However, locks are obtained on indexes at executor startup,
> so if there was some delay in obtaining a lock on the index it would
> show up this way. I just don't know of anything that obtains a
> conflicting lock on an index without the same conflicting lock on the
> table that the index belongs to.
>
> I do agree that the perf report does indicate that the extra time is
> taken due to some large amount of memory being allocated. I just can't
> quite see how that would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate as the number of
> input rows, which is 91 in both cases in your problem query.
>
> Are you able to run the Memoize query in psql with \watch 0.1 for a
> few seconds while you do:
>
> perf record --call-graph dwarf --pid sleep 2
>
> then send along the perf report.
>
> I locally hacked build_hash_table() in nodeMemoize.c to make the
> hashtable 100 million elements and I see my perf report for a trivial
> Memoize query come up as:
>
> + 99.98% 0.00% postgres postgres [.] _start
> + 99.98% 0.00% postgres libc.so.6 [.] __libc_start_main_alias_2 (inlined)
> + 99.98% 0.00% postgres libc.so.6 [.] __libc_start_call_main
> + 99.98% 0.00% postgres postgres [.] main
> + 99.98% 0.00% postgres postgres [.] PostmasterMain
> + 99.98% 0.00% postgres postgres [.] ServerLoop
> + 99.98% 0.00% postgres postgres [.] BackendStartup (inlined)
> + 99.98% 0.00% postgres postgres [.] BackendRun (inlined)
> + 99.98% 0.00% postgres postgres [.] PostgresMain
> + 99.98% 0.00% postgres postgres [.] exec_simple_query
> + 99.98% 0.00% postgres postgres [.] PortalRun
> + 99.98% 0.00% postgres postgres [.] FillPortalStore
> + 99.98% 0.00% postgres postgres [.] PortalRunUtility
> + 99.98% 0.00% postgres postgres [.] standard_ProcessUtility
> + 99.98% 0.00% postgres postgres [.] ExplainQuery
> + 99.98% 0.00% postgres postgres [.] ExplainOneQuery
> + 99.95% 0.00% postgres postgres [.] ExplainOnePlan
> + 87.87% 0.00% postgres postgres [.] standard_ExecutorStart
> + 87.87% 0.00% postgres postgres [.] InitPlan (inlined)
> + 87.87% 0.00% postgres postgres [.] ExecInitNode
> + 87.87% 0.00% postgres postgres [.] ExecInitNestLoop
> + 87.87% 0.00% postgres postgres [.] ExecInitMemoize
> + 87.87% 0.00% postgres postgres [.] build_hash_table (inlined) <
> + 87.87% 0.00% postgres postgres [.] memoize_create (inlined)
> + 87.87% 0.00% postgres postgres [.] memoize_allocate (inlined)
> + 87.87% 0.00% postgres postgres [.] MemoryContextAllocExtended
> + 87.87% 0.00% postgres postgres [.] memset (inlined)
>
> Failing that, are you able to pg_dump these tables and load them into
> a PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.

+ 71,98% 14,36% postmaster [kernel.kallsyms] [k] page_fault
+ 70,19% 6,59% postmaster libc-2.28.so [.] __memset_avx2_erms
+ 68,20% 0,00% postmaster postgres [.] ExecInitNode
+ 68,20% 0,00% postmaster postgres [.] ExecInitNestLoop
+ 68,13% 0,00% postmaster postgres [.] ExecInitMemoize
+ 68,13% 0,00% postmaster postgres [.] MemoryContextAllocExtended
+ 63,20% 0,00% postmaster postgres [.] 0x00776b89
+ 63,20% 0,00% postmaster postgres [.] PostgresMain
+ 63,03% 0,00% postmaster
TAP test fail: we don't always detect backend crashes
I discovered while poking at an LDAP problem that our TAP tests are 100% reproducibly capable of ignoring server crashes and reporting success anyway. This problem occurs if the postmaster doesn't get the child failure report until after shutdown has been initiated, in which case you find something like this in the postmaster log:

2022-05-04 12:01:33.946 EDT [57945] [unknown] LOG: connection received: host=[local]
2022-05-04 12:01:33.995 EDT [57945] [unknown] LOG: connection authenticated: identity="uid=test1,dc=example,dc=net" method=ldap (/Users/tgl/pgsql/src/test/ldap/tmp_check/t_001_auth_node_data/pgdata/pg_hba.conf:1)
2022-05-04 12:01:33.995 EDT [57945] [unknown] LOG: connection authorized: user=test1 database=postgres application_name=001_auth.pl
2022-05-04 12:01:33.998 EDT [57945] 001_auth.pl LOG: statement: SELECT $$connected with user=test1$$
2022-05-04 12:01:34.003 EDT [57937] LOG: received fast shutdown request
2022-05-04 12:01:34.003 EDT [57937] LOG: aborting any active transactions
2022-05-04 12:01:34.003 EDT [57937] LOG: background worker "logical replication launcher" (PID 57943) exited with exit code 1
2022-05-04 12:01:35.750 EDT [57937] LOG: server process (PID 57945) was terminated by signal 11: Segmentation fault: 11
2022-05-04 12:01:35.751 EDT [57937] LOG: terminating any other active server processes
2022-05-04 12:01:35.751 EDT [57937] LOG: abnormal database system shutdown
2022-05-04 12:01:35.751 EDT [57937] LOG: database system is shut down

Our TAP scripts don't notice the "abnormal database system shutdown", so it looks like things have passed. It's even worse when a script demands an immediate shutdown, because then the postmaster won't wait around for the child status. If you have core dumps enabled, that adds some time before the child exit status is delivered, making this scenario extremely probable.
I'm finding that src/test/ldap reports success 100% reproducibly after doing "ulimit -c unlimited", even though four backend core dumps are produced during the run. I think that (a) at least by default, node->stop() ought to check for normal shutdown status, and (b) immediate shutdown should only be used in cases where we've already decided that the test failed. regards, tom lane
Re: automatically generating node support functions
On 2022-May-04, Peter Eisentraut wrote: > I have committed your change to the JsonTableColumnType enum and the removal > of JsonPathSpec. Thanks! > Other than that and some whitespace changes, I didn't find anything in > your 0002 patch that was different from my last submitted patch. Did > I miss anything? No, I had just fixed one simple conflict IIRC, but I had made no other changes. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Porque francamente, si para saber manejarse a uno mismo hubiera que rendir examen... ¿Quién es el machito que tendría carnet?" (Mafalda)
Re: automatically generating node support functions
On 19.04.22 13:40, Alvaro Herrera wrote:
> I rebased this mostly out of curiosity. I fixed some smallish conflicts and fixed a typedef problem new in JSON support; however, even with these fixes it doesn't compile, because JsonPathSpec uses a novel typedef pattern that apparently will need bespoke handling in the gen_nodes_support.pl script. It seemed better to post this even without that, though.

I have committed your change to the JsonTableColumnType enum and the removal of JsonPathSpec. Other than that and some whitespace changes, I didn't find anything in your 0002 patch that was different from my last submitted patch. Did I miss anything?
Re: JSON Functions and Operators Docs for v15
On Wed, May 4, 2022 at 8:39 AM Tom Lane wrote: > "David G. Johnston" writes: > > Is there a thread I'm not finding where the upcoming JSON function > > documentation is being made reasonably usable after doubling its size > with > > all the new JSON Table features that we've added? If nothing else, the > > table of contents at the top of the page needs to be greatly expanded to > > make seeing and navigating to all that is available a possibility. > > The entire structure of that text needs to be rethought, IMO, as it > has been written with precisely no concern for fitting into our > hard-won structure for func.sgml. Andrew muttered something about > rewriting it awhile ago, but I don't know what progress he's made. > > I suppose regardless of the answer, or which thread is used for the patch, the question at hand is whether this is problematic enough to warrant an open item. I would lean toward yes, we can decide how much reworking is considered sufficient to clear the open item separately. David J.
Re: JSON Functions and Operators Docs for v15
On Wed, May 04, 2022 at 08:32:51AM -0700, David G. Johnston wrote: > Hey, > > Is there a thread I'm not finding where the upcoming JSON function > documentation is being made reasonably usable after doubling its size with > all the new JSON Table features that we've added? If nothing else, the > table of contents at the top of the page needs to be greatly expanded to > make seeing and navigating to all that is available a possibility. https://www.postgresql.org/message-id/20220411160905.gh26...@telsasoft.com
Re: JSON Functions and Operators Docs for v15
"David G. Johnston" writes: > Is there a thread I'm not finding where the upcoming JSON function > documentation is being made reasonably usable after doubling its size with > all the new JSON Table features that we've added? If nothing else, the > table of contents at the top of the page needs to be greatly expanded to > make seeing and navigating to all that is available a possibility. The entire structure of that text needs to be rethought, IMO, as it has been written with precisely no concern for fitting into our hard-won structure for func.sgml. Andrew muttered something about rewriting it awhile ago, but I don't know what progress he's made. regards, tom lane
JSON Functions and Operators Docs for v15
Hey, Is there a thread I'm not finding where the upcoming JSON function documentation is being made reasonably usable after doubling its size with all the new JSON Table features that we've added? If nothing else, the table of contents at the top of the page needs to be greatly expanded to make seeing and navigating to all that is available a possibility. David J.
Re: [PATCH] Completed unaccent dictionary with many missing characters
Peter Eisentraut writes:
> On 28.04.22 18:50, Przemysław Sztoch wrote:
>> The current unaccent dictionary does not include many popular numeric symbols, for example: "m²" -> "m2"

> Seems reasonable.

It kinda feels like this is outside the charter of an "unaccent" dictionary. I don't object to having these conversions available but it seems like it ought to be a separate feature.

regards, tom lane
Re: configure openldap crash warning
I wrote: > Peter Eisentraut writes: >> I tried building with Homebrew-supplied openldap. What ends up >> happening is that the postgres binary is indeed linked with openldap, >> but libpq still is linked against the OS-supplied LDAP framework. >> (Checked with "otool -L" in each case.) Can someone else reproduce >> this, too? > [ it works with MacPorts ] Oh, I have a theory about this: I bet your Homebrew installation has a recent OpenLDAP version that only supplies libldap not libldap_r. In that case, configure will still find libldap_r available and will bind libpq to it, and you get the observed result. The configure check is not sophisticated enough to realize that it's finding chunks of two different OpenLDAP installations. Not sure about a good fix. If we had a way to detect which library file AC_CHECK_LIB finds, we could verify that libldap and libldap_r come from the same directory ... but I don't think we have that. regards, tom lane
Re: [PATCH] Completed unaccent dictionary with many missing characters
On 28.04.22 18:50, Przemysław Sztoch wrote:
> The current unaccent dictionary does not include many popular numeric symbols, for example: "m²" -> "m2"

Seems reasonable. Can you explain what your patch does to achieve this?
Re: configure openldap crash warning
Peter Eisentraut writes: > On 02.05.22 16:03, Tom Lane wrote: >> I'm not that excited about getting rid of this warning, because to the >> extent that anyone notices it at all, it'll motivate them to get OpenLDAP >> from Homebrew or MacPorts, which seems like a good thing. > I tried building with Homebrew-supplied openldap. What ends up > happening is that the postgres binary is indeed linked with openldap, > but libpq still is linked against the OS-supplied LDAP framework. > (Checked with "otool -L" in each case.) Can someone else reproduce > this, too? Hmm, I just tried it with up-to-date MacPorts, and it was a *complete* fail: I got all the deprecation warnings (so the system include headers were used), and both postgres and libpq.dylib still ended up linked against /System/Library/Frameworks/LDAP.framework/Versions/A/LDAP. But then I went "doh!" and added --with-includes=/opt/local/include --with-libraries=/opt/local/lib to the configure call, and everything built the way I expected. I'm not sure offhand if the docs include a reminder to do that when using stuff out of MacPorts, or the equivalent for Homebrew. 
We still have a bit of work to do, because this setup isn't getting all the way through src/test/ldap/:

2022-05-04 11:01:33.407 EDT [21312] [unknown] LOG: connection received: host=[local]
2022-05-04 11:01:33.457 EDT [21312] [unknown] LOG: could not start LDAP TLS session: Operations error
2022-05-04 11:01:33.457 EDT [21312] [unknown] DETAIL: LDAP diagnostics: TLS already started
2022-05-04 11:01:33.457 EDT [21312] [unknown] FATAL: LDAP authentication failed for user "test1"
2022-05-04 11:01:33.457 EDT [21312] [unknown] DETAIL: Connection matched pg_hba.conf line 1: "local all all ldap ldapurl="ldaps://localhost:51335/dc=example,dc=net??sub?(uid=$username)" ldaptls=1"
2022-05-04 11:01:33.459 EDT [21304] LOG: server process (PID 21312) was terminated by signal 11: Segmentation fault: 11

Many of the test cases pass, but it looks like ldaps-related ones don't. The stack trace isn't very helpful:

(lldb) bt
* thread #1, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x0)
  * frame #0: 0x0001b5bfc628 libsystem_pthread.dylib`pthread_rwlock_rdlock
    frame #1: 0x0001054a74c4 libcrypto.3.dylib`CRYPTO_THREAD_read_lock + 12

regards, tom lane
Re: SQL/JSON: FOR ORDINALITY bug
Op 04-05-2022 om 13:55 schreef Andrew Dunstan: On 2022-05-03 Tu 20:39, David G. Johnston wrote: On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan wrote: On 2022-05-03 Tu 11:19, Erik Rijkers wrote: > Hi > > I've copied some statements from the .pdf called: > "TECHNICAL REPORT ISO/IEC TR 19075-6 First edition 2017-03 > Part SQL Notation support 6: (JSON) for JavaScript Object" > (not available anymore although there should be a similar replacement > file) > > In that pdf I found the data and statement (called 'table 15' in the > .pdf) as in the attached bash file. But the result is different: as > implemented by 15devel, the column rowseq is always 1. It seems to me > that that is wrong; it should count 1, 2, 3 as indeed the > example-result column in that pdf shows. > > What do you think? > > Possibly. I don't see how rowseq can be anything but 1. Each invocation of After some further experimentation, I now think you must be right, David. Also, looking at the DB2 docs: https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table (see especially under 'Handling nested information') There, I gathered some example data + statements where one is the case at hand. I also made them runnable under postgres (attached). I thought that was an instructive example, with those 'outer_ordinality' and 'inner_ordinality' columns. Erik json_table is given a single jsonb record via the lateral reference to bookclub.jcol. It produces one result, having a rowseq 1. It does this for all three outer lateral reference tuples and thus produces three output rows each with one match numbered rowseq 1. I imagine we could overcome that by stashing the sequence counter somewhere it would survive across calls. The question really is what is the right thing to do? I'm also a bit worried about how correct is ordinal numbering with nested paths, e.g. 
(from the regression tests):

select jt.* from jsonb_table_test jtt,
  json_table (
    jtt.js, 'strict $[*]' as p
    columns (
      n for ordinality,
      a int path 'lax $.a' default -1 on empty,
      nested path 'strict $.b[*]' as pb columns ( b int path '$' ),
      nested path 'strict $.c[*]' as pc columns ( c int path '$' )
    )
  ) jt;

 n | a  | b | c
---+----+---+----
 1 |  1 |   |
 2 |  2 | 1 |
 2 |  2 | 2 |
 2 |  2 | 3 |
 2 |  2 |   | 10
 2 |  2 |   |
 2 |  2 |   | 20
 3 |  3 | 1 |
 3 |  3 | 2 |
 4 | -1 | 1 |
 4 | -1 | 2 |

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

json_ordinality_db2.sh
Description: application/shellscript
RE: strange slow query - lost lot of time somewhere
> I do agree that the perf report does indicate that the extra time is
> taken due to some large amount of memory being allocated. I just can't
> quite see how that would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate as the number of input
> rows, which is 91 in both cases in your problem query.
>
> Are you able to run the Memoize query in psql with \watch 0.1 for a few seconds
> while you do:
>
> perf record --call-graph dwarf --pid sleep 2
>
> then send along the perf report.
>
> I locally hacked build_hash_table() in nodeMemoize.c to make the hashtable 100
> million elements and I see my perf report for a trivial Memoize query come up
> as:
> [..]
>
> Failing that, are you able to pg_dump these tables and load them into a
> PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.

+1 to what David says, we need a reproducer. In [1] Pavel wrote that he's seeing a lot of clear_page_erms(), so maybe this will be a little help: I recall having a similar issue with a lot of minor page faults and high %sys when raising work_mem. For me it was a different issue some time ago, but it was something like build_hash_table() being used by UNION recursive calls -> BuildTupleHashTable() -> .. malloc() -> mmap64(). When mmap() is issued with MAP_ANONYMOUS, the kernel will zero out the memory (more memory -> potentially bigger CPU waste visible as minor page faults; erms stands for "Enhanced REP MOVSB/STOSB"; this is on the kernel side). The culprit was the planner allocating something that wouldn't be used later.
Three additional ways to figure that one out (all are IMHO production safe):
a) the already mentioned perf with --call-graph dwarf -p PID
b) strace -p PID -e 'mmap' # verify whether the mmap(NULL, ...) calls have the MAP_ANONYMOUS flag; the size of the mmap() request will roughly match the work_mem sizing
c) gdb -p PID, then a breakpoint on mmap and verify each mmap() # check MAP_ANONYMOUS as above

[1] - https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com

-J.
Re: configure openldap crash warning
On 02.05.22 16:03, Tom Lane wrote:
> I'm not that excited about getting rid of this warning, because to the extent that anyone notices it at all, it'll motivate them to get OpenLDAP from Homebrew or MacPorts, which seems like a good thing.

I tried building with Homebrew-supplied openldap. What ends up happening is that the postgres binary is indeed linked with openldap, but libpq still is linked against the OS-supplied LDAP framework. (Checked with "otool -L" in each case.) Can someone else reproduce this, too?
Re: Add a new function and a document page to get/show all the server hooks
Peter Eisentraut writes: > On 04.05.22 12:54, Bharath Rupireddy wrote: >> One problem is that the new function and doc page create an extra >> burden of keeping them up to date with the hooks modifications and new >> hook additions, but I think that can be taken care of in the review >> phases. > I think this has been proposed a number of times and rejected. The most recent such discussion was here: https://www.postgresql.org/message-id/flat/20201231032813.GQ13234%40fetter.org The basic point was that there's a pretty low bar to creating a hook, but the more infrastructure you want to have around hooks the harder it will be to add any ... and the less likely that the infrastructure will be kept up-to-date. My takeaway from that thread was that there could be support for minimalistic documentation, along the lines of a README file listing all the hooks. I'm not in favor of trying to do more than that --- in particular, the cost/benefit ratio for the function proposed here seems to approach infinity. regards, tom lane
Re: bogus: logical replication rows/cols combinations
On 03.05.22 21:40, Tomas Vondra wrote:
> So what's wrong with merging the column lists as implemented in the v2
> patch, posted a couple days ago?

Merging the column lists is ok if all other publication attributes match. Otherwise, I think not.

> I don't think triggers are a suitable alternative, as they execute on
> the subscriber node. So you have to first copy the data to the remote
> node, where it gets filtered. With column filters the data gets
> redacted on the publisher.

Right, triggers are not currently a solution. But you could imagine a redaction filter system that runs on the publisher that modifies rows before they are sent out.
Re: [PATCH] Log details for client certificate failures
On 04.05.22 01:05, Jacob Champion wrote:
> On Tue, 2022-05-03 at 21:06 +0200, Peter Eisentraut wrote:
> > The information in pg_stat_ssl is limited to NAMEDATALEN (see struct
> > PgBackendSSLStatus). It might make sense to align what your patch
> > prints to identify certificates with what is shown in that view.
>
> Sure, a max length should be easy enough to do. Is there a reason to
> limit to NAMEDATALEN specifically? I was under the impression that we
> would rather not have had that limitation in the stats framework, if we
> could have avoided it. (In particular I think NAMEDATALEN will cut off
> the longest possible Common Name by just five bytes.)

Just saying that cutting it off appears to be acceptable. A bit more than 63 bytes should be okay for the log. In terms of aligning what is printed, I meant that pg_stat_ssl uses the issuer plus serial number to identify the certificate unambiguously.
Re: Add pg_strtoupper and pg_strtolower functions
Alvaro Herrera writes: > Currently, pg_toupper/pg_tolower are used in very limited situations. > Are they really always safe enough to run in arbitrary situations, > enough to create this new layer on top of them? They are not, and we should absolutely not be encouraging additional uses of them. The existing multi-character str_toupper/str_tolower functions should be used instead. (Perhaps those should be relocated to someplace more prominent?) > Reading the comment on > pg_tolower, "the whole thing is a bit bogus for multibyte charsets", I > worry that we might create security holes, either now or in future > callsites that use these new functions. I doubt that they are security holes, but they do give unexpected answers in some locales. regards, tom lane
Re: Add a new function and a document page to get/show all the server hooks
On 04.05.22 12:54, Bharath Rupireddy wrote:
> One problem is that the new function and doc page create an extra
> burden of keeping them up to date with the hooks modifications and new
> hook additions, but I think that can be taken care of in the review
> phases. Thoughts?

I think this has been proposed a number of times and rejected.
Re: Skipping schema changes in publication
On 14.04.22 15:47, Peter Eisentraut wrote:
> That said, I'm not sure this feature is worth the trouble. If this is
> useful, what about "whole database except these schemas"? What about
> "create this database from this template except these schemas"? This
> could get out of hand. I think we should encourage users to group
> their objects the way they want and not offer these complicated
> negative selection mechanisms.

Another problem in general with this "all except these" way of specifying things is that you need to track negative dependencies. For example, assume you can't add a table to a publication unless it has a replica identity. Now, if you have a publication p1 that includes "all tables except t1", you have to check p1 whenever a new table is created, even though the new table has no direct dependency link with p1. So in more general cases, you would have to check all existing objects to see whether their specification is in conflict with the new object being created. Now, publications don't actually work that way, so it's not a real problem right now, but similar things could work like that. So I think it's worth thinking this through a bit.
Re: Add pg_strtoupper and pg_strtolower functions
On 2022-May-02, Bharath Rupireddy wrote:
> Hi,
>
> I came across the pg_toupper and pg_tolower functions, which convert a
> single character and are being used in loops to convert entire
> null-terminated strings. The cost of calling these character-based
> conversion functions (even though small) can be avoided if we have two
> new functions, pg_strtoupper and pg_strtolower.

Currently, pg_toupper/pg_tolower are used in very limited situations. Are they really always safe enough to run in arbitrary situations, enough to create this new layer on top of them? Reading the comment on pg_tolower, "the whole thing is a bit bogus for multibyte charsets", I worry that we might create security holes, either now or in future callsites that use these new functions. Consider that in the Turkish locale you lowercase an I (a single-byte ASCII character) into a dotless i (two bytes). So overwriting the input string is not a great solution. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)
Re: testclient.exe installed under MSVC
> On 4 May 2022, at 02:34, Andrew Dunstan wrote: > I think we should make the standard MSVC install look as much like the > standard Unix/msys install as possible. +1 -- Daniel Gustafsson https://vmware.com/
Re: testclient.exe installed under MSVC
> On 3 May 2022, at 16:50, Daniel Gustafsson wrote: > >> On 3 May 2022, at 15:58, Alvaro Herrera wrote: >> >> On 2022-May-03, Noah Misch wrote: >> >>> Michael Paquier recommended s/-/_/ for uri-regress, and I agree with that. >>> What do you think? >> >> libpq_uri-regress is horrible, so +1 for that. > > Agreed, I'll do that before pushing. Done that way. -- Daniel Gustafsson https://vmware.com/
Re: SQL/JSON: FOR ORDINALITY bug
On 2022-05-03 Tu 20:39, David G. Johnston wrote:
> On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan wrote:
> > On 2022-05-03 Tu 11:19, Erik Rijkers wrote:
> > > Hi
> > >
> > > I've copied some statements from the .pdf called:
> > > "TECHNICAL REPORT ISO/IEC TR 19075-6 First edition 2017-03
> > > Part 6: SQL support for JavaScript Object Notation (JSON)"
> > > (not available anymore although there should be a similar
> > > replacement file)
> > >
> > > In that pdf I found the data and statement (called 'table 15' in the
> > > .pdf) as in the attached bash file. But the result is different: as
> > > implemented by 15devel, the column rowseq is always 1. It seems to me
> > > that that is wrong; it should count 1, 2, 3 as indeed the
> > > example-result column in that pdf shows.
> > >
> > > What do you think?
> >
> > Possibly.
>
> I don't see how rowseq can be anything but 1. Each invocation of
> json_table is given a single jsonb record via the lateral reference to
> bookclub.jcol. It produces one result, having a rowseq 1. It does this
> for all three outer lateral reference tuples and thus produces three
> output rows each with one match numbered rowseq 1.

I imagine we could overcome that by stashing the sequence counter somewhere it would survive across calls. The question really is what is the right thing to do? I'm also a bit worried about how correct ordinal numbering is with nested paths, e.g. (from the regression tests):

select jt.* from jsonb_table_test jtt,
  json_table (
    jtt.js, 'strict $[*]' as p
    columns (
      n for ordinality,
      a int path 'lax $.a' default -1 on empty,
      nested path 'strict $.b[*]' as pb columns ( b int path '$' ),
      nested path 'strict $.c[*]' as pc columns ( c int path '$' )
    )
  ) jt;

 n | a  | b | c
---+----+---+----
 1 |  1 |   |
 2 |  2 | 1 |
 2 |  2 | 2 |
 2 |  2 | 3 |
 2 |  2 |   | 10
 2 |  2 |   |
 2 |  2 |   | 20
 3 |  3 | 1 |
 3 |  3 | 2 |
 4 | -1 | 1 |
 4 | -1 | 2 |

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: [RFC] building postgres with meson -v8
More patches: 0001-meson-Assorted-compiler-test-tweaks.patch I was going through a diff of pg_config.h between old and new build and found a few omissions and small differences. Some of the blah ? 1 : false is of course annoying and can be removed eventually, but it's useful when analyzing the diff, and since it's already done in other places it seems reasonable to apply it consistently. Of course there is some more work left for some of the more complicated tests; this isn't meant to be complete. 0002-meson-Add-pg_walinspect.patch This was added more recently and was not ported yet. Nothing too interesting here. 0003-meson-Install-all-server-headers.patch With this, all the server headers installed by a makefile-based build are installed. I tried to strike a balance between using install_subdir() with exclude list versus listing things explicitly. Different variations might be possible, but this looked pretty sensible to me. With these patches, the list of files installed with make versus meson match up, except for known open items (postmaster symlink, some library naming differences, pkgconfig, pgxs, test modules installed, documentation).From c3f4f4f8002e473284587d21f89cb66364365a26 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 4 May 2022 09:54:36 +0200 Subject: [PATCH 1/3] meson: Assorted compiler test tweaks --- meson.build | 46 +- 1 file changed, 37 insertions(+), 9 deletions(-) diff --git a/meson.build b/meson.build index 80af8e13da..27bc9dcd48 100644 --- a/meson.build +++ b/meson.build @@ -222,7 +222,7 @@ meson_bin = find_program(meson_binpath, native: true) # Option Handling ### -cdata.set('USE_ASSERT_CHECKING', get_option('cassert')) +cdata.set('USE_ASSERT_CHECKING', get_option('cassert') ? 1 : false) cdata.set('BLCKSZ', 8192, description: ''' Size of a disk block --- this also limits the size of a tuple. 
You @@ -241,7 +241,7 @@ cdata.set('BLCKSZ', 8192, description: ''' cdata.set('XLOG_BLCKSZ', get_option('wal-blocksize') * 1024) cdata.set('RELSEG_SIZE', get_option('segsize') * 131072) cdata.set('DEF_PGPORT', get_option('pgport')) -cdata.set_quoted('DEF_PGPORT_STR', '5432') +cdata.set_quoted('DEF_PGPORT_STR', get_option('pgport')) cdata.set_quoted('PG_KRB_SRVNAM', 'postgres') @@ -870,8 +870,6 @@ if get_option('ssl') == 'openssl' ssl_int = [ssl] endif - cdata.set_quoted('WITH_SSL', get_option('ssl')) - check_funcs = [ ['CRYPTO_new_ex_data', {'required': true}], ['SSL_new', {'required': true}], @@ -1381,7 +1379,7 @@ int main(void) cdata.set(check['name'], cc.links(test, name: check['desc'], -args: g_c_args + ['-DINT64=@0@'.format(cdata.get('PG_INT64_TYPE'))]) +args: g_c_args + ['-DINT64=@0@'.format(cdata.get('PG_INT64_TYPE'))]) ? 1 : false ) endforeach @@ -1609,7 +1607,8 @@ endif if cc.has_member('struct sockaddr_storage', 'ss_family', args: g_c_args, include_directories: g_c_inc, -prefix: '''#include +prefix: ''' +#include #include ''') cdata.set('HAVE_STRUCT_SOCKADDR_STORAGE_SS_FAMILY', 1) endif @@ -1622,6 +1621,30 @@ if cc.has_member('struct sockaddr_storage', '__ss_family', cdata.set('HAVE_STRUCT_SOCKADDR_STORAGE___SS_FAMILY', 1) endif +if cc.has_member('struct sockaddr_storage', 'ss_len', +args: g_c_args, include_directories: g_c_inc, +prefix: ''' +#include +#include ''') + cdata.set('HAVE_STRUCT_SOCKADDR_STORAGE_SS_LEN', 1) +endif + +if cc.has_member('struct sockaddr_storage', '__ss_len', +args: g_c_args, include_directories: g_c_inc, +prefix: ''' +#include +#include ''') + cdata.set('HAVE_STRUCT_SOCKADDR_STORAGE___SS_LEN', 1) +endif + +if cc.has_member('struct sockaddr', 'sa_len', +args: g_c_args, include_directories: g_c_inc, +prefix: ''' +#include +#include ''') + cdata.set('HAVE_STRUCT_SOCKADDR_SA_LEN', 1) +endif + if cc.has_type('struct sockaddr_un', args: g_c_args, include_directories: g_c_inc, prefix: ''' @@ -1701,10 +1724,10 @@ endif # needs xlocale.h; 
standard is locale.h, but glibc also has an # xlocale.h file that we should not use. if cc.has_type('locale_t', prefix: '#include ') - cdata.set('HAVE_LOCALE_T', true) + cdata.set('HAVE_LOCALE_T', 1) elif cc.has_type('locale_t', prefix: '#include ') - cdata.set('HAVE_LOCALE_T', true) - cdata.set('LOCALE_T_IN_XLOCALE', true) + cdata.set('HAVE_LOCALE_T', 1) + cdata.set('LOCALE_T_IN_XLOCALE', 1) endif # MSVC doesn't cope well with defining restrict to __restrict, the @@ -1781,6 +1804,7 @@ func_checks = [ ['getrusage'], ['gettimeofday'], # XXX: This seems to be in the autoconf case ['inet_aton'], + ['inet_pton'], ['kqueue'], ['link'], ['mbstowcs_l'], @@ -1852,6 +1876,10 @@ foreach c : func_checks endforeach +if cc.has_function('syslog', args: g_c_args) and cc.check_header('syslog.h',
Re: Configuration Parameter/GUC value validation hook
On Tue, May 3, 2022 at 10:43 PM Robert Haas wrote:
>
> On Tue, May 3, 2022 at 11:45 AM Tom Lane wrote:
> > Robert Haas writes:
> > > I have some desire here to see us solve this problem not just for
> > > service providers, but for users in general. You don't have to be a
> > > service provider to want to disallow SET work_mem = '1TB' -- you just
> > > need to be a DBA on a system where such a setting will cause bad
> > > things to happen. But, if you are a DBA on some random system, you
> > > won't likely find a hook to be a particularly useful way of
> > > controlling this sort of thing.
> >
> > Yeah, I think this is a more realistic point. I too am not sure what
> > a good facility would look like. I guess an argument in favor of
> > providing a hook is that we could then leave it to extension authors
> > to try to devise a facility that's useful to end users, rather than
> > having to write an in-core feature.
>
> Right. The counter-argument is that if we just do that, then what will
> likely happen is that people who buy PostgreSQL services from
> Microsoft, Amazon, EDB, Crunchy, etc. will end up with reasonable
> options in this area, and people who download the source code from the
> Internet probably won't. As an open-source project, we might hope to
> avoid a scenario where it doesn't work unless you buy something. On
> the third hand, half a loaf is better than nothing.

Thanks Tom and Robert for your responses. How about we provide a sample extension (limiting some important parameters, say shared_buffers, work_mem and so on, to some "reasonable/recommended" limits) in the core along with the set_config_option_hook? This way, all the people using open source postgres out-of-the-box will benefit, and whoever wants can modify that sample extension to suit their needs. The sample extension can also serve as an example of how to implement set_config_option_hook. Thoughts? Regards, Bharath Rupireddy.
Re: Add pg_strtoupper and pg_strtolower functions
On Mon, May 2, 2022 at 6:43 PM Ashutosh Bapat wrote: > > On Mon, May 2, 2022 at 6:21 PM Bharath Rupireddy > wrote: > > > > Hi, > > > > I came across pg_toupper and pg_tolower functions, converting a single > > character, are being used in loops to convert an entire > > null-terminated string. The cost of calling these character-based > > conversion functions (even though small) can be avoided if we have two > > new functions pg_strtoupper and pg_strtolower. > > Have we measured the saving in cost? Let's say for a million character > long string? I didn't spend time on figuring out the use-cases hitting all the code areas, even if I do so, the function call cost savings might not impress most of the time and the argument of saving function call cost then becomes pointless. > > Attaching a patch with these new two functions and their usage in most > > of the possible places in the code. > > Converting pg_toupper and pg_tolower to "inline" might save cost > similarly and also avoid code duplication? I think most of the modern compilers do inline small functions. But, inlining isn't always good as it increases the size of the code. With the proposed helper functions, the code looks cleaner (at least IMO, others may have different opinions though). Regards, Bharath Rupireddy.
Add a new function and a document page to get/show all the server hooks
Hi, As we have many hooks in postgres that extend the postgres functionality, I'm wondering if it's a good idea to add a new function, say, pg_get_all_server_hooks, returning the hook name, hook declaration and its current value (if there's any external module loaded implementing the hook). This basically helps to know, at any given point of time, which hooks are installed and which modules implement them. Imagine using this new function on a production server with many supported extensions installed, some of which might implement hooks. Also, a dedicated page in the documentation listing all the hooks, their declarations and a short description would help developers/users to know the available hooks. One problem is that the new function and doc page create an extra burden of keeping them up to date with hook modifications and new hook additions, but I think that can be taken care of in the review phases. Thoughts? Regards, Bharath Rupireddy.
Re: Hash index build performance tweak from sorting
On Mon, May 2, 2022 at 9:28 PM Simon Riggs wrote: > > On Sat, 30 Apr 2022 at 12:12, Amit Kapila wrote: > > > > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs > > wrote: > > > > > > Hash index pages are stored in sorted order, but we don't prepare the > > > data correctly. > > > > > > We sort the data as the first step of a hash index build, but we > > > forget to sort the data by hash as well as by hash bucket. > > > > > > > I was looking into the nearby comments (Fetch hash keys and mask off > > bits we don't want to sort by.) and it sounds like we purposefully > > don't want to sort by the hash key. I see that this comment was > > originally introduced in the below commit: > > > > commit 4adc2f72a4ccd6e55e594aca837f09130a6af62b > > Author: Tom Lane > > Date: Mon Sep 15 18:43:41 2008 + > > > > Change hash indexes to store only the hash code rather than the > > whole indexed > > value. > > > > But even before that, we seem to mask off the bits before comparison. > > Is it that we are doing so because we want to keep the order of hash > > keys in a particular bucket so such masking was required? > > We need to sort by both hash bucket and hash value. > > Hash bucket id so we can identify the correct hash bucket to insert into. > > But then on each bucket/overflow page we store it sorted by hash value > to make lookup faster, so inserts go faster if they are also sorted. > I also think so. So, we should go with this unless someone else sees any flaw here. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada wrote: > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila wrote: > > > > > > So, shall we go back to the previous approach of using a separate > > function update_replication_progress? > > Ok, agreed. > Attached, please find the updated patch accordingly. Currently, I have prepared it for HEAD, if you don't see any problem with this, we can prepare the back-branch patches based on this. -- With Regards, Amit Kapila. v20-0001-Fix-the-logical-replication-timeout-during-large.patch Description: Binary data
Re: Handle infinite recursion in logical replication setup
On Mon, May 2, 2022 at 5:49 AM Peter Smith wrote:
>
> Thanks for updating the patches for all my prior feedback.
>
> For v12* I have only minor feedback for the docs.
>
> ==
> V12-0001
>
> no comments
>
> ==
> V12-0002
>
> 1. Commit message
>
> In another thread using the terminology "multi-master" seems to be
> causing some problems. Maybe you should choose different wording in
> this commit message to avoid having the same problems?

I felt we can leave this to the committer.

> ~~~
>
> 2. doc/src/sgml/logical-replication.sgml
>
> These are all similar minor wording suggestions to split the sentences.
> Wording: ", here copy_data is specified as XXX ..." -> ". Use
> copy_data specified as XXX ..."
>
> Also:
> Wording: "... to subscribe the changes from nodeXXX..."
> -> "... to subscribe to the changes from nodeXXX... " (add "to")
> -> "... to subscribe to nodeXXX." (or I preferred just remove the
> whole "changes" part)
>
> 2a.
> Create a subscription in node3 to subscribe the changes from node1,
> here copy_data is specified as force so that the existing table data
> is copied during initial sync:
>
> SUGGESTION
> Create a subscription in node3 to subscribe to node1. Use copy_data
> specified as force so that the existing table data is copied during
> initial sync:

Modified

> 2b.
> Create a subscription in node1 to subscribe the changes from node3,
> here copy_data is specified as force so that the existing table data
> is copied during initial sync:
>
> SUGGESTION
> Create a subscription in node1 to subscribe to node3. Use copy_data
> specified as force so that the existing table data is copied during
> initial sync:

Modified

> 2c.
> Create a subscription in node2 to subscribe the changes from node3,
> here copy_data is specified as force so that the existing table data
> is copied during initial sync:
>
> SUGGESTION
> Create a subscription in node2 to subscribe to node3. Use copy_data
> specified as force so that the existing table data is copied during
> initial sync:

Modified

> 2d.
> Create a subscription in node3 to subscribe the changes from node1,
> here copy_data is specified as force when creating a subscription to
> node1 so that the existing table data is copied during initial sync:
>
> SUGGESTION
> Create a subscription in node3 to subscribe to node1. Use copy_data
> specified as force when creating a subscription to node1 so that the
> existing table data is copied during initial sync:

Modified

> 2e.
> Create a subscription in node3 to subscribe the changes from node2,
> here copy_data is specified as off as the initial table data would
> have been copied in the earlier step:
>
> SUGGESTION (this one has a bit more re-wording than the others)
> Create a subscription in node3 to subscribe to node2. Use copy_data
> specified as off because the initial table data would have been
> already copied in the previous step:

Modified

> ~~~
>
> 3. doc/src/sgml/logical-replication.sgml
>
> 31.11.2. Adding new node when there is no data in any of the nodes
> 31.11.3. Adding new node when data is present in the existing nodes
> 31.11.4. Adding new node when data is present in the new node
>
> Minor change to the above heading?
>
> Wording: "Adding new node ..." -> "Adding a new node ..."

Modified

Thanks for the comments, the attached v13 patch has the changes for the same.

Regards, Vignesh

From 657265a56dfc2eee8f4b6cd2137e6c9b8c26d6a6 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C
Date: Fri, 8 Apr 2022 11:10:05 +0530
Subject: [PATCH v13 1/2] Skip replication of non local data.

This patch adds a new SUBSCRIPTION boolean option "local_only". The default is false. When a SUBSCRIPTION is created with this option enabled, the publisher will only publish data that originated at the publisher node.
Usage: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port=' PUBLICATION pub1 with (local_only = true);
---
 doc/src/sgml/ref/alter_subscription.sgml      |   5 +-
 doc/src/sgml/ref/create_subscription.sgml     |  12 ++
 src/backend/catalog/pg_subscription.c         |   1 +
 src/backend/catalog/system_views.sql          |   4 +-
 src/backend/commands/subscriptioncmds.c       |  26 ++-
 .../libpqwalreceiver/libpqwalreceiver.c       |   5 +
 src/backend/replication/logical/worker.c      |   2 +
 src/backend/replication/pgoutput/pgoutput.c   |  23 +++
 src/bin/pg_dump/pg_dump.c                     |  17 +-
 src/bin/pg_dump/pg_dump.h                     |   1 +
 src/bin/psql/describe.c                       |   8 +-
 src/bin/psql/tab-complete.c                   |   4 +-
 src/include/catalog/pg_subscription.h         |   3 +
 src/include/replication/logicalproto.h        |  10 +-
 src/include/replication/pgoutput.h            |   1 +
 src/include/replication/walreceiver.h         |   1 +
 src/test/regress/expected/subscription.out    | 142 ---
 src/test/regress/sql/subscription.sql         |  10 ++
 src/test/subscription/t/032_localonly.pl      | 162 ++
 19 files