Bibliography section, some references cannot be found
Hi. While reading [1], I came across this sentence: "More information about partial indexes can be found in [ston89b], [olson93], and [seshadri95]." I googled around but still cannot find a PDF or HTML link for [olson93].

In [2] I found that [ong90], "A Unified Framework for Version Modeling Using Production Rules in a Database System". L. Ong and J. Goh. ERL Technical Memorandum M90/33. University of California. Berkeley, California. April, 1990, has a related link: https://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/1466.html

An idea: in case the external links in (https://www.postgresql.org/docs/current/biblio.html) become dead links, we could send the external PDF or HTML files to the mailing list for archival purposes.

[1] https://www.postgresql.org/docs/current/indexes-partial.html
[2] https://www.postgresql.org/docs/current/biblio.html
Re: explain format json, unit for serialize and memory are different.
On Mon, May 13, 2024 at 11:22:08AM +0200, Daniel Gustafsson wrote:
> Since json (and yaml/xml) is intended to be machine-readable I think we use a
> single unit for all values, and document this fact.

Agreed about the documentation gap. Another thing that could be worth considering is to add the units alongside the numerical values, say:

"Memory Used": {"value": 23432, "units": "bytes"}

That would require changing ExplainProperty() so that the units are shown in some shape, while still being readable when parsed. I wouldn't recommend doing that in v17, but perhaps v18 could do better? Units are also ignored for the XML and YAML outputs.
--
Michael
Re: SQL:2011 application time
On Tue, May 14, 2024 at 7:30 AM Paul Jungwirth wrote:
> On 5/13/24 03:11, Peter Eisentraut wrote:
> > It looks like we missed some of these fundamental design questions early
> > on, and it might be too late now to fix them for PG17.
> >
> > For example, the discussion on unique constraints misses that the
> > question of null values in unique constraints itself is controversial
> > and that there is now a way to change the behavior. So I imagine there
> > is also a selection of possible behaviors you might want for empty
> > ranges. Intuitively, I don't think empty ranges are sensible for
> > temporal unique constraints. But anyway, it's a bit late now to be
> > discussing this.
> >
> > I'm also concerned that if ranges have this fundamental incompatibility
> > with periods, then the plan to eventually evolve this patch set to
> > support standard periods will also have as-yet-unknown problems.
> >
> > Some of these issues might be design flaws in the underlying mechanisms,
> > like range types and exclusion constraints. Like, if you're supposed to
> > use this for scheduling but you can use empty ranges to bypass exclusion
> > constraints, how is one supposed to use this? Yes, a check constraint
> > using isempty() might be the right answer. But I don't see this
> > documented anywhere.
> >
> > On the technical side, adding an implicit check constraint as part of a
> > primary key constraint is quite a difficult implementation task, as I
> > think you are discovering. I'm just reminded about how the patch for
> > catalogued not-null constraints struggled with linking these not-null
> > constraints to primary keys correctly. This sounds a bit similar.
> >
> > I'm afraid that these issues cannot be resolved in good time for this
> > release, so we should revert this patch set for now.
>
> I think reverting is a good idea. I'm not really happy with the CHECK
> constraint solution either.
> I'd be happy to have some more time to rework this for v18.
>
> A couple alternatives I'd like to explore:
>
> 1. Domain constraints instead of a CHECK constraint. I think this is
> probably worse, and I don't plan to spend much time on it, but I thought
> I'd mention it in case someone else thought otherwise.
>
> 2. A slightly different overlaps operator, say &&&, where 'empty' &&&
> 'empty' is true. But 'empty' with anything else could still be false (or
> not). That operator would prevent duplicates in an exclusion constraint.
> This also means we could support more types than just ranges &
> multiranges. I need to think about whether this combines badly with
> existing operators, but if not it has a lot of promise. If anything it
> might be *less* contradictory, because it fits better with 'empty' @>
> 'empty', which we say is true.

Thanks for the idea. I roughly played around with it, and it seems doable, but the timing seems not good; reverting is a good idea.

I also checked the commit 6db4598fcb82a87a683c4572707e522504830a2b:

+/*
+ * Returns the btree number for equals, otherwise invalid.
+ */
+Datum
+gist_stratnum_btree(PG_FUNCTION_ARGS)
+{
+	StrategyNumber strat = PG_GETARG_UINT16(0);
+
+	switch (strat)
+	{
+		case RTEqualStrategyNumber:
+			PG_RETURN_UINT16(BTEqualStrategyNumber);
+		case RTLessStrategyNumber:
+			PG_RETURN_UINT16(BTLessStrategyNumber);
+		case RTLessEqualStrategyNumber:
+			PG_RETURN_UINT16(BTLessEqualStrategyNumber);
+		case RTGreaterStrategyNumber:
+			PG_RETURN_UINT16(BTGreaterStrategyNumber);
+		case RTGreaterEqualStrategyNumber:
+			PG_RETURN_UINT16(BTGreaterEqualStrategyNumber);
+		default:
+			PG_RETURN_UINT16(InvalidStrategy);
+	}
+}

The comment seems not right? It says the function returns the btree strategy number only for equals, but it actually maps all five RT strategies.
Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid
On Mon, May 13, 2024 at 01:01:21PM +0300, Aleksander Alekseev wrote:
>> Any objections to fixing this in 17 by removing it? (cc:ing Michael from
>> the RMT)
>
> +1 Something that is not documented or used by anyone (apparently) and
> is broken should just be removed.

8a02339e9ba3 sounds like a good enough argument to prove that there is no demand in the field for supporting options through initdb --auth, and this does not concern only pam. If somebody is interested in that, it could always be done later. My take is that this would be simpler if implemented through a separate option, leaving the checks between the options and the auth method up to the postmaster when loading pg_hba.conf at startup.

Hence, no objections to cleaning that up now. Thanks for asking.
--
Michael
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Kuroda-san,

Here are some review comments for all patches v9*

// Patch v9-0001 //

There were no changes since v8-0001, so no comments.

// Patch v9-0002 //

== Commit Message

2.1.
Regarding the off->on case, the logical replication already has a
mechanism for it, so there is no need to do anything special for the
on->off case; however, we must connect to the publisher and expressly
change the parameter. The operation cannot be rolled back, and altering
the parameter from "on" to "off" within a transaction is prohibited.

In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time.

~

This explanation seems to be going around in circles, without giving
any new information. AFAICT,

"Regarding the off->on case, the logical replication already has a
mechanism for it, so there is no need to do anything special for the
on->off case;"

is saying pretty much the same as:

"In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time."

But what I hoped for in previous review comments was an explanation
somewhat less vague than "already has a mechanism" or "already had the
mechanism". Can't this have just 1 or 2 lines to say WHAT that existing
mechanism for the "off" to "on" case is, and WHY that means there is
nothing special to do in that scenario?

== src/backend/commands/subscriptioncmds.c

2.2. AlterSubscription

/*
- * The changed two_phase option of the slot can't be rolled
- * back.
+ * Since altering the two_phase option of subscriptions
+ * also leads to changing the slot option, this command
+ * cannot be rolled back. So prevent this if we are in a
+ * transaction block. In the opposite case, there is no
+ * need to prevent this because the logical replication
+ * worker already had the mechanism to alter the slot
+ * option at a convenient time.
 */

(Same as previous review comments, and same as my review comment for
the commit message above.) I don't think "already had the mechanism" is
enough explanation.

Also, the 2nd sentence doesn't make sense here because the comment only
said "altering the slot option" -- it didn't say whether it was
altering it to "on" or altering it to "off", so "the opposite case" has
no meaning.

~~~

2.3. AlterSubscription

/*
- * Try to acquire the connection necessary for altering slot.
+ * Check the need to alter the replication slot. Failover and two_phase
+ * options are controlled by both the publisher (as a slot option) and the
+ * subscriber (as a subscription option). The slot option must be altered
+ * only when changing "on" to "off". Because in the opposite case, the
+ * logical replication worker already has the mechanism to do so at a
+ * convenient time.
+ */
+ update_failover = replaces[Anum_pg_subscription_subfailover - 1];
+ update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] &&
+					 !opts.twophase);

This is again the same as the other review comments above. Probably,
when some better explanation can be found for "already has the
mechanism to do so at a convenient time", then all of these places can
be changed using similar text.

// Patch v9-0003 //

There are some imperfect code comments, but AFAIK they are the same
ones from patch 0002. I think patch 0003 is just moving those comments
to different places, so probably they would already be addressed by
patch 0002.

// Patch v9-0004 //

== doc/src/sgml/catalogs.sgml

4.1.
+ subforcealter bool
+
+ If true, the subscription can be altered two_phase
+ option, even if there are prepared transactions

BEFORE
If true, the subscription can be altered two_phase option, even if
there are prepared transactions

SUGGESTION
If true, then the ALTER SUBSCRIPTION command can disable the two_phase
option, even if there are uncommitted prepared transactions from when
two_phase was enabled

== doc/src/sgml/ref/alter_subscription.sgml

4.2.
- The two_phase parameter can only be altered when the
- subscription is disabled. When altering the parameter from on
- to off, the backend process checks for any incomplete
- prepared transactions done by the logical replication worker (from when
- two_phase parameter was still on)
- and, if any are found, those are aborted.

Well, I still think there ought to be some mention of the relationship
between 'force_alter' and 'two_phase' given on this ALTER SUBSCRIPTION
page. Then the user can cross-reference to read what 'force_alter'
actually does.

== doc/src/sgml/ref/create_subscription.sgml

4.3.
+ force_alter (boolean)
+
+ Specifies whether the subscription can be altered
+ two_phase option, even if there are prepared
Re: JIT compilation per plan node
On Fri, 15 Mar 2024 at 10:13, Tomas Vondra wrote:
> To clarify, I think the patch is a step in the right direction, and a
> meaningful improvement. It may not be the perfect solution we imagine
> (but who knows how far we are from that), but AFAIK moving these
> decisions to the node level is something the ideal solution would need
> to do too.

I got thinking about this patch again while working on [1]. I want to write this down as I don't quite have time to get fully back into this right now...

Currently, during execution, ExecCreateExprSetupSteps() traverses the Node tree of the Expr to figure out the max varattno for each slot. That's done so all of the tuple deforming happens at once rather than incrementally. Figuring out the max varattno is a price we have to pay for every execution of the query. I think we'd be better off doing that in the planner.

To do this, I thought that setrefs.c could do this processing in fix_join_expr / fix_upper_expr and wrap up the expression in a new Node type that stores the max varattno for each special var type.

This idea is related to this discussion because another thing that could be stored in the very same struct is the "num_exec" value. I feel the number of executions of an ExprState is a better gauge of how useful JIT will be than the cost of the plan node.

Now, looking at set_join_references(), the execution estimates are not exactly perfect. For example:

#define NUM_EXEC_QUAL(parentplan) ((parentplan)->plan_rows * 2.0)

that's not a great estimate for a Nested Loop's joinqual, but we could easily make efforts to improve those, and that could likely be done independently and concurrently with other work to make JIT more granular.

The problem with doing this is that there's just a huge amount of code churn in the executor. I am keen to come up with a prototype so I can get a better understanding of whether this is a practical solution.
I don't want to go down that path if it's just me that thinks the number of times an ExprState is evaluated is a better measure to go on for JIT vs no JIT than the plan node's total cost. Does anyone have any thoughts on that idea? David [1] https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm=m0L+Gc=T=ke...@mail.gmail.com
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Kuroda-san, I'm having second thoughts about how these patches mention the option values "on|off". These are used in the ALTER SUBSCRIPTION document page for 'two_phase' and 'failover' parameters, and then those "on|off" get propagated to the code comments, error messages, and tests... Now I see that on the CREATE SUBSCRIPTION page [1], every boolean parameter (even including 'two_phase' and 'failover') is described in terms of "true|false" (not "on|off"). In hindsight, it is probably better to refer only to true|false everywhere for these boolean parameters, instead of sometimes using different values like on|off. What do you think? == [1] https://www.postgresql.org/docs/devel/sql-createsubscription.html Kind Regards, Peter Smith. Fujitsu Australia
Re: UniqueKey v2
>> Consider join of tables "a", "b" and "c". My understanding is that
>> make_join_rel() is called once with rel1={a} and rel2={b join c}, then
>> with rel1={a join b} and rel2={c}, etc. I wanted to say that each call
>> should produce the same set of unique keys.
>>
>> I need to check this part more in detail.
>
> I think I understand now. By calling populate_joinrel_uniquekeys() for
> various orderings, you can find out that various input relation unique
> keys can represent the whole join. For example, if the ordering is
>
> A JOIN (B JOIN C)
>
> you can prove that the unique keys of A can be used for the whole join,
> while for the ordering
>
> B JOIN (A JOIN C)
>
> you can prove the same for the unique keys of B, and so on.

Yes.

>> > We don't create the component uniquekey if either one of the two
>> > sides is unique already. For example:
>> >
>> > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is
>> > unique", there is no need to create the component UniqueKey (t1.id,
>> > t2.id);
>>
>> ok, I need to check more in detail how this part works.
>
> This optimization makes sense to me.

OK.

>> > > Of course one problem is that the number of combinations can grow
>> > > exponentially as new relations are joined.
>> >
>> > Yes, that's why "rule 1" is needed and "How to reduce the overhead" in
>> > UniqueKey.README is introduced.
>
> I think there should yet be some guarantee that the number of unique
> keys does not grow exponentially. Perhaps a constant that allows a
> relation (base or join) to have at most N unique keys. (I imagine N to
> be rather small, e.g. 3 or 4.) And when picking the "best N keys", one
> criterion could be the number of expressions in the key (the shorter
> key the better).

I don't want to introduce this complexity right now. I'm more interested in how to work with them effectively. The main effort includes:

- the design of a bitmapset which is memory-usage friendly and easy for combinations.
- Optimizing the singlerow cases to reduce N UniqueKeys to 1 UniqueKey.

I hope we can pay more attention to this optimization (at most N UniqueKeys) when the major infrastructure has been done.

>> > You are correct, and IMO my current code is able to tell it is a
>> > single row as well.
>> >
>> > 1. Since t1.id = 1, t1 is a single row, so t1.d is a uniquekey as a
>> > consequence.
>> > 2. Given t2.id is unique and t1.e = t2.id, t1's uniquekey can be kept
>> > after the join because of rule 1 on the joinrel, and t1 is a single
>> > row, so the joinrel is a single row as well.
>
> ok, I think I understand now.

OK.

At last, this is probably my first non-trivial patch with multiple authors. I don't want to be the bottleneck for the cooperation, so if you need something done sooner, please don't hesitate to ask me for it explicitly. Here is my schedule for this: I can provide the next version based on our discussion and your patches early next week, and update UniqueKey.README to make the overall design clearer.

What I hope you pay more attention to, besides the code, is UniqueKey.README. I hope UniqueKey.README can enormously reduce the effort for others to understand the overall design.

--
Best Regards
Andy Fan
Re: UniqueKey v2
Antonin Houska writes:
>> Could you make the reason clearer for adding 'List *opfamily_lists;'
>> into UniqueKey? You said "This is needed to create ECs in the parent
>> query if the upper relation represents a subquery." but I didn't get
>> it. Since we need to maintain the UniqueKey in many places, I'd like
>> to keep it as simple as possible. Of course, anything essential should
>> be added for sure.
>
> If unique keys are generated for a subquery output, they also need to be
> created for the corresponding relation in the upper query ("sub" in the
> following example):

OK.

> select * from tab1 left join (select * from tab2) sub;
>
> However, to create a unique key for "sub", you need an EC for each
> expression of the key.

OK.

> And to create an EC, you in turn need the list of operator families.

I'm wondering if we need to "create" any EC. Can you find a use case where the outer EC is missing and the UniqueKey is still interesting? I don't have an example now. convert_subquery_pathkeys has a similar situation and has the following code:

	outer_ec = get_eclass_for_sort_expr(root,
										(Expr *) outer_var,
										sub_eclass->ec_opfamilies,
										sub_member->em_datatype,
										sub_eclass->ec_collation,
										0,
										rel->relids,
										NULL,
										false);

	/*
	 * If we don't find a matching EC, sub-pathkey isn't
	 * interesting to the outer query
	 */
	if (outer_ec)
		best_pathkey =
			make_canonical_pathkey(root,
								   outer_ec,
								   sub_pathkey->pk_opfamily,
								   sub_pathkey->pk_strategy,
								   sub_pathkey->pk_nulls_first);
}

> Even if the parent query already had ECs for the columns of "sub" which
> are contained in the unique key, you need to make sure that those ECs
> are "compatible" with the ECs of the subquery which generated the
> unique key. That is, if an EC of the subquery considers certain input
> values equal, the EC of the parent query must also be able to determine
> if they are equal or not.
>> > * RelOptInfo.notnullattrs
>> >
>> > My understanding is that this field contains the set of attributes
>> > whose uniqueness is guaranteed by the unique key. They are acceptable
>> > because they are either 1) not allowed to be NULL due to a NOT NULL
>> > constraint, or 2) a NULL value makes the containing row filtered out,
>> > so the row cannot break uniqueness of the output. Am I right?
>> >
>> > If so, I suggest changing the field name to something less generic,
>> > maybe 'uniquekey_attrs' or 'uniquekey_candidate_attrs', and adding a
>> > comment that more checks are needed before a particular attribute can
>> > actually be used in a UniqueKey.
>>
>> I don't think so. UniqueKey is just one of the places to use this
>> not-null property; see 3af704098 for another use case of it.
>>
>> (Because of 3af704098, we should leverage notnullattnums somehow in
>> this patch, which will be included in the next version as well.)
>
> In your patch you modify 'notnullattrs' in add_base_clause_to_rel(), but
> that does not happen to 'notnullattnums' in the current master branch.
> Thus I think that 'notnullattrs' is specific to the unique keys feature,
> so the field name should be less generic.

OK.

>> > * uniquekey_useful_for_merging()
>> >
>> > How does uniqueness relate to merge join? In README.uniquekey you
>> > seem to point out that a single row is always sorted, but I don't
>> > think this function is related to that fact. (Instead, I'd expect
>> > that pathkeys are added to all paths for a single-row relation, but
>> > I'm not sure you do that in the current version of
Underscore in positional parameters?
I noticed that we (kind of) accept underscores in positional parameters. For example, this works:

	=> PREPARE p1 AS SELECT $1_2;
	PREPARE
	=> EXECUTE p1 (123);
	 ?column?
	----------
	      123
	(1 row)

Parameter $1_2 is taken as $1 because in rule {param} in scan.l we get the parameter number with atol, which stops at the underscore. That's a regression in faff8f8e47f. Before that commit, $1_2 resulted in "ERROR: trailing junk after parameter".

I can't tell which fix is the way to go: (1) accept underscores without using atol, or (2) just forbid underscores. Any ideas?

atol can be replaced with pg_strtoint32_safe to handle the underscores. This also avoids atol's undefined behavior on overflows. AFAICT, positional parameters are not part of the SQL standard, so nothing prevents us from accepting underscores here as well. The attached patch does that and also adds a test case.

But reverting {param} to its old form to forbid underscores also makes sense. That is:

param			\${decdigit}+
param_junk		\${decdigit}+{ident_start}

It seems very unlikely that anybody uses that many parameters and still cares about readability enough to use underscores. But maybe users simply expect that underscores are valid here as well.

--
Erik

>From 0f319818360cdd26e96198ea7fcaba1b8298f264 Mon Sep 17 00:00:00 2001
From: Erik Wienhold
Date: Tue, 14 May 2024 04:14:08 +0200
Subject: [PATCH] Fix parsing of positional parameters with underscores

Underscores were added to numeric literals in faff8f8e47. That also
introduced a regression whereby positional parameters accepted
underscores as well. But such parameter numbers were not parsed
correctly by atol which stops at the first non-digit character. So
parameter $1_2 would be taken as just $1. Fix that by replacing atol
with pg_strtoint32_safe. Thereby we also avoid the undefined behavior
of atol on overflows.
---
 src/backend/parser/scan.l                | 5 ++++-
 src/test/regress/expected/numerology.out | 1 +
 src/test/regress/sql/numerology.sql      | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 5eaadf53b2..ebb7ace9ca 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -992,7 +992,10 @@ other			.
 {param}			{
 					SET_YYLLOC();
-					yylval->ival = atol(yytext + 1);
+					ErrorSaveContext escontext = {T_ErrorSaveContext};
+					yylval->ival = pg_strtoint32_safe(yytext + 1, (Node *) &escontext);
+					if (escontext.error_occurred)
+						yyerror("parameter too large");
 					return PARAM;
 				}
 {param_junk}	{
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index f662a5050a..3c1308e22c 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -297,6 +297,7 @@ SELECT 1_000.5e0_1;
      10005
 (1 row)
 
+PREPARE p2 AS SELECT $0_1;
 -- error cases
 SELECT _100;
 ERROR:  column "_100" does not exist
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 1941c58e68..11af76828d 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -77,6 +77,8 @@ SELECT 1_000.;
 SELECT .000_005;
 SELECT 1_000.5e0_1;
 
+PREPARE p2 AS SELECT $0_1;
+
 -- error cases
 SELECT _100;
 SELECT 100_;
-- 
2.45.0
Re: First draft of PG 17 release notes
Bruce Momjian writes:
> On Sat, May 11, 2024 at 01:27:25PM +0800, Andy Fan wrote:
>> Do you think we need to add the following 2 items?
>>
>> - 9f133763961e280d8ba692bcad0b061b861e9138 this is an optimizer
>> transform improvement.
>
> It was unclear from the commit message exactly what user-visible
> optimization this allowed. Do you have details?

Yes. It allows a query like "SELECT * FROM t1 WHERE t1.a in (SELECT a FROM t2 WHERE t2.b = t1.b)" to be pulled up into a semi-join, hence more join methods / join orders are possible.

>> - a8a968a8212ee3ef7f22795c834b33d871fac262 this is an optimizer costing
>> improvement.
>
> Does this allow faster UNION ALL with LIMIT, perhaps?

Yes, for example:

(subquery-1) UNION ALL (subquery-2) LIMIT n;

When planning subquery-1 or subquery-2, the limit N should be considered. As a consequence, maybe a hash join should be replaced with a nested loop. Before this commit, the limit was ignored if the subquery was flattened into an appendrel, and that flattening happens very often.

David provided a summary of both commits in [1].

[1] https://www.postgresql.org/message-id/CAApHDvqAQgq27LgYmJ85VVGTR0%3DhRW6HHq2oZgK0ZiYC_a%2BEww%40mail.gmail.com

--
Best Regards
Andy Fan
Re: First draft of PG 17 release notes
jian he wrote on Thursday, May 9, 2024 at 18:00:
> On Thu, May 9, 2024 at 12:04 PM Bruce Momjian wrote:
> >
> > I have committed the first draft of the PG 17 release notes; you can
> > see the results here:
> >
> >	https://momjian.us/pgsql_docs/release-17.html
>
> another potential incompatibilities issue:
> ALTER TABLE DROP PRIMARY KEY
>
> see:
> https://www.postgresql.org/message-id/202404181849.6frtmajobe27%40alvherre.pgsql

Since Alvaro has reverted all changes to not-null constraints, this will not be a potential incompatibility issue.

--
Tender Wang
OpenPie: https://en.openpie.com/
Re: First draft of PG 17 release notes
On Sat, May 11, 2024 at 01:27:25PM +0800, Andy Fan wrote:
>
> Hello Bruce,
>
> > I have committed the first draft of the PG 17 release notes; you can
> > see the results here:
> >
> >	https://momjian.us/pgsql_docs/release-17.html
>
> Thank you for working on this!
>
> > I welcome feedback. For some reason it was an easier job than usual.
>
> Do you think we need to add the following 2 items?
>
> - 9f133763961e280d8ba692bcad0b061b861e9138 this is an optimizer
> transform improvement.

It was unclear from the commit message exactly what user-visible optimization this allowed. Do you have details?

> - a8a968a8212ee3ef7f22795c834b33d871fac262 this is an optimizer costing
> improvement.

Does this allow faster UNION ALL with LIMIT, perhaps?

--
Bruce Momjian	https://momjian.us
EDB		https://enterprisedb.com

Only you can decide what is important to you.
Re: First draft of PG 17 release notes
On Sat, May 11, 2024 at 10:24:39AM -0400, Joe Conway wrote:
> On 5/11/24 09:57, Jelte Fennema-Nio wrote:
> > On Fri, 10 May 2024 at 23:31, Tom Lane wrote:
> > >
> > > Bruce Momjian writes:
> > > > I looked at both of these. In both cases I didn't see why the user
> > > > would need to know these changes were made:
> > >
> > > I agree that the buffering change is not likely interesting, but
> > > the fact that you can now control-C out of a psql "\c" command
> > > is user-visible. People might have internalized the fact that
> > > it didn't work, or created complicated workarounds.
> >
> > The buffering change improved performance up to ~40% in some of the
> > benchmarks. The case it improves mostly is COPY of large rows and
> > streaming a base backup. That sounds user-visible enough to me to
> > warrant an entry imho.
>
> +1

Attached patch applied.

--
Bruce Momjian	https://momjian.us
EDB		https://enterprisedb.com

Only you can decide what is important to you.

commit e87e7324555
Author: Bruce Momjian
Date:   Mon May 13 20:55:13 2024 -0400

    doc PG 17 relnotes: add item about libpq large data transfers

    Reported-by: Jelte Fennema-Nio

    Discussion: https://postgr.es/m/cageczqtz5auqlel6dald2hu2fxs_losh4kedndj1fwthsb_...@mail.gmail.com

    Reviewed-by: Joe Conway

    Backpatch-through: master

diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 9dd3954f3c2..38c14970822 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -1901,6 +1901,17 @@
 Add libpq function PQsetChunkedRowsMode to allow retrieval of results
 in chunks
+
+Allow libpq to more efficiently transfer large blocks of data (Melih Mutlu)
+
Re: Why is citext/regress failing on hamerkop?
On Tue, May 14, 2024 at 8:17 AM Tom Lane wrote:
> I'm not sure whether we've got unsent data pending in the problematic
> condition, nor why it'd remain unsent if we do (shouldn't the backend
> consume it anyway?). But this has the right odor for an explanation.
>
> I'm pretty hesitant to touch this area myself, because it looks an
> awful lot like commits 6051857fc and ed52c3707, which eventually
> had to be reverted. I think we need a deeper understanding of
> exactly what Winsock is doing or not doing before we try to fix it.

I was beginning to suspect that lingering odour myself. I haven't looked at the GSS code, but I was imagining that what we have here is perhaps not unsent data dropped on the floor due to linger policy (unclean socket close on process exit), but rather that the server didn't see the socket as ready to read because it lost track of the FD_CLOSE somewhere after the client closed it gracefully, and our server-side FD_CLOSE handling has always been a bit suspect. I wonder if the GSS code is somehow more prone to brokenness.

One thing we learned in earlier problems was that abortive/error disconnections generate FD_CLOSE repeatedly, while graceful ones give you only one. In other words, if the other end politely calls closesocket(), the server had better not miss the FD_CLOSE event, because it won't come again. That's what https://commitfest.postgresql.org/46/3523/ is intended to fix. Does it help here? Unfortunately that's unpleasantly complicated and unbackpatchable (it keeps a side-table of socket FDs and event handles, so we don't lose events between the cracks).
Re: First draft of PG 17 release notes
On Fri, May 10, 2024 at 05:31:33PM -0400, Tom Lane wrote:
> Bruce Momjian writes:
> > On Fri, May 10, 2024 at 06:50:54PM +0200, Jelte Fennema-Nio wrote:
> >> There are two commits that I think would benefit from being listed
> >> (but maybe they are already listed and I somehow missed them, or they
> >> are left out on purpose for some reason):
>
> > I looked at both of these. In both cases I didn't see why the user
> > would need to know these changes were made:
>
> I agree that the buffering change is not likely interesting, but
> the fact that you can now control-C out of a psql "\c" command
> is user-visible. People might have internalized the fact that
> it didn't work, or created complicated workarounds.

Agreed, attached patch applied.

--
Bruce Momjian	https://momjian.us
EDB		https://enterprisedb.com

Only you can decide what is important to you.

diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 5a741ff16f1..9dd3954f3c2 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -1980,6 +1980,17 @@
 The parameter is "min_rows".
+
+Allow psql connections to be canceled with control-C (Tristan Partin)
+
Re: GUC-ify walsender MAX_SEND_SIZE constant
On Thu, Apr 25, 2024 at 02:53:33PM +0200, Majid Garoosi wrote:
> Unfortunately, I quit that company a month ago (I wish we could
> discuss this earlier) and don't have access to the environment
> anymore. I'll try to ask my teammates and see if they can find
> anything about the exact values of latency, bw, etc.
>
> Anyway, here is some description of the environment. Sadly, there
> are no numbers in this description, but I'll try to provide as much
> detail as possible.
>
> There is a k8s cluster running over some VMs. Each postgres
> instance runs as a pod inside the k8s cluster. So, both the
> primary and standby servers are in the same DC, but might be on
> different baremetal nodes. There is an overlay network for the pods to
> see each other, and there's also another overlay network for the VMs
> to see each other.
>
> The storage servers are in the same DC, but we're sure they're on some
> racks other than the postgres pods. They run the Ceph [1] project and
> provide a Rados Block Device (RBD) [2] interface. In order for k8s to
> use Ceph, a Ceph-CSI [3] controller is running inside the k8s cluster.
> BTW, the FS type is ext4.

Okay. Seeing the feedback for this patch, and Andres disagreeing with it being a good idea, I have marked the patch as rejected, as it was still in the CF app.
--
Michael
Re: I have an exporting need...
On Tue, 14 May 2024 at 06:18, Juan Hernández wrote:
> Do you consider useful to add a parameter (for example,
> --separatetables) so when used the exporting file process can create a
> different tablename.sql file for each table in database automatically?
>
> Example...
>
> PGHOST="/tmp" PGPASSWORD="mydbpass" pg_dump -U dbusername --separatetables -Fp --inserts dbname > "/route/dbname.sql"
>
> And if this database has tables table1...table10, then 10 files are created...

pg_dump has code to figure out the dependency of objects in the database
so that the dump file produced can be restored. If one file was output
per table, how would you know in which order to restore them?

For example, you have a table with a FOREIGN KEY to reference some
other table, you need to restore the referenced table first. That's
true for both restoring the schema and restoring the data.

David
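David's objection is essentially a topological-sort requirement: a table can only be restored after every table it references. A minimal Python sketch of that ordering over an invented three-table foreign-key graph (pg_dump's real implementation is in C and covers far more object types than tables):

```python
# Toy illustration of restore ordering: referenced tables must be
# restored before the tables that reference them (Kahn's algorithm).
from collections import deque

def restore_order(tables, fkeys):
    """tables: list of names; fkeys: (referencing, referenced) pairs."""
    deps = {t: set() for t in tables}    # table -> tables it references
    rdeps = {t: set() for t in tables}   # table -> tables referencing it
    for referencing, referenced in fkeys:
        deps[referencing].add(referenced)
        rdeps[referenced].add(referencing)
    ready = deque(t for t in tables if not deps[t])
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for r in rdeps[t]:
            deps[r].discard(t)
            if not deps[r]:
                ready.append(r)
    if len(order) != len(tables):
        raise ValueError("circular FK dependency; cannot order plain loads")
    return order

# "orders" references "customers", so "customers" must be restored first.
print(restore_order(["orders", "customers", "items"],
                    [("orders", "customers"), ("items", "orders")]))
# -> ['customers', 'orders', 'items']
```

Splitting the dump into one file per table would force the user to rediscover exactly this ordering by hand, which is the point of David's reply.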
RE: Resetting synchronous_standby_names can wait for CHECKPOINT to finish
Hello,

> When the checkpointer process is busy, even if we reset
> synchronous_standby_names, the resumption of the backend processes
> waiting in SyncRep are made to wait until the checkpoint is completed.
> This prevents the prompt resumption of application processing when a
> problem occurs on the standby server in a synchronous replication
> system.
> I confirmed this in PostgreSQL 12.18.

I have tested this issue on Postgres built from the master branch
(17devel) and observed the same behavior where the backend SyncRep
release is blocked until CHECKPOINT completion.

In situations where a synchronous standby instance encounters an error
and needs to be detached, I believe that the current behavior of
waiting for SyncRep is inappropriate as it delays the backend. I don't
think changing the position of SIGHUP processing in the Checkpointer
process carries much risk.

Is there any oversight in my perception?

Regards,
Yusuke Egashira.
Re: Direct SSL connection with ALPN and HBA rules
[this should probably belong to a different thread, but I'm not sure
what to title it]

On Mon, May 13, 2024 at 4:08 AM Heikki Linnakangas wrote:
> I think these options should be designed from the user's point of view,
> so that the user can specify the risks they're willing to accept, and
> the details of how that's accomplished are handled by libpq. For
> example, I'm OK with (tick all that apply):
>
> [ ] passive eavesdroppers seeing all the traffic
> [ ] MITM being able to hijack the session
> [ ] connecting without verifying the server's identity
> [ ] divulging the plaintext password to the server
> [ ] ...

I'm pessimistic about a quality-of-protection scheme for this use case
(*). I don't think users need more knobs in their connection strings,
and many of the goals of transport encryption are really not
independent from each other in practice.

As evidence I point to the absolute mess of GSSAPI wrapping, which lets
you check separate boxes for things like "require the server to
authenticate itself" and "require integrity" and "allow MITMs to
reorder messages" and so on, as if the whole idea of "integrity" is
useful if you don't know who you're talking to in the first place. I
think I recall slapd having something similarly arcane (but at least it
doesn't make the clients do it!). Those kinds of APIs don't evolve
well, in my opinion.

I think most people probably just want browser-grade security as
quickly and cheaply as possible, and we don't make that very easy
today. I'll try to review a QoP scheme if someone works on it, don't
get me wrong, but I'd much rather spend time on a "very secure by
default" mode that gets rid of most of the options (i.e. a
postgresqls:// scheme).

(*) I've proposed quality-of-protection in the past, for Postgres proxy
authentication [1].
But I'm comfortable in my hypocrisy, because in that case, the end user
doing the configuration is a DBA with a config file who is expected to
understand the whole system, and it's a niche use case (IMO) without an
obvious "common setup". And frankly I think my proposal is unlikely to
go anywhere; the cost/benefit probably isn't good enough.

> If you need to verify
> the server's identity, it implies sslmode=verify-CA or
> channel_binding=true.

Neither of those two options provides strong authentication of the
peer, and personally I wouldn't want them to be considered worthy of
"verify the server's identity" mode. And -- taking a wild leap here --
if we disagree, then granularity becomes a problem: either the QoP
scheme now has to have sub-checkboxes for "if the server knows my
password, that's good enough" and "it's fine if the server's hostname
doesn't match the cert, for some reason", or it smashes all of those
different ideas into one setting and then I have to settle for the
weakest common denominator during an active attack. Assuming I haven't
missed a third option, will that be easier/better than the status quo
of require_auth+sslmode?

> A different line of thought is that to keep the attack surface as small
> as possible, you should specify very explicitly what exactly you expect
> to happen in the authentication, and disallow any variance. For example,
> you expect SCRAM to be used, with a certificate signed by a particular
> CA, and if the server requests anything else, that's suspicious and the
> connection is aborted. We should make that possible too

That's 'require_auth=scram-sha-256 sslmode=verify-ca', no?

--Jacob

[1] https://www.postgresql.org/message-id/0768cedb-695a-8841-5f8b-da2aa64c8f3a%40timescale.com
Re: SQL:2011 application time
On 5/13/24 03:11, Peter Eisentraut wrote:
> It looks like we missed some of these fundamental design questions
> early on, and it might be too late now to fix them for PG17.
>
> For example, the discussion on unique constraints misses that the
> question of null values in unique constraints itself is controversial
> and that there is now a way to change the behavior. So I imagine there
> is also a selection of possible behaviors you might want for empty
> ranges. Intuitively, I don't think empty ranges are sensible for
> temporal unique constraints. But anyway, it's a bit late now to be
> discussing this.
>
> I'm also concerned that if ranges have this fundamental incompatibility
> with periods, then the plan to eventually evolve this patch set to
> support standard periods will also have as-yet-unknown problems.
>
> Some of these issues might be design flaws in the underlying
> mechanisms, like range types and exclusion constraints. Like, if
> you're supposed to use this for scheduling but you can use empty
> ranges to bypass exclusion constraints, how is one supposed to use
> this? Yes, a check constraint using isempty() might be the right
> answer. But I don't see this documented anywhere.
>
> On the technical side, adding an implicit check constraint as part of
> a primary key constraint is quite a difficult implementation task, as
> I think you are discovering. I'm just reminded about how the patch for
> catalogued not-null constraints struggled with linking these not-null
> constraints to primary keys correctly. This sounds a bit similar.
>
> I'm afraid that these issues cannot be resolved in good time for this
> release, so we should revert this patch set for now.

I think reverting is a good idea. I'm not really happy with the CHECK
constraint solution either. I'd be happy to have some more time to
rework this for v18. A couple alternatives I'd like to explore:

1. Domain constraints instead of a CHECK constraint.
I think this is probably worse, and I don't plan to spend much time on
it, but I thought I'd mention it in case someone else thought
otherwise.

2. A slightly different overlaps operator, say &&&, where 'empty' &&&
'empty' is true. But 'empty' with anything else could still be false
(or not). That operator would prevent duplicates in an exclusion
constraint. This also means we could support more types than just
ranges & multiranges. I need to think about whether this combines badly
with existing operators, but if not it has a lot of promise. If
anything it might be *less* contradictory, because it fits better with
'empty' @> 'empty', which we say is true.

Another thing a revert would give me some time to consider: even though
it's not standard syntax, I wonder if we want to require syntax
something like `PRIMARY KEY USING gist (id, valid_at WITHOUT
OVERLAPS)`. Everywhere else we default to btree, so defaulting to gist
feels a little weird. In theory we could even someday support WITHOUT
OVERLAPS with btree, if we taught that AM to answer that question. (I
admit there is probably not a lot of desire for that though.)

Yours,

-- 
Paul ~{:-) p...@illuminatedcomputing.com
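The semantics Paul is proposing can be modeled in a few lines. A minimal Python sketch (the &&& operator is only a proposal in this thread and does not exist in PostgreSQL; ranges are modeled here as half-open [lo, hi) tuples, with None standing in for 'empty'):

```python
# Model of the proposed &&& semantics: like range overlap (&&), except
# 'empty' &&& 'empty' is true, so two empty ranges would conflict in an
# exclusion constraint instead of slipping past it.
EMPTY = None

def overlaps(a, b):        # existing && : 'empty' overlaps nothing
    if a is EMPTY or b is EMPTY:
        return False
    return a[0] < b[1] and b[0] < a[1]

def overlaps3(a, b):       # proposed &&& : 'empty' &&& 'empty' is true
    if a is EMPTY and b is EMPTY:
        return True
    return overlaps(a, b)

assert not overlaps(EMPTY, EMPTY)     # why duplicate empties sneak through today
assert overlaps3(EMPTY, EMPTY)        # proposed: duplicate empties now conflict
assert not overlaps3(EMPTY, (1, 5))   # empty vs. non-empty could stay false
assert overlaps3((1, 5), (4, 8)) and not overlaps3((1, 5), (5, 8))
```

The first two assertions capture the difference Paul describes: with plain &&, an exclusion constraint never sees two empty ranges as conflicting, so duplicates are allowed; with &&& they would be rejected.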
Re: Why is parula failing?
David Rowley writes: > I've not seen any recent failures from Parula that relate to this > issue. The last one seems to have been about 4 weeks ago. > I'm now wondering if it's time to revert the debugging code added in > 1db689715. Does anyone think differently? +1. It seems like we wrote off the other issue as a compiler bug, so I don't have much trouble assuming that this one was too. regards, tom lane
Re: Adding the extension name to EData / log_line_prefix
On Mon, May 13, 2024 at 07:11:53PM GMT, Tom Lane wrote:
> Andres Freund writes:
> > On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote:
> >> Hmmm, depending on the extension it can extensively call/use postgres code
> >> so would be nice if we can differentiate if the code is called from
> >> Postgres itself or from an extension.
>
> > I think that's not realistically possible. It's also very fuzzy what that'd
> > mean. If there's a planner hook and then the query executes normally, what do
> > you report for an execution time error? And even the simpler case - should use
> > of pg_stat_statements cause everything within to be logged as a
> > pg_stat_statement message?
>
> Not to mention that there could be more than one extension on the call
> stack. I think tying this statically to the ereport call site is
> fine.
>
> The mechanism that Andres describes for sourcing the name seems a bit
> overcomplex though. Why not just allow/require each extension to
> specify its name as a constant string? We could force the matter by
> redefining PG_MODULE_MAGIC as taking an argument:
>
> PG_MODULE_MAGIC("hstore");

FTR there was a proposal at [1] some time ago that could be used for
that need (and others), I thought it could be good to mention it just
in case. That would obviously only work if all extensions use that
framework.

[1] https://www.postgresql.org/message-id/flat/3207907.AWbSqkKDnR%40aivenronan
Re: Adding the extension name to EData / log_line_prefix
Hi, On 2024-05-13 19:11:53 -0400, Tom Lane wrote: > The mechanism that Andres describes for sourcing the name seems a bit > overcomplex though. Why not just allow/require each extension to > specify its name as a constant string? We could force the matter by > redefining PG_MODULE_MAGIC as taking an argument: > PG_MODULE_MAGIC("hstore"); Mostly because it seemed somewhat sad to require every extension to have version-specific ifdefs around that, particularly because it's not hard for us to infer. I think there might be other use cases for the backend to provide "extension scoped" information, FWIW. Even just providing the full path to the extension library could be useful. Greetings, Andres Freund
Re: Why is parula failing?
On Thu, 21 Mar 2024 at 13:53, David Rowley wrote: > > On Thu, 21 Mar 2024 at 12:36, Tom Lane wrote: > > So yeah, if we could have log_autovacuum_min_duration = 0 perhaps > > that would yield a clue. > > FWIW, I agree with your earlier statement about it looking very much > like auto-vacuum has run on that table, but equally, if something like > the pg_index record was damaged we could get the same plan change. > > We could also do something like the attached just in case we're > barking up the wrong tree. I've not seen any recent failures from Parula that relate to this issue. The last one seems to have been about 4 weeks ago. I'm now wondering if it's time to revert the debugging code added in 1db689715. Does anyone think differently? David
Re: Adding the extension name to EData / log_line_prefix
Andres Freund writes: > On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote: >> Hmmm, depending on the extension it can extensively call/use postgres code >> so would be nice if we can differentiate if the code is called from >> Postgres itself or from an extension. > I think that's not realistically possible. It's also very fuzzy what that'd > mean. If there's a planner hook and then the query executes normally, what do > you report for an execution time error? And even the simpler case - should use > of pg_stat_statements cause everything within to be logged as a > pg_stat_statement message? Not to mention that there could be more than one extension on the call stack. I think tying this statically to the ereport call site is fine. The mechanism that Andres describes for sourcing the name seems a bit overcomplex though. Why not just allow/require each extension to specify its name as a constant string? We could force the matter by redefining PG_MODULE_MAGIC as taking an argument: PG_MODULE_MAGIC("hstore"); regards, tom lane
Re: Adding the extension name to EData / log_line_prefix
Hi, On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote: > On Mon, May 13, 2024 at 5:51 PM Andres Freund wrote: > > It's not entirely trivial to provide errfinish() with a parameter > indicating > > the extension, but it's doable: > > > > 1) Have PG_MODULE_MAGIC also define a new variable for the extension name, > >empty at that point > > > > 2) In internal_load_library(), look up that new variable, and fill it > with a, > >mangled, libname. > > > > 4) in elog.h, define a new macro depending on BUILDING_DLL (if it is set, > >we're in the server, otherwise an extension). In the backend itself, > define > >it to NULL, otherwise to the variable created by PG_MODULE_MAGIC. > > > > 5) In elog/ereport/errsave/... pass this new variable to > >errfinish/errsave_finish. > > > > Then every extension should define their own Pg_extension_filename, right? It'd be automatically set by postgres when loading libraries. > > I've attached a *very rough* prototype of this idea. My goal at this > stage was > > just to show that it's possible, not for the code to be in a reviewable > state. > > > > > > Here's e.g. what this produces with log_line_prefix='%m [%E] ' > > > > 2024-05-13 13:50:17.518 PDT [postgres] LOG: database system is ready to > accept connections > > 2024-05-13 13:50:19.138 PDT [cube] ERROR: invalid input syntax for cube > at character 13 > > 2024-05-13 13:50:19.138 PDT [cube] DETAIL: syntax error at or near "f" > > 2024-05-13 13:50:19.138 PDT [cube] STATEMENT: SELECT cube('foo'); > > > > 2024-05-13 13:43:07.484 PDT [postgres] LOG: database system is ready to > accept connections > > 2024-05-13 13:43:11.699 PDT [hstore] ERROR: syntax error in hstore: > unexpected end of string at character 15 > > 2024-05-13 13:43:11.699 PDT [hstore] STATEMENT: SELECT hstore('foo'); > > > > > > Was not able to build your patch by simply: Oh, turns out it only builds with meson right now. I forgot that, with autoconf, for some unknown reason, we only set BUILDING_DLL on some OSs. 
I attached a crude patch changing that. > > It's worth pointing out that this, quite fundamentally, can only work > when the > > log message is triggered directly by the extension. If the extension code > > calls some postgres function and that function then errors out, it'll be > seens > > as being part of postgres. > > > > But I think that's ok - they're going to be properly errcode-ified etc. > > > > Hmmm, depending on the extension it can extensively call/use postgres code > so would be nice if we can differentiate if the code is called from > Postgres itself or from an extension. I think that's not realistically possible. It's also very fuzzy what that'd mean. If there's a planner hook and then the query executes normally, what do you report for an execution time error? And even the simpler case - should use of pg_stat_statements cause everything within to be logged as a pg_stat_statement message? I think the best we can do is to actually say where the error is directly triggered from. 
Greetings, Andres Freund >From 1c59c465f2d359e8c47cf91d1ea458ea3b64ec84 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Mon, 13 May 2024 13:47:41 -0700 Subject: [PATCH v2 1/2] WIP: Track shared library in which elog/ereport is called --- src/include/fmgr.h | 1 + src/include/utils/elog.h | 18 +- src/backend/utils/error/elog.c | 33 - src/backend/utils/fmgr/dfmgr.c | 30 ++ 4 files changed, 68 insertions(+), 14 deletions(-) diff --git a/src/include/fmgr.h b/src/include/fmgr.h index ccb4070a251..3c7cfd7fee9 100644 --- a/src/include/fmgr.h +++ b/src/include/fmgr.h @@ -504,6 +504,7 @@ PG_MAGIC_FUNCTION_NAME(void) \ static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \ return _magic_data; \ } \ +PGDLLEXPORT const char *Pg_extension_filename = NULL; \ extern int no_such_variable diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h index 054dd2bf62f..5e57f01823d 100644 --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -137,6 +137,13 @@ struct Node; * prevents gcc from making the unreachability deduction at optlevel -O0. *-- */ +#ifdef BUILDING_DLL +#define ELOG_EXTNAME NULL +#else +extern PGDLLEXPORT const char *Pg_extension_filename; +#define ELOG_EXTNAME Pg_extension_filename +#endif + #ifdef HAVE__BUILTIN_CONSTANT_P #define ereport_domain(elevel, domain, ...) \ do { \ @@ -144,7 +151,7 @@ struct Node; if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \ errstart_cold(elevel, domain) : \ errstart(elevel, domain)) \ - __VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \ + __VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \ if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \ pg_unreachable(); \ } while(0) @@ -154,7 +161,7 @@ struct Node; const int elevel_ = (elevel); \
Re: Direct SSL connection with ALPN and HBA rules
On Mon, May 13, 2024 at 9:13 AM Robert Haas wrote:
> I find this idea to be a massive improvement over the status quo,

+1

> and
> I didn't spot any major problems when I read through the patch,
> either.

Definitely not a major problem, but I think
select_next_encryption_method() has gone stale, since it originally
provided generality and lines of fallback that no longer exist. In
other words, I think the following code is now misleading:

>     if (conn->sslmode[0] == 'a')
>         SELECT_NEXT_METHOD(ENC_PLAINTEXT);
>
>     SELECT_NEXT_METHOD(ENC_NEGOTIATED_SSL);
>     SELECT_NEXT_METHOD(ENC_DIRECT_SSL);
>
>     if (conn->sslmode[0] != 'a')
>         SELECT_NEXT_METHOD(ENC_PLAINTEXT);

To me, that implies that negotiated mode takes precedence over direct,
but the point of the patch is that it's not possible to have both. And
if direct SSL is in use, then sslmode can't be "allow" anyway, and we
definitely don't want ENC_PLAINTEXT.

So if someone proposes a change to select_next_encryption_method(),
you'll have to remember to stare at init_allowed_encryption_methods()
as well, and think really hard about what's going on. And vice-versa.
That worries me.

> I don't have a strong opinion about whether sslnegotiation=direct
> should error out (as you propose here) or silently promote sslmode to
> require. I think either is defensible.

I'm comforted that, since sslrootcert=system already does it, plenty of
use cases will get that for free. And if you decide in the future that
you really really want it to promote, it won't be a compatibility break
to make that change. (That gives us more time for wider v16-17
adoption, to see how the sslrootcert=system magic promotion behavior is
going in practice.)

> Had I been implementing it, I
> think I would have done as Jacob proposes, just because once we've
> forced a direct SSL negotiation surely the only sensible behavior is
> to be using SSL, unless you think there should be a
> silently-reconnect-without-SSL behavior, which I sure don't.
We still allow GSS to preempt SSL, though, so "forced" is probably overstating things. > It's really hard to believe in 2024 that anyone should ever be using a > setting that may or may not encrypt the connection. There's http and > https but there's no httpmaybes. +1. I think (someone hop in and correct me please) that Opportunistic Encryption for HTTP mostly fizzled, and they gave it a *lot* of thought. --Jacob
Re: Adding the extension name to EData / log_line_prefix
On Mon, May 13, 2024 at 5:51 PM Andres Freund wrote: > > Hi, > > It can be very useful to look at the log messages emitted by a larger number > of postgres instances to see if anything unusual is happening. E.g. checking > whether there are an increased number of internal, IO, corruption errors (and > LOGs too, because we emit plenty bad things as LOG) . One difficulty is that > extensions tend to not categorize their errors. But unfortunately errors in > extensions are hard to distinguish from errors emitted by postgres. > > A related issue is that it'd be useful to be able to group log messages by > extension, to e.g. see which extensions are emitting disproportionally many > log messages. > > Therefore I'd like to collect the extension name in elog/ereport and add a > matching log_line_prefix escape code. > I liked the idea ... It is very helpful for troubleshooting problems in production. > It's not entirely trivial to provide errfinish() with a parameter indicating > the extension, but it's doable: > > 1) Have PG_MODULE_MAGIC also define a new variable for the extension name, >empty at that point > > 2) In internal_load_library(), look up that new variable, and fill it with a, >mangled, libname. > > 4) in elog.h, define a new macro depending on BUILDING_DLL (if it is set, >we're in the server, otherwise an extension). In the backend itself, define >it to NULL, otherwise to the variable created by PG_MODULE_MAGIC. > > 5) In elog/ereport/errsave/... pass this new variable to >errfinish/errsave_finish. > Then every extension should define their own Pg_extension_filename, right? > I've attached a *very rough* prototype of this idea. My goal at this stage was > just to show that it's possible, not for the code to be in a reviewable state. > > > Here's e.g. 
what this produces with log_line_prefix='%m [%E] ' > > 2024-05-13 13:50:17.518 PDT [postgres] LOG: database system is ready to accept connections > 2024-05-13 13:50:19.138 PDT [cube] ERROR: invalid input syntax for cube at character 13 > 2024-05-13 13:50:19.138 PDT [cube] DETAIL: syntax error at or near "f" > 2024-05-13 13:50:19.138 PDT [cube] STATEMENT: SELECT cube('foo'); > > 2024-05-13 13:43:07.484 PDT [postgres] LOG: database system is ready to accept connections > 2024-05-13 13:43:11.699 PDT [hstore] ERROR: syntax error in hstore: unexpected end of string at character 15 > 2024-05-13 13:43:11.699 PDT [hstore] STATEMENT: SELECT hstore('foo'); > > Was not able to build your patch by simply: ./configure --prefix=/tmp/pg ... make -j ... /usr/bin/ld: ../../src/port/libpgport_srv.a(path_srv.o): warning: relocation against `Pg_extension_filename' in read-only section `.text' /usr/bin/ld: access/brin/brin.o: in function `brininsert': /data/src/pg/main/src/backend/access/brin/brin.c:403: undefined reference to `Pg_extension_filename' /usr/bin/ld: access/brin/brin.o: in function `brinbuild': /data/src/pg/main/src/backend/access/brin/brin.c:1107: undefined reference to `Pg_extension_filename' /usr/bin/ld: access/brin/brin.o: in function `brin_summarize_range': /data/src/pg/main/src/backend/access/brin/brin.c:1383: undefined reference to `Pg_extension_filename' /usr/bin/ld: /data/src/pg/main/src/backend/access/brin/brin.c:1389: undefined reference to `Pg_extension_filename' /usr/bin/ld: /data/src/pg/main/src/backend/access/brin/brin.c:1434: undefined reference to `Pg_extension_filename' /usr/bin/ld: access/brin/brin.o:/data/src/pg/main/src/backend/access/brin/brin.c:1450: more undefined references to `Pg_extension_filename' follow /usr/bin/ld: warning: creating DT_TEXTREL in a PIE collect2: error: ld returned 1 exit status make[2]: *** [Makefile:67: postgres] Error 1 make[2]: Leaving directory '/data/src/pg/main/src/backend' make[1]: *** [Makefile:42: all-backend-recurse] 
Error 2 make[1]: Leaving directory '/data/src/pg/main/src' make: *** [GNUmakefile:11: all-src-recurse] Error 2 > It's worth pointing out that this, quite fundamentally, can only work when the > log message is triggered directly by the extension. If the extension code > calls some postgres function and that function then errors out, it'll be seens > as being part of postgres. > > But I think that's ok - they're going to be properly errcode-ified etc. > Hmmm, depending on the extension it can extensively call/use postgres code so would be nice if we can differentiate if the code is called from Postgres itself or from an extension. Regards, -- Fabrízio de Royes Mello
Re: pg_sequence_last_value() for unlogged sequences on standbys
Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Adding the extension name to EData / log_line_prefix
Hi,

It can be very useful to look at the log messages emitted by a larger
number of postgres instances to see if anything unusual is happening.
E.g. checking whether there are an increased number of internal, IO,
corruption errors (and LOGs too, because we emit plenty bad things as
LOG). One difficulty is that extensions tend to not categorize their
errors. But unfortunately errors in extensions are hard to distinguish
from errors emitted by postgres.

A related issue is that it'd be useful to be able to group log messages
by extension, to e.g. see which extensions are emitting
disproportionally many log messages.

Therefore I'd like to collect the extension name in elog/ereport and
add a matching log_line_prefix escape code.

It's not entirely trivial to provide errfinish() with a parameter
indicating the extension, but it's doable:

1) Have PG_MODULE_MAGIC also define a new variable for the extension
   name, empty at that point

2) In internal_load_library(), look up that new variable, and fill it
   with a, mangled, libname.

4) in elog.h, define a new macro depending on BUILDING_DLL (if it is
   set, we're in the server, otherwise an extension). In the backend
   itself, define it to NULL, otherwise to the variable created by
   PG_MODULE_MAGIC.

5) In elog/ereport/errsave/... pass this new variable to
   errfinish/errsave_finish.

I've attached a *very rough* prototype of this idea. My goal at this
stage was just to show that it's possible, not for the code to be in a
reviewable state.

Here's e.g.
what this produces with log_line_prefix='%m [%E] '

2024-05-13 13:50:17.518 PDT [postgres] LOG:  database system is ready to accept connections
2024-05-13 13:50:19.138 PDT [cube] ERROR:  invalid input syntax for cube at character 13
2024-05-13 13:50:19.138 PDT [cube] DETAIL:  syntax error at or near "f"
2024-05-13 13:50:19.138 PDT [cube] STATEMENT:  SELECT cube('foo');

2024-05-13 13:43:07.484 PDT [postgres] LOG:  database system is ready to accept connections
2024-05-13 13:43:11.699 PDT [hstore] ERROR:  syntax error in hstore: unexpected end of string at character 15
2024-05-13 13:43:11.699 PDT [hstore] STATEMENT:  SELECT hstore('foo');

It's worth pointing out that this, quite fundamentally, can only work
when the log message is triggered directly by the extension. If the
extension code calls some postgres function and that function then
errors out, it'll be seen as being part of postgres.

But I think that's ok - they're going to be properly errcode-ified etc.

Thoughts?

Greetings,

Andres Freund

>From 1c59c465f2d359e8c47cf91d1ea458ea3b64ec84 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Mon, 13 May 2024 13:47:41 -0700
Subject: [PATCH v1] WIP: Track shared library in which elog/ereport is called

---
 src/include/fmgr.h             |  1 +
 src/include/utils/elog.h       | 18 +-
 src/backend/utils/error/elog.c | 33 -
 src/backend/utils/fmgr/dfmgr.c | 30 ++
 4 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a251..3c7cfd7fee9 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -504,6 +504,7 @@
 PG_MAGIC_FUNCTION_NAME(void) \
 static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \
 return _magic_data; \
 } \
+PGDLLEXPORT const char *Pg_extension_filename = NULL; \
 extern int no_such_variable

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62f..5e57f01823d 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -137,6 +137,13 @@ struct Node;
 * prevents gcc from making
 the unreachability deduction at optlevel -O0.
 *--
 */
+#ifdef BUILDING_DLL
+#define ELOG_EXTNAME NULL
+#else
+extern PGDLLEXPORT const char *Pg_extension_filename;
+#define ELOG_EXTNAME Pg_extension_filename
+#endif
+
 #ifdef HAVE__BUILTIN_CONSTANT_P
 #define ereport_domain(elevel, domain, ...) \
 do { \
@@ -144,7 +151,7 @@ struct Node;
 if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
 errstart_cold(elevel, domain) : \
 errstart(elevel, domain)) \
- __VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+ __VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
 pg_unreachable(); \
 } while(0)
@@ -154,7 +161,7 @@ struct Node;
 const int elevel_ = (elevel); \
 pg_prevent_errno_in_scope(); \
 if (errstart(elevel_, domain)) \
- __VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+ __VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 if (elevel_ >= ERROR) \
 pg_unreachable(); \
 } while(0)
@@ -169,7 +176,7 @@
 extern bool message_level_is_interesting(int elevel);
 extern bool errstart(int elevel, const char *domain);
 extern pg_attribute_cold bool errstart_cold(int elevel, const char *domain);
-extern void errfinish(const char *filename, int lineno, const char
Re: Large files for relations
On 06.03.24 22:54, Thomas Munro wrote:
> Rebased. I had intended to try to get this into v17, but a couple of
> unresolved problems came up while rebasing over the new incremental
> backup stuff. You snooze, you lose. Hopefully we can sort these out in
> time for the next commitfest:
>
> * should pg_combinebasebackup read the control file to fetch the
>   segment size?
> * hunt for other segment-size related problems that may be lurking in
>   new incremental backup stuff
> * basebackup_incremental.c wants to use memory in proportion to
>   segment size, which looks like a problem, and I wrote about that in
>   a new thread[1]

Overall, I like this idea, and the patch seems to have many bases
covered.

The patch will need a rebase. I was able to test it on
master@{2024-03-13}, but after that there are conflicts.

In .cirrus.tasks.yml, one of the test tasks uses
--with-segsize-blocks=6, but you are removing that option. You could
replace that with something like

    PG_TEST_INITDB_EXTRA_OPTS='--rel-segsize=48kB'

But that won't work exactly because

    initdb: error: argument of --rel-segsize must be a power of two

I suppose that's ok as a change, since it makes the arithmetic more
efficient. But maybe it should be called out explicitly in the commit
message.

If I run it with 64kB, the test pgbench/001_pgbench_with_server fails
consistently, so it seems there is still a gap somewhere.

A minor point, the initdb error message

    initdb: error: argument of --rel-segsize must be a multiple of BLCKSZ

would be friendlier if it actually showed the value of the block size
instead of just the symbol. Similarly for the nearby error message
about the off_t size.

In the control file, all the other fields use unsigned types. Should
relseg_size be uint64? PG_CONTROL_VERSION needs to be changed.
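The "makes the arithmetic more efficient" remark is about power-of-two segment sizes letting block addressing use shift/mask instead of division/modulo. A minimal Python sketch (the validation messages mirror the initdb errors quoted in the review; the 1 GB default segment of 131072 blocks of 8 kB reflects the stock build defaults, and the function name is invented for illustration):

```python
BLCKSZ = 8192  # default block size

def check_rel_segsize(segsize_bytes):
    """Validate a --rel-segsize value; return the segment size in blocks."""
    if segsize_bytes % BLCKSZ != 0:
        raise ValueError("argument of --rel-segsize must be a multiple of "
                         f"the block size ({BLCKSZ} bytes)")
    if segsize_bytes & (segsize_bytes - 1):   # power-of-two test
        raise ValueError("argument of --rel-segsize must be a power of two")
    return segsize_bytes // BLCKSZ

segsize_blocks = check_rel_segsize(1 << 30)   # 1 GB, the current default
shift = segsize_blocks.bit_length() - 1       # log2, valid for powers of two
mask = segsize_blocks - 1

blockno = 300_000
# Power-of-two size: segment number and in-segment offset fall out of a
# shift and a mask, rather than a division and a modulo.
assert blockno >> shift == blockno // segsize_blocks == 2
assert blockno & mask == blockno % segsize_blocks == 37_856
```

This also shows why arbitrary multiples like the old --with-segsize-blocks=6 no longer fit: 6 blocks is not a power of two, so the cheap shift/mask form would not apply.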
Summary of Sort Improvement Proposals
Hello, In light of multiple threads [1-6] discussing sorting improvements, I'd like to consolidate the old (+some new) ideas as a starting point. It might make sense to brain storm on a few of these ideas and maybe even identify some that are worth implementing and testing. 1. Simple algorithmic ideas: - Use single-assignment insertion-sort instead of swapping - Increase insertion-sort threshold to at least 8 (possibly 10+), to be determined empirically based on current hardware - Make insertion-sort threshold customizable via template based on sort element size 2. More complex/speculative algorithmic ideas: - Try counting insertion-sort loop iterations and bail after a certain limit (include presorted check in insertion-sort loop and continue presorted check from last position in separate loop after bailout) - Try binary search for presorted check (outside of insertion-sort-code) - Try binary insertion sort (if comparison costs are high) - Try partial insertion sort (include presorted check) - Try presorted check only at top-level, not on every recursive step, or if on every level than at least only for n > some threshold - Try asymmetric quick-sort partitioning - Try dual pivot quick-sort - Try switching to heap-sort dependent on recursion depth (might allow ripping out median-of-median) 3. TupleSort ideas: - Use separate sort partition for NULL values to avoid null check on every comparison and to make nulls first/last trivial - Pass down non-nullness info to avoid null check and/or null-partition creation (should ideally be determined by planner) - Skip comparison of first sort key on subsequent full tuple tie-breaker comparison (unless abbreviated key) - Encode NULL directly in abbreviated key (only if no null-partitioning) 4. 
Planner ideas:
- Use pg_stats.correlation to inform sort algorithm selection for sort keys that come from sequential scans/bitmap heap scans
- Use n_distinct to inform sort algorithm selection (many tie-breaker comparisons are necessary on a multi-key sort)
- Improve costing of sorts in the planner, considering tuple size, distribution, and n_distinct

[1] https://www.postgresql.org/message-id/flat/ddc4e498740a8e411c59%40zeyos.com
[2] https://www.postgresql.org/message-id/flat/CAFBsxsHanJTsX9DNJppXJxwg3bU%2BYQ6pnmSfPM0uvYUaFdwZdQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/CAApHDvoTTtoQYfp3d0kTPF6y1pjexgLwquzKmjzvjC9NCw4RGw%40mail.gmail.com
[4] https://www.postgresql.org/message-id/flat/CAEYLb_Xn4-6f1ofsf2qduf24dDCVHbQidt7JPpdL_RiT1zBJ6A%40mail.gmail.com
[5] https://www.postgresql.org/message-id/flat/CAEYLb_W%2B%2BUhrcWprzG9TyBVF7Sn-c1s9oLbABvAvPGdeP2DFSQ%40mail.gmail.com
[6] https://www.postgresql.org/message-id/flat/683635b8-381b-5b08-6069-d6a45de19a12%40enterprisedb.com#12683b7a6c566eb5b926369b5fd2d41f

-- Benjamin Coutu http://www.zeyos.com
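To make the first simple idea above concrete, here is a minimal sketch of the single-assignment variant of insertion sort: instead of swapping adjacent elements (two writes per step), the element being inserted is held in a temporary while the sorted run shifts right, and is written exactly once at the end.

```python
def insertion_sort(a: list) -> list:
    """Single-assignment insertion sort: shift instead of swap."""
    for i in range(1, len(a)):
        cur = a[i]                 # element being inserted, read once
        j = i
        while j > 0 and a[j - 1] > cur:
            a[j] = a[j - 1]        # shift: one write per comparison
            j -= 1
        a[j] = cur                 # single final placement
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # -> [1, 2, 3, 4, 5, 6]
```

The swap-based variant performs roughly twice the writes in the inner loop, which is presumably why this is listed as the lowest-hanging fruit.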
Re: race condition in pg_class
On Mon, May 13, 2024 at 03:53:08PM -0400, Robert Haas wrote: > On Sun, May 12, 2024 at 7:29 PM Noah Misch wrote: > > - [consequences limited to transient failure] Since a PROC_IN_VACUUM > > backend's > > xmin does not stop pruning, an MVCC scan in that backend can find zero > > tuples when one is live. This is like what all backends got in the days > > of > > SnapshotNow catalog scans. See the pgbench test suite addition. (Perhaps > > the fix is to make VACUUM do its MVCC scans outside of PROC_IN_VACUUM, > > setting that flag later and unsetting it earlier.) > > Are you saying that this is a problem already, or that the patch > causes it to start happening? If it's the former, that's horrible. The former.
Re: New GUC autovacuum_max_threshold ?
On Mon, May 13, 2024 at 11:14 AM Frédéric Yhuel wrote: > FWIW, I do agree with your math. I found your demonstration convincing. > 50 was selected with the wet finger. Good to know. > Using the formula I suggested earlier: > > vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples, > vac_base_thresh + vac_scale_factor * sqrt(reltuples) * 1000); > > your table of 2.56 billion tuples will be vacuumed if there are > more than 10 million dead tuples (every 28 minutes). Yeah, so that is about 50x what we do now (twice an hour vs. once a day). While that's a lot more reasonable than the behavior that we'd get from a 500k hard cap (every 84 seconds), I suspect it's still too aggressive. I find these things much easier to reason about in gigabytes than in time units. In that example, the table was 320GB and was getting vacuumed after accumulating 64GB of bloat. That seems like a lot. It means that the table can grow from 320GB all the way up until 384GB before we even think about starting to vacuum it, and then we might not start right away, depending on resource availability, and we may take some time to finish, possibly considerable time, depending on the number and size of indexes and the availability of I/O resources. So actually the table might very plausibly be well above 400GB before we get done processing it, or potentially even more. I think that's not aggressive enough. But how much would we like to push that 64GB of bloat number down for a table of this size? I would argue that if we're vacuuming the table when it's only got 1GB of bloat, or 2GB of bloat, that seems excessive. Unless the system is very lightly loaded and has no long-running transactions at all, we're unlikely to be able to vacuum aggressively enough to keep a 320GB table at a size of 321GB or 322GB. Without testing or doing any research, I'm going to guess that a realistic number is probably in the range of 10-20GB of bloat. 
If the table activity is very light, we might be able to get it even lower, like say 5GB, but the costs ramp up very quickly as you push the vacuuming threshold down. Also, if the table accumulates X amount of bloat during the time it takes to run one vacuum, you can never succeed in limiting bloat to a value less than X (and probably more like 1.5*X or 2*X or something). So without actually trying anything, which I do think somebody should do and report results, my guess is that for a 320GB table, you'd like to multiply the vacuum frequency by a value somewhere between 3 and 10, and probably much closer to 3 than to 10. Maybe even less than 3. Not sure exactly. Like I say, I think someone needs to try some different workloads and database sizes and numbers of indexes, and try to get a feeling for what actually works well in practice. > If we want to stick with the simple formula, we should probably choose a > very high default, maybe 100 million, as you suggested earlier. > > However, it would be nice to have the visibility map updated more > frequently than every 100 million dead tuples. I wonder if this could be > decoupled from the vacuum process? Yes, but if a page has had any non-HOT updates, it can't become all-visible again without vacuum. If it has had only HOT updates, then a HOT-prune could make it all-visible. I don't think we do that currently, but I think in theory we could. -- Robert Haas EDB: http://www.enterprisedb.com
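Plugging the stock autovacuum settings into the Min() formula quoted earlier in this message reproduces the numbers above. A sketch, assuming vac_base_thresh = 50 and vac_scale_factor = 0.2 (the defaults for autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor):

```python
import math

def vacthresh(reltuples, base=50, scale=0.2):
    # Proposed capped threshold from upthread: the usual linear formula
    # for small tables, a sqrt-based cap for very large ones.
    return min(base + scale * reltuples,
               base + scale * math.sqrt(reltuples) * 1000)

# 2.56 billion live tuples: vacuum after ~10.1 million dead tuples,
# versus ~512 million under the current uncapped formula.
print(vacthresh(2.56e9))
```

For small tables the sqrt term dominates the Min(), so behavior is unchanged; the cap only kicks in once reltuples exceeds one million (where scale * reltuples equals scale * sqrt(reltuples) * 1000).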
Re: Why is citext/regress failing on hamerkop?
Thomas Munro writes: > For citext_utf8, I pushed cff4e5a3. Hamerkop runs infrequently, so > here's hoping for 100% green on master by Tuesday or so. In the meantime, some off-list investigation by Alexander Lakhin has turned up a good deal of information about why we're seeing failures on hamerkop in the back branches. Summarizing, it appears that 1. In a GSS-enabled Windows build without any active Kerberos server, libpq's pg_GSS_have_cred_cache() succeeds, allowing libpq to try to open a GSSAPI connection, but then gss_init_sec_context() fails, leading to client-side reports like this: +Connection 2 failed: could not initiate GSSAPI security context: Unspecified GSS failure. Minor code may provide more information: Credential cache is empty +FATAL: sorry, too many clients already (The first of these lines comes out during the attempted GSS connection, the second during the only-slightly-more-successful non-GSS connection.) So that's problem number 1: how is it that gss_acquire_cred() succeeds but then gss_init_sec_context() disclaims knowledge of any credentials? Can we find a way to get this failure to be detected during pg_GSS_have_cred_cache()? It is mighty expensive to launch a backend connection that is doomed to fail, especially when this happens during *every single libpq connection attempt*. 2. Once gss_init_sec_context() fails, libpq abandons the connection and starts over; since it has already initiated a GSS handshake on the connection, there's not much choice. Although libpq faithfully issues closesocket() on the abandoned connection, Alexander found that the connected backend doesn't reliably see that: it may just sit there until the AuthenticationTimeout elapses (1 minute by default). That backend is still consuming a postmaster child slot, so if this happens on any sizable fraction of test connection attempts, it's little surprise that we soon get "sorry, too many clients already" failures. 3. 
We don't know exactly why hamerkop suddenly started seeing these failures, but a plausible theory emerges after noting that its reported time for the successful "make check" step dropped pretty substantially right when this started. In the v13 branch, "make check" was taking 2:18 or more in the several runs right before the first isolationcheck failure, but 1:40 or less just after. So it looks like the animal was moved onto faster hardware. That feeds into this problem because, with a successful isolationcheck run taking more than a minute, there was enough time for some of the earlier stuck sessions to time out and exit before their postmaster-child slots were needed. Alexander confirmed this theory by demonstrating that the main regression tests in v15 would pass if he limited their speed enough (by reducing the CPU allowed to a VM) but not at full speed. So the buildfarm results suggesting this is only an issue in <= v13 must be just a timing artifact; the problem is still there. Of course, backends waiting till timeout is not a good behavior independently of what is triggering that, so we have two problems to solve here, not one. I have no ideas about the gss_init_sec_context() failure, but I see a plausible theory about the failure to detect socket closure in Microsoft's documentation about closesocket() [1]: If the l_onoff member of the LINGER structure is zero on a stream socket, the closesocket call will return immediately and does not receive WSAEWOULDBLOCK whether the socket is blocking or nonblocking. However, any data queued for transmission will be sent, if possible, before the underlying socket is closed. This is also called a graceful disconnect or close. In this case, the Windows Sockets provider cannot release the socket and other resources for an arbitrary period, thus affecting applications that expect to use all available sockets. This is the default behavior for a socket. 
I'm not sure whether we've got unsent data pending in the problematic condition, nor why it'd remain unsent if we do (shouldn't the backend consume it anyway?). But this has the right odor for an explanation. I'm pretty hesitant to touch this area myself, because it looks an awful lot like commits 6051857fc and ed52c3707, which eventually had to be reverted. I think we need a deeper understanding of exactly what Winsock is doing or not doing before we try to fix it. I wonder if there are any Microsoft employees around here with access to the relevant source code. In the short run, it might be a good idea to deprecate using --with-gssapi on Windows builds. A different stopgap idea could be to drastically reduce the default AuthenticationTimeout, perhaps only on Windows. regards, tom lane [1] https://learn.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-closesocket
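For reference, the LINGER structure the Microsoft documentation describes is controlled per-socket via the SO_LINGER option; a non-zero l_onoff with l_linger = 0 requests an abortive close instead of the graceful-disconnect default (l_onoff = 0). This is only an illustration of the knob under discussion, not a proposed fix; as noted above, touching this area (cf. 6051857fc and ed52c3707) needs more understanding of Winsock first.

```python
import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# l_onoff=1, l_linger=0: close() discards unsent data and resets the
# connection, rather than the "graceful disconnect" default described
# in the closesocket() documentation (l_onoff=0).
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
onoff, linger = struct.unpack(
    "ii", s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, 8))
print(onoff, linger)
s.close()
```

Whether an abortive close is ever acceptable for an abandoned pre-auth connection is exactly the kind of question the earlier reverted commits ran into.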
Re: race condition in pg_class
On Sun, May 12, 2024 at 7:29 PM Noah Misch wrote: > - [consequences limited to transient failure] Since a PROC_IN_VACUUM backend's > xmin does not stop pruning, an MVCC scan in that backend can find zero > tuples when one is live. This is like what all backends got in the days of > SnapshotNow catalog scans. See the pgbench test suite addition. (Perhaps > the fix is to make VACUUM do its MVCC scans outside of PROC_IN_VACUUM, > setting that flag later and unsetting it earlier.) Are you saying that this is a problem already, or that the patch causes it to start happening? If it's the former, that's horrible. If it's the latter, I'd say that is a fatal defect. -- Robert Haas EDB: http://www.enterprisedb.com
Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts
On Mon, May 13, 2024 at 10:53 AM Matthias van de Meent wrote: > It's not inconceivable that this will significantly increase WAL > volume, but I think we should go for correctness rather than fastest > copy. I don't think we can afford to just do this blindly for the sake of a hypothetical non-core AM that uses nonstandard pages. There must be lots of cases where the holes are large, and where the WAL volume would be a multiple of what it is currently. That's a *big* regression. > If we went with fastest copy, we'd better just skip logging the > FSM and VM forks because we're already ignoring the data of the pages, > so why not ignore the pages themselves, too? I don't think that holds > water when we want to be crash-proof in CREATE DATABASE, with a full > data copy of the template database. This seems like a red herring. Either assuming standard pages is a good idea or it isn't, and either logging the FSM and VM forks is a good idea or it isn't, but those are two separate questions. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Direct SSL connection with ALPN and HBA rules
On Mon, May 13, 2024 at 1:42 PM Jelte Fennema-Nio wrote: > Like Jacob already said, I guess you meant me here. The main point I > was trying to make is that sslmode=require is extremely insecure too, > so if we're changing the default then I'd rather bite the bullet and > actually make the default a secure one this time. No-ones browser > trusts self-signed certs by default, but currently lots of people > trust self-signed certs when connecting to their production database > without realizing. > > IMHO the only benefit that sslmode=require brings over sslmode=prefer > is detecting incorrectly configured servers i.e. servers that are > supposed to support ssl but somehow don't due to a misconfigured > GUC/pg_hba. Such "incorrectly configured server" detection avoids > sending data to such a server, which an eavesdropper on the network > could see. Which is definitely a security benefit, but it's an > extremely small one. In all other cases sslmode=prefer brings exactly > the same protection as sslmode=require, because sslmode=prefer > encrypts the connection unless postgres actively tells the client to > downgrade to plaintext (which never happens when the server is > configured correctly). I think I agree with *nearly* every word of this. However: (1) I don't want to hijack this thread about a v17 open item to talk too much about a hypothetical v18 proposal. (2) While in general you need more than just SSL to ensure security, I'm not sure that there's only one way to do it, and I doubt that we should try to pick a winner. (3) I suspect that even going as far as sslmode=require by default is going to be extremely painful for hackers, the project, and users. Moving the goalposts further increases the likelihood of nothing happening at all. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Direct SSL connection with ALPN and HBA rules
On Mon, May 13, 2024 at 12:45 PM Jacob Champion wrote: > For the record, I didn't say that... You mean Jelte's quote up above? Yeah, sorry, I got my J-named hackers confused. Apologies. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?
On 05/13/24 09:35, aa wrote:
> If you call the action of "sifting" ordering, then yes. If you don't call
> it ordering, then no.

One thing seems intriguing about this idea: normally, an expected property of any ORDER BY is that no result row can be passed down the pipe until all input rows have been seen. In the case of ORDER BY <some boolean condition>, or more generally ORDER BY <an expression with only a few distinct values>, a pigeonhole sort could be used, and rows mapping to the ordered-first pigeonhole could be passed down the pipe on sight. (Rows mapping to any later pigeonhole still have to be held to the end, unless some further analysis can identify when all rows for earlier pigeonholes must have been seen.) I don't know whether any such ORDER BY strategy is already implemented, or would be useful enough to be worth implementing, but it might be handy in cases where a large number of rows are expected to map to the first pigeonhole. Intermediate storage wouldn't be needed for those, and some follow-on processing could go on concurrently. The usage example offered here ("sift" nulls last, followed by a LIMIT) does look a lot like a job for a WHERE clause, though.

Regards, -Chap
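A toy model of the pigeonhole idea: with a two-valued key such as "key IS NULL", a single pass can emit first-pigeonhole rows immediately and buffer only the rest, preserving input order within each pigeonhole. This is just a sketch of the strategy described above, not anything PostgreSQL implements.

```python
def sift(rows, in_later_pigeonhole):
    """Stream rows from the first pigeonhole on sight; hold the rest
    and emit them only once all input has been seen."""
    held = []
    for row in rows:
        if in_later_pigeonhole(row):
            held.append(row)     # later pigeonhole: must wait until the end
        else:
            yield row            # first pigeonhole: pass down the pipe now
    yield from held

# "nulls last" without a full sort, in one pass:
data = [3, None, 1, None, 2]
print(list(sift(data, lambda r: r is None)))  # -> [3, 1, 2, None, None]
```

Note that only the later pigeonholes need intermediate storage, which is the memory-saving property the message above points out.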
Re: cataloguing NOT NULL constraints
On Mon, May 13, 2024 at 12:45 PM Alvaro Herrera wrote: > The point is that a column can be in a primary key and not have an > explicit not-null constraint. This is different from having a column be > NOT NULL and having a primary key on top. In both cases the attnotnull > flag is set; the difference between these two scenarios is what happens > if you drop the primary key. If you do not have an explicit not-null > constraint, then the attnotnull flag is lost as soon as you drop the > primary key. You don't have to do DROP NOT NULL for that to happen > > This means that if you have a column that's in the primary key but does > not have an explicit not-null constraint, then we shouldn't make one up. > (Which we would, if we were to keep an unadorned NOT NULL that we can't > drop at the end of the dump.) It seems to me that the practical thing to do about this problem is just decide not to solve it. I mean, it's currently the case that if you establish a PRIMARY KEY when you create a table, the columns of that key are marked NOT NULL and remain NOT NULL even if the primary key is later dropped. So, if that didn't change, we would be no less compliant with the SQL standard (or your reading of it) than we are now. And if you do really want to make that change, why not split it out into its own patch, so that the patch that does $SUBJECT is changing the minimal number of other things at the same time? That way, reverting something might not involve reverting everything, plus you could have a separate design discussion about what that fix ought to look like, separate from the issues that are truly inherent to cataloging NOT NULL constraints per se. What I meant about changing the order of operations is that, currently, the database knows that the column is NOT NULL before the COPY happens, and I don't think we can change that. I think you agree -- that's why you invented the throwaway constraints. 
As far as I can see, the problems all have to do with getting the "throwaway" part to happen correctly. It can't be a problem to just mark the relevant columns NOT NULL in the relevant tables -- we already do that. But if you want to discard some of those NOT NULL markings once the PRIMARY KEY is added, you have to know which ones to discard. If we just consider the most straightforward scenario where somebody does a full dump-and-restore, getting that right may be annoying, but it seems like it surely has to be possible. The dump will just have to understand which child tables (or, more generally, descendent tables) got a NOT NULL marking on a column because of the PK and which ones had an explicit marking in the old database and do the right thing in each case. But what if somebody does a selective restore of one table from a partitioning hierarchy? Currently, the columns that would have been part of the primary key end up NOT NULL, but the primary key itself is not restored because it can't be. What will happen in this new system? If you don't apply any NOT NULL constraints to those columns, then a user who restores one partition from an old dump and tries to reattach it to the correct partitioned table has to recheck the NOT NULL constraint, unlike now. If you apply a normal-looking garden-variety NOT NULL constraint to that column, you've invented a constraint that didn't exist in the source database. And if you apply a throwaway NOT NULL constraint but the user never attaches that table anywhere, then the throwaway constraint survives. None of those options sound very good to me. Another scenario: Say that you have a table with a PRIMARY KEY. For some reason, you want to drop the primary key and then add it back. Well, with this definitional change, as soon as you drop it, you forget that the underlying columns don't contain any nulls, so when you add it back, you have to check them again. 
I don't know who would find that behavior an improvement over what we have today. So I don't really think it's a great idea to change this behavior, but even if it is, is it such a good idea that we want to sink the whole patch set repeatedly over it, as has already happened twice now? I feel that if we did what Tom suggested a year ago in https://www.postgresql.org/message-id/3801207.1681057...@sss.pgh.pa.us -- "I'm inclined to think that this idea of suppressing the implied NOT NULL from PRIMARY KEY is a nonstarter and we should just go ahead and make such a constraint" -- there's a very good chance that a revert would have been avoided here and it would still be just as valid to think of revisiting this particular question in a future release as it is now. -- Robert Haas EDB: http://www.enterprisedb.com
Re: race condition in pg_class
On Mon, May 13, 2024 at 04:59:59PM +0900, Michael Paquier wrote:
> About inplace050-tests-inj-v1.patch.
>
> +	/* Check if blocked_pid is in injection_wait(). */
> +	proc = BackendPidGetProc(blocked_pid);
> +	if (proc == NULL)
> +		PG_RETURN_BOOL(false);	/* session gone: definitely unblocked */
> +	wait_event =
> +		pgstat_get_wait_event(UINT32_ACCESS_ONCE(proc->wait_event_info));
> +	if (wait_event && strncmp("INJECTION_POINT(",
> +			  wait_event,
> +			  strlen("INJECTION_POINT(")) == 0)
> +		PG_RETURN_BOOL(true);
>
> Hmm. I am not sure that this is the right interface for the job
> because this is not only related to injection points but to the
> monitoring of one or more wait events when running a permutation
> step.

Could you say more about that? Permutation steps don't monitor wait events today. This patch would be the first instance of that.

> Perhaps this is something that should be linked to the spec
> files with some property area listing the wait events we're expected
> to wait on instead when running a step that we know will wait?

The spec syntax doesn't distinguish contention types at all. The isolation tester's needs are limited to distinguishing: (a) process is waiting on another test session; (b) process is waiting on automatic background activity (autovacuum, mainly). Automatic background activity doesn't make a process enter or leave injection_wait(), so all injection point wait events fall in (a). (The tester ignores (b), since those clear up without intervention. Failing to ignore them, as the tester did long ago, made output unstable.)
Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?
aa writes:
> If you call the action of "sifting" ordering, then yes. If you don't call
> it ordering, then no.
> In essence, it is the output of a filtering mechanism, done in a single
> result-set pass. And this pass should be the same pass in charge of
> collecting the result set in the first place.

Sounds a lot like a WHERE clause to me. regards, tom lane
Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?
Hi,

If you call the action of "sifting" ordering, then yes. If you don't call it ordering, then no. In essence, it is the output of a filtering mechanism, done in a single result-set pass. And this pass should be the same pass in charge of collecting the result set in the first place.

Thanks

On Mon, May 13, 2024 at 5:48 AM Wolfgang Wilhelm wrote:
> Hi,
>
> do I interpret your idea correctly: You want some sort of ordering without
> ordering?
>
> Kind regards
> WW
>
> On Monday, May 13, 2024 at 10:40:38 MESZ, aa wrote:
>
> Hello Everyone!
>
> Is there any chance to get some kind of a result set sifting mechanism in
> Postgres?
>
> What I am looking for is a way to get, for example, "nulls last" in a
> result set, without having to call "order by" or having to use UNION ALL,
> and if possible to get this in a single result-set pass.
>
> Something along this line: SELECT a, b, c FROM my_table WHERE a nulls last
> OFFSET 0 LIMIT 25
>
> I don't want to use order by or union all because these are time-consuming
> operations, especially on large data sets and when comparisons are done
> on dynamic values (e.g. geolocation distances between a mobile and a
> static location).
>
> What I would expect from such a feature would be speeds comparable with
> unsorted selects, while getting a very rudimentary ordering.
>
> A use case for such a mechanism would be the implementation of QUICK
> relevant search results for a search engine.
>
> I'm not familiar with how Postgres logic handles simple select queries,
> but the way I would envision a result-set sifting logic would be to
> collect the result set in 2 separate lists, based on the sifting
> condition, and then concatenate these 2 lists and return the result, when
> the pagination request's conditions are met.
>
> Any idea if such a functionality is feasible?
>
> Thank you.
>
> PS: if ever implemented, the sifting mechanism could be extended to
> accommodate any type of threshold, not just null values.
I have an exporting need...
Hi team!

First, I want to thank you for having your hands in this. You are doing a fantastic and blessed job. Bless you all!

I have a special need I want to bring to you. This is not a bug; it is a need I have, and I write here to be redirected where needed.

I have to make a daily backup. The database is growing a lot per day, and sometimes I've had the need to recover just one table. It would be easier to move a 200M file with only the needed table than a 5G file with all the tables I don't need; it's just a matter of speed. I've created a script to export every table one by one, so in case I need to import a table again, I don't have to use the very big export file, but rather the "tablename.sql" file created for every table. My hosting provider truncated my script because it is very large (more than 200 lines, one line to export each table), so I think the way I do this is hurting the server performance.

Then my question: do you consider it useful to add a parameter (for example, --separatetables) so that, when it is used, the export process creates a different tablename.sql file for each table in the database automatically?

Example...

PGHOST="/tmp" PGPASSWORD="mydbpass" pg_dump -U dbusername --separatetables -Fp --inserts dbname > "/route/dbname.sql"

And if this database has tables table1...table10, then 10 files are created...

dbname_table1.sql
dbname_table2.sql
dbname_table3.sql
...
dbname_table8.sql
dbname_table9.sql
dbname_table10.sql

In each file, all main parameters would be generated again. For example, the file dbname_table1.sql...

--
-- PostgreSQL database dump
--
-- Dumped from database version 10.21
-- Dumped by pg_dump version 15.6

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
...
...
SET default_tablespace = '';

--
-- Name: table1; Type: TABLE; Schema: public; Owner: dbusername
--

CREATE TABLE public.table1 (
    code numeric(5,0),
    name character varying(20)
)

I don't know how many developers have the same need as me.
I hope this helps in the future. Thanks for reading, and thanks for what you've done. You are doing fine! Cheers! __ Juan de Jesús
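In the meantime, the per-table loop can be generated from the catalog instead of being maintained by hand, so the script stays a few lines long no matter how many tables exist. The following is only a sketch of that workaround, not a supported pg_dump feature: the database name, output directory, and connection settings are placeholders, and it assumes psql and pg_dump can reach the server.

```shell
#!/bin/sh
# Emulate the proposed --separatetables: one plain-format dump file
# per user table, named <db>_<schema>_<table>.sql.
DB="dbname"        # placeholder
OUTDIR="/route"    # placeholder

# List user tables straight from the catalog, schema-qualified.
tables=$(psql -At -d "$DB" -c "SELECT quote_ident(schemaname) || '.' || quote_ident(tablename) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')")

for t in $tables; do
  # -t restricts pg_dump to a single table.
  f="${DB}_$(echo "$t" | tr '.' '_').sql"
  pg_dump -Fp --inserts -t "$t" "$DB" > "$OUTDIR/$f"
done
```

Each generated file carries its own SET preamble, since every pg_dump invocation emits a complete, restorable script for its table.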
On the use of channel binding without server certificates (was: Direct SSL connection with ALPN and HBA rules)
[soapbox thread, so I've changed the Subject] On Mon, May 13, 2024 at 4:08 AM Heikki Linnakangas wrote: > "channel_binding=require sslmode=require" also protects from MITM attacks. This isn't true in the same way that "standard" TLS protects against MITM. I know you know that, but for the benefit of bystanders reading the threads, I think we should stop phrasing it like this. Most people who want MITM protection need to be using verify-full. Details for those bystanders: Channel binding alone will only disconnect you after the MITM is discovered, after your startup packet is leaked but before you send any queries to the server. A hash of your password will also be leaked in that situation, which starts the timer on an offline attack. And IIRC, you won't get an alert that says "someone's in the middle"; it'll just look like you mistyped your password. (Stronger passwords provide stronger protection in this situation, which is not a property that most people are used to. If I choose to sign into Google with the password "hunter2", it doesn't somehow make the TLS protection weaker. But if you rely on SCRAM by itself for server authentication, it does more or less work like that.) Use channel_binding *in addition to* sslmode=verify-full if you want enhanced authentication of the peer, as suggested in the docs [1]. Don't rely on channel binding alone for the vast majority of use cases, and if you know better for your particular use case, then you already know enough to be able to ignore my advice. [/soapbox] Thanks, --Jacob [1] https://www.postgresql.org/docs/current/preventing-server-spoofing.html
Re: Fix resource leak (src/backend/libpq/be-secure-common.c)
> On 13 May 2024, at 20:05, Ranier Vilela wrote: > Em qua., 10 de abr. de 2024 às 15:33, Daniel Gustafsson > escreveu: > Thanks, I'll have a look. I've left this for post-freeze on purpose to not > cause unnecessary rebasing. Will take a look over the next few days unless > beaten to it. > Any chance we'll have these fixes in v17? Nice timing, I was actually rebasing them today to get them committed. -- Daniel Gustafsson
Re: Fix resource leak (src/backend/libpq/be-secure-common.c)
Em qua., 10 de abr. de 2024 às 15:33, Daniel Gustafsson escreveu: > On 10 Apr 2024, at 20:31, Ranier Vilela wrote: > > Em ter., 2 de abr. de 2024 às 15:31, Daniel Gustafsson > escreveu: > >> > On 2 Apr 2024, at 20:13, Ranier Vilela wrote: >> >> > Fix by freeing the pointer, like pclose_check (src/common/exec.c) >> similar case. >> >> Off the cuff, seems reasonable when loglevel is LOG. >> > > Per Coverity. > > Another case of resource leak, when loglevel is LOG. > In the function shell_archive_file (src/backend/archive/shell_archive.c) > The pointer *xlogarchcmd* is not freed. > > > Thanks, I'll have a look. I've left this for post-freeze on purpose to not > cause unnecessary rebasing. Will take a look over the next few days unless > beaten to it. > Any chance we'll have these fixes in v17? best regards, Ranier Vilela
Re: Fix out-of-bounds in the function GetCommandTagName
Em seg., 13 de mai. de 2024 às 14:38, Tom Lane escreveu: > David Rowley writes: > > I've added a CF entry under your name for this: > > https://commitfest.postgresql.org/48/4927/ > > > If it was code new to PG17 I'd be inclined to go ahead with it now, > > but it does not seem to align with making the release mode stable. > > I'd bet others will feel differently about that. Delaying seems a > > better default choice at least. > > The security team's Coverity instance has started to show this > complaint now too. So I'm going to go ahead and push this change > in HEAD. It's probably unwise to change it in stable branches, > since there's at least a small chance some external code is using > COMMAND_TAG_NEXTTAG for the same purpose tag_behavior[] does. > But we aren't anywhere near declaring v17's API stable, so > I'd rather fix the issue than dismiss it in HEAD. > Thanks for the commit, Tom. best regards, Ranier Vilela
Re: UniqueKey v2
Antonin Houska wrote: > Andy Fan wrote: > > > > > > * Combining the UKs > > > > > > IMO this is the most problematic part of the patch. You call > > > populate_joinrel_uniquekeys() for the same join multiple times, > > > > Why do you think so? The below code is called in "make_join_rel" > > Consider join of tables "a", "b" and "c". My understanding is that > make_join_rel() is called once with rel1={a} and rel2={b join c}, then with > rel1={a join b} and rel2={c}, etc. I wanted to say that each call should > produce the same set of unique keys. > > I need to check this part more in detail. I think I understand now. By calling populate_joinrel_uniquekeys() for various orderings, you can find out that various input relation unique keys can represent the whole join. For example, if the ordering is A JOIN (B JOIN C) you can prove that the unique keys of A can be used for the whole join, while for the ordering B JOIN (A JOIN C) you can prove the same for the unique keys of B, and so on. > > Is your original question is about populate_joinrel_uniquekey_for_rel > > rather than populate_joinrel_uniquekeys? We have the below codes: > > > > outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, > > outerrel, > > > > innerrel, restrictlist); > > inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, > > innerrel, > > > > outerrel, restrictlist); > > > > This is mainly because of the following theory. Quoted from > > README.uniquekey. Let's called this as "rule 1". > > > > """ > > How to maintain the uniquekey? > > --- > > .. From the join relation, it is maintained with two rules: > > > > - the uniquekey in one side is still unique if it can't be duplicated > > after the join. 
for example: > > > > SELECT t1.pk FROM t1 JOIN t2 ON t1.a = t2.pk; > > UniqueKey on t1: t1.pk > > UniqueKey on t1 Join t2: t1.pk > > """ > > > > AND the below code: > > > > > > if (outeruk_still_valid || inneruk_still_valid) > > > > /* > > * the uniquekey on outers or inners have been added into > > joinrel so > > * the combined uniquekey from both sides is not needed. > > */ > > return; > > > > > > We don't create the component uniquekey if any one side of both > > sides is unique already. For example: > > > > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is > > unique", there is no need to create component UniqueKey (t1.id, t2.id); > > ok, I need to check more in detail how this part works. This optimization makes sense to me. > > > > > > Of course one problem is that the number of combinations can grow > > > exponentially as new relations are joined. > > > > Yes, that's why "rule 1" is needed and "How to reduce the overhead" in > > UniqueKey.README is introduced. I think there should still be some guarantee that the number of unique keys does not grow exponentially. Perhaps a constant that allows a relation (base or join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3 or 4.) And when picking the "best N keys", one criterion could be the number of expressions in the key (the shorter key the better). > > > > > > 2) Check if the join relation is single-row > > > > > > In order to get rid of the dependency on 'restrictlist', I think you > > > can > > > use ECs. Consider a query from your regression tests: > > > > > > CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c > > > int, d int, e int); > > > > > > SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id > > > = 1; > > > > > > The idea here seems to be that no more than one row of t1 matches the > > > query > > > clauses. 
Therefore, if t2(id) is unique, the clause t1.e=t2.id ensures > > > that > > > no more than one row of t2 matches the query (because t1 cannot provide > > > the > > > clause with more than one input value of 'e'). And therefore, the join > > > also > > > produces at most one row. > > > > You are correct and IMO my current code is able to tell it is a single > > row as well. > > > > 1. Since t1.id = 1, t1 is single row, so t1.d is a uniquekey as a > > consequence. > > 2. Given t2.id is unique and t1.e = t2.id, t1's uniquekey can be kept > > after the join because of rule 1 on joinrel. And t1 is singlerow, so the > > joinrel is singlerow as well. ok, I think I understand now. -- Antonin Houska Web: https://www.cybertec-postgresql.com
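The "rule 1" propagation discussed above can be sketched in a few lines. This is purely illustrative Python, not the planner code: `joinrel_uniquekeys`, the frozenset key representation, and the clause mapping are all inventions for the example.

```python
def joinrel_uniquekeys(outer_keys, inner_keys, join_clauses):
    """Sketch of 'rule 1': a side's unique key survives the join if the
    join clauses equate columns covering a unique key of the *other*
    side, so each of its rows matches at most one row there and cannot
    be duplicated.  Keys are frozensets of column names; join_clauses
    maps outer columns to the inner columns they are equated with."""
    matched_outer = frozenset(join_clauses.keys())
    matched_inner = frozenset(join_clauses.values())

    result = []
    # outer keys survive if the inner side is unique on the joined columns
    if any(key <= matched_inner for key in inner_keys):
        result.extend(outer_keys)
    # and symmetrically for the inner side
    if any(key <= matched_outer for key in outer_keys):
        result.extend(inner_keys)
    return result

# SELECT t1.pk FROM t1 JOIN t2 ON t1.a = t2.pk;
# t2 is unique on t2.pk, so t1.pk stays unique after the join.
keys = joinrel_uniquekeys(
    outer_keys=[frozenset({"t1.pk"})],
    inner_keys=[frozenset({"t2.pk"})],
    join_clauses={"t1.a": "t2.pk"},
)
print(keys)
```

Note how the sketch also shows why the component key (t1.id, t2.id) is unnecessary whenever either branch fires: the surviving single-side key is strictly stronger.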
Re: Direct SSL connection with ALPN and HBA rules
On Mon, 13 May 2024 at 18:14, Robert Haas wrote: > I disagree with Jacob's assertion that sslmode=require has no security > benefits over sslmode=prefer. That seems like the kind of pessimism > that makes people hate security professionals. There have got to be > some attacks that are foreclosed by encrypting the connection, even if > you don't stop MITM attacks or other things that are more > sophisticated than running wireshark and seeing what goes by on the > wire. Like Jacob already said, I guess you meant me here. The main point I was trying to make is that sslmode=require is extremely insecure too, so if we're changing the default then I'd rather bite the bullet and actually make the default a secure one this time. No-one's browser trusts self-signed certs by default, but currently lots of people trust self-signed certs when connecting to their production database without realizing. IMHO the only benefit that sslmode=require brings over sslmode=prefer is detecting incorrectly configured servers, i.e. servers that are supposed to support ssl but somehow don't due to a misconfigured GUC/pg_hba. Such "incorrectly configured server" detection avoids sending data to such a server, which an eavesdropper on the network could see. Which is definitely a security benefit, but it's an extremely small one. In all other cases sslmode=prefer brings exactly the same protection as sslmode=require, because sslmode=prefer encrypts the connection unless postgres actively tells the client to downgrade to plaintext (which never happens when the server is configured correctly).
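The distinction being argued here — prefer silently falls back, require refuses plaintext but verifies nothing, verify-full authenticates the server — can be sketched as client-side decision logic. This is illustrative Python only, not libpq code; the real logic lives in fe-connect.c and handles many more modes and cases.

```python
def negotiate(sslmode: str, server_supports_ssl: bool) -> str:
    """Illustrative sketch of libpq-style fallback behavior."""
    if sslmode == "prefer":
        # Encrypt when possible, silently downgrade otherwise: same
        # protection as "require" against passive eavesdroppers, unless
        # the server (or a MITM posing as it) says no SSL.
        return "encrypted" if server_supports_ssl else "plaintext"
    if sslmode == "require":
        # Refuses plaintext, but does not verify the certificate, so an
        # active MITM can still present its own self-signed cert.
        if not server_supports_ssl:
            raise ConnectionError("server does not support SSL")
        return "encrypted (unverified)"
    if sslmode == "verify-full":
        # Certificate chain and hostname are checked; a MITM fails here.
        if not server_supports_ssl:
            raise ConnectionError("server does not support SSL")
        return "encrypted (verified)"
    raise ValueError(f"unhandled sslmode: {sslmode}")
```

The sketch makes the argument concrete: the only input on which "prefer" and "require" diverge is a server that refuses SSL, which is exactly the "incorrectly configured server" (or active attacker) case.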
Re: Fix out-of-bounds in the function GetCommandTagName
David Rowley writes: > I've added a CF entry under your name for this: > https://commitfest.postgresql.org/48/4927/ > If it was code new to PG17 I'd be inclined to go ahead with it now, > but it does not seem to align with making the release mode stable. > I'd bet others will feel differently about that. Delaying seems a > better default choice at least. The security team's Coverity instance has started to show this complaint now too. So I'm going to go ahead and push this change in HEAD. It's probably unwise to change it in stable branches, since there's at least a small chance some external code is using COMMAND_TAG_NEXTTAG for the same purpose tag_behavior[] does. But we aren't anywhere near declaring v17's API stable, so I'd rather fix the issue than dismiss it in HEAD. regards, tom lane
Re: Allowing additional commas between columns, and at the end of the SELECT clause
Dagfinn Ilmari Mannsåker writes: > Tom Lane writes: >> I'm fairly down on this idea for SQL, because I think it creates >> ambiguity for the ROW() constructor syntax. That is: >> (x,y) is understood to be shorthand for ROW(x,y) >> (x) is not ROW(x), it's just x >> (x,) means what? > Python has a similar issue: (x, y) is a tuple, but (x) is just x, and > they use the trailing comma to disambiguate, so (x,) creates a > single-item tuple. AFAIK it's the only place where the trailing comma > is significant. Ugh :-(. The semantic principle I'd prefer to have here is "a trailing comma is ignored", but what they did breaks that. But then again, I'm not particularly a fan of anything about Python's syntax. > Yeah, a more principled approach would be to not special-case target > lists, but to allow one (and only one) trailing comma everywhere: > select, order by, group by, array constructors, row constructors, > everything that looks like a function call, etc. If it can be made to work everywhere, that would get my vote. I'm not sure if any other ambiguities arise, though. SQL has a lot of weird syntax corners (and the committee keeps adding more :-(). regards, tom lane
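The Python disambiguation rule described above is easy to confirm; a quick illustration:

```python
# (x) is just a parenthesized expression; (x,) is a one-element tuple.
x, y = 1, 2

pair = (x, y)    # a 2-tuple
just_x = (x)     # no tuple: identical to x
single = (x,)    # the trailing comma is what makes the 1-tuple

print(type(pair).__name__)    # tuple
print(type(just_x).__name__)  # int
print(single == (1,))         # True

# Everywhere else the trailing comma is insignificant:
print([1, 2,] == [1, 2])      # True
```

So Python's trailing comma is "ignored" in every delimited construct except the parenthesized expression, where it is load-bearing — the break from "a trailing comma is ignored" that the reply objects to.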
Re: Improving information_schema._pg_expandarray()
[ I got distracted while writing this follow-up and only just found it in my list of unsent Gnus buffers, and now it's probably too late to make it for 17, but here it is anyway while I remember. ] Tom Lane writes: > I happened to notice that information_schema._pg_expandarray(), > which has the nigh-unreadable definition > > AS 'select $1[s], > s operator(pg_catalog.-) pg_catalog.array_lower($1,1) > operator(pg_catalog.+) 1 > from pg_catalog.generate_series(pg_catalog.array_lower($1,1), > pg_catalog.array_upper($1,1), > 1) as g(s)'; > > can now be implemented using unnest(): > > AS 'SELECT * FROM pg_catalog.unnest($1) WITH ORDINALITY'; > > It seems to be slightly more efficient this way, but the main point > is to make it more readable. I didn't spot this until it got committed, but it got me wondering what eliminating the wrapper function completely would look like, so I whipped up the attached. It instead calls UNNEST() laterally in the queries, which has the side benefit of getting rid of several subselects, one of which was particularly confusing. In one place the lateral form eliminated the need for WITH ORDINALITY as well. - ilmari From 9d1b2f2d16f10903d975a3bb7551a38c5ce62e15 Mon Sep 17 00:00:00 2001 From: Dagfinn Ilmari Mannsåker Date: Thu, 28 Dec 2023 23:47:33 + Subject: [PATCH] Eliminate information_schema._pg_expandarray completely Commit 58054de2d0847c09ef091956f72ae5e9fb9a176e made it into a simple wrapper around unnest, but we can simplify things further by calling unnest directly in the queries. 
---
 src/backend/catalog/information_schema.sql | 104 +
 src/backend/utils/adt/arrayfuncs.c         |   3 -
 src/test/regress/expected/psql.out         |  30 +++---
 src/test/regress/sql/psql.sql              |   2 +-
 4 files changed, 58 insertions(+), 81 deletions(-)

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index c4145131ce..cf25f5d1bc 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -39,22 +39,15 @@ SET search_path TO information_schema;
  * A few supporting functions first ...
  */

-/* Expand any 1-D array into a set with integers 1..N */
-CREATE FUNCTION _pg_expandarray(IN anyarray, OUT x anyelement, OUT n int)
-RETURNS SETOF RECORD
-LANGUAGE sql STRICT IMMUTABLE PARALLEL SAFE
-ROWS 100 SUPPORT pg_catalog.array_unnest_support
-AS 'SELECT * FROM pg_catalog.unnest($1) WITH ORDINALITY';
-
 /* Given an index's OID and an underlying-table column number, return the
  * column's position in the index (NULL if not there) */
 CREATE FUNCTION _pg_index_position(oid, smallint) RETURNS int
 LANGUAGE sql STRICT STABLE
 BEGIN ATOMIC
-SELECT (ss.a).n FROM
-  (SELECT information_schema._pg_expandarray(indkey) AS a
-   FROM pg_catalog.pg_index WHERE indexrelid = $1) ss
-  WHERE (ss.a).x = $2;
+SELECT ik.icol
+FROM pg_catalog.pg_index,
+     pg_catalog.unnest(indkey) WITH ORDINALITY ik(tcol, icol)
+WHERE indexrelid = $1 AND ik.tcol = $2;
 END;

 CREATE FUNCTION _pg_truetypid(pg_attribute, pg_type) RETURNS oid
@@ -1079,37 +1072,32 @@ GRANT SELECT ON enabled_roles TO PUBLIC;
 CREATE VIEW key_column_usage AS
     SELECT CAST(current_database() AS sql_identifier) AS constraint_catalog,
-           CAST(nc_nspname AS sql_identifier) AS constraint_schema,
+           CAST(nc.nspname AS sql_identifier) AS constraint_schema,
            CAST(conname AS sql_identifier) AS constraint_name,
            CAST(current_database() AS sql_identifier) AS table_catalog,
-           CAST(nr_nspname AS sql_identifier) AS table_schema,
+           CAST(nr.nspname AS sql_identifier) AS table_schema,
            CAST(relname AS sql_identifier) AS table_name,
            CAST(a.attname AS sql_identifier) AS column_name,
-           CAST((ss.x).n AS cardinal_number) AS ordinal_position,
+           CAST(ck.icol AS cardinal_number) AS ordinal_position,
            CAST(CASE WHEN contype = 'f' THEN
-                       _pg_index_position(ss.conindid, ss.confkey[(ss.x).n])
+                       _pg_index_position(c.conindid, c.confkey[ck.icol])
                  ELSE NULL END
                 AS cardinal_number) AS position_in_unique_constraint
     FROM pg_attribute a,
-         (SELECT r.oid AS roid, r.relname, r.relowner,
-                 nc.nspname AS nc_nspname, nr.nspname AS nr_nspname,
-                 c.oid AS coid, c.conname, c.contype, c.conindid,
-                 c.confkey, c.confrelid,
-                 _pg_expandarray(c.conkey) AS x
-          FROM pg_namespace nr, pg_class r, pg_namespace nc,
-               pg_constraint c
-          WHERE nr.oid = r.relnamespace
-            AND r.oid = c.conrelid
-            AND nc.oid = c.connamespace
-            AND c.contype IN ('p', 'u', 'f')
Re: cataloguing NOT NULL constraints
On 2024-May-13, Robert Haas wrote: > On Mon, May 13, 2024 at 9:44 AM Alvaro Herrera > wrote: > > The problematic point is the need to add NOT NULL constraints during > > table creation that don't exist in the table being dumped, for > > performance of primary key creation -- I called this a throwaway > > constraint. We needed to be able to drop those constraints after the PK > > was created. These were marked NO INHERIT to allow them to be dropped, > > which is easier if the children don't have them. This all worked fine. > > This seems really weird to me. Why is it necessary? I mean, in > existing releases, if you declare a column as PRIMARY KEY, the columns > included in the key are forced to be NOT NULL, and you can't change > that for so long as they are included in the PRIMARY KEY. The point is that a column can be in a primary key and not have an explicit not-null constraint. This is different from having a column be NOT NULL and having a primary key on top. In both cases the attnotnull flag is set; the difference between these two scenarios is what happens if you drop the primary key. If you do not have an explicit not-null constraint, then the attnotnull flag is lost as soon as you drop the primary key. You don't have to do DROP NOT NULL for that to happen. This means that if you have a column that's in the primary key but does not have an explicit not-null constraint, then we shouldn't make one up. (Which we would, if we were to keep an unadorned NOT NULL that we can't drop at the end of the dump.) > So I would have thought that after this patch, you'd end up with the > same thing. At least as I interpret the standard, you wouldn't. > One way of doing that would be to make the PRIMARY KEY depend on the > now-catalogued NOT NULL constraints, and the other way would be to > keep it as an ad-hoc prohibition, same as now. That would be against what [I think] the standard says. 
> But I don't see why I need to end up with what the patch generates, > which seems to be something like CONSTRAINT pgdump_throwaway_notnull_0 > NOT NULL NO INHERIT. That kind of thing suggests that we're changing > around the order of operations in pg_dump, probably by adding the NOT > NULL constraints at a later stage than currently, and I think the > proper solution is most likely to be to avoid doing that in the first > place. The point of the throwaway constraints is that they don't remain after the dump has restored completely. They are there only so that we don't have to scan the data looking for possible nulls when we create the primary key. We have a DROP CONSTRAINT for the throwaway not-nulls as soon as the PK is created. We're not changing any order of operations as such. > That's not to say that we shouldn't try to make improvements, just > that it may be hard to get right. Sure, that's why this patch has now been reverted twice :-) and has been in the works for ... how many years now? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Direct SSL connection with ALPN and HBA rules
(There's, uh, a lot to respond to above and I'm trying to figure out how best to type up all of it.) On Mon, May 13, 2024 at 9:13 AM Robert Haas wrote: > However, > I disagree with Jacob's assertion that sslmode=require has no security > benefits over sslmode=prefer. For the record, I didn't say that... You mean Jelte's quote up above?: > sslmode=prefer and sslmode=require > are the same amount of insecure imho (i.e. extremely insecure). I agree that requiring passive security is tangibly better than allowing fallback to plaintext. I think Jelte's point might be better stated as, =prefer and =require give the same amount of protection against active attack (none). --Jacob
Re: Allowing additional commas between columns, and at the end of the SELECT clause
Tom Lane writes: > Dagfinn Ilmari Mannsåker writes: >> Matthias van de Meent writes: >>> Single trailing commas are a feature that's more and more common in >>> languages, yes, but arbitrary excess commas is new to me. Could you >>> provide some examples of popular languages which have that, as I can't >>> think of any. > >> The only one I can think of is Perl, which I'm not sure counts as >> popular any more. JavaScript allows consecutive commas in array >> literals, but they're not no-ops, they create empty array slots: > > I'm fairly down on this idea for SQL, because I think it creates > ambiguity for the ROW() constructor syntax. That is: > > (x,y) is understood to be shorthand for ROW(x,y) > > (x) is not ROW(x), it's just x > > (x,) means what? Python has a similar issue: (x, y) is a tuple, but (x) is just x, and they use the trailing comma to disambiguate, so (x,) creates a single-item tuple. AFAIK it's the only place where the trailing comma is significant. > I realize the original proposal intended to restrict the legality of > excess commas to only a couple of places, but to me that just flags > it as a kluge. ROW(...) ought to work pretty much the same as a > SELECT list. Yeah, a more principled approach would be to not special-case target lists, but to allow one (and only one) trailing comma everywhere: select, order by, group by, array constructors, row constructors, everything that looks like a function call, etc. > As already mentioned, if you can get some variant of this through the > SQL standards process, we'll probably adopt it. But I doubt that we > want to get out front of the committee in this area. Agreed. > regards, tom lane - ilmari
Re: Direct SSL connection with ALPN and HBA rules
On Mon, May 13, 2024 at 9:37 AM Heikki Linnakangas wrote: > On 10/05/2024 16:50, Heikki Linnakangas wrote: > > New proposal: > > > > - Remove the "try both" mode completely, and rename "requiredirect" to > > just "direct". So there would be just two modes: "postgres" and > > "direct". On reflection, the automatic fallback mode doesn't seem very > > useful. It would make sense as the default, because then you would get > > the benefits automatically in most cases but still be compatible with > > old servers. But if it's not the default, you have to fiddle with libpq > > settings anyway to enable it, and then you might as well use the > > "requiredirect" mode when you know the server supports it. There isn't > > anything wrong with it as such, but given how much confusion there's > > been on how this all works, I'd prefer to cut this back to the bare > > minimum now. We can add it back in the future, and perhaps make it the > > default at the same time. This addresses points 2. and 3. above. > > > > and: > > > > - Only allow sslnegotiation=direct with sslmode=require or higher. This > > is what you, Jacob, wanted to do all along, and addresses point 1. > > Here's a patch to implement that. I find this idea to be a massive improvement over the status quo, and I didn't spot any major problems when I read through the patch, either. I'm not quite sure if the patch takes the right approach in emphasizing that weaker sslmode settings are not allowed because of unintended fallbacks. It seems to me that we could equally well say that those combinations are nonsensical. If we're making a direct SSL connection, SSL is eo ipso required. I don't have a strong opinion about whether sslnegotiation=direct should error out (as you propose here) or silently promote sslmode to require. I think either is defensible. 
Had I been implementing it, I think I would have done as Jacob proposes, just because once we've forced a direct SSL negotiation surely the only sensible behavior is to be using SSL, unless you think there should be a silently-reconnect-without-SSL behavior, which I sure don't. However, I disagree with Jacob's assertion that sslmode=require has no security benefits over sslmode=prefer. That seems like the kind of pessimism that makes people hate security professionals. There have got to be some attacks that are foreclosed by encrypting the connection, even if you don't stop MITM attacks or other things that are more sophisticated than running wireshark and seeing what goes by on the wire. I'm pleased to hear that you will propose to make sslmode=require the default in v18. I think we'll need to do some work to figure out how much collateral damage that will cause, and maybe it will be more than we can stomach, but Magnus has been saying for years that the current default is terrible. I'm not sure I was entirely convinced of that the first time I heard him say it, but I'm smarter now than I was then. It's really hard to believe in 2024 that anyone should ever be using a setting that may or may not encrypt the connection. There's http and https but there's no httpmaybes. -- Robert Haas EDB: http://www.enterprisedb.com
Re: An improved README experience for PostgreSQL
On Mon, May 13, 2024 at 05:43:45PM +0200, Alvaro Herrera wrote: > Can't we add these two lines per topic to the README.md? That would be fine with me, too. The multiple-files approach is perhaps a bit more tailored to GitHub, but there's something to be said for keeping this information centralized. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: An improved README experience for PostgreSQL
On 2024-May-13, Nathan Bossart wrote: > > If we want to enhance the GitHub experience, we can also add these files to > > the organization instead: > > https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file > > This was the intent of my patch. There might be a few others that we could > use, but I figured we could start with the low-hanging fruit that would > have the most impact on the GitHub experience. Can't we add these two lines per topic to the README.md? -- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/ "There is no man who does not aspire to plenitude, that is, the sum of experiences of which a man is capable"
Re: An improved README experience for PostgreSQL
On Sun, May 12, 2024 at 05:17:42PM +0200, Peter Eisentraut wrote: > I don't know, I find these files kind of "yelling". It's fine to have a > couple, but now it's getting a bit much, and there are more that could be > added. I'm not sure what you mean by this. Do you mean that the contents are too blunt? That there are too many files? Something else? > If we want to enhance the GitHub experience, we can also add these files to > the organization instead: > https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file This was the intent of my patch. There might be a few others that we could use, but I figured we could start with the low-hanging fruit that would have the most impact on the GitHub experience. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: cataloguing NOT NULL constraints
On Mon, May 13, 2024 at 9:44 AM Alvaro Herrera wrote: > The problematic point is the need to add NOT NULL constraints during > table creation that don't exist in the table being dumped, for > performance of primary key creation -- I called this a throwaway > constraint. We needed to be able to drop those constraints after the PK > was created. These were marked NO INHERIT to allow them to be dropped, > which is easier if the children don't have them. This all worked fine. This seems really weird to me. Why is it necessary? I mean, in existing releases, if you declare a column as PRIMARY KEY, the columns included in the key are forced to be NOT NULL, and you can't change that for so long as they are included in the PRIMARY KEY. So I would have thought that after this patch, you'd end up with the same thing. One way of doing that would be to make the PRIMARY KEY depend on the now-catalogued NOT NULL constraints, and the other way would be to keep it as an ad-hoc prohibition, same as now. In PostgreSQL 16, I get a dump like this: CREATE TABLE public.foo ( a integer NOT NULL, b text ); COPY public.foo (a, b) FROM stdin; \. ALTER TABLE ONLY public.foo ADD CONSTRAINT foo_pkey PRIMARY KEY (a); If I'm dumping from an existing release, I don't see why any of that needs to change. The NOT NULL decoration should lead to a system-generated constraint name. If I'm dumping from a new release, the NOT NULL decoration needs to be replaced with CONSTRAINT existing_constraint_name NOT NULL. But I don't see why I need to end up with what the patch generates, which seems to be something like CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT. That kind of thing suggests that we're changing around the order of operations in pg_dump, probably by adding the NOT NULL constraints at a later stage than currently, and I think the proper solution is most likely to be to avoid doing that in the first place. 
> However, at some point we realized that we needed to add NOT NULL > constraints in child tables for the columns in which the parent had a > primary key. Then things become messy because we had the throwaway > constraints on one hand and the not-nulls that descend from the PK on > the other hand, where one was NO INHERIT and the other wasn't; worse if > the child also has a primary key. This seems like another problem that is created by changing the order of operations in pg_dump. > > The other possibility that occurs to me is that I think the motivation > > for cataloging NOT NULL constraints was that we wanted to be able to > > track dependencies on them, or something like that, which seems like > > it might be able to create issues of the type that you're facing, but > > the details aren't clear to me. > > NOT VALID constraints would be extremely useful, for one thing (because > then you don't need to exclusively-lock the table during a long scan in > order to add a constraint), and it's just one step away from having > these constraints be catalogued. It was also fixing some inconsistent > handling of inheritance cases. I agree that NOT VALID constraints would be very useful. I'm a little scared by the idea of fixing inconsistent handling of inheritance cases, just for fear that there may be more things relying on the inconsistent behavior than we realize. I feel like this is an area where it's easy for changes to be scarier than they at first seem. I still have memories of discovering some of the current behavior back in the mid-2000s when I was learning PostgreSQL (and databases generally). It struck me as fiddly back then, and it still does. I feel like there are probably some behaviors that look like arbitrary decisions but are actually very important for some undocumented reason. That's not to say that we shouldn't try to make improvements, just that it may be hard to get right. -- Robert Haas EDB: http://www.enterprisedb.com
Re: New GUC autovacuum_max_threshold ?
On 09/05/2024 at 16:58, Robert Haas wrote:
> As I see it, a lot of the lack of agreement up until now is people just
> not understanding the math. Since I think I've got the right idea about
> the math, I attribute this to other people being confused about what is
> going to happen and would tend to phrase it as: some people don't
> understand how catastrophically bad it will be if you set this value
> too low.

FWIW, I do agree with your math. I found your demonstration convincing. The value of 50 was picked arbitrarily. Using the formula I suggested earlier: vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples, vac_base_thresh + vac_scale_factor * sqrt(reltuples) * 1000); your table of 2.56 billion tuples will be vacuumed if there are more than 10 million dead tuples (every 28 minutes). If we want to stick with the simple formula, we should probably choose a very high default, maybe 100 million, as you suggested earlier. However, it would be nice to have the visibility map updated more frequently than every 100 million dead tuples. I wonder if this could be decoupled from the vacuum process?
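Plugging numbers into the suggested formula confirms the 10-million figure. The sketch below assumes the stock defaults vac_base_thresh = 50 and vac_scale_factor = 0.2; your settings may differ:

```python
import math

def vacthresh(reltuples, base_thresh=50, scale_factor=0.2):
    # Capped variant suggested above: the plain linear formula wins for
    # small tables, the sqrt-based cap takes over past ~1M tuples
    # (where sqrt(n) * 1000 == n).
    linear = base_thresh + scale_factor * reltuples
    capped = base_thresh + scale_factor * math.sqrt(reltuples) * 1000
    return min(linear, capped)

# A 2.56-billion-tuple table triggers at ~10.1 million dead tuples
# instead of ~512 million with the plain linear formula.
print(f"{vacthresh(2.56e9):,.0f}")
```

The ~50x reduction in the trigger point for this table size is what makes the capped formula attractive without needing a new hard-coded ceiling.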
Re: Direct SSL connection with ALPN and HBA rules
On 13/05/2024 16:55, Jelte Fennema-Nio wrote:
> On Mon, 13 May 2024 at 15:38, Heikki Linnakangas wrote:
>> Here's a patch to implement that.
>
> +    if (conn->sslnegotiation[0] == 'd' &&
> +        conn->sslmode[0] != 'r' && conn->sslmode[0] != 'v')
>
> I think these checks should use strcmp instead of checking magic first
> characters. I see this same clever trick is used in the recently added
> init_allowed_encryption_methods, and I think that should be changed to
> use strcmp too for readability.

Oh yeah, I hate that too. These should be refactored into enums, with a clear separate stage of parsing the options from strings. But we use that pattern all over the place, so I didn't want to start reforming it with this patch.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts
On Mon, 13 May 2024 at 16:13, Tom Lane wrote: > > Matthias van de Meent writes: > > PFA a patch that fixes this issue, by assuming that all pages in the > > source database utilize a non-standard page layout. > > Surely that cure is worse than the disease? I don't know where we would get the information whether the selected relation fork's pages are standard-compliant. We could base it off of the fork number (that info is available locally) but that doesn't guarantee much. For VM and FSM-pages we know they're essentially never standard-compliant (hence this thread), but for the main fork it is anyone's guess once the user has installed an additional AM - which we don't detect nor pass through to the offending RelationCopyStorageUsingBuffer. As for "worse", the default template database is still much smaller than the working set of most databases. This will indeed regress the workload a bit, but only by the fraction of holes in the page + all FSM/VM data. I think the additional WAL volume during CREATE DATABASE is worth it when the alternative is losing that data with physical replication/secondary instances. Note that this does not disable page compression, it just stops the logging of holes in pages; holes which generally are only a fraction of the whole database. It's not inconceivable that this will significantly increase WAL volume, but I think we should go for correctness rather than fastest copy. If we went with fastest copy, we'd better just skip logging the FSM and VM forks because we're already ignoring the data of the pages, so why not ignore the pages themselves, too? I don't think that holds water when we want to be crash-proof in CREATE DATABASE, with a full data copy of the template database. Kind regards, Matthias van de Meent Neon (https://neon.tech)
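For context on the "holes" being logged or skipped: in the standard page layout, the bytes between pd_lower and pd_upper carry no data, so a full-page WAL image may omit them — which is exactly the optimization that is unsafe for FSM/VM pages, since they don't follow that layout. A rough sketch of the size trade-off (illustrative Python; an 8 kB page size is assumed, and `wal_bytes_for_page` is an invented helper, not a PostgreSQL function):

```python
BLCKSZ = 8192  # assumed default page size

def wal_bytes_for_page(pd_lower, pd_upper, standard_layout):
    """How many bytes of a page end up in a full-page WAL image.
    With a standard layout, the hole between pd_lower and pd_upper can
    be skipped; a non-standard page must be logged in full."""
    if standard_layout and pd_lower <= pd_upper:
        hole = pd_upper - pd_lower
        return BLCKSZ - hole
    return BLCKSZ

# A half-empty heap page logs ~4 kB; an FSM/VM page, whose pd_lower and
# pd_upper don't delimit a real hole, must be logged whole.
print(wal_bytes_for_page(pd_lower=96, pd_upper=4192, standard_layout=True))   # 4096
print(wal_bytes_for_page(pd_lower=0, pd_upper=0, standard_layout=False))      # 8192
```

This is the fraction being traded away in the fix: the extra WAL during CREATE DATABASE is just the sum of the holes that can no longer be skipped.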
Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?
On Mon, 13 May 2024 at 04:40, aa wrote: > Hello Everyone! > > Is there any chance to get some kind of a result set sifting mechanism in > Postgres? > > What I am looking for is a way to get for example: "nulls last" in a > result set, without having to call "order by" or having to use UNION ALL, > and if possible to get this in a single result set pass. > > Something along these lines: SELECT a, b, c FROM my_table WHERE a nulls last > OFFSET 0 LIMIT 25 > > I don't want to use order by or union all because these are time-consuming > operations, especially on large data sets and when comparisons are done > on dynamic values (eg: geolocation distances between a mobile and a > static location) > This already exists: ORDER BY a IS NULL I've found it to be more useful than one might initially expect to order by a boolean expression.
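The boolean-key trick has analogues outside SQL too. For illustration, a Python equivalent of ORDER BY a IS NULL, a (the secondary key is added here only to make the in-group order deterministic):

```python
rows = [3, None, 1, None, 2]

# False sorts before True, so non-NULL values come first; the second
# tuple element orders the non-NULL values among themselves.
nulls_last = sorted(rows, key=lambda a: (a is None, a if a is not None else 0))
print(nulls_last)  # [1, 2, 3, None, None]
```

The SQL form works for the same reason: the boolean expression evaluates to false for non-NULL values and sorts ahead of true.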
Re: Direct SSL connection with ALPN and HBA rules
On Mon, 13 May 2024 at 13:07, Heikki Linnakangas wrote: > "channel_binding=require sslmode=require" also protects from MITM attacks. Cool, I didn't realize we had this connection option and it could be used like this. But I think there's a few security downsides of channel_binding=require over sslmode=verify-full: If the client relies on channel_binding to validate server authenticity, a leaked server-side SCRAM hash is enough for an attacker to impersonate a server. While normally a leaked scram hash isn't really much of a security concern (assuming long enough passwords). I also don't know of many people rotating their scram hashes, even though many rotate TLS certs. > I think these options should be designed from the user's point of view, > so that the user can specify the risks they're willing to accept, and > the details of how that's accomplished are handled by libpq. For > example, I'm OK with (tick all that apply): > > [ ] passive eavesdroppers seeing all the traffic > [ ] MITM being able to hijack the session > [ ] connecting without verifying the server's identity > [ ] divulging the plaintext password to the server > [ ] ... I think that sounds like a great idea, looking forward to the proposal.
Re: Why is citext/regress failing on hamerkop?
On 2024-05-12 Su 18:05, Thomas Munro wrote:
> On Mon, May 13, 2024 at 12:26 AM Andrew Dunstan wrote:
>> Well, this is more or less where I came in back in about 2002 :-) I've
>> been trying to help support it ever since, motivated more by stubborn
>> persistence than anything else. Still, I agree that the lack of support
>> for the Windows port from Microsoft over the years has been more than
>> disappointing.
>
> I think "state of the Windows port" would make a good discussion topic
> at pgconf.dev (with write-up for those who can't be there). If there is
> interest, I could organise that with a short presentation of the issues
> I am aware of so far and possible improvements and
> smaller-things-we-could-drop-instead-of-dropping-the-whole-port. I
> would focus on technical stuff, not who-should-be-doing-what, 'cause I
> can't make anyone do anything.

+1

cheers

andrew

-- 
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts
Matthias van de Meent writes: > PFA a patch that fixes this issue, by assuming that all pages in the > source database utilize a non-standard page layout. Surely that cure is worse than the disease? regards, tom lane
Re: Show WAL write and fsync stats in pg_stat_io
On Fri, Apr 19, 2024 at 1:32 PM Nazir Bilal Yavuz wrote: > > > I wanted to inform you that the 73f0a13266 commit changed all WALRead > > calls to read variable bytes, only the WAL receiver was reading > > variable bytes before. > > I want to start working on this again if possible. I will try to > summarize the current status: Thanks for working on this. > * With the 73f0a13266 commit, the WALRead() function started to read > variable bytes in every case. Before, only the WAL receiver was > reading variable bytes. > > * With the 91f2cae7a4 commit, WALReadFromBuffers() is merged. We were > discussing what we have to do when this is merged. It is decided that > WALReadFromBuffers() does not call pgstat_report_wait_start() because > this function does not perform any IO [1]. We may follow the same > logic by not including these to pg_stat_io? Right. WALReadFromBuffers doesn't do any I/O. Whoever reads WAL from disk (backends, walsenders, recovery process) using pg_pread (XLogPageRead, WALRead) needs to be tracked in pg_stat_io or some other view. If it were to be in pg_stat_io, we may not be able to distinguish WAL read stats at a backend level (like how many times/bytes a walsender or recovery process or a backend read WAL from disk), but it can help us understand the overall impact of WAL read I/O at a cluster level. With this approach, the WAL I/O stats are divided up - WAL read I/O and write I/O stats are in pg_stat_io and pg_stat_wal respectively. This makes me wonder whether we also need to add WAL read I/O stats to pg_stat_wal. Then, we can also add WALReadFromBuffers stats hits/misses there. With this approach, pg_stat_wal can be a one-stop view for all the WAL related stats. If needed, we can join info from pg_stat_wal to pg_stat_io in system_views.sql so that the I/O stats are emitted to the end-user via pg_stat_io. > * With the b5a9b18cd0 commit, streaming I/O is merged but AFAIK this > does not block anything related to putting WAL stats in pg_stat_io. 
> > If I am not missing any new changes, the only problem is reading > variable bytes now. We have discussed a couple of solutions: > > 1- Change op_bytes to something like -1, 0, 1, NULL etc. and document > that this means some variable byte I/O is happening. > > I kind of dislike this solution because if the *only* read I/O is > happening in variable bytes, it will look like write and extend I/Os > are happening in variable bytes as well. As a solution, it could be > documented that only read I/Os could happen in variable bytes for now. Yes, read I/O for relation and WAL can happen in variable bytes. I think this idea seems reasonable and simple yet useful to know the cluster-wide read I/O. However, another point here is how the total number of bytes read is represented with existing pg_stat_io columns 'reads' and 'op_bytes'. It is known now with 'reads' * 'op_bytes', but with variable bytes, how is read bytes calculated? Maybe add new columns read_bytes/write_bytes? > 2- Use op_bytes_[read | write | extend] columns instead of one > op_bytes column, also use the first solution. > > This can solve the first solution's weakness but it introduces two > more columns. This is more future proof compared to the first solution > if there is a chance that some variable I/O could happen in write and > extend calls as well in the future. -1 as more columns impact the readability and usability. > 3- Create a new pg_stat_io_wal view to put WAL I/Os here instead of > pg_stat_io. > > pg_stat_io could remain untouchable and we will have flexibility to > edit this new view as much as we want. But the original aim of the > pg_stat_io is evaluating all I/O from a single view and adding a new > view breaks this aim. -1 as it defeats the very purpose of one-stop view pg_stat_io for all kinds of I/O. PS: see my response above about adding both WAL write I/O and read I/O stats to pg_stat_wal. 
-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Allowing additional commas between columns, and at the end of the SELECT clause
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= writes: > Matthias van de Meent writes: >> Single trailing commas are a feature that's more and more common in >> languages, yes, but arbitrary excess commas is new to me. Could you >> provide some examples of popular languages which have that, as I can't >> think of any. > The only one I can think of is Perl, which I'm not sure counts as > popular any more. JavaScript allows consecutive commas in array > literals, but they're not no-ops, they create empty array slots: I'm fairly down on this idea for SQL, because I think it creates ambiguity for the ROW() constructor syntax. That is: (x,y) is understood to be shorthand for ROW(x,y) (x) is not ROW(x), it's just x (x,) means what? I realize the original proposal intended to restrict the legality of excess commas to only a couple of places, but to me that just flags it as a kluge. ROW(...) ought to work pretty much the same as a SELECT list. As already mentioned, if you can get some variant of this through the SQL standards process, we'll probably adopt it. But I doubt that we want to get out front of the committee in this area. regards, tom lane
Re: BitmapHeapScan streaming read user and prelim refactoring
On Sat, May 11, 2024 at 3:18 PM Tomas Vondra wrote: > > On 5/10/24 21:48, Melanie Plageman wrote: > > Attached is v3. I didn't use your exact language because the test > > wouldn't actually verify that we properly discard the tuples. Whether > > or not the empty tuples are all emitted, it just resets the counter to > > 0. I decided to go with "exercise" instead. > > > > I did go over the v3 patch, did a bunch of tests, and I think it's fine > and ready to go. The one thing that might need some minor tweaks is the > commit message. > > 1) Isn't the subject "Remove incorrect assert" a bit misleading, as the > patch does not simply remove an assert, but replaces it with a reset of > the field the assert used to check? (The commit message does not mention > this either, at least not explicitly.) I've updated the commit message. > 2) The "heap AM-specific bitmap heap scan code" sounds a bit strange to > me, isn't the first "heap" unnecessary? bitmap heap scan has been used to refer to bitmap table scans, as the name wasn't changed from heap when the table AM API was added (e.g. BitmapHeapNext() is used by all table AMs doing bitmap table scans). 04e72ed617be specifically pushed the skip fetch optimization into heap implementations of bitmap table scan callbacks, so it was important to make this distinction. I've changed the commit message to say heap AM implementations of bitmap table scan callbacks. While looking at the patch again, I wondered if I should set enable_material=false in the test. It doesn't matter from the perspective of exercising the correct code; however, I wasn't sure if disabling materialization would make the test more resilient against future planner changes which could cause it to incorrectly fail. 
- Melanie From d3628d8d36b2be54b64c18402bdda80e1c8a436f Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Fri, 10 May 2024 14:52:34 -0400 Subject: [PATCH v4] BitmapHeapScan: Replace incorrect assert with reinitialization 04e72ed617be pushed the skip fetch optimization (allowing bitmap heap scans to operate like index-only scans if none of the underlying data is needed) into heap AM implementations of bitmap table scan callbacks. 04e72ed617be added an assert that all tuples in blocks eligible for the optimization had been NULL-filled and emitted by the end of the scan. This assert is incorrect when not all tuples need be scanned to execute the query; for example: a join in which not all inner tuples need to be scanned before skipping to the next outer tuple. Remove the assert and reset the field on which it previously asserted to avoid incorrectly emitting NULL-filled tuples from a previous scan on rescan. Author: Melanie Plageman Reviewed-by: Tomas Vondra Reported-by: Melanie Plageman Reproduced-by: Tomas Vondra, Richard Guo Discussion: https://postgr.es/m/CAMbWs48orzZVXa7-vP9Nt7vQWLTE04Qy4PePaLQYsVNQgo6qRg%40mail.gmail.com --- src/backend/access/heap/heapam.c | 4 ++-- src/test/regress/expected/join.out | 38 ++ src/test/regress/sql/join.sql | 26 3 files changed, 66 insertions(+), 2 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 4be0dee4de..8600c22515 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -1184,7 +1184,7 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params, scan->rs_vmbuffer = InvalidBuffer; } - Assert(scan->rs_empty_tuples_pending == 0); + scan->rs_empty_tuples_pending = 0; /* * The read stream is reset on rescan. 
This must be done before @@ -1216,7 +1216,7 @@ heap_endscan(TableScanDesc sscan) if (BufferIsValid(scan->rs_vmbuffer)) ReleaseBuffer(scan->rs_vmbuffer); - Assert(scan->rs_empty_tuples_pending == 0); + scan->rs_empty_tuples_pending = 0; /* * Must free the read stream before freeing the BufferAccessStrategy. diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 0246d56aea..8829bd76e7 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -7924,3 +7924,41 @@ where exists (select 1 from j3 (13 rows) drop table j3; +-- Exercise the "skip fetch" Bitmap Heap Scan optimization when candidate +-- tuples are discarded. This may occur when: +-- 1. A join doesn't require all inner tuples to be scanned for each outer +-- tuple, and +-- 2. The inner side is scanned using a bitmap heap scan, and +-- 3. The bitmap heap scan is eligible for the "skip fetch" optimization. +-- This optimization is usable when no data from the underlying table is +-- needed. Use a temp table so it is only visible to this backend and +-- vacuum may reliably mark all blocks in the table all visible in the +-- visibility map. +CREATE TEMP TABLE skip_fetch (a INT, b INT) WITH (fillfactor=10); +INSERT INTO skip_fetch SELECT i % 3, i FROM generate_series(0,30) i; +CREATE INDEX ON skip_fetch(a); +VACUUM
Re: Direct SSL connection with ALPN and HBA rules
On Mon, 13 May 2024 at 15:38, Heikki Linnakangas wrote: > Here's a patch to implement that. + if (conn->sslnegotiation[0] == 'd' && + conn->sslmode[0] != 'r' && conn->sslmode[0] != 'v') I think these checks should use strcmp instead of checking magic first characters. I see this same clever trick is used in the recently added init_allowed_encryption_methods, and I think that should be changed to use strcmp too for readability.
Re: cataloguing NOT NULL constraints
On 2024-May-13, Robert Haas wrote: > On Sat, May 11, 2024 at 5:40 AM Alvaro Herrera > wrote: > > Specifically, the problem is that I mentioned that we could restrict the > > NOT NULL NO INHERIT addition in pg_dump for primary keys to occur only > > in pg_upgrade; but it turns this is not correct. In normal > > dump/restore, there's an additional table scan to check for nulls when > > the constraints is not there, so the PK creation would become measurably > > slower. (In a table with a million single-int rows, PK creation goes > > from 2000ms to 2300ms due to the second scan to check for nulls). > > I have a feeling that any theory of the form "X only needs to happen > during pg_upgrade" is likely to be wrong. pg_upgrade isn't really > doing anything especially unusual: just creating some objects and > loading data. Those things can also be done at other times, so > whatever is needed during pg_upgrade is also likely to be needed at > other times. Maybe that's not sound reasoning for some reason or > other, but that's my intuition. True. It may be that by setting up the upgrade SQL script differently, we don't need to make the distinction at all. I hope to be able to do that. > I'm sorry that I haven't been following this thread closely, but I'm > confused about how we ended up here. What exactly are the user-visible > behavior changes wrought by this patch, and how do they give rise to > these issues? The problematic point is the need to add NOT NULL constraints during table creation that don't exist in the table being dumped, for performance of primary key creation -- I called this a throwaway constraint. We needed to be able to drop those constraints after the PK was created. These were marked NO INHERIT to allow them to be dropped, which is easier if the children don't have them. This all worked fine. However, at some point we realized that we needed to add NOT NULL constraints in child tables for the columns in which the parent had a primary key. 
Then things become messy because we had the throwaway constraints on one hand and the not-nulls that descend from the PK on the other hand, where one was NO INHERIT and the other wasn't; worse if the child also has a primary key. It turned out that we didn't have any mechanism to transform a NO INHERIT constraint into a regular one that would be inherited. I added one, didn't like the way it worked, tried to restrict it but that caused other problems; this is the mess that led to the revert (pg_dump in normal mode would emit scripts that fail for some legitimate cases). One possible way forward might be to make pg_dump smarter by adding one more query to know the relationship between constraints that must be dropped and those that don't. Another might be to allow multiple not-null constraints on the same column (one inherits, the other doesn't, and you can drop them independently). There may be others. > The other possibility that occurs to me is that I think the motivation > for cataloging NOT NULL constraints was that we wanted to be able to > track dependencies on them, or something like that, which seems like > it might be able to create issues of the type that you're facing, but > the details aren't clear to me. NOT VALID constraints would be extremely useful, for one thing (because then you don't need to exclusively-lock the table during a long scan in order to add a constraint), and it's just one step away from having these constraints be catalogued. It was also fixing some inconsistent handling of inheritance cases. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Direct SSL connection with ALPN and HBA rules
On 10/05/2024 16:50, Heikki Linnakangas wrote: New proposal: - Remove the "try both" mode completely, and rename "requiredirect" to just "direct". So there would be just two modes: "postgres" and "direct". On reflection, the automatic fallback mode doesn't seem very useful. It would make sense as the default, because then you would get the benefits automatically in most cases but still be compatible with old servers. But if it's not the default, you have to fiddle with libpq settings anyway to enable it, and then you might as well use the "requiredirect" mode when you know the server supports it. There isn't anything wrong with it as such, but given how much confusion there's been on how this all works, I'd prefer to cut this back to the bare minimum now. We can add it back in the future, and perhaps make it the default at the same time. This addresses points 2. and 3. above. and: - Only allow sslnegotiation=direct with sslmode=require or higher. This is what you, Jacob, wanted to do all along, and addresses point 1. Here's a patch to implement that. -- Heikki Linnakangas Neon (https://neon.tech) From 2a7bde8502966a634b62d4109f0cde5fdc685da2 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Mon, 13 May 2024 16:36:54 +0300 Subject: [PATCH 1/1] Remove option to fall back from direct to postgres SSL negotiation There were three problems with the sslnegotiation options: 1. The sslmode=prefer and sslnegotiation=requiredirect combination was somewhat dangerous, as you might unintentionally fall back to plaintext authentication when connecting to a pre-v17 server. 2. There was an asymmetry between 'postgres' and 'direct' options. 'postgres' meant "try only traditional negotiation", while 'direct' meant "try direct first, and fall back to traditional negotiation if it fails". That was apparent only if you knew that the 'requiredirect' mode also exists. 3. The "require" word in 'requiredirect' suggests that it's somehow more strict or more secure, similar to sslmode. 
However, I don't consider direct SSL connections to be a security feature. To address these problems: - Only allow sslnegotiation='direct' if sslmode='require' or stronger. And for the record, Jacob and Robert felt that we should do that (or have sslnegotiation='direct' imply sslmode='require') anyway, regardless of the first issue. - Remove the 'direct' mode that falls back to traditional negotiation, and rename what was called 'requiredirect' to 'direct' instead. In other words, there is no "try both methods" option anymore, 'postgres' now means the traditional negotiation and 'direct' means a direct SSL connection. Discussion: https://www.postgresql.org/message-id/d3b1608a-a1b6-4eda-9ec5-ddb3e4375808%40iki.fi --- doc/src/sgml/libpq.sgml | 49 ++-- src/interfaces/libpq/fe-connect.c | 52 ++-- src/interfaces/libpq/libpq-int.h | 3 +- .../libpq/t/005_negotiate_encryption.pl | 254 -- 4 files changed, 150 insertions(+), 208 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 1d32c226d8..b32e497b1b 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -1772,15 +1772,18 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname sslnegotiation -This option controls whether PostgreSQL -will perform its protocol negotiation to request encryption from the -server or will just directly make a standard SSL -connection. Traditional PostgreSQL -protocol negotiation is the default and the most flexible with -different server configurations. If the server is known to support -direct SSL connections then the latter requires one -fewer round trip reducing connection latency and also allows the use -of protocol agnostic SSL network tools. +This option controls how SSL encryption is negotiated with the server, +if SSL is used. In the default postgres mode, the +client first asks the server if SSL is supported. In +direct mode, the client starts the standard SSL +handshake directly after establishing the TCP/IP connection. 
Traditional +PostgreSQL protocol negotiation is the most +flexible with different server configurations. If the server is known +to support direct SSL connections then the latter +requires one fewer round trip reducing connection latency and also +allows the use of protocol agnostic SSL network tools. The direct SSL +option was introduced in PostgreSQL version +17. @@ -1798,32 +1801,14 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname direct - first attempt to establish a standard SSL connection and if that - fails reconnect and perform the negotiation. This fallback - process adds significant latency if the initial SSL connection -
Re: Fix parallel vacuum buffer usage reporting
Hi, On Fri, 10 May 2024 at 19:09, Masahiko Sawada wrote: > > On Fri, May 10, 2024 at 7:26 PM Nazir Bilal Yavuz wrote: > > > > Hi, > > > > Thank you for working on this! > > > > On Wed, 1 May 2024 at 06:37, Masahiko Sawada wrote: > > > > > > Thank you for further testing! I've pushed the patch. > > > > I realized a behaviour change while looking at 'Use pgBufferUsage for > > block reporting in analyze' thread [1]. Since that change applies here > > as well, I thought it is better to mention it here. > > > > Before this commit, VacuumPageMiss did not count the blocks if its > > read was already completed by other backends [2]. Now, > > 'pgBufferUsage.local_blks_read + pgBufferUsage.shared_blks_read' > > counts every block attempted to be read; possibly double counting if > > someone else has already completed the read. > > True. IIUC there is such a difference only in HEAD but not in v15 and > v16. The following comment in WaitReadBuffers() says that it's a > traditional behavior that we count blocks as read even if someone else > has already completed its I/O: > > /* > * We count all these blocks as read by this backend. This is traditional > * behavior, but might turn out to be not true if we find that someone > * else has beaten us and completed the read of some of these blocks. In > * that case the system globally double-counts, but we traditionally don't > * count this as a "hit", and we don't have a separate counter for "miss, > * but another backend completed the read". > */ > > So I think using pgBufferUsage for (parallel) vacuum is a legitimate > usage and makes it more consistent with other parallel operations. That sounds logical. Thank you for the clarification. -- Regards, Nazir Bilal Yavuz Microsoft
Re: cataloguing NOT NULL constraints
On Sat, May 11, 2024 at 5:40 AM Alvaro Herrera wrote: > I have found two more problems that I think are going to require some > more work to fix, so I've decided to cut my losses now and revert the > whole. I'll come back again in 18 with these problems fixed. Bummer, but makes sense. > Specifically, the problem is that I mentioned that we could restrict the > NOT NULL NO INHERIT addition in pg_dump for primary keys to occur only > in pg_upgrade; but it turns this is not correct. In normal > dump/restore, there's an additional table scan to check for nulls when > the constraints is not there, so the PK creation would become measurably > slower. (In a table with a million single-int rows, PK creation goes > from 2000ms to 2300ms due to the second scan to check for nulls). I have a feeling that any theory of the form "X only needs to happen during pg_upgrade" is likely to be wrong. pg_upgrade isn't really doing anything especially unusual: just creating some objects and loading data. Those things can also be done at other times, so whatever is needed during pg_upgrade is also likely to be needed at other times. Maybe that's not sound reasoning for some reason or other, but that's my intuition. > The addition of NOT NULL NO INHERIT constraints for this purpose > collides with addition of constraints for other reasons, and it forces > us to do unpleasant things such as altering an existing constraint to go > from NO INHERIT to INHERIT. If this happens only during pg_upgrade, > that would be okay IMV; but if we're forced to allow in normal operation > (and in some cases we are), it could cause inconsistencies, so I don't > want to do that. I see a way to fix this (adding another query in > pg_dump that detects which columns descend from ones used in PKs in > ancestor tables), but that's definitely too much additional mechanism to > be adding this late in the cycle. I'm sorry that I haven't been following this thread closely, but I'm confused about how we ended up here. 
What exactly are the user-visible behavior changes wrought by this patch, and how do they give rise to these issues? One change I know about is that a constraint that is explicitly catalogued (vs. just existing implicitly) has a name. But it isn't obvious to me that such a difference, by itself, is enough to cause all of these problems: if a NOT NULL constraint is created without a name, then I suppose we just have to generate one. Maybe the fact that the constraints have names somehow causes ugliness later, but I can't quite understand why it would. The other possibility that occurs to me is that I think the motivation for cataloging NOT NULL constraints was that we wanted to be able to track dependencies on them, or something like that, which seems like it might be able to create issues of the type that you're facing, but the details aren't clear to me. Changing any behavior in this area seems like it could be quite tricky, because of things like the interaction between PRIMARY KEY and NOT NULL, which is rather idiosyncratic but upon which a lot of existing SQL (including SQL not controlled by us) likely depends. If there's not a clear plan for how we keep all the stuff that works today working, I fear we'll end up in an endless game of whack-a-mole. If you've already written the design ideas down someplace, I'd appreciate a pointer in the right direction. Or maybe there's some other issue entirely. In any case, sorry about the revert, and sorry that I haven't paid more attention to this. -- Robert Haas EDB: http://www.enterprisedb.com
WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts
Hi, My collegue Konstantin Knizhnik pointed out that we fail to mark pages with a non-standard page layout with page_std=false in RelationCopyStorageUsingBuffer when we WAL-log them. This causes us to interpret the registered buffer as a standard buffer, and omit the hole in the page, which for FSM/VM pages covers the whole page. The immediate effect of this bug is that replicas and primaries in a physical replication system won't have the same data in their VM- and FSM-forks until the first VACUUM on the new database has WAL-logged these pages again. Whilst not actively harmful for the VM/FSM subsystems, it's definitely suboptimal. Secondary unwanted effects are that AMs that use the buffercache- but which don't use or update the pageheader- also won't see the main data logged in WAL, thus potentially losing user data in the physical replication stream or with a system crash. I've not looked for any such AMs and am unaware of any that would have this issue, but it's better to fix this. PFA a patch that fixes this issue, by assuming that all pages in the source database utilize a non-standard page layout. Kind regards, Matthias van de Meent Neon (https://neon.tech) v1-0001-Fix-logging-of-non-standard-pages-in-RelationCopy.patch Description: Binary data
Re: Allowing additional commas between columns, and at the end of the SELECT clause
Matthias van de Meent writes: > On Mon, 13 May 2024 at 10:42, Artur Formella > wrote: >> Motivation: >> Commas of this type are allowed in many programming languages, in some >> it is even recommended to use them at the ends of lists or objects. > > Single trailing commas are a feature that's more and more common in > languages, yes, but arbitrary excess commas is new to me. Could you > provide some examples of popular languages which have that, as I can't > think of any. The only one I can think of is Perl, which I'm not sure counts as popular any more. JavaScript allows consecutive commas in array literals, but they're not no-ops, they create empty array slots: ❯ js Welcome to Node.js v18.19.0. Type ".help" for more information. > [1,,2,,] [ 1, <1 empty item>, 2, <1 empty item> ] - ilmari
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, Thanks for the comments! New patch was posted in [1]. > 0.1 General - Patch name > > /SUBSCIRPTION/SUBSCRIPTION/ Fixed. > == > 0.2 General - Apply > > FYI, there are whitespace warnings: > > git > apply ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTI > ON-.-S.patch > ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.- > S.patch:191: > trailing whitespace. > # command will abort the prepared transaction and succeed. > warning: 1 line adds whitespace errors. I hadn't noticed; fixed. > == > 0.3 General - Regression test fails > > The subscription regression tests are not working. > > ok 158 + publication 1187 ms > not ok 159 + subscription 123 ms > > See review comments #4 and #5 below for the reason why. Yeah, I missed updating the expected result. Fixed. > == > src/sgml/ref/alter_subscription.sgml > > 1. > >The two_phase parameter can only be altered when > the > - subscription is disabled. When altering the parameter from > on > - to off, the backend process checks prepared > - transactions done by the logical replication worker and aborts them. > + subscription is disabled. Altering the parameter from > on > + to off will be failed when there are prepared > + transactions done by the logical replication worker. If you want to > alter > + the parameter forcibly in this case, force_alter > + option must be set to on at the same time. If > + specified, the backend process aborts prepared transactions. > > 1a. > That "will be failed when..." seems strange. Maybe say "will give an > error when..." > > ~ > 1b. > Because "force" is a verb, I think true/false is more natural than > on/off for this new boolean option. e.g. it acts more like a "flag" > than a "mode". See all the other boolean options in CREATE > SUBSCRIPTION -- those are mostly all verbs too and are all true/false > AFAIK. Fixed, but note that the part was moved. > > == > > 2. 
CREATE SUBSCRIPTION > > For my previous review, two comments [v7-0004#2] and [v7-0004#3] were > not addressed. Kuroda-san wrote: > Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should > we modify to accept and add the description in the doc? > > ~ > > Yes, that is what I am suggesting. IMO it is odd for the user to be > able to ALTER a parameter that cannot be included in the CREATE > SUBSCRIPTION in the first place. AFAIK there are no other parameters > that behave that way. Hmm. This change required a new attribute in the pg_subscription system catalog. Previously I did not like it because it would be a huge change, but I tried it. A new attribute 'subforcealter' was added, and some parts were updated accordingly. > src/backend/commands/subscriptioncmds.c > > 3. AlterSubscription > > + if (!opts.force_alter) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot alter %s when there are prepared transactions", + "two_phase = off"), + errhint("Resolve these transactions or set %s at the same time, and > then try again.", + "force_alter = true"))); > > I think saying "at the same time" in the hint is unnecessary. Surely > the user is allowed to set this parameter separately if they want to? > > e.g. > ALTER SUBSCRIPTION sub SET (force_alter=true); > ALTER SUBSCRIPTION sub SET (two_phase=off); Actually, it was correct. Since force_alter was not recorded in the system catalog, it had to be specified at the same time. Now we allow them to be specified separately, so this was removed. > == > src/test/regress/expected/subscription.out > > 4. > +-- fail - force_alter cannot be set alone > +ALTER SUBSCRIPTION regress_testsub SET (force_alter = true); > +ERROR: force_alter must be specified with two_phase > > This error cannot happen. You removed that error! Fixed. > == > src/test/subscription/t/099_twophase_added.pl > > 6. > +# Try altering the two_phase option to "off." 
The command will fail since > there > +# is a prepared transaction and the force option is not specified. > +my $stdout; > +my $stderr; > + > +($result, $stdout, $stderr) = $node_subscriber->psql( > + 'postgres', "ALTER SUBSCRIPTION regress_sub SET (two_phase = off);"); > +ok($stderr =~ /cannot alter two_phase = off when there are prepared > transactions/, > + 'ALTER SUBSCRIPTION failed'); > > /force option is not specified./'force_alter' option is not specified as > true./ Fixed. > > 7. > +# Verify the prepared transaction still exists > +$result = $node_subscriber->safe_psql('postgres', > +"SELECT count(*) FROM pg_prepared_xacts;"); > +is($result, q(1), "prepared transaction still exits"); > + > > TYPO: /exits/exists/ Fixed. > > ~~~ > > 8. > +# Alter the two_phase with the force_alter option. Apart from the above, the > +# command will abort the prepared transaction and succeed. >
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, Thanks for reviewing! The patch is available in [1]. > == > src/sgml/ref/alter_subscription.sgml > > 1. > + > + The two_phase parameter can only be altered when > the > + subscription is disabled. When altering the parameter from > on > + to off, the backend process checks prepared > + transactions done by the logical replication worker and aborts them. > + > > The text may be OK as-is, but I was wondering if it might be better to > give a more verbose explanation. > > BEFORE > ... the backend process checks prepared transactions done by the > logical replication worker and aborts them. > > SUGGESTION > ... the backend process checks for any incomplete prepared > transactions done by the logical replication worker (from when > two_phase parameter was still "on") and, if any are found, those are > aborted. > Fixed. > == > src/backend/commands/subscriptioncmds.c > > 2. AlterSubscription > > - /* > - * Since the altering two_phase option of subscriptions > - * also leads to the change of slot option, this command > - * cannot be rolled back. So prevent we are in the > - * transaction block. > + * If two_phase was enabled, there is a possibility the > + * transactions has already been PREPARE'd. They must be > + * checked and rolled back. > */ > > BEFORE > ... there is a possibility the transactions has already been PREPARE'd. > > SUGGESTION > ... there is a possibility that transactions have already been PREPARE'd. Fixed. > 3. AlterSubscription > + /* > + * Since the altering two_phase option of subscriptions > + * (especially on->off case) also leads to the > + * change of slot option, this command cannot be rolled > + * back. So prevent we are in the transaction block. > + */ > PreventInTransactionBlock(isTopLevel, > "ALTER SUBSCRIPTION ... SET (two_phase = off)"); > > > This comment is a bit vague and includes some typos, but IIUC these > problems will already be addressed by the 0002 patch changes. AFAIK > patch 0003 is only moving the 0002 comment. 
Yeah, the comment was updated accordingly. > > == > src/test/subscription/t/099_twophase_added.pl > > 4. > +# Check the case that prepared transactions exist on the subscriber node > +# > +# If the two_phase is altering from "on" to "off" and there are prepared > +# transactions on the subscriber, they must be aborted. This test checks it. > + > > Similar to the comment that I gave for v8-0002. I think there should > be comment for the major test comment to > distinguish it from comments for the sub-steps. Added. > 5. > +# Verify the prepared transaction are aborted because two_phase is changed to > +# "off". > +$result = $node_subscriber->safe_psql('postgres', > +"SELECT count(*) FROM pg_prepared_xacts;"); > +is($result, q(0), "prepared transaction done by worker is aborted"); > + > > /the prepared transaction are aborted/any prepared transactions are aborted/ Fixed. [1]: https://www.postgresql.org/message-id/OSBPR01MB2552FEA48D265EA278AA9F7AF5E22%40OSBPR01MB2552.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: pg_stat_advisor extension
1. In the case of parallel workers the plan_rows value has different semantics than the number of rows predicted. Just explore get_parallel_divisor(). 2. The extension recommends new statistics immediately upon finding an estimation error. But what if the reason for the error is stale statistics? Or this error may be raised for only one specific set of constants, and estimation will be done well in another 99% of cases for the same expression. The new parameter, `pg_stat_advisor.analyze_scale_factor`, can suggest the execution of the ANALYZE command on specific tables. The extension now evaluates the ratio of `n_live_tup` (number of live tuples) to `n_mod_since_analyze` (number of modifications since last analyze) in the `pg_stat_all_tables` catalog. If this ratio exceeds the value specified in `analyze_scale_factor`, the extension will suggest an update to the table's statistics. There are a lot of parameters that influence the estimated rows. Statistics might not help improve estimated rows. This feature is designed to provide users with data-driven insights to decide whether updating statistics via the ANALYZE command could potentially improve query performance. By suggesting rather than automatically executing statistics updates, we empower you to make informed decisions based on the specific needs and conditions of your database environment. I've developed an extension that provides suggestions on whether to update or create statistics for your PostgreSQL database, without executing any changes. This approach allows you to consider various parameters that influence row estimates and make informed decisions about optimizing your database's performance. Your feedback is invaluable, and we look forward to hearing about your experiences and any improvements you might suggest. Best regards, Ilia Evdokimov Tantor Labs LLC.
From eb998bea96a3640d240afa63e08cc8cf98925bf7 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 13 May 2024 14:41:59 +0300 Subject: [PATCH] 'pg_stat_advisor' extension This service as a hook into executor. It has two GUC-parameters. pg_stat_advisor.analyze_scale_factor: if ratio of pg_stat_all_tables.n_live_tup to pg_stat_all_tables.n_mod_since_analyze is greater than pg_stat_advisor.analyze_scale_factor extension prints suggestion executing ANALYZE command. pg_stat_advisor.suggest_statistics_threshold: the ratio of total rows to planned rows is greater than or equal to this threshold the extension prints suggestion executing the creation of statistics, using the naming format 'relationName_columns' --- contrib/Makefile | 1 + contrib/pg_stat_advisor/Makefile | 20 + .../expected/pg_stat_advisor.out | 52 ++ contrib/pg_stat_advisor/meson.build | 30 + contrib/pg_stat_advisor/pg_stat_advisor.c | 560 ++ .../pg_stat_advisor/sql/pg_stat_advisor.sql | 24 + 6 files changed, 687 insertions(+) create mode 100644 contrib/pg_stat_advisor/Makefile create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out create mode 100644 contrib/pg_stat_advisor/meson.build create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql diff --git a/contrib/Makefile b/contrib/Makefile index abd780f277..d6ce2fe562 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -33,6 +33,7 @@ SUBDIRS = \ pg_buffercache \ pg_freespacemap \ pg_prewarm \ + pg_stat_advisor \ pg_stat_statements \ pg_surgery \ pg_trgm \ diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile new file mode 100644 index 00..f31b939e8a --- /dev/null +++ b/contrib/pg_stat_advisor/Makefile @@ -0,0 +1,20 @@ +# contrib/pg_stat_advisor/Makefile + +MODULE_big = pg_stat_advisor +OBJS = \ + $(WIN32RES) \ + pg_stat_advisor.o +PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional 
statistics" + +REGRESS = pg_stat_advisor + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_stat_advisor +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/pg_stat_advisor/expected/pg_stat_advisor.out b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out new file mode 100644 index 00..8f3dab2c2f --- /dev/null +++ b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out @@ -0,0 +1,52 @@ +LOAD 'pg_stat_advisor'; +SET pg_stat_advisor.analyze_scale_factor = 0.4; +SET pg_stat_advisor.suggest_statistics_threshold = 0.11; +CREATE TABLE my_tbl(fld_1 INTEGER, fld_2 BIGINT) WITH (autovacuum_enabled = false); +INSERT INTO my_tbl (fld_1, fld_2) +SELECT + i/100 as fld_1, + i/500 as fld_2 +FROM generate_series(1, 1000) s(i); +ANALYZE my_tbl; +INSERT INTO my_tbl (fld_1, fld_2) +SELECT + i/100 as fld_1,
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
A correction of a typo in previous message: non-leaf pages iteration cycles (under P_ISLEAF(topaque)) -> non-leaf pages iteration cycles (under !P_ISLEAF(topaque)) On Mon, 13 May 2024 at 16:19, Pavel Borisov wrote: > > > On Mon, 13 May 2024 at 15:55, Pavel Borisov > wrote: > >> Hi, Alexander! >> >> On Mon, 13 May 2024 at 05:42, Alexander Korotkov >> wrote: >> >>> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov >>> wrote: >>> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger >>> > wrote: >>> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov < >>> aekorot...@gmail.com> wrote: >>> > > > The only bt_target_page_check() caller is >>> > > > bt_check_level_from_leftmost(), which overrides state->target in >>> the >>> > > > next iteration anyway. I think the patch is just refactoring to >>> > > > eliminate the confusion pointer by Peter Geoghegan upthread. >>> > > >>> > > I find your argument unconvincing. >>> > > >>> > > After bt_target_page_check() returns at line 919, and before >>> bt_check_level_from_leftmost() overrides state->target in the next >>> iteration, bt_check_level_from_leftmost() conditionally fetches an item >>> from the page referenced by state->target. See line 963. >>> > > >>> > > I'm left with four possibilities: >>> > > >>> > > >>> > > 1) bt_target_page_check() never gets to the code that uses >>> "rightpage" rather than "state->target" in the same iteration where >>> bt_check_level_from_leftmost() conditionally fetches an item from >>> state->target, so the change you're making doesn't matter. >>> > > >>> > > 2) The code prior to v2-0003 was wrong, having changed >>> state->target in an inappropriate way, causing the wrong thing to happen at >>> what is now line 963. The patch fixes the bug, because state->target no >>> longer gets overwritten where you are now using "rightpage" for the value. 
>>> > > >>> > > 3) The code used to work, having set up state->target correctly in >>> the place where you are now using "rightpage", but v2-0003 has broken that. >>> > > >>> > > 4) It's been broken all along and your patch just changes from >>> wrong to wrong. >>> > > >>> > > >>> > > If you believe (1) is true, then I'm complaining that you are >>> relying far to much on action at a distance, and that you are not >>> documenting it. Even with documentation of this interrelationship, I'd be >>> unhappy with how brittle the code is. I cannot easily discern that the two >>> don't ever happen in the same iteration, and I'm not at all convinced one >>> way or the other. I tried to set up some Asserts about that, but none of >>> the test cases actually reach the new code, so adding Asserts doesn't help >>> to investigate the question. >>> > > >>> > > If (2) is true, then I'm complaining that the commit message doesn't >>> mention the fact that this is a bug fix. Bug fixes should be clearly >>> documented as such, otherwise future work might assume the commit can be >>> reverted with only stylistic consequences. >>> > > >>> > > If (3) is true, then I'm complaining that the patch is flat busted. >>> > > >>> > > If (4) is true, then maybe we should revert the entire feature, or >>> have a discussion of mitigation efforts that are needed. >>> > > >>> > > Regardless of which of 1..4 you pick, I think it could all do with >>> more regression test coverage. >>> > > >>> > > >>> > > For reference, I said something similar earlier today in another >>> email to this thread: >>> > > >>> > > This patch introduces a change that stores a new page into variable >>> "rightpage" rather than overwriting "state->target", which the old >>> implementation most certainly did. That means that after returning from >>> bt_target_page_check() into the calling function >>> bt_check_level_from_leftmost() the value in state->target is not what it >>> would have been prior to this patch. 
Now, that'd be irrelevant if nobody >>> goes on to consult that value, but just 44 lines further down in >>> bt_check_level_from_leftmost() state->target is clearly used. So the >>> behavior at that point is changing between the old and new versions of the >>> code, and I think I'm within reason to ask if it was wrong before the >>> patch, wrong after the patch, or something else? Is this a bug being >>> introduced, being fixed, or ... ? >>> > >>> > Thank you for your analysis. I'm inclined to believe in 2, but not >>> > yet completely sure. It's really pity that our tests don't cover >>> > this. I'm investigating this area. >>> >>> It seems that I got to the bottom of this. Changing >>> BtreeCheckState.target for a cross-page unique constraint check is >>> wrong, but that happens only for leaf pages. After that >>> BtreeCheckState.target is only used for setting the low key. The low >>> key is only used for non-leaf pages. So, that didn't lead to any >>> visible bug. I've revised the commit message to reflect this. >>> >> >> I agree with your analysis regarding state->target: >> - when the unique check is on,
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
On Mon, 13 May 2024 at 15:55, Pavel Borisov wrote: > Hi, Alexander! > > On Mon, 13 May 2024 at 05:42, Alexander Korotkov > wrote: > >> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov >> wrote: >> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger >> > wrote: >> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov < >> aekorot...@gmail.com> wrote: >> > > > The only bt_target_page_check() caller is >> > > > bt_check_level_from_leftmost(), which overrides state->target in the >> > > > next iteration anyway. I think the patch is just refactoring to >> > > > eliminate the confusion pointer by Peter Geoghegan upthread. >> > > >> > > I find your argument unconvincing. >> > > >> > > After bt_target_page_check() returns at line 919, and before >> bt_check_level_from_leftmost() overrides state->target in the next >> iteration, bt_check_level_from_leftmost() conditionally fetches an item >> from the page referenced by state->target. See line 963. >> > > >> > > I'm left with four possibilities: >> > > >> > > >> > > 1) bt_target_page_check() never gets to the code that uses >> "rightpage" rather than "state->target" in the same iteration where >> bt_check_level_from_leftmost() conditionally fetches an item from >> state->target, so the change you're making doesn't matter. >> > > >> > > 2) The code prior to v2-0003 was wrong, having changed state->target >> in an inappropriate way, causing the wrong thing to happen at what is now >> line 963. The patch fixes the bug, because state->target no longer gets >> overwritten where you are now using "rightpage" for the value. >> > > >> > > 3) The code used to work, having set up state->target correctly in >> the place where you are now using "rightpage", but v2-0003 has broken that. >> > > >> > > 4) It's been broken all along and your patch just changes from wrong >> to wrong. 
>> > > >> > > >> > > If you believe (1) is true, then I'm complaining that you are relying >> far to much on action at a distance, and that you are not documenting it. >> Even with documentation of this interrelationship, I'd be unhappy with how >> brittle the code is. I cannot easily discern that the two don't ever >> happen in the same iteration, and I'm not at all convinced one way or the >> other. I tried to set up some Asserts about that, but none of the test >> cases actually reach the new code, so adding Asserts doesn't help to >> investigate the question. >> > > >> > > If (2) is true, then I'm complaining that the commit message doesn't >> mention the fact that this is a bug fix. Bug fixes should be clearly >> documented as such, otherwise future work might assume the commit can be >> reverted with only stylistic consequences. >> > > >> > > If (3) is true, then I'm complaining that the patch is flat busted. >> > > >> > > If (4) is true, then maybe we should revert the entire feature, or >> have a discussion of mitigation efforts that are needed. >> > > >> > > Regardless of which of 1..4 you pick, I think it could all do with >> more regression test coverage. >> > > >> > > >> > > For reference, I said something similar earlier today in another >> email to this thread: >> > > >> > > This patch introduces a change that stores a new page into variable >> "rightpage" rather than overwriting "state->target", which the old >> implementation most certainly did. That means that after returning from >> bt_target_page_check() into the calling function >> bt_check_level_from_leftmost() the value in state->target is not what it >> would have been prior to this patch. Now, that'd be irrelevant if nobody >> goes on to consult that value, but just 44 lines further down in >> bt_check_level_from_leftmost() state->target is clearly used. 
So the >> behavior at that point is changing between the old and new versions of the >> code, and I think I'm within reason to ask if it was wrong before the >> patch, wrong after the patch, or something else? Is this a bug being >> introduced, being fixed, or ... ? >> > >> > Thank you for your analysis. I'm inclined to believe in 2, but not >> > yet completely sure. It's really pity that our tests don't cover >> > this. I'm investigating this area. >> >> It seems that I got to the bottom of this. Changing >> BtreeCheckState.target for a cross-page unique constraint check is >> wrong, but that happens only for leaf pages. After that >> BtreeCheckState.target is only used for setting the low key. The low >> key is only used for non-leaf pages. So, that didn't lead to any >> visible bug. I've revised the commit message to reflect this. >> > > I agree with your analysis regarding state->target: > - when the unique check is on, state->target was reassigned only for the > leaf pages (under P_ISLEAF(topaque) in bt_target_page_check). > - in this level (leaf) in bt_check_level_from_leftmost() this value of > state->target was used to get state->lowkey. Then it was reset (in the next > iteration of do loop in in bt_check_level_from_leftmost() > - state->lowkey
Re: Allowing additional commas between columns, and at the end of the SELECT clause
Hi, As a developer, I love this feature. But as a developer of a universal TDOP SQL parser[1], this can be a pain. Please propose it to the SQL standard instead. Regards, Étienne [1]: https://gitlab.com/dalibo/transqlate
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Hi, Alexander! On Mon, 13 May 2024 at 05:42, Alexander Korotkov wrote: > On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov > wrote: > > On Sat, May 11, 2024 at 4:13 AM Mark Dilger > > wrote: > > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov < > aekorot...@gmail.com> wrote: > > > > The only bt_target_page_check() caller is > > > > bt_check_level_from_leftmost(), which overrides state->target in the > > > > next iteration anyway. I think the patch is just refactoring to > > > > eliminate the confusion pointer by Peter Geoghegan upthread. > > > > > > I find your argument unconvincing. > > > > > > After bt_target_page_check() returns at line 919, and before > bt_check_level_from_leftmost() overrides state->target in the next > iteration, bt_check_level_from_leftmost() conditionally fetches an item > from the page referenced by state->target. See line 963. > > > > > > I'm left with four possibilities: > > > > > > > > > 1) bt_target_page_check() never gets to the code that uses > "rightpage" rather than "state->target" in the same iteration where > bt_check_level_from_leftmost() conditionally fetches an item from > state->target, so the change you're making doesn't matter. > > > > > > 2) The code prior to v2-0003 was wrong, having changed state->target > in an inappropriate way, causing the wrong thing to happen at what is now > line 963. The patch fixes the bug, because state->target no longer gets > overwritten where you are now using "rightpage" for the value. > > > > > > 3) The code used to work, having set up state->target correctly in > the place where you are now using "rightpage", but v2-0003 has broken that. > > > > > > 4) It's been broken all along and your patch just changes from wrong > to wrong. > > > > > > > > > If you believe (1) is true, then I'm complaining that you are relying > far to much on action at a distance, and that you are not documenting it. 
> Even with documentation of this interrelationship, I'd be unhappy with how > brittle the code is. I cannot easily discern that the two don't ever > happen in the same iteration, and I'm not at all convinced one way or the > other. I tried to set up some Asserts about that, but none of the test > cases actually reach the new code, so adding Asserts doesn't help to > investigate the question. > > > > > > If (2) is true, then I'm complaining that the commit message doesn't > mention the fact that this is a bug fix. Bug fixes should be clearly > documented as such, otherwise future work might assume the commit can be > reverted with only stylistic consequences. > > > > > > If (3) is true, then I'm complaining that the patch is flat busted. > > > > > > If (4) is true, then maybe we should revert the entire feature, or > have a discussion of mitigation efforts that are needed. > > > > > > Regardless of which of 1..4 you pick, I think it could all do with > more regression test coverage. > > > > > > > > > For reference, I said something similar earlier today in another email > to this thread: > > > > > > This patch introduces a change that stores a new page into variable > "rightpage" rather than overwriting "state->target", which the old > implementation most certainly did. That means that after returning from > bt_target_page_check() into the calling function > bt_check_level_from_leftmost() the value in state->target is not what it > would have been prior to this patch. Now, that'd be irrelevant if nobody > goes on to consult that value, but just 44 lines further down in > bt_check_level_from_leftmost() state->target is clearly used. So the > behavior at that point is changing between the old and new versions of the > code, and I think I'm within reason to ask if it was wrong before the > patch, wrong after the patch, or something else? Is this a bug being > introduced, being fixed, or ... ? > > > > Thank you for your analysis. 
I'm inclined to believe in 2, but not > > yet completely sure. It's really pity that our tests don't cover > > this. I'm investigating this area. > > It seems that I got to the bottom of this. Changing > BtreeCheckState.target for a cross-page unique constraint check is > wrong, but that happens only for leaf pages. After that > BtreeCheckState.target is only used for setting the low key. The low > key is only used for non-leaf pages. So, that didn't lead to any > visible bug. I've revised the commit message to reflect this. > I agree with your analysis regarding state->target:

- when the unique check is on, state->target was reassigned only for the leaf pages (under P_ISLEAF(topaque) in bt_target_page_check).
- at this level (leaf) in bt_check_level_from_leftmost() this value of state->target was used to get state->lowkey. Then it was reset (in the next iteration of the do loop in bt_check_level_from_leftmost()).
- state->lowkey lives until the end of the (leaf) page level iteration cycle. Then the low key is reset (state->lowkey = NULL at the end of bt_check_level_from_leftmost()) -
Re: Direct SSL connection with ALPN and HBA rules
On 13/05/2024 12:50, Jelte Fennema-Nio wrote: On Sun, 12 May 2024 at 23:39, Heikki Linnakangas wrote: In v18, I'd like to make sslmode=require the default. Or maybe introduce a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we want to encourage encryption, that's the right way to do it. (I'd still recommend everyone to use an explicit sslmode=require in their connection strings for many years, though, because you might be using an older client without realizing it.)

I'm definitely a huge proponent of making the connection defaults more secure. But as described above sslmode=require is still extremely insecure and I don't think we gain anything substantial by making it the default. I think the only useful secure default would be to use sslmode=verify-full (with probably some automatic fallback to sslmode=prefer when connecting to hardcoded IPs or localhost). Which probably means that sslrootcert=system should also be made the default. Which would mean that ~/.postgresql/root.crt would not be the default anymore, which I personally think is fine but others likely disagree.

"channel_binding=require sslmode=require" also protects from MITM attacks. I think these options should be designed from the user's point of view, so that the user can specify the risks they're willing to accept, and the details of how that's accomplished are handled by libpq. For example, I'm OK with (tick all that apply):

[ ] passive eavesdroppers seeing all the traffic
[ ] MITM being able to hijack the session
[ ] connecting without verifying the server's identity
[ ] divulging the plaintext password to the server
[ ] ...

The requirements for whether SSL or GSS encryption is required, whether the server's certificate needs to be signed by a known CA, etc. can be derived from those. For example, if you need protection from eavesdroppers, SSL or GSS encryption must be used. If you need to verify the server's identity, it implies sslmode=verify-CA or channel_binding=true.
If you don't want to divulge the password, it implies a suitable require_auth setting. I don't have a concrete proposal yet, but something like that. And the defaults for those are up for debate. psql could perhaps help by listing the above properties at the beginning of the session, something like:

psql (16.2)
WARNING: Connection is not encrypted.
WARNING: The server's identity has not been verified
Type "help" for help.
postgres=#

Although for the "divulge plaintext password to server" property, it's too late to print a warning after connecting, because the damage has already been done.

A different line of thought is that to keep the attack surface as small as possible, you should specify very explicitly what exactly you expect to happen in the authentication, and disallow any variance. For example, you expect SCRAM to be used, with a certificate signed by a particular CA, and if the server requests anything else, that's suspicious and the connection is aborted. We should make that possible too, but the above flexible risk-based approach seems good for the defaults. -- Heikki Linnakangas Neon (https://neon.tech)
Re: doc: some fixes for environment sections in ref pages
> On 13 May 2024, at 10:48, Peter Eisentraut wrote: > Patches attached. All patches look good. > I think the first one is a bug fix. Agreed. -- Daniel Gustafsson
Upgrade Debian CI images to Bookworm
Hi, Bookworm versions of the Debian CI images are available now [0]. The patches to use these images are attached. 'v1-0001-Upgrade-Debian-CI-images-to-Bookworm_REL_16+.patch' patch can be applied to both upstream and REL_16 and all of the tasks finish successfully. 'v1-0001-Upgrade-Debian-CI-images-to-Bookworm_REL_15.patch' patch can be applied to REL_15 but it gives a compiler warning. The fix for this warning is proposed here [1]. After the fix is applied, all of the tasks finish successfully. Any kind of feedback would be appreciated. [0] https://github.com/anarazel/pg-vm-images/pull/91 [1] postgr.es/m/CAN55FZ0o9wqVoMTh_gJCmj_%2B4XbX9VXzQF8OySPZ0R1saxV3bA%40mail.gmail.com -- Regards, Nazir Bilal Yavuz Microsoft From 26f4b2425cb04dc7218142f24f151d3698f33191 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Mon, 13 May 2024 10:56:28 +0300 Subject: [PATCH v1] Upgrade Debian CI images to Bookworm New Debian version, namely Bookworm, is released. Use these new images in CI tasks. Perl version is upgraded in the Bookworm images, so update Perl version at 'Linux - Debian Bookworm - Meson' task as well. Upgrading Debian CI images to Bookworm PR: https://github.com/anarazel/pg-vm-images/pull/91 --- .cirrus.tasks.yml | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index 82261eccb85..d2ff833fcad 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -65,7 +65,7 @@ task: CPUS: 4 BUILD_JOBS: 8 TEST_JOBS: 8 -IMAGE_FAMILY: pg-ci-bullseye +IMAGE_FAMILY: pg-ci-bookworm CCACHE_DIR: ${CIRRUS_WORKING_DIR}/ccache_dir # no options enabled, should be small CCACHE_MAXSIZE: "150M" @@ -241,7 +241,7 @@ task: CPUS: 4 BUILD_JOBS: 4 TEST_JOBS: 8 # experimentally derived to be a decent choice -IMAGE_FAMILY: pg-ci-bullseye +IMAGE_FAMILY: pg-ci-bookworm CCACHE_DIR: /tmp/ccache_dir DEBUGINFOD_URLS: "https://debuginfod.debian.net; @@ -312,7 +312,7 @@ task: #DEBIAN_FRONTEND=noninteractive apt-get -y install ... 
matrix: -- name: Linux - Debian Bullseye - Autoconf +- name: Linux - Debian Bookworm - Autoconf env: SANITIZER_FLAGS: -fsanitize=address @@ -346,7 +346,7 @@ task: on_failure: <<: *on_failure_ac -- name: Linux - Debian Bullseye - Meson +- name: Linux - Debian Bookworm - Meson env: CCACHE_MAXSIZE: "400M" # tests two different builds @@ -373,7 +373,7 @@ task: ${LINUX_MESON_FEATURES} \ -Dllvm=disabled \ --pkg-config-path /usr/lib/i386-linux-gnu/pkgconfig/ \ --DPERL=perl5.32-i386-linux-gnu \ +-DPERL=perl5.36-i386-linux-gnu \ -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \ build-32 EOF @@ -648,7 +648,7 @@ task: env: CPUS: 4 BUILD_JOBS: 4 -IMAGE_FAMILY: pg-ci-bullseye +IMAGE_FAMILY: pg-ci-bookworm # Use larger ccache cache, as this task compiles with multiple compilers / # flag combinations -- 2.43.0 From d8f4b660bdd408fa22c857e24d1b4d05ff41df03 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Mon, 13 May 2024 11:55:48 +0300 Subject: [PATCH v1] Upgrade Debian CI images to Bookworm New Debian version, namely Bookworm, is released. Use these new images in CI tasks. 
Upgrading Debian CI images to Bookworm PR: https://github.com/anarazel/pg-vm-images/pull/91 --- .cirrus.tasks.yml | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index b03d128184d..a2735f8bb60 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -138,13 +138,13 @@ LINUX_CONFIGURE_FEATURES: _CONFIGURE_FEATURES >- task: - name: Linux - Debian Bullseye + name: Linux - Debian Bookworm env: CPUS: 4 BUILD_JOBS: 4 TEST_JOBS: 8 # experimentally derived to be a decent choice -IMAGE_FAMILY: pg-ci-bullseye +IMAGE_FAMILY: pg-ci-bookworm CCACHE_DIR: /tmp/ccache_dir DEBUGINFOD_URLS: "https://debuginfod.debian.net; @@ -451,12 +451,12 @@ task: # To limit unnecessary work only run this once the normal linux test succeeds depends_on: -- Linux - Debian Bullseye +- Linux - Debian Bookworm env: CPUS: 4 BUILD_JOBS: 4 -IMAGE_FAMILY: pg-ci-bullseye +IMAGE_FAMILY: pg-ci-bookworm # Use larger ccache cache, as this task compiles with multiple compilers / # flag combinations -- 2.43.0
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
On Mon, May 13, 2024 at 12:45 PM Dmitry Koval wrote: > 13.05.2024 11:45, Daniel Gustafsson wrote: > > Commit 3ca43dbbb67f which adds the permission checks seems to cause > > conflicts > > in the pg_upgrade tests > > Thanks! > > It will probably be enough to rename the roles: > > regress_partition_merge_alice -> regress_partition_split_alice > regress_partition_merge_bob -> regress_partition_split_bob Thanks to Daniel for spotting this. Thanks to Dmitry for the proposed fix. The actual problem appears to be a bit more complex. In addition to the role names, the lack of permissions on schemas led to the creation of tables in the public schema and a potential conflict there. Fixed in 2a679ae94e. -- Regards, Alexander Korotkov
Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS
Hi, > > Thanks. I see a few pieces of code that use special FOO_NUMBER enum > > values instead of a macro. Should we refactor these pieces > > accordingly? PFA another patch. > > I think this is a sensible improvement. > > But please keep the trailing commas on the last enum items. Thanks, fixed. -- Best regards, Aleksander Alekseev v2-0001-Use-macro-to-define-the-number-of-enum-values.patch Description: Binary data
Re: SQL:2011 application time
On 03.04.24 07:30, Paul Jungwirth wrote: But is it *literally* unique? Well two identical keys, e.g. (5, '[Jan24,Mar24)') and (5, '[Jan24,Mar24)'), do have overlapping ranges, so the second is excluded. Normally a temporal unique index is *more* restrictive than a standard one, since it forbids other values too (e.g. (5, '[Jan24,Feb24)')). But sadly there is one exception: the ranges in these keys do not overlap: (5, 'empty'), (5, 'empty'). With ranges/multiranges, `'empty' && x` is false for all x. You can add that key as many times as you like, despite a PK/UQ constraint:

postgres=# insert into t values ('[1,2)', 'empty', 'foo'), ('[1,2)', 'empty', 'bar');
INSERT 0 2
postgres=# select * from t;
  id   | valid_at | name
-------+----------+------
 [1,2) | empty    | foo
 [1,2) | empty    | bar
(2 rows)

Cases like this shouldn't actually happen for temporal tables, since empty is not a meaningful value. An UPDATE/DELETE FOR PORTION OF would never cause an empty. But we should still make sure they don't cause problems. We should give temporal primary keys an internal CHECK constraint saying `NOT isempty(valid_at)`. The problem is analogous to NULLs in parts of a primary key. NULLs prevent two identical keys from ever comparing as equal. And just as a regular primary key cannot contain NULLs, so a temporal primary key should not contain empties. The standard effectively prevents this with PERIODs, because a PERIOD adds a constraint saying start < end. But our ranges enforce only start <= end. If you say `int4range(4,4)` you get `empty`. If we constrain primary keys as I'm suggesting, then they are literally unique, and indisunique seems safer. Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm inclined toward no, just as we don't forbid NULLs in parts of a UNIQUE key. We should try to pick what gives users more options, when possible. Even if it is questionably meaningful, I can see use cases for allowing empty ranges in a temporal table.
For example it lets you "disable" a row, preserving its values but marking it as never true.

It looks like we missed some of these fundamental design questions early on, and it might be too late now to fix them for PG17.

For example, the discussion on unique constraints misses that the question of null values in unique constraints itself is controversial and that there is now a way to change the behavior. So I imagine there is also a selection of possible behaviors you might want for empty ranges. Intuitively, I don't think empty ranges are sensible for temporal unique constraints. But anyway, it's a bit late now to be discussing this.

I'm also concerned that if ranges have this fundamental incompatibility with periods, then the plan to eventually evolve this patch set to support standard periods will also have as-yet-unknown problems.

Some of these issues might be design flaws in the underlying mechanisms, like range types and exclusion constraints. Like, if you're supposed to use this for scheduling but you can use empty ranges to bypass exclusion constraints, how is one supposed to use this? Yes, a check constraint using isempty() might be the right answer. But I don't see this documented anywhere.

On the technical side, adding an implicit check constraint as part of a primary key constraint is quite a difficult implementation task, as I think you are discovering. I'm just reminded about how the patch for catalogued not-null constraints struggled with linking these not-null constraints to primary keys correctly. This sounds a bit similar.

I'm afraid that these issues cannot be resolved in good time for this release, so we should revert this patch set for now.
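[Editor's note] The `'empty' && x` behavior quoted above is easy to demonstrate outside the database as well. Below is a minimal Python sketch — an editor's illustration only; the `Range` class and `violates_exclusion` helper are invented here and are not PostgreSQL internals — of why a uniqueness check based on range overlap can never reject a row whose range is empty:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    lo: int
    hi: int

    @property
    def is_empty(self) -> bool:
        # int4range(4, 4) collapses to 'empty'; model that as lo >= hi
        return self.lo >= self.hi

    def overlaps(self, other: "Range") -> bool:
        # Mirrors '&&' on ranges: 'empty' && x is false for every x,
        # including another empty range
        if self.is_empty or other.is_empty:
            return False
        return self.lo < other.hi and other.lo < self.hi

def violates_exclusion(rows, new_id, new_range):
    """True if (new_id, new_range) conflicts with an existing row
    under an (id WITH =, valid_at WITH &&)-style check."""
    return any(rid == new_id and r.overlaps(new_range) for rid, r in rows)

rows = [(1, Range(1, 2))]
assert violates_exclusion(rows, 1, Range(1, 2))      # identical span: rejected
assert not violates_exclusion(rows, 1, Range(4, 4))  # empty range: accepted
rows.append((1, Range(4, 4)))
assert not violates_exclusion(rows, 1, Range(4, 4))  # accepted again: duplicate "keys"
```

This is exactly the gap a `NOT isempty(valid_at)` CHECK constraint would close: empty ranges would be rejected before the overlap test ever runs.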
Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid
Hi,

> Searching the archives I was unable to find any complaints, and this has been
> broken for the entire window of supported releases, so I propose we remove it
> as per the attached patch. If anyone is keen on making this work again for all
> the types where it makes sense, it can be resurrected (probably with a better
> implementation).
>
> Any objections to fixing this in 17 by removing it? (cc:ing Michael from the
> RMT)

+1

Something that is not documented or used by anyone (apparently) and is broken should just be removed.

--
Best regards,
Aleksander Alekseev
Re: UniqueKey v2
Andy Fan wrote:

> > * I think that, before EC is considered suitable for a UK, its
> >   ec_opfamilies field needs to be checked. I try to do that in
> >   find_ec_position_matching_expr(), see 0004.
>
> Could you make the reason clearer for adding 'List *opfamily_lists;'
> into UniqueKey? You said "This is needed to create ECs in the parent
> query if the upper relation represents a subquery." but I didn't get
> it. Since we need to maintain the UniqueKey in many places, I'd like
> to keep it as simple as possible. Of course, anything essential should
> be added for sure.

If unique keys are generated for a subquery output, they also need to be created for the corresponding relation in the upper query ("sub" in the following example):

select * from tab1 left join (select * from tab2) sub;

However, to create a unique key for "sub", you need an EC for each expression of the key. And to create an EC, you in turn need the list of operator families. Even if the parent query already had ECs for the columns of "sub" which are contained in the unique key, you need to make sure that those ECs are "compatible" with the ECs of the subquery which generated the unique key. That is, if an EC of the subquery considers certain input values equal, the EC of the parent query must also be able to determine if they are equal or not.

> > * RelOptInfo.notnullattrs
> >
> > My understanding is that this field contains the set of attributes whose
> > uniqueness is guaranteed by the unique key. They are acceptable because
> > they are either 1) not allowed to be NULL due to a NOT NULL constraint,
> > or 2) a NULL value makes the containing row filtered out, so the row
> > cannot break uniqueness of the output. Am I right?
> >
> > If so, I suggest changing the field name to something less generic, maybe
> > 'uniquekey_attrs' or 'uniquekey_candidate_attrs', and adding a comment
> > that more checks are needed before a particular attribute can actually be
> > used in UniqueKey.
> I don't think so, UniqueKey is just one of the places to use this
> not-null property, see 3af704098 for another use case of it.
>
> (Because of 3af704098, we should leverage notnullattnums somehow in this
> patch, which will be included in the next version as well).

In your patch you modify 'notnullattrs' in add_base_clause_to_rel(), but that does not happen to 'notnullattnums' in the current master branch. Thus I think that 'notnullattrs' is specific to the unique keys feature, so the field name should be less generic.

> > * uniquekey_useful_for_merging()
> >
> > How does uniqueness relate to merge join? In README.uniquekey you seem to
> > point out that a single row is always sorted, but I don't think this
> > function is related to that fact. (Instead, I'd expect that pathkeys are
> > added to all paths for a single-row relation, but I'm not sure you do that
> > in the current version of the patch.)
>
> The merging is for "mergejoinable join clauses", see function
> eclass_useful_for_merging. Usually I think of it as the operator
> "t1.a = t2.a";

My question is: why is the uniqueness important specifically to merge join? I understand that join evaluation can be more efficient if we know that one input relation is unique (i.e. we only scan that relation until we find the first match), but this is not specific to merge join.

> > * is_uniquekey_useful_afterjoin()
> >
> > Now that my patch (0004) allows propagation of the unique keys from a
> > subquery to the upper query, I was wondering if the UniqueKey structure
> > needs the 'use_for_distinct' field. I mean we should now propagate the
> > unique keys to the parent query whether the subquery has a DISTINCT
> > clause or not. I noticed that the field is checked by
> > is_uniquekey_useful_afterjoin(), so I changed the function to always
> > return true. However nothing changed in the output of regression tests
> > (installcheck). Do you insist that the 'use_for_distinct' field is needed?
I miss your answer to this comment.

> > * uniquekey_contains_multinulls()
> >
> > ** Instead of calling the function when trying to use the UK, how about
> >    checking the ECs when considering creation of the UK? If the tests
> >    fail, just don't create the UK.
>
> I don't think so since we maintain the UniqueKey from bottom to top, you
> can double check if my reason is appropriate.
>
> CREATE TABLE t1(a int);
> CREATE INDEX ON t1(a);
>
> SELECT distinct t1.a FROM t1 JOIN t2 using(a);
>
> We need to create the UniqueKey on the baserel for t1, and the NULL
> values are filtered out in the joinrel, so we have to create it allowing
> NULL values first.

ok

> > ** What does the 'multi' word in the function name mean?
>
> multi means multiple, I thought we use this short name in many places,
> e.g. bt_multi_page_stats after a quick search.

Why not simply uniquekey_contains_nulls() ?

Actually
Re: Direct SSL connection with ALPN and HBA rules
On Sun, 12 May 2024 at 23:39, Heikki Linnakangas wrote:
> You might miss that by changing sslnegotiation to 'postgres', or by
> removing it altogether, you not only made it compatible with older
> server versions, but you also allowed falling back to a plaintext
> connection. Maybe you're fine with that, but maybe not. I'd like to
> nudge people to use sslmode=require, not rely on implicit stuff like
> this just to make connection strings a little shorter.

I understand your worry, but I'm not sure that this is actually much of a security issue in practice. sslmode=prefer and sslmode=require are the same amount of insecure imho (i.e. extremely insecure). The only reason sslmode=prefer would connect as non-ssl to a server that supports ssl is if an attacker has write access to the network in the middle (i.e. eavesdropping ability alone is not enough). Once an attacker has this level of network access, it's trivial for this attacker to read any data sent to Postgres by intercepting the TLS handshake and doing TLS termination with some arbitrary cert (any cert is trusted by sslmode=require).

So the only actual case where this is a security issue I can think of is when an attacker has only eavesdropping ability on the network, and somehow the Postgres server that the client tries to connect to is configured incorrectly, so that no ssl is set up at all. Then a client would drop to plaintext when connecting to this server instead of erroring, and the attacker could now read the traffic.

But I don't really see this scenario end up any differently when requiring people to enter sslmode=require. The only action a user can take to connect to a server that does not have ssl support is to remove sslmode=require from the connection string. Except if they are also the server operator, in which case they could enable ssl on the server. But if ssl is not set up, then it was probably never set up, and thus providing sslnegotiation=direct would be to test if ssl works.
But if you disagree, I'm fine with erroring on plain sslnegotiation=direct.

> In v18, I'd like to make sslmode=require the default. Or maybe introduce
> a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we
> want to encourage encryption, that's the right way to do it. (I'd still
> recommend everyone to use an explicit sslmode=require in their
> connection strings for many years, though, because you might be using an
> older client without realizing it.)

I'm definitely a huge proponent of making the connection defaults more secure. But as described above, sslmode=require is still extremely insecure and I don't think we gain anything substantial by making it the default. I think the only useful secure default would be to use sslmode=verify-full (with probably some automatic fallback to sslmode=prefer when connecting to hardcoded IPs or localhost). Which probably means that sslrootcert=system should also be made the default. Which would mean that ~/.postgresql/root.crt would not be the default anymore, which I personally think is fine but others likely disagree.
Re: 039_end_of_wal: error in "xl_tot_len zero" test
On 13/05/2024 00:39, Tom Lane wrote:
> Hm. It occurs to me that there *is* a system-specific component to the
> amount of WAL emitted during initdb: the number of locales that
> "locale -a" prints translates directly to the number of rows inserted
> into pg_collation. [...]

Yes. Another system-specific circumstance affecting WAL position is the name length of the unix user doing initdb. I've seen 039_end_of_wal failing consistently under one user name but working fine under another, both on the same machine at the same time.

To be more precise, on one particular machine under those particular circumstances (including set of locales) it would work for any username with length < 8 or >= 16, but would fail for length 8..15 (in bytes, not characters, if non-ASCII usernames were used).

--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?
Hi,

do I interpret your idea correctly: you want some sort of ordering without ordering?

Kind regards
WW

On Monday, May 13, 2024 at 10:40:38 CEST, aa wrote:

Hello Everyone!

Is there any chance to get some kind of a result set sifting mechanism in Postgres? What I am looking for is a way to get, for example, "nulls last" in a result set, without having to call "order by" or having to use UNION ALL, and if possible to get this in a single result set pass. Something along this line:

SELECT a, b, c FROM my_table WHERE a nulls last OFFSET 0 LIMIT 25

I don't want to use order by or union all because these are time-consuming operations, especially on large data sets and when comparisons are done on dynamic values (e.g. geolocation distances between a mobile and a static location).

What I would expect from such a feature would be speeds comparable with non-sorted selects, while getting a very rudimentary ordering. A use case for such a mechanism would be the implementation of QUICK relevant search results for a search engine.

I'm not familiar with how Postgres logic handles simple select queries, but the way I would envision a result set sifting logic would be to collect the result set in 2 separate lists, based on the sifting condition, and then concatenate these 2 lists and return the result, when the pagination request conditions are met.

Any idea if such a functionality is feasible? Thank you.

PS: if ever implemented, the sifting mechanism could be extended to accommodate any type of thresholds, not just null values.
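[Editor's note] The two-list pass the original poster envisions is straightforward to sketch in ordinary code. This Python mock-up — the name `sift_nulls_last` and its signature are invented for illustration, and it says nothing about how the PostgreSQL executor would actually implement such a feature — partitions rows in a single pass and applies OFFSET/LIMIT afterwards:

```python
def sift_nulls_last(rows, key, offset=0, limit=None):
    """Single-pass 'sift': rows whose sift key is NULL (None here) go
    last, with no comparison-based sorting of either group."""
    present, missing = [], []
    for row in rows:  # one pass over the result set
        (missing if row.get(key) is None else present).append(row)
    sifted = present + missing  # rudimentary ordering: non-NULLs first
    end = None if limit is None else offset + limit
    return sifted[offset:end]

rows = [{"a": None, "b": 1}, {"a": 5, "b": 2},
        {"a": None, "b": 3}, {"a": 7, "b": 4}]
result = sift_nulls_last(rows, "a", offset=0, limit=25)
assert [r["b"] for r in result] == [2, 4, 1, 3]  # non-NULL 'a' first
```

Within each group the original scan order is preserved, which matches the poster's expectation of speeds comparable with unsorted selects: the cost is one extra list append per row, not a sort.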
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi!

13.05.2024 11:45, Daniel Gustafsson writes:
> Commit 3ca43dbbb67f which adds the permission checks seems to cause
> conflicts in the pg_upgrade tests

Thanks! It will probably be enough to rename the roles:

regress_partition_merge_alice -> regress_partition_split_alice
regress_partition_merge_bob -> regress_partition_split_bob

(changes in attachment)

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

From f307add79397bcfe20f7c779e77630fb0df73020 Mon Sep 17 00:00:00 2001
From: Koval Dmitry
Date: Mon, 13 May 2024 12:32:43 +0300
Subject: [PATCH v1] Rename roles to avoid conflicts in concurrent work

---
 src/test/regress/expected/partition_split.out | 20 +--
 src/test/regress/sql/partition_split.sql      | 20 +--
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/src/test/regress/expected/partition_split.out b/src/test/regress/expected/partition_split.out
index b1108c92a2..999e923ef1 100644
--- a/src/test/regress/expected/partition_split.out
+++ b/src/test/regress/expected/partition_split.out
@@ -1516,33 +1516,33 @@ DROP TABLE t;
 DROP ACCESS METHOD partition_split_heap;
 -- Test permission checks. The user needs to own the parent table and the
 -- the partition to split to do the split.
-CREATE ROLE regress_partition_merge_alice;
-CREATE ROLE regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_alice;
+CREATE ROLE regress_partition_split_alice;
+CREATE ROLE regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_alice;
 CREATE TABLE t (i int) PARTITION BY RANGE (i);
 CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1), PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 ERROR:  must be owner of table t
 RESET SESSION AUTHORIZATION;
-ALTER TABLE t OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE t OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1), PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 ERROR:  must be owner of table tp_0_2
 RESET SESSION AUTHORIZATION;
-ALTER TABLE tp_0_2 OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE tp_0_2 OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1), PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 RESET SESSION AUTHORIZATION;
 DROP TABLE t;
-DROP ROLE regress_partition_merge_alice;
-DROP ROLE regress_partition_merge_bob;
+DROP ROLE regress_partition_split_alice;
+DROP ROLE regress_partition_split_bob;
 -- Check extended statistics aren't copied from the parent table to new
 -- partitions.
 CREATE TABLE t (i int, j int) PARTITION BY RANGE (i);
diff --git a/src/test/regress/sql/partition_split.sql b/src/test/regress/sql/partition_split.sql
index 7f231b0d39..be4a785b75 100644
--- a/src/test/regress/sql/partition_split.sql
+++ b/src/test/regress/sql/partition_split.sql
@@ -896,36 +896,36 @@ DROP ACCESS METHOD partition_split_heap;
 -- Test permission checks. The user needs to own the parent table and the
 -- the partition to split to do the split.
-CREATE ROLE regress_partition_merge_alice;
-CREATE ROLE regress_partition_merge_bob;
+CREATE ROLE regress_partition_split_alice;
+CREATE ROLE regress_partition_split_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_alice;
+SET SESSION AUTHORIZATION regress_partition_split_alice;
 CREATE TABLE t (i int) PARTITION BY RANGE (i);
 CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1), PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 RESET SESSION AUTHORIZATION;
-ALTER TABLE t OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE t OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1), PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 RESET SESSION AUTHORIZATION;
-ALTER TABLE tp_0_2 OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE tp_0_2 OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1), PARTITION tp_1_2
Re: Allowing additional commas between columns, and at the end of the SELECT clause
On Mon, 13 May 2024 at 10:42, Artur Formella wrote:
> Motivation:
> Commas of this type are allowed in many programming languages, in some
> it is even recommended to use them at the ends of lists or objects.

Single trailing commas are a feature that's more and more common in languages, yes, but arbitrary excess commas is new to me. Could you provide some examples of popular languages which have that, as I can't think of any.

> Accepted:
> SELECT 1,;
> SELECT 1,;
> SELECT *, from information_schema.sql_features;
> (...) RETURNING a,,b,c,;
>
> Not accepted:
> SELECT ,;
> SELECT ,1;
> SELECT ,,,;
>
> Advantages:
> - simplifies the creation and debugging of queries by reducing the most
>   common syntax error,
> - eliminates the need to use the popular `1::int as dummy` at the end of
>   a SELECT list,

This is the first time I've heard of this `1 as dummy`.

> - simplifies query generators,
> - the query is still deterministic,

What part of a query would (or would not) be deterministic? I don't think I understand the potential concern here. Is it about whether the statement can be parsed deterministically?

> Disadvantages:
> - counting of returned columns can be difficult,
> - syntax checkers will still report errors,
> - probably not SQL standard compliant,

I'd argue you'd better raise this with the standard committee if this isn't compliant. I don't see enough added value to break standard compliance here, especially when the standard may at some point allow only a single trailing comma (and not arbitrarily many).

> What do you think?

Do you expect `SELECT 1,,,` to have an equivalent query identifier to `SELECT 1;` in pg_stat_statements? Why, or why not?

Overall, I don't think unlimited commas is a good feature. A trailing comma in the select list would be less problematic, but I'd still want to follow the standard first and foremost.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
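[Editor's note] On the "popular languages" question: Python, for instance, accepts a single trailing comma in lists, tuples, and call argument lists, but rejects doubled or leading commas outright — which is exactly the distinction Matthias draws between a trailing comma and arbitrary excess commas. A quick check (editor's illustration):

```python
import ast

# A single trailing comma parses fine
assert ast.literal_eval("[1, 2, 3,]") == [1, 2, 3]

# A doubled comma is a syntax error, as is a leading one
for bad in ("[1, 2,, 3]", "[, 1]"):
    try:
        ast.literal_eval(bad)
        raise AssertionError(f"{bad!r} unexpectedly parsed")
    except SyntaxError:
        pass  # expected: Python rejects excess commas
```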