Bibliography section, some references cannot be found

2024-05-13 Thread jian he
hi.

while reading this[1],
<< More information about partial indexes can be found in [ston89b],
[olson93], and [seshadri95].
I googled around but still cannot find a PDF or HTML link for [olson93].


In [2], I found
[ong90] “A Unified Framework for Version Modeling Using Production
Rules in a Database System”. L. Ong and J. Goh. ERL Technical
Memorandum M90/33. University of California. Berkeley, California.
April, 1990.
The related link is
https://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/1466.html


An idea: in case the external links in
(https://www.postgresql.org/docs/current/biblio.html) ever become dead
links, we could post the external PDF or HTML files to the mailing list
for archival purposes.

[1] https://www.postgresql.org/docs/current/indexes-partial.html
[2] https://www.postgresql.org/docs/current/biblio.html




Re: explain format json, unit for serialize and memory are different.

2024-05-13 Thread Michael Paquier
On Mon, May 13, 2024 at 11:22:08AM +0200, Daniel Gustafsson wrote:
> Since json (and yaml/xml) is intended to be machine-readable I think we use a
> single unit for all values, and document this fact.

Agreed about the documentation gap.  Another thing that could be worth
considering is to add the units alongside the numerical values, say:
"Memory Used": {"value": 23432, "units": "bytes"}

That would require changing ExplainProperty() so that the units are
shown in some shape, while still being readable when parsed.  I
wouldn't recommend doing that in v17, but perhaps v18 could do better?
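
To illustrate, here is a rough sketch of what that could look like in
explain.c (the helper name and its placement are assumptions, not
existing code; only the Explain* calls it uses exist today):

/*
 * Sketch only: emit a property together with an explicit unit.  TEXT
 * format keeps today's "label: value unit" layout; the machine-readable
 * formats get a nested group, e.g. in JSON:
 *   "Memory Used": {"value": 23432, "units": "bytes"}
 */
static void
ExplainPropertyWithUnits(const char *qlabel, const char *units,
						 int64 value, ExplainState *es)
{
	if (es->format == EXPLAIN_FORMAT_TEXT)
		ExplainPropertyInteger(qlabel, units, value, es);
	else
	{
		ExplainOpenGroup("Property", qlabel, true, es);
		ExplainPropertyInteger("value", NULL, value, es);
		ExplainPropertyText("units", units, es);
		ExplainCloseGroup("Property", qlabel, true, es);
	}
}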

Units are also ignored for the XML and yaml outputs.
--
Michael


signature.asc
Description: PGP signature


Re: SQL:2011 application time

2024-05-13 Thread jian he
On Tue, May 14, 2024 at 7:30 AM Paul Jungwirth
 wrote:
>
> On 5/13/24 03:11, Peter Eisentraut wrote:
> > It looks like we missed some of these fundamental design questions early 
> > on, and it might be too
> > late now to fix them for PG17.
> >
> > For example, the discussion on unique constraints misses that the question 
> > of null values in unique
> > constraints itself is controversial and that there is now a way to change 
> > the behavior.  So I
> > imagine there is also a selection of possible behaviors you might want for 
> > empty ranges.
> > Intuitively, I don't think empty ranges are sensible for temporal unique 
> > constraints.  But anyway,
> > it's a bit late now to be discussing this.
> >
> > I'm also concerned that if ranges have this fundamental incompatibility 
> > with periods, then the plan
> > to eventually evolve this patch set to support standard periods will also 
> > have as-yet-unknown problems.
> >
> > Some of these issues might be design flaws in the underlying mechanisms, 
> > like range types and
> > exclusion constraints.  Like, if you're supposed to use this for scheduling 
> > but you can use empty
> > ranges to bypass exclusion constraints, how is one supposed to use this?  
> > Yes, a check constraint
> > using isempty() might be the right answer.  But I don't see this documented 
> > anywhere.
> >
> > On the technical side, adding an implicit check constraint as part of a 
> > primary key constraint is
> > quite a difficult implementation task, as I think you are discovering.  I'm 
> > just reminded about how
> > the patch for catalogued not-null constraints struggled with linking these 
> > not-null constraints to
> > primary keys correctly.  This sounds a bit similar.
> >
> > I'm afraid that these issues cannot be resolved in good time for this 
> > release, so we should revert
> > this patch set for now.
>
> I think reverting is a good idea. I'm not really happy with the CHECK 
> constraint solution either.
> I'd be happy to have some more time to rework this for v18.
>
> A couple alternatives I'd like to explore:
>
> 1. Domain constraints instead of a CHECK constraint. I think this is probably 
> worse, and I don't
> plan to spend much time on it, but I thought I'd mention it in case someone 
> else thought otherwise.
>
> 2. A slightly different overlaps operator, say &&&, where 'empty' &&& 'empty' 
> is true. But 'empty'
> with anything else could still be false (or not). That operator would prevent 
> duplicates in an
> exclusion constraint. This also means we could support more types than just 
> ranges & multiranges. I
> need to think about whether this combines badly with existing operators, but 
> if not it has a lot of
> promise. If anything it might be *less* contradictory, because it fits better 
> with 'empty' @>
> 'empty', which we say is true.
>
Thanks for the idea; I played around with it roughly and it seems doable,
but the timing is not good, so reverting is a good idea.


I also checked commit 6db4598fcb82a87a683c4572707e522504830a2b:
+
+/*
+ * Returns the btree number for equals, otherwise invalid.
+ */
+Datum
+gist_stratnum_btree(PG_FUNCTION_ARGS)
+{
+	StrategyNumber strat = PG_GETARG_UINT16(0);
+
+	switch (strat)
+	{
+		case RTEqualStrategyNumber:
+			PG_RETURN_UINT16(BTEqualStrategyNumber);
+		case RTLessStrategyNumber:
+			PG_RETURN_UINT16(BTLessStrategyNumber);
+		case RTLessEqualStrategyNumber:
+			PG_RETURN_UINT16(BTLessEqualStrategyNumber);
+		case RTGreaterStrategyNumber:
+			PG_RETURN_UINT16(BTGreaterStrategyNumber);
+		case RTGreaterEqualStrategyNumber:
+			PG_RETURN_UINT16(BTGreaterEqualStrategyNumber);
+		default:
+			PG_RETURN_UINT16(InvalidStrategy);
+	}
+}
the comment doesn't seem right?
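
For illustration, a comment wording that would match what the function
actually does (just a suggestion, not committed text):

/*
 * Returns the btree strategy number corresponding to the given RT
 * strategy number, or InvalidStrategy if there is no equivalent.
 */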




Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid

2024-05-13 Thread Michael Paquier
On Mon, May 13, 2024 at 01:01:21PM +0300, Aleksander Alekseev wrote:
>> Any objections to fixing this in 17 by removing it? (cc:ing Michael from the 
>> RMT)
> 
> +1 Something that is not documented or used by anyone (apparently) and
> is broken should just be removed.

8a02339e9ba3 sounds like a good enough argument to prove there is no
demand in the field for being able to support options through initdb
--auth, and this does not concern only pam.  If somebody is interested
in that, that could always be done later.  My take is that this would
be simpler if implemented through a separate option, leaving the
checks between the options and the auth method up to the postmaster
when loading pg_hba.conf at startup.

Hence, no objections to cleaning that up now.  Thanks for asking.
--
Michael


signature.asc
Description: PGP signature


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Peter Smith
Hi Kuroda-san, Here are some review comments for all patches v9*

//
Patch v9-0001
//

There were no changes since v8-0001, so no comments.

//
Patch v9-0002
//

==
Commit Message

2.1.
Regarding the off->on case, the logical replication already has a
mechanism for it, so there is no need to do anything special for the
on->off case; however, we must connect to the publisher and expressly
change the parameter. The operation cannot be rolled back, and
altering the parameter from "on" to "off" within a transaction is
prohibited.

In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time.

~

This explanation seems to be going around in circles, without giving
any new information:

AFAICT, "Regarding the off->on case, the logical replication already
has a mechanism for it, so there is no need to do anything special for
the on->off case;"

is saying pretty much the same as:

"In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time."

But, what I hoped for in previous review comments was an explanation
somewhat less vague than "already has a mechanism" or "already had the
mechanism". Can't this have just 1 or 2 lines to say WHAT is that
existing mechanism for the "off" to "on" case, and WHY that means
there is nothing special to do in that scenario?

==
src/backend/commands/subscriptioncmds.c

2.2. AlterSubscription

  /*
- * The changed two_phase option of the slot can't be rolled
- * back.
+ * Since altering the two_phase option of subscriptions
+ * also leads to changing the slot option, this command
+ * cannot be rolled back. So prevent this if we are in a
+ * transaction block. In the opposite case, there is no
+ * need to prevent this because the logical replication
+ * worker already had the mechanism to alter the slot
+ * option at a convenient time.
  */

(Same previous review comments, and same as my review comment for the
commit message above).

I don't think "already had the mechanism" is enough explanation.

Also, the 2nd sentence doesn't make sense here because the comment
only said "altering the slot option" -- it didn't say it was altering
it to "on" or altering it to "off", so "the opposite case" has no
meaning.

~~~

2.3. AlterSubscription

  /*
- * Try to acquire the connection necessary for altering slot.
+ * Check the need to alter the replication slot. Failover and two_phase
+ * options are controlled by both the publisher (as a slot option) and the
+ * subscriber (as a subscription option). The slot option must be altered
+ * only when changing "on" to "off". Because in opposite case, the logical
+ * replication worker already has the mechanism to do so at a convenient
+ * time.
+ */
+ update_failover = replaces[Anum_pg_subscription_subfailover - 1];
+ update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] &&
+ !opts.twophase);

This is again the same as other review comments above. Probably, when
some better explanation can be found for "already has the mechanism to
do so at a convenient time." then all of these places can be changed
using similar text.

//
Patch v9-0003
//

There are some imperfect code comments but AFAIK they are the same
ones from patch 0002. I think patch 0003 is just moving those comments
to different places, so probably they would already be addressed by
patch 0002.

//
Patch v9-0004
//

==
doc/src/sgml/catalogs.sgml

4.1.
+ 
+  
+   subforcealter bool
+  
+  
+   If true, the subscription can be altered two_phase
+   option, even if there are prepared transactions
+  
+ 
+

BEFORE
If true, the subscription can be altered two_phase
option, even if there are prepared transactions

SUGGESTION
If true, then the ALTER SUBSCRIPTION command can disable
two_phase option, even if there are uncommitted
prepared transactions from when two_phase was
enabled

==
doc/src/sgml/ref/alter_subscription.sgml

4.2.
-
- 
-  The two_phase parameter can only be altered when the
-  subscription is disabled. When altering the parameter from
on
-  to off, the backend process checks for any incomplete
-  prepared transactions done by the logical replication worker (from when
-  two_phase parameter was still on)
-  and, if any are found, those are aborted.
- 

Well, I still think there ought to be some mention of the relationship
between 'force_alter' and 'two_phase' given on this ALTER SUBSCRIPTION
page. Then the user can cross-reference to read what the 'force_alter'
actually does.

==
doc/src/sgml/ref/create_subscription.sgml

4.3.
+
+   
+force_alter (boolean)
+
+ 
+  Specifies whether the subscription can be altered
+  two_phase option, even if there are prepared

Re: JIT compilation per plan node

2024-05-13 Thread David Rowley
On Fri, 15 Mar 2024 at 10:13, Tomas Vondra
 wrote:
> To clarify, I think the patch is a step in the right direction, and a
> meaningful improvement. It may not be the perfect solution we imagine
> (but who knows how far we are from that), but AFAIK moving these
> decisions to the node level is something the ideal solution would need
> to do too.

I got thinking about this patch again while working on [1]. I want to
write this down as I don't quite have time to get fully back into this
right now...

Currently, during execution, ExecCreateExprSetupSteps() traverses the
Node tree of the Expr to figure out the max varattno for each slot.
That's done so all of the tuple deforming happens at once rather than
incrementally. Figuring out the max varattno is a price we have to pay
for every execution of the query. I think we'd be better off doing
that in the planner.

To do this, I thought that setrefs.c could do this processing in
fix_join_expr / fix_upper_expr and wrap up the expression in a new
Node type that stores the max varattno for each special var type.

This idea is related to this discussion because another thing that
could be stored in the very same struct is the "num_exec" value.   I
feel the number of executions of an ExprState is a better gauge of how
useful JIT will be than the cost of the plan node. Now, looking at
set_join_references(), the execution estimates are not exactly
perfect. For example:

#define NUM_EXEC_QUAL(parentplan)   ((parentplan)->plan_rows * 2.0)

that's not a great estimate for a Nested Loop's joinqual, but we could
easily make efforts to improve those and that could likely be done
independently and concurrently with other work to make JIT more
granular.
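
For illustration, a rough sketch of what such a wrapper node might
carry (all names and fields here are hypothetical, not existing code):

/*
 * Hypothetical wrapper produced by setrefs.c around each finished
 * expression.  It carries the pre-computed tuple-deforming requirements
 * and the planner's execution-count estimate, so executor startup no
 * longer needs to walk the expression tree and the JIT decision can be
 * made per expression rather than per plan node.
 */
typedef struct PlannedExpr
{
	NodeTag		type;
	Expr	   *expr;			/* the expression itself */
	AttrNumber	last_inner;		/* max varattno referencing the inner slot */
	AttrNumber	last_outer;		/* max varattno referencing the outer slot */
	AttrNumber	last_scan;		/* max varattno referencing the scan slot */
	double		num_exec;		/* estimated evaluations, e.g. NUM_EXEC_QUAL() */
} PlannedExpr;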

The problem with doing this is that there's just a huge amount of code
churn in the executor.  I am keen to come up with a prototype so I can
get a better understanding of whether this is a practical solution.  I
don't want to go down that path if it's just me that thinks the number
of times an ExprState is evaluated is a better measure to go on for
JIT vs no JIT than the plan node's total cost.

Does anyone have any thoughts on that idea?

David

[1] 
https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm=m0L+Gc=T=ke...@mail.gmail.com




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Peter Smith
Hi Kuroda-san,

I'm having second thoughts about how these patches mention the option
values "on|off". These are used in the ALTER SUBSCRIPTION document
page for 'two_phase' and 'failover' parameters, and then those
"on|off" get propagated to the code comments, error messages, and
tests...

Now I see that on the CREATE SUBSCRIPTION page [1], every boolean
parameter (even including 'two_phase' and 'failover') is described in
terms of "true|false" (not "on|off").

In hindsight, it is probably better to refer only to true|false
everywhere for these boolean parameters, instead of sometimes using
different values like on|off.

What do you think?

==
[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: UniqueKey v2

2024-05-13 Thread Andy Fan


>> Consider join of tables "a", "b" and "c". My understanding is that
>> make_join_rel() is called once with rel1={a} and rel2={b join c}, then with
>> rel1={a join b} and rel2={c}, etc. I wanted to say that each call should
>> produce the same set of unique keys.
>> 
>> I need to check this part more in detail.
>
> I think I understand now. By calling populate_joinrel_uniquekeys() for various
> orderings, you can find out that various input relation unique keys can
> represent the whole join. For example, if the ordering is
>
> A JOIN (B JOIN C)
>
> you can prove that the unique keys of A can be used for the whole join, while
> for the ordering
>
> B JOIN (A JOIN C)
>
> you can prove the same for the unique keys of B, and so on.

Yes.

>> > We don't create the component uniquekey if any one of the two
>> > sides is unique already. For example:
>> > 
>> > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is
>> > unique", there is no need to create component UniqueKey (t1.id, t2.id);  
>> 
>> ok, I need to check more in detail how this part works.
>
> This optimization makes sense to me.

OK.

>> > >
>> > >   Of course one problem is that the number of combinations can grow
>> > >   exponentially as new relations are joined.
>> > 
>> > Yes, that's why "rule 1" needed and "How to reduce the overhead" in
>> > UniqueKey.README is introduced. 
>
> I think there should yet be some guarantee that the number of unique keys does
> not grow exponentially. Perhaps a constant that allows a relation (base or
> join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3
> or 4.) And when picking the "best N keys", one criterion could be the number
> of expressions in the key (the shorter key the better).

I don't want to introduce this complexity right now. I'm more
interested in how to work with them effectively. The main effort includes:

- the design of a bitmapset which is memory-usage friendly and easy to
combine.
- optimizing the single-row cases to reduce N UniqueKeys to 1 UniqueKey.

I hope we can pay more attention to this optimization (at most N
UniqueKeys) once the major infrastructure has been done.

>> > You are correct and IMO my current code are able to tell it is a single
>> > row as well.
>> > 
>> > 1. Since t1.id = 1, so t1 is single row, so t1.d is uniquekey as a
>> > consequence.
>> > 2. Given t2.id is unique, t1.e = t2.id so t1's uniquekey can be kept
>> > after the join because of rule 1 on joinrel. and t1 is singlerow, so the
>> > joinrel is singlerow as well.
>
> ok, I think I understand now.

OK.

Lastly, this is probably my first non-trivial patch set with multiple
authors. I don't want to be the bottleneck for the cooperation, so if
you need something done sooner, please don't hesitate to ask me for it
explicitly.

Here is my schedule for this: I can provide the next version, based on
our discussion and your patches, early next week, and update
UniqueKey.README to make the overall design clearer. What I hope you
will pay more attention to is UniqueKey.README, besides the code. I
hope UniqueKey.README can greatly reduce the effort for others to
understand the overall design.

-- 
Best Regards
Andy Fan





Re: UniqueKey v2

2024-05-13 Thread Andy Fan


Antonin Houska  writes:

>> Could you make the reason clearer for adding 'List *opfamily_lists;'
>> into UniqueKey?  You said "This is needed to create ECs in the parent
>> query if the upper relation represents a subquery." but I didn't get
>> it. Since we need to maintain the UniqueKey in many places, I'd like
>> to keep it as simple as possible. Of course, anything essential should
>> be added for sure.
>
> If unique keys are generated for a subquery output, they also need to be
> created for the corresponding relation in the upper query ("sub" in the
> following example):

OK.
>
> select * from tab1 left join (select * from tab2) sub;
>
> However, to create an unique key for "sub", you need an EC for each expression
> of the key.

OK.
> And to create an EC, you in turn need the list of operator
> families.

I'm wondering whether we need to "create" any EC. Can you find a use case
where the outer EC is missing and the UniqueKey is still interesting? I
don't have an example now.

convert_subquery_pathkeys has a similar situation and contains the
following code:

outer_ec =
	get_eclass_for_sort_expr(root,
							 (Expr *) outer_var,
							 sub_eclass->ec_opfamilies,
							 sub_member->em_datatype,
							 sub_eclass->ec_collation,
							 0,
							 rel->relids,
							 NULL,
							 false);

/*
 * If we don't find a matching EC, sub-pathkey isn't
 * interesting to the outer query
 */
if (outer_ec)
	best_pathkey =
		make_canonical_pathkey(root,
							   outer_ec,
							   sub_pathkey->pk_opfamily,
							   sub_pathkey->pk_strategy,
							   sub_pathkey->pk_nulls_first);
}

> Even if the parent query already had ECs for the columns of "sub" which are
> contained in the unique key, you need to make sure that those ECs are
> "compatible" with the ECs of the subquery which generated the unique key. That
> is, if an EC of the subquery considers certain input values equal, the EC of
> the parent query must also be able to determine if they are equal or not.
>
>> > * RelOptInfo.notnullattrs
>> >
>> >   My understanding is that this field contains the set attributes whose
>> >   uniqueness is guaranteed by the unique key. They are acceptable because 
>> > they
>> >   are either 1) not allowed to be NULL due to NOT NULL constraint or 2) 
>> > NULL
>> >   value makes the containing row filtered out, so the row cannot break
>> >   uniqueness of the output. Am I right?
>> >
>> >   If so, I suggest to change the field name to something less generic, 
>> > maybe
>> >   'uniquekey_attrs' or 'uniquekey_candidate_attrs', and adding a comment 
>> > that
>> >   more checks are needed before particular attribute can actually be used 
>> > in
>> >   UniqueKey.
>> 
>> I don't think so; UniqueKey is just one of the places to use this
>> not-null property, see 3af704098 for another use case of it.
>> 
>> (Because of 3af704098, we should leverage notnullattnums somehow in this
>> patch, which will be included in the next version as well).
>
> In your patch you modify 'notnullattrs' in add_base_clause_to_rel(), but that
> does not happen to 'notnullattnums' in the current master branch. Thus I think
> that 'notnullattrs' is specific to the unique keys feature, so the field name
> should be less generic.

OK.

>> >
>> > * uniquekey_useful_for_merging()
>> >
>> >   How does uniqueness relate to merge join? In README.uniquekey you seem to
>> >   point out that a single row is always sorted, but I don't think this
>> >   function is related to that fact. (Instead, I'd expect that pathkeys are
>> >   added to all paths for a single-row relation, but I'm not sure you do 
>> > that
>> >   in the current version of 

Underscore in positional parameters?

2024-05-13 Thread Erik Wienhold
I noticed that we (kind of) accept underscores in positional parameters.
For example, this works:

=> PREPARE p1 AS SELECT $1_2;
PREPARE
=> EXECUTE p1 (123);
 ?column?
----------
      123
(1 row)

Parameter $1_2 is taken as $1 because in rule {param} in scan.l we get
the parameter number with atol, which stops at the underscore.  That's a
regression in faff8f8e47f.  Before that commit, $1_2 resulted in
"ERROR: trailing junk after parameter".

I can't tell which fix is the way to go: (1) accept underscores without
using atol, or (2) just forbid underscores.  Any ideas?

atol can be replaced with pg_strtoint32_safe to handle the underscores.
This also avoids atol's undefined behavior on overflows.  AFAICT,
positional parameters are not part of the SQL standard, so nothing
prevents us from accepting underscores here as well.  The attached patch
does that and also adds a test case.

But reverting {param} to its old form to forbid underscores also makes
sense.  That is:

param   \${decdigit}+
param_junk  \${decdigit}+{ident_start}

It seems very unlikely that anybody uses that many parameters and still
cares enough about readability to use underscores.  But maybe users simply
expect that underscores are valid here as well.

-- 
Erik
>From 0f319818360cdd26e96198ea7fcaba1b8298f264 Mon Sep 17 00:00:00 2001
From: Erik Wienhold 
Date: Tue, 14 May 2024 04:14:08 +0200
Subject: [PATCH] Fix parsing of positional parameters with underscores

Underscores were added to numeric literals in faff8f8e47.  That also
introduced a regression whereby positional parameters accepted
underscores as well.  But such parameter numbers were not parsed
correctly by atol which stops at the first non-digit character.  So
parameter $1_2 would be taken as just $1.  Fix that by replacing atol
with pg_strtoint32_safe.  Thereby we also avoid the undefined behavior
of atol on overflows.
---
 src/backend/parser/scan.l| 5 -
 src/test/regress/expected/numerology.out | 1 +
 src/test/regress/sql/numerology.sql  | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 5eaadf53b2..ebb7ace9ca 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -992,7 +992,10 @@ other			.
 
 {param}			{
 	SET_YYLLOC();
-	yylval->ival = atol(yytext + 1);
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+	yylval->ival = pg_strtoint32_safe(yytext + 1, (Node *) &escontext);
+	if (escontext.error_occurred)
+		yyerror("parameter too large");
 	return PARAM;
 }
 {param_junk}	{
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index f662a5050a..3c1308e22c 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -297,6 +297,7 @@ SELECT 1_000.5e0_1;
 10005
 (1 row)
 
+PREPARE p2 AS SELECT $0_1;
 -- error cases
 SELECT _100;
 ERROR:  column "_100" does not exist
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 1941c58e68..11af76828d 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -77,6 +77,8 @@ SELECT 1_000.;
 SELECT .000_005;
 SELECT 1_000.5e0_1;
 
+PREPARE p2 AS SELECT $0_1;
+
 -- error cases
 SELECT _100;
 SELECT 100_;
-- 
2.45.0



Re: First draft of PG 17 release notes

2024-05-13 Thread Andy Fan


Bruce Momjian  writes:

> On Sat, May 11, 2024 at 01:27:25PM +0800, Andy Fan wrote:
>> 
>> Hello Bruce,
>> 
>> > I have committed the first draft of the PG 17 release notes;  you can
>> > see the results here:
>> >
>> >https://momjian.us/pgsql_docs/release-17.html
>> 
>> Thank you for working on this!
>> 
>> > I welcome feedback.  For some reason it was an easier job than usual.
>> 
>> Do you think we need to add the following 2 items?
>> 
>> - 9f133763961e280d8ba692bcad0b061b861e9138 this is an optimizer
>>   transform improvement.
>
> It was unclear from the commit message exactly what user-visible
> optimization this allowed.  Do you have details?

Yes, it allows a query like "SELECT * FROM t1 WHERE t1.a in (SELECT a
FROM t2 WHERE t2.b = t1.b)" to be pulled up into a semi-join, hence more
join methods / join orders are possible.

>
>> - a8a968a8212ee3ef7f22795c834b33d871fac262 this is an optimizer costing
>>   improvement.
>
> Does this allow faster UNION ALL with LIMIT, perhaps?

Yes, for example:  (subquery-1) UNION ALL (subquery-2) LIMIT n;

When planning subquery-1 or subquery-2, the LIMIT N should be taken into
account. As a consequence, maybe a hash join should be replaced with a
Nested Loop. Before this commit, the limit was ignored if the query was
flattened into an appendrel, and that flattening happens very often.

David provided a summary of both commits in [1].

[1]
https://www.postgresql.org/message-id/CAApHDvqAQgq27LgYmJ85VVGTR0%3DhRW6HHq2oZgK0ZiYC_a%2BEww%40mail.gmail.com
 

-- 
Best Regards
Andy Fan





Re: First draft of PG 17 release notes

2024-05-13 Thread Tender Wang
jian he  于2024年5月9日周四 18:00写道:

> On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
> >
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
> >
>
> another potential incompatibilities issue:
> ALTER TABLE DROP PRIMARY KEY
>
> see:
>
> https://www.postgresql.org/message-id/202404181849.6frtmajobe27%40alvherre.pgsql
>
>
Since Alvaro has reverted all changes to not-null constraints, there
will be no potential incompatibility issue.
-- 
Tender Wang
OpenPie:  https://en.openpie.com/


Re: First draft of PG 17 release notes

2024-05-13 Thread Bruce Momjian
On Sat, May 11, 2024 at 01:27:25PM +0800, Andy Fan wrote:
> 
> Hello Bruce,
> 
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
> 
> Thank you for working on this!
> 
> > I welcome feedback.  For some reason it was an easier job than usual.
> 
> Do you think we need to add the following 2 items?
> 
> - 9f133763961e280d8ba692bcad0b061b861e9138 this is an optimizer
>   transform improvement.

It was unclear from the commit message exactly what user-visible
optimization this allowed.  Do you have details?

> - a8a968a8212ee3ef7f22795c834b33d871fac262 this is an optimizer costing
>   improvement.

Does this allow faster UNION ALL with LIMIT, perhaps?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-13 Thread Bruce Momjian
On Sat, May 11, 2024 at 10:24:39AM -0400, Joe Conway wrote:
> On 5/11/24 09:57, Jelte Fennema-Nio wrote:
> > On Fri, 10 May 2024 at 23:31, Tom Lane  wrote:
> > > 
> > > Bruce Momjian  writes:
> > > > I looked at both of these.   In both cases I didn't see why the user
> > > > would need to know these changes were made:
> > > 
> > > I agree that the buffering change is not likely interesting, but
> > > the fact that you can now control-C out of a psql "\c" command
> > > is user-visible.  People might have internalized the fact that
> > > it didn't work, or created complicated workarounds.
> > 
> > The buffering change improved performance up to ~40% in some of the
> > benchmarks. The case it improves mostly is COPY of large rows and
> > streaming a base backup. That sounds user-visible enough to me to
> > warrant an entry imho.
> 
> +1

Attached patch applied.


-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
commit e87e7324555
Author: Bruce Momjian 
Date:   Mon May 13 20:55:13 2024 -0400

doc PG 17 relnotes:  add item about libpq large data transfers

Reported-by: Jelte Fennema-Nio

Discussion: https://postgr.es/m/cageczqtz5auqlel6dald2hu2fxs_losh4kedndj1fwthsb_...@mail.gmail.com

Reviewed-by: Joe Conway

Backpatch-through: master

diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 9dd3954f3c2..38c14970822 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -1901,6 +1901,17 @@ Add libpq function PQsetChunkedRowsMode to allow retrieval of results in chunks
 
 
 
+
+
+
+
+Allow libpq to more efficiently transfer large blocks of data (Melih Mutlu)
+
+
+
 

Re: Why is citext/regress failing on hamerkop?

2024-05-13 Thread Thomas Munro
On Tue, May 14, 2024 at 8:17 AM Tom Lane  wrote:
> I'm not sure whether we've got unsent data pending in the problematic
> condition, nor why it'd remain unsent if we do (shouldn't the backend
> consume it anyway?).  But this has the right odor for an explanation.
>
> I'm pretty hesitant to touch this area myself, because it looks an
> awful lot like commits 6051857fc and ed52c3707, which eventually
> had to be reverted.  I think we need a deeper understanding of
> exactly what Winsock is doing or not doing before we try to fix it.

I was beginning to suspect that lingering odour myself.  I haven't
looked at the GSS code but I was imagining that what we have here is
perhaps not unsent data dropped on the floor due to linger policy
(unclean socket close on process exit), but rather that the server
didn't see the socket as ready to read because it lost track of the
FD_CLOSE somewhere because the client closed it gracefully, and our
server-side FD_CLOSE handling has always been a bit suspect.  I wonder
if the GSS code is somehow more prone to brokenness.  One thing we
learned in earlier problems was that abortive/error disconnections
generate FD_CLOSE repeatedly, while graceful ones give you only one.
In other words, if the other end politely calls closesocket(), the
server had better not miss the FD_CLOSE event, because it won't come
again.   That's what

https://commitfest.postgresql.org/46/3523/

is intended to fix.  Does it help here?  Unfortunately that's
unpleasantly complicated and unbackpatchable (keeping a side-table of
socket FDs and event handles, so we don't lose events between the
cracks).




Re: First draft of PG 17 release notes

2024-05-13 Thread Bruce Momjian
On Fri, May 10, 2024 at 05:31:33PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, May 10, 2024 at 06:50:54PM +0200, Jelte Fennema-Nio wrote:
> >> There are two commits that I think would benefit from being listed
> >> (but maybe they are already listed and I somehow missed them, or they
> >> are left out on purpose for some reason):
> 
> > I looked at both of these.   In both cases I didn't see why the user
> > would need to know these changes were made:
> 
> I agree that the buffering change is not likely interesting, but
> the fact that you can now control-C out of a psql "\c" command
> is user-visible.  People might have internalized the fact that
> it didn't work, or created complicated workarounds.

Agreed, attached patch applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 5a741ff16f1..9dd3954f3c2 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -1980,6 +1980,17 @@ The parameter is "min_rows".
 
 
 
+
+
+
+
+Allow psql connections to be canceled with control-C (Tristan Partin)
+
+
+
 

Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-05-13 Thread Michael Paquier
On Thu, Apr 25, 2024 at 02:53:33PM +0200, Majid Garoosi wrote:
> Unfortunately, I quit that company a month ago (I wish we could
> discuss this earlier) and don't have access to the environment
> anymore.
> I'll try to ask my teammates and see if they can find anything about
> the exact values of latency, bw, etc.
> 
> Anyway, here is some description of the environment. Sadly, there
> are no numbers in this description, but I'll try to provide as much details
> as possible.
> There is a k8s cluster running over some VMs. Each postgres
> instance runs as a pod inside the k8s cluster. So, both the
> primary and standby servers are in the same DC, but might be on
> different baremetal nodes. There is an overlay network for the pods to
> see each other, and there's also another overlay network for the VMs
> to see each other.
> The storage servers are in the same DC, but we're sure they're on some
> racks other than the postgres pods. They run Ceph [1] project and provide
> Rados Block Devices (RBD) [2] interface. In order for k8s to use ceph, a
> Ceph-CSI [3] controller is running inside the k8s cluster.
> BTW, the FS type is ext4.

Okay, seeing the feedback for this patch and Andres disagreeing that it
is a good idea, I have marked the patch as rejected, as it was still in
the CF app.
--
Michael


signature.asc
Description: PGP signature


Re: I have an exporting need...

2024-05-13 Thread David Rowley
On Tue, 14 May 2024 at 06:18, Juan Hernández  wrote:
> Do you consider useful to add a parameter (for example, --separatetables) so 
> when used the exporting file process can create a different tablename.sql 
> file for each table in database automatically?
>
> Example...
>
> PGHOST="/tmp" PGPASSWORD="mydbpass" pg_dump -U dbusername --separatetables 
> -Fp --inserts dbname > "/route/dbname.sql"
>
> And if this database has tables table1...table10, then 10 files are created...

pg_dump has code to figure out the dependency of objects in the
database so that the dump file produced can be restored.  If one file
was output per table, how would you know in which order to restore
them? For example, if you have a table with a FOREIGN KEY referencing
some other table, you need to restore the referenced table first.
That's true for both restoring the schema and restoring the data.

David




RE: Resetting synchronous_standby_names can wait for CHECKPOINT to finish

2024-05-13 Thread Yusuke Egashira (Fujitsu)
Hello,

> When the checkpointer process is busy, even if we reset 
> synchronous_standby_names, the resumption of the backend processes waiting in 
> SyncRep are made to wait until the checkpoint is completed.
> This prevents the prompt resumption of application processing when a problem 
> occurs on the standby server in a synchronous replication system.
> I confirmed this in PostgreSQL 12.18.

I have tested this issue on Postgres built from the master branch (17devel) and 
observed the same behavior where the backend SyncRep release is blocked until 
CHECKPOINT completion.

In situations where a synchronous standby instance encounters an error and 
needs to be detached, I believe that the current behavior of waiting for 
SyncRep is inappropriate as it delays the backend.
I don't think changing the position of SIGHUP processing in the Checkpointer
process carries much risk. Is there anything I am overlooking?


Regards, 
Yusuke Egashira.





Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jacob Champion
[this should probably belong to a different thread, but I'm not sure
what to title it]

On Mon, May 13, 2024 at 4:08 AM Heikki Linnakangas  wrote:
> I think these options should be designed from the user's point of view,
> so that the user can specify the risks they're willing to accept, and
> the details of how that's accomplished are handled by libpq. For
> example, I'm OK with (tick all that apply):
>
> [ ] passive eavesdroppers seeing all the traffic
> [ ] MITM being able to hijack the session
> [ ] connecting without verifying the server's identity
> [ ] divulging the plaintext password to the server
> [ ] ...

I'm pessimistic about a quality-of-protection scheme for this use case
(*). I don't think users need more knobs in their connection strings,
and many of the goals of transport encryption are really not
independent from each other in practice. As evidence I point to the
absolute mess of GSSAPI wrapping, which lets you check separate boxes
for things like "require the server to authenticate itself" and
"require integrity" and "allow MITMs to reorder messages" and so on,
as if the whole idea of "integrity" is useful if you don't know who
you're talking to in the first place. I think I recall slapd having
something similarly arcane (but at least it doesn't make the clients
do it!). Those kinds of APIs don't evolve well, in my opinion.

I think most people probably just want browser-grade security as
quickly and cheaply as possible, and we don't make that very easy
today. I'll try to review a QoP scheme if someone works on it, don't
get me wrong, but I'd much rather spend time on a "very secure by
default" mode that gets rid of most of the options (i.e. a
postgresqls:// scheme).

(*) I've proposed quality-of-protection in the past, for Postgres
proxy authentication [1]. But I'm comfortable in my hypocrisy, because
in that case, the end user doing the configuration is a DBA with a
config file who is expected to understand the whole system, and it's a
niche use case (IMO) without an obvious "common setup". And frankly I
think my proposal is unlikely to go anywhere; the cost/benefit
probably isn't good enough.

> If you need to verify
> the server's identity, it implies sslmode=verify-CA or
> channel_binding=true.

Neither of those two options provides strong authentication of the
peer, and personally I wouldn't want them to be considered worthy of
"verify the server's identity" mode.

And -- taking a wild leap here -- if we disagree, then granularity
becomes a problem: either the QoP scheme now has to have
sub-checkboxes for "if the server knows my password, that's good
enough" and "it's fine if the server's hostname doesn't match the
cert, for some reason", or it smashes all of those different ideas
into one setting and then I have to settle for the weakest common
denominator during an active attack. Assuming I haven't missed a third
option, will that be easier/better than the status quo of
require_auth+sslmode?

> A different line of thought is that to keep the attack surface as small
> as possible, you should specify very explicitly what exactly you expect
> to happen in the authentication, and disallow any variance. For example,
> you expect SCRAM to be used, with a certificate signed by a particular
> CA, and if the server requests anything else, that's suspicious and the
> connection is aborted. We should make that possible too

That's 'require_auth=scram-sha-256 sslmode=verify-ca', no?

--Jacob

[1] 
https://www.postgresql.org/message-id/0768cedb-695a-8841-5f8b-da2aa64c8f3a%40timescale.com




Re: SQL:2011 application time

2024-05-13 Thread Paul Jungwirth

On 5/13/24 03:11, Peter Eisentraut wrote:
> It looks like we missed some of these fundamental design questions early on, and it might be too
> late now to fix them for PG17.
>
> For example, the discussion on unique constraints misses that the question of null values in unique
> constraints itself is controversial and that there is now a way to change the behavior.  So I
> imagine there is also a selection of possible behaviors you might want for empty ranges.
> Intuitively, I don't think empty ranges are sensible for temporal unique constraints.  But anyway,
> it's a bit late now to be discussing this.
>
> I'm also concerned that if ranges have this fundamental incompatibility with periods, then the plan
> to eventually evolve this patch set to support standard periods will also have as-yet-unknown problems.
>
> Some of these issues might be design flaws in the underlying mechanisms, like range types and
> exclusion constraints.  Like, if you're supposed to use this for scheduling but you can use empty
> ranges to bypass exclusion constraints, how is one supposed to use this?  Yes, a check constraint
> using isempty() might be the right answer.  But I don't see this documented anywhere.
>
> On the technical side, adding an implicit check constraint as part of a primary key constraint is
> quite a difficult implementation task, as I think you are discovering.  I'm just reminded about how
> the patch for catalogued not-null constraints struggled with linking these not-null constraints to
> primary keys correctly.  This sounds a bit similar.
>
> I'm afraid that these issues cannot be resolved in good time for this release, so we should revert
> this patch set for now.


I think reverting is a good idea. I'm not really happy with the CHECK constraint solution either. 
I'd be happy to have some more time to rework this for v18.


A couple alternatives I'd like to explore:

1. Domain constraints instead of a CHECK constraint. I think this is probably worse, and I don't 
plan to spend much time on it, but I thought I'd mention it in case someone else thought otherwise.


2. A slightly different overlaps operator, say &&&, where 'empty' &&& 'empty' is true. But 'empty' 
with anything else could still be false (or not). That operator would prevent duplicates in an 
exclusion constraint. This also means we could support more types than just ranges & multiranges. I 
need to think about whether this combines badly with existing operators, but if not it has a lot of 
promise. If anything it might be *less* contradictory, because it fits better with 'empty' @> 
'empty', which we say is true.


Another thing a revert would give me some time to consider: even though it's not standard syntax, I 
wonder if we want to require syntax something like `PRIMARY KEY USING gist (id, valid_at WITHOUT 
OVERLAPS)`. Everywhere else we default to btree, so defaulting to gist feels a little weird. In 
theory we could even someday support WITHOUT OVERLAPS with btree, if we taught that AM to answer 
that question. (I admit there is probably not a lot of desire for that though.)


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Why is parula failing?

2024-05-13 Thread Tom Lane
David Rowley  writes:
> I've not seen any recent failures from Parula that relate to this
> issue.  The last one seems to have been about 4 weeks ago.

> I'm now wondering if it's time to revert the debugging code added in
> 1db689715.  Does anyone think differently?

+1.  It seems like we wrote off the other issue as a compiler bug,
so I don't have much trouble assuming that this one was too.

regards, tom lane




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Julien Rouhaud
On Mon, May 13, 2024 at 07:11:53PM GMT, Tom Lane wrote:
> Andres Freund  writes:
> > On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote:
> >> Hmmm, depending on the extension it can extensively call/use postgres code
> >> so would be nice if we can differentiate if the code is called from
> >> Postgres itself or from an extension.
> 
> > I think that's not realistically possible. It's also very fuzzy what that'd
> > mean. If there's a planner hook and then the query executes normally, what 
> > do
> > you report for an execution time error? And even the simpler case - should 
> > use
> > of pg_stat_statements cause everything within to be logged as a
> > pg_stat_statement message?
> 
> Not to mention that there could be more than one extension on the call
> stack.  I think tying this statically to the ereport call site is
> fine.
> 
> The mechanism that Andres describes for sourcing the name seems a bit
> overcomplex though.  Why not just allow/require each extension to
> specify its name as a constant string?  We could force the matter by
> redefining PG_MODULE_MAGIC as taking an argument:
> 
>   PG_MODULE_MAGIC("hstore");

FTR, there was a proposal at [1] some time ago that could be used for that need
(and others); I thought it would be good to mention it just in case.  That
would obviously only work if all extensions use that framework.

[1] https://www.postgresql.org/message-id/flat/3207907.AWbSqkKDnR%40aivenronan




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Andres Freund
Hi,

On 2024-05-13 19:11:53 -0400, Tom Lane wrote:
> The mechanism that Andres describes for sourcing the name seems a bit
> overcomplex though.  Why not just allow/require each extension to
> specify its name as a constant string?  We could force the matter by
> redefining PG_MODULE_MAGIC as taking an argument:
>   PG_MODULE_MAGIC("hstore");

Mostly because it seemed somewhat sad to require every extension to have
version-specific ifdefs around that, particularly because it's not hard for us
to infer.

I think there might be other use cases for the backend to provide "extension
scoped" information, FWIW. Even just providing the full path to the extension
library could be useful.

Greetings,

Andres Freund




Re: Why is parula failing?

2024-05-13 Thread David Rowley
On Thu, 21 Mar 2024 at 13:53, David Rowley  wrote:
>
> On Thu, 21 Mar 2024 at 12:36, Tom Lane  wrote:
> > So yeah, if we could have log_autovacuum_min_duration = 0 perhaps
> > that would yield a clue.
>
> FWIW, I agree with your earlier statement about it looking very much
> like auto-vacuum has run on that table, but equally, if something like
> the pg_index record was damaged we could get the same plan change.
>
> We could also do something like the attached just in case we're
> barking up the wrong tree.

I've not seen any recent failures from Parula that relate to this
issue.  The last one seems to have been about 4 weeks ago.

I'm now wondering if it's time to revert the debugging code added in
1db689715.  Does anyone think differently?

David




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Tom Lane
Andres Freund  writes:
> On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote:
>> Hmmm, depending on the extension it can extensively call/use postgres code
>> so would be nice if we can differentiate if the code is called from
>> Postgres itself or from an extension.

> I think that's not realistically possible. It's also very fuzzy what that'd
> mean. If there's a planner hook and then the query executes normally, what do
> you report for an execution time error? And even the simpler case - should use
> of pg_stat_statements cause everything within to be logged as a
> pg_stat_statement message?

Not to mention that there could be more than one extension on the call
stack.  I think tying this statically to the ereport call site is
fine.

The mechanism that Andres describes for sourcing the name seems a bit
overcomplex though.  Why not just allow/require each extension to
specify its name as a constant string?  We could force the matter by
redefining PG_MODULE_MAGIC as taking an argument:

PG_MODULE_MAGIC("hstore");
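
For illustration, a rough sketch of that redefinition (the
Pg_extension_name variable and its added line are assumptions, not
existing code; the rest mirrors the current macro body in fmgr.h):

#define PG_MODULE_MAGIC(name) \
extern PGDLLEXPORT const Pg_magic_struct *PG_MAGIC_FUNCTION_NAME(void); \
const Pg_magic_struct * \
PG_MAGIC_FUNCTION_NAME(void) \
{ \
	static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \
	return &Pg_magic_data; \
} \
PGDLLEXPORT const char *Pg_extension_name = (name); \
extern int no_such_variable

/* and in an extension's main C file: */
PG_MODULE_MAGIC("hstore");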

regards, tom lane




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Andres Freund
Hi,

On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote:
> On Mon, May 13, 2024 at 5:51 PM Andres Freund  wrote:
> > It's not entirely trivial to provide errfinish() with a parameter
> indicating
> > the extension, but it's doable:
> >
> > 1) Have PG_MODULE_MAGIC also define a new variable for the extension name,
> >empty at that point
> >
> > 2) In internal_load_library(), look up that new variable, and fill it
> with a,
> >mangled, libname.
> >
> > 4) in elog.h, define a new macro depending on BUILDING_DLL (if it is set,
> >we're in the server, otherwise an extension). In the backend itself,
> define
> >it to NULL, otherwise to the variable created by PG_MODULE_MAGIC.
> >
> > 5) In elog/ereport/errsave/... pass this new variable to
> >errfinish/errsave_finish.
> >
> 
> Then every extension should define their own Pg_extension_filename, right?

It'd be automatically set by postgres when loading libraries.


> > I've attached a *very rough* prototype of this idea. My goal at this
> stage was
> > just to show that it's possible, not for the code to be in a reviewable
> state.
> >
> >
> > Here's e.g. what this produces with log_line_prefix='%m [%E] '
> >
> > 2024-05-13 13:50:17.518 PDT [postgres] LOG:  database system is ready to
> accept connections
> > 2024-05-13 13:50:19.138 PDT [cube] ERROR:  invalid input syntax for cube
> at character 13
> > 2024-05-13 13:50:19.138 PDT [cube] DETAIL:  syntax error at or near "f"
> > 2024-05-13 13:50:19.138 PDT [cube] STATEMENT:  SELECT cube('foo');
> >
> > 2024-05-13 13:43:07.484 PDT [postgres] LOG:  database system is ready to
> accept connections
> > 2024-05-13 13:43:11.699 PDT [hstore] ERROR:  syntax error in hstore:
> unexpected end of string at character 15
> > 2024-05-13 13:43:11.699 PDT [hstore] STATEMENT:  SELECT hstore('foo');
> >
> >
> 
> Was not able to build your patch by simply:

Oh, turns out it only builds with meson right now.  I forgot that, with
autoconf, for some unknown reason, we only set BUILDING_DLL on some OSs.

I attached a crude patch changing that.


> > It's worth pointing out that this, quite fundamentally, can only work
> when the
> > log message is triggered directly by the extension. If the extension code
> > calls some postgres function and that function then errors out, it'll be
> seens
> > as being part of postgres.
> >
> > But I think that's ok - they're going to be properly errcode-ified etc.
> >
> 
> Hmmm, depending on the extension it can extensively call/use postgres code
> so would be nice if we can differentiate if the code is called from
> Postgres itself or from an extension.

I think that's not realistically possible. It's also very fuzzy what that'd
mean. If there's a planner hook and then the query executes normally, what do
you report for an execution time error? And even the simpler case - should use
of pg_stat_statements cause everything within to be logged as a
pg_stat_statement message?

I think the best we can do is to actually say where the error is directly
triggered from.

Greetings,

Andres Freund
>From 1c59c465f2d359e8c47cf91d1ea458ea3b64ec84 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 13 May 2024 13:47:41 -0700
Subject: [PATCH v2 1/2] WIP: Track shared library in which elog/ereport is
 called

---
 src/include/fmgr.h |  1 +
 src/include/utils/elog.h   | 18 +-
 src/backend/utils/error/elog.c | 33 -
 src/backend/utils/fmgr/dfmgr.c | 30 ++
 4 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a251..3c7cfd7fee9 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -504,6 +504,7 @@ PG_MAGIC_FUNCTION_NAME(void) \
 	static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \
 	return &Pg_magic_data; \
 } \
+PGDLLEXPORT const char *Pg_extension_filename = NULL; \
 extern int no_such_variable
 
 
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62f..5e57f01823d 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -137,6 +137,13 @@ struct Node;
  * prevents gcc from making the unreachability deduction at optlevel -O0.
  *--
  */
+#ifdef BUILDING_DLL
+#define ELOG_EXTNAME NULL
+#else
+extern PGDLLEXPORT const char *Pg_extension_filename;
+#define ELOG_EXTNAME Pg_extension_filename
+#endif
+
 #ifdef HAVE__BUILTIN_CONSTANT_P
 #define ereport_domain(elevel, domain, ...)	\
 	do { \
@@ -144,7 +151,7 @@ struct Node;
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
 			errstart_cold(elevel, domain) : \
 			errstart(elevel, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
@@ -154,7 +161,7 @@ struct Node;
 		const int elevel_ = (elevel); \
 		

Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jacob Champion
On Mon, May 13, 2024 at 9:13 AM Robert Haas  wrote:
> I find this idea to be a massive improvement over the status quo,

+1

> and
> I didn't spot any major problems when I read through the patch,
> either.

Definitely not a major problem, but I think
select_next_encryption_method() has gone stale, since it originally
provided generality and lines of fallback that no longer exist. In
other words, I think the following code is now misleading:

> if (conn->sslmode[0] == 'a')
> SELECT_NEXT_METHOD(ENC_PLAINTEXT);
>
> SELECT_NEXT_METHOD(ENC_NEGOTIATED_SSL);
> SELECT_NEXT_METHOD(ENC_DIRECT_SSL);
>
> if (conn->sslmode[0] != 'a')
> SELECT_NEXT_METHOD(ENC_PLAINTEXT);

To me, that implies that negotiated mode takes precedence over direct,
but the point of the patch is that it's not possible to have both. And
if direct SSL is in use, then sslmode can't be "allow" anyway, and we
definitely don't want ENC_PLAINTEXT.

So if someone proposes a change to select_next_encryption_method(),
you'll have to remember to stare at init_allowed_encryption_methods()
as well, and think really hard about what's going on. And vice-versa.
That worries me.

> I don't have a strong opinion about whether sslnegotiation=direct
> should error out (as you propose here) or silently promote sslmode to
> require. I think either is defensible.

I'm comforted that, since sslrootcert=system already does it, plenty
of use cases will get that for free. And if you decide in the future
that you really really want it to promote, it won't be a compatibility
break to make that change. (That gives us more time for wider v16-17
adoption, to see how the sslrootcert=system magic promotion behavior
is going in practice.)

> Had I been implementing it, I
> think I would have done as Jacob proposes, just because once we've
> forced a direct SSL negotiation surely the only sensible behavior is
> to be using SSL, unless you think there should be a
> silently-reconnect-without-SSL behavior, which I sure don't.

We still allow GSS to preempt SSL, though, so "forced" is probably
overstating things.

> It's really hard to believe in 2024 that anyone should ever be using a
> setting that may or may not encrypt the connection. There's http and
> https but there's no httpmaybes.

+1. I think (someone hop in and correct me please) that Opportunistic
Encryption for HTTP mostly fizzled, and they gave it a *lot* of
thought.

--Jacob




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Fabrízio de Royes Mello
On Mon, May 13, 2024 at 5:51 PM Andres Freund  wrote:
>
> Hi,
>
> It can be very useful to look at the log messages emitted by a larger
number
> of postgres instances to see if anything unusual is happening. E.g.
checking
> whether there are an increased number of internal, IO, corruption errors
(and
> LOGs too, because we emit plenty bad things as LOG) . One difficulty is
that
> extensions tend to not categorize their errors. But unfortunately errors
in
> extensions are hard to distinguish from errors emitted by postgres.
>
> A related issue is that it'd be useful to be able to group log messages by
> extension, to e.g. see which extensions are emitting disproportionally
many
> log messages.
>
> Therefore I'd like to collect the extension name in elog/ereport and add a
> matching log_line_prefix escape code.
>

I liked the idea ... It is very helpful for troubleshooting problems in
production.


> It's not entirely trivial to provide errfinish() with a parameter indicating
> the extension, but it's doable:
>
> 1) Have PG_MODULE_MAGIC also define a new variable for the extension name,
>    empty at that point
>
> 2) In internal_load_library(), look up that new variable, and fill it with a
>    mangled libname.
>
> 3) In elog.h, define a new macro depending on BUILDING_DLL (if it is set,
>    we're in the server, otherwise an extension). In the backend itself, define
>    it to NULL, otherwise to the variable created by PG_MODULE_MAGIC.
>
> 4) In elog/ereport/errsave/... pass this new variable to
>    errfinish/errsave_finish.
>

Then every extension should define its own Pg_extension_filename, right?


> I've attached a *very rough* prototype of this idea. My goal at this stage was
> just to show that it's possible, not for the code to be in a reviewable state.
>
>
> Here's e.g. what this produces with log_line_prefix='%m [%E] '
>
> 2024-05-13 13:50:17.518 PDT [postgres] LOG:  database system is ready to accept connections
> 2024-05-13 13:50:19.138 PDT [cube] ERROR:  invalid input syntax for cube at character 13
> 2024-05-13 13:50:19.138 PDT [cube] DETAIL:  syntax error at or near "f"
> 2024-05-13 13:50:19.138 PDT [cube] STATEMENT:  SELECT cube('foo');
>
> 2024-05-13 13:43:07.484 PDT [postgres] LOG:  database system is ready to accept connections
> 2024-05-13 13:43:11.699 PDT [hstore] ERROR:  syntax error in hstore: unexpected end of string at character 15
> 2024-05-13 13:43:11.699 PDT [hstore] STATEMENT:  SELECT hstore('foo');
>
>

I was not able to build your patch by simply doing:

./configure --prefix=/tmp/pg
...
make -j
...
/usr/bin/ld: ../../src/port/libpgport_srv.a(path_srv.o): warning:
relocation against `Pg_extension_filename' in read-only section `.text'
/usr/bin/ld: access/brin/brin.o: in function `brininsert':
/data/src/pg/main/src/backend/access/brin/brin.c:403: undefined reference
to `Pg_extension_filename'
/usr/bin/ld: access/brin/brin.o: in function `brinbuild':
/data/src/pg/main/src/backend/access/brin/brin.c:1107: undefined reference
to `Pg_extension_filename'
/usr/bin/ld: access/brin/brin.o: in function `brin_summarize_range':
/data/src/pg/main/src/backend/access/brin/brin.c:1383: undefined reference
to `Pg_extension_filename'
/usr/bin/ld: /data/src/pg/main/src/backend/access/brin/brin.c:1389:
undefined reference to `Pg_extension_filename'
/usr/bin/ld: /data/src/pg/main/src/backend/access/brin/brin.c:1434:
undefined reference to `Pg_extension_filename'
/usr/bin/ld:
access/brin/brin.o:/data/src/pg/main/src/backend/access/brin/brin.c:1450:
more undefined references to `Pg_extension_filename' follow
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:67: postgres] Error 1
make[2]: Leaving directory '/data/src/pg/main/src/backend'
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make[1]: Leaving directory '/data/src/pg/main/src'
make: *** [GNUmakefile:11: all-src-recurse] Error 2


> It's worth pointing out that this, quite fundamentally, can only work when the
> log message is triggered directly by the extension. If the extension code
> calls some postgres function and that function then errors out, it'll be seen
> as being part of postgres.
>
> But I think that's ok - they're going to be properly errcode-ified etc.
>

Hmmm, depending on the extension it can call/use postgres code extensively,
so it would be nice if we could differentiate whether the code is called from
Postgres itself or from an extension.

Regards,

--
Fabrízio de Royes Mello


Re: pg_sequence_last_value() for unlogged sequences on standbys

2024-05-13 Thread Nathan Bossart
Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Andres Freund
Hi,

It can be very useful to look at the log messages emitted by a larger number
of postgres instances to see if anything unusual is happening. E.g. checking
whether there is an increased number of internal, IO, corruption errors (and
LOGs too, because we emit plenty of bad things as LOG). One difficulty is that
extensions tend to not categorize their errors. But unfortunately errors in
extensions are hard to distinguish from errors emitted by postgres.

A related issue is that it'd be useful to be able to group log messages by
extension, to e.g. see which extensions are emitting disproportionately many
log messages.

Therefore I'd like to collect the extension name in elog/ereport and add a
matching log_line_prefix escape code.


It's not entirely trivial to provide errfinish() with a parameter indicating
the extension, but it's doable:

1) Have PG_MODULE_MAGIC also define a new variable for the extension name,
   empty at that point

2) In internal_load_library(), look up that new variable, and fill it with a
   mangled libname.

3) In elog.h, define a new macro depending on BUILDING_DLL (if it is set,
   we're in the server, otherwise an extension). In the backend itself, define
   it to NULL, otherwise to the variable created by PG_MODULE_MAGIC.

4) In elog/ereport/errsave/... pass this new variable to
   errfinish/errsave_finish.


I've attached a *very rough* prototype of this idea. My goal at this stage was
just to show that it's possible, not for the code to be in a reviewable state.


Here's e.g. what this produces with log_line_prefix='%m [%E] '

2024-05-13 13:50:17.518 PDT [postgres] LOG:  database system is ready to accept 
connections
2024-05-13 13:50:19.138 PDT [cube] ERROR:  invalid input syntax for cube at 
character 13
2024-05-13 13:50:19.138 PDT [cube] DETAIL:  syntax error at or near "f"
2024-05-13 13:50:19.138 PDT [cube] STATEMENT:  SELECT cube('foo');

2024-05-13 13:43:07.484 PDT [postgres] LOG:  database system is ready to accept 
connections
2024-05-13 13:43:11.699 PDT [hstore] ERROR:  syntax error in hstore: unexpected 
end of string at character 15
2024-05-13 13:43:11.699 PDT [hstore] STATEMENT:  SELECT hstore('foo');


It's worth pointing out that this, quite fundamentally, can only work when the
log message is triggered directly by the extension. If the extension code
calls some postgres function and that function then errors out, it'll be seen
as being part of postgres.

But I think that's ok - they're going to be properly errcode-ified etc.


Thoughts?

Greetings,

Andres Freund
>From 1c59c465f2d359e8c47cf91d1ea458ea3b64ec84 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 13 May 2024 13:47:41 -0700
Subject: [PATCH v1] WIP: Track shared library in which elog/ereport is called

---
 src/include/fmgr.h |  1 +
 src/include/utils/elog.h   | 18 +-
 src/backend/utils/error/elog.c | 33 -
 src/backend/utils/fmgr/dfmgr.c | 30 ++
 4 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a251..3c7cfd7fee9 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -504,6 +504,7 @@ PG_MAGIC_FUNCTION_NAME(void) \
 	static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \
 	return &Pg_magic_data; \
 } \
+PGDLLEXPORT const char *Pg_extension_filename = NULL; \
 extern int no_such_variable
 
 
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62f..5e57f01823d 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -137,6 +137,13 @@ struct Node;
  * prevents gcc from making the unreachability deduction at optlevel -O0.
  *--
  */
+#ifdef BUILDING_DLL
+#define ELOG_EXTNAME NULL
+#else
+extern PGDLLEXPORT const char *Pg_extension_filename;
+#define ELOG_EXTNAME Pg_extension_filename
+#endif
+
 #ifdef HAVE__BUILTIN_CONSTANT_P
 #define ereport_domain(elevel, domain, ...)	\
 	do { \
@@ -144,7 +151,7 @@ struct Node;
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
 			errstart_cold(elevel, domain) : \
 			errstart(elevel, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
@@ -154,7 +161,7 @@ struct Node;
 		const int elevel_ = (elevel); \
 		pg_prevent_errno_in_scope(); \
 		if (errstart(elevel_, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 		if (elevel_ >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
@@ -169,7 +176,7 @@ extern bool message_level_is_interesting(int elevel);
 
 extern bool errstart(int elevel, const char *domain);
 extern pg_attribute_cold bool errstart_cold(int elevel, const char *domain);
-extern void errfinish(const char *filename, int lineno, const char 

Re: Large files for relations

2024-05-13 Thread Peter Eisentraut

On 06.03.24 22:54, Thomas Munro wrote:

Rebased.  I had intended to try to get this into v17, but a couple of
unresolved problems came up while rebasing over the new incremental
backup stuff.  You snooze, you lose.  Hopefully we can sort these out
in time for the next commitfest:

* should pg_combinebasebackup read the control file to fetch the segment size?
* hunt for other segment-size related problems that may be lurking in
new incremental backup stuff
* basebackup_incremental.c wants to use memory in proportion to
segment size, which looks like a problem, and I wrote about that in a
new thread[1]


Overall, I like this idea, and the patch seems to have many bases covered.

The patch will need a rebase.  I was able to test it on 
master@{2024-03-13}, but after that there are conflicts.


In .cirrus.tasks.yml, one of the test tasks uses 
--with-segsize-blocks=6, but you are removing that option.  You could 
replace that with something like


PG_TEST_INITDB_EXTRA_OPTS='--rel-segsize=48kB'

But that won't work exactly because

initdb: error: argument of --rel-segsize must be a power of two

I suppose that's ok as a change, since it makes the arithmetic more 
efficient.  But maybe it should be called out explicitly in the commit 
message.


If I run it with 64kB, the test pgbench/001_pgbench_with_server fails 
consistently, so it seems there is still a gap somewhere.


A minor point, the initdb error message

initdb: error: argument of --rel-segsize must be a multiple of BLCKSZ

would be friendlier if it actually showed the value of the block size
instead of just the symbol.  Similarly for the nearby error message 
about the off_t size.


In the control file, all the other fields use unsigned types.  Should 
relseg_size be uint64?


PG_CONTROL_VERSION needs to be changed.





Summary of Sort Improvement Proposals

2024-05-13 Thread Benjamin Coutu
Hello,

In light of multiple threads [1-6] discussing sorting improvements, I'd like to 
consolidate the old (+some new) ideas as a starting point.
It might make sense to brainstorm on a few of these ideas and maybe even
identify some that are worth implementing and testing.

1. Simple algorithmic ideas:
- Use single-assignment insertion-sort instead of swapping
- Increase insertion-sort threshold to at least 8 (possibly 10+), to be 
determined empirically based on current hardware
- Make insertion-sort threshold customizable via template based on sort 
element size

2. More complex/speculative algorithmic ideas:
- Try counting insertion-sort loop iterations and bail after a certain 
limit (include presorted check in insertion-sort loop and continue presorted 
check from last position in separate loop after bailout)
- Try binary search for presorted check (outside of insertion-sort-code)
- Try binary insertion sort (if comparison costs are high)
- Try partial insertion sort (include presorted check)
- Try presorted check only at top-level, not on every recursive step, 
or if on every level then at least only for n > some threshold
- Try asymmetric quick-sort partitioning
- Try dual pivot quick-sort
- Try switching to heap-sort dependent on recursion depth (might allow 
ripping out median-of-median)

3. TupleSort ideas:
- Use separate sort partition for NULL values to avoid null check on 
every comparison and to make nulls first/last trivial
- Pass down non-nullness info to avoid null check and/or null-partition 
creation (should ideally be determined by planner)
- Skip comparison of first sort key on subsequent full tuple 
tie-breaker comparison (unless abbreviated key)
- Encode NULL directly in abbreviated key (only if no null-partitioning)

4. Planner ideas:
- Use pg_stats.correlation to inform sort algorithm selection for sort 
keys that come from sequential-scans/bitmap-heap-scans
- Use n_distinct to inform sort algorithm selection (many tie-breaker 
comparisons necessary on multi-key sort)
- Improve costing of sorts in planner considering tuple size, 
distribution and n_distinct

[1] https://www.postgresql.org/message-id/flat/ddc4e498740a8e411c59%40zeyos.com
[2] 
https://www.postgresql.org/message-id/flat/CAFBsxsHanJTsX9DNJppXJxwg3bU%2BYQ6pnmSfPM0uvYUaFdwZdQ%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/flat/CAApHDvoTTtoQYfp3d0kTPF6y1pjexgLwquzKmjzvjC9NCw4RGw%40mail.gmail.com
[4] 
https://www.postgresql.org/message-id/flat/CAEYLb_Xn4-6f1ofsf2qduf24dDCVHbQidt7JPpdL_RiT1zBJ6A%40mail.gmail.com
[5] 
https://www.postgresql.org/message-id/flat/CAEYLb_W%2B%2BUhrcWprzG9TyBVF7Sn-c1s9oLbABvAvPGdeP2DFSQ%40mail.gmail.com
[6] 
https://www.postgresql.org/message-id/flat/683635b8-381b-5b08-6069-d6a45de19a12%40enterprisedb.com#12683b7a6c566eb5b926369b5fd2d41f

-- 

Benjamin Coutu
http://www.zeyos.com




Re: race condition in pg_class

2024-05-13 Thread Noah Misch
On Mon, May 13, 2024 at 03:53:08PM -0400, Robert Haas wrote:
> On Sun, May 12, 2024 at 7:29 PM Noah Misch  wrote:
> > - [consequences limited to transient failure] Since a PROC_IN_VACUUM 
> > backend's
> >   xmin does not stop pruning, an MVCC scan in that backend can find zero
> >   tuples when one is live.  This is like what all backends got in the days 
> > of
> >   SnapshotNow catalog scans.  See the pgbench test suite addition.  (Perhaps
> >   the fix is to make VACUUM do its MVCC scans outside of PROC_IN_VACUUM,
> >   setting that flag later and unsetting it earlier.)
> 
> Are you saying that this is a problem already, or that the patch
> causes it to start happening? If it's the former, that's horrible.

The former.




Re: New GUC autovacuum_max_threshold ?

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 11:14 AM Frédéric Yhuel
 wrote:
> FWIW, I do agree with your math. I found your demonstration convincing.
> 50 was selected with the wet finger.

Good to know.

> Using the formula I suggested earlier:
>
> vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
> vac_base_thresh + vac_scale_factor * sqrt(reltuples) * 1000);
>
> your table of 2.56 billion tuples will be vacuumed if there are
> more than 10 million dead tuples (every 28 minutes).

Yeah, so that is about 50x what we do now (twice an hour vs. once a
day). While that's a lot more reasonable than the behavior that we'd
get from a 500k hard cap (every 84 seconds), I suspect it's still too
aggressive.
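
To make the quoted numbers concrete, here is a quick sanity check in plain
SQL. This is only an illustration of the formula under discussion, assuming
the default autovacuum_vacuum_threshold of 50 and
autovacuum_vacuum_scale_factor of 0.2, and plugging in the 2.56 billion
tuple example:

SELECT reltuples,
       50 + 0.2 * reltuples                      AS current_threshold,
       least(50 + 0.2 * reltuples,
             50 + 0.2 * sqrt(reltuples) * 1000)  AS suggested_threshold
FROM (VALUES (2.56e9::float8)) AS t(reltuples);

-- current_threshold comes out around 512 million dead tuples,
-- suggested_threshold a little over 10 million.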

I find these things much easier to reason about in gigabytes than in
time units. In that example, the table was 320GB and was getting
vacuumed after accumulating 64GB of bloat. That seems like a lot. It
means that the table can grow from 320GB all the way up until 384GB
before we even think about starting to vacuum it, and then we might
not start right away, depending on resource availability, and we may
take some time to finish, possibly considerable time, depending on the
number and size of indexes and the availability of I/O resources. So
actually the table might very plausibly be well above 400GB before we
get done processing it, or potentially even more. I think that's not
aggressive enough.

But how much would we like to push that 64GB of bloat number down for
a table of this size? I would argue that if we're vacuuming the table
when it's only got 1GB of bloat, or 2GB of bloat, that seems
excessive. Unless the system is very lightly loaded and has no
long-running transactions at all, we're unlikely to be able to vacuum
aggressively enough to keep a 320GB table at a size of 321GB or 322GB.
Without testing or doing any research, I'm going to guess that a
realistic number is probably in the range of 10-20GB of bloat. If the
table activity is very light, we might be able to get it even lower,
like say 5GB, but the costs ramp up very quickly as you push the
vacuuming threshold down. Also, if the table accumulates X amount of
bloat during the time it takes to run one vacuum, you can never
succeed in limiting bloat to a value less than X (and probably more
like 1.5*X or 2*X or something).

So without actually trying anything, which I do think somebody should
do and report results, my guess is that for a 320GB table, you'd like
to multiply the vacuum frequency by a value somewhere between 3 and
10, and probably much closer to 3 than to 10. Maybe even less than 3.
Not sure exactly. Like I say, I think someone needs to try some
different workloads and database sizes and numbers of indexes, and try
to get a feeling for what actually works well in practice.

> If we want to stick with the simple formula, we should probably choose a
> very high default, maybe 100 million, as you suggested earlier.
>
> However, it would be nice to have the visibility map updated more
> frequently than every 100 million dead tuples. I wonder if this could be
> decoupled from the vacuum process?

Yes, but if a page has had any non-HOT updates, it can't become
all-visible again without vacuum. If it has had only HOT updates, then
a HOT-prune could make it all-visible. I don't think we do that
currently, but I think in theory we could.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Why is citext/regress failing on hamerkop?

2024-05-13 Thread Tom Lane
Thomas Munro  writes:
> For citext_utf8, I pushed cff4e5a3.  Hamerkop runs infrequently, so
> here's hoping for 100% green on master by Tuesday or so.

In the meantime, some off-list investigation by Alexander Lakhin
has turned up a good deal of information about why we're seeing
failures on hamerkop in the back branches.  Summarizing, it
appears that

1. In a GSS-enabled Windows build without any active Kerberos server,
libpq's pg_GSS_have_cred_cache() succeeds, allowing libpq to try to
open a GSSAPI connection, but then gss_init_sec_context() fails,
leading to client-side reports like this:

+Connection 2 failed: could not initiate GSSAPI security context: Unspecified 
GSS failure.  Minor code may provide more information: Credential cache is empty
+FATAL:  sorry, too many clients already

(The first of these lines comes out during the attempted GSS
connection, the second during the only-slightly-more-successful
non-GSS connection.)  So that's problem number 1: how is it that
gss_acquire_cred() succeeds but then gss_init_sec_context() disclaims
knowledge of any credentials?  Can we find a way to get this failure
to be detected during pg_GSS_have_cred_cache()?  It is mighty
expensive to launch a backend connection that is doomed to fail,
especially when this happens during *every single libpq connection
attempt*.

2. Once gss_init_sec_context() fails, libpq abandons the connection
and starts over; since it has already initiated a GSS handshake
on the connection, there's not much choice.  Although libpq faithfully
issues closesocket() on the abandoned connection, Alexander found
that the connected backend doesn't reliably see that: it may just
sit there until the AuthenticationTimeout elapses (1 minute by
default).  That backend is still consuming a postmaster child slot,
so if this happens on any sizable fraction of test connection
attempts, it's little surprise that we soon get "sorry, too many
clients already" failures.

3. We don't know exactly why hamerkop suddenly started seeing these
failures, but a plausible theory emerges after noting that its
reported time for the successful "make check" step dropped pretty
substantially right when this started.  In the v13 branch, "make
check" was taking 2:18 or more in the several runs right before the
first isolationcheck failure, but 1:40 or less just after.  So it
looks like the animal was moved onto faster hardware.  That feeds
into this problem because, with a successful isolationcheck run
taking more than a minute, there was enough time for some of the
earlier stuck sessions to time out and exit before their
postmaster-child slots were needed.

Alexander confirmed this theory by demonstrating that the main
regression tests in v15 would pass if he limited their speed enough
(by reducing the CPU allowed to a VM) but not at full speed.  So the
buildfarm results suggesting this is only an issue in <= v13 must
be just a timing artifact; the problem is still there.

Of course, backends waiting till timeout is not a good behavior
independently of what is triggering that, so we have two problems to
solve here, not one.  I have no ideas about the gss_init_sec_context()
failure, but I see a plausible theory about the failure to detect
socket closure in Microsoft's documentation about closesocket() [1]:

If the l_onoff member of the LINGER structure is zero on a stream
socket, the closesocket call will return immediately and does not
receive WSAEWOULDBLOCK whether the socket is blocking or
nonblocking. However, any data queued for transmission will be
sent, if possible, before the underlying socket is closed. This is
also called a graceful disconnect or close. In this case, the
Windows Sockets provider cannot release the socket and other
resources for an arbitrary period, thus affecting applications
that expect to use all available sockets. This is the default
behavior for a socket.

I'm not sure whether we've got unsent data pending in the problematic
condition, nor why it'd remain unsent if we do (shouldn't the backend
consume it anyway?).  But this has the right odor for an explanation.

I'm pretty hesitant to touch this area myself, because it looks an
awful lot like commits 6051857fc and ed52c3707, which eventually
had to be reverted.  I think we need a deeper understanding of
exactly what Winsock is doing or not doing before we try to fix it.
I wonder if there are any Microsoft employees around here with
access to the relevant source code.

In the short run, it might be a good idea to deprecate using
--with-gssapi on Windows builds.  A different stopgap idea
could be to drastically reduce the default AuthenticationTimeout,
perhaps only on Windows.

regards, tom lane

[1] 
https://learn.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-closesocket




Re: race condition in pg_class

2024-05-13 Thread Robert Haas
On Sun, May 12, 2024 at 7:29 PM Noah Misch  wrote:
> - [consequences limited to transient failure] Since a PROC_IN_VACUUM backend's
>   xmin does not stop pruning, an MVCC scan in that backend can find zero
>   tuples when one is live.  This is like what all backends got in the days of
>   SnapshotNow catalog scans.  See the pgbench test suite addition.  (Perhaps
>   the fix is to make VACUUM do its MVCC scans outside of PROC_IN_VACUUM,
>   setting that flag later and unsetting it earlier.)

Are you saying that this is a problem already, or that the patch
causes it to start happening? If it's the former, that's horrible. If
it's the latter, I'd say that is a fatal defect.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 10:53 AM Matthias van de Meent
 wrote:
> It's not inconceivable that this will significantly increase WAL
> volume, but I think we should go for correctness rather than fastest
> copy.

I don't think we can afford to just do this blindly for the sake of a
hypothetical non-core AM that uses nonstandard pages. There must be
lots of cases where the holes are large, and where the WAL volume
would be a multiple of what it is currently. That's a *big*
regression.

> If we went with fastest copy, we'd better just skip logging the
> FSM and VM forks because we're already ignoring the data of the pages,
> so why not ignore the pages themselves, too? I don't think that holds
> water when we want to be crash-proof in CREATE DATABASE, with a full
> data copy of the template database.

This seems like a red herring. Either assuming standard pages is a
good idea or it isn't, and either logging the FSM and VM forks is a
good idea or it isn't, but those are two separate questions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 1:42 PM Jelte Fennema-Nio  wrote:
> Like Jacob already said, I guess you meant me here. The main point I
> was trying to make is that sslmode=require is extremely insecure too,
> so if we're changing the default then I'd rather bite the bullet and
> actually make the default a secure one this time. No one's browser
> trusts self-signed certs by default, but currently lots of people
> trust self-signed certs when connecting to their production database
> without realizing.
>
> IMHO the only benefit that sslmode=require brings over sslmode=prefer
> is detecting incorrectly configured servers i.e. servers that are
> supposed to support ssl but somehow don't due to a misconfigured
> GUC/pg_hba. Such "incorrectly configured server" detection avoids
> sending data to such a server, which an eavesdropper on the network
> could see. Which is definitely a security benefit, but it's an
> extremely small one. In all other cases sslmode=prefer brings exactly
> the same protection as sslmode=require, because sslmode=prefer
> encrypts the connection unless postgres actively tells the client to
> downgrade to plaintext (which never happens when the server is
> configured correctly).

I think I agree with *nearly* every word of this. However:

(1) I don't want to hijack this thread about a v17 open item to talk
too much about a hypothetical v18 proposal.

(2) While in general you need more than just SSL to ensure security,
I'm not sure that there's only one way to do it, and I doubt that we
should try to pick a winner.

(3) I suspect that even going as far as sslmode=require by default is
going to be extremely painful for hackers, the project, and users.
Moving the goalposts further increases the likelihood of nothing
happening at all.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 12:45 PM Jacob Champion
 wrote:
> For the record, I didn't say that... You mean Jelte's quote up above?

Yeah, sorry, I got my J-named hackers confused. Apologies.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread Chapman Flack
On 05/13/24 09:35, aa wrote:
> If you call the action of "sifting" ordering, then yes. If you don't call
> it ordering, then no.


One thing seems intriguing about this idea: normally, an expected
property of any ORDER BY is that no result row can be passed down
the pipe until all input rows have been seen.

In the case of ORDER BY a boolean expression, or more generally ORDER BY
an expression with only a small number of possible values, a pigeonhole
sort could be used, and rows mapping to the ordered-first pigeonhole could
be passed down the pipe on sight. (Rows mapping to any later pigeonhole
still have to be held to the end, unless some further analysis can
identify when all rows for earlier pigeonholes must have been seen.)

I don't know whether any such ORDER BY strategy is already implemented,
or would be useful enough to be worth implementing, but it might be
handy in cases where a large number of rows are expected to map to
the first pigeonhole. Intermediate storage wouldn't be needed for those,
and some follow-on processing could go on concurrently.

The usage example offered here ("sift" nulls last, followed by
a LIMIT) does look a lot like a job for a WHERE clause though.

Regards,
-Chap




Re: cataloguing NOT NULL constraints

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 12:45 PM Alvaro Herrera  wrote:
> The point is that a column can be in a primary key and not have an
> explicit not-null constraint.  This is different from having a column be
> NOT NULL and having a primary key on top.  In both cases the attnotnull
> flag is set; the difference between these two scenarios is what happens
> if you drop the primary key.  If you do not have an explicit not-null
> constraint, then the attnotnull flag is lost as soon as you drop the
> primary key.  You don't have to do DROP NOT NULL for that to happen
>
> This means that if you have a column that's in the primary key but does
> not have an explicit not-null constraint, then we shouldn't make one up.
> (Which we would, if we were to keep an unadorned NOT NULL that we can't
> drop at the end of the dump.)

It seems to me that the practical thing to do about this problem is
just decide not to solve it. I mean, it's currently the case that if
you establish a PRIMARY KEY when you create a table, the columns of
that key are marked NOT NULL and remain NOT NULL even if the primary
key is later dropped. So, if that didn't change, we would be no less
compliant with the SQL standard (or your reading of it) than we are
now. And if you do really want to make that change, why not split it
out into its own patch, so that the patch that does $SUBJECT is
changing the minimal number of other things at the same time? That
way, reverting something might not involve reverting everything, plus
you could have a separate design discussion about what that fix ought
to look like, separate from the issues that are truly inherent to
cataloging NOT NULL constraints per se.

What I meant about changing the order of operations is that,
currently, the database knows that the column is NOT NULL before the
COPY happens, and I don't think we can change that. I think you agree
-- that's why you invented the throwaway constraints. As far as I can
see, the problems all have to do with getting the "throwaway" part to
happen correctly. It can't be a problem to just mark the relevant
columns NOT NULL in the relevant tables -- we already do that. But if
you want to discard some of those NOT NULL markings once the PRIMARY
KEY is added, you have to know which ones to discard. If we just
consider the most straightforward scenario where somebody does a full
dump-and-restore, getting that right may be annoying, but it seems
like it surely has to be possible. The dump will just have to
understand which child tables (or, more generally, descendent tables)
got a NOT NULL marking on a column because of the PK and which ones
had an explicit marking in the old database and do the right thing in
each case.

But what if somebody does a selective restore of one table from a
partitioning hierarchy? Currently, the columns that would have been
part of the primary key end up NOT NULL, but the primary key itself is
not restored because it can't be. What will happen in this new system?
If you don't apply any NOT NULL constraints to those columns, then a
user who restores one partition from an old dump and tries to reattach
it to the correct partitioned table has to recheck the NOT NULL
constraint, unlike now. If you apply a normal-looking garden-variety
NOT NULL constraint to that column, you've invented a constraint that
didn't exist in the source database. And if you apply a throwaway NOT
NULL constraint but the user never attaches that table anywhere, then
the throwaway constraint survives. None of those options sound very
good to me.

Another scenario: Say that you have a table with a PRIMARY KEY. For
some reason, you want to drop the primary key and then add it back.
Well, with this definitional change, as soon as you drop it, you
forget that the underlying columns don't contain any nulls, so when
you add it back, you have to check them again. I don't know who would
find that behavior an improvement over what we have today.
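
To spell that scenario out (a sketch only; the table is made up, t_pkey is
just the default constraint name, and the re-check described in the comments
is the behavior under the proposed definition, not what released versions do):

CREATE TABLE t (id int PRIMARY KEY, val text);
-- ... load data ...
ALTER TABLE t DROP CONSTRAINT t_pkey;  -- proposal: the not-null marking on id is forgotten here
ALTER TABLE t ADD PRIMARY KEY (id);    -- so this would have to re-verify that id contains no nulls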

So I don't really think it's a great idea to change this behavior, but
even if it is, is it such a good idea that we want to sink the whole
patch set repeatedly over it, as has already happened twice now? I
feel that if we did what Tom suggested a year ago in
https://www.postgresql.org/message-id/3801207.1681057...@sss.pgh.pa.us
-- "I'm inclined to think that this idea of suppressing the implied
NOT NULL from PRIMARY KEY is a nonstarter and we should just go ahead
and make such a constraint" -- there's a very good chance that a
revert would have been avoided here and it would still be just as
valid to think of revisiting this particular question in a future
release as it is now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: race condition in pg_class

2024-05-13 Thread Noah Misch
On Mon, May 13, 2024 at 04:59:59PM +0900, Michael Paquier wrote:
> About inplace050-tests-inj-v1.patch.
> 
> + /* Check if blocked_pid is in injection_wait(). */
> + proc = BackendPidGetProc(blocked_pid);
> + if (proc == NULL)
> + PG_RETURN_BOOL(false);  /* session gone: definitely unblocked */
> + wait_event =
> +     pgstat_get_wait_event(UINT32_ACCESS_ONCE(proc->wait_event_info));
> + if (wait_event && strncmp("INJECTION_POINT(",
> +                           wait_event,
> +                           strlen("INJECTION_POINT(")) == 0)
> + PG_RETURN_BOOL(true);
> 
> Hmm.  I am not sure that this is the right interface for the job
> because this is not only related to injection points but to the
> monitoring of one or more wait events when running a permutation
> step.

Could you say more about that?  Permutation steps don't monitor wait events
today.  This patch would be the first instance of that.

> Perhaps this is something that should be linked to the spec
> files with some property area listing the wait events we're expected
> to wait on instead when running a step that we know will wait?

The spec syntax doesn't distinguish contention types at all.  The isolation
tester's needs are limited to distinguishing:

  (a) process is waiting on another test session
  (b) process is waiting on automatic background activity (autovacuum, mainly)

Automatic background activity doesn't make a process enter or leave
injection_wait(), so all injection point wait events fall in (a).  (The tester
ignores (b), since those clear up without intervention.  Failing to ignore
them, as the tester did long ago, made output unstable.)




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread Tom Lane
aa  writes:
> If you call the action of "sifting" ordering, then yes. If you don't call
> it ordering, then no.
> In essence, it is the output of a filtering mechanism, done in a single result
> set pass. And this pass should be the same pass in charge of collecting the
> result set in the first place.

Sounds a lot like a WHERE clause to me.

regards, tom lane




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread aa
Hi,
If you call the action of "sifting" ordering, then yes. If you don't call
it ordering, then no.

In essence, it is the output of a filtering mechanism, done in a single result
set pass. And this pass should be the same pass in charge of collecting the
result set in the first place.

Thanks


On Mon, May 13, 2024 at 5:48 AM Wolfgang Wilhelm 
wrote:

> Hi,
>
> do I interpret your idea correctly: You want some sort of ordering without
> ordering?
>
> Kind regards
> WW
>
> Am Montag, 13. Mai 2024 um 10:40:38 MESZ hat aa 
> Folgendes geschrieben:
>
>
> Hello Everyone!
>
> Is there any chance to get some kind of a result set sifting mechanism in
> Postgres?
>
> What I am looking for is a way to get for example: "nulls last" in a
> result set, without having to call "order by" or having to use UNION ALL,
> and if possible to get this in a single result set pass.
>
> Something on this line: SELECT a, b, c FROM my_table WHERE a nulls last
> OFFSET 0 LIMIT 25
>
> I don't want to use order by or union all because these are time-consuming
> operations, especially on large data sets and when comparisons are done
> on dynamic values (e.g. geolocation distances between a mobile and a
> static location)
>
> What I would expect from such a feature would be speeds comparable to
> unsorted selects, while getting a very rudimentary ordering.
>
> A use case for such a mechanism would be the implementation of QUICK
> relevant search results for a search engine.
>
> I'm not familiar with how Postgres handles simple select queries internally,
> but the way I would envision a result set sifting logic would be to
> collect the result set in 2 separate lists, based on the sifting
> condition, and then concatenate these 2 lists and return the result when
> the pagination request conditions are met.
>
> Any idea if such a functionality is feasible ?
>
> Thank you.
>
>   PS: if ever implemented, the sifting mechanism could be extended to
> accommodate any type of threshold, not just null values.
>
>
>
>


I have an exporting need...

2024-05-13 Thread Juan Hernández
Hi team!

First, I want to thank you for your hands-on work on this. You are doing a
fantastic and blessed job. Blessings to you all!

I have a special need I want to bring to you. This is not a bug; it is a
need I have, and I am writing here so I can be redirected where needed.

I have to make a daily backup. The database is growing a lot each day, and
sometimes I have had the need to recover just one table. It would be easier
to move a 200M file with only the needed table instead of moving a 5G file
with all the tables I don't need; it is just a matter of speed.

I've created a script to export every table one by one, so in case I need
to import a table again, I don't have to use the very big export file, but
rather the "tablename.sql" file created for every table.

My hosting provider truncated my script because it is very large (more than
200 lines, each line exporting one table), so I think the way I do this is
hurting the server performance.

So here is my question.

Would you consider it useful to add a parameter (for example,
--separatetables) so that, when it is used, the export process creates a
separate tablename.sql file for each table in the database automatically?

Example...

PGHOST="/tmp" PGPASSWORD="mydbpass" pg_dump -U dbusername --separatetables
-Fp --inserts dbname > "/route/dbname.sql"

And if this database has tables table1...table10, then 10 files are
created...

dbname_table1.sql
dbname_table2.sql
dbname_table3.sql
...
dbname_table8.sql
dbname_table9.sql
dbname_table10.sql


In each file, all the main parameters would be generated again. For example,
the file dbname_table1.sql...

--
-- PostgreSQL database dump
--
-- Dumped from database version 10.21
-- Dumped by pg_dump version 15.6
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
...
...
SET default_tablespace = '';
--
-- Name: table1; Type: TABLE; Schema: public; Owner: dbusername
--
CREATE TABLE public.table1 (
code numeric(5,0),
name character varying(20)
)


I don't know if many developers have the same need as me. I hope this helps
in the future.

Thanks for reading, and thanks for what you've done. You are doing great!
Cheers!


__
Juan de Jesús


On the use of channel binding without server certificates (was: Direct SSL connection with ALPN and HBA rules)

2024-05-13 Thread Jacob Champion
[soapbox thread, so I've changed the Subject]

On Mon, May 13, 2024 at 4:08 AM Heikki Linnakangas  wrote:
> "channel_binding=require sslmode=require" also protects from MITM attacks.

This isn't true in the same way that "standard" TLS protects against
MITM. I know you know that, but for the benefit of bystanders reading
the threads, I think we should stop phrasing it like this. Most people
who want MITM protection need to be using verify-full.

Details for those bystanders: Channel binding alone will only
disconnect you after the MITM is discovered, after your startup packet
is leaked but before you send any queries to the server. A hash of
your password will also be leaked in that situation, which starts the
timer on an offline attack. And IIRC, you won't get an alert that says
"someone's in the middle"; it'll just look like you mistyped your
password.

(Stronger passwords provide stronger protection in this situation,
which is not a property that most people are used to. If I choose to
sign into Google with the password "hunter2", it doesn't somehow make
the TLS protection weaker. But if you rely on SCRAM by itself for
server authentication, it does more or less work like that.)

Use channel_binding *in addition to* sslmode=verify-full if you want
enhanced authentication of the peer, as suggested in the docs [1].
Don't rely on channel binding alone for the vast majority of use
cases, and if you know better for your particular use case, then you
already know enough to be able to ignore my advice.

[/soapbox]

Thanks,
--Jacob

[1] https://www.postgresql.org/docs/current/preventing-server-spoofing.html




Re: Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-05-13 Thread Daniel Gustafsson
> On 13 May 2024, at 20:05, Ranier Vilela  wrote:
> On Wed, Apr 10, 2024 at 15:33, Daniel Gustafsson wrote:

> Thanks, I'll have a look.  I've left this for post-freeze on purpose to not
> cause unnecessary rebasing.  Will take a look over the next few days unless
> beaten to it.

> Any chance we'll have these fixes in v17?

Nice timing, I was actually rebasing them today to get them committed.

--
Daniel Gustafsson





Re: Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-05-13 Thread Ranier Vilela
On Wed, Apr 10, 2024 at 15:33, Daniel Gustafsson wrote:

> On 10 Apr 2024, at 20:31, Ranier Vilela  wrote:
>
> On Tue, Apr 2, 2024 at 15:31, Daniel Gustafsson wrote:
>
>> > On 2 Apr 2024, at 20:13, Ranier Vilela  wrote:
>>
>> > Fix by freeing the pointer, like pclose_check (src/common/exec.c)
>> similar case.
>>
>> Off the cuff, seems reasonable when loglevel is LOG.
>>
>
> Per Coverity.
>
> Another case of resource leak, when loglevel is LOG.
> In the function shell_archive_file (src/backend/archive/shell_archive.c)
> The pointer *xlogarchcmd*  is not freed.
>
>
> Thanks, I'll have a look.  I've left this for post-freeze on purpose to not
> cause unnecessary rebasing.  Will take a look over the next few days unless
> beaten to it.
>
Any chance we'll have these fixes in v17?

best regards,
Ranier Vilela


Re: Fix out-of-bounds in the function GetCommandTagName

2024-05-13 Thread Ranier Vilela
On Mon, May 13, 2024 at 14:38, Tom Lane wrote:

> David Rowley  writes:
> > I've added a CF entry under your name for this:
> > https://commitfest.postgresql.org/48/4927/
>
> > If it was code new to PG17 I'd be inclined to go ahead with it now,
> > but it does not seem to align with making the release mode stable.
> > I'd bet others will feel differently about that.  Delaying seems a
> > better default choice at least.
>
> The security team's Coverity instance has started to show this
> complaint now too.  So I'm going to go ahead and push this change
> in HEAD.  It's probably unwise to change it in stable branches,
> since there's at least a small chance some external code is using
> COMMAND_TAG_NEXTTAG for the same purpose tag_behavior[] does.
> But we aren't anywhere near declaring v17's API stable, so
> I'd rather fix the issue than dismiss it in HEAD.
>
Thanks for the commit, Tom.

best regards,
Ranier Vilela


Re: UniqueKey v2

2024-05-13 Thread Antonin Houska
Antonin Houska  wrote:

> Andy Fan  wrote:
> > >
> > > * Combining the UKs
> > >
> > >   IMO this is the most problematic part of the patch. You call
> > >   populate_joinrel_uniquekeys() for the same join multiple times,
> > 
> > Why do you think so? The below code is called in "make_join_rel"
> 
> Consider join of tables "a", "b" and "c". My understanding is that
> make_join_rel() is called once with rel1={a} and rel2={b join c}, then with
> rel1={a join b} and rel2={c}, etc. I wanted to say that each call should
> produce the same set of unique keys.
> 
> I need to check this part more in detail.

I think I understand now. By calling populate_joinrel_uniquekeys() for various
orderings, you can find out that various input relation unique keys can
represent the whole join. For example, if the ordering is

A JOIN (B JOIN C)

you can prove that the unique keys of A can be used for the whole join, while
for the ordering

B JOIN (A JOIN C)

you can prove the same for the unique keys of B, and so on.

> > Is your original question is about populate_joinrel_uniquekey_for_rel
> > rather than populate_joinrel_uniquekeys? We have the below codes:
> > 
> > outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, 
> > outerrel,
> > 
> >  innerrel, restrictlist);
> > inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, 
> > innerrel,
> > 
> >  outerrel, restrictlist);
> > 
> > This is mainly because of the following theory. Quoted from
> > README.uniquekey. Let's called this as "rule 1".
> > 
> > """
> > How to maintain the uniquekey?
> > ---
> > .. From the join relation, it is maintained with two rules:
> > 
> > - the uniquekey in one side is still unique if it can't be duplicated
> >   after the join. for example:
> > 
> >   SELECT t1.pk FROM t1 JOIN t2 ON t1.a = t2.pk;
> >   UniqueKey on t1:  t1.pk
> >   UniqueKey on t1 Join t2:  t1.pk
> > """
> > 
> > AND the blow codes:
> > 
> > 
> > if (outeruk_still_valid || inneruk_still_valid)
> > 
> > /*
> >  * the uniquekey on outers or inners have been added into 
> > joinrel so
> >  * the combined uniuqekey from both sides is not needed.
> >  */
> > return;
> > 
> > 
> > We don't create the component uniquekey if either one of the two
> > sides is unique already. For example:
> > 
> > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is
> > unique", there is no need to create component UniqueKey (t1.id, t2.id);  
> 
> ok, I need to check more in detail how this part works.

This optimization makes sense to me.

> > >
> > >   Of course one problem is that the number of combinations can grow
> > >   exponentially as new relations are joined.
> > 
> > Yes, that's why "rule 1" needed and "How to reduce the overhead" in
> > UniqueKey.README is introduced. 

I think there should yet be some guarantee that the number of unique keys does
not grow exponentially. Perhaps a constant that allows a relation (base or
join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3
or 4.) And when picking the "best N keys", one criterion could be the number
of expressions in the key (the shorter key the better).

> > >
> > >   2) Check if the join relation is single-row
> > >
> > >   I in order to get rid of the dependency on 'restrictlist', I think you 
> > > can
> > >   use ECs. Consider a query from your regression tests:
> > >
> > > CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c 
> > > int, d int, e int);
> > >
> > > SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id 
> > > = 1;
> > >
> > >   The idea here seems to be that no more than one row of t1 matches the 
> > > query
> > >   clauses. Therefore, if t2(id) is unique, the clause t1.e=t2.id ensures 
> > > that
> > >   no more than one row of t2 matches the query (because t1 cannot provide 
> > > the
> > >   clause with more than one input value of 'e'). And therefore, the join 
> > > also
> > >   produces at most one row.
> > 
> > You are correct and IMO my current code are able to tell it is a single
> > row as well.
> > 
> > 1. Since t1.id = 1, t1 is a single row, so t1.d is a uniquekey as a
> > consequence.
> > 2. Given t2.id is unique, t1.e = t2.id so t1's uniquekey can be kept
> > after the join because of rule 1 on joinrel. and t1 is singlerow, so the
> > joinrel is singlerow as well.

ok, I think I understand now.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Mon, 13 May 2024 at 18:14, Robert Haas  wrote:
> I disagree with Jacob's assertion that sslmode=require has no security
> benefits over sslmode=prefer. That seems like the kind of pessimism
> that makes people hate security professionals. There have got to be
> some attacks that are foreclosed by encrypting the connection, even if
> you don't stop MITM attacks or other things that are more
> sophisticated than running wireshark and seeing what goes by on the
> wire.

Like Jacob already said, I guess you meant me here. The main point I
was trying to make is that sslmode=require is extremely insecure too,
so if we're changing the default then I'd rather bite the bullet and
actually make the default a secure one this time. No one's browser
trusts self-signed certs by default, but currently lots of people
trust self-signed certs when connecting to their production database
without realizing.

IMHO the only benefit that sslmode=require brings over sslmode=prefer
is detecting incorrectly configured servers i.e. servers that are
supposed to support ssl but somehow don't due to a misconfigured
GUC/pg_hba. Such "incorrectly configured server" detection avoids
sending data to such a server, which an eavesdropper on the network
could see. Which is definitely a security benefit, but it's an
extremely small one. In all other cases sslmode=prefer brings exactly
the same protection as sslmode=require, because sslmode=prefer
encrypts the connection unless postgres actively tells the client to
downgrade to plaintext (which never happens when the server is
configured correctly).




Re: Fix out-of-bounds in the function GetCommandTagName

2024-05-13 Thread Tom Lane
David Rowley  writes:
> I've added a CF entry under your name for this:
> https://commitfest.postgresql.org/48/4927/

> If it was code new to PG17 I'd be inclined to go ahead with it now,
> but it does not seem to align with making the release mode stable.
> I'd bet others will feel differently about that.  Delaying seems a
> better default choice at least.

The security team's Coverity instance has started to show this
complaint now too.  So I'm going to go ahead and push this change
in HEAD.  It's probably unwise to change it in stable branches,
since there's at least a small chance some external code is using
COMMAND_TAG_NEXTTAG for the same purpose tag_behavior[] does.
But we aren't anywhere near declaring v17's API stable, so
I'd rather fix the issue than dismiss it in HEAD.

regards, tom lane




Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Tom Lane
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?=  writes:
> Tom Lane  writes:
>> I'm fairly down on this idea for SQL, because I think it creates
>> ambiguity for the ROW() constructor syntax.  That is:
>>  (x,y) is understood to be shorthand for ROW(x,y)
>>  (x) is not ROW(x), it's just x
>>  (x,) means what?

> Python has a similar issue: (x, y) is a tuple, but (x) is just x, and
> they use the trailing comma to disambiguate, so (x,) creates a
> single-item tuple.  AFAIK it's the only place where the trailing comma
> is significant.

Ugh :-(.  The semantic principle I'd prefer to have here is "a trailing
comma is ignored", but what they did breaks that.  But then again,
I'm not particularly a fan of anything about Python's syntax.
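
For anyone following along, the current behavior described in the quoted
lines is easy to see in psql; this is just an illustration of today's rules,
nothing new:

SELECT (1, 2);  -- a composite value, i.e. ROW(1,2)
SELECT (1);     -- just the integer 1
SELECT (1,);    -- currently a syntax error; the question is what it should mean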

> Yeah, a more principled approach would be to not special-case target
> lists, but to allow one (and only one) trailing comma everywhere:
> select, order by, group by, array constructors, row constructors,
> everything that looks like a function call, etc.

If it can be made to work everywhere, that would get my vote.
I'm not sure if any other ambiguities arise, though.  SQL has
a lot of weird syntax corners (and the committee keeps adding
more :-().

regards, tom lane




Re: Improving information_schema._pg_expandarray()

2024-05-13 Thread Dagfinn Ilmari Mannsåker
[ I got distracted while writing this follow-up and only just found it
  in my list of unsent Gnus buffers, and now it's probably too late to
  make it for 17, but here it is anyway while I remember. ]

Tom Lane  writes:

> I happened to notice that information_schema._pg_expandarray(),
> which has the nigh-unreadable definition
>
> AS 'select $1[s],
> s operator(pg_catalog.-) pg_catalog.array_lower($1,1) 
> operator(pg_catalog.+) 1
> from pg_catalog.generate_series(pg_catalog.array_lower($1,1),
> pg_catalog.array_upper($1,1),
> 1) as g(s)';
>
> can now be implemented using unnest():
>
> AS 'SELECT * FROM pg_catalog.unnest($1) WITH ORDINALITY';
>
> It seems to be slightly more efficient this way, but the main point
> is to make it more readable.

I didn't spot this until it got committed, but it got me wondering what
eliminating the wrapper function completely would look like, so I
whipped up the attached.  It instead calls UNNEST() laterally in the
queries, which has the side benefit of getting rid of several
subselects, one of which was particularly confusing.  In one place the
lateral form eliminated the need for WITH ORDINALITY as well.


- ilmari

>From 9d1b2f2d16f10903d975a3bb7551a38c5ce62e15 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 28 Dec 2023 23:47:33 +
Subject: [PATCH] Eliminate information_schema._pg_expandarray completely

Commit 58054de2d0847c09ef091956f72ae5e9fb9a176e made it into a simple
wrapper around unnest, but we can simplify things further by calling
unnest directly in the queries.
---
 src/backend/catalog/information_schema.sql | 104 +
 src/backend/utils/adt/arrayfuncs.c |   3 -
 src/test/regress/expected/psql.out |  30 +++---
 src/test/regress/sql/psql.sql  |   2 +-
 4 files changed, 58 insertions(+), 81 deletions(-)

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index c4145131ce..cf25f5d1bc 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -39,22 +39,15 @@ SET search_path TO information_schema;
  * A few supporting functions first ...
  */
 
-/* Expand any 1-D array into a set with integers 1..N */
-CREATE FUNCTION _pg_expandarray(IN anyarray, OUT x anyelement, OUT n int)
-RETURNS SETOF RECORD
-LANGUAGE sql STRICT IMMUTABLE PARALLEL SAFE
-ROWS 100 SUPPORT pg_catalog.array_unnest_support
-AS 'SELECT * FROM pg_catalog.unnest($1) WITH ORDINALITY';
-
 /* Given an index's OID and an underlying-table column number, return the
  * column's position in the index (NULL if not there) */
 CREATE FUNCTION _pg_index_position(oid, smallint) RETURNS int
 LANGUAGE sql STRICT STABLE
 BEGIN ATOMIC
-SELECT (ss.a).n FROM
-  (SELECT information_schema._pg_expandarray(indkey) AS a
-   FROM pg_catalog.pg_index WHERE indexrelid = $1) ss
-  WHERE (ss.a).x = $2;
+SELECT ik.icol
+FROM pg_catalog.pg_index,
+ pg_catalog.unnest(indkey) WITH ORDINALITY ik(tcol, icol)
+WHERE indexrelid = $1 AND ik.tcol = $2;
 END;
 
 CREATE FUNCTION _pg_truetypid(pg_attribute, pg_type) RETURNS oid
@@ -1079,37 +1072,32 @@ GRANT SELECT ON enabled_roles TO PUBLIC;
 
 CREATE VIEW key_column_usage AS
 SELECT CAST(current_database() AS sql_identifier) AS constraint_catalog,
-   CAST(nc_nspname AS sql_identifier) AS constraint_schema,
+   CAST(nc.nspname AS sql_identifier) AS constraint_schema,
CAST(conname AS sql_identifier) AS constraint_name,
CAST(current_database() AS sql_identifier) AS table_catalog,
-   CAST(nr_nspname AS sql_identifier) AS table_schema,
+   CAST(nr.nspname AS sql_identifier) AS table_schema,
CAST(relname AS sql_identifier) AS table_name,
CAST(a.attname AS sql_identifier) AS column_name,
-   CAST((ss.x).n AS cardinal_number) AS ordinal_position,
+   CAST(ck.icol AS cardinal_number) AS ordinal_position,
CAST(CASE WHEN contype = 'f' THEN
-   _pg_index_position(ss.conindid, ss.confkey[(ss.x).n])
+   _pg_index_position(c.conindid, c.confkey[ck.icol])
  ELSE NULL
 END AS cardinal_number)
  AS position_in_unique_constraint
 FROM pg_attribute a,
- (SELECT r.oid AS roid, r.relname, r.relowner,
- nc.nspname AS nc_nspname, nr.nspname AS nr_nspname,
- c.oid AS coid, c.conname, c.contype, c.conindid,
- c.confkey, c.confrelid,
- _pg_expandarray(c.conkey) AS x
-  FROM pg_namespace nr, pg_class r, pg_namespace nc,
-   pg_constraint c
-  WHERE nr.oid = r.relnamespace
-AND r.oid = c.conrelid
-AND nc.oid = c.connamespace
-AND c.contype IN ('p', 'u', 'f')

Re: cataloguing NOT NULL constraints

2024-05-13 Thread Alvaro Herrera
On 2024-May-13, Robert Haas wrote:

> On Mon, May 13, 2024 at 9:44 AM Alvaro Herrera  
> wrote:
> > The problematic point is the need to add NOT NULL constraints during
> > table creation that don't exist in the table being dumped, for
> > performance of primary key creation -- I called this a throwaway
> > constraint.  We needed to be able to drop those constraints after the PK
> > was created.  These were marked NO INHERIT to allow them to be dropped,
> > which is easier if the children don't have them.  This all worked fine.
> 
> This seems really weird to me. Why is it necessary? I mean, in
> existing releases, if you declare a column as PRIMARY KEY, the columns
> included in the key are forced to be NOT NULL, and you can't change
> that for so long as they are included in the PRIMARY KEY.

The point is that a column can be in a primary key and not have an
explicit not-null constraint.  This is different from having a column be
NOT NULL and having a primary key on top.  In both cases the attnotnull
flag is set; the difference between these two scenarios is what happens
if you drop the primary key.  If you do not have an explicit not-null
constraint, then the attnotnull flag is lost as soon as you drop the
primary key.  You don't have to do DROP NOT NULL for that to happen.
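A minimal illustration of the behaviour I'm describing, using the default
constraint names:

CREATE TABLE t1 (a int PRIMARY KEY);            -- no explicit not-null
CREATE TABLE t2 (a int NOT NULL PRIMARY KEY);   -- explicit not-null plus PK

ALTER TABLE t1 DROP CONSTRAINT t1_pkey;         -- t1.a becomes nullable again
ALTER TABLE t2 DROP CONSTRAINT t2_pkey;         -- t2.a keeps its NOT NULL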

This means that if you have a column that's in the primary key but does
not have an explicit not-null constraint, then we shouldn't make one up.
(Which we would, if we were to keep an unadorned NOT NULL that we can't
drop at the end of the dump.)

> So I would have thought that after this patch, you'd end up with the
> same thing.

At least as I interpret the standard, you wouldn't.

> One way of doing that would be to make the PRIMARY KEY depend on the
> now-catalogued NOT NULL constraints, and the other way would be to
> keep it as an ad-hoc prohibition, same as now.

That would be against what [I think] the standard says.

> But I don't see why I need to end up with what the patch generates,
> which seems to be something like CONSTRAINT pgdump_throwaway_notnull_0
> NOT NULL NO INHERIT. That kind of thing suggests that we're changing
> around the order of operations in pg_dump, probably by adding the NOT
> NULL constraints at a later stage than currently, and I think the
> proper solution is most likely to be to avoid doing that in the first
> place.

The point of the throwaway constraints is that they don't remain after
the dump has restored completely.  They are there only so that we don't
have to scan the data looking for possible nulls when we create the
primary key.  We have a DROP CONSTRAINT for the throwaway not-nulls as
soon as the PK is created.
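Schematically, for a single-column PK the relevant part of the dump script
looks something like this (constraint names as pg_dump generates them):

CREATE TABLE foo (
    a integer CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT
);

COPY foo (a) FROM stdin;
\.

ALTER TABLE ONLY foo
    ADD CONSTRAINT foo_pkey PRIMARY KEY (a);

ALTER TABLE ONLY foo
    DROP CONSTRAINT pgdump_throwaway_notnull_0;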

We're not changing any order of operations as such.

> That's not to say that we shouldn't try to make improvements, just
> that it may be hard to get right.

Sure, that's why this patch has now been reverted twice :-) and has been
in the works for ... how many years now?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jacob Champion
(There's, uh, a lot to respond to above and I'm trying to figure out
how best to type up all of it.)

On Mon, May 13, 2024 at 9:13 AM Robert Haas  wrote:
> However,
> I disagree with Jacob's assertion that sslmode=require has no security
> benefits over sslmode=prefer.

For the record, I didn't say that... You mean Jelte's quote up above?:

> sslmode=prefer and sslmode=require
> are the same amount of insecure imho (i.e. extremely insecure).

I agree that requiring passive security is tangibly better than
allowing fallback to plaintext. I think Jelte's point might be better
stated as, =prefer and =require give the same amount of protection
against active attack (none).

--Jacob




Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Dagfinn Ilmari Mannsåker  writes:
>> Matthias van de Meent  writes:
>>> Single trailing commas are a feature that's more and more common in
>>> languages, yes, but arbitrary excess commas is new to me. Could you
>>> provide some examples of popular languages which have that, as I can't
>>> think of any.
>
>> The only one I can think of is Perl, which I'm not sure counts as
>> popular any more.  JavaScript allows consecutive commas in array
>> literals, but they're not no-ops, they create empty array slots:
>
> I'm fairly down on this idea for SQL, because I think it creates
> ambiguity for the ROW() constructor syntax.  That is:
>
>   (x,y) is understood to be shorthand for ROW(x,y)
>
>   (x) is not ROW(x), it's just x
>
>   (x,) means what?

Python has a similar issue: (x, y) is a tuple, but (x) is just x, and
they use the trailing comma to disambiguate, so (x,) creates a
single-item tuple.  AFAIK it's the only place where the trailing comma
is significant.

> I realize the original proposal intended to restrict the legality of
> excess commas to only a couple of places, but to me that just flags
> it as a kluge.  ROW(...) ought to work pretty much the same as a
> SELECT list.

Yeah, a more principled approach would be to not special-case target
lists, but to allow one (and only one) trailing comma everywhere:
select, order by, group by, array constructors, row constructors,
everything that looks like a function call, etc.
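For example, things like

SELECT a, b, FROM t ORDER BY a, b,;
SELECT ARRAY[1, 2, 3,];
SELECT coalesce(a, b,) FROM t;

would all become legal (each of these is a syntax error today).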

> As already mentioned, if you can get some variant of this through the
> SQL standards process, we'll probably adopt it.  But I doubt that we
> want to get out front of the committee in this area.

Agreed.

>   regards, tom lane

- ilmari




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 9:37 AM Heikki Linnakangas  wrote:
> On 10/05/2024 16:50, Heikki Linnakangas wrote:
> > New proposal:
> >
> > - Remove the "try both" mode completely, and rename "requiredirect" to
> > just "direct". So there would be just two modes: "postgres" and
> > "direct". On reflection, the automatic fallback mode doesn't seem very
> > useful. It would make sense as the default, because then you would get
> > the benefits automatically in most cases but still be compatible with
> > old servers. But if it's not the default, you have to fiddle with libpq
> > settings anyway to enable it, and then you might as well use the
> > "requiredirect" mode when you know the server supports it. There isn't
> > anything wrong with it as such, but given how much confusion there's
> > been on how this all works, I'd prefer to cut this back to the bare
> > minimum now. We can add it back in the future, and perhaps make it the
> > default at the same time. This addresses points 2. and 3. above.
> >
> > and:
> >
> > - Only allow sslnegotiation=direct with sslmode=require or higher. This
> > is what you, Jacob, wanted to do all along, and addresses point 1.
>
> Here's a patch to implement that.

I find this idea to be a massive improvement over the status quo, and
I didn't spot any major problems when I read through the patch,
either. I'm not quite sure if the patch takes the right approach in
emphasizing that weaker sslmode settings are not allowed because of
unintended fallbacks. It seems to me that we could equally well say
that those combinations are nonsensical. If we're making a direct SSL
connection, SSL is eo ipso required.

I don't have a strong opinion about whether sslnegotiation=direct
should error out (as you propose here) or silently promote sslmode to
require. I think either is defensible. Had I been implementing it, I
think I would have done as Jacob proposes, just because once we've
forced a direct SSL negotiation surely the only sensible behavior is
to be using SSL, unless you think there should be a
silently-reconnect-without-SSL behavior, which I sure don't. However,
I disagree with Jacob's assertion that sslmode=require has no security
benefits over sslmode=prefer. That seems like the kind of pessimism
that makes people hate security professionals. There have got to be
some attacks that are foreclosed by encrypting the connection, even if
you don't stop MITM attacks or other things that are more
sophisticated than running wireshark and seeing what goes by on the
wire.

I'm pleased to hear that you will propose to make sslmode=require the
default in v18. I think we'll need to do some work to figure out how
much collateral damage that will cause, and maybe it will be more than
we can stomach, but Magnus has been saying for years that the current
default is terrible. I'm not sure I was entirely convinced of that the
first time I heard him say it, but I'm smarter now than I was then.
It's really hard to believe in 2024 that anyone should ever be using a
setting that may or may not encrypt the connection. There's http and
https but there's no httpmaybes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: An improved README experience for PostgreSQL

2024-05-13 Thread Nathan Bossart
On Mon, May 13, 2024 at 05:43:45PM +0200, Alvaro Herrera wrote:
> Can't we add these two lines per topic to the README.md?

That would be fine with me, too.  The multiple-files approach is perhaps a
bit more tailored to GitHub, but there's something to be said for keeping
this information centralized.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: An improved README experience for PostgreSQL

2024-05-13 Thread Alvaro Herrera
On 2024-May-13, Nathan Bossart wrote:

> > If we want to enhance the GitHub experience, we can also add these files to
> > the organization instead: 
> > https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file
> 
> This was the intent of my patch.  There might be a few others that we could
> use, but I figured we could start with the low-hanging fruit that would
> have the most impact on the GitHub experience.

Can't we add these two lines per topic to the README.md?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"




Re: An improved README experience for PostgreSQL

2024-05-13 Thread Nathan Bossart
On Sun, May 12, 2024 at 05:17:42PM +0200, Peter Eisentraut wrote:
> I don't know, I find these files kind of "yelling".  It's fine to have a
> couple, but now it's getting a bit much, and there are more that could be
> added.

I'm not sure what you mean by this.  Do you mean that the contents are too
blunt?  That there are too many files?  Something else?

> If we want to enhance the GitHub experience, we can also add these files to
> the organization instead: 
> https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file

This was the intent of my patch.  There might be a few others that we could
use, but I figured we could start with the low-hanging fruit that would
have the most impact on the GitHub experience.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: cataloguing NOT NULL constraints

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 9:44 AM Alvaro Herrera  wrote:
> The problematic point is the need to add NOT NULL constraints during
> table creation that don't exist in the table being dumped, for
> performance of primary key creation -- I called this a throwaway
> constraint.  We needed to be able to drop those constraints after the PK
> was created.  These were marked NO INHERIT to allow them to be dropped,
> which is easier if the children don't have them.  This all worked fine.

This seems really weird to me. Why is it necessary? I mean, in
existing releases, if you declare a column as PRIMARY KEY, the columns
included in the key are forced to be NOT NULL, and you can't change
that for so long as they are included in the PRIMARY KEY. So I would
have thought that after this patch, you'd end up with the same thing.
One way of doing that would be to make the PRIMARY KEY depend on the
now-catalogued NOT NULL constraints, and the other way would be to
keep it as an ad-hoc prohibition, same as now. In PostgreSQL 16, I get
a dump like this:

CREATE TABLE public.foo (
a integer NOT NULL,
b text
);

COPY public.foo (a, b) FROM stdin;
\.

ALTER TABLE ONLY public.foo
ADD CONSTRAINT foo_pkey PRIMARY KEY (a);

If I'm dumping from an existing release, I don't see why any of that
needs to change. The NOT NULL decoration should lead to a
system-generated constraint name. If I'm dumping from a new release,
the NOT NULL decoration needs to be replaced with CONSTRAINT
existing_constraint_name NOT NULL. But I don't see why I need to end
up with what the patch generates, which seems to be something like
CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT. That kind
of thing suggests that we're changing around the order of operations
in pg_dump, probably by adding the NOT NULL constraints at a later
stage than currently, and I think the proper solution is most likely
to be to avoid doing that in the first place.

> However, at some point we realized that we needed to add NOT NULL
> constraints in child tables for the columns in which the parent had a
> primary key.  Then things become messy because we had the throwaway
> constraints on one hand and the not-nulls that descend from the PK on
> the other hand, where one was NO INHERIT and the other wasn't; worse if
> the child also has a primary key.

This seems like another problem that is created by changing the order
of operations in pg_dump.

> > The other possibility that occurs to me is that I think the motivation
> > for cataloging NOT NULL constraints was that we wanted to be able to
> > track dependencies on them, or something like that, which seems like
> > it might be able to create issues of the type that you're facing, but
> > the details aren't clear to me.
>
> NOT VALID constraints would be extremely useful, for one thing (because
> then you don't need to exclusively-lock the table during a long scan in
> order to add a constraint), and it's just one step away from having
> these constraints be catalogued.  It was also fixing some inconsistent
> handling of inheritance cases.

I agree that NOT VALID constraints would be very useful. I'm a little
scared by the idea of fixing inconsistent handling of inheritance
cases, just for fear that there may be more things relying on the
inconsistent behavior than we realize. I feel like this is an area
where it's easy for changes to be scarier than they at first seem. I
still have memories of discovering some of the current behavior back
in the mid-2000s when I was learning PostgreSQL (and databases
generally). It struck me as fiddly back then, and it still does. I
feel like there are probably some behaviors that look like arbitrary
decisions but are actually very important for some undocumented
reason. That's not to say that we shouldn't try to make improvements,
just that it may be hard to get right.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New GUC autovacuum_max_threshold ?

2024-05-13 Thread Frédéric Yhuel




On 09/05/2024 at 16:58, Robert Haas wrote:

As I see it, a lot of the lack of agreement up until now is people
just not understanding the math. Since I think I've got the right idea
about the math, I attribute this to other people being confused about
what is going to happen and would tend to phrase it as: some people
don't understand how catastrophically bad it will be if you set this
value too low.


FWIW, I do agree with your math. I found your demonstration convincing. 
50 was selected with a wet finger, i.e. more or less arbitrarily.


Using the formula I suggested earlier:

vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples, 
vac_base_thresh + vac_scale_factor * sqrt(reltuples) * 1000);


your table of 2.56 billion tuples will be vacuumed if there are
more than 10 million dead tuples (every 28 minutes).
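To spell out the arithmetic, assuming the default autovacuum_vacuum_threshold
of 50 and autovacuum_vacuum_scale_factor of 0.2:

SELECT least(50 + 0.2 * 2560000000,
             50 + 0.2 * sqrt(2560000000) * 1000) AS vacthresh;
-- roughly 10.1 million, versus 512 million with the current formula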

If we want to stick with the simple formula, we should probably choose a 
very high default, maybe 100 million, as you suggested earlier.


However, it would be nice to have the visibility map updated more 
frequently than every 100 million dead tuples. I wonder if this could be 
decoupled from the vacuum process?





Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Heikki Linnakangas

On 13/05/2024 16:55, Jelte Fennema-Nio wrote:

On Mon, 13 May 2024 at 15:38, Heikki Linnakangas  wrote:

Here's a patch to implement that.


+   if (conn->sslnegotiation[0] == 'd' &&
+   conn->sslmode[0] != 'r' && conn->sslmode[0] != 'v')

I think these checks should use strcmp instead of checking magic first
characters. I see this same clever trick is used in the recently added
init_allowed_encryption_methods, and I think that should be changed to
use strcmp too for readability.


Oh yeah, I hate that too. These should be refactored into enums, with a 
clear separate stage of parsing the options from strings. But we use 
that pattern all over the place, so I didn't want to start reforming it 
with this patch.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

2024-05-13 Thread Matthias van de Meent
On Mon, 13 May 2024 at 16:13, Tom Lane  wrote:
>
> Matthias van de Meent  writes:
> > PFA a patch that fixes this issue, by assuming that all pages in the
> > source database utilize a non-standard page layout.
>
> Surely that cure is worse than the disease?

I don't know where we would get the information whether the selected
relation fork's pages are standard-compliant. We could base it off of
the fork number (that info is available locally) but that doesn't
guarantee much.
For VM and FSM-pages we know they're essentially never
standard-compliant (hence this thread), but for the main fork it is
anyone's guess once the user has installed an additional AM - which we
don't detect nor pass through to the offending
RelationCopyStorageUsingBuffer.

As for "worse", the default template database is still much smaller
than the working set of most databases. This will indeed regress the
workload a bit, but only by the fraction of holes in the page + all
FSM/VM data.
I think the additional WAL volume during CREATE DATABASE is worth it
when the alternative is losing that data with physical
replication/secondary instances. Note that this does not disable page
compression, it just stops the logging of holes in pages; holes which
generally are only a fraction of the whole database.

It's not inconceivable that this will significantly increase WAL
volume, but I think we should go for correctness rather than fastest
copy. If we went with fastest copy, we'd better just skip logging the
FSM and VM forks because we're already ignoring the data of the pages,
so why not ignore the pages themselves, too? I don't think that holds
water when we want to be crash-proof in CREATE DATABASE, with a full
data copy of the template database.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread Isaac Morland
On Mon, 13 May 2024 at 04:40, aa  wrote:

> Hello Everyone!
>
> Is there any chance to get some kind of a result set sifting mechanism in
> Postgres?
>
> What I am looking for is a way to get for example: "nulls last" in a
> result set, without having to call "order by" or having to use UNION ALL,
> and if possible to get this in a single result set pass.
>
> Something on this line: SELECT a, b, c FROM my_table WHERE a nulls last
> OFFSET 0 LIMIT 25
>
> I don't want to use order by or union all because these are time consuming
> operations, especially on large data sets and when comparisons are done
> on dynamic values (eg: geolocation distances between a mobile and a
> static location)
>

This already exists: ORDER BY a IS NULL

I've found it to be more useful than one might initially expect to order by
a boolean expression.
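For instance, adapting the query upthread:

SELECT a, b, c
FROM my_table
ORDER BY a IS NULL   -- false sorts before true, so NULLs come last
OFFSET 0 LIMIT 25;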


Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Mon, 13 May 2024 at 13:07, Heikki Linnakangas  wrote:
> "channel_binding=require sslmode=require" also protects from MITM attacks.

Cool, I didn't realize we had this connection option and it could be
used like this. But I think there are a few security downsides of
channel_binding=require over sslmode=verify-full: if the client relies
on channel_binding to validate server authenticity, a leaked
server-side SCRAM hash is enough for an attacker to impersonate a
server, whereas normally a leaked SCRAM hash isn't really much of a
security concern (assuming long enough passwords). I also don't know
of many people rotating their SCRAM hashes, even though many rotate
TLS certs.

> I think these options should be designed from the user's point of view,
> so that the user can specify the risks they're willing to accept, and
> the details of how that's accomplished are handled by libpq. For
> example, I'm OK with (tick all that apply):
>
> [ ] passive eavesdroppers seeing all the traffic
> [ ] MITM being able to hijack the session
> [ ] connecting without verifying the server's identity
> [ ] divulging the plaintext password to the server
> [ ] ...

I think that sounds like a great idea, looking forward to the proposal.




Re: Why is citext/regress failing on hamerkop?

2024-05-13 Thread Andrew Dunstan



On 2024-05-12 Su 18:05, Thomas Munro wrote:

On Mon, May 13, 2024 at 12:26 AM Andrew Dunstan  wrote:

Well, this is more or less where I came in back in about 2002 :-) I've been 
trying to help support it ever since, motivated more by stubborn persistence 
than anything else. Still, I agree that the lack of support for the Windows 
port from Microsoft over the years has been more than disappointing.

I think "state of the Windows port" would make a good discussion topic
at pgconf.dev (with write-up for those who can't be there).  If there
is interest, I could organise that with a short presentation of the
issues I am aware of so far and possible improvements and
smaller-things-we-could-drop-instead-of-dropping-the-whole-port.  I
would focus on technical stuff, not who-should-be-doing-what, 'cause I
can't make anyone do anything.



+1


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

2024-05-13 Thread Tom Lane
Matthias van de Meent  writes:
> PFA a patch that fixes this issue, by assuming that all pages in the
> source database utilize a non-standard page layout.

Surely that cure is worse than the disease?

regards, tom lane




Re: Show WAL write and fsync stats in pg_stat_io

2024-05-13 Thread Bharath Rupireddy
On Fri, Apr 19, 2024 at 1:32 PM Nazir Bilal Yavuz  wrote:
>
> > I wanted to inform you that the 73f0a13266 commit changed all WALRead
> > calls to read variable bytes, only the WAL receiver was reading
> > variable bytes before.
>
> I want to start working on this again if possible. I will try to
> summarize the current status:

Thanks for working on this.

> * With the 73f0a13266 commit, the WALRead() function started to read
> variable bytes in every case. Before, only the WAL receiver was
> reading variable bytes.
>
> * With the 91f2cae7a4 commit, WALReadFromBuffers() is merged. We were
> discussing what we have to do when this is merged. It is decided that
> WALReadFromBuffers() does not call pgstat_report_wait_start() because
> this function does not perform any IO [1]. We may follow the same
> logic by not including these to pg_stat_io?

Right. WALReadFromBuffers doesn't do any I/O.

Whoever reads WAL from disk (backends, walsenders, recovery process)
using pg_pread (XLogPageRead, WALRead) needs to be tracked in
pg_stat_io or some other view. If it were to be in pg_stat_io, we may
not be able to distinguish WAL read stats at a backend level (like how
many times/bytes a walsender, recovery process, or backend read WAL
from disk), but it can help understand the overall impact of WAL read
I/O at a cluster level. With this approach, the WAL I/O
stats are divided up - WAL read I/O and write I/O stats are in
pg_stat_io and pg_stat_wal respectively.

This makes me wonder if we need to also add WAL read I/O stats to
pg_stat_wal. Then, we can also add WALReadFromBuffers stats
hits/misses there. With this approach, pg_stat_wal can be a one-stop
view for all the WAL related stats. If needed, we can join info from
pg_stat_wal to pg_stat_io in system_views.sql so that the I/O stats
are emitted to the end-user via pg_stat_io.

> * With the b5a9b18cd0 commit, streaming I/O is merged but AFAIK this
> does not block anything related to putting WAL stats in pg_stat_io.
>
> If I am not missing any new changes, the only problem is reading
> variable bytes now. We have discussed a couple of solutions:
>
> 1- Change op_bytes to something like -1, 0, 1, NULL etc. and document
> that this means some variable byte I/O is happening.
>
> I kind of dislike this solution because if the *only* read I/O is
> happening in variable bytes, it will look like write and extend I/Os
> are happening in variable bytes as well. As a solution, it could be
> documented that only read I/Os could happen in variable bytes for now.

Yes, read I/O for relations and WAL can happen in variable bytes. I
think this idea is reasonable and simple, yet still useful for knowing
the cluster-wide read I/O.

However, another point here is how the total number of bytes read is
represented with the existing pg_stat_io columns 'reads' and 'op_bytes'.
It is known today as 'reads' * 'op_bytes', but with variable-sized reads,
how are the read bytes calculated? Maybe add new columns
read_bytes/write_bytes?
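(Today that would be a query along these lines:

SELECT backend_type, object, context, reads * op_bytes AS read_bytes
FROM pg_stat_io
WHERE reads IS NOT NULL;

which stops making sense once a single row can mix fixed-size and
variable-size reads.)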

> 2- Use op_bytes_[read | write | extend] columns instead of one
> op_bytes column, also use the first solution.
>
> This can solve the first solution's weakness but it introduces two
> more columns. This is more future proof compared to the first solution
> if there is a chance that some variable I/O could happen in write and
> extend calls as well in the future.

-1 as more columns impact the readability and usability.

> 3- Create a new pg_stat_io_wal view to put WAL I/Os here instead of 
> pg_stat_io.
>
> pg_stat_io could remain untouchable and we will have flexibility to
> edit this new view as much as we want. But the original aim of the
> pg_stat_io is evaluating all I/O from a single view and adding a new
> view breaks this aim.

-1 as it defeats the very purpose of one-stop view pg_stat_io for all
kinds of I/O. PS: see my response above about adding both WAL write
I/O and read I/O stats to pg_stat_wal.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Matthias van de Meent  writes:
>> Single trailing commas are a feature that's more and more common in
>> languages, yes, but arbitrary excess commas is new to me. Could you
>> provide some examples of popular languages which have that, as I can't
>> think of any.

> The only one I can think of is Perl, which I'm not sure counts as
> popular any more.  JavaScript allows consecutive commas in array
> literals, but they're not no-ops, they create empty array slots:

I'm fairly down on this idea for SQL, because I think it creates
ambiguity for the ROW() constructor syntax.  That is:

(x,y) is understood to be shorthand for ROW(x,y)

(x) is not ROW(x), it's just x

(x,) means what?
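(For reference, the first two points are easy to verify:

SELECT (1,2) = ROW(1,2);   -- true
SELECT pg_typeof((42));    -- integer, not a composite type
)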

I realize the original proposal intended to restrict the legality of
excess commas to only a couple of places, but to me that just flags
it as a kluge.  ROW(...) ought to work pretty much the same as a
SELECT list.

As already mentioned, if you can get some variant of this through the
SQL standards process, we'll probably adopt it.  But I doubt that we
want to get out front of the committee in this area.

regards, tom lane




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-05-13 Thread Melanie Plageman
On Sat, May 11, 2024 at 3:18 PM Tomas Vondra
 wrote:
>
> On 5/10/24 21:48, Melanie Plageman wrote:
> > Attached is v3. I didn't use your exact language because the test
> > wouldn't actually verify that we properly discard the tuples. Whether
> > or not the empty tuples are all emitted, it just resets the counter to
> > 0. I decided to go with "exercise" instead.
> >
>
> I did go over the v3 patch, did a bunch of tests, and I think it's fine
> and ready to go. The one thing that might need some minor tweaks is the
> commit message.
>
> 1) Isn't the subject "Remove incorrect assert" a bit misleading, as the
> patch does not simply remove an assert, but replaces it with a reset of
> the field the assert used to check? (The commit message does not mention
> this either, at least not explicitly.)

I've updated the commit message.

> 2) The "heap AM-specific bitmap heap scan code" sounds a bit strange to
> me, isn't the first "heap" unnecessary?

bitmap heap scan has been used to refer to bitmap table scans, as the
name wasn't changed from heap when the table AM API was added (e.g.
BitmapHeapNext() is used by all table AMs doing bitmap table scans).
04e72ed617be specifically pushed the skip fetch optimization into heap
implementations of bitmap table scan callbacks, so it was important to
make this distinction. I've changed the commit message to say heap AM
implementations of bitmap table scan callbacks.

While looking at the patch again, I wondered if I should set
enable_material=false in the test. It doesn't matter from the
perspective of exercising the correct code; however, I wasn't sure if
disabling materialization would make the test more resilient against
future planner changes which could cause it to incorrectly fail.

- Melanie
From d3628d8d36b2be54b64c18402bdda80e1c8a436f Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Fri, 10 May 2024 14:52:34 -0400
Subject: [PATCH v4] BitmapHeapScan: Replace incorrect assert with
 reinitialization

04e72ed617be pushed the skip fetch optimization (allowing bitmap heap
scans to operate like index-only scans if none of the underlying data is
needed) into heap AM implementations of bitmap table scan callbacks.

04e72ed617be added an assert that all tuples in blocks eligible for the
optimization had been NULL-filled and emitted by the end of the scan.
This assert is incorrect when not all tuples need be scanned to execute
the query; for example: a join in which not all inner tuples need to be
scanned before skipping to the next outer tuple.

Remove the assert and reset the field on which it previously asserted to
avoid incorrectly emitting NULL-filled tuples from a previous scan on
rescan.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra
Reported-by: Melanie Plageman
Reproduced-by: Tomas Vondra, Richard Guo
Discussion: https://postgr.es/m/CAMbWs48orzZVXa7-vP9Nt7vQWLTE04Qy4PePaLQYsVNQgo6qRg%40mail.gmail.com
---
 src/backend/access/heap/heapam.c   |  4 ++--
 src/test/regress/expected/join.out | 38 ++
 src/test/regress/sql/join.sql  | 26 
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4be0dee4de..8600c22515 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1184,7 +1184,7 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_vmbuffer = InvalidBuffer;
 	}
 
-	Assert(scan->rs_empty_tuples_pending == 0);
+	scan->rs_empty_tuples_pending = 0;
 
 	/*
 	 * The read stream is reset on rescan. This must be done before
@@ -1216,7 +1216,7 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_vmbuffer))
 		ReleaseBuffer(scan->rs_vmbuffer);
 
-	Assert(scan->rs_empty_tuples_pending == 0);
+	scan->rs_empty_tuples_pending = 0;
 
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 0246d56aea..8829bd76e7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -7924,3 +7924,41 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+-- Exercise the "skip fetch" Bitmap Heap Scan optimization when candidate
+-- tuples are discarded. This may occur when:
+--   1. A join doesn't require all inner tuples to be scanned for each outer
+--  tuple, and
+--   2. The inner side is scanned using a bitmap heap scan, and
+--   3. The bitmap heap scan is eligible for the "skip fetch" optimization.
+--  This optimization is usable when no data from the underlying table is
+--  needed. Use a temp table so it is only visible to this backend and
+--  vacuum may reliably mark all blocks in the table all visible in the
+--  visibility map.
+CREATE TEMP TABLE skip_fetch (a INT, b INT) WITH (fillfactor=10);
+INSERT INTO skip_fetch SELECT i % 3, i FROM generate_series(0,30) i;
+CREATE INDEX ON skip_fetch(a);
+VACUUM 

Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Mon, 13 May 2024 at 15:38, Heikki Linnakangas  wrote:
> Here's a patch to implement that.

+   if (conn->sslnegotiation[0] == 'd' &&
+   conn->sslmode[0] != 'r' && conn->sslmode[0] != 'v')

I think these checks should use strcmp instead of checking magic first
characters. I see this same clever trick is used in the recently added
init_allowed_encryption_methods, and I think that should be changed to
use strcmp too for readability.




Re: cataloguing NOT NULL constraints

2024-05-13 Thread Alvaro Herrera
On 2024-May-13, Robert Haas wrote:

> On Sat, May 11, 2024 at 5:40 AM Alvaro Herrera  
> wrote:

> > Specifically, the problem is that I mentioned that we could restrict the
> > NOT NULL NO INHERIT addition in pg_dump for primary keys to occur only
> > in pg_upgrade; but it turns out this is not correct.  In normal
> > dump/restore, there's an additional table scan to check for nulls when
> > the constraint is not there, so the PK creation would become measurably
> > slower.  (In a table with a million single-int rows, PK creation goes
> > from 2000ms to 2300ms due to the second scan to check for nulls).
> 
> I have a feeling that any theory of the form "X only needs to happen
> during pg_upgrade" is likely to be wrong. pg_upgrade isn't really
> doing anything especially unusual: just creating some objects and
> loading data. Those things can also be done at other times, so
> whatever is needed during pg_upgrade is also likely to be needed at
> other times. Maybe that's not sound reasoning for some reason or
> other, but that's my intuition.

True.  It may be that by setting up the upgrade SQL script differently,
we don't need to make the distinction at all.  I hope to be able to do
that.

> I'm sorry that I haven't been following this thread closely, but I'm
> confused about how we ended up here. What exactly are the user-visible
> behavior changes wrought by this patch, and how do they give rise to
> these issues?

The problematic point is the need to add NOT NULL constraints during
table creation that don't exist in the table being dumped, for
performance of primary key creation -- I called this a throwaway
constraint.  We needed to be able to drop those constraints after the PK
was created.  These were marked NO INHERIT to allow them to be dropped,
which is easier if the children don't have them.  This all worked fine.

However, at some point we realized that we needed to add NOT NULL
constraints in child tables for the columns in which the parent had a
primary key.  Then things become messy because we had the throwaway
constraints on one hand and the not-nulls that descend from the PK on
the other hand, where one was NO INHERIT and the other wasn't; worse if
the child also has a primary key.

It turned out that we didn't have any mechanism to transform a NO
INHERIT constraint into a regular one that would be inherited.  I added
one, didn't like the way it worked, tried to restrict it but that caused
other problems; this is the mess that led to the revert (pg_dump in
normal mode would emit scripts that fail for some legitimate cases).

One possible way forward might be to make pg_dump smarter by adding one
more query to know the relationship between constraints that must be
dropped and those that don't.  Another might be to allow multiple
not-null constraints on the same column (one inherits, the other
doesn't, and you can drop them independently).  There may be others.

> The other possibility that occurs to me is that I think the motivation
> for cataloging NOT NULL constraints was that we wanted to be able to
> track dependencies on them, or something like that, which seems like
> it might be able to create issues of the type that you're facing, but
> the details aren't clear to me.

NOT VALID constraints would be extremely useful, for one thing (because
then you don't need to exclusively-lock the table during a long scan in
order to add a constraint), and it's just one step away from having
these constraints be catalogued.  It was also fixing some inconsistent
handling of inheritance cases.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Heikki Linnakangas

On 10/05/2024 16:50, Heikki Linnakangas wrote:

New proposal:

- Remove the "try both" mode completely, and rename "requiredirect" to
just "direct". So there would be just two modes: "postgres" and
"direct". On reflection, the automatic fallback mode doesn't seem very
useful. It would make sense as the default, because then you would get
the benefits automatically in most cases but still be compatible with
old servers. But if it's not the default, you have to fiddle with libpq
settings anyway to enable it, and then you might as well use the
"requiredirect" mode when you know the server supports it. There isn't
anything wrong with it as such, but given how much confusion there's
been on how this all works, I'd prefer to cut this back to the bare
minimum now. We can add it back in the future, and perhaps make it the
default at the same time. This addresses points 2. and 3. above.

and:

- Only allow sslnegotiation=direct with sslmode=require or higher. This
is what you, Jacob, wanted to do all along, and addresses point 1.


Here's a patch to implement that.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 2a7bde8502966a634b62d4109f0cde5fdc685da2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Mon, 13 May 2024 16:36:54 +0300
Subject: [PATCH 1/1] Remove option to fall back from direct to postgres SSL
 negotiation

There were three problems with the sslnegotiation options:

1. The sslmode=prefer and sslnegotiation=requiredirect combination was
somewhat dangerous, as you might unintentionally fall back to
plaintext authentication when connecting to a pre-v17 server.

2. There was an asymmetry between 'postgres' and 'direct'
options. 'postgres' meant "try only traditional negotiation", while
'direct' meant "try direct first, and fall back to traditional
negotiation if it fails". That was apparent only if you knew that the
'requiredirect' mode also exists.

3. The "require" word in 'requiredirect' suggests that it's somehow
more strict or more secure, similar to sslmode. However, I don't
consider direct SSL connections to be a security feature.

To address these problems:

- Only allow sslnegotiation='direct' if sslmode='require' or
stronger. And for the record, Jacob and Robert felt that we should do
that (or have sslnegotiation='direct' imply sslmode='require') anyway,
regardless of the first issue.

- Remove the 'direct' mode that falls back to traditional negotiation,
and rename what was called 'requiredirect' to 'direct' instead. In
other words, there is no "try both methods" option anymore, 'postgres'
now means the traditional negotiation and 'direct' means a direct SSL
connection.

Discussion: https://www.postgresql.org/message-id/d3b1608a-a1b6-4eda-9ec5-ddb3e4375808%40iki.fi
---
 doc/src/sgml/libpq.sgml   |  49 ++--
 src/interfaces/libpq/fe-connect.c |  52 ++--
 src/interfaces/libpq/libpq-int.h  |   3 +-
 .../libpq/t/005_negotiate_encryption.pl   | 254 --
 4 files changed, 150 insertions(+), 208 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 1d32c226d8..b32e497b1b 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1772,15 +1772,18 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   sslnegotiation
   

-This option controls whether PostgreSQL
-will perform its protocol negotiation to request encryption from the
-server or will just directly make a standard SSL
-connection.  Traditional PostgreSQL
-protocol negotiation is the default and the most flexible with
-different server configurations. If the server is known to support
-direct SSL connections then the latter requires one
-fewer round trip reducing connection latency and also allows the use
-of protocol agnostic SSL network tools.
+This option controls how SSL encryption is negotiated with the server,
+if SSL is used. In the default postgres mode, the
+client first asks the server if SSL is supported. In
+direct mode, the client starts the standard SSL
+handshake directly after establishing the TCP/IP connection. Traditional
+PostgreSQL protocol negotiation is the most
+flexible with different server configurations. If the server is known
+to support direct SSL connections then the latter
+requires one fewer round trip reducing connection latency and also
+allows the use of protocol agnostic SSL network tools. The direct SSL
+option was introduced in PostgreSQL version
+17.

 
 
@@ -1798,32 +1801,14 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   direct
   

- first attempt to establish a standard SSL connection and if that
- fails reconnect and perform the negotiation. This fallback
- process adds significant latency if the initial SSL connection
-   

Re: Fix parallel vacuum buffer usage reporting

2024-05-13 Thread Nazir Bilal Yavuz
Hi,

On Fri, 10 May 2024 at 19:09, Masahiko Sawada  wrote:
>
> On Fri, May 10, 2024 at 7:26 PM Nazir Bilal Yavuz  wrote:
> >
> > Hi,
> >
> > Thank you for working on this!
> >
> > On Wed, 1 May 2024 at 06:37, Masahiko Sawada  wrote:
> > >
> > > Thank you for further testing! I've pushed the patch.
> >
> > I realized a behaviour change while looking at 'Use pgBufferUsage for
> > block reporting in analyze' thread [1]. Since that change applies here
> > as well, I thought it is better to mention it here.
> >
> > Before this commit, VacuumPageMiss did not count the blocks if its
> > read was already completed by other backends [2]. Now,
> > 'pgBufferUsage.local_blks_read + pgBufferUsage.shared_blks_read'
> > counts every block attempted to be read; possibly double counting if
> > someone else has already completed the read.
>
> True. IIUC there is such a difference only in HEAD but not in v15 and
> v16. The following comment in WaitReadBuffers() says that it's a
> traditional behavior that we count blocks as read even if someone else
> has already completed its I/O:
>
> /*
>  * We count all these blocks as read by this backend.  This is traditional
>  * behavior, but might turn out to be not true if we find that someone
>  * else has beaten us and completed the read of some of these blocks.  In
>  * that case the system globally double-counts, but we traditionally don't
>  * count this as a "hit", and we don't have a separate counter for "miss,
>  * but another backend completed the read".
>  */
>
> So I think using pgBufferUsage for (parallel) vacuum is a legitimate
> usage and makes it more consistent with other parallel operations.

That sounds logical. Thank you for the clarification.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: cataloguing NOT NULL constraints

2024-05-13 Thread Robert Haas
On Sat, May 11, 2024 at 5:40 AM Alvaro Herrera  wrote:
> I have found two more problems that I think are going to require some
> more work to fix, so I've decided to cut my losses now and revert the
> whole.  I'll come back again in 18 with these problems fixed.

Bummer, but makes sense.

> Specifically, the problem is that I mentioned that we could restrict the
> NOT NULL NO INHERIT addition in pg_dump for primary keys to occur only
> in pg_upgrade; but it turns out this is not correct.  In normal
> dump/restore, there's an additional table scan to check for nulls when
> the constraint is not there, so the PK creation would become measurably
> slower.  (In a table with a million single-int rows, PK creation goes
> from 2000ms to 2300ms due to the second scan to check for nulls).

I have a feeling that any theory of the form "X only needs to happen
during pg_upgrade" is likely to be wrong. pg_upgrade isn't really
doing anything especially unusual: just creating some objects and
loading data. Those things can also be done at other times, so
whatever is needed during pg_upgrade is also likely to be needed at
other times. Maybe that's not sound reasoning for some reason or
other, but that's my intuition.

> The addition of NOT NULL NO INHERIT constraints for this purpose
> collides with addition of constraints for other reasons, and it forces
> us to do unpleasant things such as altering an existing constraint to go
> from NO INHERIT to INHERIT.  If this happens only during pg_upgrade,
> that would be okay IMV; but if we're forced to allow in normal operation
> (and in some cases we are), it could cause inconsistencies, so I don't
> want to do that.  I see a way to fix this (adding another query in
> pg_dump that detects which columns descend from ones used in PKs in
> ancestor tables), but that's definitely too much additional mechanism to
> be adding this late in the cycle.

I'm sorry that I haven't been following this thread closely, but I'm
confused about how we ended up here. What exactly are the user-visible
behavior changes wrought by this patch, and how do they give rise to
these issues? One change I know about is that a constraint that is
explicitly catalogued (vs. just existing implicitly) has a name. But
it isn't obvious to me that such a difference, by itself, is enough to
cause all of these problems: if a NOT NULL constraint is created
without a name, then I suppose we just have to generate one. Maybe the
fact that the constraints have names somehow causes ugliness later,
but I can't quite understand why it would.

The other possibility that occurs to me is that I think the motivation
for cataloging NOT NULL constraints was that we wanted to be able to
track dependencies on them, or something like that, which seems like
it might be able to create issues of the type that you're facing, but
the details aren't clear to me. Changing any behavior in this area
seems like it could be quite tricky, because of things like the
interaction between PRIMARY KEY and NOT NULL, which is rather
idiosyncratic but upon which a lot of existing SQL (including SQL not
controlled by us) likely depends. If there's not a clear plan for how
we keep all the stuff that works today working, I fear we'll end up in
an endless game of whack-a-mole. If you've already written the design
ideas down someplace, I'd appreciate a pointer in the right direction.

Or maybe there's some other issue entirely. In any case, sorry about
the revert, and sorry that I haven't paid more attention to this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

2024-05-13 Thread Matthias van de Meent
Hi,

My colleague Konstantin Knizhnik pointed out that we fail to mark pages
with a non-standard page layout with page_std=false in
RelationCopyStorageUsingBuffer when we WAL-log them. This causes us to
interpret the registered buffer as a standard buffer, and omit the
hole in the page, which for FSM/VM pages covers the whole page.

The immediate effect of this bug is that replicas and primaries in a
physical replication system won't have the same data in their VM- and
FSM-forks until the first VACUUM on the new database has WAL-logged
these pages again. Whilst not actively harmful for the VM/FSM
subsystems, it's definitely suboptimal.
A secondary unwanted effect is that AMs that use the buffer cache, but
which don't use or update the page header, also won't see the main data
logged in WAL, thus potentially losing user data in the physical
replication stream or after a system crash. I've not looked for any
such AMs and am unaware of any that would have this issue, but it's
better to fix this.


PFA a patch that fixes this issue, by assuming that all pages in the
source database utilize a non-standard page layout.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)


v1-0001-Fix-logging-of-non-standard-pages-in-RelationCopy.patch
Description: Binary data


Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Dagfinn Ilmari Mannsåker
Matthias van de Meent  writes:

> On Mon, 13 May 2024 at 10:42, Artur Formella  
> wrote:
>> Motivation:
>> Commas of this type are allowed in many programming languages, in some
>> it is even recommended to use them at the ends of lists or objects.
>
> Single trailing commas are a feature that's more and more common in
> languages, yes, but arbitrary excess commas is new to me. Could you
> provide some examples of popular languages which have that, as I can't
> think of any.

The only one I can think of is Perl, which I'm not sure counts as
popular any more.  JavaScript allows consecutive commas in array
literals, but they're not no-ops, they create empty array slots:

❯ js
Welcome to Node.js v18.19.0.
Type ".help" for more information.
> [1,,2,,]
[ 1, <1 empty item>, 2, <1 empty item> ]

- ilmari




RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for the comments! A new patch has been posted in [1].

> 0.1 General - Patch name
> 
> /SUBSCIRPTION/SUBSCRIPTION/

Fixed.

> ==
> 0.2 General - Apply
> 
> FYI, there are whitespace warnings:
> 
> git
> apply ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTI
> ON-.-S.patch
> ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-
> S.patch:191:
> trailing whitespace.
> # command will abort the prepared transaction and succeed.
> warning: 1 line adds whitespace errors.

I hadn't noticed that; fixed.

> ==
> 0.3 General - Regression test fails
> 
> The subscription regression tests are not working.
> 
> ok 158   + publication  1187 ms
> not ok 159   + subscription  123 ms
> 
> See review comments #4 and #5 below for the reason why.

Yeah, I forgot to update the expected result. Fixed.

> ==
> src/sgml/ref/alter_subscription.sgml
> 
> 1.
>   
>The two_phase parameter can only be altered when
> the
> -  subscription is disabled. When altering the parameter from
> on
> -  to off, the backend process checks prepared
> -  transactions done by the logical replication worker and aborts them.
> +  subscription is disabled. Altering the parameter from
> on
> +  to off will be failed when there are prepared
> +  transactions done by the logical replication worker. If you want to 
> alter
> +  the parameter forcibly in this case, force_alter
> +  option must be set to on at the same time. If
> +  specified, the backend process aborts prepared transactions.
>   
> 1a.
> That "will be failed when..." seems strange. Maybe say "will give an
> error when..."
> 
> ~
> 1b.
> Because "force" is a verb, I think true/false is more natural than
> on/off for this new boolean option. e.g. it acts more like a "flag"
> than a "mode". See all the other boolean options in CREATE
> SUBSCRIPTION -- those are mostly all verbs too and are all true/false
> AFAIK.

Fixed, but note that the part was moved.

> 
> ==
> 
> 2. CREATE SUBSCRIPTION
> 
> For my previous review, two comments [v7-0004#2] and [v7-0004#3] were
> not addressed. Kuroda-san wrote:
> Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should
> we modify to accept and add the description in the doc?
> 
> ~
> 
> Yes, that is what I am suggesting. IMO it is odd for the user to be
> able to ALTER a parameter that cannot be included in the CREATE
> SUBSCRIPTION in the first place. AFAIK there are no other parameters
> that behave that way.

Hmm. I felt that this change required a new attribute in the pg_subscription
system catalog. Previously I did not like that because it requires a large
change, but... I tried it anyway.
A new attribute 'subforcealter' was added, and some parts were updated accordingly.

> src/backend/commands/subscriptioncmds.c
> 
> 3. AlterSubscription
> 
> + if (!opts.force_alter)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot alter %s when there are prepared transactions",
> + "two_phase = off"),
> + errhint("Resolve these transactions or set %s at the same time, and
> then try again.",
> + "force_alter = true")));
> 
> I think saying "at the same time" in the hint is unnecessary. Surely
> the user is allowed to set this parameter separately if they want to?
> 
> e.g.
> ALTER SUBSCRIPTION sub SET (force_alter=true);
> ALTER SUBSCRIPTION sub SET (two_phase=off);

Actually, it was correct: since force_alter was not recorded in the system
catalog, it had to be specified at the same time.
Now that we allow them to be specified separately, I have removed that wording.

> ==
> src/test/regress/expected/subscription.out
> 
> 4.
> +-- fail - force_alter cannot be set alone
> +ALTER SUBSCRIPTION regress_testsub SET (force_alter = true);
> +ERROR:  force_alter must be specified with two_phase
> 
> This error cannot happen. You removed that error!

Fixed.

> ==
> src/test/subscription/t/099_twophase_added.pl
> 
> 6.
> +# Try altering the two_phase option to "off." The command will fail since 
> there
> +# is a prepared transaction and the force option is not specified.
> +my $stdout;
> +my $stderr;
> +
> +($result, $stdout, $stderr) = $node_subscriber->psql(
> + 'postgres', "ALTER SUBSCRIPTION regress_sub SET (two_phase = off);");
> +ok($stderr =~ /cannot alter two_phase = off when there are prepared
> transactions/,
> + 'ALTER SUBSCRIPTION failed');
> 
> /force option is not specified./'force_alter' option is not specified as 
> true./

Fixed.

> 
> 7.
> +# Verify the prepared transaction still exists
> +$result = $node_subscriber->safe_psql('postgres',
> +"SELECT count(*) FROM pg_prepared_xacts;");
> +is($result, q(1), "prepared transaction still exits");
> +
> 
> TYPO: /exits/exists/

Fixed.

> 
> ~~~
> 
> 8.
> +# Alter the two_phase with the force_alter option. Apart from the above, the
> +# command will abort the prepared transaction and succeed.
> 

RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for reviewing! The updated patch is available in [1].

> ==
> src/sgml/ref/alter_subscription.sgml
> 
> 1.
> + 
> +  The two_phase parameter can only be altered when
> the
> +  subscription is disabled. When altering the parameter from
> on
> +  to off, the backend process checks prepared
> +  transactions done by the logical replication worker and aborts them.
> + 
> 
> The text may be OK as-is, but I was wondering if it might be better to
> give a more verbose explanation.
> 
> BEFORE
> ... the backend process checks prepared transactions done by the
> logical replication worker and aborts them.
> 
> SUGGESTION
> ... the backend process checks for any incomplete prepared
> transactions done by the logical replication worker (from when
> two_phase parameter was still "on") and, if any are found, those are
> aborted.
>

Fixed.

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 2. AlterSubscription
> 
> - /*
> - * Since the altering two_phase option of subscriptions
> - * also leads to the change of slot option, this command
> - * cannot be rolled back. So prevent we are in the
> - * transaction block.
> + * If two_phase was enabled, there is a possibility the
> + * transactions has already been PREPARE'd. They must be
> + * checked and rolled back.
>   */
> 
> BEFORE
> ... there is a possibility the transactions has already been PREPARE'd.
> 
> SUGGESTION
> ... there is a possibility that transactions have already been PREPARE'd.

Fixed.

> 3. AlterSubscription
> + /*
> + * Since the altering two_phase option of subscriptions
> + * (especially on->off case) also leads to the
> + * change of slot option, this command cannot be rolled
> + * back. So prevent we are in the transaction block.
> + */
>   PreventInTransactionBlock(isTopLevel,
> "ALTER SUBSCRIPTION ... SET (two_phase = off)");
> 
> 
> This comment is a bit vague and includes some typos, but IIUC these
> problems will already be addressed by the 0002 patch changes. AFAIK
> patch 0003 is only moving the 0002 comment.

Yeah, the comment was updated accordingly.

> 
> ==
> src/test/subscription/t/099_twophase_added.pl
> 
> 4.
> +# Check the case that prepared transactions exist on the subscriber node
> +#
> +# If the two_phase is altering from "on" to "off" and there are prepared
> +# transactions on the subscriber, they must be aborted. This test checks it.
> +
> 
> Similar to the comment that I gave for v8-0002. I think there should
> be a more prominent comment for the major test, to
> distinguish it from the comments for the sub-steps.

Added.

> 5.
> +# Verify the prepared transaction are aborted because two_phase is changed to
> +# "off".
> +$result = $node_subscriber->safe_psql('postgres',
> +"SELECT count(*) FROM pg_prepared_xacts;");
> +is($result, q(0), "prepared transaction done by worker is aborted");
> +
> 
> /the prepared transaction are aborted/any prepared transactions are aborted/

Fixed.

[1]: 
https://www.postgresql.org/message-id/OSBPR01MB2552FEA48D265EA278AA9F7AF5E22%40OSBPR01MB2552.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: pg_stat_advisor extension

2024-05-13 Thread Ilia Evdokimov
1. In the case of parallel workers, the plan_rows value has different semantics from the number of rows predicted. Just explore 
get_parallel_divisor().

2. The extension recommends new statistics immediately upon finding an estimation error. But what if the reason for the error is stale 
statistics? Or the error may be raised for only one specific set of constants, 
while estimation is done well in the other 99.% of cases for the same 
expression.


The new parameter, `pg_stat_advisor.analyze_scale_factor`, lets the extension 
suggest executing the ANALYZE command on specific tables. The extension 
now evaluates the ratio of `n_live_tup` (number of live tuples) to 
`n_mod_since_analyze` (number of modifications since last analyze) in 
the `pg_stat_all_tables` catalog. If this ratio exceeds the value 
specified in `analyze_scale_factor`, the extension will suggest an 
update to the table's statistics.
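
A rough illustration of the kind of check described above, written as a plain
query against pg_stat_all_tables (a sketch only, not the extension's actual
code; 0.4 stands in for analyze_scale_factor):

    -- Sketch: 0.4 plays the role of pg_stat_advisor.analyze_scale_factor.
    SELECT relname,
           n_live_tup,
           n_mod_since_analyze,
           n_live_tup::float8 / n_mod_since_analyze AS ratio
    FROM pg_stat_all_tables
    WHERE n_mod_since_analyze > 0
      AND n_live_tup::float8 / n_mod_since_analyze > 0.4;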


There are a lot of parameters that influence estimated rows, and statistics 
might not help improve the estimates. This feature is 
designed to provide users with data-driven insights to decide whether 
updating statistics via the ANALYZE command could potentially improve 
query performance. By suggesting rather than automatically executing 
statistics updates, we empower you to make informed decisions based on 
the specific needs and conditions of your database environment.


I've developed an extension that provides suggestions on whether to 
update or create statistics for your PostgreSQL database, without 
executing any changes. This approach allows you to consider various 
parameters that influence row estimates and make informed decisions 
about optimizing your database's performance.
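
For concreteness, the kind of suggestion being talked about, spelled out by
hand for the my_tbl example from the regression test below (a sketch only;
the extension merely prints the hint, it does not run anything):

    -- Name follows the 'relationName_columns' format mentioned above.
    CREATE STATISTICS my_tbl_fld_1_fld_2 ON fld_1, fld_2 FROM my_tbl;
    ANALYZE my_tbl;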


Your feedback is invaluable, and we look forward to hearing about your 
experiences and any improvements you might suggest.

Best regards,
Ilia Evdokimov
Tantor Labs LLC
From eb998bea96a3640d240afa63e08cc8cf98925bf7 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Mon, 13 May 2024 14:41:59 +0300
Subject: [PATCH] 'pg_stat_advisor' extension

This serves as a hook into the executor. It has two GUC parameters.

pg_stat_advisor.analyze_scale_factor: if the ratio of pg_stat_all_tables.n_live_tup to pg_stat_all_tables.n_mod_since_analyze is greater than pg_stat_advisor.analyze_scale_factor, the extension prints a suggestion to execute the ANALYZE command.

pg_stat_advisor.suggest_statistics_threshold: if the ratio of total rows to planned rows is greater than or equal to this threshold, the extension prints a suggestion to create extended statistics, using the naming format 'relationName_columns'.
---
 contrib/Makefile  |   1 +
 contrib/pg_stat_advisor/Makefile  |  20 +
 .../expected/pg_stat_advisor.out  |  52 ++
 contrib/pg_stat_advisor/meson.build   |  30 +
 contrib/pg_stat_advisor/pg_stat_advisor.c | 560 ++
 .../pg_stat_advisor/sql/pg_stat_advisor.sql   |  24 +
 6 files changed, 687 insertions(+)
 create mode 100644 contrib/pg_stat_advisor/Makefile
 create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out
 create mode 100644 contrib/pg_stat_advisor/meson.build
 create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c
 create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index abd780f277..d6ce2fe562 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -33,6 +33,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_stat_advisor \
 		pg_stat_statements \
 		pg_surgery	\
 		pg_trgm		\
diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile
new file mode 100644
index 00..f31b939e8a
--- /dev/null
+++ b/contrib/pg_stat_advisor/Makefile
@@ -0,0 +1,20 @@
+# contrib/pg_stat_advisor/Makefile
+
+MODULE_big = pg_stat_advisor
+OBJS = \
+	$(WIN32RES) \
+	pg_stat_advisor.o
+PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics"
+
+REGRESS = pg_stat_advisor
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_stat_advisor
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_stat_advisor/expected/pg_stat_advisor.out b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out
new file mode 100644
index 00..8f3dab2c2f
--- /dev/null
+++ b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out
@@ -0,0 +1,52 @@
+LOAD 'pg_stat_advisor';
+SET pg_stat_advisor.analyze_scale_factor = 0.4;
+SET pg_stat_advisor.suggest_statistics_threshold = 0.11;
+CREATE TABLE my_tbl(fld_1 INTEGER, fld_2 BIGINT) WITH (autovacuum_enabled = false);
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+ i/100 as fld_1,
+ i/500 as fld_2
+FROM generate_series(1, 1000) s(i);
+ANALYZE my_tbl;
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+ i/100 as fld_1,

Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-13 Thread Pavel Borisov
A correction of a typo in the previous message:
non-leaf pages iteration cycles (under P_ISLEAF(topaque)) -> non-leaf pages
iteration cycles (under !P_ISLEAF(topaque))

On Mon, 13 May 2024 at 16:19, Pavel Borisov  wrote:

>
>
> On Mon, 13 May 2024 at 15:55, Pavel Borisov 
> wrote:
>
>> Hi, Alexander!
>>
>> On Mon, 13 May 2024 at 05:42, Alexander Korotkov 
>> wrote:
>>
>>> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov
>>>  wrote:
>>> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger
>>> >  wrote:
>>> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov <
>>> aekorot...@gmail.com> wrote:
>>> > > > The only bt_target_page_check() caller is
>>> > > > bt_check_level_from_leftmost(), which overrides state->target in
>>> the
>>> > > > next iteration anyway.  I think the patch is just refactoring to
>>> > > > eliminate the confusion pointer by Peter Geoghegan upthread.
>>> > >
>>> > > I find your argument unconvincing.
>>> > >
>>> > > After bt_target_page_check() returns at line 919, and before
>>> bt_check_level_from_leftmost() overrides state->target in the next
>>> iteration, bt_check_level_from_leftmost() conditionally fetches an item
>>> from the page referenced by state->target.  See line 963.
>>> > >
>>> > > I'm left with four possibilities:
>>> > >
>>> > >
>>> > > 1)  bt_target_page_check() never gets to the code that uses
>>> "rightpage" rather than "state->target" in the same iteration where
>>> bt_check_level_from_leftmost() conditionally fetches an item from
>>> state->target, so the change you're making doesn't matter.
>>> > >
>>> > > 2)  The code prior to v2-0003 was wrong, having changed
>>> state->target in an inappropriate way, causing the wrong thing to happen at
>>> what is now line 963.  The patch fixes the bug, because state->target no
>>> longer gets overwritten where you are now using "rightpage" for the value.
>>> > >
>>> > > 3)  The code used to work, having set up state->target correctly in
>>> the place where you are now using "rightpage", but v2-0003 has broken that.
>>> > >
>>> > > 4)  It's been broken all along and your patch just changes from
>>> wrong to wrong.
>>> > >
>>> > >
>>> > > If you believe (1) is true, then I'm complaining that you are
>>> relying far to much on action at a distance, and that you are not
>>> documenting it.  Even with documentation of this interrelationship, I'd be
>>> unhappy with how brittle the code is.  I cannot easily discern that the two
>>> don't ever happen in the same iteration, and I'm not at all convinced one
>>> way or the other.  I tried to set up some Asserts about that, but none of
>>> the test cases actually reach the new code, so adding Asserts doesn't help
>>> to investigate the question.
>>> > >
>>> > > If (2) is true, then I'm complaining that the commit message doesn't
>>> mention the fact that this is a bug fix.  Bug fixes should be clearly
>>> documented as such, otherwise future work might assume the commit can be
>>> reverted with only stylistic consequences.
>>> > >
>>> > > If (3) is true, then I'm complaining that the patch is flat busted.
>>> > >
>>> > > If (4) is true, then maybe we should revert the entire feature, or
>>> have a discussion of mitigation efforts that are needed.
>>> > >
>>> > > Regardless of which of 1..4 you pick, I think it could all do with
>>> more regression test coverage.
>>> > >
>>> > >
>>> > > For reference, I said something similar earlier today in another
>>> email to this thread:
>>> > >
>>> > > This patch introduces a change that stores a new page into variable
>>> "rightpage" rather than overwriting "state->target", which the old
>>> implementation most certainly did.  That means that after returning from
>>> bt_target_page_check() into the calling function
>>> bt_check_level_from_leftmost() the value in state->target is not what it
>>> would have been prior to this patch.  Now, that'd be irrelevant if nobody
>>> goes on to consult that value, but just 44 lines further down in
>>> bt_check_level_from_leftmost() state->target is clearly used.  So the
>>> behavior at that point is changing between the old and new versions of the
>>> code, and I think I'm within reason to ask if it was wrong before the
>>> patch, wrong after the patch, or something else?  Is this a bug being
>>> introduced, being fixed, or ... ?
>>> >
>>> > Thank you for your analysis.  I'm inclined to believe in 2, but not
>>> > yet completely sure.  It's really pity that our tests don't cover
>>> > this.  I'm investigating this area.
>>>
>>> It seems that I got to the bottom of this.  Changing
>>> BtreeCheckState.target for a cross-page unique constraint check is
>>> wrong, but that happens only for leaf pages.  After that
>>> BtreeCheckState.target is only used for setting the low key.  The low
>>> key is only used for non-leaf pages.  So, that didn't lead to any
>>> visible bug.  I've revised the commit message to reflect this.
>>>
>>
>> I agree with your analysis regarding state->target:
>> - when the unique check is on, 

Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-13 Thread Pavel Borisov
On Mon, 13 May 2024 at 15:55, Pavel Borisov  wrote:

> Hi, Alexander!
>
> On Mon, 13 May 2024 at 05:42, Alexander Korotkov 
> wrote:
>
>> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov
>>  wrote:
>> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger
>> >  wrote:
>> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov <
>> aekorot...@gmail.com> wrote:
>> > > > The only bt_target_page_check() caller is
>> > > > bt_check_level_from_leftmost(), which overrides state->target in the
>> > > > next iteration anyway.  I think the patch is just refactoring to
>> > > > eliminate the confusion pointer by Peter Geoghegan upthread.
>> > >
>> > > I find your argument unconvincing.
>> > >
>> > > After bt_target_page_check() returns at line 919, and before
>> bt_check_level_from_leftmost() overrides state->target in the next
>> iteration, bt_check_level_from_leftmost() conditionally fetches an item
>> from the page referenced by state->target.  See line 963.
>> > >
>> > > I'm left with four possibilities:
>> > >
>> > >
>> > > 1)  bt_target_page_check() never gets to the code that uses
>> "rightpage" rather than "state->target" in the same iteration where
>> bt_check_level_from_leftmost() conditionally fetches an item from
>> state->target, so the change you're making doesn't matter.
>> > >
>> > > 2)  The code prior to v2-0003 was wrong, having changed state->target
>> in an inappropriate way, causing the wrong thing to happen at what is now
>> line 963.  The patch fixes the bug, because state->target no longer gets
>> overwritten where you are now using "rightpage" for the value.
>> > >
>> > > 3)  The code used to work, having set up state->target correctly in
>> the place where you are now using "rightpage", but v2-0003 has broken that.
>> > >
>> > > 4)  It's been broken all along and your patch just changes from wrong
>> to wrong.
>> > >
>> > >
>> > > If you believe (1) is true, then I'm complaining that you are relying
>> far to much on action at a distance, and that you are not documenting it.
>> Even with documentation of this interrelationship, I'd be unhappy with how
>> brittle the code is.  I cannot easily discern that the two don't ever
>> happen in the same iteration, and I'm not at all convinced one way or the
>> other.  I tried to set up some Asserts about that, but none of the test
>> cases actually reach the new code, so adding Asserts doesn't help to
>> investigate the question.
>> > >
>> > > If (2) is true, then I'm complaining that the commit message doesn't
>> mention the fact that this is a bug fix.  Bug fixes should be clearly
>> documented as such, otherwise future work might assume the commit can be
>> reverted with only stylistic consequences.
>> > >
>> > > If (3) is true, then I'm complaining that the patch is flat busted.
>> > >
>> > > If (4) is true, then maybe we should revert the entire feature, or
>> have a discussion of mitigation efforts that are needed.
>> > >
>> > > Regardless of which of 1..4 you pick, I think it could all do with
>> more regression test coverage.
>> > >
>> > >
>> > > For reference, I said something similar earlier today in another
>> email to this thread:
>> > >
>> > > This patch introduces a change that stores a new page into variable
>> "rightpage" rather than overwriting "state->target", which the old
>> implementation most certainly did.  That means that after returning from
>> bt_target_page_check() into the calling function
>> bt_check_level_from_leftmost() the value in state->target is not what it
>> would have been prior to this patch.  Now, that'd be irrelevant if nobody
>> goes on to consult that value, but just 44 lines further down in
>> bt_check_level_from_leftmost() state->target is clearly used.  So the
>> behavior at that point is changing between the old and new versions of the
>> code, and I think I'm within reason to ask if it was wrong before the
>> patch, wrong after the patch, or something else?  Is this a bug being
>> introduced, being fixed, or ... ?
>> >
>> > Thank you for your analysis.  I'm inclined to believe in 2, but not
>> > yet completely sure.  It's really pity that our tests don't cover
>> > this.  I'm investigating this area.
>>
>> It seems that I got to the bottom of this.  Changing
>> BtreeCheckState.target for a cross-page unique constraint check is
>> wrong, but that happens only for leaf pages.  After that
>> BtreeCheckState.target is only used for setting the low key.  The low
>> key is only used for non-leaf pages.  So, that didn't lead to any
>> visible bug.  I've revised the commit message to reflect this.
>>
>
> I agree with your analysis regarding state->target:
> - when the unique check is on, state->target was reassigned only for the
> leaf pages (under P_ISLEAF(topaque) in bt_target_page_check).
> - in this level (leaf) in bt_check_level_from_leftmost() this value of
> state->target was used to get state->lowkey. Then it was reset (in the next
> iteration of do loop in in bt_check_level_from_leftmost()
> - state->lowkey 

Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Étienne BERSAC
Hi,

As a developer, I love this feature.

But as a developer of a universal TDOP SQL parser[1], this can be a
pain. Please propose it to the standards committee instead.

Regards,
Étienne


[1]: https://gitlab.com/dalibo/transqlate




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-13 Thread Pavel Borisov
Hi, Alexander!

On Mon, 13 May 2024 at 05:42, Alexander Korotkov 
wrote:

> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov
>  wrote:
> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger
> >  wrote:
> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov <
> aekorot...@gmail.com> wrote:
> > > > The only bt_target_page_check() caller is
> > > > bt_check_level_from_leftmost(), which overrides state->target in the
> > > > next iteration anyway.  I think the patch is just refactoring to
> > > > eliminate the confusion pointer by Peter Geoghegan upthread.
> > >
> > > I find your argument unconvincing.
> > >
> > > After bt_target_page_check() returns at line 919, and before
> bt_check_level_from_leftmost() overrides state->target in the next
> iteration, bt_check_level_from_leftmost() conditionally fetches an item
> from the page referenced by state->target.  See line 963.
> > >
> > > I'm left with four possibilities:
> > >
> > >
> > > 1)  bt_target_page_check() never gets to the code that uses
> "rightpage" rather than "state->target" in the same iteration where
> bt_check_level_from_leftmost() conditionally fetches an item from
> state->target, so the change you're making doesn't matter.
> > >
> > > 2)  The code prior to v2-0003 was wrong, having changed state->target
> in an inappropriate way, causing the wrong thing to happen at what is now
> line 963.  The patch fixes the bug, because state->target no longer gets
> overwritten where you are now using "rightpage" for the value.
> > >
> > > 3)  The code used to work, having set up state->target correctly in
> the place where you are now using "rightpage", but v2-0003 has broken that.
> > >
> > > 4)  It's been broken all along and your patch just changes from wrong
> to wrong.
> > >
> > >
> > > If you believe (1) is true, then I'm complaining that you are relying
> far to much on action at a distance, and that you are not documenting it.
> Even with documentation of this interrelationship, I'd be unhappy with how
> brittle the code is.  I cannot easily discern that the two don't ever
> happen in the same iteration, and I'm not at all convinced one way or the
> other.  I tried to set up some Asserts about that, but none of the test
> cases actually reach the new code, so adding Asserts doesn't help to
> investigate the question.
> > >
> > > If (2) is true, then I'm complaining that the commit message doesn't
> mention the fact that this is a bug fix.  Bug fixes should be clearly
> documented as such, otherwise future work might assume the commit can be
> reverted with only stylistic consequences.
> > >
> > > If (3) is true, then I'm complaining that the patch is flat busted.
> > >
> > > If (4) is true, then maybe we should revert the entire feature, or
> have a discussion of mitigation efforts that are needed.
> > >
> > > Regardless of which of 1..4 you pick, I think it could all do with
> more regression test coverage.
> > >
> > >
> > > For reference, I said something similar earlier today in another email
> to this thread:
> > >
> > > This patch introduces a change that stores a new page into variable
> "rightpage" rather than overwriting "state->target", which the old
> implementation most certainly did.  That means that after returning from
> bt_target_page_check() into the calling function
> bt_check_level_from_leftmost() the value in state->target is not what it
> would have been prior to this patch.  Now, that'd be irrelevant if nobody
> goes on to consult that value, but just 44 lines further down in
> bt_check_level_from_leftmost() state->target is clearly used.  So the
> behavior at that point is changing between the old and new versions of the
> code, and I think I'm within reason to ask if it was wrong before the
> patch, wrong after the patch, or something else?  Is this a bug being
> introduced, being fixed, or ... ?
> >
> > Thank you for your analysis.  I'm inclined to believe in 2, but not
> > yet completely sure.  It's really pity that our tests don't cover
> > this.  I'm investigating this area.
>
> It seems that I got to the bottom of this.  Changing
> BtreeCheckState.target for a cross-page unique constraint check is
> wrong, but that happens only for leaf pages.  After that
> BtreeCheckState.target is only used for setting the low key.  The low
> key is only used for non-leaf pages.  So, that didn't lead to any
> visible bug.  I've revised the commit message to reflect this.
>

I agree with your analysis regarding state->target:
- when the unique check is on, state->target was reassigned only for the
leaf pages (under P_ISLEAF(topaque) in bt_target_page_check).
- at this level (leaf), in bt_check_level_from_leftmost(), this value of
state->target was used to get state->lowkey. Then it was reset (in the next
iteration of the do loop in bt_check_level_from_leftmost()).
- state->lowkey lives until the end of the page-level (leaf) iteration cycle.
Then the low key is reset (state->lowkey = NULL at the end of
bt_check_level_from_leftmost())
- 

Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Heikki Linnakangas

On 13/05/2024 12:50, Jelte Fennema-Nio wrote:

On Sun, 12 May 2024 at 23:39, Heikki Linnakangas  wrote:

In v18, I'd like to make sslmode=require the default. Or maybe introduce
a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we
want to encourage encryption, that's the right way to do it. (I'd still
recommend everyone to use an explicit sslmode=require in their
connection strings for many years, though, because you might be using an
older client without realizing it.)


I'm definitely a huge proponent of making the connection defaults more
secure. But as described above sslmode=require is still extremely
insecure and I don't think we gain anything substantial by making it
the default. I think the only useful secure default would be to use
sslmode=verify-full (with probably some automatic fallback to
sslmode=prefer when connecting to hardcoded IPs or localhost). Which
probably means that sslrootcert=system should also be made the
default. Which would mean that ~/.postgresql/root.crt would not be the
default anymore, which I personally think is fine but others likely
disagree.


"channel_binding=require sslmode=require" also protects from MITM attacks.

I think these options should be designed from the user's point of view, 
so that the user can specify the risks they're willing to accept, and 
the details of how that's accomplished are handled by libpq. For 
example, I'm OK with (tick all that apply):


[ ] passive eavesdroppers seeing all the traffic
[ ] MITM being able to hijack the session
[ ] connecting without verifying the server's identity
[ ] divulging the plaintext password to the server
[ ] ...

The requirements for whether SSL or GSS encryption is required, whether 
the server's certificate needs to be signed by a known CA, etc. can be 
derived from those. For example, if you need protection from 
eavesdroppers, SSL or GSS encryption must be used. If you need to verify 
the server's identity, it implies sslmode=verify-CA or 
channel_binding=true. If you don't want to divulge the password, it 
implies a suitable require_auth setting. I don't have a concrete 
proposal yet, but something like that. And the defaults for those are up 
for debate.


psql could perhaps help by listing the above properties at the beginning 
of the session, something like:


psql (16.2)
WARNING: Connection is not encrypted.
WARNING: The server's identity has not been verified
Type "help" for help.

postgres=#

Although for the "divulge plaintext password to server" property, it's 
too late to print a warning after connecting, because the damage has 
already been done.
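
As an aside, the encryption status of an established session can already be
inspected from SQL; a small illustration, not part of any proposal here:

    -- One row for the current backend; ssl is false for a plaintext connection.
    SELECT ssl, version, cipher
      FROM pg_stat_ssl
     WHERE pid = pg_backend_pid();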


A different line of thought is that to keep the attack surface as small 
as possible, you should specify very explicitly what exactly you expect 
to happen in the authentication, and disallow any variance. For example, 
you expect SCRAM to be used, with a certificate signed by a particular 
CA, and if the server requests anything else, that's suspicious and the 
connection is aborted. We should make that possible too, but the above 
flexible risk-based approach seems good for the defaults.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: doc: some fixes for environment sections in ref pages

2024-05-13 Thread Daniel Gustafsson
> On 13 May 2024, at 10:48, Peter Eisentraut  wrote:

> Patches attached.

All patches look good.

> I think the first one is a bug fix.

Agreed.

--
Daniel Gustafsson





Upgrade Debian CI images to Bookworm

2024-05-13 Thread Nazir Bilal Yavuz
Hi,

Bookworm versions of the Debian CI images are available now [0]. The
patches to use these images are attached.

'v1-0001-Upgrade-Debian-CI-images-to-Bookworm_REL_16+.patch' patch can
be applied to both upstream and REL_16 and all of the tasks finish
successfully.

'v1-0001-Upgrade-Debian-CI-images-to-Bookworm_REL_15.patch' patch can
be applied to REL_15 but it gives a compiler warning. The fix for this
warning is proposed here [1]. After the fix is applied, all of the
tasks finish successfully.

Any kind of feedback would be appreciated.

[0] https://github.com/anarazel/pg-vm-images/pull/91

[1] postgr.es/m/CAN55FZ0o9wqVoMTh_gJCmj_%2B4XbX9VXzQF8OySPZ0R1saxV3bA%40mail.gmail.com

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From 26f4b2425cb04dc7218142f24f151d3698f33191 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Mon, 13 May 2024 10:56:28 +0300
Subject: [PATCH v1] Upgrade Debian CI images to Bookworm

A new Debian version, Bookworm, has been released. Use these new images
in CI tasks.

Perl version is upgraded in the Bookworm images, so update Perl version
at 'Linux - Debian Bookworm - Meson' task as well.

Upgrading Debian CI images to Bookworm PR: https://github.com/anarazel/pg-vm-images/pull/91
---
 .cirrus.tasks.yml | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 82261eccb85..d2ff833fcad 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -65,7 +65,7 @@ task:
 CPUS: 4
 BUILD_JOBS: 8
 TEST_JOBS: 8
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 CCACHE_DIR: ${CIRRUS_WORKING_DIR}/ccache_dir
 # no options enabled, should be small
 CCACHE_MAXSIZE: "150M"
@@ -241,7 +241,7 @@ task:
 CPUS: 4
 BUILD_JOBS: 4
 TEST_JOBS: 8 # experimentally derived to be a decent choice
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 
 CCACHE_DIR: /tmp/ccache_dir
 DEBUGINFOD_URLS: "https://debuginfod.debian.net"
@@ -312,7 +312,7 @@ task:
 #DEBIAN_FRONTEND=noninteractive apt-get -y install ...
 
   matrix:
-- name: Linux - Debian Bullseye - Autoconf
+- name: Linux - Debian Bookworm - Autoconf
 
   env:
 SANITIZER_FLAGS: -fsanitize=address
@@ -346,7 +346,7 @@ task:
   on_failure:
 <<: *on_failure_ac
 
-- name: Linux - Debian Bullseye - Meson
+- name: Linux - Debian Bookworm - Meson
 
   env:
 CCACHE_MAXSIZE: "400M" # tests two different builds
@@ -373,7 +373,7 @@ task:
 ${LINUX_MESON_FEATURES} \
 -Dllvm=disabled \
 --pkg-config-path /usr/lib/i386-linux-gnu/pkgconfig/ \
--DPERL=perl5.32-i386-linux-gnu \
+-DPERL=perl5.36-i386-linux-gnu \
 -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
 build-32
 EOF
@@ -648,7 +648,7 @@ task:
   env:
 CPUS: 4
 BUILD_JOBS: 4
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 
 # Use larger ccache cache, as this task compiles with multiple compilers /
 # flag combinations
-- 
2.43.0

From d8f4b660bdd408fa22c857e24d1b4d05ff41df03 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Mon, 13 May 2024 11:55:48 +0300
Subject: [PATCH v1] Upgrade Debian CI images to Bookworm

A new Debian version, Bookworm, has been released. Use these new images
in CI tasks.

Upgrading Debian CI images to Bookworm PR: https://github.com/anarazel/pg-vm-images/pull/91
---
 .cirrus.tasks.yml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index b03d128184d..a2735f8bb60 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -138,13 +138,13 @@ LINUX_CONFIGURE_FEATURES: _CONFIGURE_FEATURES >-
 
 
 task:
-  name: Linux - Debian Bullseye
+  name: Linux - Debian Bookworm
 
   env:
 CPUS: 4
 BUILD_JOBS: 4
 TEST_JOBS: 8 # experimentally derived to be a decent choice
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 
 CCACHE_DIR: /tmp/ccache_dir
 DEBUGINFOD_URLS: "https://debuginfod.debian.net"
@@ -451,12 +451,12 @@ task:
 
   # To limit unnecessary work only run this once the normal linux test succeeds
   depends_on:
-- Linux - Debian Bullseye
+- Linux - Debian Bookworm
 
   env:
 CPUS: 4
 BUILD_JOBS: 4
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 
 # Use larger ccache cache, as this task compiles with multiple compilers /
 # flag combinations
-- 
2.43.0



Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-13 Thread Alexander Korotkov
On Mon, May 13, 2024 at 12:45 PM Dmitry Koval  wrote:
> 13.05.2024 11:45, Daniel Gustafsson пишет:
> > Commit 3ca43dbbb67f which adds the permission checks seems to cause 
> > conflicts
> > in the pg_upgrade tests
>
> Thanks!
>
> It will probably be enough to rename the roles:
>
> regress_partition_merge_alice -> regress_partition_split_alice
> regress_partition_merge_bob -> regress_partition_split_bob

Thanks to Daniel for spotting this.
Thanks to Dmitry for the proposed fix.

The actual problem appears to be a bit more complex.  In addition to
the role names, the lack of permissions on schemas led to creation of
tables in the public schema and a potential conflict there.  Fixed in
2a679ae94e.

--
Regards,
Alexander Korotkov




Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS

2024-05-13 Thread Aleksander Alekseev
Hi,

> > Thanks. I see a few pieces of code that use special FOO_NUMBER enum
> > values instead of a macro. Should we refactor these pieces
> > accordingly? PFA another patch.
>
> I think this is a sensible improvement.
>
> But please keep the trailing commas on the last enum items.

Thanks, fixed.

-- 
Best regards,
Aleksander Alekseev


v2-0001-Use-macro-to-define-the-number-of-enum-values.patch
Description: Binary data


Re: SQL:2011 application time

2024-05-13 Thread Peter Eisentraut

On 03.04.24 07:30, Paul Jungwirth wrote:
But is it *literally* unique? Well two identical keys, e.g. (5, 
'[Jan24,Mar24)') and (5, '[Jan24,Mar24)'), do have overlapping ranges, 
so the second is excluded. Normally a temporal unique index is *more* 
restrictive than a standard one, since it forbids other values too (e.g. 
(5, '[Jan24,Feb24)')). But sadly there is one exception: the ranges in 
these keys do not overlap: (5, 'empty'), (5, 'empty'). With 
ranges/multiranges, `'empty' && x` is false for all x. You can add that 
key as many times as you like, despite a PK/UQ constraint:


     postgres=# insert into t values
     ('[1,2)', 'empty', 'foo'),
     ('[1,2)', 'empty', 'bar');
     INSERT 0 2
    postgres=# select * from t;
      id   | valid_at | name
    -------+----------+------
     [1,2) | empty    | foo
     [1,2) | empty    | bar
    (2 rows)

Cases like this shouldn't actually happen for temporal tables, since 
empty is not a meaningful value. An UPDATE/DELETE FOR PORTION OF would 
never cause an empty. But we should still make sure they don't cause 
problems.


We should give temporal primary keys an internal CHECK constraint saying 
`NOT isempty(valid_at)`. The problem is analogous to NULLs in parts of a 
primary key. NULLs prevent two identical keys from ever comparing as 
equal. And just as a regular primary key cannot contain NULLs, so a 
temporal primary key should not contain empties.


The standard effectively prevents this with PERIODs, because a PERIOD 
adds a constraint saying start < end. But our ranges enforce only start 
<= end. If you say `int4range(4,4)` you get `empty`. If we constrain 
primary keys as I'm suggesting, then they are literally unique, and 
indisunique seems safer.


Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm 
inclined toward no, just as we don't forbid NULLs in parts of a UNIQUE 
key. We should try to pick what gives users more options, when possible. 
Even if it is questionably meaningful, I can see use cases for allowing 
empty ranges in a temporal table. For example it lets you "disable" a 
row, preserving its values but marking it as never true.


It looks like we missed some of these fundamental design questions early 
on, and it might be too late now to fix them for PG17.


For example, the discussion on unique constraints misses that the 
question of null values in unique constraints itself is controversial 
and that there is now a way to change the behavior.  So I imagine there 
is also a selection of possible behaviors you might want for empty 
ranges.  Intuitively, I don't think empty ranges are sensible for 
temporal unique constraints.  But anyway, it's a bit late now to be 
discussing this.
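
(For reference, the existing knob for null values in ordinary unique
constraints, presumably what "a way to change the behavior" refers to above,
looks like this:

    -- With NULLS NOT DISTINCT, two rows (1, NULL) now conflict
    -- instead of both being accepted.
    CREATE TABLE u (a int, b int, UNIQUE NULLS NOT DISTINCT (a, b));

One could imagine an analogous choice for empty ranges.)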


I'm also concerned that if ranges have this fundamental incompatibility 
with periods, then the plan to eventually evolve this patch set to 
support standard periods will also have as-yet-unknown problems.


Some of these issues might be design flaws in the underlying mechanisms, 
like range types and exclusion constraints.  Like, if you're supposed to 
use this for scheduling but you can use empty ranges to bypass exclusion 
constraints, how is one supposed to use this?  Yes, a check constraint 
using isempty() might be the right answer.  But I don't see this 
documented anywhere.
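
A minimal sketch of such a constraint, assuming an illustrative scheduling
table (names are made up; btree_gist is needed for the equality part of the
exclusion constraint):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    -- Illustrative table; the point is the CHECK sitting alongside
    -- the exclusion constraint so that empty ranges cannot slip through.
    CREATE TABLE reservation (
        id       int,
        valid_at daterange,
        CHECK (NOT isempty(valid_at)),
        EXCLUDE USING gist (id WITH =, valid_at WITH &&)
    );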


On the technical side, adding an implicit check constraint as part of a 
primary key constraint is quite a difficult implementation task, as I 
think you are discovering.  I'm just reminded about how the patch for 
catalogued not-null constraints struggled with linking these not-null 
constraints to primary keys correctly.  This sounds a bit similar.


I'm afraid that these issues cannot be resolved in good time for this 
release, so we should revert this patch set for now.






Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid

2024-05-13 Thread Aleksander Alekseev
Hi,

> Searching the archives I was unable to find any complaints, and this has been
> broken for the entire window of supported releases, so I propose we remove it
> as per the attached patch.  If anyone is keen on making this work again for 
> all
> the types where it makes sense, it can be resurrected (probably with a better
> implementation).
>
> Any objections to fixing this in 17 by removing it? (cc:ing Michael from the 
> RMT)

+1 Something that is not documented or used by anyone (apparently) and
is broken should just be removed.

-- 
Best regards,
Aleksander Alekseev




Re: UniqueKey v2

2024-05-13 Thread Antonin Houska
Andy Fan  wrote:

> > * I think that, before EC is considered suitable for an UK, its 
> > ec_opfamilies
> >   field needs to be checked. I try to do that in
> >   find_ec_position_matching_expr(), see 0004.
> 
> Could you make the reason clearer for adding 'List *opfamily_lists;'
> into UniqueKey?  You said "This is needed to create ECs in the parent
> query if the upper relation represents a subquery." but I didn't get the
> it. Since we need to maintain the UniqueKey in the many places, I'd like
> to keep it as simple as possbile. Of course, anything essentical should
> be added for sure. 

If unique keys are generated for a subquery output, they also need to be
created for the corresponding relation in the upper query ("sub" in the
following example):

select * from tab1 left join (select * from tab2) sub;

However, to create a unique key for "sub", you need an EC for each expression
of the key. And to create an EC, you in turn need the list of operator
families.

Even if the parent query already had ECs for the columns of "sub" which are
contained in the unique key, you need to make sure that those ECs are
"compatible" with the ECs of the subquery which generated the unique key. That
is, if an EC of the subquery considers certain input values equal, the EC of
the parent query must also be able to determine if they are equal or not.

> > * RelOptInfo.notnullattrs
> >
> >   My understanding is that this field contains the set attributes whose
> >   uniqueness is guaranteed by the unique key. They are acceptable because 
> > they
> >   are either 1) not allowed to be NULL due to NOT NULL constraint or 2) NULL
> >   value makes the containing row filtered out, so the row cannot break
> >   uniqueness of the output. Am I right?
> >
> >   If so, I suggest to change the field name to something less generic, maybe
> >   'uniquekey_attrs' or 'uniquekey_candidate_attrs', and adding a comment 
> > that
> >   more checks are needed before particular attribute can actually be used in
> >   UniqueKey.
> 
> I don't think so, UniqueKey is just one of the places to use this
> not-null property, see 3af704098 for the another user case of it. 
> 
> (Because of 3af704098, we should leverage notnullattnums somehow in this
> patch, which will be included in the next version as well).

In your patch you modify 'notnullattrs' in add_base_clause_to_rel(), but that
does not happen to 'notnullattnums' in the current master branch. Thus I think
that 'notnullattrs' is specific to the unique keys feature, so the field name
should be less generic.

> >
> > * uniquekey_useful_for_merging()
> >
> >   How does uniqueness relate to merge join? In README.uniquekey you seem to
> >   point out that a single row is always sorted, but I don't think this
> >   function is related to that fact. (Instead, I'd expect that pathkeys are
> >   added to all paths for a single-row relation, but I'm not sure you do that
> >   in the current version of the patch.)
> 
> The merging is for "mergejoinable join clauses", see function
> eclass_useful_for_merging. Usually I think it as operator "t1.a = t2.a";

My question is: why is the uniqueness important specifically to merge join? I
understand that join evaluation can be more efficient if we know that one
input relation is unique (i.e. we only scan that relation until we find the
first match), but this is not specific to merge join.

> > * is_uniquekey_useful_afterjoin()
> >
> >   Now that my patch (0004) allows propagation of the unique keys from a
> >   subquery to the upper query, I was wondering if the UniqueKey structure
> >   needs the 'use_for_distinct field' I mean we should now propagate the 
> > unique
> >   keys to the parent query whether the subquery has DISTINCT clause or not. 
> > I
> >   noticed that the field is checked by is_uniquekey_useful_afterjoin(), so I
> >   changed the function to always returned true. However nothing changed in 
> > the
> >   output of regression tests (installcheck). Do you insist that the
> >   'use_for_distinct' field is needed?

I miss your answer to this comment.

> > * uniquekey_contains_multinulls()
> >
> >   ** Instead of calling the function when trying to use the UK, how about
> >  checking the ECs when considering creation of the UK? If the tests 
> > fail,
> >  just don't create the UK.
> 
> I don't think so since we maintain the UniqueKey from bottom to top, you
> can double check if my reason is appropriate.  
> 
> CREATE TABLE t1(a int);
> CREATE INDEX ON t1(a);
> 
> SELECT distinct t1.a FROM t1 JOIN t2 using(a);
> 
> We need to create the UniqueKey on the baserel for t1 and the NULL
> values is filtered out in the joinrel. so we have to creating it with
> allowing NULL values first. 

ok

> >   ** What does the 'multi' word in the function name mean?
> 
> multi means multiple, I thought we use this short name in the many
> places, for ex bt_multi_page_stats after a quick search. 

Why not simply uniquekey_contains_nulls() ?

Actually 

Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Sun, 12 May 2024 at 23:39, Heikki Linnakangas  wrote:
> You might miss that by changing sslnegotiation to 'postgres', or by
> removing it altogether, you not only made it compatible with older
> server versions, but you also allowed falling back to a plaintext
> connection. Maybe you're fine with that, but maybe not. I'd like to
> nudge people to use sslmode=require, not rely on implicit stuff like
> this just to make connection strings a little shorter.

I understand your worry, but I'm not sure that this is actually much
of a security issue in practice. sslmode=prefer and sslmode=require
are the same amount of insecure imho (i.e. extremely insecure). The
only reason sslmode=prefer would connect as non-ssl to a server that
supports ssl is if an attacker has write access to the network in the
middle (i.e. eavesdropping ability alone is not enough). Once an
attacker has this level of network access, it's trivial for this
attacker to read any data sent to Postgres by intercepting the TLS
handshake and doing TLS termination with some arbitrary cert (any cert
is trusted by sslmode=require).

So the only actual case where this is a security issue I can think of
is when an attacker has only eavesdropping ability on the network. And
somehow the Postgres server that the client tries to connect to is
configured incorrectly, so that no ssl is set up at all. Then a client
would drop to plaintext, when connecting to this server instead of
erroring, and the attacker could now read the traffic. But I don't
really see this scenario end up any differently when requiring people
to enter sslmode=require. The only action a user can take to connect
to a server that does not have ssl support is to remove
sslmode=require from the connection string. Except if they are also
the server operator, in which case they could enable ssl on the
server. But if ssl is not set up, then it was probably never set up,
and thus providing sslnegotiation=direct would be to test if ssl
works.

But, if you disagree I'm fine with erroring on plain sslnegotiation=direct

> In v18, I'd like to make sslmode=require the default. Or maybe introduce
> a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we
> want to encourage encryption, that's the right way to do it. (I'd still
> recommend everyone to use an explicit sslmode=require in their
> connection strings for many years, though, because you might be using an
> older client without realizing it.)

I'm definitely a huge proponent of making the connection defaults more
secure. But as described above sslmode=require is still extremely
insecure and I don't think we gain anything substantial by making it
the default. I think the only useful secure default would be to use
sslmode=verify-full (with probably some automatic fallback to
sslmode=prefer when connecting to hardcoded IPs or localhost). Which
probably means that sslrootcert=system should also be made the
default. Which would mean that ~/.postgresql/root.crt would not be the
default anymore, which I personally think is fine but others likely
disagree.




Re: 039_end_of_wal: error in "xl_tot_len zero" test

2024-05-13 Thread Anton Voloshin

On 13/05/2024 00:39, Tom Lane wrote:

Hm.  It occurs to me that there *is* a system-specific component to
the amount of WAL emitted during initdb: the number of locales
that "locale -a" prints translates directly to the number of
rows inserted into pg_collation. [...]


Yes. Another system-specific circumstance affecting WAL position is the 
name length of the unix user doing initdb. I've seen 039_end_of_wal 
failing consistently under one user name but working fine with another, 
both on the same machine at the same time.


To be more precise, on one particular machine under those particular 
circumstances (including set of locales) it would work for any username 
with length < 8 or >= 16, but would fail for length 8..15 (in bytes, not 
characters, if non-ASCII usernames were used).


--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread Wolfgang Wilhelm
Hi,

do I interpret your idea correctly: you want some sort of ordering without
ordering?

Kind regards
WW

On Monday, 13 May 2024 at 10:40:38 CEST, aa wrote:

Hello Everyone!

Is there any chance to get some kind of a result set sifting mechanism in
Postgres?

What I am looking for is a way to get, for example, "nulls last" in a result
set, without having to call "order by" or having to use UNION ALL, and if
possible to get this in a single result set pass.

Something along this line: SELECT a, b, c FROM my_table WHERE a nulls last
OFFSET 0 LIMIT 25

I don't want to use order by or union all because these are time-consuming
operations, especially on large data sets and when comparisons are done on
dynamic values (e.g. geolocation distances between a mobile and a static
location).

What I would expect from such a feature is speed comparable with non-sorted
selects, while getting a very rudimentary ordering.

A use case for such a mechanism would be the implementation of quick,
relevant search results for a search engine.

I'm not familiar with how Postgres handles simple select queries, but the way
I would envision a result set sifting logic would be to collect the result
set in 2 separate lists, based on the sifting condition, and then concatenate
these 2 lists and return the result when the pagination request conditions
are met.

Any idea if such functionality is feasible?

Thank you.

PS: if ever implemented, the sifting mechanism could be extended to
accommodate any type of threshold, not just null values.
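
For reference, the closest equivalents available today (both still involve a
sort step, so neither gives the single-pass behaviour asked for above;
my_table comes from the example in the message):

    -- Standard way: nulls pushed to the end, full ordering on a.
    SELECT a, b, c
    FROM my_table
    ORDER BY a NULLS LAST
    OFFSET 0 LIMIT 25;

    -- Cheaper "sift only" variant: sort on a boolean flag instead of on a
    -- itself, so non-null rows keep their (unspecified) relative order.
    SELECT a, b, c
    FROM my_table
    ORDER BY (a IS NULL)
    OFFSET 0 LIMIT 25;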


  

Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-13 Thread Dmitry Koval

Hi!

13.05.2024 11:45, Daniel Gustafsson пишет:

Commit 3ca43dbbb67f which adds the permission checks seems to cause conflicts
in the pg_upgrade tests


Thanks!

It will probably be enough to rename the roles:

regress_partition_merge_alice -> regress_partition_split_alice
regress_partition_merge_bob -> regress_partition_split_bob

(changes in attachment)
--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

From f307add79397bcfe20f7c779e77630fb0df73020 Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Mon, 13 May 2024 12:32:43 +0300
Subject: [PATCH v1] Rename roles to avoid conflicts in concurrent work

---
 src/test/regress/expected/partition_split.out | 20 +--
 src/test/regress/sql/partition_split.sql  | 20 +--
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/src/test/regress/expected/partition_split.out b/src/test/regress/expected/partition_split.out
index b1108c92a2..999e923ef1 100644
--- a/src/test/regress/expected/partition_split.out
+++ b/src/test/regress/expected/partition_split.out
@@ -1516,33 +1516,33 @@ DROP TABLE t;
 DROP ACCESS METHOD partition_split_heap;
 -- Test permission checks.  The user needs to own the parent table and the
 -- the partition to split to do the split.
-CREATE ROLE regress_partition_merge_alice;
-CREATE ROLE regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_alice;
+CREATE ROLE regress_partition_split_alice;
+CREATE ROLE regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_alice;
 CREATE TABLE t (i int) PARTITION BY RANGE (i);
 CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 ERROR:  must be owner of table t
 RESET SESSION AUTHORIZATION;
-ALTER TABLE t OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE t OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 ERROR:  must be owner of table tp_0_2
 RESET SESSION AUTHORIZATION;
-ALTER TABLE tp_0_2 OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE tp_0_2 OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 RESET SESSION AUTHORIZATION;
 DROP TABLE t;
-DROP ROLE regress_partition_merge_alice;
-DROP ROLE regress_partition_merge_bob;
+DROP ROLE regress_partition_split_alice;
+DROP ROLE regress_partition_split_bob;
 -- Check extended statistics aren't copied from the parent table to new
 -- partitions.
 CREATE TABLE t (i int, j int) PARTITION BY RANGE (i);
diff --git a/src/test/regress/sql/partition_split.sql b/src/test/regress/sql/partition_split.sql
index 7f231b0d39..be4a785b75 100644
--- a/src/test/regress/sql/partition_split.sql
+++ b/src/test/regress/sql/partition_split.sql
@@ -896,36 +896,36 @@ DROP ACCESS METHOD partition_split_heap;
 
 -- Test permission checks.  The user needs to own the parent table and the
 -- the partition to split to do the split.
-CREATE ROLE regress_partition_merge_alice;
-CREATE ROLE regress_partition_merge_bob;
+CREATE ROLE regress_partition_split_alice;
+CREATE ROLE regress_partition_split_bob;
 
-SET SESSION AUTHORIZATION regress_partition_merge_alice;
+SET SESSION AUTHORIZATION regress_partition_split_alice;
 CREATE TABLE t (i int) PARTITION BY RANGE (i);
 CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
 
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 RESET SESSION AUTHORIZATION;
 
-ALTER TABLE t OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE t OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 RESET SESSION AUTHORIZATION;
 
-ALTER TABLE tp_0_2 OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE tp_0_2 OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 

Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Matthias van de Meent
On Mon, 13 May 2024 at 10:42, Artur Formella  wrote:
> Motivation:
> Commas of this type are allowed in many programming languages, in some
> it is even recommended to use them at the ends of lists or objects.

Single trailing commas are a feature that's more and more common in
languages, yes, but arbitrary excess commas is new to me. Could you
provide some examples of popular languages which have that, as I can't
think of any.

> Accepted:
>  SELECT 1,;
>  SELECT 1,;
>  SELECT *, from information_schema.sql_features;
>  (...) RETURNING a,,b,c,;
>
> Not accepted:
>  SELECT ,;
>  SELECT ,1;
>  SELECT ,,,;
>
> Advantages:
> - simplifies the creation and debugging of queries by reducing the most
> common syntax error,
> - eliminates the need to use the popular `1::int as dummy` at the end of
> a SELECT list,

This is the first time I've heard of this `1 as dummy`.
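
For anyone else unfamiliar with it, the pattern presumably being referred to
is a throwaway final column so that every generated line can end with a comma
(my_table is a placeholder):

    -- my_table is illustrative; "dummy" exists only to absorb the last comma.
    SELECT a,
           b,
           c,
           1::int AS dummy
    FROM my_table;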

> - simplifies query generators,
> - the query is still deterministic,

What part of a query would (or would not) be deterministic? I don't
think I understand the potential concern here. Is it about whether the
statement can be parsed deterministically?

> Disadvantages:
> - counting of returned columns can be difficult,
> - syntax checkers will still report errors,
> - probably not SQL standard compliant,

I'd argue you better raise this with the standard committee if this
isn't compliant. I don't see enough added value to break standard
compliance here, especially when the standard may at some point allow
only a single trailing comma (and not arbitrarily many).

> What do you think?

Do you expect `SELECT 1,,,` to have an equivalent query identifier
to `SELECT 1;` in pg_stat_statements? Why, or why not?

Overall, I don't think unlimited commas is a good feature. A trailing
comma in the select list would be less problematic, but I'd still want
to follow the standard first and foremost.


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



