Bibliography section, some references cannot be found

2024-05-13 Thread jian he
hi.

while reading this[1],
<< More information about partial indexes can be found in [ston89b],
[olson93], and [seshadri95].
I googled around but still cannot find a PDF or HTML link for [olson93].


In [2], I found
[ong90] “A Unified Framework for Version Modeling Using Production
Rules in a Database System”. L. Ong and J. Goh. ERL Technical
Memorandum M90/33. University of California. Berkeley, California.
April, 1990.
The related link is
https://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/1466.html


An idea: in case the external links in
(https://www.postgresql.org/docs/current/biblio.html) ever become dead
links, we could post the external PDF or HTML files to the mailing list
for archival purposes.

[1] https://www.postgresql.org/docs/current/indexes-partial.html
[2] https://www.postgresql.org/docs/current/biblio.html




Re: explain format json, unit for serialize and memory are different.

2024-05-13 Thread Michael Paquier
On Mon, May 13, 2024 at 11:22:08AM +0200, Daniel Gustafsson wrote:
> Since json (and yaml/xml) is intended to be machine-readable I think we use a
> single unit for all values, and document this fact.

Agreed about the documentation gap.  Another thing that could be worth
considering is to add the units alongside the numerical values, say:
"Memory Used": {"value": 23432, "units": "bytes"}

That would require changing ExplainProperty() so that the units are
shown in some shape, while still being readable when parsed.  I
wouldn't recommend doing that in v17, but perhaps v18 could do better?
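
To illustrate, here is a rough sketch of what that could look like in
explain.c (the helper name and its placement are assumptions, not
existing code; only the Explain* calls it uses exist today):

/*
 * Sketch only: emit a property together with an explicit unit.  TEXT
 * format keeps today's "label: value unit" layout; the machine-readable
 * formats get a nested group, e.g. in JSON:
 *   "Memory Used": {"value": 23432, "units": "bytes"}
 */
static void
ExplainPropertyWithUnits(const char *qlabel, const char *units,
						 int64 value, ExplainState *es)
{
	if (es->format == EXPLAIN_FORMAT_TEXT)
		ExplainPropertyInteger(qlabel, units, value, es);
	else
	{
		ExplainOpenGroup("Property", qlabel, true, es);
		ExplainPropertyInteger("value", NULL, value, es);
		ExplainPropertyText("units", units, es);
		ExplainCloseGroup("Property", qlabel, true, es);
	}
}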

Units are also ignored for the XML and yaml outputs.
--
Michael


signature.asc
Description: PGP signature


Re: SQL:2011 application time

2024-05-13 Thread jian he
On Tue, May 14, 2024 at 7:30 AM Paul Jungwirth
 wrote:
>
> On 5/13/24 03:11, Peter Eisentraut wrote:
> > It looks like we missed some of these fundamental design questions early 
> > on, and it might be too
> > late now to fix them for PG17.
> >
> > For example, the discussion on unique constraints misses that the question 
> > of null values in unique
> > constraints itself is controversial and that there is now a way to change 
> > the behavior.  So I
> > imagine there is also a selection of possible behaviors you might want for 
> > empty ranges.
> > Intuitively, I don't think empty ranges are sensible for temporal unique 
> > constraints.  But anyway,
> > it's a bit late now to be discussing this.
> >
> > I'm also concerned that if ranges have this fundamental incompatibility 
> > with periods, then the plan
> > to eventually evolve this patch set to support standard periods will also 
> > have as-yet-unknown problems.
> >
> > Some of these issues might be design flaws in the underlying mechanisms, 
> > like range types and
> > exclusion constraints.  Like, if you're supposed to use this for scheduling 
> > but you can use empty
> > ranges to bypass exclusion constraints, how is one supposed to use this?  
> > Yes, a check constraint
> > using isempty() might be the right answer.  But I don't see this documented 
> > anywhere.
> >
> > On the technical side, adding an implicit check constraint as part of a 
> > primary key constraint is
> > quite a difficult implementation task, as I think you are discovering.  I'm 
> > just reminded about how
> > the patch for catalogued not-null constraints struggled with linking these 
> > not-null constraints to
> > primary keys correctly.  This sounds a bit similar.
> >
> > I'm afraid that these issues cannot be resolved in good time for this 
> > release, so we should revert
> > this patch set for now.
>
> I think reverting is a good idea. I'm not really happy with the CHECK 
> constraint solution either.
> I'd be happy to have some more time to rework this for v18.
>
> A couple alternatives I'd like to explore:
>
> 1. Domain constraints instead of a CHECK constraint. I think this is probably 
> worse, and I don't
> plan to spend much time on it, but I thought I'd mention it in case someone 
> else thought otherwise.
>
> 2. A slightly different overlaps operator, say &&&, where 'empty' &&& 'empty' 
> is true. But 'empty'
> with anything else could still be false (or not). That operator would prevent 
> duplicates in an
> exclusion constraint. This also means we could support more types than just 
> ranges & multiranges. I
> need to think about whether this combines badly with existing operators, but 
> if not it has a lot of
> promise. If anything it might be *less* contradictory, because it fits better 
> with 'empty' @>
> 'empty', which we say is true.
>
Thanks for the idea; I played around with it roughly and it seems doable,
but the timing is not good, so reverting is a good idea.


I also checked commit 6db4598fcb82a87a683c4572707e522504830a2b:
+
+/*
+ * Returns the btree number for equals, otherwise invalid.
+ */
+Datum
+gist_stratnum_btree(PG_FUNCTION_ARGS)
+{
+	StrategyNumber strat = PG_GETARG_UINT16(0);
+
+	switch (strat)
+	{
+		case RTEqualStrategyNumber:
+			PG_RETURN_UINT16(BTEqualStrategyNumber);
+		case RTLessStrategyNumber:
+			PG_RETURN_UINT16(BTLessStrategyNumber);
+		case RTLessEqualStrategyNumber:
+			PG_RETURN_UINT16(BTLessEqualStrategyNumber);
+		case RTGreaterStrategyNumber:
+			PG_RETURN_UINT16(BTGreaterStrategyNumber);
+		case RTGreaterEqualStrategyNumber:
+			PG_RETURN_UINT16(BTGreaterEqualStrategyNumber);
+		default:
+			PG_RETURN_UINT16(InvalidStrategy);
+	}
+}
the comment doesn't seem right?
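
For illustration, a comment wording that would match what the function
actually does (just a suggestion, not committed text):

/*
 * Returns the btree strategy number corresponding to the given RT
 * strategy number, or InvalidStrategy if there is no equivalent.
 */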




Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid

2024-05-13 Thread Michael Paquier
On Mon, May 13, 2024 at 01:01:21PM +0300, Aleksander Alekseev wrote:
>> Any objections to fixing this in 17 by removing it? (cc:ing Michael from the 
>> RMT)
> 
> +1 Something that is not documented or used by anyone (apparently) and
> is broken should just be removed.

8a02339e9ba3 sounds like a good enough argument to prove there is no
demand in the field for being able to support options through initdb
--auth, and this does not concern only pam.  If somebody is interested
in that, that could always be done later.  My take is that this would
be simpler if implemented through a separate option, leaving the
checks between the options and the auth method up to the postmaster
when loading pg_hba.conf at startup.

Hence, no objections to cleaning that up now.  Thanks for asking.
--
Michael


signature.asc
Description: PGP signature


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Peter Smith
Hi Kuroda-san, Here are some review comments for all patches v9*

//
Patch v9-0001
//

There were no changes since v8-0001, so no comments.

//
Patch v9-0002
//

==
Commit Message

2.1.
Regarding the off->on case, the logical replication already has a
mechanism for it, so there is no need to do anything special for the
on->off case; however, we must connect to the publisher and expressly
change the parameter. The operation cannot be rolled back, and
altering the parameter from "on" to "off" within a transaction is
prohibited.

In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time.

~

This explanation seems to be going around in circles, without giving
any new information:

AFAICT, "Regarding the off->on case, the logical replication already
has a mechanism for it, so there is no need to do anything special for
the on->off case;"

is saying pretty much the same as:

"In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time."

But, what I hoped for in previous review comments was an explanation
somewhat less vague than "already has a mechanism" or "already had the
mechanism". Can't this have just 1 or 2 lines to say WHAT is that
existing mechanism for the "off" to "on" case, and WHY that means
there is nothing special to do in that scenario?

==
src/backend/commands/subscriptioncmds.c

2.2. AlterSubscription

  /*
- * The changed two_phase option of the slot can't be rolled
- * back.
+ * Since altering the two_phase option of subscriptions
+ * also leads to changing the slot option, this command
+ * cannot be rolled back. So prevent this if we are in a
+ * transaction block. In the opposite case, there is no
+ * need to prevent this because the logical replication
+ * worker already had the mechanism to alter the slot
+ * option at a convenient time.
  */

(Same previous review comments, and same as my review comment for the
commit message above).

I don't think "already had the mechanism" is enough explanation.

Also, the 2nd sentence doesn't make sense here because the comment
only said "altering the slot option" -- it didn't say it was altering
it to "on" or altering it to "off", so "the opposite case" has no
meaning.

~~~

2.3. AlterSubscription

  /*
- * Try to acquire the connection necessary for altering slot.
+ * Check the need to alter the replication slot. Failover and two_phase
+ * options are controlled by both the publisher (as a slot option) and the
+ * subscriber (as a subscription option). The slot option must be altered
+ * only when changing "on" to "off". Because in opposite case, the logical
+ * replication worker already has the mechanism to do so at a convenient
+ * time.
+ */
+ update_failover = replaces[Anum_pg_subscription_subfailover - 1];
+ update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] &&
+ !opts.twophase);

This is again the same as other review comments above. Probably, when
some better explanation can be found for "already has the mechanism to
do so at a convenient time." then all of these places can be changed
using similar text.

//
Patch v9-0003
//

There are some imperfect code comments but AFAIK they are the same
ones from patch 0002. I think patch 0003 is just moving those comments
to different places, so probably they would already be addressed by
patch 0002.

//
Patch v9-0004
//

==
doc/src/sgml/catalogs.sgml

4.1.
+ 
+  
+   subforcealter bool
+  
+  
+   If true, the subscription can be altered two_phase
+   option, even if there are prepared transactions
+  
+ 
+

BEFORE
If true, the subscription can be altered two_phase
option, even if there are prepared transactions

SUGGESTION
If true, then the ALTER SUBSCRIPTION command can disable
two_phase option, even if there are uncommitted
prepared transactions from when two_phase was
enabled

==
doc/src/sgml/ref/alter_subscription.sgml

4.2.
-
- 
-  The two_phase parameter can only be altered when the
-  subscription is disabled. When altering the parameter from
on
-  to off, the backend process checks for any incomplete
-  prepared transactions done by the logical replication worker (from when
-  two_phase parameter was still on)
-  and, if any are found, those are aborted.
- 

Well, I still think there ought to be some mention of the relationship
between 'force_alter' and 'two_phase' given on this ALTER SUBSCRIPTION
page. Then the user can cross-reference to read what the 'force_alter'
actually does.

==
doc/src/sgml/ref/create_subscription.sgml

4.3.
+
+   
+force_alter (boolean)
+
+ 
+  Specifies whether the subscription can be altered
+  two_phase option, even if there are prepared

Re: JIT compilation per plan node

2024-05-13 Thread David Rowley
On Fri, 15 Mar 2024 at 10:13, Tomas Vondra
 wrote:
> To clarify, I think the patch is a step in the right direction, and a
> meaningful improvement. It may not be the perfect solution we imagine
> (but who knows how far we are from that), but AFAIK moving these
> decisions to the node level is something the ideal solution would need
> to do too.

I got thinking about this patch again while working on [1]. I want to
write this down as I don't quite have time to get fully back into this
right now...

Currently, during execution, ExecCreateExprSetupSteps() traverses the
Node tree of the Expr to figure out the max varattno for each slot.
That's done so all of the tuple deforming happens at once rather than
incrementally. Figuring out the max varattno is a price we have to pay
for every execution of the query. I think we'd be better off doing
that in the planner.

To do this, I thought that setrefs.c could do this processing in
fix_join_expr / fix_upper_expr and wrap up the expression in a new
Node type that stores the max varattno for each special var type.

This idea is related to this discussion because another thing that
could be stored in the very same struct is the "num_exec" value.   I
feel the number of executions of an ExprState is a better gauge of how
useful JIT will be than the cost of the plan node. Now, looking at
set_join_references(), the execution estimates are not exactly
perfect. For example:

#define NUM_EXEC_QUAL(parentplan)   ((parentplan)->plan_rows * 2.0)

that's not a great estimate for a Nested Loop's joinqual, but we could
easily make efforts to improve those and that could likely be done
independently and concurrently with other work to make JIT more
granular.
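
For illustration, a rough sketch of what such a wrapper node might
carry (all names and fields here are hypothetical, not existing code):

/*
 * Hypothetical wrapper produced by setrefs.c around each finished
 * expression.  It carries the pre-computed tuple-deforming requirements
 * and the planner's execution-count estimate, so executor startup no
 * longer needs to walk the expression tree and the JIT decision can be
 * made per expression rather than per plan node.
 */
typedef struct PlannedExpr
{
	NodeTag		type;
	Expr	   *expr;			/* the expression itself */
	AttrNumber	last_inner;		/* max varattno referencing the inner slot */
	AttrNumber	last_outer;		/* max varattno referencing the outer slot */
	AttrNumber	last_scan;		/* max varattno referencing the scan slot */
	double		num_exec;		/* estimated evaluations, e.g. NUM_EXEC_QUAL() */
} PlannedExpr;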

The problem with doing this is that there's just a huge amount of code
churn in the executor.  I am keen to come up with a prototype so I can
get a better understanding of whether this is a practical solution.  I
don't want to go down that path if it's just me that thinks the number
of times an ExprState is evaluated is a better measure to go on for
JIT vs no JIT than the plan node's total cost.

Does anyone have any thoughts on that idea?

David

[1] 
https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm=m0L+Gc=T=ke...@mail.gmail.com




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Peter Smith
Hi Kuroda-san,

I'm having second thoughts about how these patches mention the option
values "on|off". These are used in the ALTER SUBSCRIPTION document
page for 'two_phase' and 'failover' parameters, and then those
"on|off" get propagated to the code comments, error messages, and
tests...

Now I see that on the CREATE SUBSCRIPTION page [1], every boolean
parameter (even including 'two_phase' and 'failover') is described in
terms of "true|false" (not "on|off").

In hindsight, it is probably better to refer only to true|false
everywhere for these boolean parameters, instead of sometimes using
different values like on|off.

What do you think?

==
[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: UniqueKey v2

2024-05-13 Thread Andy Fan


>> Consider join of tables "a", "b" and "c". My understanding is that
>> make_join_rel() is called once with rel1={a} and rel2={b join c}, then with
>> rel1={a join b} and rel2={c}, etc. I wanted to say that each call should
>> produce the same set of unique keys.
>> 
>> I need to check this part more in detail.
>
> I think I understand now. By calling populate_joinrel_uniquekeys() for various
> orderings, you can find out that various input relation unique keys can
> represent the whole join. For example, if the ordering is
>
> A JOIN (B JOIN C)
>
> you can prove that the unique keys of A can be used for the whole join, while
> for the ordering
>
> B JOIN (A JOIN C)
>
> you can prove the same for the unique keys of B, and so on.

Yes.

>> > We don't create the component uniquekey if any one of the two
>> > sides is unique already. For example:
>> > 
>> > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is
>> > unique", there is no need to create component UniqueKey (t1.id, t2.id);  
>> 
>> ok, I need to check more in detail how this part works.
>
> This optimization makes sense to me.

OK.

>> > >
>> > >   Of course one problem is that the number of combinations can grow
>> > >   exponentially as new relations are joined.
>> > 
>> > Yes, that's why "rule 1" needed and "How to reduce the overhead" in
>> > UniqueKey.README is introduced. 
>
> I think there should yet be some guarantee that the number of unique keys does
> not grow exponentially. Perhaps a constant that allows a relation (base or
> join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3
> or 4.) And when picking the "best N keys", one criterion could be the number
> of expressions in the key (the shorter key the better).

I don't want to introduce this complexity right now. I'm more
interested in how to work with them effectively. The main effort includes:

- the design of a bitmapset which is memory-usage friendly and easy to
combine.
- optimizing the single-row cases to reduce N UniqueKeys to 1 UniqueKey.

I hope we can pay more attention to this optimization (at most N
UniqueKeys) once the major infrastructure has been done.

>> > You are correct and IMO my current code are able to tell it is a single
>> > row as well.
>> > 
>> > 1. Since t1.id = 1, so t1 is single row, so t1.d is uniquekey as a
>> > consequence.
>> > 2. Given t2.id is unique, t1.e = t2.id so t1's uniquekey can be kept
>> > after the join because of rule 1 on joinrel. and t1 is singlerow, so the
>> > joinrel is singlerow as well.
>
> ok, I think I understand now.

OK.

Lastly, this is probably my first non-trivial patch set with multiple
authors. I don't want to be the bottleneck for the cooperation, so if
you need something done sooner, please don't hesitate to ask me for it
explicitly.

Here is my schedule for this: I can provide the next version, based on
our discussion and your patches, early next week, and update
UniqueKey.README to make the overall design clearer. What I hope you
will pay more attention to is UniqueKey.README, besides the code. I
hope UniqueKey.README can greatly reduce the effort for others to
understand the overall design.

-- 
Best Regards
Andy Fan





Re: UniqueKey v2

2024-05-13 Thread Andy Fan


Antonin Houska  writes:

>> Could you make the reason clearer for adding 'List *opfamily_lists;'
>> into UniqueKey?  You said "This is needed to create ECs in the parent
>> query if the upper relation represents a subquery." but I didn't get
>> it. Since we need to maintain the UniqueKey in many places, I'd like
>> to keep it as simple as possible. Of course, anything essential should
>> be added for sure.
>
> If unique keys are generated for a subquery output, they also need to be
> created for the corresponding relation in the upper query ("sub" in the
> following example):

OK.
>
> select * from tab1 left join (select * from tab2) sub;
>
> However, to create an unique key for "sub", you need an EC for each expression
> of the key.

OK.
> And to create an EC, you in turn need the list of operator
> families.

I'm wondering whether we need to "create" any EC. Can you find a use case
where the outer EC is missing and the UniqueKey is still interesting? I
don't have an example now.

convert_subquery_pathkeys has a similar situation and contains the
following code:

outer_ec =
	get_eclass_for_sort_expr(root,
							 (Expr *) outer_var,
							 sub_eclass->ec_opfamilies,
							 sub_member->em_datatype,
							 sub_eclass->ec_collation,
							 0,
							 rel->relids,
							 NULL,
							 false);

/*
 * If we don't find a matching EC, sub-pathkey isn't
 * interesting to the outer query
 */
if (outer_ec)
	best_pathkey =
		make_canonical_pathkey(root,
							   outer_ec,
							   sub_pathkey->pk_opfamily,
							   sub_pathkey->pk_strategy,
							   sub_pathkey->pk_nulls_first);
}

> Even if the parent query already had ECs for the columns of "sub" which are
> contained in the unique key, you need to make sure that those ECs are
> "compatible" with the ECs of the subquery which generated the unique key. That
> is, if an EC of the subquery considers certain input values equal, the EC of
> the parent query must also be able to determine if they are equal or not.
>
>> > * RelOptInfo.notnullattrs
>> >
>> >   My understanding is that this field contains the set attributes whose
>> >   uniqueness is guaranteed by the unique key. They are acceptable because 
>> > they
>> >   are either 1) not allowed to be NULL due to NOT NULL constraint or 2) 
>> > NULL
>> >   value makes the containing row filtered out, so the row cannot break
>> >   uniqueness of the output. Am I right?
>> >
>> >   If so, I suggest to change the field name to something less generic, 
>> > maybe
>> >   'uniquekey_attrs' or 'uniquekey_candidate_attrs', and adding a comment 
>> > that
>> >   more checks are needed before particular attribute can actually be used 
>> > in
>> >   UniqueKey.
>> 
>> I don't think so; UniqueKey is just one of the places to use this
>> not-null property, see 3af704098 for another use case of it.
>> 
>> (Because of 3af704098, we should leverage notnullattnums somehow in this
>> patch, which will be included in the next version as well).
>
> In your patch you modify 'notnullattrs' in add_base_clause_to_rel(), but that
> does not happen to 'notnullattnums' in the current master branch. Thus I think
> that 'notnullattrs' is specific to the unique keys feature, so the field name
> should be less generic.

OK.

>> >
>> > * uniquekey_useful_for_merging()
>> >
>> >   How does uniqueness relate to merge join? In README.uniquekey you seem to
>> >   point out that a single row is always sorted, but I don't think this
>> >   function is related to that fact. (Instead, I'd expect that pathkeys are
>> >   added to all paths for a single-row relation, but I'm not sure you do 
>> > that
>> >   in the current version of 

Underscore in positional parameters?

2024-05-13 Thread Erik Wienhold
I noticed that we (kind of) accept underscores in positional parameters.
For example, this works:

=> PREPARE p1 AS SELECT $1_2;
PREPARE
=> EXECUTE p1 (123);
 ?column?
----------
      123
(1 row)

Parameter $1_2 is taken as $1 because in rule {param} in scan.l we get
the parameter number with atol, which stops at the underscore.  That's a
regression in faff8f8e47f.  Before that commit, $1_2 resulted in
"ERROR: trailing junk after parameter".

I can't tell which fix is the way to go: (1) accept underscores without
using atol, or (2) just forbid underscores.  Any ideas?

atol can be replaced with pg_strtoint32_safe to handle the underscores.
This also avoids atol's undefined behavior on overflows.  AFAICT,
positional parameters are not part of the SQL standard, so nothing
prevents us from accepting underscores here as well.  The attached patch
does that and also adds a test case.

But reverting {param} to its old form to forbid underscores also makes
sense.  That is:

param   \${decdigit}+
param_junk  \${decdigit}+{ident_start}

It seems very unlikely that anybody uses that many parameters and still
cares enough about readability to use underscores.  But maybe users simply
expect that underscores are valid here as well.

-- 
Erik
>From 0f319818360cdd26e96198ea7fcaba1b8298f264 Mon Sep 17 00:00:00 2001
From: Erik Wienhold 
Date: Tue, 14 May 2024 04:14:08 +0200
Subject: [PATCH] Fix parsing of positional parameters with underscores

Underscores were added to numeric literals in faff8f8e47.  That also
introduced a regression whereby positional parameters accepted
underscores as well.  But such parameter numbers were not parsed
correctly by atol which stops at the first non-digit character.  So
parameter $1_2 would be taken as just $1.  Fix that by replacing atol
with pg_strtoint32_safe.  Thereby we also avoid the undefined behavior
of atol on overflows.
---
 src/backend/parser/scan.l| 5 -
 src/test/regress/expected/numerology.out | 1 +
 src/test/regress/sql/numerology.sql  | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 5eaadf53b2..ebb7ace9ca 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -992,7 +992,10 @@ other			.
 
 {param}			{
 	SET_YYLLOC();
-	yylval->ival = atol(yytext + 1);
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+	yylval->ival = pg_strtoint32_safe(yytext + 1, (Node *) &escontext);
+	if (escontext.error_occurred)
+		yyerror("parameter too large");
 	return PARAM;
 }
 {param_junk}	{
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index f662a5050a..3c1308e22c 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -297,6 +297,7 @@ SELECT 1_000.5e0_1;
 10005
 (1 row)
 
+PREPARE p2 AS SELECT $0_1;
 -- error cases
 SELECT _100;
 ERROR:  column "_100" does not exist
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 1941c58e68..11af76828d 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -77,6 +77,8 @@ SELECT 1_000.;
 SELECT .000_005;
 SELECT 1_000.5e0_1;
 
+PREPARE p2 AS SELECT $0_1;
+
 -- error cases
 SELECT _100;
 SELECT 100_;
-- 
2.45.0



Re: First draft of PG 17 release notes

2024-05-13 Thread Andy Fan


Bruce Momjian  writes:

> On Sat, May 11, 2024 at 01:27:25PM +0800, Andy Fan wrote:
>> 
>> Hello Bruce,
>> 
>> > I have committed the first draft of the PG 17 release notes;  you can
>> > see the results here:
>> >
>> >https://momjian.us/pgsql_docs/release-17.html
>> 
>> Thank you for working on this!
>> 
>> > I welcome feedback.  For some reason it was an easier job than usual.
>> 
>> Do you think we need to add the following 2 items?
>> 
>> - 9f133763961e280d8ba692bcad0b061b861e9138 this is an optimizer
>>   transform improvement.
>
> It was unclear from the commit message exactly what user-visible
> optimization this allowed.  Do you have details?

Yes, it allows a query like "SELECT * FROM t1 WHERE t1.a in (SELECT a
FROM t2 WHERE t2.b = t1.b)" to be pulled up into a semi-join, hence more
join methods / join orders are possible.

>
>> - a8a968a8212ee3ef7f22795c834b33d871fac262 this is an optimizer costing
>>   improvement.
>
> Does this allow faster UNION ALL with LIMIT, perhaps?

Yes, for example:  (subquery-1) UNION ALL (subquery-2) LIMIT n;

When planning subquery-1 or subquery-2, the LIMIT N should be taken into
account. As a consequence, maybe a hash join should be replaced with a
Nested Loop. Before this commit, the limit was ignored if the query was
flattened into an appendrel, and that flattening happens very often.

David provided a summary of both commits in [1].

[1]
https://www.postgresql.org/message-id/CAApHDvqAQgq27LgYmJ85VVGTR0%3DhRW6HHq2oZgK0ZiYC_a%2BEww%40mail.gmail.com
 

-- 
Best Regards
Andy Fan





Re: First draft of PG 17 release notes

2024-05-13 Thread Tender Wang
jian he  于2024年5月9日周四 18:00写道:

> On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
> >
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
> >
>
> another potential incompatibilities issue:
> ALTER TABLE DROP PRIMARY KEY
>
> see:
>
> https://www.postgresql.org/message-id/202404181849.6frtmajobe27%40alvherre.pgsql
>
>
Since Alvaro has reverted all changes to not-null constraints, there
will be no potential incompatibility issue.
-- 
Tender Wang
OpenPie:  https://en.openpie.com/


Re: First draft of PG 17 release notes

2024-05-13 Thread Bruce Momjian
On Sat, May 11, 2024 at 01:27:25PM +0800, Andy Fan wrote:
> 
> Hello Bruce,
> 
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
> 
> Thank you for working on this!
> 
> > I welcome feedback.  For some reason it was an easier job than usual.
> 
> Do you think we need to add the following 2 items?
> 
> - 9f133763961e280d8ba692bcad0b061b861e9138 this is an optimizer
>   transform improvement.

It was unclear from the commit message exactly what user-visible
optimization this allowed.  Do you have details?

> - a8a968a8212ee3ef7f22795c834b33d871fac262 this is an optimizer costing
>   improvement.

Does this allow faster UNION ALL with LIMIT, perhaps?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-13 Thread Bruce Momjian
On Sat, May 11, 2024 at 10:24:39AM -0400, Joe Conway wrote:
> On 5/11/24 09:57, Jelte Fennema-Nio wrote:
> > On Fri, 10 May 2024 at 23:31, Tom Lane  wrote:
> > > 
> > > Bruce Momjian  writes:
> > > > I looked at both of these.   In both cases I didn't see why the user
> > > > would need to know these changes were made:
> > > 
> > > I agree that the buffering change is not likely interesting, but
> > > the fact that you can now control-C out of a psql "\c" command
> > > is user-visible.  People might have internalized the fact that
> > > it didn't work, or created complicated workarounds.
> > 
> > The buffering change improved performance up to ~40% in some of the
> > benchmarks. The case it improves mostly is COPY of large rows and
> > streaming a base backup. That sounds user-visible enough to me to
> > warrant an entry imho.
> 
> +1

Attached patch applied.


-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
commit e87e7324555
Author: Bruce Momjian 
Date:   Mon May 13 20:55:13 2024 -0400

doc PG 17 relnotes:  add item about libpq large data transfers

Reported-by: Jelte Fennema-Nio

Discussion: https://postgr.es/m/cageczqtz5auqlel6dald2hu2fxs_losh4kedndj1fwthsb_...@mail.gmail.com

Reviewed-by: Joe Conway

Backpatch-through: master

diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 9dd3954f3c2..38c14970822 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -1901,6 +1901,17 @@ Add libpq function PQsetChunkedRowsMode to allow retrieval of results in chunks
 
 
 
+
+
+
+
+Allow libpq to more efficiently transfer large blocks of data (Melih Mutlu)
+
+
+
 

Re: Why is citext/regress failing on hamerkop?

2024-05-13 Thread Thomas Munro
On Tue, May 14, 2024 at 8:17 AM Tom Lane  wrote:
> I'm not sure whether we've got unsent data pending in the problematic
> condition, nor why it'd remain unsent if we do (shouldn't the backend
> consume it anyway?).  But this has the right odor for an explanation.
>
> I'm pretty hesitant to touch this area myself, because it looks an
> awful lot like commits 6051857fc and ed52c3707, which eventually
> had to be reverted.  I think we need a deeper understanding of
> exactly what Winsock is doing or not doing before we try to fix it.

I was beginning to suspect that lingering odour myself.  I haven't
looked at the GSS code but I was imagining that what we have here is
perhaps not unsent data dropped on the floor due to linger policy
(unclean socket close on process exit), but rather that the server
didn't see the socket as ready to read because it lost track of the
FD_CLOSE somewhere because the client closed it gracefully, and our
server-side FD_CLOSE handling has always been a bit suspect.  I wonder
if the GSS code is somehow more prone to brokenness.  One thing we
learned in earlier problems was that abortive/error disconnections
generate FD_CLOSE repeatedly, while graceful ones give you only one.
In other words, if the other end politely calls closesocket(), the
server had better not miss the FD_CLOSE event, because it won't come
again.   That's what

https://commitfest.postgresql.org/46/3523/

is intended to fix.  Does it help here?  Unfortunately that's
unpleasantly complicated and unbackpatchable (keeping a side-table of
socket FDs and event handles, so we don't lose events between the
cracks).




Re: First draft of PG 17 release notes

2024-05-13 Thread Bruce Momjian
On Fri, May 10, 2024 at 05:31:33PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, May 10, 2024 at 06:50:54PM +0200, Jelte Fennema-Nio wrote:
> >> There are two commits that I think would benefit from being listed
> >> (but maybe they are already listed and I somehow missed them, or they
> >> are left out on purpose for some reason):
> 
> > I looked at both of these.   In both cases I didn't see why the user
> > would need to know these changes were made:
> 
> I agree that the buffering change is not likely interesting, but
> the fact that you can now control-C out of a psql "\c" command
> is user-visible.  People might have internalized the fact that
> it didn't work, or created complicated workarounds.

Agreed, attached patch applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 5a741ff16f1..9dd3954f3c2 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -1980,6 +1980,17 @@ The parameter is "min_rows".
 
 
 
+
+
+
+
+Allow psql connections to be canceled with control-C (Tristan Partin)
+
+
+
 

Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-05-13 Thread Michael Paquier
On Thu, Apr 25, 2024 at 02:53:33PM +0200, Majid Garoosi wrote:
> Unfortunately, I quit that company a month ago (I wish we could
> discuss this earlier) and don't have access to the environment
> anymore.
> I'll try to ask my teammates and see if they can find anything about
> the exact values of latency, bw, etc.
> 
> Anyway, here is some description of the environment. Sadly, there
> are no numbers in this description, but I'll try to provide as much details
> as possible.
> There is a k8s cluster running over some VMs. Each postgres
> instance runs as a pod inside the k8s cluster. So, both the
> primary and standby servers are in the same DC, but might be on
> different baremetal nodes. There is an overlay network for the pods to
> see each other, and there's also another overlay network for the VMs
> to see each other.
> The storage servers are in the same DC, but we're sure they're on some
> racks other than the postgres pods. They run Ceph [1] project and provide
> Rados Block Devices (RBD) [2] interface. In order for k8s to use ceph, a
> Ceph-CSI [3] controller is running inside the k8s cluster.
> BTW, the FS type is ext4.

Okay, seeing the feedback for this patch and Andres disagreeing that it
is a good idea, I have marked the patch as rejected, as it was still in
the CF app.
--
Michael


signature.asc
Description: PGP signature


Re: I have an exporting need...

2024-05-13 Thread David Rowley
On Tue, 14 May 2024 at 06:18, Juan Hernández  wrote:
> Do you consider useful to add a parameter (for example, --separatetables) so 
> when used the exporting file process can create a different tablename.sql 
> file for each table in database automatically?
>
> Example...
>
> PGHOST="/tmp" PGPASSWORD="mydbpass" pg_dump -U dbusername --separatetables 
> -Fp --inserts dbname > "/route/dbname.sql"
>
> And if this database has tables table1...table10, then 10 files are created...

pg_dump has code to figure out the dependency of objects in the
database so that the dump file produced can be restored.  If one file
was output per table, how would you know in which order to restore
them? For example, if you have a table with a FOREIGN KEY referencing
some other table, you need to restore the referenced table first.
That's true for both restoring the schema and restoring the data.

David




RE: Resetting synchronous_standby_names can wait for CHECKPOINT to finish

2024-05-13 Thread Yusuke Egashira (Fujitsu)
Hello,

> When the checkpointer process is busy, even if we reset 
> synchronous_standby_names, the resumption of the backend processes waiting in 
> SyncRep are made to wait until the checkpoint is completed.
> This prevents the prompt resumption of application processing when a problem 
> occurs on the standby server in a synchronous replication system.
> I confirmed this in PostgreSQL 12.18.

I have tested this issue on Postgres built from the master branch (17devel) and 
observed the same behavior where the backend SyncRep release is blocked until 
CHECKPOINT completion.

In situations where a synchronous standby instance encounters an error and 
needs to be detached, I believe that the current behavior of waiting for 
SyncRep is inappropriate as it delays the backend.
I don't think changing the position of SIGHUP processing in the Checkpointer
process carries much risk. Is there anything I am overlooking?


Regards, 
Yusuke Egashira.





Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jacob Champion
[this should probably belong to a different thread, but I'm not sure
what to title it]

On Mon, May 13, 2024 at 4:08 AM Heikki Linnakangas  wrote:
> I think these options should be designed from the user's point of view,
> so that the user can specify the risks they're willing to accept, and
> the details of how that's accomplished are handled by libpq. For
> example, I'm OK with (tick all that apply):
>
> [ ] passive eavesdroppers seeing all the traffic
> [ ] MITM being able to hijack the session
> [ ] connecting without verifying the server's identity
> [ ] divulging the plaintext password to the server
> [ ] ...

I'm pessimistic about a quality-of-protection scheme for this use case
(*). I don't think users need more knobs in their connection strings,
and many of the goals of transport encryption are really not
independent from each other in practice. As evidence I point to the
absolute mess of GSSAPI wrapping, which lets you check separate boxes
for things like "require the server to authenticate itself" and
"require integrity" and "allow MITMs to reorder messages" and so on,
as if the whole idea of "integrity" is useful if you don't know who
you're talking to in the first place. I think I recall slapd having
something similarly arcane (but at least it doesn't make the clients
do it!). Those kinds of APIs don't evolve well, in my opinion.

I think most people probably just want browser-grade security as
quickly and cheaply as possible, and we don't make that very easy
today. I'll try to review a QoP scheme if someone works on it, don't
get me wrong, but I'd much rather spend time on a "very secure by
default" mode that gets rid of most of the options (i.e. a
postgresqls:// scheme).

(*) I've proposed quality-of-protection in the past, for Postgres
proxy authentication [1]. But I'm comfortable in my hypocrisy, because
in that case, the end user doing the configuration is a DBA with a
config file who is expected to understand the whole system, and it's a
niche use case (IMO) without an obvious "common setup". And frankly I
think my proposal is unlikely to go anywhere; the cost/benefit
probably isn't good enough.

> If you need to verify
> the server's identity, it implies sslmode=verify-CA or
> channel_binding=true.

Neither of those two options provides strong authentication of the
peer, and personally I wouldn't want them to be considered worthy of
"verify the server's identity" mode.

And -- taking a wild leap here -- if we disagree, then granularity
becomes a problem: either the QoP scheme now has to have
sub-checkboxes for "if the server knows my password, that's good
enough" and "it's fine if the server's hostname doesn't match the
cert, for some reason", or it smashes all of those different ideas
into one setting and then I have to settle for the weakest common
denominator during an active attack. Assuming I haven't missed a third
option, will that be easier/better than the status quo of
require_auth+sslmode?

> A different line of thought is that to keep the attack surface as small
> as possible, you should specify very explicitly what exactly you expect
> to happen in the authentication, and disallow any variance. For example,
> you expect SCRAM to be used, with a certificate signed by a particular
> CA, and if the server requests anything else, that's suspicious and the
> connection is aborted. We should make that possible too

That's 'require_auth=scram-sha-256 sslmode=verify-ca', no?

--Jacob

[1] 
https://www.postgresql.org/message-id/0768cedb-695a-8841-5f8b-da2aa64c8f3a%40timescale.com




Re: SQL:2011 application time

2024-05-13 Thread Paul Jungwirth

On 5/13/24 03:11, Peter Eisentraut wrote:
> It looks like we missed some of these fundamental design questions early on, and it might be too
> late now to fix them for PG17.
>
> For example, the discussion on unique constraints misses that the question of null values in unique
> constraints itself is controversial and that there is now a way to change the behavior.  So I
> imagine there is also a selection of possible behaviors you might want for empty ranges.
> Intuitively, I don't think empty ranges are sensible for temporal unique constraints.  But anyway,
> it's a bit late now to be discussing this.
>
> I'm also concerned that if ranges have this fundamental incompatibility with periods, then the plan
> to eventually evolve this patch set to support standard periods will also have as-yet-unknown problems.
>
> Some of these issues might be design flaws in the underlying mechanisms, like range types and
> exclusion constraints.  Like, if you're supposed to use this for scheduling but you can use empty
> ranges to bypass exclusion constraints, how is one supposed to use this?  Yes, a check constraint
> using isempty() might be the right answer.  But I don't see this documented anywhere.
>
> On the technical side, adding an implicit check constraint as part of a primary key constraint is
> quite a difficult implementation task, as I think you are discovering.  I'm just reminded about how
> the patch for catalogued not-null constraints struggled with linking these not-null constraints to
> primary keys correctly.  This sounds a bit similar.
>
> I'm afraid that these issues cannot be resolved in good time for this release, so we should revert
> this patch set for now.


I think reverting is a good idea. I'm not really happy with the CHECK constraint solution either. 
I'd be happy to have some more time to rework this for v18.


A couple alternatives I'd like to explore:

1. Domain constraints instead of a CHECK constraint. I think this is probably worse, and I don't 
plan to spend much time on it, but I thought I'd mention it in case someone else thought otherwise.


2. A slightly different overlaps operator, say &&&, where 'empty' &&& 'empty' is true. But 'empty' 
with anything else could still be false (or not). That operator would prevent duplicates in an 
exclusion constraint. This also means we could support more types than just ranges & multiranges. I 
need to think about whether this combines badly with existing operators, but if not it has a lot of 
promise. If anything it might be *less* contradictory, because it fits better with 'empty' @> 
'empty', which we say is true.


Another thing a revert would give me some time to consider: even though it's not standard syntax, I 
wonder if we want to require syntax something like `PRIMARY KEY USING gist (id, valid_at WITHOUT 
OVERLAPS)`. Everywhere else we default to btree, so defaulting to gist feels a little weird. In 
theory we could even someday support WITHOUT OVERLAPS with btree, if we taught that AM to answer 
that question. (I admit there is probably not a lot of desire for that though.)


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Why is parula failing?

2024-05-13 Thread Tom Lane
David Rowley  writes:
> I've not seen any recent failures from Parula that relate to this
> issue.  The last one seems to have been about 4 weeks ago.

> I'm now wondering if it's time to revert the debugging code added in
> 1db689715.  Does anyone think differently?

+1.  It seems like we wrote off the other issue as a compiler bug,
so I don't have much trouble assuming that this one was too.

regards, tom lane




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Julien Rouhaud
On Mon, May 13, 2024 at 07:11:53PM GMT, Tom Lane wrote:
> Andres Freund  writes:
> > On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote:
> >> Hmmm, depending on the extension it can extensively call/use postgres code
> >> so would be nice if we can differentiate if the code is called from
> >> Postgres itself or from an extension.
> 
> > I think that's not realistically possible. It's also very fuzzy what that'd
> > mean. If there's a planner hook and then the query executes normally, what 
> > do
> > you report for an execution time error? And even the simpler case - should 
> > use
> > of pg_stat_statements cause everything within to be logged as a
> > pg_stat_statement message?
> 
> Not to mention that there could be more than one extension on the call
> stack.  I think tying this statically to the ereport call site is
> fine.
> 
> The mechanism that Andres describes for sourcing the name seems a bit
> overcomplex though.  Why not just allow/require each extension to
> specify its name as a constant string?  We could force the matter by
> redefining PG_MODULE_MAGIC as taking an argument:
> 
>   PG_MODULE_MAGIC("hstore");

FTR, there was a proposal at [1] some time ago that could be used for that need
(and others); I thought it would be good to mention it just in case.  That
would obviously only work if all extensions use that framework.

[1] https://www.postgresql.org/message-id/flat/3207907.AWbSqkKDnR%40aivenronan




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Andres Freund
Hi,

On 2024-05-13 19:11:53 -0400, Tom Lane wrote:
> The mechanism that Andres describes for sourcing the name seems a bit
> overcomplex though.  Why not just allow/require each extension to
> specify its name as a constant string?  We could force the matter by
> redefining PG_MODULE_MAGIC as taking an argument:
>   PG_MODULE_MAGIC("hstore");

Mostly because it seemed somewhat sad to require every extension to have
version-specific ifdefs around that, particularly because it's not hard for us
to infer.

I think there might be other use cases for the backend to provide "extension
scoped" information, FWIW. Even just providing the full path to the extension
library could be useful.

Greetings,

Andres Freund




Re: Why is parula failing?

2024-05-13 Thread David Rowley
On Thu, 21 Mar 2024 at 13:53, David Rowley  wrote:
>
> On Thu, 21 Mar 2024 at 12:36, Tom Lane  wrote:
> > So yeah, if we could have log_autovacuum_min_duration = 0 perhaps
> > that would yield a clue.
>
> FWIW, I agree with your earlier statement about it looking very much
> like auto-vacuum has run on that table, but equally, if something like
> the pg_index record was damaged we could get the same plan change.
>
> We could also do something like the attached just in case we're
> barking up the wrong tree.

I've not seen any recent failures from Parula that relate to this
issue.  The last one seems to have been about 4 weeks ago.

I'm now wondering if it's time to revert the debugging code added in
1db689715.  Does anyone think differently?

David




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Tom Lane
Andres Freund  writes:
> On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote:
>> Hmmm, depending on the extension it can extensively call/use postgres code
>> so would be nice if we can differentiate if the code is called from
>> Postgres itself or from an extension.

> I think that's not realistically possible. It's also very fuzzy what that'd
> mean. If there's a planner hook and then the query executes normally, what do
> you report for an execution time error? And even the simpler case - should use
> of pg_stat_statements cause everything within to be logged as a
> pg_stat_statement message?

Not to mention that there could be more than one extension on the call
stack.  I think tying this statically to the ereport call site is
fine.

The mechanism that Andres describes for sourcing the name seems a bit
overcomplex though.  Why not just allow/require each extension to
specify its name as a constant string?  We could force the matter by
redefining PG_MODULE_MAGIC as taking an argument:

PG_MODULE_MAGIC("hstore");
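
For illustration, a rough sketch of that redefinition (the
Pg_extension_name variable and its added line are assumptions, not
existing code; the rest mirrors the current macro body in fmgr.h):

#define PG_MODULE_MAGIC(name) \
extern PGDLLEXPORT const Pg_magic_struct *PG_MAGIC_FUNCTION_NAME(void); \
const Pg_magic_struct * \
PG_MAGIC_FUNCTION_NAME(void) \
{ \
	static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \
	return &Pg_magic_data; \
} \
PGDLLEXPORT const char *Pg_extension_name = (name); \
extern int no_such_variable

/* and in an extension's main C file: */
PG_MODULE_MAGIC("hstore");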

regards, tom lane




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Andres Freund
Hi,

On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote:
> On Mon, May 13, 2024 at 5:51 PM Andres Freund  wrote:
> > It's not entirely trivial to provide errfinish() with a parameter
> indicating
> > the extension, but it's doable:
> >
> > 1) Have PG_MODULE_MAGIC also define a new variable for the extension name,
> >empty at that point
> >
> > 2) In internal_load_library(), look up that new variable, and fill it
> with a,
> >mangled, libname.
> >
> > 4) in elog.h, define a new macro depending on BUILDING_DLL (if it is set,
> >we're in the server, otherwise an extension). In the backend itself,
> define
> >it to NULL, otherwise to the variable created by PG_MODULE_MAGIC.
> >
> > 5) In elog/ereport/errsave/... pass this new variable to
> >errfinish/errsave_finish.
> >
> 
> Then every extension should define their own Pg_extension_filename, right?

It'd be automatically set by postgres when loading libraries.


> > I've attached a *very rough* prototype of this idea. My goal at this
> stage was
> > just to show that it's possible, not for the code to be in a reviewable
> state.
> >
> >
> > Here's e.g. what this produces with log_line_prefix='%m [%E] '
> >
> > 2024-05-13 13:50:17.518 PDT [postgres] LOG:  database system is ready to
> accept connections
> > 2024-05-13 13:50:19.138 PDT [cube] ERROR:  invalid input syntax for cube
> at character 13
> > 2024-05-13 13:50:19.138 PDT [cube] DETAIL:  syntax error at or near "f"
> > 2024-05-13 13:50:19.138 PDT [cube] STATEMENT:  SELECT cube('foo');
> >
> > 2024-05-13 13:43:07.484 PDT [postgres] LOG:  database system is ready to
> accept connections
> > 2024-05-13 13:43:11.699 PDT [hstore] ERROR:  syntax error in hstore:
> unexpected end of string at character 15
> > 2024-05-13 13:43:11.699 PDT [hstore] STATEMENT:  SELECT hstore('foo');
> >
> >
> 
> Was not able to build your patch by simply:

Oh, turns out it only builds with meson right now.  I forgot that, with
autoconf, for some unknown reason, we only set BUILDING_DLL on some OSs.

I attached a crude patch changing that.


> > It's worth pointing out that this, quite fundamentally, can only work
> when the
> > log message is triggered directly by the extension. If the extension code
> > calls some postgres function and that function then errors out, it'll be
> seens
> > as being part of postgres.
> >
> > But I think that's ok - they're going to be properly errcode-ified etc.
> >
> 
> Hmmm, depending on the extension it can extensively call/use postgres code
> so would be nice if we can differentiate if the code is called from
> Postgres itself or from an extension.

I think that's not realistically possible. It's also very fuzzy what that'd
mean. If there's a planner hook and then the query executes normally, what do
you report for an execution time error? And even the simpler case - should use
of pg_stat_statements cause everything within to be logged as a
pg_stat_statement message?

I think the best we can do is to actually say where the error is directly
triggered from.

Greetings,

Andres Freund
>From 1c59c465f2d359e8c47cf91d1ea458ea3b64ec84 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 13 May 2024 13:47:41 -0700
Subject: [PATCH v2 1/2] WIP: Track shared library in which elog/ereport is
 called

---
 src/include/fmgr.h |  1 +
 src/include/utils/elog.h   | 18 +-
 src/backend/utils/error/elog.c | 33 -
 src/backend/utils/fmgr/dfmgr.c | 30 ++
 4 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a251..3c7cfd7fee9 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -504,6 +504,7 @@ PG_MAGIC_FUNCTION_NAME(void) \
 	static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \
 	return &Pg_magic_data; \
 } \
+PGDLLEXPORT const char *Pg_extension_filename = NULL; \
 extern int no_such_variable
 
 
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62f..5e57f01823d 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -137,6 +137,13 @@ struct Node;
  * prevents gcc from making the unreachability deduction at optlevel -O0.
  *--
  */
+#ifdef BUILDING_DLL
+#define ELOG_EXTNAME NULL
+#else
+extern PGDLLEXPORT const char *Pg_extension_filename;
+#define ELOG_EXTNAME Pg_extension_filename
+#endif
+
 #ifdef HAVE__BUILTIN_CONSTANT_P
 #define ereport_domain(elevel, domain, ...)	\
 	do { \
@@ -144,7 +151,7 @@ struct Node;
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
 			errstart_cold(elevel, domain) : \
 			errstart(elevel, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
@@ -154,7 +161,7 @@ struct Node;
 		const int elevel_ = (elevel); \
 		

Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jacob Champion
On Mon, May 13, 2024 at 9:13 AM Robert Haas  wrote:
> I find this idea to be a massive improvement over the status quo,

+1

> and
> I didn't spot any major problems when I read through the patch,
> either.

Definitely not a major problem, but I think
select_next_encryption_method() has gone stale, since it originally
provided generality and lines of fallback that no longer exist. In
other words, I think the following code is now misleading:

> if (conn->sslmode[0] == 'a')
> SELECT_NEXT_METHOD(ENC_PLAINTEXT);
>
> SELECT_NEXT_METHOD(ENC_NEGOTIATED_SSL);
> SELECT_NEXT_METHOD(ENC_DIRECT_SSL);
>
> if (conn->sslmode[0] != 'a')
> SELECT_NEXT_METHOD(ENC_PLAINTEXT);

To me, that implies that negotiated mode takes precedence over direct,
but the point of the patch is that it's not possible to have both. And
if direct SSL is in use, then sslmode can't be "allow" anyway, and we
definitely don't want ENC_PLAINTEXT.

So if someone proposes a change to select_next_encryption_method(),
you'll have to remember to stare at init_allowed_encryption_methods()
as well, and think really hard about what's going on. And vice-versa.
That worries me.

> I don't have a strong opinion about whether sslnegotiation=direct
> should error out (as you propose here) or silently promote sslmode to
> require. I think either is defensible.

I'm comforted that, since sslrootcert=system already does it, plenty
of use cases will get that for free. And if you decide in the future
that you really really want it to promote, it won't be a compatibility
break to make that change. (That gives us more time for wider v16-17
adoption, to see how the sslrootcert=system magic promotion behavior
is going in practice.)

> Had I been implementing it, I
> think I would have done as Jacob proposes, just because once we've
> forced a direct SSL negotiation surely the only sensible behavior is
> to be using SSL, unless you think there should be a
> silently-reconnect-without-SSL behavior, which I sure don't.

We still allow GSS to preempt SSL, though, so "forced" is probably
overstating things.

> It's really hard to believe in 2024 that anyone should ever be using a
> setting that may or may not encrypt the connection. There's http and
> https but there's no httpmaybes.

+1. I think (someone hop in and correct me please) that Opportunistic
Encryption for HTTP mostly fizzled, and they gave it a *lot* of
thought.

--Jacob




Re: Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Fabrízio de Royes Mello
On Mon, May 13, 2024 at 5:51 PM Andres Freund  wrote:
>
> Hi,
>
> It can be very useful to look at the log messages emitted by a larger
number
> of postgres instances to see if anything unusual is happening. E.g.
checking
> whether there are an increased number of internal, IO, corruption errors
(and
> LOGs too, because we emit plenty bad things as LOG) . One difficulty is
that
> extensions tend to not categorize their errors. But unfortunately errors
in
> extensions are hard to distinguish from errors emitted by postgres.
>
> A related issue is that it'd be useful to be able to group log messages by
> extension, to e.g. see which extensions are emitting disproportionally
many
> log messages.
>
> Therefore I'd like to collect the extension name in elog/ereport and add a
> matching log_line_prefix escape code.
>

I liked the idea ... It is very helpful for troubleshooting problems in
production.


> It's not entirely trivial to provide errfinish() with a parameter indicating
> the extension, but it's doable:
>
> 1) Have PG_MODULE_MAGIC also define a new variable for the extension name,
>    empty at that point
>
> 2) In internal_load_library(), look up that new variable, and fill it with a
>    mangled libname.
>
> 3) In elog.h, define a new macro depending on BUILDING_DLL (if it is set,
>    we're in the server, otherwise an extension). In the backend itself, define
>    it to NULL, otherwise to the variable created by PG_MODULE_MAGIC.
>
> 4) In elog/ereport/errsave/... pass this new variable to
>    errfinish/errsave_finish.
>

Then every extension should define its own Pg_extension_filename, right?


> I've attached a *very rough* prototype of this idea. My goal at this stage was
> just to show that it's possible, not for the code to be in a reviewable state.
>
>
> Here's e.g. what this produces with log_line_prefix='%m [%E] '
>
> 2024-05-13 13:50:17.518 PDT [postgres] LOG:  database system is ready to accept connections
> 2024-05-13 13:50:19.138 PDT [cube] ERROR:  invalid input syntax for cube at character 13
> 2024-05-13 13:50:19.138 PDT [cube] DETAIL:  syntax error at or near "f"
> 2024-05-13 13:50:19.138 PDT [cube] STATEMENT:  SELECT cube('foo');
>
> 2024-05-13 13:43:07.484 PDT [postgres] LOG:  database system is ready to accept connections
> 2024-05-13 13:43:11.699 PDT [hstore] ERROR:  syntax error in hstore: unexpected end of string at character 15
> 2024-05-13 13:43:11.699 PDT [hstore] STATEMENT:  SELECT hstore('foo');
>
>

I was not able to build your patch by simply doing:

./configure --prefix=/tmp/pg
...
make -j
...
/usr/bin/ld: ../../src/port/libpgport_srv.a(path_srv.o): warning:
relocation against `Pg_extension_filename' in read-only section `.text'
/usr/bin/ld: access/brin/brin.o: in function `brininsert':
/data/src/pg/main/src/backend/access/brin/brin.c:403: undefined reference
to `Pg_extension_filename'
/usr/bin/ld: access/brin/brin.o: in function `brinbuild':
/data/src/pg/main/src/backend/access/brin/brin.c:1107: undefined reference
to `Pg_extension_filename'
/usr/bin/ld: access/brin/brin.o: in function `brin_summarize_range':
/data/src/pg/main/src/backend/access/brin/brin.c:1383: undefined reference
to `Pg_extension_filename'
/usr/bin/ld: /data/src/pg/main/src/backend/access/brin/brin.c:1389:
undefined reference to `Pg_extension_filename'
/usr/bin/ld: /data/src/pg/main/src/backend/access/brin/brin.c:1434:
undefined reference to `Pg_extension_filename'
/usr/bin/ld:
access/brin/brin.o:/data/src/pg/main/src/backend/access/brin/brin.c:1450:
more undefined references to `Pg_extension_filename' follow
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:67: postgres] Error 1
make[2]: Leaving directory '/data/src/pg/main/src/backend'
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make[1]: Leaving directory '/data/src/pg/main/src'
make: *** [GNUmakefile:11: all-src-recurse] Error 2


> It's worth pointing out that this, quite fundamentally, can only work when the
> log message is triggered directly by the extension. If the extension code
> calls some postgres function and that function then errors out, it'll be seen
> as being part of postgres.
>
> But I think that's ok - they're going to be properly errcode-ified etc.
>

Hmmm, depending on the extension it can call/use postgres code extensively,
so it would be nice if we could differentiate whether the code is called from
Postgres itself or from an extension.

Regards,

--
Fabrízio de Royes Mello


Re: pg_sequence_last_value() for unlogged sequences on standbys

2024-05-13 Thread Nathan Bossart
Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Adding the extension name to EData / log_line_prefix

2024-05-13 Thread Andres Freund
Hi,

It can be very useful to look at the log messages emitted by a larger number
of postgres instances to see if anything unusual is happening. E.g. checking
whether there is an increased number of internal, IO, corruption errors (and
LOGs too, because we emit plenty of bad things as LOG). One difficulty is that
extensions tend to not categorize their errors. But unfortunately errors in
extensions are hard to distinguish from errors emitted by postgres.

A related issue is that it'd be useful to be able to group log messages by
extension, to e.g. see which extensions are emitting disproportionately many
log messages.

Therefore I'd like to collect the extension name in elog/ereport and add a
matching log_line_prefix escape code.


It's not entirely trivial to provide errfinish() with a parameter indicating
the extension, but it's doable:

1) Have PG_MODULE_MAGIC also define a new variable for the extension name,
   empty at that point

2) In internal_load_library(), look up that new variable, and fill it with a
   mangled libname.

3) In elog.h, define a new macro depending on BUILDING_DLL (if it is set,
   we're in the server, otherwise an extension). In the backend itself, define
   it to NULL, otherwise to the variable created by PG_MODULE_MAGIC.

4) In elog/ereport/errsave/... pass this new variable to
   errfinish/errsave_finish.


I've attached a *very rough* prototype of this idea. My goal at this stage was
just to show that it's possible, not for the code to be in a reviewable state.


Here's e.g. what this produces with log_line_prefix='%m [%E] '

2024-05-13 13:50:17.518 PDT [postgres] LOG:  database system is ready to accept 
connections
2024-05-13 13:50:19.138 PDT [cube] ERROR:  invalid input syntax for cube at 
character 13
2024-05-13 13:50:19.138 PDT [cube] DETAIL:  syntax error at or near "f"
2024-05-13 13:50:19.138 PDT [cube] STATEMENT:  SELECT cube('foo');

2024-05-13 13:43:07.484 PDT [postgres] LOG:  database system is ready to accept 
connections
2024-05-13 13:43:11.699 PDT [hstore] ERROR:  syntax error in hstore: unexpected 
end of string at character 15
2024-05-13 13:43:11.699 PDT [hstore] STATEMENT:  SELECT hstore('foo');


It's worth pointing out that this, quite fundamentally, can only work when the
log message is triggered directly by the extension. If the extension code
calls some postgres function and that function then errors out, it'll be seen
as being part of postgres.

But I think that's ok - they're going to be properly errcode-ified etc.


Thoughts?

Greetings,

Andres Freund
>From 1c59c465f2d359e8c47cf91d1ea458ea3b64ec84 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 13 May 2024 13:47:41 -0700
Subject: [PATCH v1] WIP: Track shared library in which elog/ereport is called

---
 src/include/fmgr.h |  1 +
 src/include/utils/elog.h   | 18 +-
 src/backend/utils/error/elog.c | 33 -
 src/backend/utils/fmgr/dfmgr.c | 30 ++
 4 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a251..3c7cfd7fee9 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -504,6 +504,7 @@ PG_MAGIC_FUNCTION_NAME(void) \
 	static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \
 	return &Pg_magic_data; \
 } \
+PGDLLEXPORT const char *Pg_extension_filename = NULL; \
 extern int no_such_variable
 
 
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62f..5e57f01823d 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -137,6 +137,13 @@ struct Node;
  * prevents gcc from making the unreachability deduction at optlevel -O0.
  *--
  */
+#ifdef BUILDING_DLL
+#define ELOG_EXTNAME NULL
+#else
+extern PGDLLEXPORT const char *Pg_extension_filename;
+#define ELOG_EXTNAME Pg_extension_filename
+#endif
+
 #ifdef HAVE__BUILTIN_CONSTANT_P
 #define ereport_domain(elevel, domain, ...)	\
 	do { \
@@ -144,7 +151,7 @@ struct Node;
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
 			errstart_cold(elevel, domain) : \
 			errstart(elevel, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
@@ -154,7 +161,7 @@ struct Node;
 		const int elevel_ = (elevel); \
 		pg_prevent_errno_in_scope(); \
 		if (errstart(elevel_, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 		if (elevel_ >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
@@ -169,7 +176,7 @@ extern bool message_level_is_interesting(int elevel);
 
 extern bool errstart(int elevel, const char *domain);
 extern pg_attribute_cold bool errstart_cold(int elevel, const char *domain);
-extern void errfinish(const char *filename, int lineno, const char 

Re: Large files for relations

2024-05-13 Thread Peter Eisentraut

On 06.03.24 22:54, Thomas Munro wrote:

Rebased.  I had intended to try to get this into v17, but a couple of
unresolved problems came up while rebasing over the new incremental
backup stuff.  You snooze, you lose.  Hopefully we can sort these out
in time for the next commitfest:

* should pg_combinebasebackup read the control file to fetch the segment size?
* hunt for other segment-size related problems that may be lurking in
new incremental backup stuff
* basebackup_incremental.c wants to use memory in proportion to
segment size, which looks like a problem, and I wrote about that in a
new thread[1]


Overall, I like this idea, and the patch seems to have many bases covered.

The patch will need a rebase.  I was able to test it on 
master@{2024-03-13}, but after that there are conflicts.


In .cirrus.tasks.yml, one of the test tasks uses 
--with-segsize-blocks=6, but you are removing that option.  You could 
replace that with something like


PG_TEST_INITDB_EXTRA_OPTS='--rel-segsize=48kB'

But that won't work exactly because

initdb: error: argument of --rel-segsize must be a power of two

I suppose that's ok as a change, since it makes the arithmetic more 
efficient.  But maybe it should be called out explicitly in the commit 
message.


If I run it with 64kB, the test pgbench/001_pgbench_with_server fails 
consistently, so it seems there is still a gap somewhere.


A minor point, the initdb error message

initdb: error: argument of --rel-segsize must be a multiple of BLCKSZ

would be friendlier if it actually showed the value of the block size
instead of just the symbol.  Similarly for the nearby error message 
about the off_t size.


In the control file, all the other fields use unsigned types.  Should 
relseg_size be uint64?


PG_CONTROL_VERSION needs to be changed.





Summary of Sort Improvement Proposals

2024-05-13 Thread Benjamin Coutu
Hello,

In light of multiple threads [1-6] discussing sorting improvements, I'd like to 
consolidate the old (+some new) ideas as a starting point.
It might make sense to brainstorm on a few of these ideas and maybe even
identify some that are worth implementing and testing.

1. Simple algorithmic ideas:
- Use single-assignment insertion-sort instead of swapping
- Increase insertion-sort threshold to at least 8 (possibly 10+), to be 
determined empirically based on current hardware
- Make insertion-sort threshold customizable via template based on sort 
element size

2. More complex/speculative algorithmic ideas:
- Try counting insertion-sort loop iterations and bail after a certain 
limit (include presorted check in insertion-sort loop and continue presorted 
check from last position in separate loop after bailout)
- Try binary search for presorted check (outside of insertion-sort-code)
- Try binary insertion sort (if comparison costs are high)
- Try partial insertion sort (include presorted check)
- Try presorted check only at top-level, not on every recursive step, 
or if on every level then at least only for n > some threshold
- Try asymmetric quick-sort partitioning
- Try dual pivot quick-sort
- Try switching to heap-sort dependent on recursion depth (might allow 
ripping out median-of-median)

3. TupleSort ideas:
- Use separate sort partition for NULL values to avoid null check on 
every comparison and to make nulls first/last trivial
- Pass down non-nullness info to avoid null check and/or null-partition 
creation (should ideally be determined by planner)
- Skip comparison of first sort key on subsequent full tuple 
tie-breaker comparison (unless abbreviated key)
- Encode NULL directly in abbreviated key (only if no null-partitioning)

4. Planner ideas:
- Use pg_stats.correlation to inform sort algorithm selection for sort 
keys that come from sequential-scans/bitmap-heap-scans
- Use n_distinct to inform sort algorithm selection (many tie-breaker 
comparisons necessary on multi-key sort)
- Improve costing of sorts in planner considering tuple size, 
distribution and n_distinct

[1] https://www.postgresql.org/message-id/flat/ddc4e498740a8e411c59%40zeyos.com
[2] 
https://www.postgresql.org/message-id/flat/CAFBsxsHanJTsX9DNJppXJxwg3bU%2BYQ6pnmSfPM0uvYUaFdwZdQ%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/flat/CAApHDvoTTtoQYfp3d0kTPF6y1pjexgLwquzKmjzvjC9NCw4RGw%40mail.gmail.com
[4] 
https://www.postgresql.org/message-id/flat/CAEYLb_Xn4-6f1ofsf2qduf24dDCVHbQidt7JPpdL_RiT1zBJ6A%40mail.gmail.com
[5] 
https://www.postgresql.org/message-id/flat/CAEYLb_W%2B%2BUhrcWprzG9TyBVF7Sn-c1s9oLbABvAvPGdeP2DFSQ%40mail.gmail.com
[6] 
https://www.postgresql.org/message-id/flat/683635b8-381b-5b08-6069-d6a45de19a12%40enterprisedb.com#12683b7a6c566eb5b926369b5fd2d41f

-- 

Benjamin Coutu
http://www.zeyos.com




Re: race condition in pg_class

2024-05-13 Thread Noah Misch
On Mon, May 13, 2024 at 03:53:08PM -0400, Robert Haas wrote:
> On Sun, May 12, 2024 at 7:29 PM Noah Misch  wrote:
> > - [consequences limited to transient failure] Since a PROC_IN_VACUUM 
> > backend's
> >   xmin does not stop pruning, an MVCC scan in that backend can find zero
> >   tuples when one is live.  This is like what all backends got in the days 
> > of
> >   SnapshotNow catalog scans.  See the pgbench test suite addition.  (Perhaps
> >   the fix is to make VACUUM do its MVCC scans outside of PROC_IN_VACUUM,
> >   setting that flag later and unsetting it earlier.)
> 
> Are you saying that this is a problem already, or that the patch
> causes it to start happening? If it's the former, that's horrible.

The former.




Re: New GUC autovacuum_max_threshold ?

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 11:14 AM Frédéric Yhuel
 wrote:
> FWIW, I do agree with your math. I found your demonstration convincing.
> 50 was selected with the wet finger.

Good to know.

> Using the formula I suggested earlier:
>
> vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
> vac_base_thresh + vac_scale_factor * sqrt(reltuples) * 1000);
>
> your table of 2.56 billion tuples will be vacuumed if there are
> more than 10 million dead tuples (every 28 minutes).

Yeah, so that is about 50x what we do now (twice an hour vs. once a
day). While that's a lot more reasonable than the behavior that we'd
get from a 500k hard cap (every 84 seconds), I suspect it's still too
aggressive.
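
To make the quoted numbers concrete, here is a quick sanity check in plain
SQL. This is only an illustration of the formula under discussion, assuming
the default autovacuum_vacuum_threshold of 50 and
autovacuum_vacuum_scale_factor of 0.2, and plugging in the 2.56 billion
tuple example:

SELECT reltuples,
       50 + 0.2 * reltuples                      AS current_threshold,
       least(50 + 0.2 * reltuples,
             50 + 0.2 * sqrt(reltuples) * 1000)  AS suggested_threshold
FROM (VALUES (2.56e9::float8)) AS t(reltuples);

-- current_threshold comes out around 512 million dead tuples,
-- suggested_threshold a little over 10 million.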

I find these things much easier to reason about in gigabytes than in
time units. In that example, the table was 320GB and was getting
vacuumed after accumulating 64GB of bloat. That seems like a lot. It
means that the table can grow from 320GB all the way up until 384GB
before we even think about starting to vacuum it, and then we might
not start right away, depending on resource availability, and we may
take some time to finish, possibly considerable time, depending on the
number and size of indexes and the availability of I/O resources. So
actually the table might very plausibly be well above 400GB before we
get done processing it, or potentially even more. I think that's not
aggressive enough.

But how much would we like to push that 64GB of bloat number down for
a table of this size? I would argue that if we're vacuuming the table
when it's only got 1GB of bloat, or 2GB of bloat, that seems
excessive. Unless the system is very lightly loaded and has no
long-running transactions at all, we're unlikely to be able to vacuum
aggressively enough to keep a 320GB table at a size of 321GB or 322GB.
Without testing or doing any research, I'm going to guess that a
realistic number is probably in the range of 10-20GB of bloat. If the
table activity is very light, we might be able to get it even lower,
like say 5GB, but the costs ramp up very quickly as you push the
vacuuming threshold down. Also, if the table accumulates X amount of
bloat during the time it takes to run one vacuum, you can never
succeed in limiting bloat to a value less than X (and probably more
like 1.5*X or 2*X or something).

So without actually trying anything, which I do think somebody should
do and report results, my guess is that for a 320GB table, you'd like
to multiply the vacuum frequency by a value somewhere between 3 and
10, and probably much closer to 3 than to 10. Maybe even less than 3.
Not sure exactly. Like I say, I think someone needs to try some
different workloads and database sizes and numbers of indexes, and try
to get a feeling for what actually works well in practice.

> If we want to stick with the simple formula, we should probably choose a
> very high default, maybe 100 million, as you suggested earlier.
>
> However, it would be nice to have the visibility map updated more
> frequently than every 100 million dead tuples. I wonder if this could be
> decoupled from the vacuum process?

Yes, but if a page has had any non-HOT updates, it can't become
all-visible again without vacuum. If it has had only HOT updates, then
a HOT-prune could make it all-visible. I don't think we do that
currently, but I think in theory we could.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Why is citext/regress failing on hamerkop?

2024-05-13 Thread Tom Lane
Thomas Munro  writes:
> For citext_utf8, I pushed cff4e5a3.  Hamerkop runs infrequently, so
> here's hoping for 100% green on master by Tuesday or so.

In the meantime, some off-list investigation by Alexander Lakhin
has turned up a good deal of information about why we're seeing
failures on hamerkop in the back branches.  Summarizing, it
appears that

1. In a GSS-enabled Windows build without any active Kerberos server,
libpq's pg_GSS_have_cred_cache() succeeds, allowing libpq to try to
open a GSSAPI connection, but then gss_init_sec_context() fails,
leading to client-side reports like this:

+Connection 2 failed: could not initiate GSSAPI security context: Unspecified 
GSS failure.  Minor code may provide more information: Credential cache is empty
+FATAL:  sorry, too many clients already

(The first of these lines comes out during the attempted GSS
connection, the second during the only-slightly-more-successful
non-GSS connection.)  So that's problem number 1: how is it that
gss_acquire_cred() succeeds but then gss_init_sec_context() disclaims
knowledge of any credentials?  Can we find a way to get this failure
to be detected during pg_GSS_have_cred_cache()?  It is mighty
expensive to launch a backend connection that is doomed to fail,
especially when this happens during *every single libpq connection
attempt*.

2. Once gss_init_sec_context() fails, libpq abandons the connection
and starts over; since it has already initiated a GSS handshake
on the connection, there's not much choice.  Although libpq faithfully
issues closesocket() on the abandoned connection, Alexander found
that the connected backend doesn't reliably see that: it may just
sit there until the AuthenticationTimeout elapses (1 minute by
default).  That backend is still consuming a postmaster child slot,
so if this happens on any sizable fraction of test connection
attempts, it's little surprise that we soon get "sorry, too many
clients already" failures.

3. We don't know exactly why hamerkop suddenly started seeing these
failures, but a plausible theory emerges after noting that its
reported time for the successful "make check" step dropped pretty
substantially right when this started.  In the v13 branch, "make
check" was taking 2:18 or more in the several runs right before the
first isolationcheck failure, but 1:40 or less just after.  So it
looks like the animal was moved onto faster hardware.  That feeds
into this problem because, with a successful isolationcheck run
taking more than a minute, there was enough time for some of the
earlier stuck sessions to time out and exit before their
postmaster-child slots were needed.

Alexander confirmed this theory by demonstrating that the main
regression tests in v15 would pass if he limited their speed enough
(by reducing the CPU allowed to a VM) but not at full speed.  So the
buildfarm results suggesting this is only an issue in <= v13 must
be just a timing artifact; the problem is still there.

Of course, backends waiting till timeout is not a good behavior
independently of what is triggering that, so we have two problems to
solve here, not one.  I have no ideas about the gss_init_sec_context()
failure, but I see a plausible theory about the failure to detect
socket closure in Microsoft's documentation about closesocket() [1]:

If the l_onoff member of the LINGER structure is zero on a stream
socket, the closesocket call will return immediately and does not
receive WSAEWOULDBLOCK whether the socket is blocking or
nonblocking. However, any data queued for transmission will be
sent, if possible, before the underlying socket is closed. This is
also called a graceful disconnect or close. In this case, the
Windows Sockets provider cannot release the socket and other
resources for an arbitrary period, thus affecting applications
that expect to use all available sockets. This is the default
behavior for a socket.

I'm not sure whether we've got unsent data pending in the problematic
condition, nor why it'd remain unsent if we do (shouldn't the backend
consume it anyway?).  But this has the right odor for an explanation.

I'm pretty hesitant to touch this area myself, because it looks an
awful lot like commits 6051857fc and ed52c3707, which eventually
had to be reverted.  I think we need a deeper understanding of
exactly what Winsock is doing or not doing before we try to fix it.
I wonder if there are any Microsoft employees around here with
access to the relevant source code.

In the short run, it might be a good idea to deprecate using
--with-gssapi on Windows builds.  A different stopgap idea
could be to drastically reduce the default AuthenticationTimeout,
perhaps only on Windows.

regards, tom lane

[1] 
https://learn.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-closesocket




Re: race condition in pg_class

2024-05-13 Thread Robert Haas
On Sun, May 12, 2024 at 7:29 PM Noah Misch  wrote:
> - [consequences limited to transient failure] Since a PROC_IN_VACUUM backend's
>   xmin does not stop pruning, an MVCC scan in that backend can find zero
>   tuples when one is live.  This is like what all backends got in the days of
>   SnapshotNow catalog scans.  See the pgbench test suite addition.  (Perhaps
>   the fix is to make VACUUM do its MVCC scans outside of PROC_IN_VACUUM,
>   setting that flag later and unsetting it earlier.)

Are you saying that this is a problem already, or that the patch
causes it to start happening? If it's the former, that's horrible. If
it's the latter, I'd say that is a fatal defect.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 10:53 AM Matthias van de Meent
 wrote:
> It's not inconceivable that this will significantly increase WAL
> volume, but I think we should go for correctness rather than fastest
> copy.

I don't think we can afford to just do this blindly for the sake of a
hypothetical non-core AM that uses nonstandard pages. There must be
lots of cases where the holes are large, and where the WAL volume
would be a multiple of what it is currently. That's a *big*
regression.

> If we went with fastest copy, we'd better just skip logging the
> FSM and VM forks because we're already ignoring the data of the pages,
> so why not ignore the pages themselves, too? I don't think that holds
> water when we want to be crash-proof in CREATE DATABASE, with a full
> data copy of the template database.

This seems like a red herring. Either assuming standard pages is a
good idea or it isn't, and either logging the FSM and VM forks is a
good idea or it isn't, but those are two separate questions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 1:42 PM Jelte Fennema-Nio  wrote:
> Like Jacob already said, I guess you meant me here. The main point I
> was trying to make is that sslmode=require is extremely insecure too,
> so if we're changing the default then I'd rather bite the bullet and
> actually make the default a secure one this time. No one's browser
> trusts self-signed certs by default, but currently lots of people
> trust self-signed certs when connecting to their production database
> without realizing.
>
> IMHO the only benefit that sslmode=require brings over sslmode=prefer
> is detecting incorrectly configured servers i.e. servers that are
> supposed to support ssl but somehow don't due to a misconfigured
> GUC/pg_hba. Such "incorrectly configured server" detection avoids
> sending data to such a server, which an eavesdropper on the network
> could see. Which is definitely a security benefit, but it's an
> extremely small one. In all other cases sslmode=prefer brings exactly
> the same protection as sslmode=require, because sslmode=prefer
> encrypts the connection unless postgres actively tells the client to
> downgrade to plaintext (which never happens when the server is
> configured correctly).

I think I agree with *nearly* every word of this. However:

(1) I don't want to hijack this thread about a v17 open item to talk
too much about a hypothetical v18 proposal.

(2) While in general you need more than just SSL to ensure security,
I'm not sure that there's only one way to do it, and I doubt that we
should try to pick a winner.

(3) I suspect that even going as far as sslmode=require by default is
going to be extremely painful for hackers, the project, and users.
Moving the goalposts further increases the likelihood of nothing
happening at all.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 12:45 PM Jacob Champion
 wrote:
> For the record, I didn't say that... You mean Jelte's quote up above?

Yeah, sorry, I got my J-named hackers confused. Apologies.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread Chapman Flack
On 05/13/24 09:35, aa wrote:
> If you call the action of "sifting" ordering, then yes. If you don't call
> it ordering, then no.


One thing seems intriguing about this idea: normally, an expected
property of any ORDER BY is that no result row can be passed down
the pipe until all input rows have been seen.

In the case of ORDER BY a boolean expression, or more generally ORDER BY
an expression with only a small number of possible values, a pigeonhole
sort could be used, and rows mapping to the ordered-first pigeonhole could
be passed down the pipe on sight. (Rows mapping to any later pigeonhole
still have to be held to the end, unless some further analysis can
identify when all rows for earlier pigeonholes must have been seen.)

I don't know whether any such ORDER BY strategy is already implemented,
or would be useful enough to be worth implementing, but it might be
handy in cases where a large number of rows are expected to map to
the first pigeonhole. Intermediate storage wouldn't be needed for those,
and some follow-on processing could go on concurrently.

The usage example offered here ("sift" nulls last, followed by
a LIMIT) does look a lot like a job for a WHERE clause though.

Regards,
-Chap




Re: cataloguing NOT NULL constraints

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 12:45 PM Alvaro Herrera  wrote:
> The point is that a column can be in a primary key and not have an
> explicit not-null constraint.  This is different from having a column be
> NOT NULL and having a primary key on top.  In both cases the attnotnull
> flag is set; the difference between these two scenarios is what happens
> if you drop the primary key.  If you do not have an explicit not-null
> constraint, then the attnotnull flag is lost as soon as you drop the
> primary key.  You don't have to do DROP NOT NULL for that to happen
>
> This means that if you have a column that's in the primary key but does
> not have an explicit not-null constraint, then we shouldn't make one up.
> (Which we would, if we were to keep an unadorned NOT NULL that we can't
> drop at the end of the dump.)

It seems to me that the practical thing to do about this problem is
just decide not to solve it. I mean, it's currently the case that if
you establish a PRIMARY KEY when you create a table, the columns of
that key are marked NOT NULL and remain NOT NULL even if the primary
key is later dropped. So, if that didn't change, we would be no less
compliant with the SQL standard (or your reading of it) than we are
now. And if you do really want to make that change, why not split it
out into its own patch, so that the patch that does $SUBJECT is
changing the minimal number of other things at the same time? That
way, reverting something might not involve reverting everything, plus
you could have a separate design discussion about what that fix ought
to look like, separate from the issues that are truly inherent to
cataloging NOT NULL constraints per se.

What I meant about changing the order of operations is that,
currently, the database knows that the column is NOT NULL before the
COPY happens, and I don't think we can change that. I think you agree
-- that's why you invented the throwaway constraints. As far as I can
see, the problems all have to do with getting the "throwaway" part to
happen correctly. It can't be a problem to just mark the relevant
columns NOT NULL in the relevant tables -- we already do that. But if
you want to discard some of those NOT NULL markings once the PRIMARY
KEY is added, you have to know which ones to discard. If we just
consider the most straightforward scenario where somebody does a full
dump-and-restore, getting that right may be annoying, but it seems
like it surely has to be possible. The dump will just have to
understand which child tables (or, more generally, descendent tables)
got a NOT NULL marking on a column because of the PK and which ones
had an explicit marking in the old database and do the right thing in
each case.

But what if somebody does a selective restore of one table from a
partitioning hierarchy? Currently, the columns that would have been
part of the primary key end up NOT NULL, but the primary key itself is
not restored because it can't be. What will happen in this new system?
If you don't apply any NOT NULL constraints to those columns, then a
user who restores one partition from an old dump and tries to reattach
it to the correct partitioned table has to recheck the NOT NULL
constraint, unlike now. If you apply a normal-looking garden-variety
NOT NULL constraint to that column, you've invented a constraint that
didn't exist in the source database. And if you apply a throwaway NOT
NULL constraint but the user never attaches that table anywhere, then
the throwaway constraint survives. None of those options sound very
good to me.

Another scenario: Say that you have a table with a PRIMARY KEY. For
some reason, you want to drop the primary key and then add it back.
Well, with this definitional change, as soon as you drop it, you
forget that the underlying columns don't contain any nulls, so when
you add it back, you have to check them again. I don't know who would
find that behavior an improvement over what we have today.
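
To spell that scenario out (a sketch only; the table is made up, t_pkey is
just the default constraint name, and the re-check described in the comments
is the behavior under the proposed definition, not what released versions do):

CREATE TABLE t (id int PRIMARY KEY, val text);
-- ... load data ...
ALTER TABLE t DROP CONSTRAINT t_pkey;  -- proposal: the not-null marking on id is forgotten here
ALTER TABLE t ADD PRIMARY KEY (id);    -- so this would have to re-verify that id contains no nulls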

So I don't really think it's a great idea to change this behavior, but
even if it is, is it such a good idea that we want to sink the whole
patch set repeatedly over it, as has already happened twice now? I
feel that if we did what Tom suggested a year ago in
https://www.postgresql.org/message-id/3801207.1681057...@sss.pgh.pa.us
-- "I'm inclined to think that this idea of suppressing the implied
NOT NULL from PRIMARY KEY is a nonstarter and we should just go ahead
and make such a constraint" -- there's a very good chance that a
revert would have been avoided here and it would still be just as
valid to think of revisiting this particular question in a future
release as it is now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: race condition in pg_class

2024-05-13 Thread Noah Misch
On Mon, May 13, 2024 at 04:59:59PM +0900, Michael Paquier wrote:
> About inplace050-tests-inj-v1.patch.
> 
> + /* Check if blocked_pid is in injection_wait(). */
> + proc = BackendPidGetProc(blocked_pid);
> + if (proc == NULL)
> + PG_RETURN_BOOL(false);  /* session gone: definitely unblocked */
> + wait_event =
> +     pgstat_get_wait_event(UINT32_ACCESS_ONCE(proc->wait_event_info));
> + if (wait_event && strncmp("INJECTION_POINT(",
> +                           wait_event,
> +                           strlen("INJECTION_POINT(")) == 0)
> + PG_RETURN_BOOL(true);
> 
> Hmm.  I am not sure that this is the right interface for the job
> because this is not only related to injection points but to the
> monitoring of one or more wait events when running a permutation
> step.

Could you say more about that?  Permutation steps don't monitor wait events
today.  This patch would be the first instance of that.

> Perhaps this is something that should be linked to the spec
> files with some property area listing the wait events we're expected
> to wait on instead when running a step that we know will wait?

The spec syntax doesn't distinguish contention types at all.  The isolation
tester's needs are limited to distinguishing:

  (a) process is waiting on another test session
  (b) process is waiting on automatic background activity (autovacuum, mainly)

Automatic background activity doesn't make a process enter or leave
injection_wait(), so all injection point wait events fall in (a).  (The tester
ignores (b), since those clear up without intervention.  Failing to ignore
them, as the tester did long ago, made output unstable.)




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread Tom Lane
aa  writes:
> If you call the action of "sifting" ordering, then yes. If you don't call
> it ordering, then no.
> In essence, it is the output of a filtering mechanism, done in a single result
> set pass. And this pass should be the same pass in charge of collecting the
> result set in the first place.

Sounds a lot like a WHERE clause to me.

regards, tom lane




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread aa
Hi,
If you call the action of "sifting" ordering, then yes. If you don't call
it ordering, then no.

In essence, it is the output of a filtering mechanism, done in a single result
set pass. And this pass should be the same pass in charge of collecting the
result set in the first place.

Thanks


On Mon, May 13, 2024 at 5:48 AM Wolfgang Wilhelm 
wrote:

> Hi,
>
> do I interpret your idea correctly: You want some sort of ordering without
> ordering?
>
> Kind regards
> WW
>
> Am Montag, 13. Mai 2024 um 10:40:38 MESZ hat aa 
> Folgendes geschrieben:
>
>
> Hello Everyone!
>
> Is there any chance to get some kind of a result set sifting mechanism in
> Postgres?
>
> What I am looking for is a way to get for example: "nulls last" in a
> result set, without having to call "order by" or having to use UNION ALL,
> and if possible to get this in a single result set pass.
>
> Something on this line: SELECT a, b, c FROM my_table WHERE a nulls last
> OFFSET 0 LIMIT 25
>
> I don't want to use order by or union all because these are time-consuming
> operations, especially on large data sets and when comparisons are done
> on dynamic values (e.g. geolocation distances between a mobile and a
> static location)
>
> What I would expect from such a feature would be speeds comparable to
> unsorted selects, while getting a very rudimentary ordering.
>
> A use case for such a mechanism would be the implementation of QUICK
> relevant search results for a search engine.
>
> I'm not familiar with how Postgres handles simple select queries internally,
> but the way I would envision a result set sifting logic would be to
> collect the result set in 2 separate lists, based on the sifting
> condition, and then concatenate these 2 lists and return the result when
> the pagination request conditions are met.
>
> Any idea if such a functionality is feasible ?
>
> Thank you.
>
>   PS: if ever implemented, the sifting mechanism could be extended to
> accommodate any type of threshold, not just null values.
>
>
>
>


I have an exporting need...

2024-05-13 Thread Juan Hernández
Hi team!

First, I want to thank you for your hands-on work on this. You are doing a
fantastic and blessed job. Blessings to you all!

I have a special need I want to bring to you. This is not a bug; it is a
need I have, and I am writing here so I can be redirected where needed.

I have to make a daily backup. The database is growing a lot each day, and
sometimes I have had the need to recover just one table. It would be easier
to move a 200M file with only the needed table instead of moving a 5G file
with all the tables I don't need; it is just a matter of speed.

I've created a script to export every table one by one, so in case I need
to import a table again, I don't have to use the very big export file, but
rather the "tablename.sql" file created for every table.

My hosting provider truncated my script because it is very large (more than
200 lines, each line exporting one table), so I think the way I do this is
hurting the server performance.

So here is my question.

Would you consider it useful to add a parameter (for example,
--separatetables) so that, when it is used, the export process creates a
separate tablename.sql file for each table in the database automatically?

Example...

PGHOST="/tmp" PGPASSWORD="mydbpass" pg_dump -U dbusername --separatetables
-Fp --inserts dbname > "/route/dbname.sql"

And if this database has tables table1...table10, then 10 files are
created...

dbname_table1.sql
dbname_table2.sql
dbname_table3.sql
...
dbname_table8.sql
dbname_table9.sql
dbname_table10.sql


In each file, all the main parameters would be generated again. For example,
the file dbname_table1.sql...

--
-- PostgreSQL database dump
--
-- Dumped from database version 10.21
-- Dumped by pg_dump version 15.6
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
...
...
SET default_tablespace = '';
--
-- Name: table1; Type: TABLE; Schema: public; Owner: dbusername
--
CREATE TABLE public.table1 (
code numeric(5,0),
name character varying(20)
)


I don't know if many developers have the same need as me. I hope this helps
in the future.

Thanks for reading, and thanks for what you've done. You are doing great!
Cheers!


__
Juan de Jesús


On the use of channel binding without server certificates (was: Direct SSL connection with ALPN and HBA rules)

2024-05-13 Thread Jacob Champion
[soapbox thread, so I've changed the Subject]

On Mon, May 13, 2024 at 4:08 AM Heikki Linnakangas  wrote:
> "channel_binding=require sslmode=require" also protects from MITM attacks.

This isn't true in the same way that "standard" TLS protects against
MITM. I know you know that, but for the benefit of bystanders reading
the threads, I think we should stop phrasing it like this. Most people
who want MITM protection need to be using verify-full.

Details for those bystanders: Channel binding alone will only
disconnect you after the MITM is discovered, after your startup packet
is leaked but before you send any queries to the server. A hash of
your password will also be leaked in that situation, which starts the
timer on an offline attack. And IIRC, you won't get an alert that says
"someone's in the middle"; it'll just look like you mistyped your
password.

(Stronger passwords provide stronger protection in this situation,
which is not a property that most people are used to. If I choose to
sign into Google with the password "hunter2", it doesn't somehow make
the TLS protection weaker. But if you rely on SCRAM by itself for
server authentication, it does more or less work like that.)

Use channel_binding *in addition to* sslmode=verify-full if you want
enhanced authentication of the peer, as suggested in the docs [1].
Don't rely on channel binding alone for the vast majority of use
cases, and if you know better for your particular use case, then you
already know enough to be able to ignore my advice.

[/soapbox]

Thanks,
--Jacob

[1] https://www.postgresql.org/docs/current/preventing-server-spoofing.html




Re: Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-05-13 Thread Daniel Gustafsson
> On 13 May 2024, at 20:05, Ranier Vilela  wrote:
> On Wed, Apr 10, 2024 at 15:33, Daniel Gustafsson wrote:

> Thanks, I'll have a look.  I've left this for post-freeze on purpose to not
> cause unnecessary rebasing.  Will take a look over the next few days unless
> beaten to it.

> Any chance we'll have these fixes in v17?

Nice timing, I was actually rebasing them today to get them committed.

--
Daniel Gustafsson





Re: Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-05-13 Thread Ranier Vilela
On Wed, Apr 10, 2024 at 15:33, Daniel Gustafsson wrote:

> On 10 Apr 2024, at 20:31, Ranier Vilela  wrote:
>
> On Tue, Apr 2, 2024 at 15:31, Daniel Gustafsson wrote:
>
>> > On 2 Apr 2024, at 20:13, Ranier Vilela  wrote:
>>
>> > Fix by freeing the pointer, like pclose_check (src/common/exec.c)
>> similar case.
>>
>> Off the cuff, seems reasonable when loglevel is LOG.
>>
>
> Per Coverity.
>
> Another case of resource leak, when loglevel is LOG.
> In the function shell_archive_file (src/backend/archive/shell_archive.c)
> The pointer *xlogarchcmd*  is not freed.
>
>
> Thanks, I'll have a look.  I've left this for post-freeze on purpose to not
> cause unnecessary rebasing.  Will take a look over the next few days unless
> beaten to it.
>
Any chance we'll have these fixes in v17?

best regards,
Ranier Vilela


Re: Fix out-of-bounds in the function GetCommandTagName

2024-05-13 Thread Ranier Vilela
On Mon, May 13, 2024 at 14:38, Tom Lane wrote:

> David Rowley  writes:
> > I've added a CF entry under your name for this:
> > https://commitfest.postgresql.org/48/4927/
>
> > If it was code new to PG17 I'd be inclined to go ahead with it now,
> > but it does not seem to align with making the release mode stable.
> > I'd bet others will feel differently about that.  Delaying seems a
> > better default choice at least.
>
> The security team's Coverity instance has started to show this
> complaint now too.  So I'm going to go ahead and push this change
> in HEAD.  It's probably unwise to change it in stable branches,
> since there's at least a small chance some external code is using
> COMMAND_TAG_NEXTTAG for the same purpose tag_behavior[] does.
> But we aren't anywhere near declaring v17's API stable, so
> I'd rather fix the issue than dismiss it in HEAD.
>
Thanks for the commit, Tom.

best regards,
Ranier Vilela


Re: UniqueKey v2

2024-05-13 Thread Antonin Houska
Antonin Houska  wrote:

> Andy Fan  wrote:
> > >
> > > * Combining the UKs
> > >
> > >   IMO this is the most problematic part of the patch. You call
> > >   populate_joinrel_uniquekeys() for the same join multiple times,
> > 
> > Why do you think so? The below code is called in "make_join_rel"
> 
> Consider join of tables "a", "b" and "c". My understanding is that
> make_join_rel() is called once with rel1={a} and rel2={b join c}, then with
> rel1={a join b} and rel2={c}, etc. I wanted to say that each call should
> produce the same set of unique keys.
> 
> I need to check this part more in detail.

I think I understand now. By calling populate_joinrel_uniquekeys() for various
orderings, you can find out that various input relation unique keys can
represent the whole join. For example, if the ordering is

A JOIN (B JOIN C)

you can prove that the unique keys of A can be used for the whole join, while
for the ordering

B JOIN (A JOIN C)

you can prove the same for the unique keys of B, and so on.

> > Is your original question is about populate_joinrel_uniquekey_for_rel
> > rather than populate_joinrel_uniquekeys? We have the below codes:
> > 
> > outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, 
> > outerrel,
> > 
> >  innerrel, restrictlist);
> > inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, 
> > innerrel,
> > 
> >  outerrel, restrictlist);
> > 
> > This is mainly because of the following theory. Quoted from
> > README.uniquekey. Let's called this as "rule 1".
> > 
> > """
> > How to maintain the uniquekey?
> > ---
> > .. From the join relation, it is maintained with two rules:
> > 
> > - the uniquekey in one side is still unique if it can't be duplicated
> >   after the join. for example:
> > 
> >   SELECT t1.pk FROM t1 JOIN t2 ON t1.a = t2.pk;
> >   UniqueKey on t1:  t1.pk
> >   UniqueKey on t1 Join t2:  t1.pk
> > """
> > 
> > AND the blow codes:
> > 
> > 
> > if (outeruk_still_valid || inneruk_still_valid)
> > 
> > /*
> >  * the uniquekey on outers or inners have been added into 
> > joinrel so
> >  * the combined uniuqekey from both sides is not needed.
> >  */
> > return;
> > 
> > 
> > We don't create the component uniquekey if either one of the two
> > sides is unique already. For example:
> > 
> > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is
> > unique", there is no need to create component UniqueKey (t1.id, t2.id);  
> 
> ok, I need to check more in detail how this part works.

This optimization makes sense to me.

> > >
> > >   Of course one problem is that the number of combinations can grow
> > >   exponentially as new relations are joined.
> > 
> > Yes, that's why "rule 1" needed and "How to reduce the overhead" in
> > UniqueKey.README is introduced. 

I think there should yet be some guarantee that the number of unique keys does
not grow exponentially. Perhaps a constant that allows a relation (base or
join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3
or 4.) And when picking the "best N keys", one criterion could be the number
of expressions in the key (the shorter key the better).

> > >
> > >   2) Check if the join relation is single-row
> > >
> > >   I in order to get rid of the dependency on 'restrictlist', I think you 
> > > can
> > >   use ECs. Consider a query from your regression tests:
> > >
> > > CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c 
> > > int, d int, e int);
> > >
> > > SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id 
> > > = 1;
> > >
> > >   The idea here seems to be that no more than one row of t1 matches the 
> > > query
> > >   clauses. Therefore, if t2(id) is unique, the clause t1.e=t2.id ensures 
> > > that
> > >   no more than one row of t2 matches the query (because t1 cannot provide 
> > > the
> > >   clause with more than one input value of 'e'). And therefore, the join 
> > > also
> > >   produces at most one row.
> > 
> > You are correct and IMO my current code are able to tell it is a single
> > row as well.
> > 
> > 1. Since t1.id = 1, t1 is a single row, so t1.d is a uniquekey as a
> > consequence.
> > 2. Given t2.id is unique, t1.e = t2.id so t1's uniquekey can be kept
> > after the join because of rule 1 on joinrel. and t1 is singlerow, so the
> > joinrel is singlerow as well.

ok, I think I understand now.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Mon, 13 May 2024 at 18:14, Robert Haas  wrote:
> I disagree with Jacob's assertion that sslmode=require has no security
> benefits over sslmode=prefer. That seems like the kind of pessimism
> that makes people hate security professionals. There have got to be
> some attacks that are foreclosed by encrypting the connection, even if
> you don't stop MITM attacks or other things that are more
> sophisticated than running wireshark and seeing what goes by on the
> wire.

Like Jacob already said, I guess you meant me here. The main point I
was trying to make is that sslmode=require is extremely insecure too,
so if we're changing the default then I'd rather bite the bullet and
actually make the default a secure one this time. No one's browser
trusts self-signed certs by default, but currently lots of people
trust self-signed certs when connecting to their production database
without realizing.

IMHO the only benefit that sslmode=require brings over sslmode=prefer
is detecting incorrectly configured servers i.e. servers that are
supposed to support ssl but somehow don't due to a misconfigured
GUC/pg_hba. Such "incorrectly configured server" detection avoids
sending data to such a server, which an eavesdropper on the network
could see. Which is definitely a security benefit, but it's an
extremely small one. In all other cases sslmode=prefer brings exactly
the same protection as sslmode=require, because sslmode=prefer
encrypts the connection unless postgres actively tells the client to
downgrade to plaintext (which never happens when the server is
configured correctly).




Re: Fix out-of-bounds in the function GetCommandTagName

2024-05-13 Thread Tom Lane
David Rowley  writes:
> I've added a CF entry under your name for this:
> https://commitfest.postgresql.org/48/4927/

> If it was code new to PG17 I'd be inclined to go ahead with it now,
> but it does not seem to align with making the release mode stable.
> I'd bet others will feel differently about that.  Delaying seems a
> better default choice at least.

The security team's Coverity instance has started to show this
complaint now too.  So I'm going to go ahead and push this change
in HEAD.  It's probably unwise to change it in stable branches,
since there's at least a small chance some external code is using
COMMAND_TAG_NEXTTAG for the same purpose tag_behavior[] does.
But we aren't anywhere near declaring v17's API stable, so
I'd rather fix the issue than dismiss it in HEAD.

regards, tom lane




Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Tom Lane
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?=  writes:
> Tom Lane  writes:
>> I'm fairly down on this idea for SQL, because I think it creates
>> ambiguity for the ROW() constructor syntax.  That is:
>>  (x,y) is understood to be shorthand for ROW(x,y)
>>  (x) is not ROW(x), it's just x
>>  (x,) means what?

> Python has a similar issue: (x, y) is a tuple, but (x) is just x, and
> they use the trailing comma to disambiguate, so (x,) creates a
> single-item tuple.  AFAIK it's the only place where the trailing comma
> is significant.

Ugh :-(.  The semantic principle I'd prefer to have here is "a trailing
comma is ignored", but what they did breaks that.  But then again,
I'm not particularly a fan of anything about Python's syntax.
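
For anyone following along, the current behavior described in the quoted
lines is easy to see in psql; this is just an illustration of today's rules,
nothing new:

SELECT (1, 2);  -- a composite value, i.e. ROW(1,2)
SELECT (1);     -- just the integer 1
SELECT (1,);    -- currently a syntax error; the question is what it should mean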

> Yeah, a more principled approach would be to not special-case target
> lists, but to allow one (and only one) trailing comma everywhere:
> select, order by, group by, array constructors, row constructors,
> everything that looks like a function call, etc.

If it can be made to work everywhere, that would get my vote.
I'm not sure if any other ambiguities arise, though.  SQL has
a lot of weird syntax corners (and the committee keeps adding
more :-().

regards, tom lane




Re: Improving information_schema._pg_expandarray()

2024-05-13 Thread Dagfinn Ilmari Mannsåker
[ I got distracted while writing this follow-up and only just found it
  in my list of unsent Gnus buffers, and now it's probably too late to
  make it for 17, but here it is anyway while I remember. ]

Tom Lane  writes:

> I happened to notice that information_schema._pg_expandarray(),
> which has the nigh-unreadable definition
>
> AS 'select $1[s],
> s operator(pg_catalog.-) pg_catalog.array_lower($1,1) 
> operator(pg_catalog.+) 1
> from pg_catalog.generate_series(pg_catalog.array_lower($1,1),
> pg_catalog.array_upper($1,1),
> 1) as g(s)';
>
> can now be implemented using unnest():
>
> AS 'SELECT * FROM pg_catalog.unnest($1) WITH ORDINALITY';
>
> It seems to be slightly more efficient this way, but the main point
> is to make it more readable.

I didn't spot this until it got committed, but it got me wondering what
eliminating the wrapper function completely would look like, so I
whipped up the attached.  It instead calls UNNEST() laterally in the
queries, which has the side benefit of getting rid of several
subselects, one of which was particularly confusing.  In one place the
lateral form eliminated the need for WITH ORDINALITY as well.


- ilmari

>From 9d1b2f2d16f10903d975a3bb7551a38c5ce62e15 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 28 Dec 2023 23:47:33 +
Subject: [PATCH] Eliminate information_schema._pg_expandarray completely

Commit 58054de2d0847c09ef091956f72ae5e9fb9a176e made it into a simple
wrapper around unnest, but we can simplify things further by calling
unnest directly in the queries.
---
 src/backend/catalog/information_schema.sql | 104 +
 src/backend/utils/adt/arrayfuncs.c |   3 -
 src/test/regress/expected/psql.out |  30 +++---
 src/test/regress/sql/psql.sql  |   2 +-
 4 files changed, 58 insertions(+), 81 deletions(-)

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index c4145131ce..cf25f5d1bc 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -39,22 +39,15 @@ SET search_path TO information_schema;
  * A few supporting functions first ...
  */
 
-/* Expand any 1-D array into a set with integers 1..N */
-CREATE FUNCTION _pg_expandarray(IN anyarray, OUT x anyelement, OUT n int)
-RETURNS SETOF RECORD
-LANGUAGE sql STRICT IMMUTABLE PARALLEL SAFE
-ROWS 100 SUPPORT pg_catalog.array_unnest_support
-AS 'SELECT * FROM pg_catalog.unnest($1) WITH ORDINALITY';
-
 /* Given an index's OID and an underlying-table column number, return the
  * column's position in the index (NULL if not there) */
 CREATE FUNCTION _pg_index_position(oid, smallint) RETURNS int
 LANGUAGE sql STRICT STABLE
 BEGIN ATOMIC
-SELECT (ss.a).n FROM
-  (SELECT information_schema._pg_expandarray(indkey) AS a
-   FROM pg_catalog.pg_index WHERE indexrelid = $1) ss
-  WHERE (ss.a).x = $2;
+SELECT ik.icol
+FROM pg_catalog.pg_index,
+ pg_catalog.unnest(indkey) WITH ORDINALITY ik(tcol, icol)
+WHERE indexrelid = $1 AND ik.tcol = $2;
 END;
 
 CREATE FUNCTION _pg_truetypid(pg_attribute, pg_type) RETURNS oid
@@ -1079,37 +1072,32 @@ GRANT SELECT ON enabled_roles TO PUBLIC;
 
 CREATE VIEW key_column_usage AS
 SELECT CAST(current_database() AS sql_identifier) AS constraint_catalog,
-   CAST(nc_nspname AS sql_identifier) AS constraint_schema,
+   CAST(nc.nspname AS sql_identifier) AS constraint_schema,
CAST(conname AS sql_identifier) AS constraint_name,
CAST(current_database() AS sql_identifier) AS table_catalog,
-   CAST(nr_nspname AS sql_identifier) AS table_schema,
+   CAST(nr.nspname AS sql_identifier) AS table_schema,
CAST(relname AS sql_identifier) AS table_name,
CAST(a.attname AS sql_identifier) AS column_name,
-   CAST((ss.x).n AS cardinal_number) AS ordinal_position,
+   CAST(ck.icol AS cardinal_number) AS ordinal_position,
CAST(CASE WHEN contype = 'f' THEN
-   _pg_index_position(ss.conindid, ss.confkey[(ss.x).n])
+   _pg_index_position(c.conindid, c.confkey[ck.icol])
  ELSE NULL
 END AS cardinal_number)
  AS position_in_unique_constraint
 FROM pg_attribute a,
- (SELECT r.oid AS roid, r.relname, r.relowner,
- nc.nspname AS nc_nspname, nr.nspname AS nr_nspname,
- c.oid AS coid, c.conname, c.contype, c.conindid,
- c.confkey, c.confrelid,
- _pg_expandarray(c.conkey) AS x
-  FROM pg_namespace nr, pg_class r, pg_namespace nc,
-   pg_constraint c
-  WHERE nr.oid = r.relnamespace
-AND r.oid = c.conrelid
-AND nc.oid = c.connamespace
-AND c.contype IN ('p', 'u', 'f')

Re: cataloguing NOT NULL constraints

2024-05-13 Thread Alvaro Herrera
On 2024-May-13, Robert Haas wrote:

> On Mon, May 13, 2024 at 9:44 AM Alvaro Herrera  
> wrote:
> > The problematic point is the need to add NOT NULL constraints during
> > table creation that don't exist in the table being dumped, for
> > performance of primary key creation -- I called this a throwaway
> > constraint.  We needed to be able to drop those constraints after the PK
> > was created.  These were marked NO INHERIT to allow them to be dropped,
> > which is easier if the children don't have them.  This all worked fine.
> 
> This seems really weird to me. Why is it necessary? I mean, in
> existing releases, if you declare a column as PRIMARY KEY, the columns
> included in the key are forced to be NOT NULL, and you can't change
> that for so long as they are included in the PRIMARY KEY.

The point is that a column can be in a primary key and not have an
explicit not-null constraint.  This is different from having a column be
NOT NULL and having a primary key on top.  In both cases the attnotnull
flag is set; the difference between these two scenarios is what happens
if you drop the primary key.  If you do not have an explicit not-null
constraint, then the attnotnull flag is lost as soon as you drop the
primary key.  You don't have to do DROP NOT NULL for that to happen.
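A minimal illustration of the behaviour I'm describing, using the default
constraint names:

CREATE TABLE t1 (a int PRIMARY KEY);            -- no explicit not-null
CREATE TABLE t2 (a int NOT NULL PRIMARY KEY);   -- explicit not-null plus PK

ALTER TABLE t1 DROP CONSTRAINT t1_pkey;         -- t1.a becomes nullable again
ALTER TABLE t2 DROP CONSTRAINT t2_pkey;         -- t2.a keeps its NOT NULL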

This means that if you have a column that's in the primary key but does
not have an explicit not-null constraint, then we shouldn't make one up.
(Which we would, if we were to keep an unadorned NOT NULL that we can't
drop at the end of the dump.)

> So I would have thought that after this patch, you'd end up with the
> same thing.

At least as I interpret the standard, you wouldn't.

> One way of doing that would be to make the PRIMARY KEY depend on the
> now-catalogued NOT NULL constraints, and the other way would be to
> keep it as an ad-hoc prohibition, same as now.

That would be against what [I think] the standard says.

> But I don't see why I need to end up with what the patch generates,
> which seems to be something like CONSTRAINT pgdump_throwaway_notnull_0
> NOT NULL NO INHERIT. That kind of thing suggests that we're changing
> around the order of operations in pg_dump, probably by adding the NOT
> NULL constraints at a later stage than currently, and I think the
> proper solution is most likely to be to avoid doing that in the first
> place.

The point of the throwaway constraints is that they don't remain after
the dump has restored completely.  They are there only so that we don't
have to scan the data looking for possible nulls when we create the
primary key.  We have a DROP CONSTRAINT for the throwaway not-nulls as
soon as the PK is created.
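Schematically, for a single-column PK the relevant part of the dump script
looks something like this (constraint names as pg_dump generates them):

CREATE TABLE foo (
    a integer CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT
);

COPY foo (a) FROM stdin;
\.

ALTER TABLE ONLY foo
    ADD CONSTRAINT foo_pkey PRIMARY KEY (a);

ALTER TABLE ONLY foo
    DROP CONSTRAINT pgdump_throwaway_notnull_0;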

We're not changing any order of operations as such.

> That's not to say that we shouldn't try to make improvements, just
> that it may be hard to get right.

Sure, that's why this patch has now been reverted twice :-) and has been
in the works for ... how many years now?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jacob Champion
(There's, uh, a lot to respond to above and I'm trying to figure out
how best to type up all of it.)

On Mon, May 13, 2024 at 9:13 AM Robert Haas  wrote:
> However,
> I disagree with Jacob's assertion that sslmode=require has no security
> benefits over sslmode=prefer.

For the record, I didn't say that... You mean Jelte's quote up above?:

> sslmode=prefer and sslmode=require
> are the same amount of insecure imho (i.e. extremely insecure).

I agree that requiring passive security is tangibly better than
allowing fallback to plaintext. I think Jelte's point might be better
stated as, =prefer and =require give the same amount of protection
against active attack (none).

--Jacob




Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Dagfinn Ilmari Mannsåker  writes:
>> Matthias van de Meent  writes:
>>> Single trailing commas are a feature that's more and more common in
>>> languages, yes, but arbitrary excess commas is new to me. Could you
>>> provide some examples of popular languages which have that, as I can't
>>> think of any.
>
>> The only one I can think of is Perl, which I'm not sure counts as
>> popular any more.  JavaScript allows consecutive commas in array
>> literals, but they're not no-ops, they create empty array slots:
>
> I'm fairly down on this idea for SQL, because I think it creates
> ambiguity for the ROW() constructor syntax.  That is:
>
>   (x,y) is understood to be shorthand for ROW(x,y)
>
>   (x) is not ROW(x), it's just x
>
>   (x,) means what?

Python has a similar issue: (x, y) is a tuple, but (x) is just x, and
they use the trailing comma to disambiguate, so (x,) creates a
single-item tuple.  AFAIK it's the only place where the trailing comma
is significant.

> I realize the original proposal intended to restrict the legality of
> excess commas to only a couple of places, but to me that just flags
> it as a kluge.  ROW(...) ought to work pretty much the same as a
> SELECT list.

Yeah, a more principled approach would be to not special-case target
lists, but to allow one (and only one) trailing comma everywhere:
select, order by, group by, array constructors, row constructors,
everything that looks like a function call, etc.
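For example, things like

SELECT a, b, FROM t ORDER BY a, b,;
SELECT ARRAY[1, 2, 3,];
SELECT coalesce(a, b,) FROM t;

would all become legal (each of these is a syntax error today).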

> As already mentioned, if you can get some variant of this through the
> SQL standards process, we'll probably adopt it.  But I doubt that we
> want to get out front of the committee in this area.

Agreed.

>   regards, tom lane

- ilmari




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 9:37 AM Heikki Linnakangas  wrote:
> On 10/05/2024 16:50, Heikki Linnakangas wrote:
> > New proposal:
> >
> > - Remove the "try both" mode completely, and rename "requiredirect" to
> > just "direct". So there would be just two modes: "postgres" and
> > "direct". On reflection, the automatic fallback mode doesn't seem very
> > useful. It would make sense as the default, because then you would get
> > the benefits automatically in most cases but still be compatible with
> > old servers. But if it's not the default, you have to fiddle with libpq
> > settings anyway to enable it, and then you might as well use the
> > "requiredirect" mode when you know the server supports it. There isn't
> > anything wrong with it as such, but given how much confusion there's
> > been on how this all works, I'd prefer to cut this back to the bare
> > minimum now. We can add it back in the future, and perhaps make it the
> > default at the same time. This addresses points 2. and 3. above.
> >
> > and:
> >
> > - Only allow sslnegotiation=direct with sslmode=require or higher. This
> > is what you, Jacob, wanted to do all along, and addresses point 1.
>
> Here's a patch to implement that.

I find this idea to be a massive improvement over the status quo, and
I didn't spot any major problems when I read through the patch,
either. I'm not quite sure if the patch takes the right approach in
emphasizing that weaker sslmode settings are not allowed because of
unintended fallbacks. It seems to me that we could equally well say
that those combinations are nonsensical. If we're making a direct SSL
connection, SSL is eo ipso required.

I don't have a strong opinion about whether sslnegotiation=direct
should error out (as you propose here) or silently promote sslmode to
require. I think either is defensible. Had I been implementing it, I
think I would have done as Jacob proposes, just because once we've
forced a direct SSL negotiation surely the only sensible behavior is
to be using SSL, unless you think there should be a
silently-reconnect-without-SSL behavior, which I sure don't. However,
I disagree with Jacob's assertion that sslmode=require has no security
benefits over sslmode=prefer. That seems like the kind of pessimism
that makes people hate security professionals. There have got to be
some attacks that are foreclosed by encrypting the connection, even if
you don't stop MITM attacks or other things that are more
sophisticated than running wireshark and seeing what goes by on the
wire.

I'm pleased to hear that you will propose to make sslmode=require the
default in v18. I think we'll need to do some work to figure out how
much collateral damage that will cause, and maybe it will be more than
we can stomach, but Magnus has been saying for years that the current
default is terrible. I'm not sure I was entirely convinced of that the
first time I heard him say it, but I'm smarter now than I was then.
It's really hard to believe in 2024 that anyone should ever be using a
setting that may or may not encrypt the connection. There's http and
https but there's no httpmaybes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: An improved README experience for PostgreSQL

2024-05-13 Thread Nathan Bossart
On Mon, May 13, 2024 at 05:43:45PM +0200, Alvaro Herrera wrote:
> Can't we add these two lines per topic to the README.md?

That would be fine with me, too.  The multiple-files approach is perhaps a
bit more tailored to GitHub, but there's something to be said for keeping
this information centralized.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: An improved README experience for PostgreSQL

2024-05-13 Thread Alvaro Herrera
On 2024-May-13, Nathan Bossart wrote:

> > If we want to enhance the GitHub experience, we can also add these files to
> > the organization instead: 
> > https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file
> 
> This was the intent of my patch.  There might be a few others that we could
> use, but I figured we could start with the low-hanging fruit that would
> have the most impact on the GitHub experience.

Can't we add these two lines per topic to the README.md?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"




Re: An improved README experience for PostgreSQL

2024-05-13 Thread Nathan Bossart
On Sun, May 12, 2024 at 05:17:42PM +0200, Peter Eisentraut wrote:
> I don't know, I find these files kind of "yelling".  It's fine to have a
> couple, but now it's getting a bit much, and there are more that could be
> added.

I'm not sure what you mean by this.  Do you mean that the contents are too
blunt?  That there are too many files?  Something else?

> If we want to enhance the GitHub experience, we can also add these files to
> the organization instead: 
> https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file

This was the intent of my patch.  There might be a few others that we could
use, but I figured we could start with the low-hanging fruit that would
have the most impact on the GitHub experience.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: cataloguing NOT NULL constraints

2024-05-13 Thread Robert Haas
On Mon, May 13, 2024 at 9:44 AM Alvaro Herrera  wrote:
> The problematic point is the need to add NOT NULL constraints during
> table creation that don't exist in the table being dumped, for
> performance of primary key creation -- I called this a throwaway
> constraint.  We needed to be able to drop those constraints after the PK
> was created.  These were marked NO INHERIT to allow them to be dropped,
> which is easier if the children don't have them.  This all worked fine.

This seems really weird to me. Why is it necessary? I mean, in
existing releases, if you declare a column as PRIMARY KEY, the columns
included in the key are forced to be NOT NULL, and you can't change
that for so long as they are included in the PRIMARY KEY. So I would
have thought that after this patch, you'd end up with the same thing.
One way of doing that would be to make the PRIMARY KEY depend on the
now-catalogued NOT NULL constraints, and the other way would be to
keep it as an ad-hoc prohibition, same as now. In PostgreSQL 16, I get
a dump like this:

CREATE TABLE public.foo (
a integer NOT NULL,
b text
);

COPY public.foo (a, b) FROM stdin;
\.

ALTER TABLE ONLY public.foo
ADD CONSTRAINT foo_pkey PRIMARY KEY (a);

If I'm dumping from an existing release, I don't see why any of that
needs to change. The NOT NULL decoration should lead to a
system-generated constraint name. If I'm dumping from a new release,
the NOT NULL decoration needs to be replaced with CONSTRAINT
existing_constraint_name NOT NULL. But I don't see why I need to end
up with what the patch generates, which seems to be something like
CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT. That kind
of thing suggests that we're changing around the order of operations
in pg_dump, probably by adding the NOT NULL constraints at a later
stage than currently, and I think the proper solution is most likely
to be to avoid doing that in the first place.

> However, at some point we realized that we needed to add NOT NULL
> constraints in child tables for the columns in which the parent had a
> primary key.  Then things become messy because we had the throwaway
> constraints on one hand and the not-nulls that descend from the PK on
> the other hand, where one was NO INHERIT and the other wasn't; worse if
> the child also has a primary key.

This seems like another problem that is created by changing the order
of operations in pg_dump.

> > The other possibility that occurs to me is that I think the motivation
> > for cataloging NOT NULL constraints was that we wanted to be able to
> > track dependencies on them, or something like that, which seems like
> > it might be able to create issues of the type that you're facing, but
> > the details aren't clear to me.
>
> NOT VALID constraints would be extremely useful, for one thing (because
> then you don't need to exclusively-lock the table during a long scan in
> order to add a constraint), and it's just one step away from having
> these constraints be catalogued.  It was also fixing some inconsistent
> handling of inheritance cases.

I agree that NOT VALID constraints would be very useful. I'm a little
scared by the idea of fixing inconsistent handling of inheritance
cases, just for fear that there may be more things relying on the
inconsistent behavior than we realize. I feel like this is an area
where it's easy for changes to be scarier than they at first seem. I
still have memories of discovering some of the current behavior back
in the mid-2000s when I was learning PostgreSQL (and databases
generally). It struck me as fiddly back then, and it still does. I
feel like there are probably some behaviors that look like arbitrary
decisions but are actually very important for some undocumented
reason. That's not to say that we shouldn't try to make improvements,
just that it may be hard to get right.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New GUC autovacuum_max_threshold ?

2024-05-13 Thread Frédéric Yhuel




On 09/05/2024 at 16:58, Robert Haas wrote:

As I see it, a lot of the lack of agreement up until now is people
just not understanding the math. Since I think I've got the right idea
about the math, I attribute this to other people being confused about
what is going to happen and would tend to phrase it as: some people
don't understand how catastrophically bad it will be if you set this
value too low.


FWIW, I do agree with your math. I found your demonstration convincing. 
50 was selected with a wet finger, i.e. more or less arbitrarily.


Using the formula I suggested earlier:

vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples, 
vac_base_thresh + vac_scale_factor * sqrt(reltuples) * 1000);


your table of 2.56 billion tuples will be vacuumed if there are
more than 10 million dead tuples (every 28 minutes).
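To spell out the arithmetic, assuming the default autovacuum_vacuum_threshold
of 50 and autovacuum_vacuum_scale_factor of 0.2:

SELECT least(50 + 0.2 * 2560000000,
             50 + 0.2 * sqrt(2560000000) * 1000) AS vacthresh;
-- roughly 10.1 million, versus 512 million with the current formula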

If we want to stick with the simple formula, we should probably choose a 
very high default, maybe 100 million, as you suggested earlier.


However, it would be nice to have the visibility map updated more 
frequently than every 100 million dead tuples. I wonder if this could be 
decoupled from the vacuum process?





Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Heikki Linnakangas

On 13/05/2024 16:55, Jelte Fennema-Nio wrote:

On Mon, 13 May 2024 at 15:38, Heikki Linnakangas  wrote:

Here's a patch to implement that.


+   if (conn->sslnegotiation[0] == 'd' &&
+   conn->sslmode[0] != 'r' && conn->sslmode[0] != 'v')

I think these checks should use strcmp instead of checking magic first
characters. I see this same clever trick is used in the recently added
init_allowed_encryption_methods, and I think that should be changed to
use strcmp too for readability.


Oh yeah, I hate that too. These should be refactored into enums, with a 
clear separate stage of parsing the options from strings. But we use 
that pattern all over the place, so I didn't want to start reforming it 
with this patch.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

2024-05-13 Thread Matthias van de Meent
On Mon, 13 May 2024 at 16:13, Tom Lane  wrote:
>
> Matthias van de Meent  writes:
> > PFA a patch that fixes this issue, by assuming that all pages in the
> > source database utilize a non-standard page layout.
>
> Surely that cure is worse than the disease?

I don't know where we would get the information whether the selected
relation fork's pages are standard-compliant. We could base it off of
the fork number (that info is available locally) but that doesn't
guarantee much.
For VM and FSM-pages we know they're essentially never
standard-compliant (hence this thread), but for the main fork it is
anyone's guess once the user has installed an additional AM - which we
don't detect nor pass through to the offending
RelationCopyStorageUsingBuffer.

As for "worse", the default template database is still much smaller
than the working set of most databases. This will indeed regress the
workload a bit, but only by the fraction of holes in the page + all
FSM/VM data.
I think the additional WAL volume during CREATE DATABASE is worth it
when the alternative is losing that data with physical
replication/secondary instances. Note that this does not disable page
compression, it just stops the logging of holes in pages; holes which
generally are only a fraction of the whole database.

It's not inconceivable that this will significantly increase WAL
volume, but I think we should go for correctness rather than fastest
copy. If we went with fastest copy, we'd better just skip logging the
FSM and VM forks because we're already ignoring the data of the pages,
so why not ignore the pages themselves, too? I don't think that holds
water when we want to be crash-proof in CREATE DATABASE, with a full
data copy of the template database.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread Isaac Morland
On Mon, 13 May 2024 at 04:40, aa  wrote:

> Hello Everyone!
>
> Is there any chance to get some kind of a result set sifting mechanism in
> Postgres?
>
> What I am looking for is a way to get for example: "nulls last" in a
> result set, without having to call "order by" or having to use UNION ALL,
> and if possible to get this in a single result set pass.
>
> Something on this line: SELECT a, b, c FROM my_table WHERE a nulls last
> OFFSET 0 LIMIT 25
>
> I don't want to use order by or union all because these are time consuming
> operations, especially on large data sets and when comparisons are done
> on dynamic values (eg: geolocation distances between a mobile and a
> static location)
>

This already exists: ORDER BY a IS NULL

I've found it to be more useful than one might initially expect to order by
a boolean expression.
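For instance, adapting the query upthread:

SELECT a, b, c
FROM my_table
ORDER BY a IS NULL   -- false sorts before true, so NULLs come last
OFFSET 0 LIMIT 25;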


Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Mon, 13 May 2024 at 13:07, Heikki Linnakangas  wrote:
> "channel_binding=require sslmode=require" also protects from MITM attacks.

Cool, I didn't realize we had this connection option and it could be
used like this. But I think there are a few security downsides of
channel_binding=require over sslmode=verify-full: if the client relies
on channel_binding to validate server authenticity, a leaked
server-side SCRAM hash is enough for an attacker to impersonate a
server, whereas normally a leaked SCRAM hash isn't really much of a
security concern (assuming long enough passwords). I also don't know
of many people rotating their SCRAM hashes, even though many rotate
TLS certs.

> I think these options should be designed from the user's point of view,
> so that the user can specify the risks they're willing to accept, and
> the details of how that's accomplished are handled by libpq. For
> example, I'm OK with (tick all that apply):
>
> [ ] passive eavesdroppers seeing all the traffic
> [ ] MITM being able to hijack the session
> [ ] connecting without verifying the server's identity
> [ ] divulging the plaintext password to the server
> [ ] ...

I think that sounds like a great idea, looking forward to the proposal.




Re: Why is citext/regress failing on hamerkop?

2024-05-13 Thread Andrew Dunstan



On 2024-05-12 Su 18:05, Thomas Munro wrote:

On Mon, May 13, 2024 at 12:26 AM Andrew Dunstan  wrote:

Well, this is more or less where I came in back in about 2002 :-) I've been 
trying to help support it ever since, motivated more by stubborn persistence 
than anything else. Still, I agree that the lack of support for the Windows 
port from Microsoft over the years has been more than disappointing.

I think "state of the Windows port" would make a good discussion topic
at pgconf.dev (with write-up for those who can't be there).  If there
is interest, I could organise that with a short presentation of the
issues I am aware of so far and possible improvements and
smaller-things-we-could-drop-instead-of-dropping-the-whole-port.  I
would focus on technical stuff, not who-should-be-doing-what, 'cause I
can't make anyone do anything.



+1


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

2024-05-13 Thread Tom Lane
Matthias van de Meent  writes:
> PFA a patch that fixes this issue, by assuming that all pages in the
> source database utilize a non-standard page layout.

Surely that cure is worse than the disease?

regards, tom lane




Re: Show WAL write and fsync stats in pg_stat_io

2024-05-13 Thread Bharath Rupireddy
On Fri, Apr 19, 2024 at 1:32 PM Nazir Bilal Yavuz  wrote:
>
> > I wanted to inform you that the 73f0a13266 commit changed all WALRead
> > calls to read variable bytes, only the WAL receiver was reading
> > variable bytes before.
>
> I want to start working on this again if possible. I will try to
> summarize the current status:

Thanks for working on this.

> * With the 73f0a13266 commit, the WALRead() function started to read
> variable bytes in every case. Before, only the WAL receiver was
> reading variable bytes.
>
> * With the 91f2cae7a4 commit, WALReadFromBuffers() is merged. We were
> discussing what we have to do when this is merged. It is decided that
> WALReadFromBuffers() does not call pgstat_report_wait_start() because
> this function does not perform any IO [1]. We may follow the same
> logic by not including these to pg_stat_io?

Right. WALReadFromBuffers doesn't do any I/O.

Whoever reads WAL from disk (backends, walsenders, recovery process)
using pg_pread (XLogPageRead, WALRead) needs to be tracked in
pg_stat_io or some other view. If it were to be in pg_stat_io, we may
not be able to distinguish WAL read stats at a backend level (like how
many times/bytes a walsender, recovery process, or backend read WAL
from disk), but it can help understand the overall impact of WAL read
I/O at a cluster level. With this approach, the WAL I/O
stats are divided up - WAL read I/O and write I/O stats are in
pg_stat_io and pg_stat_wal respectively.

This makes me wonder if we need to also add WAL read I/O stats to
pg_stat_wal. Then, we can also add WALReadFromBuffers stats
hits/misses there. With this approach, pg_stat_wal can be a one-stop
view for all the WAL related stats. If needed, we can join info from
pg_stat_wal to pg_stat_io in system_views.sql so that the I/O stats
are emitted to the end-user via pg_stat_io.

> * With the b5a9b18cd0 commit, streaming I/O is merged but AFAIK this
> does not block anything related to putting WAL stats in pg_stat_io.
>
> If I am not missing any new changes, the only problem is reading
> variable bytes now. We have discussed a couple of solutions:
>
> 1- Change op_bytes to something like -1, 0, 1, NULL etc. and document
> that this means some variable byte I/O is happening.
>
> I kind of dislike this solution because if the *only* read I/O is
> happening in variable bytes, it will look like write and extend I/Os
> are happening in variable bytes as well. As a solution, it could be
> documented that only read I/Os could happen in variable bytes for now.

Yes, read I/O for relations and WAL can happen in variable bytes. I
think this idea is reasonable and simple, yet still useful for knowing
the cluster-wide read I/O.

However, another point here is how the total number of bytes read is
represented with the existing pg_stat_io columns 'reads' and 'op_bytes'.
It is known today as 'reads' * 'op_bytes', but with variable-sized reads,
how are the read bytes calculated? Maybe add new columns
read_bytes/write_bytes?
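(Today that would be a query along these lines:

SELECT backend_type, object, context, reads * op_bytes AS read_bytes
FROM pg_stat_io
WHERE reads IS NOT NULL;

which stops making sense once a single row can mix fixed-size and
variable-size reads.)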

> 2- Use op_bytes_[read | write | extend] columns instead of one
> op_bytes column, also use the first solution.
>
> This can solve the first solution's weakness but it introduces two
> more columns. This is more future proof compared to the first solution
> if there is a chance that some variable I/O could happen in write and
> extend calls as well in the future.

-1 as more columns impact the readability and usability.

> 3- Create a new pg_stat_io_wal view to put WAL I/Os here instead of 
> pg_stat_io.
>
> pg_stat_io could remain untouchable and we will have flexibility to
> edit this new view as much as we want. But the original aim of the
> pg_stat_io is evaluating all I/O from a single view and adding a new
> view breaks this aim.

-1 as it defeats the very purpose of one-stop view pg_stat_io for all
kinds of I/O. PS: see my response above about adding both WAL write
I/O and read I/O stats to pg_stat_wal.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Matthias van de Meent  writes:
>> Single trailing commas are a feature that's more and more common in
>> languages, yes, but arbitrary excess commas is new to me. Could you
>> provide some examples of popular languages which have that, as I can't
>> think of any.

> The only one I can think of is Perl, which I'm not sure counts as
> popular any more.  JavaScript allows consecutive commas in array
> literals, but they're not no-ops, they create empty array slots:

I'm fairly down on this idea for SQL, because I think it creates
ambiguity for the ROW() constructor syntax.  That is:

(x,y) is understood to be shorthand for ROW(x,y)

(x) is not ROW(x), it's just x

(x,) means what?
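(For reference, the first two points are easy to verify:

SELECT (1,2) = ROW(1,2);   -- true
SELECT pg_typeof((42));    -- integer, not a composite type
)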

I realize the original proposal intended to restrict the legality of
excess commas to only a couple of places, but to me that just flags
it as a kluge.  ROW(...) ought to work pretty much the same as a
SELECT list.

As already mentioned, if you can get some variant of this through the
SQL standards process, we'll probably adopt it.  But I doubt that we
want to get out front of the committee in this area.

regards, tom lane




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-05-13 Thread Melanie Plageman
On Sat, May 11, 2024 at 3:18 PM Tomas Vondra
 wrote:
>
> On 5/10/24 21:48, Melanie Plageman wrote:
> > Attached is v3. I didn't use your exact language because the test
> > wouldn't actually verify that we properly discard the tuples. Whether
> > or not the empty tuples are all emitted, it just resets the counter to
> > 0. I decided to go with "exercise" instead.
> >
>
> I did go over the v3 patch, did a bunch of tests, and I think it's fine
> and ready to go. The one thing that might need some minor tweaks is the
> commit message.
>
> 1) Isn't the subject "Remove incorrect assert" a bit misleading, as the
> patch does not simply remove an assert, but replaces it with a reset of
> the field the assert used to check? (The commit message does not mention
> this either, at least not explicitly.)

I've updated the commit message.

> 2) The "heap AM-specific bitmap heap scan code" sounds a bit strange to
> me, isn't the first "heap" unnecessary?

bitmap heap scan has been used to refer to bitmap table scans, as the
name wasn't changed from heap when the table AM API was added (e.g.
BitmapHeapNext() is used by all table AMs doing bitmap table scans).
04e72ed617be specifically pushed the skip fetch optimization into heap
implementations of bitmap table scan callbacks, so it was important to
make this distinction. I've changed the commit message to say heap AM
implementations of bitmap table scan callbacks.

While looking at the patch again, I wondered if I should set
enable_material=false in the test. It doesn't matter from the
perspective of exercising the correct code; however, I wasn't sure if
disabling materialization would make the test more resilient against
future planner changes which could cause it to incorrectly fail.

- Melanie
From d3628d8d36b2be54b64c18402bdda80e1c8a436f Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Fri, 10 May 2024 14:52:34 -0400
Subject: [PATCH v4] BitmapHeapScan: Replace incorrect assert with
 reinitialization

04e72ed617be pushed the skip fetch optimization (allowing bitmap heap
scans to operate like index-only scans if none of the underlying data is
needed) into heap AM implementations of bitmap table scan callbacks.

04e72ed617be added an assert that all tuples in blocks eligible for the
optimization had been NULL-filled and emitted by the end of the scan.
This assert is incorrect when not all tuples need be scanned to execute
the query; for example: a join in which not all inner tuples need to be
scanned before skipping to the next outer tuple.

Remove the assert and reset the field on which it previously asserted to
avoid incorrectly emitting NULL-filled tuples from a previous scan on
rescan.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra
Reported-by: Melanie Plageman
Reproduced-by: Tomas Vondra, Richard Guo
Discussion: https://postgr.es/m/CAMbWs48orzZVXa7-vP9Nt7vQWLTE04Qy4PePaLQYsVNQgo6qRg%40mail.gmail.com
---
 src/backend/access/heap/heapam.c   |  4 ++--
 src/test/regress/expected/join.out | 38 ++
 src/test/regress/sql/join.sql  | 26 
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4be0dee4de..8600c22515 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1184,7 +1184,7 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_vmbuffer = InvalidBuffer;
 	}
 
-	Assert(scan->rs_empty_tuples_pending == 0);
+	scan->rs_empty_tuples_pending = 0;
 
 	/*
 	 * The read stream is reset on rescan. This must be done before
@@ -1216,7 +1216,7 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_vmbuffer))
 		ReleaseBuffer(scan->rs_vmbuffer);
 
-	Assert(scan->rs_empty_tuples_pending == 0);
+	scan->rs_empty_tuples_pending = 0;
 
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 0246d56aea..8829bd76e7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -7924,3 +7924,41 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+-- Exercise the "skip fetch" Bitmap Heap Scan optimization when candidate
+-- tuples are discarded. This may occur when:
+--   1. A join doesn't require all inner tuples to be scanned for each outer
+--  tuple, and
+--   2. The inner side is scanned using a bitmap heap scan, and
+--   3. The bitmap heap scan is eligible for the "skip fetch" optimization.
+--  This optimization is usable when no data from the underlying table is
+--  needed. Use a temp table so it is only visible to this backend and
+--  vacuum may reliably mark all blocks in the table all visible in the
+--  visibility map.
+CREATE TEMP TABLE skip_fetch (a INT, b INT) WITH (fillfactor=10);
+INSERT INTO skip_fetch SELECT i % 3, i FROM generate_series(0,30) i;
+CREATE INDEX ON skip_fetch(a);
+VACUUM 

Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Mon, 13 May 2024 at 15:38, Heikki Linnakangas  wrote:
> Here's a patch to implement that.

+   if (conn->sslnegotiation[0] == 'd' &&
+   conn->sslmode[0] != 'r' && conn->sslmode[0] != 'v')

I think these checks should use strcmp instead of checking magic first
characters. I see this same clever trick is used in the recently added
init_allowed_encryption_methods, and I think that should be changed to
use strcmp too for readability.




Re: cataloguing NOT NULL constraints

2024-05-13 Thread Alvaro Herrera
On 2024-May-13, Robert Haas wrote:

> On Sat, May 11, 2024 at 5:40 AM Alvaro Herrera  
> wrote:

> > Specifically, the problem is that I mentioned that we could restrict the
> > NOT NULL NO INHERIT addition in pg_dump for primary keys to occur only
> > in pg_upgrade; but it turns out this is not correct.  In normal
> > dump/restore, there's an additional table scan to check for nulls when
> > the constraint is not there, so the PK creation would become measurably
> > slower.  (In a table with a million single-int rows, PK creation goes
> > from 2000ms to 2300ms due to the second scan to check for nulls).
> 
> I have a feeling that any theory of the form "X only needs to happen
> during pg_upgrade" is likely to be wrong. pg_upgrade isn't really
> doing anything especially unusual: just creating some objects and
> loading data. Those things can also be done at other times, so
> whatever is needed during pg_upgrade is also likely to be needed at
> other times. Maybe that's not sound reasoning for some reason or
> other, but that's my intuition.

True.  It may be that by setting up the upgrade SQL script differently,
we don't need to make the distinction at all.  I hope to be able to do
that.

> I'm sorry that I haven't been following this thread closely, but I'm
> confused about how we ended up here. What exactly are the user-visible
> behavior changes wrought by this patch, and how do they give rise to
> these issues?

The problematic point is the need to add NOT NULL constraints during
table creation that don't exist in the table being dumped, for
performance of primary key creation -- I called this a throwaway
constraint.  We needed to be able to drop those constraints after the PK
was created.  These were marked NO INHERIT to allow them to be dropped,
which is easier if the children don't have them.  This all worked fine.

However, at some point we realized that we needed to add NOT NULL
constraints in child tables for the columns in which the parent had a
primary key.  Then things become messy because we had the throwaway
constraints on one hand and the not-nulls that descend from the PK on
the other hand, where one was NO INHERIT and the other wasn't; worse if
the child also has a primary key.

It turned out that we didn't have any mechanism to transform a NO
INHERIT constraint into a regular one that would be inherited.  I added
one, didn't like the way it worked, tried to restrict it but that caused
other problems; this is the mess that led to the revert (pg_dump in
normal mode would emit scripts that fail for some legitimate cases).

One possible way forward might be to make pg_dump smarter by adding one
more query to know the relationship between constraints that must be
dropped and those that don't.  Another might be to allow multiple
not-null constraints on the same column (one inherits, the other
doesn't, and you can drop them independently).  There may be others.

> The other possibility that occurs to me is that I think the motivation
> for cataloging NOT NULL constraints was that we wanted to be able to
> track dependencies on them, or something like that, which seems like
> it might be able to create issues of the type that you're facing, but
> the details aren't clear to me.

NOT VALID constraints would be extremely useful, for one thing (because
then you don't need to exclusively-lock the table during a long scan in
order to add a constraint), and it's just one step away from having
these constraints be catalogued.  It was also fixing some inconsistent
handling of inheritance cases.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Heikki Linnakangas

On 10/05/2024 16:50, Heikki Linnakangas wrote:

New proposal:

- Remove the "try both" mode completely, and rename "requiredirect" to
just "direct". So there would be just two modes: "postgres" and
"direct". On reflection, the automatic fallback mode doesn't seem very
useful. It would make sense as the default, because then you would get
the benefits automatically in most cases but still be compatible with
old servers. But if it's not the default, you have to fiddle with libpq
settings anyway to enable it, and then you might as well use the
"requiredirect" mode when you know the server supports it. There isn't
anything wrong with it as such, but given how much confusion there's
been on how this all works, I'd prefer to cut this back to the bare
minimum now. We can add it back in the future, and perhaps make it the
default at the same time. This addresses points 2. and 3. above.

and:

- Only allow sslnegotiation=direct with sslmode=require or higher. This
is what you, Jacob, wanted to do all along, and addresses point 1.


Here's a patch to implement that.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 2a7bde8502966a634b62d4109f0cde5fdc685da2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Mon, 13 May 2024 16:36:54 +0300
Subject: [PATCH 1/1] Remove option to fall back from direct to postgres SSL
 negotiation

There were three problems with the sslnegotiation options:

1. The sslmode=prefer and sslnegotiation=requiredirect combination was
somewhat dangerous, as you might unintentionally fall back to
plaintext authentication when connecting to a pre-v17 server.

2. There was an asymmetry between 'postgres' and 'direct'
options. 'postgres' meant "try only traditional negotiation", while
'direct' meant "try direct first, and fall back to traditional
negotiation if it fails". That was apparent only if you knew that the
'requiredirect' mode also exists.

3. The "require" word in 'requiredirect' suggests that it's somehow
more strict or more secure, similar to sslmode. However, I don't
consider direct SSL connections to be a security feature.

To address these problems:

- Only allow sslnegotiation='direct' if sslmode='require' or
stronger. And for the record, Jacob and Robert felt that we should do
that (or have sslnegotiation='direct' imply sslmode='require') anyway,
regardless of the first issue.

- Remove the 'direct' mode that falls back to traditional negotiation,
and rename what was called 'requiredirect' to 'direct' instead. In
other words, there is no "try both methods" option anymore, 'postgres'
now means the traditional negotiation and 'direct' means a direct SSL
connection.

Discussion: https://www.postgresql.org/message-id/d3b1608a-a1b6-4eda-9ec5-ddb3e4375808%40iki.fi
---
 doc/src/sgml/libpq.sgml   |  49 ++--
 src/interfaces/libpq/fe-connect.c |  52 ++--
 src/interfaces/libpq/libpq-int.h  |   3 +-
 .../libpq/t/005_negotiate_encryption.pl   | 254 --
 4 files changed, 150 insertions(+), 208 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 1d32c226d8..b32e497b1b 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1772,15 +1772,18 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   sslnegotiation
   

-This option controls whether PostgreSQL
-will perform its protocol negotiation to request encryption from the
-server or will just directly make a standard SSL
-connection.  Traditional PostgreSQL
-protocol negotiation is the default and the most flexible with
-different server configurations. If the server is known to support
-direct SSL connections then the latter requires one
-fewer round trip reducing connection latency and also allows the use
-of protocol agnostic SSL network tools.
+This option controls how SSL encryption is negotiated with the server,
+if SSL is used. In the default postgres mode, the
+client first asks the server if SSL is supported. In
+direct mode, the client starts the standard SSL
+handshake directly after establishing the TCP/IP connection. Traditional
+PostgreSQL protocol negotiation is the most
+flexible with different server configurations. If the server is known
+to support direct SSL connections then the latter
+requires one fewer round trip reducing connection latency and also
+allows the use of protocol agnostic SSL network tools. The direct SSL
+option was introduced in PostgreSQL version
+17.

 
 
@@ -1798,32 +1801,14 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   direct
   

- first attempt to establish a standard SSL connection and if that
- fails reconnect and perform the negotiation. This fallback
- process adds significant latency if the initial SSL connection
-   

Re: Fix parallel vacuum buffer usage reporting

2024-05-13 Thread Nazir Bilal Yavuz
Hi,

On Fri, 10 May 2024 at 19:09, Masahiko Sawada  wrote:
>
> On Fri, May 10, 2024 at 7:26 PM Nazir Bilal Yavuz  wrote:
> >
> > Hi,
> >
> > Thank you for working on this!
> >
> > On Wed, 1 May 2024 at 06:37, Masahiko Sawada  wrote:
> > >
> > > Thank you for further testing! I've pushed the patch.
> >
> > I realized a behaviour change while looking at 'Use pgBufferUsage for
> > block reporting in analyze' thread [1]. Since that change applies here
> > as well, I thought it is better to mention it here.
> >
> > Before this commit, VacuumPageMiss did not count the blocks if its
> > read was already completed by other backends [2]. Now,
> > 'pgBufferUsage.local_blks_read + pgBufferUsage.shared_blks_read'
> > counts every block attempted to be read; possibly double counting if
> > someone else has already completed the read.
>
> True. IIUC there is such a difference only in HEAD but not in v15 and
> v16. The following comment in WaitReadBuffers() says that it's a
> traditional behavior that we count blocks as read even if someone else
> has already completed its I/O:
>
> /*
>  * We count all these blocks as read by this backend.  This is traditional
>  * behavior, but might turn out to be not true if we find that someone
>  * else has beaten us and completed the read of some of these blocks.  In
>  * that case the system globally double-counts, but we traditionally don't
>  * count this as a "hit", and we don't have a separate counter for "miss,
>  * but another backend completed the read".
>  */
>
> So I think using pgBufferUsage for (parallel) vacuum is a legitimate
> usage and makes it more consistent with other parallel operations.

That sounds logical. Thank you for the clarification.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: cataloguing NOT NULL constraints

2024-05-13 Thread Robert Haas
On Sat, May 11, 2024 at 5:40 AM Alvaro Herrera  wrote:
> I have found two more problems that I think are going to require some
> more work to fix, so I've decided to cut my losses now and revert the
> whole.  I'll come back again in 18 with these problems fixed.

Bummer, but makes sense.

> Specifically, the problem is that I mentioned that we could restrict the
> NOT NULL NO INHERIT addition in pg_dump for primary keys to occur only
> in pg_upgrade; but it turns out this is not correct.  In normal
> dump/restore, there's an additional table scan to check for nulls when
> the constraint is not there, so the PK creation would become measurably
> slower.  (In a table with a million single-int rows, PK creation goes
> from 2000ms to 2300ms due to the second scan to check for nulls).

I have a feeling that any theory of the form "X only needs to happen
during pg_upgrade" is likely to be wrong. pg_upgrade isn't really
doing anything especially unusual: just creating some objects and
loading data. Those things can also be done at other times, so
whatever is needed during pg_upgrade is also likely to be needed at
other times. Maybe that's not sound reasoning for some reason or
other, but that's my intuition.

> The addition of NOT NULL NO INHERIT constraints for this purpose
> collides with addition of constraints for other reasons, and it forces
> us to do unpleasant things such as altering an existing constraint to go
> from NO INHERIT to INHERIT.  If this happens only during pg_upgrade,
> that would be okay IMV; but if we're forced to allow in normal operation
> (and in some cases we are), it could cause inconsistencies, so I don't
> want to do that.  I see a way to fix this (adding another query in
> pg_dump that detects which columns descend from ones used in PKs in
> ancestor tables), but that's definitely too much additional mechanism to
> be adding this late in the cycle.

I'm sorry that I haven't been following this thread closely, but I'm
confused about how we ended up here. What exactly are the user-visible
behavior changes wrought by this patch, and how do they give rise to
these issues? One change I know about is that a constraint that is
explicitly catalogued (vs. just existing implicitly) has a name. But
it isn't obvious to me that such a difference, by itself, is enough to
cause all of these problems: if a NOT NULL constraint is created
without a name, then I suppose we just have to generate one. Maybe the
fact that the constraints have names somehow causes ugliness later,
but I can't quite understand why it would.

The other possibility that occurs to me is that I think the motivation
for cataloging NOT NULL constraints was that we wanted to be able to
track dependencies on them, or something like that, which seems like
it might be able to create issues of the type that you're facing, but
the details aren't clear to me. Changing any behavior in this area
seems like it could be quite tricky, because of things like the
interaction between PRIMARY KEY and NOT NULL, which is rather
idiosyncratic but upon which a lot of existing SQL (including SQL not
controlled by us) likely depends. If there's not a clear plan for how
we keep all the stuff that works today working, I fear we'll end up in
an endless game of whack-a-mole. If you've already written the design
ideas down someplace, I'd appreciate a pointer in the right direction.

Or maybe there's some other issue entirely. In any case, sorry about
the revert, and sorry that I haven't paid more attention to this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

2024-05-13 Thread Matthias van de Meent
Hi,

My colleague Konstantin Knizhnik pointed out that we fail to mark pages
with a non-standard page layout with page_std=false in
RelationCopyStorageUsingBuffer when we WAL-log them. This causes us to
interpret the registered buffer as a standard buffer, and omit the
hole in the page, which for FSM/VM pages covers the whole page.

The immediate effect of this bug is that replicas and primaries in a
physical replication system won't have the same data in their VM- and
FSM-forks until the first VACUUM on the new database has WAL-logged
these pages again. Whilst not actively harmful for the VM/FSM
subsystems, it's definitely suboptimal.
A secondary unwanted effect is that AMs that use the buffer cache, but
which don't use or update the page header, also won't see the main data
logged in WAL, thus potentially losing user data in the physical
replication stream or after a system crash. I've not looked for any
such AMs and am unaware of any that would have this issue, but it's
better to fix this.


PFA a patch that fixes this issue, by assuming that all pages in the
source database utilize a non-standard page layout.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)


v1-0001-Fix-logging-of-non-standard-pages-in-RelationCopy.patch
Description: Binary data


Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Dagfinn Ilmari Mannsåker
Matthias van de Meent  writes:

> On Mon, 13 May 2024 at 10:42, Artur Formella  
> wrote:
>> Motivation:
>> Commas of this type are allowed in many programming languages, in some
>> it is even recommended to use them at the ends of lists or objects.
>
> Single trailing commas are a feature that's more and more common in
> languages, yes, but arbitrary excess commas is new to me. Could you
> provide some examples of popular languages which have that, as I can't
> think of any.

The only one I can think of is Perl, which I'm not sure counts as
popular any more.  JavaScript allows consecutive commas in array
literals, but they're not no-ops, they create empty array slots:

❯ js
Welcome to Node.js v18.19.0.
Type ".help" for more information.
> [1,,2,,]
[ 1, <1 empty item>, 2, <1 empty item> ]

- ilmari




RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for the comments! A new patch has been posted in [1].

> 0.1 General - Patch name
> 
> /SUBSCIRPTION/SUBSCRIPTION/

Fixed.

> ==
> 0.2 General - Apply
> 
> FYI, there are whitespace warnings:
> 
> git
> apply ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTI
> ON-.-S.patch
> ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-
> S.patch:191:
> trailing whitespace.
> # command will abort the prepared transaction and succeed.
> warning: 1 line adds whitespace errors.

I hadn't noticed that; fixed.

> ==
> 0.3 General - Regression test fails
> 
> The subscription regression tests are not working.
> 
> ok 158   + publication  1187 ms
> not ok 159   + subscription  123 ms
> 
> See review comments #4 and #5 below for the reason why.

Yeah, I forgot to update the expected result. Fixed.

> ==
> src/sgml/ref/alter_subscription.sgml
> 
> 1.
>   
>The two_phase parameter can only be altered when
> the
> -  subscription is disabled. When altering the parameter from
> on
> -  to off, the backend process checks prepared
> -  transactions done by the logical replication worker and aborts them.
> +  subscription is disabled. Altering the parameter from
> on
> +  to off will be failed when there are prepared
> +  transactions done by the logical replication worker. If you want to 
> alter
> +  the parameter forcibly in this case, force_alter
> +  option must be set to on at the same time. If
> +  specified, the backend process aborts prepared transactions.
>   
> 1a.
> That "will be failed when..." seems strange. Maybe say "will give an
> error when..."
> 
> ~
> 1b.
> Because "force" is a verb, I think true/false is more natural than
> on/off for this new boolean option. e.g. it acts more like a "flag"
> than a "mode". See all the other boolean options in CREATE
> SUBSCRIPTION -- those are mostly all verbs too and are all true/false
> AFAIK.

Fixed, but note that the part was moved.

> 
> ==
> 
> 2. CREATE SUBSCRIPTION
> 
> For my previous review, two comments [v7-0004#2] and [v7-0004#3] were
> not addressed. Kuroda-san wrote:
> Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should
> we modify to accept and add the description in the doc?
> 
> ~
> 
> Yes, that is what I am suggesting. IMO it is odd for the user to be
> able to ALTER a parameter that cannot be included in the CREATE
> SUBSCRIPTION in the first place. AFAIK there are no other parameters
> that behave that way.

Hmm. I felt that this change required a new attribute in the pg_subscription
system catalog. Previously I did not like that because it requires a large
change, but... I tried it anyway.
A new attribute 'subforcealter' was added, and some parts were updated accordingly.

> src/backend/commands/subscriptioncmds.c
> 
> 3. AlterSubscription
> 
> + if (!opts.force_alter)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot alter %s when there are prepared transactions",
> + "two_phase = off"),
> + errhint("Resolve these transactions or set %s at the same time, and
> then try again.",
> + "force_alter = true")));
> 
> I think saying "at the same time" in the hint is unnecessary. Surely
> the user is allowed to set this parameter separately if they want to?
> 
> e.g.
> ALTER SUBSCRIPTION sub SET (force_alter=true);
> ALTER SUBSCRIPTION sub SET (two_phase=off);

Actually, it was correct: since force_alter was not recorded in the system
catalog, it had to be specified at the same time.
Now that we allow them to be specified separately, I have removed that wording.

> ==
> src/test/regress/expected/subscription.out
> 
> 4.
> +-- fail - force_alter cannot be set alone
> +ALTER SUBSCRIPTION regress_testsub SET (force_alter = true);
> +ERROR:  force_alter must be specified with two_phase
> 
> This error cannot happen. You removed that error!

Fixed.

> ==
> src/test/subscription/t/099_twophase_added.pl
> 
> 6.
> +# Try altering the two_phase option to "off." The command will fail since 
> there
> +# is a prepared transaction and the force option is not specified.
> +my $stdout;
> +my $stderr;
> +
> +($result, $stdout, $stderr) = $node_subscriber->psql(
> + 'postgres', "ALTER SUBSCRIPTION regress_sub SET (two_phase = off);");
> +ok($stderr =~ /cannot alter two_phase = off when there are prepared
> transactions/,
> + 'ALTER SUBSCRIPTION failed');
> 
> /force option is not specified./'force_alter' option is not specified as 
> true./

Fixed.

> 
> 7.
> +# Verify the prepared transaction still exists
> +$result = $node_subscriber->safe_psql('postgres',
> +"SELECT count(*) FROM pg_prepared_xacts;");
> +is($result, q(1), "prepared transaction still exits");
> +
> 
> TYPO: /exits/exists/

Fixed.

> 
> ~~~
> 
> 8.
> +# Alter the two_phase with the force_alter option. Apart from the above, the
> +# command will abort the prepared transaction and succeed.
> 

RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for reviewing! The updated patch is available in [1].

> ==
> src/sgml/ref/alter_subscription.sgml
> 
> 1.
> + 
> +  The two_phase parameter can only be altered when
> the
> +  subscription is disabled. When altering the parameter from
> on
> +  to off, the backend process checks prepared
> +  transactions done by the logical replication worker and aborts them.
> + 
> 
> The text may be OK as-is, but I was wondering if it might be better to
> give a more verbose explanation.
> 
> BEFORE
> ... the backend process checks prepared transactions done by the
> logical replication worker and aborts them.
> 
> SUGGESTION
> ... the backend process checks for any incomplete prepared
> transactions done by the logical replication worker (from when
> two_phase parameter was still "on") and, if any are found, those are
> aborted.
>

Fixed.

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 2. AlterSubscription
> 
> - /*
> - * Since the altering two_phase option of subscriptions
> - * also leads to the change of slot option, this command
> - * cannot be rolled back. So prevent we are in the
> - * transaction block.
> + * If two_phase was enabled, there is a possibility the
> + * transactions has already been PREPARE'd. They must be
> + * checked and rolled back.
>   */
> 
> BEFORE
> ... there is a possibility the transactions has already been PREPARE'd.
> 
> SUGGESTION
> ... there is a possibility that transactions have already been PREPARE'd.

Fixed.

> 3. AlterSubscription
> + /*
> + * Since the altering two_phase option of subscriptions
> + * (especially on->off case) also leads to the
> + * change of slot option, this command cannot be rolled
> + * back. So prevent we are in the transaction block.
> + */
>   PreventInTransactionBlock(isTopLevel,
> "ALTER SUBSCRIPTION ... SET (two_phase = off)");
> 
> 
> This comment is a bit vague and includes some typos, but IIUC these
> problems will already be addressed by the 0002 patch changes. AFAIK
> patch 0003 is only moving the 0002 comment.

Yeah, the comment was updated accordingly.

> 
> ==
> src/test/subscription/t/099_twophase_added.pl
> 
> 4.
> +# Check the case that prepared transactions exist on the subscriber node
> +#
> +# If the two_phase is altering from "on" to "off" and there are prepared
> +# transactions on the subscriber, they must be aborted. This test checks it.
> +
> 
> Similar to the comment that I gave for v8-0002. I think there should
> be a more prominent comment for the major test, to
> distinguish it from the comments for the sub-steps.

Added.

> 5.
> +# Verify the prepared transaction are aborted because two_phase is changed to
> +# "off".
> +$result = $node_subscriber->safe_psql('postgres',
> +"SELECT count(*) FROM pg_prepared_xacts;");
> +is($result, q(0), "prepared transaction done by worker is aborted");
> +
> 
> /the prepared transaction are aborted/any prepared transactions are aborted/

Fixed.

[1]: 
https://www.postgresql.org/message-id/OSBPR01MB2552FEA48D265EA278AA9F7AF5E22%40OSBPR01MB2552.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: pg_stat_advisor extension

2024-05-13 Thread Ilia Evdokimov
1. In the case of parallel workers, the plan_rows value has different semantics from the number of rows predicted. Just explore 
get_parallel_divisor().

2. The extension recommends new statistics immediately upon finding an estimation error. But what if the reason for the error is stale 
statistics? Or the error may be raised for only one specific set of constants, 
while estimation is done well in the other 99.% of cases for the same 
expression.


The new parameter, `pg_stat_advisor.analyze_scale_factor`, lets the extension 
suggest executing the ANALYZE command on specific tables. The extension 
now evaluates the ratio of `n_live_tup` (number of live tuples) to 
`n_mod_since_analyze` (number of modifications since last analyze) in 
the `pg_stat_all_tables` catalog. If this ratio exceeds the value 
specified in `analyze_scale_factor`, the extension will suggest an 
update to the table's statistics.
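
A rough illustration of the kind of check described above, written as a plain
query against pg_stat_all_tables (a sketch only, not the extension's actual
code; 0.4 stands in for analyze_scale_factor):

    -- Sketch: 0.4 plays the role of pg_stat_advisor.analyze_scale_factor.
    SELECT relname,
           n_live_tup,
           n_mod_since_analyze,
           n_live_tup::float8 / n_mod_since_analyze AS ratio
    FROM pg_stat_all_tables
    WHERE n_mod_since_analyze > 0
      AND n_live_tup::float8 / n_mod_since_analyze > 0.4;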


There are a lot of parameters that influence estimated rows, and statistics 
might not help improve the estimates. This feature is 
designed to provide users with data-driven insights to decide whether 
updating statistics via the ANALYZE command could potentially improve 
query performance. By suggesting rather than automatically executing 
statistics updates, we empower you to make informed decisions based on 
the specific needs and conditions of your database environment.


I've developed an extension that provides suggestions on whether to 
update or create statistics for your PostgreSQL database, without 
executing any changes. This approach allows you to consider various 
parameters that influence row estimates and make informed decisions 
about optimizing your database's performance.
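
For concreteness, the kind of suggestion being talked about, spelled out by
hand for the my_tbl example from the regression test below (a sketch only;
the extension merely prints the hint, it does not run anything):

    -- Name follows the 'relationName_columns' format mentioned above.
    CREATE STATISTICS my_tbl_fld_1_fld_2 ON fld_1, fld_2 FROM my_tbl;
    ANALYZE my_tbl;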


Your feedback is invaluable, and we look forward to hearing about your 
experiences and any improvements you might suggest.

Best regards,
Ilia Evdokimov
Tantor Labs LLC
From eb998bea96a3640d240afa63e08cc8cf98925bf7 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Mon, 13 May 2024 14:41:59 +0300
Subject: [PATCH] 'pg_stat_advisor' extension

This serves as a hook into the executor. It has two GUC parameters.

pg_stat_advisor.analyze_scale_factor: if the ratio of pg_stat_all_tables.n_live_tup to pg_stat_all_tables.n_mod_since_analyze is greater than pg_stat_advisor.analyze_scale_factor, the extension prints a suggestion to execute the ANALYZE command.

pg_stat_advisor.suggest_statistics_threshold: if the ratio of total rows to planned rows is greater than or equal to this threshold, the extension prints a suggestion to create extended statistics, using the naming format 'relationName_columns'.
---
 contrib/Makefile  |   1 +
 contrib/pg_stat_advisor/Makefile  |  20 +
 .../expected/pg_stat_advisor.out  |  52 ++
 contrib/pg_stat_advisor/meson.build   |  30 +
 contrib/pg_stat_advisor/pg_stat_advisor.c | 560 ++
 .../pg_stat_advisor/sql/pg_stat_advisor.sql   |  24 +
 6 files changed, 687 insertions(+)
 create mode 100644 contrib/pg_stat_advisor/Makefile
 create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out
 create mode 100644 contrib/pg_stat_advisor/meson.build
 create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c
 create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index abd780f277..d6ce2fe562 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -33,6 +33,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_stat_advisor \
 		pg_stat_statements \
 		pg_surgery	\
 		pg_trgm		\
diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile
new file mode 100644
index 00..f31b939e8a
--- /dev/null
+++ b/contrib/pg_stat_advisor/Makefile
@@ -0,0 +1,20 @@
+# contrib/pg_stat_advisor/Makefile
+
+MODULE_big = pg_stat_advisor
+OBJS = \
+	$(WIN32RES) \
+	pg_stat_advisor.o
+PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics"
+
+REGRESS = pg_stat_advisor
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_stat_advisor
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_stat_advisor/expected/pg_stat_advisor.out b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out
new file mode 100644
index 00..8f3dab2c2f
--- /dev/null
+++ b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out
@@ -0,0 +1,52 @@
+LOAD 'pg_stat_advisor';
+SET pg_stat_advisor.analyze_scale_factor = 0.4;
+SET pg_stat_advisor.suggest_statistics_threshold = 0.11;
+CREATE TABLE my_tbl(fld_1 INTEGER, fld_2 BIGINT) WITH (autovacuum_enabled = false);
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+ i/100 as fld_1,
+ i/500 as fld_2
+FROM generate_series(1, 1000) s(i);
+ANALYZE my_tbl;
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+ i/100 as fld_1,

Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-13 Thread Pavel Borisov
A correction of a typo in the previous message:
non-leaf pages iteration cycles (under P_ISLEAF(topaque)) -> non-leaf pages
iteration cycles (under !P_ISLEAF(topaque))

On Mon, 13 May 2024 at 16:19, Pavel Borisov  wrote:

>
>
> On Mon, 13 May 2024 at 15:55, Pavel Borisov 
> wrote:
>
>> Hi, Alexander!
>>
>> On Mon, 13 May 2024 at 05:42, Alexander Korotkov 
>> wrote:
>>
>>> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov
>>>  wrote:
>>> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger
>>> >  wrote:
>>> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov <
>>> aekorot...@gmail.com> wrote:
>>> > > > The only bt_target_page_check() caller is
>>> > > > bt_check_level_from_leftmost(), which overrides state->target in
>>> the
>>> > > > next iteration anyway.  I think the patch is just refactoring to
>>> > > > eliminate the confusion pointer by Peter Geoghegan upthread.
>>> > >
>>> > > I find your argument unconvincing.
>>> > >
>>> > > After bt_target_page_check() returns at line 919, and before
>>> bt_check_level_from_leftmost() overrides state->target in the next
>>> iteration, bt_check_level_from_leftmost() conditionally fetches an item
>>> from the page referenced by state->target.  See line 963.
>>> > >
>>> > > I'm left with four possibilities:
>>> > >
>>> > >
>>> > > 1)  bt_target_page_check() never gets to the code that uses
>>> "rightpage" rather than "state->target" in the same iteration where
>>> bt_check_level_from_leftmost() conditionally fetches an item from
>>> state->target, so the change you're making doesn't matter.
>>> > >
>>> > > 2)  The code prior to v2-0003 was wrong, having changed
>>> state->target in an inappropriate way, causing the wrong thing to happen at
>>> what is now line 963.  The patch fixes the bug, because state->target no
>>> longer gets overwritten where you are now using "rightpage" for the value.
>>> > >
>>> > > 3)  The code used to work, having set up state->target correctly in
>>> the place where you are now using "rightpage", but v2-0003 has broken that.
>>> > >
>>> > > 4)  It's been broken all along and your patch just changes from
>>> wrong to wrong.
>>> > >
>>> > >
>>> > > If you believe (1) is true, then I'm complaining that you are
>>> relying far to much on action at a distance, and that you are not
>>> documenting it.  Even with documentation of this interrelationship, I'd be
>>> unhappy with how brittle the code is.  I cannot easily discern that the two
>>> don't ever happen in the same iteration, and I'm not at all convinced one
>>> way or the other.  I tried to set up some Asserts about that, but none of
>>> the test cases actually reach the new code, so adding Asserts doesn't help
>>> to investigate the question.
>>> > >
>>> > > If (2) is true, then I'm complaining that the commit message doesn't
>>> mention the fact that this is a bug fix.  Bug fixes should be clearly
>>> documented as such, otherwise future work might assume the commit can be
>>> reverted with only stylistic consequences.
>>> > >
>>> > > If (3) is true, then I'm complaining that the patch is flat busted.
>>> > >
>>> > > If (4) is true, then maybe we should revert the entire feature, or
>>> have a discussion of mitigation efforts that are needed.
>>> > >
>>> > > Regardless of which of 1..4 you pick, I think it could all do with
>>> more regression test coverage.
>>> > >
>>> > >
>>> > > For reference, I said something similar earlier today in another
>>> email to this thread:
>>> > >
>>> > > This patch introduces a change that stores a new page into variable
>>> "rightpage" rather than overwriting "state->target", which the old
>>> implementation most certainly did.  That means that after returning from
>>> bt_target_page_check() into the calling function
>>> bt_check_level_from_leftmost() the value in state->target is not what it
>>> would have been prior to this patch.  Now, that'd be irrelevant if nobody
>>> goes on to consult that value, but just 44 lines further down in
>>> bt_check_level_from_leftmost() state->target is clearly used.  So the
>>> behavior at that point is changing between the old and new versions of the
>>> code, and I think I'm within reason to ask if it was wrong before the
>>> patch, wrong after the patch, or something else?  Is this a bug being
>>> introduced, being fixed, or ... ?
>>> >
>>> > Thank you for your analysis.  I'm inclined to believe in 2, but not
>>> > yet completely sure.  It's really pity that our tests don't cover
>>> > this.  I'm investigating this area.
>>>
>>> It seems that I got to the bottom of this.  Changing
>>> BtreeCheckState.target for a cross-page unique constraint check is
>>> wrong, but that happens only for leaf pages.  After that
>>> BtreeCheckState.target is only used for setting the low key.  The low
>>> key is only used for non-leaf pages.  So, that didn't lead to any
>>> visible bug.  I've revised the commit message to reflect this.
>>>
>>
>> I agree with your analysis regarding state->target:
>> - when the unique check is on, 

Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-13 Thread Pavel Borisov
On Mon, 13 May 2024 at 15:55, Pavel Borisov  wrote:

> Hi, Alexander!
>
> On Mon, 13 May 2024 at 05:42, Alexander Korotkov 
> wrote:
>
>> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov
>>  wrote:
>> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger
>> >  wrote:
>> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov <
>> aekorot...@gmail.com> wrote:
>> > > > The only bt_target_page_check() caller is
>> > > > bt_check_level_from_leftmost(), which overrides state->target in the
>> > > > next iteration anyway.  I think the patch is just refactoring to
>> > > > eliminate the confusion pointer by Peter Geoghegan upthread.
>> > >
>> > > I find your argument unconvincing.
>> > >
>> > > After bt_target_page_check() returns at line 919, and before
>> bt_check_level_from_leftmost() overrides state->target in the next
>> iteration, bt_check_level_from_leftmost() conditionally fetches an item
>> from the page referenced by state->target.  See line 963.
>> > >
>> > > I'm left with four possibilities:
>> > >
>> > >
>> > > 1)  bt_target_page_check() never gets to the code that uses
>> "rightpage" rather than "state->target" in the same iteration where
>> bt_check_level_from_leftmost() conditionally fetches an item from
>> state->target, so the change you're making doesn't matter.
>> > >
>> > > 2)  The code prior to v2-0003 was wrong, having changed state->target
>> in an inappropriate way, causing the wrong thing to happen at what is now
>> line 963.  The patch fixes the bug, because state->target no longer gets
>> overwritten where you are now using "rightpage" for the value.
>> > >
>> > > 3)  The code used to work, having set up state->target correctly in
>> the place where you are now using "rightpage", but v2-0003 has broken that.
>> > >
>> > > 4)  It's been broken all along and your patch just changes from wrong
>> to wrong.
>> > >
>> > >
>> > > If you believe (1) is true, then I'm complaining that you are relying
>> far to much on action at a distance, and that you are not documenting it.
>> Even with documentation of this interrelationship, I'd be unhappy with how
>> brittle the code is.  I cannot easily discern that the two don't ever
>> happen in the same iteration, and I'm not at all convinced one way or the
>> other.  I tried to set up some Asserts about that, but none of the test
>> cases actually reach the new code, so adding Asserts doesn't help to
>> investigate the question.
>> > >
>> > > If (2) is true, then I'm complaining that the commit message doesn't
>> mention the fact that this is a bug fix.  Bug fixes should be clearly
>> documented as such, otherwise future work might assume the commit can be
>> reverted with only stylistic consequences.
>> > >
>> > > If (3) is true, then I'm complaining that the patch is flat busted.
>> > >
>> > > If (4) is true, then maybe we should revert the entire feature, or
>> have a discussion of mitigation efforts that are needed.
>> > >
>> > > Regardless of which of 1..4 you pick, I think it could all do with
>> more regression test coverage.
>> > >
>> > >
>> > > For reference, I said something similar earlier today in another
>> email to this thread:
>> > >
>> > > This patch introduces a change that stores a new page into variable
>> "rightpage" rather than overwriting "state->target", which the old
>> implementation most certainly did.  That means that after returning from
>> bt_target_page_check() into the calling function
>> bt_check_level_from_leftmost() the value in state->target is not what it
>> would have been prior to this patch.  Now, that'd be irrelevant if nobody
>> goes on to consult that value, but just 44 lines further down in
>> bt_check_level_from_leftmost() state->target is clearly used.  So the
>> behavior at that point is changing between the old and new versions of the
>> code, and I think I'm within reason to ask if it was wrong before the
>> patch, wrong after the patch, or something else?  Is this a bug being
>> introduced, being fixed, or ... ?
>> >
>> > Thank you for your analysis.  I'm inclined to believe in 2, but not
>> > yet completely sure.  It's really pity that our tests don't cover
>> > this.  I'm investigating this area.
>>
>> It seems that I got to the bottom of this.  Changing
>> BtreeCheckState.target for a cross-page unique constraint check is
>> wrong, but that happens only for leaf pages.  After that
>> BtreeCheckState.target is only used for setting the low key.  The low
>> key is only used for non-leaf pages.  So, that didn't lead to any
>> visible bug.  I've revised the commit message to reflect this.
>>
>
> I agree with your analysis regarding state->target:
> - when the unique check is on, state->target was reassigned only for the
> leaf pages (under P_ISLEAF(topaque) in bt_target_page_check).
> - in this level (leaf) in bt_check_level_from_leftmost() this value of
> state->target was used to get state->lowkey. Then it was reset (in the next
> iteration of do loop in in bt_check_level_from_leftmost()
> - state->lowkey 

Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Étienne BERSAC
Hi,

As a developer, I love this feature.

But as a developer of a universal TDOP SQL parser[1], this can be a
pain. Please propose it to the standards committee instead.

Regards,
Étienne


[1]: https://gitlab.com/dalibo/transqlate




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-13 Thread Pavel Borisov
Hi, Alexander!

On Mon, 13 May 2024 at 05:42, Alexander Korotkov 
wrote:

> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov
>  wrote:
> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger
> >  wrote:
> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov <
> aekorot...@gmail.com> wrote:
> > > > The only bt_target_page_check() caller is
> > > > bt_check_level_from_leftmost(), which overrides state->target in the
> > > > next iteration anyway.  I think the patch is just refactoring to
> > > > eliminate the confusion pointer by Peter Geoghegan upthread.
> > >
> > > I find your argument unconvincing.
> > >
> > > After bt_target_page_check() returns at line 919, and before
> bt_check_level_from_leftmost() overrides state->target in the next
> iteration, bt_check_level_from_leftmost() conditionally fetches an item
> from the page referenced by state->target.  See line 963.
> > >
> > > I'm left with four possibilities:
> > >
> > >
> > > 1)  bt_target_page_check() never gets to the code that uses
> "rightpage" rather than "state->target" in the same iteration where
> bt_check_level_from_leftmost() conditionally fetches an item from
> state->target, so the change you're making doesn't matter.
> > >
> > > 2)  The code prior to v2-0003 was wrong, having changed state->target
> in an inappropriate way, causing the wrong thing to happen at what is now
> line 963.  The patch fixes the bug, because state->target no longer gets
> overwritten where you are now using "rightpage" for the value.
> > >
> > > 3)  The code used to work, having set up state->target correctly in
> the place where you are now using "rightpage", but v2-0003 has broken that.
> > >
> > > 4)  It's been broken all along and your patch just changes from wrong
> to wrong.
> > >
> > >
> > > If you believe (1) is true, then I'm complaining that you are relying
> far to much on action at a distance, and that you are not documenting it.
> Even with documentation of this interrelationship, I'd be unhappy with how
> brittle the code is.  I cannot easily discern that the two don't ever
> happen in the same iteration, and I'm not at all convinced one way or the
> other.  I tried to set up some Asserts about that, but none of the test
> cases actually reach the new code, so adding Asserts doesn't help to
> investigate the question.
> > >
> > > If (2) is true, then I'm complaining that the commit message doesn't
> mention the fact that this is a bug fix.  Bug fixes should be clearly
> documented as such, otherwise future work might assume the commit can be
> reverted with only stylistic consequences.
> > >
> > > If (3) is true, then I'm complaining that the patch is flat busted.
> > >
> > > If (4) is true, then maybe we should revert the entire feature, or
> have a discussion of mitigation efforts that are needed.
> > >
> > > Regardless of which of 1..4 you pick, I think it could all do with
> more regression test coverage.
> > >
> > >
> > > For reference, I said something similar earlier today in another email
> to this thread:
> > >
> > > This patch introduces a change that stores a new page into variable
> "rightpage" rather than overwriting "state->target", which the old
> implementation most certainly did.  That means that after returning from
> bt_target_page_check() into the calling function
> bt_check_level_from_leftmost() the value in state->target is not what it
> would have been prior to this patch.  Now, that'd be irrelevant if nobody
> goes on to consult that value, but just 44 lines further down in
> bt_check_level_from_leftmost() state->target is clearly used.  So the
> behavior at that point is changing between the old and new versions of the
> code, and I think I'm within reason to ask if it was wrong before the
> patch, wrong after the patch, or something else?  Is this a bug being
> introduced, being fixed, or ... ?
> >
> > Thank you for your analysis.  I'm inclined to believe in 2, but not
> > yet completely sure.  It's really pity that our tests don't cover
> > this.  I'm investigating this area.
>
> It seems that I got to the bottom of this.  Changing
> BtreeCheckState.target for a cross-page unique constraint check is
> wrong, but that happens only for leaf pages.  After that
> BtreeCheckState.target is only used for setting the low key.  The low
> key is only used for non-leaf pages.  So, that didn't lead to any
> visible bug.  I've revised the commit message to reflect this.
>

I agree with your analysis regarding state->target:
- when the unique check is on, state->target was reassigned only for the
leaf pages (under P_ISLEAF(topaque) in bt_target_page_check).
- at this level (leaf), in bt_check_level_from_leftmost(), this value of
state->target was used to get state->lowkey. Then it was reset (in the next
iteration of the do loop in bt_check_level_from_leftmost()).
- state->lowkey lives until the end of the page-level (leaf) iteration cycle.
Then the low key is reset (state->lowkey = NULL at the end of
bt_check_level_from_leftmost())
- 

Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Heikki Linnakangas

On 13/05/2024 12:50, Jelte Fennema-Nio wrote:

On Sun, 12 May 2024 at 23:39, Heikki Linnakangas  wrote:

In v18, I'd like to make sslmode=require the default. Or maybe introduce
a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we
want to encourage encryption, that's the right way to do it. (I'd still
recommend everyone to use an explicit sslmode=require in their
connection strings for many years, though, because you might be using an
older client without realizing it.)


I'm definitely a huge proponent of making the connection defaults more
secure. But as described above sslmode=require is still extremely
insecure and I don't think we gain anything substantial by making it
the default. I think the only useful secure default would be to use
sslmode=verify-full (with probably some automatic fallback to
sslmode=prefer when connecting to hardcoded IPs or localhost). Which
probably means that sslrootcert=system should also be made the
default. Which would mean that ~/.postgresql/root.crt would not be the
default anymore, which I personally think is fine but others likely
disagree.


"channel_binding=require sslmode=require" also protects from MITM attacks.

I think these options should be designed from the user's point of view, 
so that the user can specify the risks they're willing to accept, and 
the details of how that's accomplished are handled by libpq. For 
example, I'm OK with (tick all that apply):


[ ] passive eavesdroppers seeing all the traffic
[ ] MITM being able to hijack the session
[ ] connecting without verifying the server's identity
[ ] divulging the plaintext password to the server
[ ] ...

The requirements for whether SSL or GSS encryption is required, whether 
the server's certificate needs to be signed by a known CA, etc. can be 
derived from those. For example, if you need protection from 
eavesdroppers, SSL or GSS encryption must be used. If you need to verify 
the server's identity, it implies sslmode=verify-CA or 
channel_binding=true. If you don't want to divulge the password, it 
implies a suitable require_auth setting. I don't have a concrete 
proposal yet, but something like that. And the defaults for those are up 
for debate.


psql could perhaps help by listing the above properties at the beginning 
of the session, something like:


psql (16.2)
WARNING: Connection is not encrypted.
WARNING: The server's identity has not been verified
Type "help" for help.

postgres=#

Although for the "divulge plaintext password to server" property, it's 
too late to print a warning after connecting, because the damage has 
already been done.
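
As an aside, the encryption status of an established session can already be
inspected from SQL; a small illustration, not part of any proposal here:

    -- One row for the current backend; ssl is false for a plaintext connection.
    SELECT ssl, version, cipher
      FROM pg_stat_ssl
     WHERE pid = pg_backend_pid();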


A different line of thought is that to keep the attack surface as small 
as possible, you should specify very explicitly what exactly you expect 
to happen in the authentication, and disallow any variance. For example, 
you expect SCRAM to be used, with a certificate signed by a particular 
CA, and if the server requests anything else, that's suspicious and the 
connection is aborted. We should make that possible too, but the above 
flexible risk-based approach seems good for the defaults.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: doc: some fixes for environment sections in ref pages

2024-05-13 Thread Daniel Gustafsson
> On 13 May 2024, at 10:48, Peter Eisentraut  wrote:

> Patches attached.

All patches look good.

> I think the first one is a bug fix.

Agreed.

--
Daniel Gustafsson





Upgrade Debian CI images to Bookworm

2024-05-13 Thread Nazir Bilal Yavuz
Hi,

Bookworm versions of the Debian CI images are available now [0]. The
patches to use these images are attached.

'v1-0001-Upgrade-Debian-CI-images-to-Bookworm_REL_16+.patch' patch can
be applied to both upstream and REL_16 and all of the tasks finish
successfully.

'v1-0001-Upgrade-Debian-CI-images-to-Bookworm_REL_15.patch' patch can
be applied to REL_15 but it gives a compiler warning. The fix for this
warning is proposed here [1]. After the fix is applied, all of the
tasks finish successfully.

Any kind of feedback would be appreciated.

[0] https://github.com/anarazel/pg-vm-images/pull/91

[1] postgr.es/m/CAN55FZ0o9wqVoMTh_gJCmj_%2B4XbX9VXzQF8OySPZ0R1saxV3bA%40mail.gmail.com

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From 26f4b2425cb04dc7218142f24f151d3698f33191 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Mon, 13 May 2024 10:56:28 +0300
Subject: [PATCH v1] Upgrade Debian CI images to Bookworm

A new Debian version, Bookworm, has been released. Use these new images
in CI tasks.

Perl version is upgraded in the Bookworm images, so update Perl version
at 'Linux - Debian Bookworm - Meson' task as well.

Upgrading Debian CI images to Bookworm PR: https://github.com/anarazel/pg-vm-images/pull/91
---
 .cirrus.tasks.yml | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 82261eccb85..d2ff833fcad 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -65,7 +65,7 @@ task:
 CPUS: 4
 BUILD_JOBS: 8
 TEST_JOBS: 8
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 CCACHE_DIR: ${CIRRUS_WORKING_DIR}/ccache_dir
 # no options enabled, should be small
 CCACHE_MAXSIZE: "150M"
@@ -241,7 +241,7 @@ task:
 CPUS: 4
 BUILD_JOBS: 4
 TEST_JOBS: 8 # experimentally derived to be a decent choice
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 
 CCACHE_DIR: /tmp/ccache_dir
 DEBUGINFOD_URLS: "https://debuginfod.debian.net"
@@ -312,7 +312,7 @@ task:
 #DEBIAN_FRONTEND=noninteractive apt-get -y install ...
 
   matrix:
-- name: Linux - Debian Bullseye - Autoconf
+- name: Linux - Debian Bookworm - Autoconf
 
   env:
 SANITIZER_FLAGS: -fsanitize=address
@@ -346,7 +346,7 @@ task:
   on_failure:
 <<: *on_failure_ac
 
-- name: Linux - Debian Bullseye - Meson
+- name: Linux - Debian Bookworm - Meson
 
   env:
 CCACHE_MAXSIZE: "400M" # tests two different builds
@@ -373,7 +373,7 @@ task:
 ${LINUX_MESON_FEATURES} \
 -Dllvm=disabled \
 --pkg-config-path /usr/lib/i386-linux-gnu/pkgconfig/ \
--DPERL=perl5.32-i386-linux-gnu \
+-DPERL=perl5.36-i386-linux-gnu \
 -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
 build-32
 EOF
@@ -648,7 +648,7 @@ task:
   env:
 CPUS: 4
 BUILD_JOBS: 4
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 
 # Use larger ccache cache, as this task compiles with multiple compilers /
 # flag combinations
-- 
2.43.0

From d8f4b660bdd408fa22c857e24d1b4d05ff41df03 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Mon, 13 May 2024 11:55:48 +0300
Subject: [PATCH v1] Upgrade Debian CI images to Bookworm

A new Debian version, Bookworm, has been released. Use these new images
in CI tasks.

Upgrading Debian CI images to Bookworm PR: https://github.com/anarazel/pg-vm-images/pull/91
---
 .cirrus.tasks.yml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index b03d128184d..a2735f8bb60 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -138,13 +138,13 @@ LINUX_CONFIGURE_FEATURES: _CONFIGURE_FEATURES >-
 
 
 task:
-  name: Linux - Debian Bullseye
+  name: Linux - Debian Bookworm
 
   env:
 CPUS: 4
 BUILD_JOBS: 4
 TEST_JOBS: 8 # experimentally derived to be a decent choice
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 
 CCACHE_DIR: /tmp/ccache_dir
 DEBUGINFOD_URLS: "https://debuginfod.debian.net"
@@ -451,12 +451,12 @@ task:
 
   # To limit unnecessary work only run this once the normal linux test succeeds
   depends_on:
-- Linux - Debian Bullseye
+- Linux - Debian Bookworm
 
   env:
 CPUS: 4
 BUILD_JOBS: 4
-IMAGE_FAMILY: pg-ci-bullseye
+IMAGE_FAMILY: pg-ci-bookworm
 
 # Use larger ccache cache, as this task compiles with multiple compilers /
 # flag combinations
-- 
2.43.0



Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-13 Thread Alexander Korotkov
On Mon, May 13, 2024 at 12:45 PM Dmitry Koval  wrote:
> 13.05.2024 11:45, Daniel Gustafsson пишет:
> > Commit 3ca43dbbb67f which adds the permission checks seems to cause 
> > conflicts
> > in the pg_upgrade tests
>
> Thanks!
>
> It will probably be enough to rename the roles:
>
> regress_partition_merge_alice -> regress_partition_split_alice
> regress_partition_merge_bob -> regress_partition_split_bob

Thanks to Daniel for spotting this.
Thanks to Dmitry for the proposed fix.

The actual problem appears to be a bit more complex.  In addition to
the role names, the lack of permissions on schemas led to creation of
tables in the public schema and a potential conflict there.  Fixed in
2a679ae94e.

--
Regards,
Alexander Korotkov




Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS

2024-05-13 Thread Aleksander Alekseev
Hi,

> > Thanks. I see a few pieces of code that use special FOO_NUMBER enum
> > values instead of a macro. Should we refactor these pieces
> > accordingly? PFA another patch.
>
> I think this is a sensible improvement.
>
> But please keep the trailing commas on the last enum items.

Thanks, fixed.

-- 
Best regards,
Aleksander Alekseev


v2-0001-Use-macro-to-define-the-number-of-enum-values.patch
Description: Binary data


Re: SQL:2011 application time

2024-05-13 Thread Peter Eisentraut

On 03.04.24 07:30, Paul Jungwirth wrote:
But is it *literally* unique? Well two identical keys, e.g. (5, 
'[Jan24,Mar24)') and (5, '[Jan24,Mar24)'), do have overlapping ranges, 
so the second is excluded. Normally a temporal unique index is *more* 
restrictive than a standard one, since it forbids other values too (e.g. 
(5, '[Jan24,Feb24)')). But sadly there is one exception: the ranges in 
these keys do not overlap: (5, 'empty'), (5, 'empty'). With 
ranges/multiranges, `'empty' && x` is false for all x. You can add that 
key as many times as you like, despite a PK/UQ constraint:


     postgres=# insert into t values
     ('[1,2)', 'empty', 'foo'),
     ('[1,2)', 'empty', 'bar');
     INSERT 0 2
    postgres=# select * from t;
      id   | valid_at | name
    -------+----------+------
     [1,2) | empty    | foo
     [1,2) | empty    | bar
    (2 rows)

Cases like this shouldn't actually happen for temporal tables, since 
empty is not a meaningful value. An UPDATE/DELETE FOR PORTION OF would 
never cause an empty. But we should still make sure they don't cause 
problems.


We should give temporal primary keys an internal CHECK constraint saying 
`NOT isempty(valid_at)`. The problem is analogous to NULLs in parts of a 
primary key. NULLs prevent two identical keys from ever comparing as 
equal. And just as a regular primary key cannot contain NULLs, so a 
temporal primary key should not contain empties.


The standard effectively prevents this with PERIODs, because a PERIOD 
adds a constraint saying start < end. But our ranges enforce only start 
<= end. If you say `int4range(4,4)` you get `empty`. If we constrain 
primary keys as I'm suggesting, then they are literally unique, and 
indisunique seems safer.


Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm 
inclined toward no, just as we don't forbid NULLs in parts of a UNIQUE 
key. We should try to pick what gives users more options, when possible. 
Even if it is questionably meaningful, I can see use cases for allowing 
empty ranges in a temporal table. For example it lets you "disable" a 
row, preserving its values but marking it as never true.


It looks like we missed some of these fundamental design questions early 
on, and it might be too late now to fix them for PG17.


For example, the discussion on unique constraints misses that the 
question of null values in unique constraints itself is controversial 
and that there is now a way to change the behavior.  So I imagine there 
is also a selection of possible behaviors you might want for empty 
ranges.  Intuitively, I don't think empty ranges are sensible for 
temporal unique constraints.  But anyway, it's a bit late now to be 
discussing this.
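
(For reference, the existing knob for null values in ordinary unique
constraints, presumably what "a way to change the behavior" refers to above,
looks like this:

    -- With NULLS NOT DISTINCT, two rows (1, NULL) now conflict
    -- instead of both being accepted.
    CREATE TABLE u (a int, b int, UNIQUE NULLS NOT DISTINCT (a, b));

One could imagine an analogous choice for empty ranges.)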


I'm also concerned that if ranges have this fundamental incompatibility 
with periods, then the plan to eventually evolve this patch set to 
support standard periods will also have as-yet-unknown problems.


Some of these issues might be design flaws in the underlying mechanisms, 
like range types and exclusion constraints.  Like, if you're supposed to 
use this for scheduling but you can use empty ranges to bypass exclusion 
constraints, how is one supposed to use this?  Yes, a check constraint 
using isempty() might be the right answer.  But I don't see this 
documented anywhere.
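
A minimal sketch of such a constraint, assuming an illustrative scheduling
table (names are made up; btree_gist is needed for the equality part of the
exclusion constraint):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    -- Illustrative table; the point is the CHECK sitting alongside
    -- the exclusion constraint so that empty ranges cannot slip through.
    CREATE TABLE reservation (
        id       int,
        valid_at daterange,
        CHECK (NOT isempty(valid_at)),
        EXCLUDE USING gist (id WITH =, valid_at WITH &&)
    );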


On the technical side, adding an implicit check constraint as part of a 
primary key constraint is quite a difficult implementation task, as I 
think you are discovering.  I'm just reminded about how the patch for 
catalogued not-null constraints struggled with linking these not-null 
constraints to primary keys correctly.  This sounds a bit similar.


I'm afraid that these issues cannot be resolved in good time for this 
release, so we should revert this patch set for now.






Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid

2024-05-13 Thread Aleksander Alekseev
Hi,

> Searching the archives I was unable to find any complaints, and this has been
> broken for the entire window of supported releases, so I propose we remove it
> as per the attached patch.  If anyone is keen on making this work again for 
> all
> the types where it makes sense, it can be resurrected (probably with a better
> implementation).
>
> Any objections to fixing this in 17 by removing it? (cc:ing Michael from the 
> RMT)

+1 Something that is not documented or used by anyone (apparently) and
is broken should just be removed.

-- 
Best regards,
Aleksander Alekseev




Re: UniqueKey v2

2024-05-13 Thread Antonin Houska
Andy Fan  wrote:

> > * I think that, before EC is considered suitable for an UK, its 
> > ec_opfamilies
> >   field needs to be checked. I try to do that in
> >   find_ec_position_matching_expr(), see 0004.
> 
> Could you make the reason clearer for adding 'List *opfamily_lists;'
> into UniqueKey?  You said "This is needed to create ECs in the parent
> query if the upper relation represents a subquery." but I didn't get the
> it. Since we need to maintain the UniqueKey in the many places, I'd like
> to keep it as simple as possbile. Of course, anything essentical should
> be added for sure. 

If unique keys are generated for a subquery output, they also need to be
created for the corresponding relation in the upper query ("sub" in the
following example):

select * from tab1 left join (select * from tab2) sub;

However, to create a unique key for "sub", you need an EC for each expression
of the key. And to create an EC, you in turn need the list of operator
families.

Even if the parent query already had ECs for the columns of "sub" which are
contained in the unique key, you need to make sure that those ECs are
"compatible" with the ECs of the subquery which generated the unique key. That
is, if an EC of the subquery considers certain input values equal, the EC of
the parent query must also be able to determine if they are equal or not.

> > * RelOptInfo.notnullattrs
> >
> >   My understanding is that this field contains the set attributes whose
> >   uniqueness is guaranteed by the unique key. They are acceptable because 
> > they
> >   are either 1) not allowed to be NULL due to NOT NULL constraint or 2) NULL
> >   value makes the containing row filtered out, so the row cannot break
> >   uniqueness of the output. Am I right?
> >
> >   If so, I suggest to change the field name to something less generic, maybe
> >   'uniquekey_attrs' or 'uniquekey_candidate_attrs', and adding a comment 
> > that
> >   more checks are needed before particular attribute can actually be used in
> >   UniqueKey.
> 
> I don't think so, UniqueKey is just one of the places to use this
> not-null property, see 3af704098 for the another user case of it. 
> 
> (Because of 3af704098, we should leverage notnullattnums somehow in this
> patch, which will be included in the next version as well).

In your patch you modify 'notnullattrs' in add_base_clause_to_rel(), but that
does not happen to 'notnullattnums' in the current master branch. Thus I think
that 'notnullattrs' is specific to the unique keys feature, so the field name
should be less generic.

> >
> > * uniquekey_useful_for_merging()
> >
> >   How does uniqueness relate to merge join? In README.uniquekey you seem to
> >   point out that a single row is always sorted, but I don't think this
> >   function is related to that fact. (Instead, I'd expect that pathkeys are
> >   added to all paths for a single-row relation, but I'm not sure you do that
> >   in the current version of the patch.)
> 
> The merging is for "mergejoinable join clauses", see function
> eclass_useful_for_merging. Usually I think it as operator "t1.a = t2.a";

My question is: why is the uniqueness important specifically to merge join? I
understand that join evaluation can be more efficient if we know that one
input relation is unique (i.e. we only scan that relation until we find the
first match), but this is not specific to merge join.

> > * is_uniquekey_useful_afterjoin()
> >
> >   Now that my patch (0004) allows propagation of the unique keys from a
> >   subquery to the upper query, I was wondering if the UniqueKey structure
> >   needs the 'use_for_distinct field' I mean we should now propagate the 
> > unique
> >   keys to the parent query whether the subquery has DISTINCT clause or not. 
> > I
> >   noticed that the field is checked by is_uniquekey_useful_afterjoin(), so I
> >   changed the function to always returned true. However nothing changed in 
> > the
> >   output of regression tests (installcheck). Do you insist that the
> >   'use_for_distinct' field is needed?

I miss your answer to this comment.

> > * uniquekey_contains_multinulls()
> >
> >   ** Instead of calling the function when trying to use the UK, how about
> >  checking the ECs when considering creation of the UK? If the tests 
> > fail,
> >  just don't create the UK.
> 
> I don't think so since we maintain the UniqueKey from bottom to top, you
> can double check if my reason is appropriate.  
> 
> CREATE TABLE t1(a int);
> CREATE INDEX ON t1(a);
> 
> SELECT distinct t1.a FROM t1 JOIN t2 using(a);
> 
> We need to create the UniqueKey on the baserel for t1 and the NULL
> values is filtered out in the joinrel. so we have to creating it with
> allowing NULL values first. 

ok

> >   ** What does the 'multi' word in the function name mean?
> 
> multi means multiple, I thought we use this short name in the many
> places, for ex bt_multi_page_stats after a quick search. 

Why not simply uniquekey_contains_nulls() ?

Actually 

Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Sun, 12 May 2024 at 23:39, Heikki Linnakangas  wrote:
> You might miss that by changing sslnegotiation to 'postgres', or by
> removing it altogether, you not only made it compatible with older
> server versions, but you also allowed falling back to a plaintext
> connection. Maybe you're fine with that, but maybe not. I'd like to
> nudge people to use sslmode=require, not rely on implicit stuff like
> this just to make connection strings a little shorter.

I understand your worry, but I'm not sure that this is actually much
of a security issue in practice. sslmode=prefer and sslmode=require
are the same amount of insecure imho (i.e. extremely insecure). The
only reason sslmode=prefer would connect as non-ssl to a server that
supports ssl is if an attacker has write access to the network in the
middle (i.e. eavesdropping ability alone is not enough). Once an
attacker has this level of network access, it's trivial for this
attacker to read any data sent to Postgres by intercepting the TLS
handshake and doing TLS termination with some arbitrary cert (any cert
is trusted by sslmode=require).

So the only actual case where this is a security issue I can think of
is when an attacker has only eavesdropping ability on the network. And
somehow the Postgres server that the client tries to connect to is
configured incorrectly, so that no ssl is set up at all. Then a client
would drop to plaintext, when connecting to this server instead of
erroring, and the attacker could now read the traffic. But I don't
really see this scenario end up any differently when requiring people
to enter sslmode=require. The only action a user can take to connect
to a server that does not have ssl support is to remove
sslmode=require from the connection string. Except if they are also
the server operator, in which case they could enable ssl on the
server. But if ssl is not set up, then it was probably never set up,
and thus providing sslnegotiation=direct would be to test if ssl
works.

But, if you disagree I'm fine with erroring on plain sslnegotiation=direct

> In v18, I'd like to make sslmode=require the default. Or maybe introduce
> a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we
> want to encourage encryption, that's the right way to do it. (I'd still
> recommend everyone to use an explicit sslmode=require in their
> connection strings for many years, though, because you might be using an
> older client without realizing it.)

I'm definitely a huge proponent of making the connection defaults more
secure. But as described above sslmode=require is still extremely
insecure and I don't think we gain anything substantial by making it
the default. I think the only useful secure default would be to use
sslmode=verify-full (with probably some automatic fallback to
sslmode=prefer when connecting to hardcoded IPs or localhost). Which
probably means that sslrootcert=system should also be made the
default. Which would mean that ~/.postgresql/root.crt would not be the
default anymore, which I personally think is fine but others likely
disagree.




Re: 039_end_of_wal: error in "xl_tot_len zero" test

2024-05-13 Thread Anton Voloshin

On 13/05/2024 00:39, Tom Lane wrote:

Hm.  It occurs to me that there *is* a system-specific component to
the amount of WAL emitted during initdb: the number of locales
that "locale -a" prints translates directly to the number of
rows inserted into pg_collation. [...]


Yes. Another system-specific circumstance affecting WAL position is the 
name length of the unix user doing initdb. I've seen 039_end_of_wal 
failing consistently under one user name but working fine with another, 
both on the same machine at the same time.


To be more precise, on one particular machine under those particular 
circumstances (including set of locales) it would work for any username 
with length < 8 or >= 16, but would fail for length 8..15 (in bytes, not 
characters, if non-ASCII usernames were used).


--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru




Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?

2024-05-13 Thread Wolfgang Wilhelm
Hi,

do I interpret your idea correctly: you want some sort of ordering without
ordering?

Kind regards
WW

On Monday, 13 May 2024 at 10:40:38 CEST, aa wrote:

Hello Everyone!

Is there any chance to get some kind of a result set sifting mechanism in
Postgres?

What I am looking for is a way to get, for example, "nulls last" in a result
set, without having to call "order by" or having to use UNION ALL, and if
possible to get this in a single result set pass.

Something along this line: SELECT a, b, c FROM my_table WHERE a nulls last
OFFSET 0 LIMIT 25

I don't want to use order by or union all because these are time-consuming
operations, especially on large data sets and when comparisons are done on
dynamic values (e.g. geolocation distances between a mobile and a static
location).

What I would expect from such a feature is speed comparable with non-sorted
selects, while getting a very rudimentary ordering.

A use case for such a mechanism would be the implementation of quick,
relevant search results for a search engine.

I'm not familiar with how Postgres handles simple select queries, but the way
I would envision a result set sifting logic would be to collect the result
set in 2 separate lists, based on the sifting condition, and then concatenate
these 2 lists and return the result when the pagination request conditions
are met.

Any idea if such functionality is feasible?

Thank you.

PS: if ever implemented, the sifting mechanism could be extended to
accommodate any type of threshold, not just null values.
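
For reference, the closest equivalents available today (both still involve a
sort step, so neither gives the single-pass behaviour asked for above;
my_table comes from the example in the message):

    -- Standard way: nulls pushed to the end, full ordering on a.
    SELECT a, b, c
    FROM my_table
    ORDER BY a NULLS LAST
    OFFSET 0 LIMIT 25;

    -- Cheaper "sift only" variant: sort on a boolean flag instead of on a
    -- itself, so non-null rows keep their (unspecified) relative order.
    SELECT a, b, c
    FROM my_table
    ORDER BY (a IS NULL)
    OFFSET 0 LIMIT 25;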


  

Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-13 Thread Dmitry Koval

Hi!

13.05.2024 11:45, Daniel Gustafsson пишет:

Commit 3ca43dbbb67f which adds the permission checks seems to cause conflicts
in the pg_upgrade tests


Thanks!

It will probably be enough to rename the roles:

regress_partition_merge_alice -> regress_partition_split_alice
regress_partition_merge_bob -> regress_partition_split_bob

(changes in attachment)
--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

From f307add79397bcfe20f7c779e77630fb0df73020 Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Mon, 13 May 2024 12:32:43 +0300
Subject: [PATCH v1] Rename roles to avoid conflicts in concurrent work

---
 src/test/regress/expected/partition_split.out | 20 +--
 src/test/regress/sql/partition_split.sql  | 20 +--
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/src/test/regress/expected/partition_split.out b/src/test/regress/expected/partition_split.out
index b1108c92a2..999e923ef1 100644
--- a/src/test/regress/expected/partition_split.out
+++ b/src/test/regress/expected/partition_split.out
@@ -1516,33 +1516,33 @@ DROP TABLE t;
 DROP ACCESS METHOD partition_split_heap;
 -- Test permission checks.  The user needs to own the parent table and the
 -- the partition to split to do the split.
-CREATE ROLE regress_partition_merge_alice;
-CREATE ROLE regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_alice;
+CREATE ROLE regress_partition_split_alice;
+CREATE ROLE regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_alice;
 CREATE TABLE t (i int) PARTITION BY RANGE (i);
 CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 ERROR:  must be owner of table t
 RESET SESSION AUTHORIZATION;
-ALTER TABLE t OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE t OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 ERROR:  must be owner of table tp_0_2
 RESET SESSION AUTHORIZATION;
-ALTER TABLE tp_0_2 OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE tp_0_2 OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 RESET SESSION AUTHORIZATION;
 DROP TABLE t;
-DROP ROLE regress_partition_merge_alice;
-DROP ROLE regress_partition_merge_bob;
+DROP ROLE regress_partition_split_alice;
+DROP ROLE regress_partition_split_bob;
 -- Check extended statistics aren't copied from the parent table to new
 -- partitions.
 CREATE TABLE t (i int, j int) PARTITION BY RANGE (i);
diff --git a/src/test/regress/sql/partition_split.sql b/src/test/regress/sql/partition_split.sql
index 7f231b0d39..be4a785b75 100644
--- a/src/test/regress/sql/partition_split.sql
+++ b/src/test/regress/sql/partition_split.sql
@@ -896,36 +896,36 @@ DROP ACCESS METHOD partition_split_heap;
 
 -- Test permission checks.  The user needs to own the parent table and the
 -- the partition to split to do the split.
-CREATE ROLE regress_partition_merge_alice;
-CREATE ROLE regress_partition_merge_bob;
+CREATE ROLE regress_partition_split_alice;
+CREATE ROLE regress_partition_split_bob;
 
-SET SESSION AUTHORIZATION regress_partition_merge_alice;
+SET SESSION AUTHORIZATION regress_partition_split_alice;
 CREATE TABLE t (i int) PARTITION BY RANGE (i);
 CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
 
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 RESET SESSION AUTHORIZATION;
 
-ALTER TABLE t OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE t OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
 RESET SESSION AUTHORIZATION;
 
-ALTER TABLE tp_0_2 OWNER TO regress_partition_merge_bob;
-SET SESSION AUTHORIZATION regress_partition_merge_bob;
+ALTER TABLE tp_0_2 OWNER TO regress_partition_split_bob;
+SET SESSION AUTHORIZATION regress_partition_split_bob;
 ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
   (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
PARTITION tp_1_2 

Re: Allowing additional commas between columns, and at the end of the SELECT clause

2024-05-13 Thread Matthias van de Meent
On Mon, 13 May 2024 at 10:42, Artur Formella  wrote:
> Motivation:
> Commas of this type are allowed in many programming languages, in some
> it is even recommended to use them at the ends of lists or objects.

Single trailing commas are a feature that's more and more common in
languages, yes, but arbitrary excess commas is new to me. Could you
provide some examples of popular languages which have that, as I can't
think of any.

> Accepted:
>  SELECT 1,;
>  SELECT 1,;
>  SELECT *, from information_schema.sql_features;
>  (...) RETURNING a,,b,c,;
>
> Not accepted:
>  SELECT ,;
>  SELECT ,1;
>  SELECT ,,,;
>
> Advantages:
> - simplifies the creation and debugging of queries by reducing the most
> common syntax error,
> - eliminates the need to use the popular `1::int as dummy` at the end of
> a SELECT list,

This is the first time I've heard of this `1 as dummy`.
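
For anyone else unfamiliar with it, the pattern presumably being referred to
is a throwaway final column so that every generated line can end with a comma
(my_table is a placeholder):

    -- my_table is illustrative; "dummy" exists only to absorb the last comma.
    SELECT a,
           b,
           c,
           1::int AS dummy
    FROM my_table;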

> - simplifies query generators,
> - the query is still deterministic,

What part of a query would (or would not) be deterministic? I don't
think I understand the potential concern here. Is it about whether the
statement can be parsed deterministically?

> Disadvantages:
> - counting of returned columns can be difficult,
> - syntax checkers will still report errors,
> - probably not SQL standard compliant,

I'd argue you better raise this with the standard committee if this
isn't compliant. I don't see enough added value to break standard
compliance here, especially when the standard may at some point allow
only a single trailing comma (and not arbitrarily many).

> What do you think?

Do you expect `SELECT 1,,,` to have an equivalent query identifier
to `SELECT 1;` in pg_stat_statements? Why, or why not?

Overall, I don't think unlimited commas is a good feature. A trailing
comma in the select list would be less problematic, but I'd still want
to follow the standard first and foremost.


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



