Re: ResourceOwner refactoring

2024-06-06 Thread Heikki Linnakangas

On 06/06/2024 18:27, Robert Haas wrote:

On Thu, Jun 6, 2024 at 7:32 AM Heikki Linnakangas  wrote:

Barring objections, I'll commit this later today or tomorrow. Thanks for
the report!


Committed.


I think you may have forgotten to push.


Huh, you're right. I could swear I did...

Pushed now, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [multithreading] extension compatibility

2024-06-06 Thread Heikki Linnakangas

On 06/06/2024 17:23, Robert Haas wrote:

On Thu, Jun 6, 2024 at 5:00 AM Heikki Linnakangas  wrote:

If there is some material harm from compiling with multithreading
support even if you're not using it, we should try to fix that. I'm not
dead set against having a compile-time option, but I don't see the need
for it at the moment.


Well, OK, so it sounds like I'm outvoted, at least at the moment.
Maybe that will change as more people vote, but for now, that's where
we are. Given that, I suppose we want something more like Tristan's
patch, but with a more extensible syntax. Does that sound right?


+1

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: ResourceOwner refactoring

2024-06-06 Thread Heikki Linnakangas

On 05/06/2024 16:58, Heikki Linnakangas wrote:

On 04/06/2024 01:49, Heikki Linnakangas wrote:

A straightforward fix is to modify RelationFlushRelation() so that if
!IsTransactionState(), it just marks the entry as invalid instead of
calling RelationClearRelation(). That's what RelationClearRelation()
would do anyway, if it didn't hit the assertion first.


Here's a patch with that straightforward fix. Your test case hit the
"rd_createSubid != InvalidSubTransactionId" case, I expanded it to also
cover the "rd_firstRelfilelocatorSubid != InvalidSubTransactionId" case.


For the record, I got the above backwards: your test case covered the 
rd_firstRelfilelocatorSubid case and I expanded it to also cover the 
rd_createSubid case.



Barring objections, I'll commit this later today or tomorrow. Thanks for
the report!


Committed.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [multithreading] extension compatibility

2024-06-06 Thread Heikki Linnakangas

On 06/06/2024 05:47, Robert Haas wrote:

On Wed, Jun 5, 2024 at 10:09 PM Andres Freund  wrote:

Maybe. I think shipping a mode where users can fairly simply toggle between
threaded and process mode will allow us to get this stable a *lot* quicker
than if we distribute two builds. Most users don't build from source, distros
will have to pick the mode. If they don't choose threaded mode, we'll not find
problems. If they do choose threaded mode, we can't ask users to switch to a
process based mode to check if the problem is related.


I don't believe that being coercive here is the right approach. I
think distros see the value in compiling with as many things turned on
as possible; when they ship with something turned off, it's because
it's unstable or introduces weird dependencies or has some other
disadvantage.


I presume there's no harm in building with multithreading support. If 
you don't want to use it, put "multithreading=off" in your config file 
(which will presumably be the default for a while).
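As a concrete illustration of that, turning the feature off would be a one-line configuration change. The GUC name below follows the patch discussed in this thread and is hypothetical, not a committed option:

```
# postgresql.conf -- hypothetical GUC from the patch discussed in this thread
multithreading = off    # PGC_POSTMASTER: changing it requires a server restart
```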


If we're worried about the performance impact of thread-local variables 
in particular, we can try to measure that. I don't think it's material 
though.


If there is some material harm from compiling with multithreading 
support even if you're not using it, we should try to fix that. I'm not 
dead set against having a compile-time option, but I don't see the need 
for it at the moment.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [multithreading] extension compatibility

2024-06-05 Thread Heikki Linnakangas

On 05/06/2024 23:55, Robert Haas wrote:

On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin  wrote:

Not entirely sure how I feel about the approach you've taken, but here
is a patch that Heikki and I put together for extension compatibility.
It's not a build time solution, but a runtime solution. Instead of
PG_MODULE_MAGIC, extensions would use PG_MODULE_MAGIC_REENTRANT. There
is a GUC called `multithreaded` which controls the variable
IsMultithreaded. We operated under the assumption that being able to
toggle multithreading and multi-processing without recompiling has
value.


That's interesting, because I thought Heikki was against having a
runtime toggle.


I'm very much in favor of a runtime toggle. To be precise, a 
PGC_POSTMASTER setting. We'll get a lot more testing if you can easily 
turn it on/off, and so far I haven't seen anything that would require it 
to be a compile time option.



I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
great as long as we only ever need the PG_MODULE_MAGIC line to signal
one bit of information, but as soon as you get to two bits it's pretty
questionable, and anything more than two bits is insane. If we want to
do something with the PG_MODULE_MAGIC line, I think it should involve
options-passing of some form rather than just having an alternate
macro name.


+1, that would be nicer.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: ResourceOwner refactoring

2024-06-05 Thread Heikki Linnakangas

On 04/06/2024 01:49, Heikki Linnakangas wrote:

A straightforward fix is to modify RelationFlushRelation() so that if
!IsTransactionState(), it just marks the entry as invalid instead of
calling RelationClearRelation(). That's what RelationClearRelation()
would do anyway, if it didn't hit the assertion first.


Here's a patch with that straightforward fix. Your test case hit the 
"rd_createSubid != InvalidSubTransactionId" case, I expanded it to also 
cover the "rd_firstRelfilelocatorSubid != InvalidSubTransactionId" case. 
Barring objections, I'll commit this later today or tomorrow. Thanks for 
the report!


I started a new thread with the bigger refactorings I had in mind: 
https://www.postgresql.org/message-id/9c9e8908-7b3e-4ce7-85a8-00c0e165a3d6%40iki.fi


--
Heikki Linnakangas
Neon (https://neon.tech)
From 227f173d1066955a2ab6aba12c508aaa1f416c27 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 5 Jun 2024 13:29:51 +0300
Subject: [PATCH 1/3] Make RelationFlushRelation() work without ResourceOwner
 during abort

ReorderBufferImmediateInvalidation() executes invalidation messages in
an aborted transaction. However, RelationFlushRelation sometimes
required a valid resource owner, to temporarily increment the refcount
of the relcache entry. Commit b8bff07daa worked around that in the main
subtransaction abort function, AbortSubTransaction(), but missed this
similar case in ReorderBufferImmediateInvalidation().

To fix, introduce a separate function to invalidate a relcache
entry. It does the same thing as RelationClearRelation(rebuild==true)
does, when outside a transaction, but can be called without
incrementing the refcount.

Add regression test. Before this fix, it failed with:

ERROR: ResourceOwnerEnlarge called after release started

Reported-by: Alexander Lakhin 
Discussion: https://www.postgresql.org/message-id/e56be7d9-14b1-664d-0bfc-00ce97727...@gmail.com
---
 .../expected/decoding_in_xact.out | 48 +
 .../test_decoding/sql/decoding_in_xact.sql| 27 ++
 src/backend/access/transam/xact.c | 13 -
 src/backend/utils/cache/relcache.c| 54 ---
 4 files changed, 122 insertions(+), 20 deletions(-)

diff --git a/contrib/test_decoding/expected/decoding_in_xact.out b/contrib/test_decoding/expected/decoding_in_xact.out
index b65253f4630..8555b3d9436 100644
--- a/contrib/test_decoding/expected/decoding_in_xact.out
+++ b/contrib/test_decoding/expected/decoding_in_xact.out
@@ -79,6 +79,54 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
  COMMIT
 (6 rows)
 
+-- Decoding works in transaction that issues DDL
+--
+-- We had issues handling relcache invalidations with these, see
+-- https://www.postgresql.org/message-id/e56be7d9-14b1-664d-0bfc-00ce97727...@gmail.com
+CREATE TABLE tbl_created_outside_xact(id SERIAL PRIMARY KEY);
+BEGIN;
+  -- TRUNCATE changes the relfilenode and sends relcache invalidation
+  TRUNCATE tbl_created_outside_xact;
+  INSERT INTO tbl_created_outside_xact(id) VALUES('1');
+  -- don't show yet, haven't committed
+  SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data 
+--
+(0 rows)
+
+COMMIT;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data 
+--
+ BEGIN
+ table public.tbl_created_outside_xact: TRUNCATE: (no-flags)
+ table public.tbl_created_outside_xact: INSERT: id[integer]:1
+ COMMIT
+(4 rows)
+
+set debug_logical_replication_streaming=immediate;
+BEGIN;
+  CREATE TABLE tbl_created_in_xact(id SERIAL PRIMARY KEY);
+  INSERT INTO tbl_created_in_xact VALUES (1);
+  CHECKPOINT; -- Force WAL flush, so that the above changes will be streamed
+  SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+   data   
+--
+ opening a streamed block for transaction
+ streaming change for transaction
+ closing a streamed block for transaction
+(3 rows)
+
+COMMIT;
+reset debug_logical_replication_streaming;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+  data   
+-
+ BEGIN
+ table public.tbl_created_in_xact: INSERT: id[integer]:1
+ COMMIT
+(3 rows)
+
 SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
  ?column? 
 --
diff --git a/contrib/test_decoding/sql/decoding_in_xact.sql b/contrib/test_decoding/sql/decoding_in_xact.sql
index 108782dc2e9..841a92fa780 100644
--- a/contrib/test_decoding/sql/decoding_in_xact.sql
+++ b/contr

Relcache refactoring

2024-06-05 Thread Heikki Linnakangas
While looking at the recent bug report from Alexander Lakhin [1], I got 
annoyed by the relcache code, and RelationClearRelation in particular. I 
propose to refactor it for clarity.


[1] 
https://www.postgresql.org/message-id/e56be7d9-14b1-664d-0bfc-00ce9772721c%40gmail.com


## Patch 1

This is just a narrow fix for the reported bug [1], same as I posted on 
that thread. Included here because I wrote the refactorings on top of 
this patch and didn't commit it yet.



## Patch 2: Simplify call to rebuild relcache entry for indexes

To rebuild a relcache entry that's been marked as invalid, 
RelationIdGetRelation() calls RelationReloadIndexInfo() for indexes and 
RelationClearRelation(rebuild == true) for other relations. However, 
RelationClearRelation calls RelationReloadIndexInfo() for indexes 
anyway, so RelationIdGetRelation() can just always call 
RelationClearRelation() and let RelationClearRelation() do the right 
thing to rebuild the relation, whether it's an index or something else. 
That seems more straightforward.


Also add comments explaining how the rebuild works at index creation. 
It's a bit special, see the comments.



## Patch 3: Split RelationClearRelation into three different functions

RelationClearRelation() is complicated. Depending on the 'rebuild' 
argument and the circumstances, like if it's called in a transaction and 
whether the relation is an index, a nailed relation, a regular table, or 
a relation dropped in the same xact, it does different things:


- Remove the relation completely from the cache (rebuild == false),
- Mark the entry as invalid (rebuild == true, but not in xact), or
- Rebuild the entry (rebuild == true).

The callers have expectations on what they want it to do. Mostly the 
callers with 'rebuild == false' expect the entry to be removed, and 
callers with 'rebuild == true' expect it to be rebuilt or invalidated, 
but there are exceptions. RelationForgetRelation() for example sets 
rd_droppedSubid and expects RelationClearRelation() to then merely 
invalidate it, and the call from RelationIdGetRelation() expects it to 
rebuild, not merely invalidate it.


I propose to split RelationClearRelation() into three functions:

RelationInvalidateRelation: mark the relcache entry as invalid, so that 
it is rebuilt on next access.

RelationRebuildRelation: rebuild the relcache entry in-place.
RelationClearRelation: Remove the entry from the relcache.

This moves the responsibility for deciding the right action to the 
callers, which they were mostly already doing anyway. Each of those 
actions has different preconditions, e.g. RelationRebuildRelation() can 
only be called in a valid transaction, and RelationClearRelation() can 
only be called if the reference count is zero. Splitting them makes 
those preconditions clearer, and we can add assertions to document them 
in each function.



## RelationInitPhysicalAddr() call in RelationReloadNailed()

One question or little doubt I have: Before these patches, 
RelationReloadNailed() calls RelationInitPhysicalAddr() even when it 
leaves the relation as invalidated because we're not in a transaction or 
if the relation isn't currently in use. That's a bit surprising, I'd 
expect it to be done when the entry is reloaded, not when it's 
invalidated. That's how it's done for non-nailed relations. And in fact, 
for a nailed relation, RelationInitPhysicalAddr() is called *again* when 
it's reloaded.


Is it important? Before commit a54e1f1587, nailed non-index relations 
were not reloaded at all, except for the call to 
RelationInitPhysicalAddr(), which seemed consistent. I think this was 
unintentional in commit a54e1f1587, or perhaps just overly defensive, as 
there's no harm in some extra RelationInitPhysicalAddr() calls.


This patch removes that extra call from the invalidation path, but if it 
turns out to be wrong, we can easily add it to RelationInvalidateRelation.


--
Heikki Linnakangas
Neon (https://neon.tech)

Re: meson "experimental"?

2024-06-04 Thread Heikki Linnakangas

On 04/06/2024 18:28, Dave Page wrote:
I clearly missed the discussion in which it was decided to remove the 
old MSVC++ support from the tree (and am disappointed about that as I 
would have objected loudly). Having recently started testing Meson on 
Windows when I realised that change had been made, and having to update 
builds for pgAdmin and the Windows installers, I think it's clear it's 
far more experimental on that platform than it is on Linux at least.


What kind of issues did you run into? Have they been fixed since?

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: ResourceOwner refactoring

2024-06-03 Thread Heikki Linnakangas

Thanks for the testing!

On 01/06/2024 11:00, Alexander Lakhin wrote:

Hello Heikki,

I've stumbled upon yet another anomaly introduced with b8bff07da.

Please try the following script:
SELECT 'init' FROM pg_create_logical_replication_slot('slot',
    'test_decoding');
CREATE TABLE tbl(t text);
BEGIN;
TRUNCATE table tbl;

SELECT data FROM pg_logical_slot_get_changes('slot', NULL, NULL,
   'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');

On b8bff07da (and the current HEAD), it ends up with:
ERROR:  ResourceOwnerEnlarge called after release started

But on the previous commit, I get no error:
   data
--
(0 rows)


This is another case of the issue I tried to fix in commit b8bff07da 
with this:



--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5279,7 +5279,20 @@
 
 		AtEOSubXact_RelationCache(false, s->subTransactionId,
 								  s->parent->subTransactionId);
+
+		/*
+		 * AtEOSubXact_Inval sometimes needs to temporarily bump the refcount
+		 * on the relcache entries that it processes.  We cannot use the
+		 * subtransaction's resource owner anymore, because we've already
+		 * started releasing it.  But we can use the parent resource owner.
+		 */
+		CurrentResourceOwner = s->parent->curTransactionOwner;
+
 		AtEOSubXact_Inval(false);
+
+		CurrentResourceOwner = s->curTransactionOwner;
+
 		ResourceOwnerRelease(s->curTransactionOwner,
 							 RESOURCE_RELEASE_LOCKS,
 							 false, false);


AtEOSubXact_Inval calls LocalExecuteInvalidationMessage(), which 
ultimately calls RelationFlushRelation(). Your test case reaches 
ReorderBufferExecuteInvalidations(), which also calls 
LocalExecuteInvalidationMessage(), in an already aborted subtransaction.


Looking closer at relcache.c, I think we need to make 
RelationFlushRelation() safe to call without a valid resource owner. 
There are already IsTransactionState() checks in 
RelationClearRelation(), and a comment noting that it does not touch 
CurrentResourceOwner. So we should make sure RelationFlushRelation() 
doesn't touch it either, and revert the above change to 
AbortTransaction(). I didn't feel very good about it in the first place, 
and now I see what the proper fix is.


A straightforward fix is to modify RelationFlushRelation() so that if 
!IsTransactionState(), it just marks the entry as invalid instead of 
calling RelationClearRelation(). That's what RelationClearRelation() 
would do anyway, if it didn't hit the assertion first.


RelationClearRelation() is complicated. Depending on the circumstances, 
it removes the relcache entry, rebuilds it, marks it as invalid, or 
performs a partial reload of a nailed relation. So before going for the 
straightforward fix, I'm going to see if I can refactor 
RelationClearRelation() to be less complicated.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Question: Why Are File Descriptors Not Closed and Accounted for PostgreSQL Backends?

2024-05-24 Thread Heikki Linnakangas

On 24/05/2024 15:17, Srinath Reddy Sadipiralla wrote:

Hi PostgreSQL Community,
when a backend process starts, pq_init is called where it opens a FD during 
CreateWaitEventSet()


if (!AcquireExternalFD())
{
/* treat this as though epoll_create1 itself returned EMFILE */
elog(ERROR, "epoll_create1 failed: %m");
}
set->epoll_fd = epoll_create1(EPOLL_CLOEXEC);


but we didn't close it or call ReleaseExternalFD() for accounting


Yes we do, see FreeWaitEventSet().

The WaitEventSet created in pq_init() is never explicitly freed 
though, because it's created in the per-connection backend process. When 
the connection is terminated, the backend process exits, cleaning up any 
resources including the WaitEventSet.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: AIX support

2024-05-23 Thread Heikki Linnakangas

On 23/05/2024 18:36, Srirama Kucherlapati wrote:

Hi Peter, thanks for your feedback.

We are eager to extend our support in resolving the issues specific 
to AIX or corresponding compilers (XLC and cLang).


But as there are no issues with the patch after reverting the 
changes (with the latest compilers gcc12 and xlc-16.0.1.18), we were 
wondering if this patch can be merged for the current release 17?


Having said that, we are committed to resolve all the hacks and 
caveats that got accumulated specific to AIX over the period by 
picking and resolving one after the other, rather than waiting for 
all the hacks to be fixed.


I'm not eager to put back those hacks just to have them be removed
again. So I'd like to see a minimal patch, with the *minimal* changes
required for AIX support. And perhaps split that into two patches: First
add back AIX support with GCC, and second patch to add XLC support. I'd
like to see how much of the changes are because of the different
compiler and how much from the OS.

No promises for v17, but if the patch is small and non-intrusive, I
would consider it at least. But let's see what it looks like first. It's
the same work that needs to be done whether it goes into v17 or v18 anyway.

One of the reasons that the AIX port ultimately became unmaintainable 
was that so many hacks and caveats were accumulated over the years. A 
new port should set a more recent baseline and trim all those hacks.


Please help me understand this: with respect to the AIX-specific 
hacks, should we just find all the locations where _AIX macros are 
involved, or can we look at the patch changes only, as all the changes 
that were made were specific to AIX? If not, is there any other 
location where we could find all the hacks to be resolved?


Can you provide some more details on the expectations here?


Smallest possible patch that makes Postgres work on AIX again.

Perhaps start with the patch you posted yesterday, but remove hunks from 
it one by one, to see which ones are still needed.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: 回复: An implementation of multi-key sort

2024-05-23 Thread Heikki Linnakangas

On 23/05/2024 15:39, Wang Yao wrote:

No obvious perf regression is expected because PG will follow original
qsort code path when mksort is disabled. For the case, the only extra
cost is the check in tuplesort_sort_memtuples() to enter mksort code path.


And what about the case where mksort is enabled, but it's not effective 
because all leading keys are different?


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: An implementation of multi-key sort

2024-05-22 Thread Heikki Linnakangas

On 22/05/2024 15:48, Wang Yao wrote:

Comparing to classic quick sort, it can get significant performance
improvement once multiple keys are available. A rough test shows it got
~129% improvement over qsort for ORDER BY on 6 keys, and ~52% for CREATE
INDEX on the same data set. (See more details in section "Performance
Test")


Impressive. Did you test the performance of the cases where MK-sort 
doesn't help, to check if there is a performance regression?


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

2024-05-21 Thread Heikki Linnakangas
=100 width=36)
   ->  Index Only Scan using foo_i_idx on foo foo_1 
(cost=0.42..26038.42 rows=100 width=36)

(6 rows)


The following two queries are the same from the user's point of view, 
but one is written using WITH:


postgres=# explain (select i from foo union (select 1::int order by 1) union select i from foo) order by 1;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Unique  (cost=326083.66..336083.67 rows=201 width=4)
   ->  Sort  (cost=326083.66..331083.67 rows=201 width=4)
         Sort Key: foo.i
         ->  Append  (cost=0.42..62076.87 rows=201 width=4)
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=100 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
               ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=100 width=4)
(7 rows)

postgres=# explain with x (i) as (select 1::int order by 1)  (select i from foo union select i from x union select i from foo) order by 1;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Unique  (cost=0.89..82926.54 rows=201 width=4)
   ->  Merge Append  (cost=0.89..77926.54 rows=201 width=4)
         Sort Key: foo.i
         ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=100 width=4)
         ->  Sort  (cost=0.02..0.03 rows=1 width=4)
               Sort Key: (1)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
         ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=100 width=4)
(8 rows)

I would've expected a MergeAppend in both cases.


None of these test cases are broken as such; you just don't get the 
benefit of the optimization. I suspect they might all have the same root 
cause, as they all involve constants in the target list. I think that's 
a pretty common use case of UNION though.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Heikki Linnakangas

On 17/05/2024 05:26, Robert Haas wrote:

For bonus points, suppose we make it so that when you click the link,
it takes you to a box where you can type in a text comment that will
display in the app, explaining why your patch needs review: "Robert
says the wire protocol changes in this patch are wrong, but I think
he's full of hot garbage and want a second opinion!" (a purely
hypothetical example, I'm sure) If you put in a comment like this when
you register your patch for the CommitFest, it gets a sparkly gold
border and floats to the top of the list, or we mail you a Kit-Kat
bar, or something. I don't know.


Dunno about having to click a link or sparkly gold borders, but +1 on 
having a free-form text box for notes like that. Things like "cfbot says 
this has bitrotted" or "Will review this after other patch this depends 
on". On the mailing list, notes like that are both noisy and easily lost 
in the threads. But as a little free-form text box on the commitfest, it 
would be handy.


One risk is that if we start to rely too much on that, or on the other 
fields in the commitfest app for that matter, we de-value the mailing 
list archives. I'm not too worried about it, the idea is that the 
summary box just summarizes what's already been said on the mailing 
list, or is transient information like "I'll get to this tomorrow" 
that's not interesting to archive.



The point is - we need a much better signal to noise ratio here. I bet
the number of patches in the CommitFest that actually need review is
something like 25% of the total. The rest are things that are just
parked there by a committer, or that the author doesn't care about
right now, or that are already being actively discussed, or where
there's not a clear way forward. We could create new statuses for all
of those states - "Parked", "In Hibernation," "Under Discussion," and
"Unclear" - but I think that's missing the point. What we really want
is to not see that stuff in the first place. It's a CommitFest, not
once-upon-a-time-I-wrote-a-patch-Fest.


Yeah, I'm also skeptical of adding new categories or statuses to the 
commitfest.


I sometimes add patches to the commitfest that are not ready to be 
committed, because I want review on the general idea or approach, before 
polishing the patch to final state. That's also a fine use of the 
commitfest app. The expected workflow is to get some review on the 
patch, and then move it back to Waiting on Author.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Heikki Linnakangas

On 17/05/2024 08:05, Peter Eisentraut wrote:

On 17.05.24 04:26, Robert Haas wrote:

I do *emphatically* think we need a parking lot.


People seem to like this idea; I'm not quite sure I follow it.  If you
just want the automated patch testing, you can do that on your own by
setting up github/cirrus for your own account.  If you keep emailing the
public your patches just because you don't want to set up your private
testing tooling, that seems a bit abusive?


Agreed. Also, if you do want to park a patch in the commitfest, setting 
it to "Waiting on Author" is effectively that.


I used to add patches to the commitfest to run CFBot on them, but some 
time back I bit the bullet and set up github/cirrus to run on my own 
github repository. I highly recommend that. It only takes a few clicks, 
and the user experience is much better: push a branch to my own github 
repository, and cirrus CI runs automatically.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-05-16 Thread Heikki Linnakangas

On 16/05/2024 17:08, Daniel Gustafsson wrote:

On 16 May 2024, at 15:54, Robert Haas  wrote:

On Wed, May 15, 2024 at 9:33 AM Heikki Linnakangas  wrote:

Ok, yeah, I can see that now. Here's a new version to address that. I
merged ENC_SSL_NEGOTIATED_SSL and ENC_SSL_DIRECT_SSL to a single method,
ENC_SSL. The places that need to distinguish between them now check
conn->sslnegotiation. That seems more clear now that there is no fallback.


Unless there is a compelling reason to do otherwise, we should
expedite getting this committed so that it is included in beta1.
Release freeze begins Saturday.


+1. Having reread the thread and patch I think we should go for this one.


Yep, committed. Thanks everyone!

On 15/05/2024 21:24, Jacob Champion wrote:

This assertion seems a little strange to me:


  if (conn->sslnegotiation[0] == 'p')
  {
  ProtocolVersion pv;

  Assert(conn->sslnegotiation[0] == 'p');


But other than that nitpick, nothing else jumps out at me at the moment.


Fixed that. It was a leftover; I had the if-else conditions the other 
way round at one point during development.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-05-15 Thread Heikki Linnakangas

On 14/05/2024 01:29, Jacob Champion wrote:

Definitely not a major problem, but I think
select_next_encryption_method() has gone stale, since it originally
provided generality and lines of fallback that no longer exist. In
other words, I think the following code is now misleading:


if (conn->sslmode[0] == 'a')
 SELECT_NEXT_METHOD(ENC_PLAINTEXT);

SELECT_NEXT_METHOD(ENC_NEGOTIATED_SSL);
SELECT_NEXT_METHOD(ENC_DIRECT_SSL);

if (conn->sslmode[0] != 'a')
 SELECT_NEXT_METHOD(ENC_PLAINTEXT);


To me, that implies that negotiated mode takes precedence over direct,
but the point of the patch is that it's not possible to have both. And
if direct SSL is in use, then sslmode can't be "allow" anyway, and we
definitely don't want ENC_PLAINTEXT.

So if someone proposes a change to select_next_encryption_method(),
you'll have to remember to stare at init_allowed_encryption_methods()
as well, and think really hard about what's going on. And vice-versa.
That worries me.


Ok, yeah, I can see that now. Here's a new version to address that. I 
merged ENC_SSL_NEGOTIATED_SSL and ENC_SSL_DIRECT_SSL to a single method, 
ENC_SSL. The places that need to distinguish between them now check 
conn->sslnegotiation. That seems more clear now that there is no fallback.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 7a2bc2ede5ba7bef147e509ce3c4d5472c8e0247 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 15 May 2024 16:27:51 +0300
Subject: [PATCH v2 1/1] Remove option to fall back from direct to postgres SSL
 negotiation

There were three problems with the sslnegotiation options:

1. The sslmode=prefer and sslnegotiation=requiredirect combination was
somewhat dangerous, as you might unintentionally fall back to
plaintext authentication when connecting to a pre-v17 server.

2. There was an asymmetry between 'postgres' and 'direct'
options. 'postgres' meant "try only traditional negotiation", while
'direct' meant "try direct first, and fall back to traditional
negotiation if it fails". That was apparent only if you knew that the
'requiredirect' mode also exists.

3. The "require" word in 'requiredirect' suggests that it's somehow
more strict or more secure, similar to sslmode. However, I don't
consider direct SSL connections to be a security feature.

To address these problems:

- Only allow sslnegotiation='direct' if sslmode='require' or
stronger. And for the record, Jacob and Robert felt that we should do
that (or have sslnegotiation='direct' imply sslmode='require') anyway,
regardless of the first issue.

- Remove the 'direct' mode that falls back to traditional negotiation,
and rename what was called 'requiredirect' to 'direct' instead. In
other words, there is no "try both methods" option anymore, 'postgres'
now means the traditional negotiation and 'direct' means a direct SSL
connection.

Reviewed-by: Jelte Fennema-Nio, Robert Haas, Jacob Champion
Discussion: https://www.postgresql.org/message-id/d3b1608a-a1b6-4eda-9ec5-ddb3e4375808%40iki.fi
---
 doc/src/sgml/libpq.sgml   |  49 ++--
 src/interfaces/libpq/fe-connect.c | 144 +-
 src/interfaces/libpq/fe-secure-openssl.c  |   2 +-
 src/interfaces/libpq/libpq-int.h  |   6 +-
 .../libpq/t/005_negotiate_encryption.pl   | 254 --
 5 files changed, 202 insertions(+), 253 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 1d32c226d8..b32e497b1b 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1772,15 +1772,18 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   sslnegotiation
   

-This option controls whether PostgreSQL
-will perform its protocol negotiation to request encryption from the
-server or will just directly make a standard SSL
-connection.  Traditional PostgreSQL
-protocol negotiation is the default and the most flexible with
-different server configurations. If the server is known to support
-direct SSL connections then the latter requires one
-fewer round trip reducing connection latency and also allows the use
-of protocol agnostic SSL network tools.
+This option controls how SSL encryption is negotiated with the server,
+if SSL is used. In the default postgres mode, the
+client first asks the server if SSL is supported. In
+direct mode, the client starts the standard SSL
+handshake directly after establishing the TCP/IP connection. Traditional
+PostgreSQL protocol negotiation is the most
+flexible with different server configurations. If the server is known
+to support direct SSL connections then the latter
+requires one fewer round trip reducing connection latency and also
+allows the use of protocol agnostic SSL network tools. The direct SSL
+option was introduced in PostgreSQL versio

Re: I have an exporting need...

2024-05-14 Thread Heikki Linnakangas

On 13/05/2024 16:01, Juan Hernández wrote:

Hi team!

First, i want to thank you for having your hands in this. You are doing 
a fantastic and blessing job. Bless to you all!


I have a special need I want to comment to you. This is not a bug; it is a 
need I have, and I write here to be redirected where needed.


I have to make a daily backup. The database is growing a lot per day, 
and sometimes I've needed to recover just a single table. It would be 
easier to move a 200M file with only the needed table instead of moving a 
5G file with all the tables I don't need; it's just a matter of speed.


I've created a script to export every table one by one, so in case I 
need to import a table again, I don't have to use the very big 
export file, but rather the "tablename.sql" file created for every table.


My hosting provider truncated my script because it is very large (more than 
200 lines, each line exporting one table), so I think the way I do this 
is hurting the server performance.


Some ideas for you to explore:

- Use "pg_dump -Fcustom" format. That still creates one large file, but 
you can then use "pg_restore --table=foobar" to extract a .sql file for 
single table from that when restoring.


- "pg_dump -Fdirectory" format does actually create one file per table. 
It's in pg_dump's internal format though, so you'll still need to use 
pg_restore to make sense of it.


- Use rsync to copy just the changed parts between two dumps.
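
A minimal sketch of the first two suggestions as commands (assuming a 
database named "mydb" and a table named "foobar"; adjust names and 
connection options to your setup, and note these need a running server):

```shell
# Dump the whole database in pg_dump's custom format.
pg_dump -Fcustom -f mydb.dump mydb

# Later, extract the SQL for just one table from that dump.
pg_restore --table=foobar -f foobar.sql mydb.dump

# Or dump in directory format: one (internal-format) file per table.
pg_dump -Fdirectory -f mydb_dump_dir mydb
```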


Then my question.

Do you consider it useful to add a parameter (for example, 
--separatetables) so that, when used, the export process creates a 
separate tablename.sql file for each table in the database automatically?


It'd be tricky to restore from, as you need to restore the tables in the 
right order. I think you'd still need a "main" sql file that includes 
all the other files in the right order. And using the table names as 
filenames gets tricky if the table names contain any funny characters.


For manual operations, yeah, I can see it being useful nevertheless.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Heikki Linnakangas

On 13/05/2024 16:55, Jelte Fennema-Nio wrote:

On Mon, 13 May 2024 at 15:38, Heikki Linnakangas  wrote:

Here's a patch to implement that.


+   if (conn->sslnegotiation[0] == 'd' &&
+   conn->sslmode[0] != 'r' && conn->sslmode[0] != 'v')

I think these checks should use strcmp instead of checking magic first
characters. I see this same clever trick is used in the recently added
init_allowed_encryption_methods, and I think that should be changed to
use strcmp too for readability.


Oh yeah, I hate that too. These should be refactored into enums, with a 
clear separate stage of parsing the options from strings. But we use 
that pattern all over the place, so I didn't want to start reforming it 
with this patch.
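
As a hedged sketch of what that refactoring could look like (the enum and 
function names here are hypothetical illustrations, not libpq's actual API):

```c
#include <string.h>

/* Hypothetical sketch: parse the sslnegotiation option once, up front,
 * so later code compares enum values instead of magic first characters. */
typedef enum
{
	SSLNEG_POSTGRES,			/* traditional SSLRequest negotiation */
	SSLNEG_DIRECT,				/* direct SSL handshake */
	SSLNEG_INVALID				/* unrecognized option value */
} SSLNegotiation;

static SSLNegotiation
parse_sslnegotiation(const char *value)
{
	if (strcmp(value, "postgres") == 0)
		return SSLNEG_POSTGRES;
	if (strcmp(value, "direct") == 0)
		return SSLNEG_DIRECT;
	return SSLNEG_INVALID;
}
```

With a separate parsing stage like this, a later check such as 
`conn->sslnegotiation == SSLNEG_DIRECT` is self-documenting, and a typo in 
an option value fails loudly at parse time instead of silently matching a 
first character.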


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Heikki Linnakangas

On 10/05/2024 16:50, Heikki Linnakangas wrote:

New proposal:

- Remove the "try both" mode completely, and rename "requiredirect" to
just "direct". So there would be just two modes: "postgres" and
"direct". On reflection, the automatic fallback mode doesn't seem very
useful. It would make sense as the default, because then you would get
the benefits automatically in most cases but still be compatible with
old servers. But if it's not the default, you have to fiddle with libpq
settings anyway to enable it, and then you might as well use the
"requiredirect" mode when you know the server supports it. There isn't
anything wrong with it as such, but given how much confusion there's
been on how this all works, I'd prefer to cut this back to the bare
minimum now. We can add it back in the future, and perhaps make it the
default at the same time. This addresses points 2. and 3. above.

and:

- Only allow sslnegotiation=direct with sslmode=require or higher. This
is what you, Jacob, wanted to do all along, and addresses point 1.


Here's a patch to implement that.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 2a7bde8502966a634b62d4109f0cde5fdc685da2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Mon, 13 May 2024 16:36:54 +0300
Subject: [PATCH 1/1] Remove option to fall back from direct to postgres SSL
 negotiation

There were three problems with the sslnegotiation options:

1. The sslmode=prefer and sslnegotiation=requiredirect combination was
somewhat dangerous, as you might unintentionally fall back to
plaintext authentication when connecting to a pre-v17 server.

2. There was an asymmetry between 'postgres' and 'direct'
options. 'postgres' meant "try only traditional negotiation", while
'direct' meant "try direct first, and fall back to traditional
negotiation if it fails". That was apparent only if you knew that the
'requiredirect' mode also exists.

3. The "require" word in 'requiredirect' suggests that it's somehow
more strict or more secure, similar to sslmode. However, I don't
consider direct SSL connections to be a security feature.

To address these problems:

- Only allow sslnegotiation='direct' if sslmode='require' or
stronger. And for the record, Jacob and Robert felt that we should do
that (or have sslnegotiation='direct' imply sslmode='require') anyway,
regardless of the first issue.

- Remove the 'direct' mode that falls back to traditional negotiation,
and rename what was called 'requiredirect' to 'direct' instead. In
other words, there is no "try both methods" option anymore, 'postgres'
now means the traditional negotiation and 'direct' means a direct SSL
connection.

Discussion: https://www.postgresql.org/message-id/d3b1608a-a1b6-4eda-9ec5-ddb3e4375808%40iki.fi
---
 doc/src/sgml/libpq.sgml   |  49 ++--
 src/interfaces/libpq/fe-connect.c |  52 ++--
 src/interfaces/libpq/libpq-int.h  |   3 +-
 .../libpq/t/005_negotiate_encryption.pl   | 254 --
 4 files changed, 150 insertions(+), 208 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 1d32c226d8..b32e497b1b 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1772,15 +1772,18 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   sslnegotiation
   

-This option controls whether PostgreSQL
-will perform its protocol negotiation to request encryption from the
-server or will just directly make a standard SSL
-connection.  Traditional PostgreSQL
-protocol negotiation is the default and the most flexible with
-different server configurations. If the server is known to support
-direct SSL connections then the latter requires one
-fewer round trip reducing connection latency and also allows the use
-of protocol agnostic SSL network tools.
+This option controls how SSL encryption is negotiated with the server,
+if SSL is used. In the default postgres mode, the
+client first asks the server if SSL is supported. In
+direct mode, the client starts the standard SSL
+handshake directly after establishing the TCP/IP connection. Traditional
+PostgreSQL protocol negotiation is the most
+flexible with different server configurations. If the server is known
+to support direct SSL connections then the latter
+requires one fewer round trip reducing connection latency and also
+allows the use of protocol agnostic SSL network tools. The direct SSL
+option was introduced in PostgreSQL version
+17.

 
 
@@ -1798,32 +1801,14 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   direct
   

- first attempt to establish a standard SSL connection and if that
- fails reconnect and perform the neg

Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Heikki Linnakangas

On 13/05/2024 12:50, Jelte Fennema-Nio wrote:

On Sun, 12 May 2024 at 23:39, Heikki Linnakangas  wrote:

In v18, I'd like to make sslmode=require the default. Or maybe introduce
a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we
want to encourage encryption, that's the right way to do it. (I'd still
recommend everyone to use an explicit sslmode=require in their
connection strings for many years, though, because you might be using an
older client without realizing it.)


I'm definitely a huge proponent of making the connection defaults more
secure. But as described above sslmode=require is still extremely
insecure and I don't think we gain anything substantial by making it
the default. I think the only useful secure default would be to use
sslmode=verify-full (with probably some automatic fallback to
sslmode=prefer when connecting to hardcoded IPs or localhost). Which
probably means that sslrootcert=system should also be made the
default. Which would mean that ~/.postgresql/root.crt would not be the
default anymore, which I personally think is fine but others likely
disagree.


"channel_binding=require sslmode=require" also protects from MITM attacks.

I think these options should be designed from the user's point of view, 
so that the user can specify the risks they're willing to accept, and 
the details of how that's accomplished are handled by libpq. For 
example, I'm OK with (tick all that apply):


[ ] passive eavesdroppers seeing all the traffic
[ ] MITM being able to hijack the session
[ ] connecting without verifying the server's identity
[ ] divulging the plaintext password to the server
[ ] ...

The requirements for whether SSL or GSS encryption is required, whether 
the server's certificate needs to signed with known CA, etc. can be 
derived from those. For example, if you need protection from 
eavesdroppers, SSL or GSS encryption must be used. If you need to verify 
the server's identity, it implies sslmode=verify-CA or 
channel_binding=true. If you don't want to divulge the password, it 
implies a suitable require_auth setting. I don't have a concrete 
proposal yet, but something like that. And the defaults for those are up 
for debate.


psql could perhaps help by listing the above properties at the beginning 
of the session, something like:


psql (16.2)
WARNING: Connection is not encrypted.
WARNING: The server's identity has not been verified
Type "help" for help.

postgres=#

Although for the "divulge plaintext password to server" property, it's 
too late to print a warning after connecting, because the damage has 
already been done.


A different line of thought is that to keep the attack surface as small 
as possible, you should specify very explicitly what exactly you expect 
to happen in the authentication, and disallow any variance. For example, 
you expect SCRAM to be used, with a certificate signed by a particular 
CA, and if the server requests anything else, that's suspicious and the 
connection is aborted. We should make that possible too, but the above 
flexible risk-based approach seems good for the defaults.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-05-12 Thread Heikki Linnakangas

On 11/05/2024 23:45, Jelte Fennema-Nio wrote:

On Fri, 10 May 2024 at 15:50, Heikki Linnakangas  wrote:

New proposal:

- Remove the "try both" mode completely, and rename "requiredirect" to
just "direct". So there would be just two modes: "postgres" and
"direct". On reflection, the automatic fallback mode doesn't seem very
useful. It would make sense as the default, because then you would get
the benefits automatically in most cases but still be compatible with
old servers. But if it's not the default, you have to fiddle with libpq
settings anyway to enable it, and then you might as well use the
"requiredirect" mode when you know the server supports it. There isn't
anything wrong with it as such, but given how much confusion there's
been on how this all works, I'd prefer to cut this back to the bare
minimum now. We can add it back in the future, and perhaps make it the
default at the same time. This addresses points 2. and 3. above.

and:

- Only allow sslnegotiation=direct with sslmode=require or higher. This
is what you, Jacob, wanted to do all along, and addresses point 1.

Thoughts?


Sounds mostly good to me. But I think we'd want to automatically
increase sslmode to require if it is unset, but sslnegotation is set
to direct. Similar to how we bump sslmode to verify-full if
sslrootcert is set to system, but sslmode is unset. i.e. it seems
unnecessary/unwanted to throw an error if the connection string only
contains sslnegotiation=direct


I find that error-prone. For example:

1. Try to connect to a server with direct negotiation: psql "host=foobar 
dbname=mydb sslnegotiation=direct"


2. It fails. Maybe it was an old server? Let's change it to 
sslnegotiation=postgres.


3. Now it succeeds. Great!

You might miss that by changing sslnegotiation to 'postgres', or by 
removing it altogether, you not only made it compatible with older 
server versions, but you also allowed falling back to a plaintext 
connection. Maybe you're fine with that, but maybe not. I'd like to 
nudge people to use sslmode=require, not rely on implicit stuff like 
this just to make connection strings a little shorter.


I'm not a fan of sslrootcert=system implying sslmode=verify-full either, 
for the same reasons. But at least "sslrootcert" is a clearly 
security-related setting, so removing it might give you pause, whereas 
sslnegotiation is about performance and compatibility.


In v18, I'd like to make sslmode=require the default. Or maybe introduce 
a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we 
want to encourage encryption, that's the right way to do it. (I'd still 
recommend everyone to use an explicit sslmode=require in their 
connection strings for many years, though, because you might be using an 
older client without realizing it.)


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: open items

2024-05-11 Thread Heikki Linnakangas

On 09/05/2024 22:39, Robert Haas wrote:

On Thu, May 9, 2024 at 3:38 PM Dagfinn Ilmari Mannsåker
 wrote:

Robert Haas  writes:

* Register ALPN protocol id with IANA. From the mailing list thread,
it is abundantly clear that IANA is in no hurry to finish dealing with
what seems to be a completely pro forma request from our end. I think
we just have to be patient.


This appears to have been approved without anyone mentioning it on the
list, and the registry now lists "postgresql" at the bottom:

https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml#alpn-protocol-ids


Nice, thanks!


Committed the change from "TBD-pgsql" to "postgresql", thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-05-10 Thread Heikki Linnakangas

On 29/04/2024 22:32, Jacob Champion wrote:

On Mon, Apr 29, 2024 at 12:06 PM Heikki Linnakangas  wrote:

There is a small benefit with sslmode=prefer if you connect to a server
that doesn't support SSL, though. With sslnegotiation=direct, if the
server rejects the direct SSL connection, the client will reconnect and
try SSL with SSLRequest. The server will respond with 'N', and the
client will proceed without encryption. sslnegotiation=directonly
removes that SSLRequest attempt, eliminating one roundtrip.


Okay, agreed that in this case, there is a performance benefit. It's
not enough to convince me, honestly, but are there any other cases I
missed as well?


I realized one case that hasn't been discussed so far: If you use the 
combination of "sslmode=prefer sslnegotiation=requiredirect" to connect 
to a pre-v17 server that has SSL enabled but does not support direct SSL 
connections, you will fall back to a plaintext connection instead. 
That's almost certainly not what you wanted. I'm coming around to your 
opinion that we should not allow that combination.


Stepping back to summarize my thoughts, there are now three things I 
don't like about the status quo:


1. As noted above, the sslmode=prefer and sslnegotiation=requiredirect 
combination is somewhat dangerous, as you might unintentionally fall 
back to plaintext authentication when connecting to a pre-v17 server.


2. There is an asymmetry between "postgres" and "direct"
option names. "postgres" means "try only traditional negotiation", while
"direct" means "try direct first, and fall back to traditional
negotiation if it fails". That is apparent only if you know that the
"requiredirect" mode also exists.

3. The "require" word in "requiredirect" suggests that it's somehow
more strict or more secure, similar to sslmode=require. However, I don't 
consider direct SSL connections to be a security feature.



New proposal:

- Remove the "try both" mode completely, and rename "requiredirect" to 
just "direct". So there would be just two modes: "postgres" and 
"direct". On reflection, the automatic fallback mode doesn't seem very 
useful. It would make sense as the default, because then you would get 
the benefits automatically in most cases but still be compatible with 
old servers. But if it's not the default, you have to fiddle with libpq 
settings anyway to enable it, and then you might as well use the 
"requiredirect" mode when you know the server supports it. There isn't 
anything wrong with it as such, but given how much confusion there's 
been on how this all works, I'd prefer to cut this back to the bare 
minimum now. We can add it back in the future, and perhaps make it the 
default at the same time. This addresses points 2. and 3. above.


and:

- Only allow sslnegotiation=direct with sslmode=require or higher. This 
is what you, Jacob, wanted to do all along, and addresses point 1.


Thoughts?

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: consider -Wmissing-variable-declarations

2024-05-10 Thread Heikki Linnakangas

On 09/05/2024 12:23, Peter Eisentraut wrote:

In [0] I had noticed that we have no automated verification that global
variables are declared in header files.  (For global functions, we have
this through -Wmissing-prototypes.)  As I mentioned there, I discovered
the Clang compiler option -Wmissing-variable-declarations, which does
exactly that.  Clang has supported this for quite some time, and GCC 14,
which was released a few days ago, now also supports it.  I went and
installed this option into the standard build flags and cleaned up the
warnings it found, which revealed a number of interesting things.


Nice! More checks like this are good in general.


Attached are patches organized by sub-topic.  The most dubious stuff is
in patches 0006 and 0007.  A bunch of GUC-related variables are not in
header files but are pulled in via ad-hoc extern declarations.  I can't
recognize an intentional scheme there, probably just done for
convenience or copied from previous practice.  These should be organized
into appropriate header files.


+1 for moving all these to header files. Also all the "other stuff" in 
patch 0007.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Set appropriate processing mode for auxiliary processes.

2024-05-09 Thread Heikki Linnakangas

On 09/05/2024 16:12, Xing Guo wrote:

Hi hackers,

After several refactoring iterations, auxiliary processes are no
longer initialized from the bootstrapper. I think using the
InitProcessing mode for initializing auxiliary processes is more
appropriate.


At first I was sure this was introduced by my refactorings in v17, but 
in fact it's been like this forever. I agree that InitProcessing makes 
much more sense. The ProcessingMode variable is initialized to 
InitProcessing, so I think we can simply remove that line from 
AuxiliaryProcessMainCommon(). There are existing 
"SetProcessingMode(InitProcessing)" calls in other Main functions too 
(AutoVacLauncherMain, BackgroundWorkerMain, etc.), and I think those can 
also be removed.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-04-29 Thread Heikki Linnakangas

On 29/04/2024 21:43, Jacob Champion wrote:

On Mon, Apr 29, 2024 at 1:38 AM Heikki Linnakangas  wrote:

If you only
have v17 servers in your environment, so you know all servers support
direct negotiation if they support SSL at all, but a mix of servers with
and without SSL, sslnegotiation=directonly would reduce roundtrips with
sslmode=prefer.


But if you're in that situation, what does the use of directonly give
you over `sslnegotiation=direct`? You already know that servers
support direct, so there's no additional performance penalty from the
less strict mode.


Well, by that argument we don't need requiredirect/directonly at all. 
This goes back to whether it's a security feature or a performance feature.


There is a small benefit with sslmode=prefer if you connect to a server 
that doesn't support SSL, though. With sslnegotiation=direct, if the 
server rejects the direct SSL connection, the client will reconnect and 
try SSL with SSLRequest. The server will respond with 'N', and the 
client will proceed without encryption. sslnegotiation=directonly 
removes that SSLRequest attempt, eliminating one roundtrip.
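
The fallback sequence described above, written out as a sketch (purely 
illustrative; the real logic lives in libpq's connection state machine):

```c
#include <string.h>

/* Illustrative sketch of the attempt sequence with sslmode=prefer
 * against a server that has SSL disabled.  Not real libpq code. */
static int
attempts_with_prefer(const char *sslnegotiation, const char *out[3])
{
	int			n = 0;

	if (strcmp(sslnegotiation, "direct") == 0 ||
		strcmp(sslnegotiation, "directonly") == 0)
		out[n++] = "direct SSL handshake";	/* rejected by this server */
	if (strcmp(sslnegotiation, "directonly") != 0)
		out[n++] = "SSLRequest";	/* server answers 'N' */
	out[n++] = "plaintext";		/* sslmode=prefer falls back */
	return n;
}
```

So "direct" makes three attempts against such a server, while "directonly" 
skips the middle SSLRequest round trip and makes two.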



Making requiredirect to imply sslmode=require, or error out unless you
also set sslmode=require, feels like a cavalier way of forcing SSL. We
should have a serious discussion on making sslmode=require the default
instead. That would be a more direct way of nudging people to use SSL.
It would cause a lot of breakage, but it would also be a big improvement
to security.

Consider how sslnegotiation=requiredirect/directonly would feel, if we
made sslmode=require the default. If you explicitly set "sslmode=prefer"
or "sslmode=disable", it would be annoying if you would also need to
remove "sslnegotiation=requiredirect" from your connection string.


That's similar to how sslrootcert=system already works. To me, it
feels great, because I don't have to worry about nonsensical
combinations (with the exception of GSS, which we've touched on
above). libpq complains loudly if I try to shoot myself in the foot,
and if I'm using sslrootcert=system then it's a pretty clear signal
that I care more about security than the temporary inconvenience of
editing my connection string for one weird server that doesn't use SSL
for some reason.


Oh I was not aware sslrootcert=system works like that. That's a bit 
surprising: none of the other ssl-related settings imply or require that 
SSL is actually used. Did we intend to set a precedent for new settings 
with that?


(adding Daniel in case he has an opinion)

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-04-29 Thread Heikki Linnakangas

On 29/04/2024 21:04, Jacob Champion wrote:

On Fri, Apr 26, 2024 at 3:51 PM Heikki Linnakangas  wrote:

I finally understood what you mean. So if the client supports ALPN, but
the list of protocols that it provides does not include 'postgresql',
the server should reject the connection with a 'no_application_protocol'
alert.


Right. (And additionally, we reject clients that don't advertise ALPN
over direct SSL, also during the TLS handshake.)


The attached patch makes that change. I used the alpn_cb() function in
openssl's own s_server program as example for that.


This patch as written will apply the new requirement to the old
negotiation style, though, won't it? My test suite sees a bunch of
failures with that.


Yes, and that is what we want, right? If the client uses the old negotiation 
style, and includes ALPN in its ClientHello, but requests protocol 
"noodles" instead of "postgresql", it seems good to reject the connection.


Note that if the client does not request ALPN at all, the callback is 
not called, and the connection is accepted. Old clients still work 
because they do not request ALPN.
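
For illustration, the ALPN extension carries a list of length-prefixed 
protocol names; a server-side check along these lines would accept a client 
offering "postgresql" and reject one offering only "noodles" (a standalone 
sketch of the matching, not the actual callback code):

```c
#include <string.h>

/* Sketch: scan an ALPN protocol list (wire format: one length byte
 * followed by that many name bytes, repeated) for "postgresql". */
static int
alpn_offers_postgresql(const unsigned char *in, unsigned int inlen)
{
	unsigned int i = 0;

	while (i < inlen)
	{
		unsigned int len = in[i];

		if (i + 1 + len > inlen)
			return 0;			/* malformed list */
		if (len == 10 && memcmp(in + i + 1, "postgresql", 10) == 0)
			return 1;
		i += 1 + len;
	}
	return 0;					/* no match: reject with no_application_protocol */
}
```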



Unfortunately the error message you got in the client with that was
horrible (I modified the server to not accept the 'postgresql' protocol):

psql "dbname=postgres sslmode=require host=localhost"
psql: error: connection to server at "localhost" (::1), port 5432
failed: SSL error: SSL error code 167773280




I filed a bug upstream [1].


Thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection and ALPN loose ends

2024-04-29 Thread Heikki Linnakangas

On 29/04/2024 21:06, Ranier Vilela wrote:
On Mon, Apr 29, 2024 at 14:56, Heikki Linnakangas wrote:


On 29/04/2024 20:10, Ranier Vilela wrote:
 > Hi,
 >
 > With TLS 1.3 and others there is possibly a security flaw using
ALPN [1].
 >
 > It seems to me that the ALPN protocol can be bypassed if the
client does
 > not correctly inform the ClientHello header.
 >
 > So, the suggestion is to check the ClientHello header in the
server and
 > terminate the TLS handshake early.

Sounds to me like it's working as designed. ALPN in general is
optional;
if the client doesn't request it, then you proceed without it. We do
require ALPN for direct SSL connections though. We can, because direct
SSL connections is a new feature in Postgres. But we cannot require it
for the connections negotiated with SSLRequest, or we break
compatibility with old clients that don't use ALPN.

Ok.
But what if I have a server configured for TLS 1.3 and that requires 
ALPN to allow access?

What about a client configured without ALPN requiring connection?


Sorry, I don't understand the questions. What about them?

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection and ALPN loose ends

2024-04-29 Thread Heikki Linnakangas

On 29/04/2024 20:10, Ranier Vilela wrote:

Hi,

With TLS 1.3 and others there is possibly a security flaw using ALPN [1].

It seems to me that the ALPN protocol can be bypassed if the client does 
not correctly inform the ClientHello header.


So, the suggestion is to check the ClientHello header in the server and
terminate the TLS handshake early.


Sounds to me like it's working as designed. ALPN in general is optional; 
if the client doesn't request it, then you proceed without it. We do 
require ALPN for direct SSL connections though. We can, because direct 
SSL connections is a new feature in Postgres. But we cannot require it 
for the connections negotiated with SSLRequest, or we break 
compatibility with old clients that don't use ALPN.


There is a check in direct SSL mode that ALPN was used 
(ProcessSSLStartup in backend_startup.c):



if (!port->alpn_used)
{
	ereport(COMMERROR,
			(errcode(ERRCODE_PROTOCOL_VIOLATION),
			 errmsg("received direct SSL connection request without ALPN protocol negotiation extension")));
	goto reject;
}


That happens immediately after the SSL connection has been established.

Hmm. I guess it would be better to abort the connection earlier, without 
completing the TLS handshake. Otherwise the client might send the first 
message in the wrong protocol to the PostgreSQL server. That's not a 
security issue for the PostgreSQL server: the server disconnects without 
reading the message. And I don't see any way for an ALPACA attack when 
the server ignores the client's message. Nevertheless, from the point of 
view of keeping the attack surface as small as possible, aborting 
earlier seems better.


--
Heikki Linnakangas
Neon (https://neon.tech)





Direct SSL connection and ALPN loose ends

2024-04-29 Thread Heikki Linnakangas
There's been a bunch of bugs, and discussion on the intended behavior of 
sslnegotiation and ALPN. This email summarizes the current status:


## Status and loose ends for beta1

All reported bugs have now been fixed. We now enforce ALPN in all the 
right places. Please let me know if I missed something.


There are two open items remaining that I intend to address in the next 
few days, before beta1:


- I am going to rename sslnegotiation=requiredirect to 
sslnegotiation=directonly. I acknowledge that there is still some debate 
on this: Jacob (and Robert?) would prefer to change the behavior 
instead, so that sslnegotiation=requiredirect would also imply or 
require sslmode=require, while IMHO the settings should be orthogonal so 
that sslmode controls whether SSL is used or not, and sslnegotiation 
controls how the SSL layer is negotiated when SSL is used. Given that 
they are orthogonal, "directonly" is a better name. I will also take 
another look at the documentation, if it needs clarification on that 
point. If you have more comments on whether this is a good idea or not 
or how sslnegotiation should work, please reply on the other thread, 
let's keep this one focused on the overall status. [1]


- The registration of the ALPN name with IANA hasn't been finished yet 
[2]. I originally requested the name "pgsql", but after Peter's comment, 
I changed the request to "postgresql". The string we have in 'master' is 
currently "TBD-pgsql". I'm very confident that the registration will go 
through with "postgresql", so my plan is to commit that change before 
beta1, even if the IANA process hasn't completed by then.


## V18 material

- Add an option to disable traditional SSL negotiation in the server. 
There was discussion on doing this via HBA rules or as a global option, 
and the consensus seems to be for a global option. This would be just to 
reduce the attack surface; there are no known vulnerabilities or other 
issues with the traditional negotiation. And maybe to help with testing. [3]


These are not directly related to sslnegotiation, but came up in the 
discussion:


- Clarify the situation with sslmode=require and gssencmode=require 
combination, by replacing sslmode and gssencmode options with a single 
"encryption=[ssl|gss|none], [...]" option. [4]


- Make sslmode=require the default. This is orthogonal to the SSL 
negotiation, but I think the root cause for the disagreements on 
sslnegotiation is actually that we'd like SSL to be the default. [5]


The details of these need to be hashed out, in particular the 
backwards-compatibility and migration aspects, but the consensus seems 
to be that it's the right direction.


## V19 and beyond

In the future, once v17 is ubiquitous and the ecosystem (pgbouncer etc) 
has added direct SSL support, we can change the default sslnegotiation 
from 'postgres' to 'direct'. I'm thinking 3-5 years from now. In the 
more distant future, we could remove the traditional SSLRequest 
negotiation altogether and always use direct SSL negotiation.


There's no rush on these.

## Retrospective

There were a lot more cleanups required for this work than I expected, 
given how few changes there were to the patches between the January and 
March commitfests. I was mostly worried about the refactoring of the 
retry logic in libpq (and about the pre-existing logic too, to be 
honest; it was already complicated before these changes). That's why I 
added a lot more tests for that. However, I did not foresee all the ALPN 
related issues. In hindsight, it would have been good to commit most of 
the ALPN changes first, and with more tests. Jacob wrote a python test 
suite; I should have played with that more, as it could have 
demonstrated the ALPN issues earlier.


[1] 
https://www.postgresql.org/message-id/CA%2BTgmobV9JEk4AFy61Xw%2B2%2BcCTBqdTsDopkeB%2Bgb81kq3f-o6A%40mail.gmail.com


[2] 
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/


[3] 
https://www.postgresql.org/message-id/CA%2BTgmoaLpDVY2ywqQUfxvKEQZ%2Bnwkabcw_f%3Di4Zyivt9CLjcmA%40mail.gmail.com


[4] 
https://www.postgresql.org/message-id/3a6f126c-e1aa-4dcc-9252-9868308f6cf0%40iki.fi


[5] 
https://www.postgresql.org/message-id/CA%2BTgmoaNkRerEmB9JPgW0FhcJAe337AA%3D5kp6je9KekQhhRbmA%40mail.gmail.com


--
Heikki Linnakangas
Neon (https://neon.tech)




Re: Direct SSL connection with ALPN and HBA rules

2024-04-29 Thread Heikki Linnakangas

On 23/04/2024 10:07, Michael Paquier wrote:

In the documentation of PQsslAttribute(), it is mentioned that empty
string is returned for "alpn" if ALPN was not used, however the code
returns NULL in this case:
 SSL_get0_alpn_selected(conn->ssl, &data, &len);
 if (data == NULL || len == 0 || len > sizeof(alpn_str) - 1)
 return NULL;


Good catch. I changed the code to return an empty string, as the 
documentation says.


I considered whether NULL or an empty string would be better here. The 
docs for PQsslAttribute also say:


"Returns NULL if the connection does not use SSL or the specified 
attribute name is not defined for the library in use."


If a caller wants to distinguish between "libpq or the SSL library 
doesn't support ALPN at all" from "the server didn't support ALPN", you 
can tell from whether PQsslAttribute returns NULL or an empty string. So 
I think an empty string is better.
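
A caller making that three-way distinction might look like this sketch. 
The helper and its message strings are hypothetical; only the documented 
return values of PQsslAttribute (NULL, "", or a protocol name) are 
assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Interpret the value returned by PQsslAttribute(conn, "alpn").
 * Hypothetical helper; "val" stands in for the real return value.
 *   NULL  -> SSL/ALPN not supported by libpq or the SSL library
 *   ""    -> ALPN supported, but the server didn't negotiate it
 *   other -> the negotiated protocol name, e.g. "postgresql"
 */
static const char *
describe_alpn(const char *val)
{
	if (val == NULL)
		return "ALPN not supported by libpq or the SSL library";
	if (val[0] == '\0')
		return "server did not negotiate ALPN";
	return val;
}
```

In application code this would be called as 
describe_alpn(PQsslAttribute(conn, "alpn")).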


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-04-29 Thread Heikki Linnakangas

On 26/04/2024 02:23, Jacob Champion wrote:

On Thu, Apr 25, 2024 at 2:50 PM Heikki Linnakangas  wrote:

I think that comes down to the debate upthread, and whether you think
it's a performance tweak or a security feature. My take on it is,
`direct` mode is performance, and `requiredirect` is security.


Agreed, although the security benefits from `requiredirect` are
pretty vague. It reduces the attack surface, but there are no known
issues with the 'postgres' or 'direct' negotiation either.


I think reduction in attack surface is a concrete security benefit,
not a vague one. True, I don't know of any exploits today, but that
seems almost tautological -- if there were known exploits in our
upgrade handshake, I assume we'd be working to fix them ASAP?


Sure, we'd try to fix them ASAP. But we've had the SSLRequest 
negotiation since time immemorial. If a new vulnerability is found, it's 
unlikely that we'd need to disable the SSLRequest negotiation altogether 
to fix it. We'd be in serious trouble with back-branches in that case. 
There's no sudden need to have a kill-switch for it.


Taking that to the extreme, you could argue for a kill-switch for every 
feature, just in case there's a vulnerability in them. I agree that 
authentication is more sensitive, so reducing the surface of that is 
more reasonable. But the principle is the same nevertheless.


(This discussion is moot though, because we do have the 
sslnegotiation=requiredirect mode, so you can disable the SSLRequest 
negotiation.)



Perhaps 'requiredirect' should be renamed to 'directonly'?


If it's agreed that we don't want to require a stronger sslmode for
that sslnegotiation setting, then that would probably be an
improvement. But who is the target user for
`sslnegotiation=directonly`, in your opinion? Would they ever have a
reason to use a weak sslmode?


It's unlikely, I agree. A user who's sophisticated enough to use 
sslnegotiation=directonly would probably also want sslmode=require and 
require_auth=scram-sha256 and channel_binding=require. Or 
sslmode=verify-full. But in principle they're orthogonal. If you only 
have v17 servers in your environment, so you know all servers support 
direct negotiation if they support SSL at all, but a mix of servers with 
and without SSL, sslnegotiation=directonly would reduce roundtrips with 
sslmode=prefer.
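
For illustration, that combination might look like this in a connection 
string (note that "directonly" is only being proposed here; the option 
value that shipped in v17 is "requiredirect"):

```
# Hypothetical: skip the negotiation roundtrip against a fleet of v17
# servers with mixed SSL/non-SSL configurations, while still allowing
# plaintext fallback:
postgresql://host/db?sslmode=prefer&sslnegotiation=directonly
```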


Making requiredirect to imply sslmode=require, or error out unless you 
also set sslmode=require, feels like a cavalier way of forcing SSL. We 
should have a serious discussion on making sslmode=require the default 
instead. That would be a more direct way of nudging people to use SSL. 
It would cause a lot of breakage, but it would also be a big improvement 
to security.


Consider how sslnegotiation=requiredirect/directonly would feel if we 
made sslmode=require the default. If you explicitly set "sslmode=prefer" 
or "sslmode=disable", it would be annoying to also have to remove 
"sslnegotiation=requiredirect" from your connection string.


I'm leaning towards renaming sslnegotiation=requiredirect to 
sslnegotiation=directonly at this point.



I'm not sure about this either. The 'gssencmode' option is already
quite weird in that it seems to override the "require"d priority of
"sslmode=require", which it IMO really shouldn't.


Yeah, that combination is weird. I think we should forbid it. But that's
separate from sslnegotiation.


Separate but related, IMO. If we were all hypothetically okay with
gssencmode ignoring `sslmode=require`, then it's hard for me to claim
that `sslnegotiation=requiredirect` should behave differently. On the
other hand, if we're not okay with that and we'd like to change it,
it's easier for me to argue that `requiredirect` should also be
stricter from the get-go.


I think the best way forward for those is something like a new 
"require_proto" parameter that you suggested upthread. Perhaps call it 
"encryption", with options "none", "ssl", "gss" so that you can provide 
multiple options and libpq will try them in the order specified. For 
example:


encryption=none
encryption=ssl, none  # like sslmode=prefer
encryption=gss
encryption=gss, ssl   # try GSS first, then SSL
encryption=ssl, gss   # try SSL first, then GSS

This would make gssencmode and sslmode=disable/allow/prefer/require 
settings obsolete. sslmode would still be needed to distinguish between 
verify-ca/verify-full though. But sslnegotiation would still be 
orthogonal to that.
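
As a sketch of how such a comma-separated list could be handled on the 
client side (the "encryption" option name and its values are just the 
proposal above; nothing like this exists in libpq, and real libpq option 
parsing works differently):

```c
#include <assert.h>
#include <string.h>

#define MAX_ENC_METHODS 3

/*
 * Parse a hypothetical "encryption=gss, ssl" style option value into an
 * ordered list of encryption methods to try.  Purely illustrative.
 */
static int
parse_encryption(const char *value, const char *methods[MAX_ENC_METHODS])
{
	static char buf[64];		/* returned tokens point into this buffer */
	int			n = 0;

	strncpy(buf, value, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	for (char *tok = strtok(buf, ", ");
		 tok != NULL && n < MAX_ENC_METHODS;
		 tok = strtok(NULL, ", "))
		methods[n++] = tok;
	return n;					/* try methods[0..n-1] in this order */
}
```

With "gss, ssl" this yields {"gss", "ssl"}, matching the try-GSS-first 
example in the list above.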


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2024-04-28 Thread Heikki Linnakangas

On 27/04/2024 11:27, Anton A. Melnikov wrote:

Hello!

Maybe add PGDLLIMPORT to
extern bool LoadedSSL;
and
extern struct ClientSocket *MyClientSocket;
definitions in the src/include/postmaster/postmaster.h ?

Peter E noticed and Michael fixed them in commit 768ceeeaa1 already.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-04-26 Thread Heikki Linnakangas

On 23/04/2024 20:02, Jacob Champion wrote:

On Fri, Apr 19, 2024 at 2:43 PM Heikki Linnakangas  wrote:


On 19/04/2024 19:48, Jacob Champion wrote:

On Fri, Apr 19, 2024 at 6:56 AM Heikki Linnakangas  wrote:

With direct SSL negotiation, we always require ALPN.


   (As an aside: I haven't gotten to test the version of the patch that
made it into 17 yet, but from a quick glance it looks like we're not
rejecting mismatched ALPN during the handshake as noted in [1].)


Ah, good catch, that fell through the cracks. Agreed, the client should
reject a direct SSL connection if the server didn't send ALPN. I'll add
that to the Open Items so we don't forget again.


Yes, the client should also reject, but that's not what I'm referring
to above. The server needs to fail the TLS handshake itself with the
proper error code (I think it's `no_application_protocol`?); otherwise
a client implementing a different protocol could consume the
application-level bytes coming back from the server and act on them.
That's the protocol confusion attack from ALPACA we're trying to
avoid.


I finally understood what you mean. So if the client supports ALPN, but 
the list of protocols that it provides does not include 'postgresql', 
the server should reject the connection with a 'no_application_protocol' 
alert. Makes sense. I thought OpenSSL would do that with the alpn 
callback we have, but it does not.


The attached patch makes that change. I used the alpn_cb() function in 
openssl's own s_server program as example for that.


Unfortunately the error message you got in the client with that was 
horrible (I modified the server to not accept the 'postgresql' protocol):


psql "dbname=postgres sslmode=require host=localhost"
psql: error: connection to server at "localhost" (::1), port 5432 
failed: SSL error: SSL error code 167773280


This is similar to the case with system errors discussed at 
https://postgr.es/m/b6fb018b-f05c-4afd-abd3-318c649fa...@highgo.ca, but 
this one is equally bad on OpenSSL 1.1.1 and 3.3.0. It seems like an 
OpenSSL bug to me, because there is an error string "no application 
protocol" in the OpenSSL sources (ssl/ssl_err.c):


{ERR_PACK(ERR_LIB_SSL, 0, SSL_R_NO_APPLICATION_PROTOCOL),
"no application protocol"},

and in the server log, you get that message. But the error code seen in 
the client is different. There are also messages for other alerts, for 
example:


{ERR_PACK(ERR_LIB_SSL, 0, SSL_R_TLSV13_ALERT_MISSING_EXTENSION),
"tlsv13 alert missing extension"},

The bottom line is that that seems like a bug of omission to me in 
OpenSSL, but I wouldn't hold my breath waiting for it to be fixed. We 
can easily check for that error code and print the right message 
ourselves however, as in the attached patch.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 125e9adda6cdab644b660772e29c713863e93cc2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sat, 27 Apr 2024 01:47:55 +0300
Subject: [PATCH 1/1] Reject SSL connection if ALPN is used but there's no
 common protocol

If the client supports ALPN but tries to use some other protocol, like
HTTPS, reject the connection in the server. That is surely a confusion
of some sort, like trying to reach a PostgreSQL server with a
browser. Furthermore, the ALPN RFC 7301 says:

> In the event that the server supports no protocols that the client
> advertises, then the server SHALL respond with a fatal
> "no_application_protocol" alert.

This commit makes the server follow that advice.

In the client, specifically check for the OpenSSL error code for the
"no_application_protocol" alert. Otherwise you got a cryptic "SSL
error: SSL error code 167773280" error if you tried to connect to a
non-PostgreSQL server that rejects the connection with
"no_application_protocol". ERR_reason_error_string() returns NULL for
that code, which frankly seems like an OpenSSL bug to me, but we can
easily print a better message ourselves.
---
 src/backend/libpq/be-secure-openssl.c| 10 +++---
 src/interfaces/libpq/fe-secure-openssl.c | 12 
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index fc46a33539..60cf68aac4 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1336,10 +1336,14 @@ alpn_cb(SSL *ssl,
 
 	if (retval == OPENSSL_NPN_NEGOTIATED)
 		return SSL_TLSEXT_ERR_OK;
-	else if (retval == OPENSSL_NPN_NO_OVERLAP)
-		return SSL_TLSEXT_ERR_NOACK;
 	else
-		return SSL_TLSEXT_ERR_NOACK;
+	{
+		/*
+		 * The client doesn't support our protocol.  Reject the connection
+		 * with TLS "no_application_protocol" alert, per RFC 7301.
+		 */
+		return SSL_TLSEXT_ERR_ALERT_FATAL;
+	}
 }
 
 
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 0de21dc7e4..ee1a47f2b1 10

Re: Direct SSL connection with ALPN and HBA rules

2024-04-25 Thread Heikki Linnakangas

On 25/04/2024 21:13, Jacob Champion wrote:

On Thu, Apr 25, 2024 at 10:35 AM Robert Haas  wrote:

Maybe I'm missing something here, but why doesn't sslnegotiation
override sslmode completely? Or alternatively, why not remove
sslnegotiation entirely and just have more sslmode values? I mean
maybe this shouldn't happen categorically, but if I say I want to
require a direct SSL connection, to me that implies that I don't want
an indirect SSL connection, and I really don't want a non-SSL
connection.


My thinking with sslnegotiation is that it controls how SSL is 
negotiated with the server, if SSL is to be used at all. It does not 
control whether SSL is used or required; that's what sslmode is for.



I think that comes down to the debate upthread, and whether you think
it's a performance tweak or a security feature. My take on it is,
`direct` mode is performance, and `requiredirect` is security.


Agreed, although the security benefits from `requiredirect` are 
pretty vague. It reduces the attack surface, but there are no known 
issues with the 'postgres' or 'direct' negotiation either.


Perhaps 'requiredirect' should be renamed to 'directonly'?


(Especially since, with the current implementation, requiredirect can
slow things down?)


Yes: the case is gssencmode=prefer, kerberos credential cache present 
in client, and server doesn't support GSS. With 
sslnegotiation='postgres' or 'direct', libpq can do the SSL negotiation 
over the same TCP connection after the server rejected the GSSRequest. 
With sslnegotiation='requiredirect', it needs to open a new TCP connection.



I think it's pretty questionable in 2024 whether sslmode=allow and
sslmode=prefer make any sense at all. I don't think it would be crazy
to remove them entirely. But I certainly don't think that they should
be allowed to bleed into the behavior of new, higher-security
configurations. Surely if I say I want direct SSL, it's that or
nothing, right?


I agree, but I more or less lost the battle at [1]. Like Matthias
mentioned in [2]:


I'm not sure about this either. The 'gssencmode' option is already
quite weird in that it seems to override the "require"d priority of
"sslmode=require", which it IMO really shouldn't.


Yeah, that combination is weird. I think we should forbid it. But that's 
separate from sslnegotiation.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-04-25 Thread Heikki Linnakangas

On 24/04/2024 23:51, Peter Eisentraut wrote:

On 08.04.24 10:38, Heikki Linnakangas wrote:

On 08/04/2024 04:25, Heikki Linnakangas wrote:

One important open item now is that we need to register a proper ALPN
protocol ID with IANA.


I sent a request for that:
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/


Why did you ask for "pgsql"?  The IANA protocol name for port 5432 is
"postgres".  This seems confusing.


Oh, I was not aware of that. According to [1], it's actually 
"postgresql". The ALPN registration has not been approved yet, so I'll 
reply on the ietf thread to point that out.


[1] 
https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Feature request: schema diff tool

2024-04-24 Thread Heikki Linnakangas

On 24/04/2024 09:44, Neszt Tibor wrote:

Hello,

A diff tool that would generate create, drop, alter, etc. commands from 
the differences between 2 specified schemas would be very useful. The 
diff could even manage data, so there would be insert, delete, and 
update command outputs, although I think the schema diff management is 
much more important and necessary.


Today, all modern applications are version-tracked, including the sql 
schema. Now schema changes must be handled twice: on the one hand, 
the schema must be modified, and on the other hand, the schema 
modification commands must also be written for the upgrade process. A 
good diff tool would allow only the schema to be modified.


Such a tool already exists because the community needed it, e.g. 
apgdiff. I think the problem with this is that the concept isn't even 
good. I think this tool should be part of postgresql, because postgresql 
always knows exactly what the current sql syntax is; an external 
program, for example apgdiff, can only follow changes afterwards, 
generating continuous problems.


Does a schema diff tool need / want to actually parse the SQL syntax?

On the other hand, an external tool can be developed independently of 
the PostgreSQL release schedule. And you'd want the same tool to work 
with different PostgreSQL versions. Those are reasons to *not* bundle it 
with PostgreSQL itself. PostgreSQL has a rich ecosystem of tools with 
different approaches and tradeoffs, and that's a good thing.


On the whole, -1 from me. I could be convinced otherwise if there's a 
technical reason it would need to be part of PostgreSQL itself, but 
otherwise it's better to not bundle it.


Not to mention that development of an external application can 
stop: e.g. apgdiff is no longer actively developed, so users who 
built on that diff tool are now in trouble.


Furthermore, it would be the least amount of work to do this on the 
postgresql development side: you have the expertise, the sql language 
processor, etc.


That's just asking for Someone Else to do the work. There are many other 
schema diff tools out there. And you can get pretty far by just running 
'pg_dump --schema-only' on both databases and comparing them with 'diff'.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-04-23 Thread Heikki Linnakangas

On 23/04/2024 22:33, Jacob Champion wrote:

On Tue, Apr 23, 2024 at 10:43 AM Robert Haas  wrote:

I've not followed this thread closely enough to understand the comment
about requiredirect maybe not actually requiring direct, but if that
were true it seems like it might be concerning.


It may be my misunderstanding. This seems to imply bad behavior:


If the server rejects GSS encryption, SSL is
negotiated over the same TCP connection using the traditional postgres
protocol, regardless of sslnegotiation.


As does this comment:


+   /*
+* If enabled, try direct SSL. Unless we have a valid TCP connection that
+* failed negotiating GSSAPI encryption or a plaintext connection in case
+* of sslmode='allow'; in that case we prefer to reuse the connection with
+* negotiated SSL, instead of reconnecting to do direct SSL. The point of
+* direct SSL is to avoid the roundtrip from the negotiation, but
+* reconnecting would also incur a roundtrip.
+*/


but when I actually try those cases, I see that requiredirect does
actually cause a direct SSL connection to be done, even with
sslmode=allow. So maybe it's just misleading documentation (or my
misreading of it) that needs to be expanded? Am I missing a different
corner case where requiredirect is ignored, Heikki?


You're right, the comment is wrong about sslmode=allow. There is no 
negotiation of a plaintext connection, the client just sends the startup 
packet directly. The HBA rules can reject it, but the client will have 
to disconnect and reconnect in that case.


The documentation and that comment are misleading about failed GSSAPI 
encryption too, and I also misremembered that. With 
sslnegotiation=requiredirect, libpq never uses negotiated SSL mode. It 
will reconnect after a rejected GSSAPI request. So that comment applies 
to sslnegotiation=direct, but not sslnegotiation=requiredirect.


Attached patch tries to fix and clarify those.

(Note that the client will only attempt GSSAPI encryption if it can find 
kerberos credentials in the environment.)


--
Heikki Linnakangas
Neon (https://neon.tech)
From 664555decb695123a4bf25ea56f789202b926ea0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 24 Apr 2024 00:10:24 +0300
Subject: [PATCH 1/1] Fix documentation and comments on what happens after GSS
 rejection

The paragraph in the docs and the comment applied to
sslnegotiaton=direct, but not sslnegotiation=requiredirect. In
'requiredirect' mode, negotiated SSL is never used. Move the paragraph
in the docs under the description of 'direct' mode, and rephrase it.

Also the comment's reference to reusing a plaintext connection was
bogus. Authentication failure in plaintext mode only happens after
sending the startup packet, so the connection cannot be reused.
---
 doc/src/sgml/libpq.sgml   | 19 +--
 src/interfaces/libpq/fe-connect.c | 11 ++-
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 9199d0d2e5..7f854edfa2 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1803,6 +1803,15 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
  process adds significant latency if the initial SSL connection
  fails.

+   
+ An exception is if gssencmode is set
+ to prefer, but the server rejects GSS encryption.
+ In that case, SSL is negotiated over the same TCP connection using
+ PostgreSQL protocol negotiation. In
+ other words, the direct SSL handshake is not used, if a TCP
+ connection has already been established and can be used for the
+ SSL handshake.
+   
   
  
 
@@ -1816,16 +1825,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
-
-   
-Note that if gssencmode is set
-to prefer, a GSS connection is
-attempted first. If the server rejects GSS encryption, SSL is
-negotiated over the same TCP connection using the traditional postgres
-protocol, regardless of sslnegotiation. In other
-words, the direct SSL handshake is not used, if a TCP connection has
-already been established and can be used for the SSL handshake.
-   
   
  
 
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index ec20e3f3a9..b1d3bfa59d 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -4430,11 +4430,12 @@ select_next_encryption_method(PGconn *conn, bool have_valid_connection)
 
 	/*
 	 * If enabled, try direct SSL. Unless we have a valid TCP connection that
-	 * failed negotiating GSSAPI encryption or a plaintext connection in case
-	 * of sslmode='allow'; in that case we prefer to reuse the connection with
-	 * negotiated SSL, instead of reconnecting to do direct SSL. The point of
-	 * direct SSL is to avoid

Re: POC: make mxidoff 64 bits

2024-04-23 Thread Heikki Linnakangas

On 23/04/2024 11:23, Maxim Orlov wrote:

PROPOSAL
Make multixact offsets 64 bit.


+1, this is a good next step and useful regardless of 64-bit XIDs.


@@ -156,7 +148,7 @@
((uint32) ((0xFFFFFFFF % MULTIXACT_MEMBERS_PER_PAGE) + 1))
 
 /* page in which a member is to be found */

-#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
+#define MXOffsetToMemberPage(xid) ((xid) / (MultiXactOffset) MULTIXACT_MEMBERS_PER_PAGE)
 #define MXOffsetToMemberSegment(xid) (MXOffsetToMemberPage(xid) / SLRU_PAGES_PER_SEGMENT)
 
 /* Location (byte offset within page) of flag word for a given member */


This is really a bug fix. It didn't matter when TransactionId and 
MultiXactOffset were both typedefs of uint32, but it was always wrong. 
The argument name 'xid' is also misleading.


I think there are some more like that, MXOffsetToFlagsBitShift for example.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-04-22 Thread Heikki Linnakangas

On 22/04/2024 10:47, Heikki Linnakangas wrote:

On 22/04/2024 10:19, Michael Paquier wrote:

On Sat, Apr 20, 2024 at 12:43:24AM +0300, Heikki Linnakangas wrote:

On 19/04/2024 19:48, Jacob Champion wrote:

On Fri, Apr 19, 2024 at 6:56 AM Heikki Linnakangas  wrote:

With direct SSL negotiation, we always require ALPN.


(As an aside: I haven't gotten to test the version of the patch that
made it into 17 yet, but from a quick glance it looks like we're not
rejecting mismatched ALPN during the handshake as noted in [1].)


Ah, good catch, that fell through the cracks. Agreed, the client should
reject a direct SSL connection if the server didn't send ALPN. I'll add that
to the Open Items so we don't forget again.


Would somebody like to write a patch for that?  I'm planning to look
at this code more closely, as well.


I plan to write the patch later today.


Here's the patch for that. The error message is:

"direct SSL connection was established without ALPN protocol negotiation 
extension"


That's accurate, but I wonder if we could make it more useful to a user 
who's wondering what went wrong. I'd imagine that if the server doesn't 
support ALPN, it's because you have some kind of a (not necessarily 
malicious) generic SSL man-in-the-middle that doesn't support it. Or 
you're trying to connect to an HTTPS server. Suggestions welcome.


--
Heikki Linnakangas
Neon (https://neon.tech)
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index ec20e3f3a9..f9156e29ca 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3524,6 +3524,13 @@ keep_going:		/* We will come back to here until there is
 pollres = pqsecure_open_client(conn);
 if (pollres == PGRES_POLLING_OK)
 {
+	/* ALPN is mandatory with direct SSL negotiation */
+	if (conn->current_enc_method == ENC_DIRECT_SSL && !conn->ssl_alpn_used)
+	{
+		libpq_append_conn_error(conn, "direct SSL connection was established without ALPN protocol negotiation extension");
+		CONNECTION_FAILED();
+	}
+
 	/*
 	 * At this point we should have no data already buffered.
 	 * If we do, it was received before we performed the SSL
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index e7a4d006e1..8df3c751db 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1585,6 +1585,31 @@ open_client_SSL(PGconn *conn)
 		}
 	}
 
+	/* Get the protocol selected by ALPN */
+	{
+		const unsigned char *selected;
+		unsigned int len;
+
+		SSL_get0_alpn_selected(conn->ssl, &selected, &len);
+
+		/* If ALPN is used, check that we negotiated the expected protocol */
+		if (selected != NULL)
+		{
+			if (len == strlen(PG_ALPN_PROTOCOL) &&
+memcmp(selected, PG_ALPN_PROTOCOL, strlen(PG_ALPN_PROTOCOL)) == 0)
+			{
+conn->ssl_alpn_used = true;
+			}
+			else
+			{
+/* shouldn't happen */
+libpq_append_conn_error(conn, "SSL connection was established with unexpected ALPN protocol");
+pgtls_close(conn);
+return PGRES_POLLING_FAILED;
+			}
+		}
+	}
+
 	/*
 	 * We already checked the server certificate in initialize_SSL() using
 	 * SSL_CTX_set_verify(), if root.crt exists.
@@ -1632,6 +1657,7 @@ pgtls_close(PGconn *conn)
 			conn->ssl = NULL;
 			conn->ssl_in_use = false;
 			conn->ssl_handshake_started = false;
+			conn->ssl_alpn_used = false;
 
 			destroy_needed = true;
 		}
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 3691e5ee96..a6792917cf 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -569,6 +569,7 @@ struct pg_conn
 	bool		ssl_handshake_started;
 	bool		ssl_cert_requested; /* Did the server ask us for a cert? */
 	bool		ssl_cert_sent;	/* Did we send one in reply? */
+	bool		ssl_alpn_used;	/* Did we negotiate a protocol with TLS ALPN? */
 
 #ifdef USE_SSL
 #ifdef USE_OPENSSL


Re: Direct SSL connection with ALPN and HBA rules

2024-04-22 Thread Heikki Linnakangas

On 22/04/2024 10:19, Michael Paquier wrote:

On Sat, Apr 20, 2024 at 12:43:24AM +0300, Heikki Linnakangas wrote:

On 19/04/2024 19:48, Jacob Champion wrote:

On Fri, Apr 19, 2024 at 6:56 AM Heikki Linnakangas  wrote:

With direct SSL negotiation, we always require ALPN.


   (As an aside: I haven't gotten to test the version of the patch that
made it into 17 yet, but from a quick glance it looks like we're not
rejecting mismatched ALPN during the handshake as noted in [1].)


Ah, good catch, that fell through the cracks. Agreed, the client should
reject a direct SSL connection if the server didn't send ALPN. I'll add that
to the Open Items so we don't forget again.


Would somebody like to write a patch for that?  I'm planning to look
at this code more closely, as well.


I plan to write the patch later today.


Controlling these in HBA is a bit inconvenient, because you only find
out after authentication if it's allowed or not. So if e.g. direct SSL
connections are disabled for a user,


Hopefully disabling direct SSL piecemeal is not a desired use case?
I'm not sure it makes sense to focus on that. Forcing it to be enabled
shouldn't have the same problem, should it?


I'd get behind the case where a server rejects everything except
direct SSL, yeah.  Sticking that into a format similar to HBA rules
would easily give the flexibility to be able to accept or reject
direct or default SSL, though, while making it easy to parse.  The
implementation is not really complicated, and not far from the
existing hostssl and nohostssl.

As a whole, I can get behind a unique GUC that forces the use of
direct.  Or, we could extend the existing "ssl" GUC with a new
"direct" value to accept only direct connections and restrict the
original protocol (and a new "postgres" for the pre-16 protocol,
rejecting direct?), while "on" is able to accept both.


I'd be OK with that, although I still don't really see the point of 
forcing this from the server side. We could also add this later.
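
To make the idea concrete, a hypothetical postgresql.conf sketch 
(no such values exist today; "ssl" is currently just a boolean):

```
ssl = direct      # hypothetical: accept only direct SSL connections
#ssl = postgres   # hypothetical: accept only traditional SSLRequest negotiation
#ssl = on         # accept both, as proposed above
```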



Forcing it to be enabled piecemeal based on role or database has similar
problems. Forcing it enabled for all connections seems sensible, though.
Forcing it enabled based on the client's source IP address, but not
user/database would be somewhat sensible too, but we don't currently have
the HBA code to check the source IP and accept/reject SSLRequest based on
that. The HBA rejection always happens after the client has sent the startup
packet.


Hmm.  Splitting the logic checking HBA entries (around check_hba) so
as we'd check for a portion of its contents depending on what the
server has received or not from the client would not be that
complicated.  I'd question whether it makes sense to mix this
information within the same configuration files as the ones holding
the current HBA rules.  If the same rules are used for the
pre-startup-packet phase and the post-startup-packet phase, we'd want
new keywords for these HBA rules, something different than the
existing sslmode and no sslmode?


Sounds complicated, and I don't really see the use case for controlling 
the direct SSL support in such a fine-grained fashion.


It would be nice if we could reject non-SSL connections before the 
client sends the startup packet, but that's not possible because in a 
plaintext connection, that's the first packet that the client sends. The 
reverse would be possible: reject SSLRequest or direct SSL connection 
immediately, if HBA doesn't allow non-SSL connections from that IP 
address. But that's not very interesting.


HBA-based control would certainly be v18 material.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-04-19 Thread Heikki Linnakangas

On 19/04/2024 19:48, Jacob Champion wrote:

On Fri, Apr 19, 2024 at 6:56 AM Heikki Linnakangas  wrote:

With direct SSL negotiation, we always require ALPN.


  (As an aside: I haven't gotten to test the version of the patch that
made it into 17 yet, but from a quick glance it looks like we're not
rejecting mismatched ALPN during the handshake as noted in [1].)


Ah, good catch, that fell through the cracks. Agreed, the client should 
reject a direct SSL connection if the server didn't send ALPN. I'll add 
that to the Open Items so we don't forget again.



I don't see direct SSL negotiation as a security feature.


`direct` mode is not, since it's opportunistic, but I think people are
going to use `requiredirect` as a security feature. At least, I was
hoping to do that myself...


Rather, the
point is to reduce connection latency by saving one round-trip. For
example, if gssencmode=prefer, but the server refuses GSS encryption, it
seems fine to continue with negotiated SSL, instead of establishing a
new connection with direct SSL.


Well, assuming the user is okay with plaintext negotiation at all.
(Was that fixed before the patch went in? Is GSS negotiation still
allowed even with requiredirect?)


To disable sending the startup packet in plaintext, you need to use 
sslmode=require. Same as before the patch. GSS is still allowed, as it 
takes precedence over SSL if both are enabled in libpq. Per the docs:



Note that if gssencmode is set to prefer, a GSS connection is
attempted first. If the server rejects GSS encryption, SSL is
negotiated over the same TCP connection using the traditional
postgres protocol, regardless of sslnegotiation. In other words, the
direct SSL handshake is not used, if a TCP connection has already
been established and can be used for the SSL handshake.




What would be the use case of requiring
direct SSL in the server? What extra security do you get?


You get protection against attacks that could have otherwise happened
during the plaintext portion of the handshake. That has architectural
implications for more advanced uses of SCRAM, and it should prevent
any repeats of CVE-2021-23222/23214. And if the peer doesn't pass the
TLS handshake, they can't send you anything that you might forget is
untrusted (like, say, an error message).


Can you elaborate on the more advanced uses of SCRAM?


Controlling these in HBA is a bit inconvenient, because you only find
out after authentication if it's allowed or not. So if e.g. direct SSL
connections are disabled for a user,


Hopefully disabling direct SSL piecemeal is not a desired use case?
I'm not sure it makes sense to focus on that. Forcing it to be enabled
shouldn't have the same problem, should it?


Forcing it to be enabled piecemeal based on role or database has similar 
problems. Forcing it enabled for all connections seems sensible, though. 
Forcing it enabled based on the client's source IP address, but not 
user/database would be somewhat sensible too, but we don't currently 
have the HBA code to check the source IP and accept/reject SSLRequest 
based on that. The HBA rejection always happens after the client has 
sent the startup packet.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Direct SSL connection with ALPN and HBA rules

2024-04-19 Thread Heikki Linnakangas

On 19/04/2024 08:06, Michael Paquier wrote:

Hi all,
(Heikki in CC.)


(Adding Jacob)


Since 91044ae4baea (require ALPN for direct SSL connections) and
d39a49c1e459 (direct handshake), direct SSL connections are supported
(yeah!), still the thread where this has been discussed does not cover
the potential impact on HBA rules:
https://www.postgresql.org/message-id/CAM-w4HOEAzxyY01ZKOj-iq%3DM4-VDk%3DvzQgUsuqiTFjFDZaebdg%40mail.gmail.com

My point is, would there be a point in being able to enforce that ALPN
is used from the server rather than just relying on the client-side
sslnegotiation to decide if direct SSL connections should be forced or
not?

Hence, I'd imagine that we could have an HBA option for hostssl rules,
like a negotiation={direct,postgres,all} that cross-checks
Port->alpn_used with the option value in a hostssl entry, rejecting
the use of connections using direct connections or the default
protocol if these are not used, giving users a way to avoid one.  As
this is a new thing, there may be an argument in this option for
security reasons, as well, so as it would be possible for operators to
turn that off in the server.


I don't think ALPN gives any meaningful security benefit, when used with 
the traditional 'postgres' SSL negotiation. There's little possibility 
of mixing that up with some other protocol, so I don't see any need to 
enforce it from server side. This was briefly discussed on that original 
thread [1]. With direct SSL negotiation, we always require ALPN.


I don't see direct SSL negotiation as a security feature. Rather, the 
point is to reduce connection latency by saving one round-trip. For 
example, if gssencmode=prefer, but the server refuses GSS encryption, it 
seems fine to continue with negotiated SSL, instead of establishing a 
new connection with direct SSL. What would be the use case of requiring 
direct SSL in the server? What extra security do you get?


Controlling these in HBA is a bit inconvenient, because you only find 
out after authentication whether it's allowed or not. So if e.g. direct SSL 
connections are disabled for a user, the client would still establish a 
direct SSL connection, send the startup packet, and only then get 
rejected. The client would not know whether it was rejected because of the 
direct SSL or for some other reason, so it would need to retry with 
negotiated SSL. Currently, as it stands on master, if the TLS layer is 
established with direct SSL, you never need to retry with traditional 
negotiation, or vice versa.


[1] 
https://www.postgresql.org/message-id/CAAWbhmjetCVgu9pHJFkQ4ejuXuaz2mD1oniXokRHft0usCa7Yg%40mail.gmail.com


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: AIX support

2024-04-18 Thread Heikki Linnakangas
On 18 April 2024 14:15:43 GMT+03:00, Sriram RK  wrote:
>Thanks Noah and Team,
>
>We (IBM-AIX team) looked into this issue
>
>https://www.postgresql.org/message-id/20240225194322...@rfd.leadboat.com
>
>This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0) 
>have issues. But we verified that this issue is resolved with the newer 
>compiler versions openXL(xlc17.1) and gcc(12.0) onwards.
>
>We reported this issue to the xlc team and they have noted this issue. A fix 
>might be possible in May for this issue in xlc v16.  We would like to 
>understand if the community can start using the latest compilers to build the 
>source.
>
>Also as part of the support, we will help in fixing all the issues related to 
>AIX and continue to support AIX for Postgres. If we need another CI 
>environment we can work to make one available. But for the time being, can we 
>start reverting the patch that removed AIX support?

Let's start by setting up a new AIX buildfarm member. Regardless of what we do 
with v17, we continue to support AIX on the stable branches, and we really need 
a buildfarm member to keep the stable branches working anyway.

>We want to make a note that postgres is used extensively in our IBM product 
>and is relied on by multiple customers.

Noted. I'm glad to hear you are interested in putting in some effort for this. 
The situation with the current maintainers is that none of us have much 
interest, resources, or knowledge to keep the AIX build working, so we'll 
definitely need the help.

No promises on v17, but let's at least make sure the stable branches keep 
working. And with the patches and buildfarm support from you, maybe v17 is 
feasible too.


- Heikki




Re: plenty code is confused about function level static

2024-04-18 Thread Heikki Linnakangas

On 18/04/2024 00:39, Andres Freund wrote:

Hi,

We have a fair amount of code that uses non-constant function level static
variables for read-only data. Which makes little sense - it prevents the
compiler from understanding

a) that the data is read only and can thus be put into a segment that's shared
between all invocations of the program
b) the data will be the same on every invocation, and thus from optimizing
based on that.

The most common example of this is that all our binaries use
   static struct option long_options[] = { ... };
which prevents long_options from being put into read-only memory.


Is there some reason we went for this pattern in a fair number of places? I
assume it's mostly copy-pasta, but...


In practice it often is useful to use 'static const' instead of just
'const'. At least gcc otherwise sometimes fills the data on the stack, instead
of having a read-only data member that's already initialized. I'm not sure
why, tbh.


Weird. I guess it can be faster if you assume the data in the read-only 
section might not be in cache, but the few instructions needed to fill 
the data on the stack are.



Attached are fixes for struct option and a few more occurrences I've found
with a bit of grepping.


+1

--
Heikki Linnakangas
Neon (https://neon.tech)





White-box testing heap pruning

2024-04-14 Thread Heikki Linnakangas
While working on the "Combine Prune and Freeze records emitted by 
vacuum" patch [1], I wished we would have an easier way to test pruning. 
There's a lot of logic with following HOT chains etc., and it's very 
hard to construct all those scenarios just by INSERT/UPDATE/DELETE 
commands. In principle though, pruning should be very amenable for good 
test coverage. The input is one heap page and some parameters, the 
output is one heap page and a few other fields that are already packaged 
neatly in the PruneFreezeResult struct.


Back then, I started to work on a little tool for that to verify the 
correctness of the pruning refactoring, but I never got around to polishing 
it or writing proper repeatable tests with it. I did use it for some ad hoc 
testing, though.


I don't know when I'll find the time to polish it, so here is the very 
rough work-in-progress version I've got now.


One thing I used this for was to test that we still handle 
HEAP_MOVED_IN/OFF correctly. Yes, it still works. But what surprised me 
is that when a HEAP_MOVED_IN tuple is frozen, we replace xvac with 
FrozenTransactionId, and leave the HEAP_MOVED_IN flag in place. I 
assumed that we would clear the HEAP_MOVED_IN flag instead.


[1] 
https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g%40mail.gmail.com


--
Heikki Linnakangas
Neon (https://neon.tech)

From 9dcd528dd42f689e207d808bee388fb233b2e25e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sun, 14 Apr 2024 22:50:10 +0300
Subject: [PATCH 1/2] Expose conflict_xid to caller, for tests

---
 src/backend/access/heap/pruneheap.c | 20 +++-
 src/include/access/heapam.h |  5 +
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d2eecaf7ebc..8f4d17f7d08 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,6 +366,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_hint;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
@@ -794,7 +795,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
 		 */
-		if (RelationNeedsWAL(relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -807,7 +807,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * record will freeze will conflict.
 			 */
 			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
 
 			/*
 			 * We can use the visibility_cutoff_xid as our cutoff for
@@ -832,13 +831,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			else
 conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(relation, buffer,
-	  conflict_xid,
-	  true, reason,
-	  prstate.frozen, prstate.nfrozen,
-	  prstate.redirected, prstate.nredirected,
-	  prstate.nowdead, prstate.ndead,
-	  prstate.nowunused, prstate.nunused);
+			if (RelationNeedsWAL(relation))
+log_heap_prune_and_freeze(relation, buffer,
+		  conflict_xid,
+		  true, reason,
+		  prstate.frozen, prstate.nfrozen,
+		  prstate.redirected, prstate.nredirected,
+		  prstate.nowdead, prstate.ndead,
+		  prstate.nowunused, prstate.nunused);
 		}
 	}
 
@@ -876,6 +876,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	presult->hastup = prstate.hastup;
 
+	presult->conflict_xid = conflict_xid;
+
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
 	 * for that record must be the newest xmin on the page.  However, if the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 735662dc9df..934b5bfbe50 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -255,6 +255,11 @@ typedef struct PruneFreezeResult
 	 */
 	bool		hastup;
 
+	/*
+	 * Recovery conflict XID, if any. This is returned just for white-box tests.
+	 */
+	TransactionId conflict_xid;
+
 	/*
 	 * LP_DEAD items on the page after pruning.  Includes existing LP_DEAD
 	 * items.
-- 
2.39.2

From 9af964cfddf8634e796af5facfd2476401f8ef78 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sun, 14 Apr 2024 23:29:35 +0300
Subject: [PATCH 2/2] XXX: Add test_heapam

heappage_craft.c contains SQL-callable functions for constructing a
heap page from scratch, with the exact line pointers, XIDs, flags etc.
---
 src/test/modules/Makefile |   1 +
 src/test/modules/meson.build  |   1 +
 src/test/modules/test_heapam/Makefile |  24 ++
 .../test_heapam/expected/move_in_out.out  |  69 
 .../modules/test_heapam/expected/pr

Re: some LLVM function checks missing in meson

2024-04-13 Thread Heikki Linnakangas

On 11/04/2024 18:26, Peter Eisentraut wrote:

I have been checking the pg_config.h generated by configure and meson to
see if there is anything materially different.  I found that

HAVE_DECL_LLVMCREATEGDBREGISTRATIONLISTENER and
HAVE_DECL_LLVMCREATEPERFJITEVENTLISTENER

are missing on the meson side.

Something like the below would appear to fix that:

diff --git a/meson.build b/meson.build
index 43fad5323c0..cdfd31377d1 100644
--- a/meson.build
+++ b/meson.build
@@ -2301,6 +2301,14 @@ decl_checks += [
 ['pwritev', 'sys/uio.h'],
   ]

+# Check presence of some optional LLVM functions.
+if llvm.found()
+  decl_checks += [
+['LLVMCreateGDBRegistrationListener', 'llvm-c/ExecutionEngine.h'],
+['LLVMCreatePerfJITEventListener', 'llvm-c/ExecutionEngine.h'],
+  ]
+endif
+
   foreach c : decl_checks
 func = c.get(0)
 header = c.get(1)

I don't know what these functions do, but the symbols are used in the
source code.  Thoughts?


+1. I also don't know what they do, but clearly the configure and meson 
checks should be in sync.


There's also this in llvmjit.c:


if (llvm_opt3_orc)
{
#if defined(HAVE_DECL_LLVMORCREGISTERPERF) && HAVE_DECL_LLVMORCREGISTERPERF
if (jit_profiling_support)
LLVMOrcUnregisterPerf(llvm_opt3_orc);
#endif
LLVMOrcDisposeInstance(llvm_opt3_orc);
llvm_opt3_orc = NULL;
}

if (llvm_opt0_orc)
{
#if defined(HAVE_DECL_LLVMORCREGISTERPERF) && HAVE_DECL_LLVMORCREGISTERPERF
if (jit_profiling_support)
LLVMOrcUnregisterPerf(llvm_opt0_orc);
#endif
LLVMOrcDisposeInstance(llvm_opt0_orc);
llvm_opt0_orc = NULL;
}
}


The autoconf test that set HAVE_DECL_LLVMORCREGISTERPERF was removed in 
commit e9a9843e13. I believe that's a leftover that should also have 
been removed.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-13 Thread Heikki Linnakangas

On 11/04/2024 16:37, Ranier Vilela wrote:
On Thu, Apr 11, 2024 at 09:54 Heikki Linnakangas <hlinn...@iki.fi> wrote:


On 11/04/2024 15:03, Ranier Vilela wrote:
 > On Wed, Apr 10, 2024 at 18:28 Heikki Linnakangas <hlinn...@iki.fi> wrote:
 >
 >     On 10/04/2024 21:07, Ranier Vilela wrote:
 >      > Hi,
 >      >
 >      > Per Coverity.
 >      >
 >      > The function ReorderBufferTXNByXid,
 >      > can return NULL when the parameter *create* is false.
 >      >
 >      > In the functions ReorderBufferSetBaseSnapshot
 >      > and ReorderBufferXidHasBaseSnapshot,
 >      > the second call to ReorderBufferTXNByXid,
 >      > pass false to *create* argument.
 >      >
 >      > In the function ReorderBufferSetBaseSnapshot,
 >      > fixed passing true as argument to always return
 >      > a valid ReorderBufferTXN pointer.
 >      >
 >      > In the function ReorderBufferXidHasBaseSnapshot,
 >      > fixed by checking if the pointer is NULL.
 >
 >     If it's a "known subxid", the top-level XID should already
have its
 >     ReorderBufferTXN entry, so ReorderBufferTXN() should never
return NULL.
 >
 > There are several conditions to not return NULL,
 > I think trusting never is insecure.

Well, you could make it an elog(ERROR, ..) instead. But the point is
that it should not happen, and if it does for some reason, that's very
surprising and there is a bug somewhere. In that case, we should *not*
just blindly create it and proceed as if everything was OK.

Thanks for the clarification.
I will then suggest improving robustness,
but avoiding hiding any possible errors that may occur.


I don't much like adding extra runtime checks for "can't happen" 
scenarios either. Assertions would be clearer, but in this case the 
code would just segfault trying to dereference the NULL pointer, which 
is fine for a "can't happen" scenario.


Looking closer, when we identify an XID as a subtransaction, we:
- assign toplevel_xid,
- set RBTXN_IS_SUBXACT, and
- assign toptxn pointer.

ISTM the 'toplevel_xid' and RBTXN_IS_SUBXACT are redundant. 
'toplevel_xid' is only used in those two calls that do 
ReorderBufferTXNByXid(rb, txn->toplevel_xid,...), and you could replace 
those by following the 'toptxn' pointer directly. And RBTXN_IS_SUBXACT 
is redundant with (toptxn != NULL). So here's a patch to remove 
'toplevel_xid' and RBTXN_IS_SUBXACT altogether.


Amit, you added 'toptxn' in commit c55040ccd017; does this look right to 
you?


--
Heikki Linnakangas
Neon (https://neon.tech)
From e8452054a79034d070407775dfcd8b9754602cb9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sat, 13 Apr 2024 10:02:41 +0300
Subject: [PATCH 1/1] Remove redundant field and flag from ReorderBufferTXN

RBTXN_IS_SUBXACT and toplevel_xid are redundant with the
ReorderBufferTXN->toptxn field.
---
 .../replication/logical/reorderbuffer.c   | 28 ---
 src/include/replication/reorderbuffer.h   | 15 ++
 2 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 00a8327e771..fb0dbec155c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -947,7 +947,7 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
 			Assert(prev_first_lsn < cur_txn->first_lsn);
 
 		/* known-as-subtxn txns must not be listed */
-		Assert(!rbtxn_is_known_subxact(cur_txn));
+		Assert(!rbtxn_is_subtxn(cur_txn));
 
 		prev_first_lsn = cur_txn->first_lsn;
 	}
@@ -967,7 +967,7 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
 			Assert(prev_base_snap_lsn < cur_txn->base_snapshot_lsn);
 
 		/* known-as-subtxn txns must not be listed */
-		Assert(!rbtxn_is_known_subxact(cur_txn));
+		Assert(!rbtxn_is_subtxn(cur_txn));
 
 		prev_base_snap_lsn = cur_txn->base_snapshot_lsn;
 	}
@@ -1022,7 +1022,7 @@ ReorderBufferGetOldestTXN(ReorderBuffer *rb)
 
 	txn = dlist_head_element(ReorderBufferTXN, node, &rb->toplevel_by_lsn);
 
-	Assert(!rbtxn_is_known_subxact(txn));
+	Assert(!rbtxn_is_subtxn(txn));
 	Assert(txn->first_lsn != InvalidXLogRecPtr);
 	return txn;
 }
@@ -1079,7 +1079,7 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 
 	if (!new_sub)
 	{
-		if (rbtxn_is_known_subxact(subtxn))
+		if (rbtxn_is_subtxn(subtxn))
 		{
 			/* already associated, nothing to do */
 			return;
@@ -1095,8 +1095,6 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 		}
 	}
 
-	subtxn->txn_flags |= RBTXN_IS_SUBXACT;
-	subtxn->toplevel_xid = xid

Re: Typos in the code and README

2024-04-12 Thread Heikki Linnakangas

On 11/04/2024 16:05, Daniel Gustafsson wrote:

Now that the tree has settled down a bit post-freeze I ran some tooling to
check spelling.  I was primarily interested in docs and README* which were
mostly free from simple typos, while the code had some in various comments and
one in code.  The attached fixes all that I came across (not cross-referenced
against ongoing reverts or any other fixup threads but will be before pushing
of course).


Here are a few more. I've accumulated these over the past couple of months, 
keeping them stashed in a branch, adding to it whenever I've spotted a 
minor typo while reading the code.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 5f531b719c176b2f316b6341fa062af508ed2e10 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sun, 7 Apr 2024 22:34:23 +0300
Subject: [PATCH 1/2] fix typos

---
 doc/src/sgml/maintenance.sgml   | 2 +-
 doc/src/sgml/meson.build| 2 +-
 src/backend/access/rmgrdesc/xactdesc.c  | 2 +-
 src/backend/catalog/system_functions.sql| 2 +-
 src/backend/commands/amcmds.c   | 2 +-
 src/backend/commands/tablecmds.c| 4 ++--
 src/backend/postmaster/walsummarizer.c  | 2 +-
 src/backend/replication/logical/slotsync.c  | 2 +-
 src/backend/statistics/dependencies.c   | 4 ++--
 src/backend/utils/adt/multirangetypes.c | 2 +-
 src/backend/utils/mmgr/aset.c   | 4 ++--
 src/backend/utils/mmgr/generation.c | 4 ++--
 src/bin/pg_basebackup/bbstreamer_tar.c  | 2 +-
 src/bin/pg_combinebackup/pg_combinebackup.c | 2 +-
 src/interfaces/libpq/fe-secure-openssl.c| 2 +-
 src/test/modules/test_resowner/test_resowner_many.c | 2 +-
 src/test/regress/expected/aggregates.out| 2 +-
 src/test/regress/sql/aggregates.sql | 2 +-
 18 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2bfa05b8bc..0be90bdc7e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -802,7 +802,7 @@ HINT:  Execute a database-wide VACUUM in that database.
 
  Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
  system will begin to emit warning messages when the database's oldest MXIDs reach forty
- million transactions from the wraparound point.  And, just as an the XID case, if these
+ million transactions from the wraparound point.  And, just as in the XID case, if these
  warnings are ignored, the system will refuse to generate new MXIDs once there are fewer
  than three million left until wraparound.
 
diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index 3a4b47d387..e418de83a7 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -207,7 +207,7 @@ if docs_dep.found()
   alias_target('man', man)
   alias_target('install-man', install_doc_man)
 
-  # built and installed as part of the the docs target
+  # built and installed as part of the docs target
   installdocs += install_doc_man
   docs += man
 endif
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 41b842d80e..dccca201e0 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -25,7 +25,7 @@
  * Parse the WAL format of an xact commit and abort records into an easier to
  * understand format.
  *
- * This routines are in xactdesc.c because they're accessed in backend (when
+ * These routines are in xactdesc.c because they're accessed in backend (when
  * replaying WAL) and frontend (pg_waldump) code. This file is the only xact
  * specific one shared between both. They're complicated enough that
  * duplication would be bothersome.
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index fe2bb50f46..ae099e328c 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -5,7 +5,7 @@
  *
  * src/backend/catalog/system_functions.sql
  *
- * This file redefines certain built-in functions that it's impractical
+ * This file redefines certain built-in functions that are impractical
  * to fully define in pg_proc.dat.  In most cases that's because they use
  * SQL-standard function bodies and/or default expressions.  The node
  * tree representations of those are too unreadable, platform-dependent,
diff --git a/src/backend/commands/amcmds.c b/src/backend/commands/amcmds.c
index 10e386288a..aaa0f9a1dc 100644
--- a/src/backend/commands/amcmds.c
+++ b/src/backend/commands/amcmds.c
@@ -167,7 +167,7 @@ get_index_am_oid(const char *amname, bool missing_ok)
 
 /*
  * get_table_am_oid - given an access method name, look up its OID
- *		and verify it corresponds to an table AM.
+ *		and verify it corresponds to a table AM.
  */
 Oid

PG_TEST_EXTRAs by theme rather than test name (Re: pgsql: Add tests for libpq gssencmode and sslmode options)

2024-04-12 Thread Heikki Linnakangas

(moved to pgsql-hackers, change subject)

On 10/04/2024 18:54, Heikki Linnakangas wrote:

On 10/04/2024 17:48, Peter Eisentraut wrote:

On 08.04.24 01:50, Heikki Linnakangas wrote:

Add tests for libpq gssencmode and sslmode options


Why aren't these tests at
src/interfaces/libpq/t/nnn_negotiate_encryption.pl ?


To be honest, it never occurred to me. It started out as extra tests
under src/test/ssl/, and when I decided to move them out to its own
module, I didn't think of moving them to src/interfaces/libpq/t/.

I will move it, barring any objections or better ideas.


Moved.

I also added an extra check for PG_TEST_EXTRA=kerberos, so that the 
tests that require an MIT Kerberos installation are only run if 
PG_TEST_EXTRA=kerberos is specified. That seems prudent; it seems 
unlikely that you would want to run libpq_encryption tests with Kerberos 
tests included, but not the main kerberos tests. If you specify 
PG_TEST_EXTRA=libpq_encryption, but not 'kerberos', it's probably 
because you don't have an MIT Kerberos installation on your system.


I added documentation for the new PG_TEST_EXTRA=libpq_encryption option, 
I missed that earlier, with a note on the above interaction with 'kerberos'.



As we accumulate more PG_TEST_EXTRA options, I think we should 
categorize the tests by the capabilities they need or the risk 
associated, rather than by test names. Currently we have:


- kerberos: Requires MIT Kerberos installation and opens TCP/IP listen 
sockets

- ldap: Requires OpenLDAP installation and opens TCP/IP listen sockets
- ssl: Opens TCP/IP listen sockets.
- load_balance: Requires editing the system 'hosts' file and opens 
TCP/IP listen sockets.
- libpq_encryption: Opens TCP/IP listen sockets. For the GSSAPI tests, 
requires MIT Kerberos installation

- wal_consistency_checking: is resource intensive
- xid_wraparound: is resource intensive

There are a few clear themes here:

- tests that open TCP/IP listen sockets
- tests that require OpenLDAP installation
- tests that require MIT Kerberos installation
- tests that require editing 'hosts' file
- tests that are resource intensive

We could have PG_TEST_EXTRA options that match those themes, and 
enable/disable the individual tests based on those requirements. For 
example, if you're on a single-user system and have no issue with 
opening TCP/IP listen sockets, you would specify 
"PG_TEST_EXTRA=tcp-listen", and all the tests that need to open TCP/IP 
listen sockets would run. Also it would be nice to have autoconf/meson 
tests for the presence of OpenLDAP / MIT Kerberos installations, instead 
of having to enable/disable them with PG_TEST_EXTRA.


--
Heikki Linnakangas
Neon (https://neon.tech)




Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-11 Thread Heikki Linnakangas

On 11/04/2024 18:09, Alexander Korotkov wrote:

On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas  wrote:

On 07/04/2024 00:52, Alexander Korotkov wrote:

* At first, we check that pg_wal_replay_wait() is called in a non-atomic
* context.  That is, a procedure call isn't wrapped into a transaction,
* another procedure call, or a function call.
*


It's pretty unfortunate to have all these restrictions. It would be nice
to do:

select pg_wal_replay_wait('0/1234'); select * from foo;


This works for me, except that it needs "call" not "select".

# call pg_wal_replay_wait('0/1234'); select * from foo;
CALL
  i
---
(0 rows)


If you do that from the psql prompt, it works because psql parses and sends 
it as two separate round-trips. Try:


psql postgres -p 5433 -c "call pg_wal_replay_wait('0/4101BBB8'); select 1"
ERROR:  pg_wal_replay_wait() must be only called in non-atomic context
DETAIL:  Make sure pg_wal_replay_wait() isn't called within a 
transaction, another procedure, or a function.



This assumption that PortalRunUtility() can tolerate us popping the
snapshot sounds very fishy. I haven't looked at what's going on there,
but doesn't sound like a great assumption.


This is what PortalRunUtility() says about this.

/*
  * Some utility commands (e.g., VACUUM) pop the ActiveSnapshot stack from
  * under us, so don't complain if it's now empty.  Otherwise, our snapshot
  * should be the top one; pop it.  Note that this could be a different
  * snapshot from the one we made above; see EnsurePortalSnapshotExists.
  */

So, if the vacuum pops a snapshot when it needs to run without a
snapshot, then it's probably OK for other utilities.  But I agree this
decision needs some consensus.


Ok, so it's at least somewhat documented that it's fine.


Overall, this feature doesn't feel quite ready for v17, and IMHO should
be reverted. It's a nice feature, so I'd love to have it fixed and
reviewed early in the v18 cycle.


Thank you for your review.  I've reverted this. Will repost this for early v18.


Thanks Alexander for working on this.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Improve eviction algorithm in ReorderBuffer

2024-04-11 Thread Heikki Linnakangas

On 11/04/2024 11:20, Masahiko Sawada wrote:

On Thu, Apr 11, 2024 at 11:52 AM Masahiko Sawada  wrote:


We can see 2% ~ 3% performance regressions compared to the current
HEAD, but it's much smaller than I expected. Given that we can make
the code simple, I think we can go with this direction.


Pushed the patch and reverted binaryheap changes.


Thank you!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-11 Thread Heikki Linnakangas

On 11/04/2024 15:03, Ranier Vilela wrote:
On Wed, Apr 10, 2024 at 18:28 Heikki Linnakangas <hlinn...@iki.fi> wrote:


On 10/04/2024 21:07, Ranier Vilela wrote:
 > Hi,
 >
 > Per Coverity.
 >
 > The function ReorderBufferTXNByXid,
 > can return NULL when the parameter *create* is false.
 >
 > In the functions ReorderBufferSetBaseSnapshot
 > and ReorderBufferXidHasBaseSnapshot,
 > the second call to ReorderBufferTXNByXid,
 > pass false to *create* argument.
 >
 > In the function ReorderBufferSetBaseSnapshot,
 > fixed passing true as argument to always return
 > a valid ReorderBufferTXN pointer.
 >
 > In the function ReorderBufferXidHasBaseSnapshot,
 > fixed by checking if the pointer is NULL.

If it's a "known subxid", the top-level XID should already have its
ReorderBufferTXN entry, so ReorderBufferTXN() should never return NULL.

There are several conditions to not return NULL,
I think trusting never is insecure.


Well, you could make it an elog(ERROR, ..) instead. But the point is 
that it should not happen, and if it does for some reason, that's very 
surprising and there is a bug somewhere. In that case, we should *not* 
just blindly create it and proceed as if everything was OK.
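To illustrate the pattern in isolation (this is a toy model, not the actual reorderbuffer code; all the names here are invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for ReorderBufferTXNByXid() with create == false:
 * returns NULL when no entry exists. */
typedef struct Txn { unsigned xid; } Txn;

static Txn known_txn = { 42 };

static Txn *
lookup_txn(unsigned xid)
{
	return (xid == known_txn.xid) ? &known_txn : NULL;
}

/* Caller pattern: treat a NULL as a bug and fail loudly, instead of
 * silently creating a new entry and carrying on. */
static unsigned
get_txn_xid_checked(unsigned xid)
{
	Txn *txn = lookup_txn(xid);

	if (txn == NULL)
	{
		/* in PostgreSQL: Assert(txn != NULL) or elog(ERROR, ...) */
		fprintf(stderr, "transaction %u unexpectedly not found\n", xid);
		abort();
	}
	return txn->xid;
}
```

The point is that the failure surfaces at the lookup site, where the broken invariant is visible, rather than being masked by a bogus new entry.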


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Improve eviction algorithm in ReorderBuffer

2024-04-10 Thread Heikki Linnakangas

On 11/04/2024 01:37, Michael Paquier wrote:

On Thu, Apr 11, 2024 at 12:20:55AM +0300, Heikki Linnakangas wrote:

To move this forward, here's a patch to switch to a pairing heap. In my very
quick testing, with the performance test cases posted earlier in this thread
[1] [2], I'm seeing no meaningful performance difference between this and
what's in master currently.


Reading through the patch, that's a nice cleanup.  It cuts quite some
code.

+++ b/src/include/replication/reorderbuffer.h
@@ -12,6 +12,7 @@
  #include "access/htup_details.h"
  #include "lib/binaryheap.h"
  #include "lib/ilist.h"
+#include "lib/pairingheap.h"

I'm slightly annoyed by the extra amount of information that gets
added to reorderbuffer.h for stuff that's only local to
reorderbuffer.c, but that's not something new in this area, so..


We can actually remove the "lib/binaryheap.h" in this patch; I missed 
that. There are no other uses of binaryheap in the file.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-10 Thread Heikki Linnakangas

On 07/04/2024 00:52, Alexander Korotkov wrote:

On Fri, Apr 5, 2024 at 9:15 PM Alvaro Herrera  wrote:

I'm still concerned that WaitLSNCleanup is only called in ProcKill.
Does this mean that if a process throws an error while waiting, it'll
not get cleaned up until it exits?  Maybe this is not a big deal, but it
seems odd.


I've added WaitLSNCleanup() to the AbortTransaction().  Just pushed
that together with the improvements upthread.


Race condition:

1. backend: pg_wal_replay_wait('0/1234') is called. It calls WaitForLSN
2. backend: WaitForLSN calls addLSNWaiter('0/1234'). It adds the backend 
process to the LSN heap and returns

3. replay:  rm_redo record '0/1234'
4. backend: WaitForLSN enters for-loop, calls GetXLogReplayRecPtr()
5. backend: current replay LSN location is '0/1234', so we exit the loop
6. replay: calls WaitLSNSetLatches()

In a nutshell, it's possible for the loop in WaitForLSN to exit without 
cleaning up the process from the heap. I was able to hit that by adding 
a delay after the addLSNWaiter() call:



TRAP: failed Assert("!procInfo->inHeap"), File: "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
postgres: heikki postgres [local] CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
postgres: heikki postgres [local] CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
postgres: heikki postgres [local] CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
postgres: heikki postgres [local] CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]


I think there's a similar race condition if the timeout is reached at 
the same time that the startup process wakes up the process.
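To make the required invariant concrete, here's a minimal single-threaded model of the fix (names are invented, not the actual waitlsn.c API): the waiter must unregister itself on every exit path from the wait loop, because the replay position may already have passed the target before the waiter's first check.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the waiter bookkeeping; in_heap mirrors procInfo->inHeap. */
static bool in_heap = false;
static unsigned long replay_lsn = 0;	/* advanced by the "startup process" */

static void add_waiter(void)    { assert(!in_heap); in_heap = true; }
static void delete_waiter(void) { in_heap = false; }
static void advance_replay(unsigned long lsn) { replay_lsn = lsn; }

/* Wait until 'target' has been replayed. Even if the target was already
 * reached between add_waiter() and the first check -- the race described
 * above -- we must still leave the heap before returning, or the next
 * call would trip the Assert(!procInfo->inHeap) equivalent. */
static bool
wait_for_lsn(unsigned long target)
{
	add_waiter();
	while (replay_lsn < target)
	{
		/* real code: wait on the process latch, honor the timeout */
		break;				/* model: give up immediately if not yet replayed */
	}
	delete_waiter();		/* unconditional cleanup on every exit path */
	return replay_lsn >= target;
}
```

The same unconditional-cleanup rule covers the timeout case too: whichever way the loop exits, the process leaves the heap exactly once.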



 * At first, we check that pg_wal_replay_wait() is called in a 
non-atomic
 * context.  That is, a procedure call isn't wrapped into a transaction,
 * another procedure call, or a function call.
 *


It's pretty unfortunate to have all these restrictions. It would be nice 
to do:


select pg_wal_replay_wait('0/1234'); select * from foo;

in a single multi-query call, to avoid the round-trip to the client. You 
can avoid it with libpq or protocol level pipelining, too, but it's more 
complicated.



 * Secondly, according to PlannedStmtRequiresSnapshot(), even in an 
atomic
 * context, CallStmt is processed with a snapshot.  Thankfully, we can 
pop
 * this snapshot, because PortalRunUtility() can tolerate this.


This assumption that PortalRunUtility() can tolerate us popping the 
snapshot sounds very fishy. I haven't looked at what's going on there, 
but doesn't sound like a great assumption.


If recovery ends while a process is waiting for an LSN to arrive, does 
it ever get woken up?


The docs could use some-copy-editing, but just to point out one issue:


There are also procedures to control the progress of recovery.


That's copy-pasted from an earlier sentence at the table that lists 
functions like pg_promote(), pg_wal_replay_pause(), and 
pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control the 
progress of recovery like those functions do, it only causes the calling 
backend to wait.


Overall, this feature doesn't feel quite ready for v17, and IMHO should 
be reverted. It's a nice feature, so I'd love to have it fixed and 
reviewed early in the v18 cycle.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-10 Thread Heikki Linnakangas

On 10/04/2024 21:07, Ranier Vilela wrote:

Hi,

Per Coverity.

The function ReorderBufferTXNByXid,
can return NULL when the parameter *create* is false.

In the functions ReorderBufferSetBaseSnapshot
and ReorderBufferXidHasBaseSnapshot,
the second call to ReorderBufferTXNByXid,
pass false to *create* argument.

In the function ReorderBufferSetBaseSnapshot,
fixed passing true as argument to always return
a valid ReorderBufferTXN pointer.

In the function ReorderBufferXidHasBaseSnapshot,
fixed by checking if the pointer is NULL.


If it's a "known subxid", the top-level XID should already have its 
ReorderBufferTXN entry, so ReorderBufferTXN() should never return NULL. 
It's not surprising if Coverity doesn't understand that, but setting the 
'create' flag doesn't seem like the right fix. If we add "Assert(txn != 
NULL)", does that silence it?


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Improve eviction algorithm in ReorderBuffer

2024-04-10 Thread Heikki Linnakangas

On 10/04/2024 08:31, Amit Kapila wrote:

On Wed, Apr 10, 2024 at 11:00 AM Heikki Linnakangas  wrote:


On 10/04/2024 07:45, Michael Paquier wrote:

On Tue, Apr 09, 2024 at 09:16:53PM -0700, Jeff Davis wrote:

On Wed, 2024-04-10 at 12:13 +0900, Michael Paquier wrote:

Wouldn't the best way forward be to revert
5bec1d6bc5e3 and revisit the whole in v18?


Also consider commits b840508644 and bcb14f4abc.


Indeed.  These are also linked.


I don't feel the urge to revert this:

- It's not broken as such, we're just discussing better ways to
implement it. We could also do nothing, and revisit this in v18. The
only must-fix issue is some compiler warnings IIUC.

- It's a pretty localized change in reorderbuffer.c, so it's not in the
way of other patches or reverts. Nothing else depends on the binaryheap
changes yet either.

- It seems straightforward to repeat the performance tests with whatever
alternative implementations we want to consider.

My #1 choice would be to write a patch to switch the pairing heap,
performance test that, and revert the binary heap changes.



+1.


To move this forward, here's a patch to switch to a pairing heap. In my 
very quick testing, with the performance test cases posted earlier in 
this thread [1] [2], I'm seeing no meaningful performance difference 
between this and what's in master currently.


Sawada-san, what do you think of this? To be sure, if you could also 
repeat the performance tests you performed earlier, that'd be great. If 
you agree with this direction, and you're happy with this patch, feel 
free take it from here and commit this, and also revert commits 
b840508644 and bcb14f4abc.


--
Heikki Linnakangas
Neon (https://neon.tech)
From c883d03bd221341b0bbb315d376d9e125b84329a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 10 Apr 2024 08:09:38 +0300
Subject: [PATCH 1/1] Replace binaryheap + index with pairingheap

A pairing heap can perform the same operations as the binary heap +
index, with as good or better algorithmic complexity, and that's an
existing data structure so that we don't need to invent anything new
compared to v16. This commit makes the new binaryheap functionality
that was added in commits b840508644 and bcb14f4abc unnecessary, but
they will be reverted separately.

Remove the optimization to only build and maintain the heap when the
amount of memory used is close to the limit, because the bookkeeping
overhead with the pairing heap seems to be small enough that it
doesn't matter in practice.
---
 .../replication/logical/reorderbuffer.c   | 186 +++---
 src/include/replication/reorderbuffer.h   |   8 +-
 2 files changed, 29 insertions(+), 165 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5cf28d4df42..ab2ddabfadd 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -68,19 +68,9 @@
  *	  memory gets actually freed.
  *
  *	  We use a max-heap with transaction size as the key to efficiently find
- *	  the largest transaction. While the max-heap is empty, we don't update
- *	  the max-heap when updating the memory counter. Therefore, we can get
- *	  the largest transaction in O(N) time, where N is the number of
- *	  transactions including top-level transactions and subtransactions.
- *
- *	  We build the max-heap just before selecting the largest transactions
- *	  if the number of transactions being decoded is higher than the threshold,
- *	  MAX_HEAP_TXN_COUNT_THRESHOLD. After building the max-heap, we also
- *	  update the max-heap when updating the memory counter. The intention is
- *	  to efficiently find the largest transaction in O(1) time instead of
- *	  incurring the cost of memory counter updates (O(log N)). Once the number
- *	  of transactions got lower than the threshold, we reset the max-heap
- *	  (refer to ReorderBufferMaybeResetMaxHeap() for details).
+ *	  the largest transaction. We update the max-heap whenever the memory
+ *	  counter is updated; however transactions with size 0 are not stored in
+ *	  the heap, because they have no changes to evict.
  *
  *	  We still rely on max_changes_in_memory when loading serialized changes
  *	  back into memory. At that point we can't use the memory limit directly
@@ -122,23 +112,6 @@
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
 
-/*
- * Threshold of the total number of top-level and sub transactions that
- * controls whether we use the max-heap for tracking their sizes. Although
- * using the max-heap to select the largest transaction is effective when
- * there are many transactions being decoded, maintaining the max-heap while
- * updating the memory statistics can be costly. Therefore, we use
- * MaxConnections as the threshold so that we use the max-heap only when
- * using subtransactions.
- */
-#define MAX_HEAP_TXN_COUNT_THRESHOLD	MaxConnections
-
-/*

Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Heikki Linnakangas

On 10/04/2024 07:45, Michael Paquier wrote:

On Tue, Apr 09, 2024 at 09:16:53PM -0700, Jeff Davis wrote:

On Wed, 2024-04-10 at 12:13 +0900, Michael Paquier wrote:

Wouldn't the best way forward be to revert
5bec1d6bc5e3 and revisit the whole in v18?


Also consider commits b840508644 and bcb14f4abc.


Indeed.  These are also linked.


I don't feel the urge to revert this:

- It's not broken as such, we're just discussing better ways to 
implement it. We could also do nothing, and revisit this in v18. The 
only must-fix issue is some compiler warnings IIUC.


- It's a pretty localized change in reorderbuffer.c, so it's not in the 
way of other patches or reverts. Nothing else depends on the binaryheap 
changes yet either.


- It seems straightforward to repeat the performance tests with whatever 
alternative implementations we want to consider.


My #1 choice would be to write a patch to switch the pairing heap, 
performance test that, and revert the binary heap changes.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Heikki Linnakangas

On 09/04/2024 21:04, Jeff Davis wrote:

On Fri, 2024-04-05 at 16:58 +0900, Masahiko Sawada wrote:

I have some further comments and I believe changes are required for
v17.


I also noticed that the simplehash is declared in binaryheap.h with
"SH_SCOPE extern", which seems wrong. Attached is a rough patch to
bring the declarations into binaryheap.c.

Note that the attached patch uses "SH_SCOPE static", which makes sense
to me in this case, but causes a bunch of warnings in gcc. I will post
separately about silencing that warning, but for now you can either
use:

SH_SCOPE static inline

which is probably fine, but will encourage the compiler to inline more
code, when not all callers even use the hash table. Alternatively, you
can do:

SH_SCOPE static pg_attribute_unused()

which looks a bit wrong to me, but seems to silence the warnings, and
lets the compiler decide whether or not to inline.

Also probably needs comment updates, etc.


Sorry I'm late to the party, I didn't pay attention to this thread 
earlier. But I wonder why this doesn't use the existing pairing heap 
implementation? I would assume that to be at least as good as the binary 
heap + hash table. And it's cheap to to insert to (O(1)), so we could 
probably remove the MAX_HEAP_TXN_COUNT_THRESHOLD, and always keep the 
heap up-to-date.
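To show why insertion is O(1): in a pairing heap, insert is just a meld of a single-node heap with the root. A toy max-heap sketch (the real implementation lives in src/include/lib/pairingheap.h; this is only an illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Toy max-pairing-heap over ints. */
typedef struct PHNode
{
	int		key;
	struct PHNode *first_child;
	struct PHNode *next_sibling;
} PHNode;

/* Meld two heaps: the larger root adopts the other as its first child. O(1). */
static PHNode *
ph_meld(PHNode *a, PHNode *b)
{
	if (a == NULL)
		return b;
	if (b == NULL)
		return a;
	if (a->key < b->key)
	{
		PHNode *tmp = a;
		a = b;
		b = tmp;
	}
	b->next_sibling = a->first_child;
	a->first_child = b;
	return a;
}

/* Insert: meld a singleton node into the heap. Constant time. */
static PHNode *
ph_insert(PHNode *root, PHNode *node)
{
	node->first_child = NULL;
	node->next_sibling = NULL;
	return ph_meld(root, node);
}

/* Find-max: the root holds the maximum. Constant time. */
static int
ph_max(const PHNode *root)
{
	return root->key;
}
```

Only delete-max needs the more involved two-pass merge of the children; insert and find-max are constant time, which is why keeping the heap always up-to-date is cheap.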


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: libpq.sgml: "server ejectes GSS" -> server rejects GSS

2024-04-08 Thread Heikki Linnakangas

On 09/04/2024 07:40, Erik Rijkers wrote:

Typo. fix:

-attempted first. If the server ejectes GSS encryption, SSL is
+attempted first. If the server rejects GSS encryption, SSL is


Fixed, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: PostgreSQL 17 Release Management Team & Feature Freeze

2024-04-08 Thread Heikki Linnakangas

On 08/04/2024 16:43, Tom Lane wrote:

Robert Haas  writes:

And maybe we need to think of a way to further mitigate this crush of
last minute commits. e.g. In the last week, you can't have more
feature commits, or more lines of insertions in your commits, than you
did in the prior 3 weeks combined. I don't know. I think this mad rush
of last-minute commits is bad for the project.


I was just about to pen an angry screed along the same lines.
The commit flux over the past couple days, and even the last
twelve hours, was flat-out ridiculous.  These patches weren't
ready a week ago, and I doubt they were ready now.

The RMT should feel free to exercise their right to require
revert "early and often", or we are going to be shipping a
very buggy release.



Can you elaborate, which patches you think were not ready? Let's make 
sure to capture any concrete concerns in the Open Items list.


I agree the last-minute crunch felt more intense than in previous years. 
I'm guilty myself; I crunched the "direct SSL" patches in. My rationale 
for that one: It's been in a pretty settled state for a long time. There 
haven't been any particular concerns about the design or the 
implementation. I hadn't committed it sooner because I was not 
comfortable with the lack of tests, especially the libpq parts. So I 
made a last minute dash to write the tests so that I'm comfortable with 
it, and I restructured the commits so that the tests and refactorings 
come first. The resulting code changes look the same they have for a 
long time, and I didn't uncover any significant new issues while doing 
that. I would not have felt comfortable committing it otherwise.


Yeah, I should have done that sooner, but realistically, there's nothing 
like a looming deadline as a motivator. One idea to avoid the mad rush 
in the future would be to make the feature freeze deadline more 
progressive. For example:


April 1: If you are still working on a feature that you still want to 
commit, you must explicitly flag it in the commitfest as "I plan to 
commit this very soon".


April 4: You must give a short status update about anything that you 
haven't committed yet, with an explanation of how you plan to proceed 
with it.


April 5-8: Mandatory daily status updates, explicit approval by the 
commitfest manager needed each day to continue working on it.


April 8: Hard feature freeze deadline

This would also give everyone more visibility, so that we're not all 
surprised by the last minute flood of commits.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-04-08 Thread Heikki Linnakangas

On 08/04/2024 04:25, Heikki Linnakangas wrote:

One important open item now is that we need to register a proper ALPN
protocol ID with IANA.


I sent a request for that: 
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-04-07 Thread Heikki Linnakangas

On 08/04/2024 04:28, Tom Lane wrote:

Heikki Linnakangas  writes:

Committed this. Thank you to everyone involved!


Looks like perlcritic isn't too happy with the test code:
koel and crake say

./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - chmod at line 138, column 2.  See pages 208,278 of PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5)
./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - open at line 184, column 1.  See pages 208,278 of PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5)


Fixed, thanks.

I'll make a note in my personal TODO list to add perlcritic to cirrus CI 
if possible. I rely heavily on that nowadays to catch issues before the 
buildfarm.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-04-07 Thread Heikki Linnakangas

Committed this. Thank you to everyone involved!

On 04/04/2024 14:08, Matthias van de Meent wrote:

Patch 0003:

The read size in secure_raw_read is capped to port->raw_buf_remaining
if the raw buf has any data. While the user will probably call into
this function again, I think that's a waste of cycles.


Hmm, yeah, I suppose we could read more data in the same call. It seems 
simpler not to. The case where "raw_buf_remaining > 0" is very rare.



pq_buffer_has_data now doesn't have any protections against
desynchronized state between PqRecvLength and PqRecvPointer. An
Assert(PqRecvLength >= PqRecvPointer) to that value would be
appreciated.


Added.


0006:

In CONNECTION_FAILED, we use connection_failed() to select whether we
need a new connection or stop trying altogether, but that function's
description states:


+ * Out-of-line portion of the CONNECTION_FAILED() macro
+ *
+ * Returns true, if we should retry the connection with different encryption 
method.


Which to me reads like we should reuse the connection, and try a
different method on that same connection. Maybe we can improve the
wording to something like
+ * Returns true, if we should reconnect with a different encryption method.
to make the reconnect part more clear.


Changed to "Returns true, if we should reconnect and retry with a 
different encryption method".



In select_next_encryption_method, there are several copies of this pattern:

if ((remaining_methods & ENC_METHOD) != 0)
{
 conn->current_enc_method = ENC_METHOD;
 return true;
}

I think a helper macro would reduce the verbosity of the scaffolding,
like in the attached SELECT_NEXT_METHOD.diff.txt.


Applied.
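For readers without the attachment, the helper pattern looks roughly like this (a toy model; the bit values and preference order here are invented, not the committed libpq code):

```c
#include <assert.h>
#include <stdbool.h>

/* Candidate encryption methods, as a bitmask of what's still untried. */
#define ENC_PLAINTEXT		(1 << 0)
#define ENC_GSSAPI			(1 << 1)
#define ENC_NEGOTIATED_SSL	(1 << 2)
#define ENC_DIRECT_SSL		(1 << 3)

static int current_enc_method = 0;

/* Pick 'method' if it is still among the remaining candidates. */
#define SELECT_NEXT_METHOD(method) \
	do { \
		if ((remaining_methods & (method)) != 0) \
		{ \
			current_enc_method = (method); \
			return true; \
		} \
	} while (0)

/* Try the methods in order of preference; return false if none remain. */
static bool
select_next_encryption_method(int remaining_methods)
{
	SELECT_NEXT_METHOD(ENC_DIRECT_SSL);
	SELECT_NEXT_METHOD(ENC_NEGOTIATED_SSL);
	SELECT_NEXT_METHOD(ENC_GSSAPI);
	SELECT_NEXT_METHOD(ENC_PLAINTEXT);
	return false;
}
```

The macro keeps each candidate down to a single line in select_next_encryption_method(), instead of repeating the four-line if-block for every method.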

In addition to the above, I made heavy changes to the tests. I wanted to 
test not just the outcome (SSL, GSSAPI, plaintext, or fail), but also 
the steps and reconnections needed to get there. To facilitate that, I 
rewrote how the expected outcome was represented in the test script. It 
now uses a table-driven approach, with a line for each test iteration, 
ie. for each different combination of options that are tested.


I then added some more logging, so that whenever the server receives an 
SSLRequest or GSSENCRequest packet, it logs a line. That's controlled by 
a new not-in-sample GUC ("trace_connection_negotiation"), intended only 
for the test and debugging. The test scrapes the log for the lines that 
it prints, and the expected output includes a compact trace of expected 
events. For example, the expected output for "user=testuser 
gssencmode=prefer sslmode=prefer sslnegotiation=direct", when GSS and 
SSL are both disabled in the server, looks like this:


# USER      GSSENCMODE  SSLMODE   SSLNEGOTIATION  EVENTS -> OUTCOME
testuser    prefer      prefer    direct          connect, directsslreject, reconnect, sslreject, authok -> plain


That means, we expect libpq to first try direct SSL, which is rejected 
by the server. It should then reconnect and attempt traditional 
negotiated SSL, which is also rejected. Finally, it should try plaintext 
authentication, without reconnecting, which succeeds.


That actually revealed a couple of slightly bogus behaviors with the 
current code. Here's one example:


# XXX: libpq retries the connection unnecessarily in this case:
nogssuser   require allow connect, gssaccept, authfail, reconnect, gssaccept, authfail -> fail


That means, with "gssencmode=require sslmode=allow", if the server 
accepts the GSS encryption but refuses the connection at authentication, 
libpq will reconnect and go through the same motions again. The second 
attempt is pointless, we know it's going to fail. The refactoring to the 
libpq state machine fixed that issue as a side-effect.


I removed the server ssl_enable_alpn and libpq sslalpn options. The idea 
was that they could be useful for testing, but we didn't actually have 
any tests that would use them, and you get the same result by testing 
with an older server or client version. I'm open to adding them back if 
we also add tests that benefit from them, but they were pretty pointless 
as they were.


One important open item now is that we need to register a proper ALPN 
protocol ID with IANA.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: LDAP & Kerberos test woes

2024-04-07 Thread Heikki Linnakangas

On 07/04/2024 13:19, Heikki Linnakangas wrote:

1st patch fixes the LDAP setup tests, and 2nd patch fixes the error
handling in the END blocks.


Committed and backpatched these test fixes.

--
Heikki Linnakangas
Neon (https://neon.tech)





LDAP & Kerberos test woes

2024-04-07 Thread Heikki Linnakangas
While refactoring the Kerberos test module in preparation for adding 
libpq encryption negotiation tests [1], I noticed that if the test 
script die()s during setup, the whole test is marked as SKIPped rather 
than failed. The cleanup END section is missing this trick:


--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -203,7 +203,12 @@ system_or_bail $krb5kdc, '-P', $kdc_pidfile;

 END
 {
+   # take care not to change the script's exit value
+   my $exit_code = $?;
+
kill 'INT', `cat $kdc_pidfile` if defined($kdc_pidfile) && -f $kdc_pidfile;

+
+   $? = $exit_code;
 }

The PostgreSQL::Cluster module got that right, but this test and the 
LdapServer module didn't get the memo.


After fixing that, the ldap tests started failing on my laptop:

[12:45:28.997](0.054s) # setting up LDAP server
# Checking port 59839
# Found port 59839
# Checking port 59840
# Found port 59840
# Running: /usr/sbin/slapd -f /home/heikki/git-sandbox/postgresql/build/testrun/ldap/001_auth/data/ldap-001_auth_j_WZ/slapd.conf -s0 -h ldap://localhost:59839 ldaps://localhost:59840
Can't exec "/usr/sbin/slapd": No such file or directory at /home/heikki/git-sandbox/postgresql/src/test/perl/PostgreSQL/Test/Utils.pm line 349.
[12:45:29.004](0.008s) Bail out!  command "/usr/sbin/slapd -f /home/heikki/git-sandbox/postgresql/build/testrun/ldap/001_auth/data/ldap-001_auth_j_WZ/slapd.conf -s0 -h ldap://localhost:59839 ldaps://localhost:59840" exited with value 2


That's because I don't have 'slapd' installed. The test script it 
supposed to check for that, and mark the test as SKIPped, but it's not 
really doing that on Linux. Attached patch fixes that, and also makes 
the error message a bit more precise, when the OpenLDAP installation is 
not found.


There's a lot more we could do with that code that tries to find the 
OpenLDAP installation. It should probably be a configure/meson test. 
This patch is just the minimum to keep this working after fixing the END 
block.


1st patch fixes the LDAP setup tests, and 2nd patch fixes the error 
handling in the END blocks.


[1] https://commitfest.postgresql.org/47/4742/

--
Heikki Linnakangas
Neon (https://neon.tech)From 6f09361ac0dbac5c2aef59b6a24e92486097cb43 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sun, 7 Apr 2024 13:06:39 +0300
Subject: [PATCH 1/2] Improve check in LDAP test to find the OpenLDAP
 installation

If the OpenLDAP installation directory is not found, set $setup to 0
so that the LDAP tests are skipped. The macOS checks were already
doing that, but the checks on other OS's were not. While we're at it,
improve the error message when the tests are skipped, to specify
whether the OS is supported at all, or if we just didn't find the
installation directory.

This was accidentally "working" without this, i.e. we were skipping
the tests if the OpenLDAP installation was not found, because of a bug
in the LdapServer test module: the END block clobbered the exit code
so if the script die()s before running the first subtest, the whole
test script was marked as SKIPped. The next commit will fix that bug,
but we need to fix the setup code first before we can do that.

These checks should probably go into configure/meson in the long run,
but this is better than nothing and allows fixing the bug in the END
block.
---
 src/test/ldap/LdapServer.pm   | 91 ++-
 src/test/ldap/t/001_auth.pl   |  3 +-
 src/test/ldap/t/002_bindpasswd.pl |  3 +-
 3 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/src/test/ldap/LdapServer.pm b/src/test/ldap/LdapServer.pm
index d2a043bc8f..91cd9e3762 100644
--- a/src/test/ldap/LdapServer.pm
+++ b/src/test/ldap/LdapServer.pm
@@ -57,49 +57,88 @@ use File::Basename;
 # private variables
 my ($slapd, $ldap_schema_dir, @servers);
 
-# visible variable
-our ($setup);
+# visible variables
+our ($setup, $setup_error);
 
 INIT
 {
+	# Find the OpenLDAP server binary and directory containing schema
+	# definition files.  On success, $setup is set to 1. On failure,
+	# it's set to 0, and an error message is set in $setup_error.
 	$setup = 1;
-	if ($^O eq 'darwin' && -d '/opt/homebrew/opt/openldap')
+	if ($^O eq 'darwin')
 	{
-		# typical paths for Homebrew on ARM
-		$slapd = '/opt/homebrew/opt/openldap/libexec/slapd';
-		$ldap_schema_dir = '/opt/homebrew/etc/openldap/schema';
-	}
-	elsif ($^O eq 'darwin' && -d '/usr/local/opt/openldap')
-	{
-		# typical paths for Homebrew on Intel
-		$slapd = '/usr/local/opt/openldap/libexec/slapd';
-		$ldap_schema_dir = '/usr/local/etc/openldap/schema';
-	}
-	elsif ($^O eq 'darwin' && -d '/opt/local/etc/openldap')
-	{
-		# typical paths for MacPorts
-		$slapd = '/opt/local/libexec/slapd';
-		$ldap_schema_dir = '/opt/local/etc/openldap/schema';
+		if (-d '/opt/homebrew/opt/openldap')
+		{
+			# typical paths for Homebrew on ARM
+			$slapd = '/opt/homebrew/opt/openldap/libe

Re: Use streaming read API in ANALYZE

2024-04-03 Thread Heikki Linnakangas

On 03/04/2024 13:31, Nazir Bilal Yavuz wrote:

Streaming API has been committed but the committed version has a minor
change, the read_stream_begin_relation function takes Relation instead
of BufferManagerRelation now. So, here is a v5 which addresses this
change.


I'm getting a repeatable segfault / assertion failure with this:

postgres=# CREATE TABLE tengiga (i int, filler text) with (fillfactor=10);
CREATE TABLE
postgres=# insert into tengiga select g, repeat('x', 900) from 
generate_series(1, 140) g;

INSERT 0 140
postgres=# set default_statistics_target = 10; ANALYZE tengiga;
SET
ANALYZE
postgres=# set default_statistics_target = 100; ANALYZE tengiga;
SET
ANALYZE
postgres=# set default_statistics_target =1000; ANALYZE tengiga;
SET
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

TRAP: failed Assert("BufferIsValid(hscan->rs_cbuf)"), File: "heapam_handler.c", Line: 1079, PID: 262232
postgres: heikki postgres [local] ANALYZE(ExceptionalCondition+0xa8)[0x56488a0de9d8]
postgres: heikki postgres [local] ANALYZE(heapam_scan_analyze_next_block+0x63)[0x5648899ece34]
postgres: heikki postgres [local] ANALYZE(+0x2d3f34)[0x564889b6af34]
postgres: heikki postgres [local] ANALYZE(+0x2d2a3a)[0x564889b69a3a]
postgres: heikki postgres [local] ANALYZE(analyze_rel+0x33e)[0x564889b68fa9]
postgres: heikki postgres [local] ANALYZE(vacuum+0x4b3)[0x564889c2dcc0]
postgres: heikki postgres [local] ANALYZE(ExecVacuum+0xd6f)[0x564889c2d7fe]
postgres: heikki postgres [local] ANALYZE(standard_ProcessUtility+0x901)[0x564889f0b8b9]
postgres: heikki postgres [local] ANALYZE(ProcessUtility+0x136)[0x564889f0afb1]
postgres: heikki postgres [local] ANALYZE(+0x6728c8)[0x564889f098c8]
postgres: heikki postgres [local] ANALYZE(+0x672b3b)[0x564889f09b3b]
postgres: heikki postgres [local] ANALYZE(PortalRun+0x320)[0x564889f09015]
postgres: heikki postgres [local] ANALYZE(+0x66b2c6)[0x564889f022c6]
postgres: heikki postgres [local] ANALYZE(PostgresMain+0x80c)[0x564889f06fd7]
postgres: heikki postgres [local] ANALYZE(+0x667876)[0x564889efe876]
postgres: heikki postgres [local] ANALYZE(postmaster_child_launch+0xe6)[0x564889e1f4b3]
postgres: heikki postgres [local] ANALYZE(+0x58e68e)[0x564889e2568e]
postgres: heikki postgres [local] ANALYZE(+0x58b7f0)[0x564889e227f0]
postgres: heikki postgres [local] ANALYZE(PostmasterMain+0x152b)[0x564889e2214d]
postgres: heikki postgres [local] ANALYZE(+0xb4)[0x564889cdb4b4]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7f7d83b6724a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f7d83b67305]
postgres: heikki postgres [local] ANALYZE(_start+0x21)[0x564889971a61]
2024-04-03 20:15:49.157 EEST [262101] LOG:  server process (PID 262232) was terminated by signal 6: Aborted


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Combine Prune and Freeze records emitted by vacuum

2024-04-03 Thread Heikki Linnakangas

On 03/04/2024 17:18, Melanie Plageman wrote:

I noticed you didn't make the comment updates I suggested in my
version 13 here [1]. A few of them are outdated references to
heap_page_prune() and one to a now deleted variable name
(all_visible_except_removable).

I applied them to your v13 and attached the diff.


Applied those changes, and committed. Thank you!


Off-list, Melanie reported that there is a small regression with the
benchmark script she posted yesterday, after all, but I'm not able to
reproduce that.


Actually, I think it was noise.


Ok, phew.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Combine Prune and Freeze records emitted by vacuum

2024-04-03 Thread Heikki Linnakangas

On 02/04/2024 16:11, Heikki Linnakangas wrote:

On 01/04/2024 20:22, Melanie Plageman wrote:

Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
you didn't modify them much/at all, but I noticed some things in my code
that could be better.


Ok, here's what I have now. I made a lot of small comment changes here
and there, and some minor local refactorings, but nothing major. I lost
track of all the individual changes I'm afraid, so I'm afraid you'll
have to just diff against the previous version if you want to see what's
changed. I hope I didn't break anything.

I'm pretty happy with this now. I will skim through it one more time
later today or tomorrow, and commit. Please review once more if you have
a chance.


This probably doesn't belong here. I noticed spgdoinsert.c had a static
function for sorting OffsetNumbers, but I didn't see anything general
purpose anywhere else.


I copied the spgdoinsert.c implementation to vacuumlazy.c as is. Would
be nice to have just one copy of this in some common place, but I also
wasn't sure where to put it.


One more version, with two small fixes:

1. I fumbled the offsetnumber-cmp function at the last minute so that it 
didn't compile. Fixed that.


2. On VACUUM on an unlogged or temp table, the logic always thought that 
we would be generating an FPI, causing it to always freeze when it 
could. But of course, you never generate FPIs on an unlogged table. 
Fixed that. (Perhaps we should indeed freeze more aggressively on an 
unlogged table, but changing the heuristic is out of scope for this patch.)


Off-list, Melanie reported that there is a small regression with the 
benchmark script she posted yesterday, after all, but I'm not able to 
reproduce that.


--
Heikki Linnakangas
Neon (https://neon.tech)
From e04bda4666d7eaff0c520a9c9e1468a9c4cc9f51 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 2 Apr 2024 15:47:06 +0300
Subject: [PATCH v13 1/3] Refactor how heap_prune_chain() updates prunable_xid

In preparation of freezing and counting tuples which are not
candidates for pruning, split heap_prune_record_unchanged() into
multiple functions, depending on the kind of line pointer. That's not too
interesting right now, but makes the next commit smaller.

Recording the lowest soon-to-be prunable xid is one of the actions we
take for unchanged LP_NORMAL item pointers but not for others, so move
that to the new heap_prune_record_unchanged_lp_normal() function. The
next commit will add more actions to these functions.

Author: Melanie Plageman 
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
 src/backend/access/heap/pruneheap.c | 125 
 1 file changed, 92 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef563e19aa5..1b5bf990d21 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,7 +78,11 @@ static void heap_prune_record_redirect(PruneState *prstate,
 static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
 static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
 
 static void page_verify_redirects(Page page);
 
@@ -311,7 +315,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged(&prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
 			continue;
 		}
 
@@ -324,7 +328,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 			if (unlikely(prstate.mark_unused_now))
 				heap_prune_record_unused(&prstate, offnum, false);
 			else
-				heap_prune_record_unchanged(&prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
 			continue;
 		}
 
@@ -434,7 +438,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 			}
 		}
 		else
-			heap_prune_record_unchanged(&prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -652,9 +656,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
 		 */
 		chainitems[nchain++] = offnum;
 
-		/*
-		 * Check tuple's visibility status.
-		 */
 		switch (htsv_get_valid_status(htsv[off

Re: Streaming read-ready sequential scan code

2024-04-02 Thread Heikki Linnakangas

On 01/04/2024 22:58, Melanie Plageman wrote:

Attached v7 has version 14 of the streaming read API as well as a few
small tweaks to comments and code.


I saw benchmarks in this thread to show that there's no regression when 
the data is in cache, but I didn't see any benchmarks demonstrating the 
benefit of this. So I ran this quick test:


-- create table ~1 GB table with only 1 row per page.
CREATE TABLE giga (i int, filler text) with (fillfactor=10);
insert into giga select g, repeat('x', 900) from generate_series(1, 
14) g;

vacuum freeze giga;

\timing on
select count(*) from giga;

The SELECT takes about 390 ms on 'master', and 230 ms with the patch.

This is pretty much the best case for this patch, real world gains will 
be much smaller. Nevertheless, nice speedup!


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Combine Prune and Freeze records emitted by vacuum

2024-04-02 Thread Heikki Linnakangas

On 01/04/2024 20:22, Melanie Plageman wrote:

Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
you didn't modify them much/at all, but I noticed some things in my code
that could be better.


Ok, here's what I have now. I made a lot of small comment changes here 
and there, and some minor local refactorings, but nothing major. I lost 
track of all the individual changes I'm afraid, so I'm afraid you'll 
have to just diff against the previous version if you want to see what's 
changed. I hope I didn't break anything.


I'm pretty happy with this now. I will skim through it one more time 
later today or tomorrow, and commit. Please review once more if you have 
a chance.



This probably doesn't belong here. I noticed spgdoinsert.c had a static
function for sorting OffsetNumbers, but I didn't see anything general
purpose anywhere else.


I copied the spgdoinsert.c implementation to vacuumlazy.c as is. Would 
be nice to have just one copy of this in some common place, but I also 
wasn't sure where to put it.


--
Heikki Linnakangas
Neon (https://neon.tech)
From be8891155c93f3555c49371f9804bdf5ba578f6e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 2 Apr 2024 15:47:06 +0300
Subject: [PATCH v12 1/2] Refactor how heap_prune_chain() updates prunable_xid

In preparation of freezing and counting tuples which are not
candidates for pruning, split heap_prune_record_unchanged() into
multiple functions, depending on the kind of line pointer. That's not too
interesting right now, but makes the next commit smaller.

Recording the lowest soon-to-be prunable xid is one of the actions we
take for unchanged LP_NORMAL item pointers but not for others, so move
that to the new heap_prune_record_unchanged_lp_normal() function. The
next commit will add more actions to these functions.

Author: Melanie Plageman 
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
 src/backend/access/heap/pruneheap.c | 125 
 1 file changed, 92 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef563e19aa5..1b5bf990d21 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,7 +78,11 @@ static void heap_prune_record_redirect(PruneState *prstate,
 static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
 static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
 
 static void page_verify_redirects(Page page);
 
@@ -311,7 +315,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged(&prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
 			continue;
 		}
 
@@ -324,7 +328,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 			if (unlikely(prstate.mark_unused_now))
 				heap_prune_record_unused(&prstate, offnum, false);
 			else
-				heap_prune_record_unchanged(&prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
 			continue;
 		}
 
@@ -434,7 +438,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 			}
 		}
 		else
-			heap_prune_record_unchanged(&prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -652,9 +656,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
 		 */
 		chainitems[nchain++] = offnum;
 
-		/*
-		 * Check tuple's visibility status.
-		 */
 		switch (htsv_get_valid_status(htsv[offnum]))
 		{
 			case HEAPTUPLE_DEAD:
@@ -670,9 +671,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
 			case HEAPTUPLE_RECENTLY_DEAD:
 
 /*
- * This tuple may soon become DEAD.  Update the hint field so
- * that the page is reconsidered for pruning in future.
- *
  * We don't need to advance the conflict horizon for
  * RECENTLY_DEAD tuples, even if we are removing them.  This
  * is because we only remove RECENTLY_DEAD tuples if they
@@ -681,8 +679,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
  * tuple by virtue of being later in the chain.  We will have
  * advanced the conflict horizon for the D

Re: Combine Prune and Freeze records emitted by vacuum

2024-04-01 Thread Heikki Linnakangas

On 01/04/2024 20:22, Melanie Plageman wrote:

 From 17e183835a968e81daf7b74a4164b243e2de35aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Fri, 29 Mar 2024 19:43:09 -0400
Subject: [PATCH v11 3/7] Introduce PRUNE_DO_* actions

We will eventually take additional actions in heap_page_prune() at the
discretion of the caller. For now, introduce these PRUNE_DO_* macros and
turn mark_unused_now, a paramter to heap_page_prune(), into a PRUNE_DO_


paramter -> parameter


action.
---
  src/backend/access/heap/pruneheap.c  | 51 ++--
  src/backend/access/heap/vacuumlazy.c | 11 --
  src/include/access/heapam.h  | 13 ++-
  3 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c 
b/src/backend/access/heap/pruneheap.c
index fb0ad834f1b..30965c3c5a1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -29,10 +29,11 @@
  /* Working data for heap_page_prune and subroutines */
  typedef struct
  {
+   /* PRUNE_DO_* arguments */
+   uint8   actions;


I wasn't sure if actions is a good name. What do you think?


Committed this part, with the name 'options'. There's some precedent for 
that in heap_insert().


I decided to keep it a separate bool field here in the PruneState 
struct, though, and only changed it in the heap_page_prune() function 
signature. It didn't feel worth the code churn here, and 
'prstate.mark_unused_now' is shorter than "(prstate.options & 
HEAP_PRUNE_PAGE_MARK_UNUSED_NOW) != 0" anyway.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Combine Prune and Freeze records emitted by vacuum

2024-04-01 Thread Heikki Linnakangas

On 01/04/2024 19:08, Melanie Plageman wrote:

On Mon, Apr 01, 2024 at 05:17:51PM +0300, Heikki Linnakangas wrote:

diff --git a/src/backend/access/heap/pruneheap.c 
b/src/backend/access/heap/pruneheap.c
@@ -256,15 +270,16 @@ heap_page_prune(Relation relation, Buffer buffer,
 	tup.t_tableOid = RelationGetRelid(relation);
 
 	/*
-	 * Determine HTSV for all tuples.
+	 * Determine HTSV for all tuples, and queue them up for processing as HOT
+	 * chain roots or as a heap-only items.


Reading this comment now as a whole, I would add something like
"Determining HTSV for all tuples once is required for correctness" to
the start of the second paragraph. The new conjunction on the first
paragraph sentence followed by the next paragraph is a bit confusing
because it sounds like queuing them up for processing is required for
correctness (which, I suppose it is on some level). Basically, I'm just
saying that it is now less clear what is required for correctness.


Fixed.


While I recognize this is a matter of style and not important, I
personally prefer this for reverse looping:

for (int i = prstate.nheaponly_items; i --> 0;)


I don't think we use that style anywhere in the Postgres source tree 
currently. (And I don't like it ;-) )



I do think a comment about the reverse order would be nice. I know it
says something above the first loop to this effect:

* Processing the items in reverse order (and thus the tuples in
* increasing order) increases prefetching efficiency significantly /
* decreases the number of cache misses.

So perhaps we could just say "as above, process the items in reverse
order"


I'm actually not sure why it makes a difference. I would assume all the 
data to already be in CPU cache at this point, since the first loop 
already accessed it, so I think there's something else going on. But I 
didn't investigate it deeper. Anyway, added a comment.


Committed the first of the remaining patches with those changes. And 
also this, which is worth calling out:



		if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
		{
			ItemId		itemid = PageGetItemId(page, offnum);
			HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);

			if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
			{
				HeapTupleHeaderAdvanceConflictHorizon(htup,
													  &prstate.latest_xid_removed);
				heap_prune_record_unused(&prstate, offnum, true);
			}
			else
			{
				/*
				 * This tuple should've been processed and removed as part of
				 * a HOT chain, so something's wrong.  To preserve evidence,
				 * we don't dare to remove it.  We cannot leave behind a DEAD
				 * tuple either, because that will cause VACUUM to error out.
				 * Throwing an error with a distinct error message seems like
				 * the least bad option.
				 */
				elog(ERROR, "dead heap-only tuple (%u, %d) is not linked to from any HOT chain",
					 blockno, offnum);
			}
		}
		else
			heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);


As you can see, I turned that into a hard error. Previously, that code 
was at the top of heap_prune_chain(), and it was normal to see DEAD 
heap-only tuples there, because they would presumably get processed 
later as part of a HOT chain. But now this is done after all the HOT 
chains have already been processed.


Previously if there was a dead heap-only tuple like that on the page for 
some reason, it was silently not processed by heap_prune_chain() 
(because it was assumed that it would be processed later as part of a 
HOT chain), and was left behind as a HEAPTUPLE_DEAD tuple. If the 
pruning was done as part of VACUUM, VACUUM would fail with "ERROR: 
unexpected HeapTupleSatisfiesVacuum result". Or am I missing something?


Now you get that above error also on on-access pruning, which is not 
ideal. But I don't remember hearing about corruption like that, and 
you'd get the error on VACUUM anyway.


With the next patches, heap_prune_record_unchanged() will do more, and 
will also throw an error on a HEAPTUPLE_LIVE tuple, so even though in 
the first patch we could print just a WARNING and move on, it gets more 
awkward with the rest of the patches.


(Continuing with the remaining patches..)

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Commitfest Manager for March

2024-04-01 Thread Heikki Linnakangas

On 01/04/2024 09:05, Andrey M. Borodin wrote:

I think it makes sense to close this commitfest only after Feature Freeze on 
April 8, 2024 at 0:00 AoE.
What do you think?


+1. IIRC that's how it's been done in last commitfest in previous years too.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Streaming I/O, vectored I/O (WIP)

2024-03-29 Thread Heikki Linnakangas

On 29/03/2024 09:01, Thomas Munro wrote:

On Fri, Mar 29, 2024 at 9:45 AM Heikki Linnakangas  wrote:

master (213c959a29):8.0 s
streaming-api v13:  9.5 s


Hmm, that's not great, and I think I know one factor that has
confounded my investigation and the conflicting reports I have
received from a couple of people: some are using meson, which is
defaulting to -O3 by default, and others are using make which gives
you -O2 by default, but at -O2, GCC doesn't inline that
StartReadBuffer specialisation that is used in the "fast path", and
possibly more.  Some of that gap is closed by using
pg_attribute_inline_always.  Clang fails to inline at any level.  So I
should probably use the "always" macro there because that is the
intention.  Still processing the rest of your email...


Ah yeah, I also noticed that the inlining didn't happen with some 
compilers and flags. I use a mix of gcc and clang and meson and autoconf 
in my local environment.


The above micro-benchmarks were with meson and gcc -O3. GCC version:

$ gcc --version
gcc (Debian 12.2.0-14) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Combine Prune and Freeze records emitted by vacuum

2024-03-29 Thread Heikki Linnakangas

On 29/03/2024 07:04, Melanie Plageman wrote:

On Thu, Mar 28, 2024 at 11:07:10AM -0400, Melanie Plageman wrote:

These comments could use another pass. I had added some extra
(probably redundant) content when I thought I was refactoring it a
certain way and then changed my mind.

Attached is a diff with some ideas I had for a bit of code simplification.

Are you working on cleaning this patch up or should I pick it up?


Attached v9 is rebased over master. But, more importantly, I took
another pass at heap_prune_chain() and am pretty happy with what I came
up with. See 0021. I simplified the traversal logic and then grouped the
chain processing into three branches at the end. I find it much easier
to understand what we are doing for different types of HOT chains.


Ah yes, agreed, that's nicer.

The 'survivor' variable is a little confusing, especially here:

if (!survivor)
{
int i;

/*
 * If no DEAD tuple was found, and the root is redirected, mark 
it as
 * such.
 */
if ((i = ItemIdIsRedirected(rootlp)))
heap_prune_record_unchanged_lp_redirect(prstate, 
rootoffnum);

/* the rest of tuples in the chain are normal, unchanged tuples 
*/
for (; i < nchain; i++)
heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}

You would think that "if (!survivor)" means that there are no live 
tuples on the chain, i.e. no survivors. But in fact it's the opposite; 
it means that they are all live. Maybe call it 'ndeadchain' instead, 
meaning the number of dead items in the chain.



I got rid of revisited. We can put it back, but I was thinking: we stash
all HOT tuples and then loop over them later, calling record_unchanged()
on the ones that aren't marked. But, if we have a lot of HOT tuples, is
this really that much better than just looping through all the offsets
and calling record_unchanged() on just the ones that aren't marked?


Well, it requires looping through all the offsets one more time, and 
unless you have a lot of HOT tuples, most items would be marked already. 
But maybe the overhead is negligible anyway.



I've done that in my version. While testing this, I found that only
on-access pruning needed this final loop calling record_unchanged() on
items not yet marked. I know we can't skip this final loop entirely in
the ON ACCESS case because it calls record_prunable(), but we could
consider moving that back out into heap_prune_chain()? Or what do you
think?


Hmm, why is that different with on-access pruning?

Here's another idea: In the first loop through the offsets, where we 
gather the HTSV status of each item, also collect the offsets of all HOT 
and non-HOT items to two separate arrays. Call heap_prune_chain() for 
all the non-HOT items first, and then process any remaining HOT tuples 
that haven't been marked yet.



I haven't finished updating all the comments, but I am really interested
to know what you think about heap_prune_chain() now.


Looks much better now, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-03-28 Thread Heikki Linnakangas

On 28/03/2024 13:15, Matthias van de Meent wrote:

On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas  wrote:


I hope I didn't joggle your elbow reviewing this, Jacob, but I spent
some time rebasing and fixing various little things:


With the recent changes to backend startup committed by you, this
patchset has gotten major apply failures.

Could you provide a new version of the patchset so that it can be
reviewed in the context of current HEAD?


Here you are.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 83484696e470ab130bcd3038f0e28d494065071a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 5 Mar 2024 15:40:30 +0200
Subject: [PATCH v9 1/6] Move Kerberos module

So that we can reuse it in new tests.
---
 src/test/kerberos/t/001_auth.pl   | 174 ++--
 src/test/perl/PostgreSQL/Test/Kerberos.pm | 229 ++
 2 files changed, 240 insertions(+), 163 deletions(-)
 create mode 100644 src/test/perl/PostgreSQL/Test/Kerberos.pm

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index e51e87d0a2e..d4f1ec58092 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -21,6 +21,7 @@ use strict;
 use warnings FATAL => 'all';
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Kerberos;
 use Test::More;
 use Time::HiRes qw(usleep);
 
@@ -34,177 +35,27 @@ elsif (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/)
 	  'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA';
 }
 
-my ($krb5_bin_dir, $krb5_sbin_dir);
-
-if ($^O eq 'darwin' && -d "/opt/homebrew")
-{
-	# typical paths for Homebrew on ARM
-	$krb5_bin_dir = '/opt/homebrew/opt/krb5/bin';
-	$krb5_sbin_dir = '/opt/homebrew/opt/krb5/sbin';
-}
-elsif ($^O eq 'darwin')
-{
-	# typical paths for Homebrew on Intel
-	$krb5_bin_dir = '/usr/local/opt/krb5/bin';
-	$krb5_sbin_dir = '/usr/local/opt/krb5/sbin';
-}
-elsif ($^O eq 'freebsd')
-{
-	$krb5_bin_dir = '/usr/local/bin';
-	$krb5_sbin_dir = '/usr/local/sbin';
-}
-elsif ($^O eq 'linux')
-{
-	$krb5_sbin_dir = '/usr/sbin';
-}
-
-my $krb5_config = 'krb5-config';
-my $kinit = 'kinit';
-my $klist = 'klist';
-my $kdb5_util = 'kdb5_util';
-my $kadmin_local = 'kadmin.local';
-my $krb5kdc = 'krb5kdc';
-
-if ($krb5_bin_dir && -d $krb5_bin_dir)
-{
-	$krb5_config = $krb5_bin_dir . '/' . $krb5_config;
-	$kinit = $krb5_bin_dir . '/' . $kinit;
-	$klist = $krb5_bin_dir . '/' . $klist;
-}
-if ($krb5_sbin_dir && -d $krb5_sbin_dir)
-{
-	$kdb5_util = $krb5_sbin_dir . '/' . $kdb5_util;
-	$kadmin_local = $krb5_sbin_dir . '/' . $kadmin_local;
-	$krb5kdc = $krb5_sbin_dir . '/' . $krb5kdc;
-}
-
-my $host = 'auth-test-localhost.postgresql.example.com';
-my $hostaddr = '127.0.0.1';
-my $realm = 'EXAMPLE.COM';
-
-my $krb5_conf = "${PostgreSQL::Test::Utils::tmp_check}/krb5.conf";
-my $kdc_conf = "${PostgreSQL::Test::Utils::tmp_check}/kdc.conf";
-my $krb5_cache = "${PostgreSQL::Test::Utils::tmp_check}/krb5cc";
-my $krb5_log = "${PostgreSQL::Test::Utils::log_path}/krb5libs.log";
-my $kdc_log = "${PostgreSQL::Test::Utils::log_path}/krb5kdc.log";
-my $kdc_port = PostgreSQL::Test::Cluster::get_free_port();
-my $kdc_datadir = "${PostgreSQL::Test::Utils::tmp_check}/krb5kdc";
-my $kdc_pidfile = "${PostgreSQL::Test::Utils::tmp_check}/krb5kdc.pid";
-my $keytab = "${PostgreSQL::Test::Utils::tmp_check}/krb5.keytab";
-
 my $pgpass = "${PostgreSQL::Test::Utils::tmp_check}/.pgpass";
 
 my $dbname = 'postgres';
 my $username = 'test1';
 my $application = '001_auth.pl';
 
-note "setting up Kerberos";
-
-my ($stdout, $krb5_version);
-run_log [ $krb5_config, '--version' ], '>', \$stdout
-  or BAIL_OUT("could not execute krb5-config");
-BAIL_OUT("Heimdal is not supported") if $stdout =~ m/heimdal/;
-$stdout =~ m/Kerberos 5 release ([0-9]+\.[0-9]+)/
-  or BAIL_OUT("could not get Kerberos version");
-$krb5_version = $1;
-
 # Construct a pgpass file to make sure we don't use it
 append_to_file($pgpass, '*:*:*:*:abc123');
 
 chmod 0600, $pgpass or die $!;
 
-# Build the krb5.conf to use.
-#
-# Explicitly specify the default (test) realm and the KDC for
-# that realm to avoid the Kerberos library trying to look up
-# that information in DNS, and also because we're using a
-# non-standard KDC port.
-#
-# Also explicitly disable DNS lookups since this isn't really
-# our domain and we shouldn't be causing random DNS requests
-# to be sent out (not to mention that broken DNS environments
-# can cause the tests to take an extra long time and timeout).
-#
-# Reverse DNS is explicitly disabled to avoid any issue with a
-# captive portal or other cases where the reverse DNS succeeds
-# and the Kerberos library uses that as the canonical name of
-# the host and then tries to acquire a cross-realm ticket.
-append_to_file(
-	$krb5_conf,
-	qq![log

Re: Combine Prune and Freeze records emitted by vacuum

2024-03-28 Thread Heikki Linnakangas

On 28/03/2024 03:53, Melanie Plageman wrote:

On Thu, Mar 28, 2024 at 01:04:04AM +0200, Heikki Linnakangas wrote:

One change with this is that live_tuples and many of the other fields are
now again updated, even if the caller doesn't need them. It was hard to skip
them in a way that would save any cycles, with the other refactorings.


I am worried we are writing checks that are going to have to come out of
SELECT queries' bank accounts, but I'll do some benchmarking when we're
all done with major refactoring.


Sounds good, thanks.


+				 *
+				 * Whether we arrive at the dead HOT tuple first here or while
+				 * following a chain below affects whether preceding RECENTLY_DEAD
+				 * tuples in the chain can be removed or not.  Imagine that you
+				 * have a chain with two tuples: RECENTLY_DEAD -> DEAD.  If we
+				 * reach the RECENTLY_DEAD tuple first, the chain-following logic
+				 * will find the DEAD tuple and conclude that both tuples are in
+				 * fact dead and can be removed.  But if we reach the DEAD tuple
+				 * at the end of the chain first, when we reach the RECENTLY_DEAD
+				 * tuple later, we will not follow the chain because the DEAD
+				 * TUPLE is already 'marked', and will not remove the
+				 * RECENTLY_DEAD tuple.  This is not a correctness issue, and the
+				 * RECENTLY_DEAD tuple will be removed by a later VACUUM.
 				 */
 				if (!HeapTupleHeaderIsHotUpdated(htup))


Is this intentional? Like would it be correct to remove the
RECENTLY_DEAD tuple during the current vacuum?


Yes, it would be correct. And if we happen to visit the items in 
different order, the RECENTLY_DEAD tuple first, it will get removed.


(This is just based on me reading the code, I'm not sure if I've missed 
something. Would be nice to construct a test case for that and step 
through it with a debugger to really see what happens. But this is not a 
new issue, doesn't need to be part of this patch)



 From c2ee7508456d0e76de985f9997a6840450e342a8 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 28 Mar 2024 00:45:26 +0200
Subject: [PATCH v8 22/22] WIP

- Got rid of all_visible_except_removable again. We're back to
   roughly the same mechanism as on 'master', where the all_visible
   doesn't include LP_DEAD items, but at the end of
   heap_page_prune_and_freeze() when we return to the caller, we clear
   it if there were any LP_DEAD items. I considered calling the
   variable 'all_visible_except_lp_dead', which would be more accurate,
   but it's also very long.


not longer than all_visible_except_removable. I would be happy to keep
it more exact, but I'm also okay with just all_visible.


Ok, let's make it 'all_visible_except_lp_dead' for clarity.


What's this "cutoffs TODO"?


+ * cutoffs TODO
+ *
   * presult contains output parameters needed by callers such as the number of
   * tuples removed and the number of line pointers newly marked LP_DEAD.
   * heap_page_prune_and_freeze() is responsible for initializing it.


All the other arguments are documented in the comment, except 'cutoffs'.


@@ -728,10 +832,12 @@ htsv_get_valid_status(int status)
   * DEAD, our visibility test is just too coarse to detect it.
   *
   * In general, pruning must never leave behind a DEAD tuple that still has
- * tuple storage.  VACUUM isn't prepared to deal with that case.  That's why
+ * tuple storage.  VACUUM isn't prepared to deal with that case (FIXME: it no 
longer cares, right?).
+ * That's why
   * VACUUM prunes the same heap page a second time (without dropping its lock
   * in the interim) when it sees a newly DEAD tuple that we initially saw as
- * in-progress.  Retrying pruning like this can only happen when an inserting
+ * in-progress (FIXME: Really? Does it still do that?).


Yea, I think all that is no longer true. I missed this comment back
then.


Committed a patch to remove it.


@@ -981,7 +1069,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 * RECENTLY_DEAD tuples just in case there's a DEAD one after 
them;
 * but we can't advance past anything else.  We have to make 
sure that
 * we don't miss any DEAD tuples, since DEAD tuples that still 
have
-* tuple storage after pruning will confuse VACUUM.
+* tuple storage after pruning will confuse VACUUM (FIXME: not 
anymore
+* I think?).


Meaning, it won't confuse vacuum anymore or there won't be DEAD tuples
with storage after pruning anymore?


I meant that it won't confuse VACUUM anymore. lazy_scan_prune() doesn't 
loop through the items on the page checking their visibility anymore.


Hmm, one confusion remains though

Re: Combine Prune and Freeze records emitted by vacuum

2024-03-27 Thread Heikki Linnakangas

On 27/03/2024 17:18, Melanie Plageman wrote:

I need some way to modify the control flow or accounting such that I
know which HEAPTUPLE_RECENTLY_DEAD tuples will not be marked LP_DEAD.
And a way to consider freezing and do live tuple accounting for these
and HEAPTUPLE_LIVE tuples exactly once.


Just a quick update: I've been massaging this some more today, and I 
think I'm onto got something palatable. I'll send an updated patch later 
today, but the key is to note that for each item on the page, there is 
one point where we determine the fate of the item, whether it's pruned 
or not. That can happen in different points in in heap_page_prune(). 
That's also when we set marked[offnum] = true. Whenever that happens, we 
all call one of the a heap_page_prune_record_*() subroutines. We already 
have those subroutines for when a tuple is marked as dead or unused, but 
let's add similar subroutines for the case that we're leaving the tuple 
unchanged. If we move all the bookkeeping logic to those subroutines, we 
can ensure that it gets done exactly once for each tuple, and at that 
point we know what we are going to do to the tuple, so we can count it 
correctly. So heap_prune_chain() decides what to do with each tuple, and 
ensures that each tuple is marked only once, and the subroutines update 
all the variables, add the item to the correct arrays etc. depending on 
what we're doing with it.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Streaming I/O, vectored I/O (WIP)

2024-03-26 Thread Heikki Linnakangas

On 24/03/2024 15:02, Thomas Munro wrote:

On Wed, Mar 20, 2024 at 4:04 AM Heikki Linnakangas  wrote:

Maybe 'pg_streaming_read_next_buffer' or just 'pg_streaming_read_next',
for a shorter name.


Hmm.  The idea of 'buffer' appearing in a couple of names is that
there are conceptually other kinds of I/O that we might want to
stream, like raw files or buffers other than the buffer pool, maybe
even sockets, so this would be part of a family of similar interfaces.
I think it needs to be clear that this variant gives you buffers.  I'm
OK with removing "get" but I guess it would be better to keep the
words in the same order across the three functions?  What about these?

streaming_read_buffer_begin();
streaming_read_buffer_next();
streaming_read_buffer_end();

Tried like that in this version.  Other ideas would be to make
"stream" the main noun, buffered_read_stream_begin() or something.
Ideas welcome.


Works for me, although "streaming_read_buffer" is a pretty long prefix. 
The flags like "STREAMING_READ_MAINTENANCE" probably ought to be 
"STREAMING_READ_BUFFER_MAINTENANCE" as well.


Maybe "buffer_stream_next()"?


Here are some other changes:

* I'm fairly happy with the ABC adaptive distance algorithm so far, I
think, but I spent more time tidying up the way it is implemented.  I
didn't like the way each 'range' had buffer[MAX_BUFFERS_PER_TRANSFER],
so I created a new dense array stream->buffers that behaved as a
second circular queue.

* The above also made it trivial for MAX_BUFFERS_PER_TRANSFER to
become the GUC that it always wanted to be: buffer_io_size defaulting
to 128kB.  Seems like a reasonable thing to have?  Could also
influence things like bulk write?  (The main problem I have with the
GUC currently is choosing a category, async resources is wrong)

* By analogy, it started to look a bit funny that each range had room
for a ReadBuffersOperation, and we had enough ranges for
max_pinned_buffers * 1 block range.  So I booted that out to another
dense array, of size max_ios.

* At the same time, Bilal and Andres had been complaining privately
about 'range' management overheads showing up in perf and creating a
regression against master on fully cached scans that do nothing else
(eg pg_prewarm, where we lookup, pin, unpin every page and do no I/O
and no CPU work with the page, a somewhat extreme case but a
reasonable way to isolate the management costs); having made the above
change, it suddenly seemed obvious that I should make the buffers
array the 'main' circular queue, pointing off to another place for
information required for dealing with misses.  In this version, there
are no more range objects.  This feels better and occupies and touches
less memory.  See pictures below.


+1 for all that. Much better!


* Various indexes and sizes that couldn't quite fit in uint8_t but
couldn't possibly exceed a few thousand because they are bounded by
numbers deriving from range-limited GUCs are now int16_t (while I was
looking for low hanging opportunities to reduce memory usage...)


Is int16 enough though? It seems so, because:

max_pinned_buffers = Max(max_ios * 4, buffer_io_size);

and max_ios is constrained by the GUC's maximum MAX_IO_CONCURRENCY, and 
buffer_io_size is constrained by MAX_BUFFER_IO_SIZE == PG_IOV_MAX == 32.


If someone changes those constants though, int16 might overflow and fail 
in weird ways. I'd suggest being more careful here and explicitly clamp 
max_pinned_buffers at PG_INT16_MAX or have a static assertion or 
something. (I think it needs to be somewhat less than PG_INT16_MAX, 
because of the extra "overflow buffers" stuff and some other places 
where you do arithmetic.)



	/*
	 * We gave a contiguous range of buffer space to StartReadBuffers(), but
	 * we want it to wrap around at max_pinned_buffers.  Move values that
	 * overflowed into the extra space.  At the same time, put -1 in the I/O
	 * slots for the rest of the buffers to indicate no I/O.  They are covered
	 * by the head buffer's I/O, if there is one.  We avoid a % operator.
	 */
	overflow = (stream->next_buffer_index + nblocks) - stream->max_pinned_buffers;
	if (overflow > 0)
	{
		memmove(&stream->buffers[0],
				&stream->buffers[stream->max_pinned_buffers],
				sizeof(stream->buffers[0]) * overflow);
		for (int i = 0; i < overflow; ++i)
			stream->buffer_io_indexes[i] = -1;
		for (int i = 1; i < nblocks - overflow; ++i)
			stream->buffer_io_indexes[stream->next_buffer_index + i] = -1;
	}
	else
	{
		for (int i = 1; i < nblocks; ++i)
			stream->buffer_io_indexes[stream->next_buffer_index + i] = -1;
	}


Instead of clearing buffer_io

Re: Combine Prune and Freeze records emitted by vacuum

2024-03-25 Thread Heikki Linnakangas

On 24/03/2024 18:32, Melanie Plageman wrote:

On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas  wrote:


In heap_page_prune_and_freeze(), we now do some extra work on each live
tuple, to set the all_visible_except_removable correctly. And also to
update live_tuples, recently_dead_tuples and hastup. When we're not
freezing, that's a waste of cycles, the caller doesn't care. I hope it's
enough that it doesn't matter, but is it?


Last year on an early version of the patch set I did some pgbench
tpcb-like benchmarks -- since there is a lot of on-access pruning in
that workload -- and I don't remember it being a showstopper. The code
has changed a fair bit since then. However, I think it might be safer
to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
the rest of the loop after heap_prune_satisfies_vacuum() when
on-access pruning invokes it. I had avoided that because it felt ugly
and error-prone, however it addresses a few other of your points as
well.


Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the 
argument described what it does, rather than who it's for. For example, 
'need_all_visible'. If set to true, the function determines 
'all_visible', otherwise it does not.


I started to look closer at the loops in heap_prune_chain() and how they 
update all the various flags and counters. There's a lot going on there. 
We have:


- live_tuples counter
- recently_dead_tuples counter
- all_visible[_except_removable]
- all_frozen
- visibility_cutoff_xid
- hastup
- prstate.frozen array
- nnewlpdead
- deadoffsets array

And that doesn't even include all the local variables and the final 
dead/redirected arrays.


Some of those are set in the first loop that initializes 'htsv' for each 
tuple on the page. Others are updated in heap_prune_chain(). Some are 
updated in both. It's hard to follow which are set where.


I think recently_dead_tuples is updated incorrectly, for tuples that are 
part of a completely dead HOT chain. For example, imagine a hot chain 
with two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow 
the chain, see the DEAD tuple at the end of the chain, and mark both 
tuples for pruning. However, we already updated 'recently_dead_tuples' 
in the first loop, which is wrong if we remove the tuple.


Maybe that's the only bug like this, but I'm a little scared. Is there 
something we could do to make this simpler? Maybe move all the new work 
that we added to the first loop, into heap_prune_chain() ? Maybe 
introduce a few more helper heap_prune_record_*() functions, to update 
the flags and counters also for live and insert/delete-in-progress 
tuples and for dead line pointers? Something like 
heap_prune_record_live() and heap_prune_record_lp_dead().



The 'frz_conflict_horizon' stuff is still fuzzy to me. (Not necessarily
these patches's fault). This at least is wrong, because Max(a, b)
doesn't handle XID wraparound correctly:


	if (do_freeze)
		conflict_xid = Max(prstate.snapshotConflictHorizon,
						   presult->frz_conflict_horizon);
	else
		conflict_xid = prstate.snapshotConflictHorizon;


Then there's this in lazy_scan_prune():


	/* Using same cutoff when setting VM is now unnecessary */
	if (presult.all_frozen)
		presult.frz_conflict_horizon = InvalidTransactionId;

This does the right thing in the end, but if all the tuples are frozen
shouldn't frz_conflict_horizon already be InvalidTransactionId? The
comment says it's "newest xmin on the page", and if everything was
frozen, all xmins are FrozenTransactionId. In other words, that should
be moved to heap_page_prune_and_freeze() so that it doesn't lie to its
caller. Also, frz_conflict_horizon is only set correctly if
'all_frozen==true', would be good to mention that in the comments too.


Yes, this is a good point. I've spent some time swapping all of this
back into my head. I think we should change the names of all these
conflict horizon variables and introduce some local variables again.
In the attached patch, I've updated the name of the variable in
PruneFreezeResult to vm_conflict_horizon, as it is only used for
emitting a VM update record. Now, I don't set it until the end of
heap_page_prune_and_freeze(). It is only updated from
InvalidTransactionId if the page is not all frozen. As you say, if the
page is all frozen, there can be no conflict.


Makes sense.


I've also changed PruneState->snapshotConflictHorizon to
PruneState->latest_xid_removed.

And I introduced the local variables visibility_cutoff_xid and
frz_conflict_horizon. I think it is important we distinguish between
the latest xid pruned, the latest xmin of tuples frozen, and the
latest xid of all live tuples on the page.

Though we end up using visibility_cutoff_xid as the fre

Re: Recording whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning (Was: Show various offset arrays for heap WAL records)

2024-03-25 Thread Heikki Linnakangas

On 09/12/2023 23:48, Peter Geoghegan wrote:

On Tue, Mar 21, 2023 at 3:37 PM Peter Geoghegan  wrote:

I think that we should do something like the attached, to completely
avoid this ambiguity. This patch adds a new XLOG_HEAP2 bit that's
similar to XLOG_HEAP_INIT_PAGE -- XLOG_HEAP2_BYVACUUM. This allows all
XLOG_HEAP2 record types to indicate that they took place during
VACUUM, by XOR'ing the flag with the record type/info when
XLogInsert() is called. For now this is only used by PRUNE records.
Tools like pg_walinspect will report a separate "Heap2/PRUNE+BYVACUUM"
record_type, as well as the unadorned Heap2/PRUNE record_type, which
we'll now know must have been opportunistic pruning.

The approach of using a bit in the style of the heapam init bit makes
sense to me, because the bit is available, and works in a way that is
minimally invasive. Also, one can imagine needing to resolve a similar
ambiguity in the future, when (say) opportunistic freezing is added.


Starting a new, dedicated thread to keep track of this in the CF app.

This patch bitrot. Attached is v2, rebased on top of HEAD.


I included changes like this in commit f83d709760 ("Merge prune, freeze 
and vacuum WAL record formats"). Marking this as Committed in the 
commitfest.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Combine Prune and Freeze records emitted by vacuum

2024-03-25 Thread Heikki Linnakangas

On 24/03/2024 21:55, Melanie Plageman wrote:

On Sat, Mar 23, 2024 at 01:09:30AM +0200, Heikki Linnakangas wrote:

On 20/03/2024 21:17, Melanie Plageman wrote:
There is another patch in the commitfest that touches this area: "Recording
whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning"
[1]. That actually goes in the opposite direction than this patch. That
patch wants to add more information, to show whether a record was emitted by
VACUUM or on-access pruning, while this patch makes the freezing and
VACUUM's 2nd phase records also look the same. We could easily add more
flags to xl_heap_prune to distinguish them. Or assign different xl_info code
for them, like that other patch proposed. But I don't think that needs to
block this patch, that can be added as a separate patch.

[1] 
https://www.postgresql.org/message-id/cah2-wzmsevhox+hjpfmqgcxwwdgnzp0r9f+vbnpok6tgxmp...@mail.gmail.com


I think it would be very helpful to distinguish amongst vacuum pass 1,
2, and on-access pruning. I often want to know what did most of the
pruning -- and I could see also wanting to know if the first or second
vacuum pass was responsible for removing the tuples. I agree it could be
done separately, but it would be very helpful to have as soon as
possible now that the record type will be the same for all three.


Ok, I used separate 'info' codes for records generated on on-access 
pruning and vacuum's 1st and 2nd pass. Similar to Peter's patch on that 
other thread, except that I didn't reserve the whole high bit for this, 
but used three different 'info' codes. Freezing uses the same 
XLOG_HEAP2_PRUNE_VACUUM_SCAN as the pruning in vacuum's 1st pass. You 
can distinguish them based on whether the record has nfrozen > 0, and 
with the rest of the patches, they will be merged anyway.



 From 042185d3de14dcb7088bbe50e9c64e365ac42c2a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 22 Mar 2024 23:10:22 +0200
Subject: [PATCH v6] Merge prune, freeze and vacuum WAL record formats
  
  /*

- * Handles XLOG_HEAP2_PRUNE record type.
- *
- * Acquires a full cleanup lock.
+ * Replay XLOG_HEAP2_PRUNE_FREEZE record.
   */
  static void
-heap_xlog_prune(XLogReaderState *record)
+heap_xlog_prune_freeze(XLogReaderState *record)
  {
XLogRecPtr  lsn = record->EndRecPtr;
-   xl_heap_prune *xlrec = (xl_heap_prune *) XLogRecGetData(record);
+   char   *ptr;
+   xl_heap_prune *xlrec;
Buffer  buffer;
RelFileLocator rlocator;
BlockNumber blkno;
XLogRedoAction action;
  
  	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);

+   ptr = XLogRecGetData(record);


I don't love having two different pointers that alias each other and we
don't know which one is for what. Perhaps we could memcpy xlrec like in
my attached diff (log_updates.diff). It also might perform slightly
better than accessing flags through a xl_heap_prune
* -- since it wouldn't be doing pointer dereferencing.


Ok.


 	/*
-	 * We're about to remove tuples. In Hot Standby mode, ensure that there's
-	 * no queries running for which the removed tuples are still visible.
+	 * We will take an ordinary exclusive lock or a cleanup lock depending on
+	 * whether the XLHP_CLEANUP_LOCK flag is set.  With an ordinary exclusive
+	 * lock, we better not be doing anything that requires moving existing
+	 * tuple data.
 	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->isCatalogRel,
+	Assert((xlrec->flags & XLHP_CLEANUP_LOCK) != 0 ||
+		   (xlrec->flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+
+	/*
+	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
+	 * ensure that there are no queries running for which the removed tuples
+	 * are still visible or which still consider the frozen xids as running.
+	 * The conflict horizon XID comes after xl_heap_prune.
+	 */
+	if (InHotStandby && (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
+	{


My attached patch has a TODO here for the comment. It sticks out that
the serialization and deserialization conditions are different for the
snapshot conflict horizon. We don't deserialize if InHotStandby is
false. That's still correct now because we don't write anything else to
the main data chunk afterward. But, if someone were to add another
member after snapshot_conflict_horizon, they would want to know to
deserialize snapshot_conflict_horizon first even if InHotStandby is
false.


Good point. Fixed so that 'snapshot_conflict_horizon' is deserialized 
even if !InHotStandby. A memcpy is cheap, and is probably optimized away 
by the compiler anyway.



+   TransactionId snapshot_conflict_horiz

Re: Combine Prune and Freeze records emitted by vacuum

2024-03-22 Thread Heikki Linnakangas

On 20/03/2024 21:17, Melanie Plageman wrote:

Attached patch changes-for-0001.patch has a bunch of updated comments --
especially for heapam_xlog.h (including my promised diagram). And a few
suggestions (mostly changes that I should have made before).


New version of these WAL format changes attached. Squashed to one patch now.


+   // TODO: should we avoid this if we only froze? heap_xlog_freeze() 
doesn't
+   // do it


Ah yes, that makes sense, did that.


In the final commit message, I think it is worth calling out that all of
those freeze logging functions heap_log_freeze_eq/cmp/etc are lifted and
shifted from one file to another. When I am reviewing a big diff, it is
always helpful to know where I need to review line-by-line.


Done.


 From 5d6fc2ffbdd839e0b69242af16446a46cf6a2dc7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 20 Mar 2024 13:49:59 +0200
Subject: [PATCH v5 04/26] 'nplans' is a pointer

I'm surprised the compiler didn't warn about this


oops. while looking at this, I noticed that the asserts I added that
nredirected > 0, ndead > 0, and nunused > 0 have the same problem.


Good catch, fixed.


- remove xlhp_conflict_horizon and store a TransactionId directly. A
   separate struct would make sense if we needed to store anything else
   there, but for now it just seems like more code.


makes sense. I just want to make sure heapam_xlog.h makes it extra clear
that this is happening. I see your comment addition. If you combine it
with my comment additions in the attached patch for 0001, hopefully that
makes it clear enough.


Thanks. I spent more time on the comments throughout the patch. And one 
notable code change: I replaced the XLHP_LP_TRUNCATE_ONLY flag with 
XLHP_CLEANUP_LOCK. XLHP_CLEANUP_LOCK directly indicates if you need a 
cleanup lock to replay the record. It must always be set when 
XLHP_HAS_REDIRECTIONS or XLHP_HAS_DEAD_ITEMS is set, because replaying 
those always needs a cleanup lock. That felt easier to document and 
understand than XLHP_LP_TRUNCATE_ONLY.



 From 8af186ee9dd8c7dc20f37a69b34cab7b95faa43b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 20 Mar 2024 14:03:06 +0200
Subject: [PATCH v5 07/26] Add comment to log_heap_prune_and_freeze().

XXX: This should be rewritten, but I tried to at least list some
important points.


Are you thinking that it needs to mention more things or that the things
it mentions need more detail?


I previously just quickly jotted down things that seemed worth 
mentioning in the comment. It was not so bad actually, but I reworded it 
a little.



 From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 20 Mar 2024 14:53:31 +0200
Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze()

Mostly to make local variables more tightly-scoped.


So, I don't think you can move those sub-records into the tighter scope.
If you run tests with this applied, you'll see it crashes and a lot of
the data in the record is wrong. If you move the sub-record declarations
out to a wider scope, the tests pass.

The WAL record data isn't actually copied into the buffer until

recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);

after registering everything.
So all of those sub-records you made are out of scope the time it tries
to copy from them.

I spent the better part of a day last week trying to figure out what was
happening after I did the exact same thing. I must say that I found the
xloginsert API incredibly unhelpful on this point.


Oops. I had that in mind and that was actually why I moved the 
XLogRegisterData() call to the end of the function, because I found it 
confusing to register the struct before filling it in completely, even 
though it works perfectly fine. But then I missed it anyway when I moved 
the local variables. I added a brief comment on that.



I would like to review the rest of the suggested changes in this patch
after we fix the issue I just mentioned.


Thanks, review is appreciated. I feel this is ready now, so barring any 
big new issues, I plan to commit this early next week.


There is another patch in the commitfest that touches this area: 
"Recording whether Heap2/PRUNE records are from VACUUM or from 
opportunistic pruning" [1]. That actually goes in the opposite direction 
than this patch. That patch wants to add more information, to show 
whether a record was emitted by VACUUM or on-access pruning, while this 
patch makes the freezing and VACUUM's 2nd phase records also look the 
same. We could easily add more flags to xl_heap_prune to distinguish 
them. Or assign different xl_info code for them, like that other patch 
proposed. But I don't think that needs to block this patch, that can be 
added as a separate patch.


[1] 
https://www.postgresql.org/message-id/cah2-wzmsevhox+hjpfmqgcxwwdgnzp0r9f+vbnpok6tgxmp...@mail.gmail.com


--
Heikki Linnakangas
Neon (https://neon.tech)

Re: Combine Prune and Freeze records emitted by vacuum

2024-03-21 Thread Heikki Linnakangas

On 20/03/2024 23:03, Melanie Plageman wrote:

On Wed, Mar 20, 2024 at 03:15:49PM +0200, Heikki Linnakangas wrote:

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ee0eca8ae02..b2015f5a1ac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -202,14 +202,17 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 	int			ndeleted;		/* Number of tuples deleted from the page */
 	int			nnewlpdead;		/* Number of newly LP_DEAD items */
+	int			nfrozen;


Let's add a comment after int nfrozen like
/* Number of newly frozen tuples */


+
 	bool		all_visible;	/* Whether or not the page is all visible */
 	bool		hastup;			/* Does page make rel truncation unsafe */

+	/* The following fields are only set if freezing */


So, all_frozen will be set correctly if we are even considering freezing
(if pagefrz is passed). all_frozen will be true even if we didn't freeze
anything if the page is all-frozen and can be set as such in the VM. And
it will be false if the page is not all-frozen. So, maybe we say
"considering freezing".

But, I'm glad you thought to call out which of these fields will make
sense to the caller.

Also, maybe we should just name the members to which this applies. It is
a bit confusing that I can't tell if the comment also refers to the
other members following it (lpdead_items and deadoffsets) -- which it
doesn't.


Right, sorry, I spotted the general issue that it's not clear which 
fields are valid when. I added that comment to remind about that, but I 
then forgot about it.


In heap_page_prune_and_freeze(), we now do some extra work on each live 
tuple, to set the all_visible_except_removable correctly. And also to 
update live_tuples, recently_dead_tuples and hastup. When we're not 
freezing, that's a waste of cycles, the caller doesn't care. I hope it's 
enough that it doesn't matter, but is it?


The first commit (after the WAL format changes) changes the all-visible 
check to use GlobalVisTestIsRemovableXid. The commit message says that 
it's because we don't have 'cutoffs' available, but we only care about 
that when we're freezing, and when we're freezing, we actually do have 
'cutoffs' in HeapPageFreeze. Using GlobalVisTestIsRemovableXid seems 
sensible anyway, because that's what we use in 
heap_prune_satisfies_vacuum() too, but just wanted to point that out.



The 'frz_conflict_horizon' stuff is still fuzzy to me. (Not necessarily 
these patches's fault). This at least is wrong, because Max(a, b) 
doesn't handle XID wraparound correctly:



	if (do_freeze)
		conflict_xid = Max(prstate.snapshotConflictHorizon,
						   presult->frz_conflict_horizon);
	else
		conflict_xid = prstate.snapshotConflictHorizon;


Then there's this in lazy_scan_prune():


	/* Using same cutoff when setting VM is now unnecessary */
	if (presult.all_frozen)
		presult.frz_conflict_horizon = InvalidTransactionId;

This does the right thing in the end, but if all the tuples are frozen 
shouldn't frz_conflict_horizon already be InvalidTransactionId? The 
comment says it's "newest xmin on the page", and if everything was 
frozen, all xmins are FrozenTransactionId. In other words, that should 
be moved to heap_page_prune_and_freeze() so that it doesn't lie to its 
caller. Also, frz_conflict_horizon is only set correctly if 
'all_frozen==true', would be good to mention that in the comments too.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2024-03-20 Thread Heikki Linnakangas

On 20/03/2024 07:37, Tom Lane wrote:

A couple of buildfarm animals don't like these tests:

Assert(child_type >= 0 && child_type < lengthof(child_process_kinds));

for example

  ayu   | 2024-03-19 13:08:05 | launch_backend.c:211:39: warning: 
comparison of constant 16 with expression of type 'BackendType' (aka 'enum 
BackendType') is always true [-Wtautological-constant-out-of-range-compare]
  ayu   | 2024-03-19 13:08:05 | launch_backend.c:233:39: warning: 
comparison of constant 16 with expression of type 'BackendType' (aka 'enum 
BackendType') is always true [-Wtautological-constant-out-of-range-compare]

I'm not real sure why it's moaning about the "<" check but not the
">= 0" check, which ought to be equally tautological given the
assumption that the input is a valid member of the enum.  But
in any case, exactly how much value do these assertions carry?
If you're intent on keeping them, perhaps casting child_type to
int here would suppress the warnings.  But personally I think
I'd lose the Asserts.


Yeah, it's not a very valuable assertion. Removed, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Streaming I/O, vectored I/O (WIP)

2024-03-19 Thread Heikki Linnakangas

Some quick comments:

On 12/03/2024 15:02, Thomas Munro wrote:

src/backend/storage/aio/streaming_read.c
src/include/storage/streaming_read.h


Standard file header comments missing.

It would be nice to have a comment at the top of streaming_read.c, 
explaining at a high level how the circular buffer, lookahead and all 
that works. Maybe even some diagrams.


For example, what is head and what is tail? Before reading the code, I 
assumed that 'head' was the next block range to return in 
pg_streaming_read_buffer_get_next(). But I think it's actually the other 
way round?



/*
 * Create a new streaming read object that can be used to perform the
 * equivalent of a series of ReadBuffer() calls for one fork of one relation.
 * Internally, it generates larger vectored reads where possible by looking
 * ahead.
 */
PgStreamingRead *
pg_streaming_read_buffer_alloc(int flags,
							   void *pgsr_private,
							   size_t per_buffer_data_size,
							   BufferAccessStrategy strategy,
							   BufferManagerRelation bmr,
							   ForkNumber forknum,
							   PgStreamingReadBufferCB next_block_cb)


I'm not a fan of the name, especially the 'alloc' part. Yeah, most of 
the work it does is memory allocation. But I'd suggest something like 
'pg_streaming_read_begin' instead.


Do we really need the pg_ prefix in these?


Buffer
pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)


Maybe 'pg_streaming_read_next_buffer' or just 'pg_streaming_read_next', 
for a shorter name.





	/*
	 * pgsr->ranges is a circular buffer.  When it is empty, head == tail.
	 * When it is full, there is an empty element between head and tail.  Head
	 * can also be empty (nblocks == 0), therefore we need two extra elements
	 * for non-occupied ranges, on top of max_pinned_buffers to allow for the
	 * maximum possible number of occupied ranges of the smallest possible
	 * size of one.
	 */
	size = max_pinned_buffers + 2;


I didn't understand this explanation for why it's + 2.


	/*
	 * Skip the initial ramp-up phase if the caller says we're going to be
	 * reading the whole relation.  This way we start out doing full-sized
	 * reads.
	 */
	if (flags & PGSR_FLAG_FULL)
		pgsr->distance = Min(MAX_BUFFERS_PER_TRANSFER, pgsr->max_pinned_buffers);
	else
		pgsr->distance = 1;


Should this be "Max(MAX_BUFFERS_PER_TRANSFER, 
pgsr->max_pinned_buffers)"? max_pinned_buffers cannot be smaller than 
MAX_BUFFERS_PER_TRANSFER though, given how it's initialized earlier. So 
perhaps just 'pgsr->distance = pgsr->max_pinned_buffers' ?


--
Heikki Linnakangas
Neon (https://neon.tech)




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-03-19 Thread Heikki Linnakangas

On 18/03/2024 17:19, Melanie Plageman wrote:

I've attached v7 rebased over this commit.


Thanks!


v7-0001-BitmapHeapScan-begin-scan-after-bitmap-creation.patch


If we delayed table_beginscan_bm() call further, after starting the TBM 
iterator, we could skip it altogether when the iterator is empty.


That's a further improvement, doesn't need to be part of this patch set. 
Just caught my eye while reading this.



v7-0003-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch


I suggest to avoid the double negative with SO_CAN_SKIP_FETCH, and call 
the flag e.g. SO_NEED_TUPLE.



As yet another preliminary patch before the streaming read API, it would 
be nice to move the prefetching code to heapam.c too.


What's the point of having separate table_scan_bitmap_next_block() and 
table_scan_bitmap_next_tuple() functions anymore? The AM owns the TBM 
iterator now. The executor node updates the lossy/exact page counts, but 
that's the only per-page thing it does now.



		/*
		 * If this is the first scan of the underlying table, create the table
		 * scan descriptor and begin the scan.
		 */
		if (!scan)
		{
			uint32		extra_flags = 0;

			/*
			 * We can potentially skip fetching heap pages if we do not need
			 * any columns of the table, either for checking non-indexable
			 * quals or for returning data.  This test is a bit simplistic, as
			 * it checks the stronger condition that there's no qual or return
			 * tlist at all. But in most cases it's probably not worth working
			 * harder than that.
			 */
			if (node->ss.ps.plan->qual == NIL && node->ss.ps.plan->targetlist == NIL)
				extra_flags |= SO_CAN_SKIP_FETCH;

			scan = node->ss.ss_currentScanDesc = table_beginscan_bm(
												node->ss.ss_currentRelation,
												node->ss.ps.state->es_snapshot,
												0,
												NULL,
												extra_flags);
		}

		scan->tbmiterator = tbmiterator;
		scan->shared_tbmiterator = shared_tbmiterator;


How about passing the iterator as an argument to table_beginscan_bm()? 
You'd then need some other function to change the iterator on rescan, 
though. Not sure what exactly to do here, but feels that this part of 
the API is not fully thought-out. Needs comments at least, to explain 
who sets tbmiterator / shared_tbmiterator and when. For comparison, for 
a TID scan there's a separate scan_set_tidrange() table AM function. 
Maybe follow that example and introduce scan_set_tbm_iterator().


It's bit awkward to have separate tbmiterator and shared_tbmiterator 
fields. Could regular and shared iterators be merged, or wrapped under a 
common interface?


--
Heikki Linnakangas
Neon (https://neon.tech)




Re: Possibility to disable `ALTER SYSTEM`

2024-03-19 Thread Heikki Linnakangas
I want to remind everyone of this from Gabriele's first message that 
started this thread:



At the moment, a possible workaround is that `ALTER SYSTEM` can be blocked
by making the postgresql.auto.conf read only, but the returned message is
misleading and that’s certainly bad user experience (which is very
important in a cloud native environment):


```
postgres=# ALTER SYSTEM SET wal_level TO minimal;
ERROR:  could not open file "postgresql.auto.conf": Permission denied
```


I think making the config file read-only is a fine solution. If you 
don't want postgres to mess with the config files, forbid it with the 
permission system.


Problems with pg_rewind, pg_basebackup were mentioned with that 
approach. I think if you want the config files to be managed outside 
PostgreSQL, by kubernetes, patroni or whatever, it would be good for 
them to be read-only to the postgres user anyway, even if we had a 
mechanism to disable ALTER SYSTEM. So it would be good to fix the 
problems with those tools anyway.


The error message is not great, I agree with that. Can we improve it? 
Maybe just add a HINT like this:


postgres=# ALTER SYSTEM SET wal_level TO minimal;
ERROR:  could not open file "postgresql.auto.conf" for writing: Permission denied
HINT:  Configuration might be managed outside PostgreSQL


Perhaps we could make that even better with a GUC though. I propose a 
GUC called 'configuration_managed_externally = true / false'. If you set 
it to true, we prevent ALTER SYSTEM and make the error message more 
definitive:


postgres=# ALTER SYSTEM SET wal_level TO minimal;
ERROR:  configuration is managed externally

As a bonus, if that GUC is set, we could even check at server startup 
that all the configuration files are not writable by the postgres user, 
and print a warning or refuse to start up if they are.


(Another way to read this proposal is to rename the GUC that's been 
discussed in this thread to 'configuration_managed_externally'. That 
makes it look less like a security feature, and describes the intended 
use case.)


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: BitmapHeapScan streaming read user and prelim refactoring

2024-03-18 Thread Heikki Linnakangas

On 14/02/2024 21:42, Andres Freund wrote:

On 2024-02-13 18:11:25 -0500, Melanie Plageman wrote:

patch 0004 is, I think, a bug fix. see [2].


I'd not quite call it a bugfix, it's not like it leads to wrong
behaviour. Seems more like an optimization. But whatever :)


It sure looks like a bug to me, albeit a very minor one. Certainly not an 
optimization, it doesn't affect performance in any way, only what 
EXPLAIN reports. So committed and backported that to all supported branches.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2024-03-18 Thread Heikki Linnakangas

On 13/03/2024 09:30, Heikki Linnakangas wrote:

Attached is a new version of the remaining patches.


Committed, with some final cosmetic cleanups. Thanks everyone!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Combine Prune and Freeze records emitted by vacuum

2024-03-17 Thread Heikki Linnakangas
"will setting hint bit generate FPI" check as discussed above


These patches are in a very rough WIP state, but I wanted to share 
early. I haven't done much testing, and I'm not wedded to these changes, 
but I think they make it more readable. Please include / squash in the 
patch set if you agree with them.


Please also take a look at the comments I marked with HEIKKI or FIXME, 
in the patches and commit messages.


I'll wait for a new version from you before reviewing more.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 622620a7875ae8c1626e9cd118156e0c734d44ed Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sun, 17 Mar 2024 22:52:28 +0200
Subject: [PATCH v3heikki 1/4] Inline heap_frz_conflict_horizon() to the
 caller.

FIXME: This frz_conflict_horizon business looks fishy to me. We have:
- local frz_conflict_horizon variable,
- presult->frz_conflict_horizon, and
- prstate.snapshotConflictHorizon

should we really have all three, and what are the differences?
---
 src/backend/access/heap/pruneheap.c  | 17 ++---
 src/backend/access/heap/vacuumlazy.c | 27 ---
 src/include/access/heapam.h  |  2 --
 3 files changed, 14 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d3643b1ecc6..f4f5468e144 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -275,7 +276,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		hint_bit_fpi;
 	bool		prune_fpi = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
-	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/*
 	 * One entry for every tuple that we may freeze.
@@ -691,7 +691,18 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	if (do_freeze)
 	{
-		frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
+		/*
+		 * We can use frz_conflict_horizon as our cutoff for conflicts when
+		 * the whole page is eligible to become all-frozen in the VM once
+		 * we're done with it.  Otherwise we generate a conservative cutoff by
+		 * stepping back from OldestXmin.
+		 */
+		if (!(presult->all_visible_except_removable && presult->all_frozen))
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+			TransactionIdRetreat(presult->frz_conflict_horizon);
+		}
 		heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
 	}
 	else if ((!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
@@ -740,7 +751,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 		if (do_freeze)
 			xlrec.snapshotConflictHorizon = Max(prstate.snapshotConflictHorizon,
-												frz_conflict_horizon);
+												presult->frz_conflict_horizon);
 		else
 			xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4b45e8be1ad..8d3723faf3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1373,33 +1373,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * Determine the snapshotConflictHorizon for freezing. Must only be called
- * after pruning and determining if the page is freezable.
- */
-TransactionId
-heap_frz_conflict_horizon(PruneFreezeResult *presult, HeapPageFreeze *pagefrz)
-{
-	TransactionId result;
-
-	/*
-	 * We can use frz_conflict_horizon as our cutoff for conflicts when the
-	 * whole page is eligible to become all-frozen in the VM once we're done
-	 * with it.  Otherwise we generate a conservative cutoff by stepping back
-	 * from OldestXmin.
-	 */
-	if (presult->all_visible_except_removable && presult->all_frozen)
-		result = presult->frz_conflict_horizon;
-	else
-	{
-		/* Avoids false conflicts when hot_standby_feedback in use */
-		result = pagefrz->cutoffs->OldestXmin;
-		TransactionIdRetreat(result);
-	}
-
-	return result;
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c36623f53bd..4e17347e625 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -290,8 +290,6 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 
 extern void heap_inplace_update(Relation relation, HeapTuple tuple);
 
-extern TransactionId heap_frz_conflict_horizon(PruneFreezeResult *presult,
-			   HeapPageFreeze *pagefrz);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,

Re: Weird test mixup

2024-03-15 Thread Heikki Linnakangas

On 15/03/2024 16:00, Tom Lane wrote:

Heikki Linnakangas  writes:

On 15/03/2024 13:09, Heikki Linnakangas wrote:

I committed a patch to do that, to put out the fire.



That's turning the buildfarm quite red. Many, but not all animals are
failing like this:


It may be even worse than it appears from the buildfarm status page.
My animals were stuck in infinite loops that required a manual "kill"
to get out of, and it's reasonable to assume there are others that
will require owner intervention.  Why would this test have done that,
if the module failed to load?


The gin_incomplete_split test inserts rows until it hits the injection 
point at page split. There is a backstop: it should give up after a bounded 
number of iterations, but that was broken. Fixed that, thanks for the report!


Hmm, don't we have any timeout that would kill tests if they get stuck?

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Weird test mixup

2024-03-15 Thread Heikki Linnakangas

On 15/03/2024 14:10, Heikki Linnakangas wrote:

On 15/03/2024 13:09, Heikki Linnakangas wrote:

I committed a patch to do that, to put out the fire.


That's turning the buildfarm quite red. Many, but not all animals are
failing like this:


--- /home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/src/test/modules/injection_points/expected/injection_points.out	2024-03-15 12:41:16.363286975 +0100
+++ /home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/src/test/modules/injection_points/results/injection_points.out	2024-03-15 12:53:11.528159615 +0100
@@ -1,118 +1,111 @@
  CREATE EXTENSION injection_points;
+ERROR:  extension "injection_points" is not available
+DETAIL:  Could not open extension control file "/home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/tmp_install/home/buildfarm/hippopotamus/buildroot/HEAD/inst/share/postgresql/extension/injection_points.control": No such file or directory.
+HINT:  The extension must first be installed on the system where PostgreSQL is running.
...


Looks like adding NO_INSTALLCHECK somehow affected how the modules are
installed in tmp_install. I'll investigate..


I think this is a bug in the buildfarm client. In the make_misc_check 
step, it does this (reduced to just the interesting parts):



# run the modules that can't be run with installcheck
sub make_misc_check
{
...
my @dirs = glob("$pgsql/src/test/modules/* $pgsql/contrib/*");
foreach my $dir (@dirs)
{
next unless -e "$dir/Makefile";
my $makefile = file_contents("$dir/Makefile");
next unless $makefile =~ /^NO_INSTALLCHECK/m;
my $test = basename($dir);

# skip redundant TAP tests which are called elsewhere
my @out = run_log("cd $dir && $make $instflags TAP_TESTS= check");
...
}


So it scans src/test/modules, and runs "make check" for all 
subdirectories that have NO_INSTALLCHECK in the makefile. But the 
injection fault tests are also conditional on the 
enable_injection_points in the parent Makefile:



ifeq ($(enable_injection_points),yes)
SUBDIRS += injection_points gin
else
ALWAYS_SUBDIRS += injection_points gin
endif


The buildfarm client doesn't pay any attention to that, and runs the 
test anyway.


I committed an ugly hack to the subdirectory Makefiles, to turn "make 
check" into a no-op if injection points are disabled. Normally when you 
run "make check" at the parent level, it doesn't even recurse to the 
directories, but this works around the buildfarm script. I hope...


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Weird test mixup

2024-03-15 Thread Heikki Linnakangas

On 15/03/2024 13:09, Heikki Linnakangas wrote:

On 15/03/2024 01:13, Tom Lane wrote:

Michael Paquier  writes:

Or we could just disable runningcheck because of the concurrency
requirement in this test.  The test would still be able to run, just
less times.


No, actually we *must* mark all these tests NO_INSTALLCHECK if we
stick with the current definition of injection points.  The point
of installcheck mode is that the tests are supposed to be safe to
run in a live installation.  Side-effects occurring in other
databases are completely not OK.


I committed a patch to do that, to put out the fire.


That's turning the buildfarm quite red. Many, but not all animals are 
failing like this:



--- /home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/src/test/modules/injection_points/expected/injection_points.out	2024-03-15 12:41:16.363286975 +0100
+++ /home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/src/test/modules/injection_points/results/injection_points.out	2024-03-15 12:53:11.528159615 +0100
@@ -1,118 +1,111 @@
 CREATE EXTENSION injection_points;
+ERROR:  extension "injection_points" is not available
+DETAIL:  Could not open extension control file "/home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/tmp_install/home/buildfarm/hippopotamus/buildroot/HEAD/inst/share/postgresql/extension/injection_points.control": No such file or directory.
+HINT:  The extension must first be installed on the system where PostgreSQL is running.
... 


Looks like adding NO_INSTALLCHECK somehow affected how the modules are 
installed in tmp_install. I'll investigate..


--
Heikki Linnakangas
Neon (https://neon.tech)




Re: Weird test mixup

2024-03-15 Thread Heikki Linnakangas

On 15/03/2024 01:13, Tom Lane wrote:

Michael Paquier  writes:

Or we could just disable runningcheck because of the concurrency
requirement in this test.  The test would still be able to run, just
less times.


No, actually we *must* mark all these tests NO_INSTALLCHECK if we
stick with the current definition of injection points.  The point
of installcheck mode is that the tests are supposed to be safe to
run in a live installation.  Side-effects occurring in other
databases are completely not OK.


I committed a patch to do that, to put out the fire.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Weird test mixup

2024-03-15 Thread Heikki Linnakangas

On 15/03/2024 09:39, Michael Paquier wrote:

On Thu, Mar 14, 2024 at 07:13:53PM -0400, Tom Lane wrote:

I can see that some tests would want to be able to inject code
cluster-wide, but I bet that's going to be a small minority.
I suggest that we invent a notion of "global" vs "local"
injection points, where a "local" one only fires in the DB it
was defined in.  Then only tests that require a global injection
point need to be NO_INSTALLCHECK.


Attached is a POC of what could be done.  I have extended the module
injection_points so as it is possible to register what I'm calling a
"condition" in the module that can be defined with a new SQL function.

The condition is stored in shared memory with the point name, then at
runtime the conditions are cross-checked in the callbacks.  With the
interface of this patch, the condition should be registered *before* a
point is attached, but this stuff could also be written so as
injection_points_attach() takes an optional argument with a database
name.  Or this could use a different, new SQL function, say a
injection_points_attach_local() that registers a condition with
MyDatabaseId on top of attaching the point, making the whole happening
while holding once the spinlock of the shmem state for the module.


For the gin test, a single "SELECT injection_points_attach_local()" at 
the top of the test file would be most convenient.


If I have to do "SELECT 
injection_points_condition('gin-finish-incomplete-split', :'datname');" 
for every injection point in the test, I will surely forget it sometimes.


In the 'gin' test, they could actually be scoped to the same backend.


Wrt. the spinlock and shared memory handling, I think this would be 
simpler if you could pass some payload in the InjectionPointAttach() 
call, which would be passed back to the callback function:


 void
 InjectionPointAttach(const char *name,
 const char *library,
-const char *function)
+const char *function,
+uint64 payload)

In this case, the payload would be the "slot index" in shared memory.

Or perhaps always allocate, say, 1024 bytes of working area for every 
attached injection point that the test module can use any way it wants. 
Like for storing extra conditions, or for the wakeup counter stuff in 
injection_wait(). A fixed size working area is a little crude, but would 
be very handy in practice.



By the way, modules/gin/ was missing a detach, so the test was
not repeatable with successive installchecks.


Oops.

It would be nice to automatically detach all the injection points on 
process exit. You wouldn't always want that, but I think most tests hold 
a session open throughout the test, and for those it would be handy.


--
Heikki Linnakangas
Neon (https://neon.tech)




Weird test mixup

2024-03-14 Thread Heikki Linnakangas
I got a weird test failure while testing my forking refactor patches on 
Cirrus CI 
(https://cirrus-ci.com/task/5880724448870400?logs=test_running#L121):



[16:52:39.753] Summary of Failures:
[16:52:39.753] 
[16:52:39.753] 66/73 postgresql:intarray-running / intarray-running/regress   ERROR 6.27s   exit status 1
[16:52:39.753] 
[16:52:39.753] Ok: 72  
[16:52:39.753] Expected Fail:  0   
[16:52:39.753] Fail:   1   
[16:52:39.753] Unexpected Pass:0   
[16:52:39.753] Skipped:0   
[16:52:39.753] Timeout:0   
[16:52:39.753] 
[16:52:39.753] Full log written to /tmp/cirrus-ci-build/build/meson-logs/testlog-running.txt


And:


diff -U3 /tmp/cirrus-ci-build/contrib/intarray/expected/_int.out /tmp/cirrus-ci-build/build/testrun/intarray-running/regress/results/_int.out
--- /tmp/cirrus-ci-build/contrib/intarray/expected/_int.out	2024-03-14 16:48:48.690367000 +
+++ /tmp/cirrus-ci-build/build/testrun/intarray-running/regress/results/_int.out	2024-03-14 16:52:05.759444000 +
@@ -804,6 +804,7 @@
 
 DROP INDEX text_idx;

 CREATE INDEX text_idx on test__int using gin ( a gin__int_ops );
+ERROR:  error triggered for injection point gin-leave-leaf-split-incomplete
 SELECT count(*) from test__int WHERE a && '{23,50}';
  count 
 ---

@@ -877,6 +878,7 @@
 (1 row)
 
 DROP INDEX text_idx;

+ERROR:  index "text_idx" does not exist
 -- Repeat the same queries with an extended data set. The data set is the
 -- same that we used before, except that each element in the array is
 -- repeated three times, offset by 1000 and 2000. For example, {1, 5}


Somehow the 'gin-leave-leaf-split-incomplete' injection point was active 
in the 'intarray' test. That makes no sense. That injection point is 
only used by the test in src/test/modules/gin/. Perhaps that ran at the 
same time as the intarray test? But they run in separate instances, with 
different data directories. And the 'gin' test passed.


I'm completely stumped. Anyone have a theory?

--
Heikki Linnakangas
Neon (https://neon.tech)



