Re: remaining sql/json patches
Hi Alexander,

On Fri, Apr 5, 2024 at 3:00 PM Alexander Lakhin wrote:
> Hello Amit,
>
> 04.04.2024 15:02, Amit Langote wrote:
> > Pushed after fixing these and a few other issues.  I didn't include
> > the testing function you proposed in your other email.  It sounds
> > useful for testing locally but will need some work before we can
> > include it in the tree.
> >
> > I'll post the rebased 0002 tomorrow after addressing your comments.
>
> Please look at an assertion failure:
> TRAP: failed Assert("count <= tupdesc->natts"), File: "parse_relation.c", Line: 3048, PID: 1325146
>
> triggered by the following query:
> SELECT * FROM JSON_TABLE('0', '$' COLUMNS (js int PATH '$')),
>     COALESCE(row(1)) AS (a int, b int);
>
> Without JSON_TABLE() I get:
> ERROR:  function return row and query-specified return row do not match
> DETAIL:  Returned row contains 1 attribute, but query expects 2.

Thanks for the report.  Seems like it might be a pre-existing issue, because I can also reproduce the crash with:

SELECT * FROM COALESCE(row(1)) AS (a int, b int);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> Backtrace:
#0  __pthread_kill_implementation (threadid=281472845250592, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x806c4334 in __pthread_kill_internal (signo=6, threadid=) at pthread_kill.c:78
#2  0x8067c73c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x80669034 in __GI_abort () at abort.c:79
#4  0x00ad9d4c in ExceptionalCondition (conditionName=0xcbb368 "!(tupdesc->natts >= colcount)", errorType=0xcbb278 "FailedAssertion", fileName=0xcbb2c8 "nodeFunctionscan.c", lineNumber=379) at assert.c:54
#5  0x0073edec in ExecInitFunctionScan (node=0x293d4ed0, estate=0x293d51b8, eflags=16) at nodeFunctionscan.c:379
#6  0x00724bc4 in ExecInitNode (node=0x293d4ed0, estate=0x293d51b8, eflags=16) at execProcnode.c:248
#7  0x0071b1cc in InitPlan (queryDesc=0x292f5d78, eflags=16) at execMain.c:1006
#8  0x00719f6c in standard_ExecutorStart (queryDesc=0x292f5d78, eflags=16) at execMain.c:252
#9  0x00719cac in ExecutorStart (queryDesc=0x292f5d78, eflags=0) at execMain.c:134
#10 0x00945520 in PortalStart (portal=0x29399458, params=0x0, eflags=0, snapshot=0x0) at pquery.c:527
#11 0x0093ee50 in exec_simple_query (query_string=0x29332d38 "SELECT * FROM COALESCE(row(1)) AS (a int, b int);") at postgres.c:1175
#12 0x00943cb8 in PostgresMain (argc=1, argv=0x2935d610, dbname=0x2935d450 "postgres", username=0x2935d430 "amit") at postgres.c:4297
#13 0x0087e978 in BackendRun (port=0x29356c00) at postmaster.c:4517
#14 0x0087e0bc in BackendStartup (port=0x29356c00) at postmaster.c:4200
#15 0x00879638 in ServerLoop () at postmaster.c:1725
#16 0x00878eb4 in PostmasterMain (argc=3, argv=0x292eeac0) at postmaster.c:1398
#17 0x00791db8 in main (argc=3, argv=0x292eeac0) at main.c:228

Backtrace looks a bit different with a query similar to yours:

SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b int);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

!>
#0  __pthread_kill_implementation (threadid=281472845250592, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x806c4334 in __pthread_kill_internal (signo=6, threadid=) at pthread_kill.c:78
#2  0x8067c73c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x80669034 in __GI_abort () at abort.c:79
#4  0x00ad9d4c in ExceptionalCondition (conditionName=0xc903b0 "!(count <= tupdesc->natts)", errorType=0xc8f8c8 "FailedAssertion", fileName=0xc8f918 "parse_relation.c", lineNumber=2649) at assert.c:54
#5  0x00649664 in expandTupleDesc (tupdesc=0x293da188, eref=0x293d7318, count=2, offset=0, rtindex=2, sublevels_up=0, location=-1, include_dropped=true, colnames=0x0, colvars=0xc39253c8) at parse_relation.c:2649
#6  0x00648d08 in expandRTE (rte=0x293d7390, rtindex=2, sublevels_up=0, location=-1, include_dropped=true, colnames=0x0, colvars=0xc39253c8) at parse_relation.c:2361
#7  0x00849bd0 in build_physical_tlist (root=0x293d5318, rel=0x293d88e8) at plancat.c:1681
#8  0x00806ad0 in create_scan_plan (root=0x293d5318, best_path=0x293cd888, flags=0) at createplan.c:605
#9  0x0080666c in create_plan_recurse (root=0x293d5318, best_path=0x293cd888, flags=0) at createplan.c:389
#10 0x0080c4e8 in create_nestloop_plan (root=0x293d5318, best_path=0x293d96f0) at createplan.c:4056
#11 0x00807464 in create_join_plan (root=0x293d5318, best_path=0x293d96f0) at createplan.c:1037
#
Re: Popcount optimization using AVX512
On Fri, 5 Apr 2024 at 07:15, Nathan Bossart wrote:
> Here is an updated patch set.  IMHO this is in decent shape and is
> approaching committable.

I checked the code generation on various gcc and clang versions. It looks mostly fine starting from the versions where avx512 is supported, gcc-7.1 and clang-5. The main issue I saw was that clang was able to peel off the first iteration of the loop, then eliminate the mask assignment and replace the masked load with a memory operand for vpopcnt. I was not able to convince gcc to do that regardless of optimization options.

Generated code for the inner loop:

clang:
  50: add      rdx, 64
  54: cmp      rdx, rdi
  57: jae
  59: vpopcntq zmm1, zmmword ptr [rdx]
  5f: vpaddq   zmm0, zmm1, zmm0
  65: jmp

gcc:
  38: kmovq    k1, rdx
  3d: vmovdqu8 zmm0 {k1} {z}, zmmword ptr [rax]
  43: add      rax, 64
  47: mov      rdx, -1
  4e: vpopcntq zmm0, zmm0
  54: vpaddq   zmm0, zmm0, zmm1
  5a: vmovdqa64 zmm1, zmm0
  60: cmp      rax, rsi
  63: jb

I'm not sure how much that matters in practice. Attached is a patch to do this manually, giving essentially the same result in gcc. As most distro packages are built using gcc, I think it would make sense to have the extra code if it gives a noticeable benefit for large cases.

The visibility map patch has the same issue, otherwise looks good.

Regards,
Ants Aasma

diff --git a/src/port/pg_popcount_avx512.c b/src/port/pg_popcount_avx512.c
index dacc7553d29..f6e718b86e9 100644
--- a/src/port/pg_popcount_avx512.c
+++ b/src/port/pg_popcount_avx512.c
@@ -52,13 +52,21 @@ pg_popcount_avx512(const char *buf, int bytes)
 	 * Iterate through all but the final iteration.  Starting from second
 	 * iteration, the start index mask is ignored.
 	 */
-	for (; buf < final; buf += sizeof(__m512i))
+	if (buf < final)
 	{
 		val = _mm512_maskz_loadu_epi8(mask, (const __m512i *) buf);
 		cnt = _mm512_popcnt_epi64(val);
 		accum = _mm512_add_epi64(accum, cnt);
+		buf += sizeof(__m512i);
 		mask = ~UINT64CONST(0);
+
+		for (; buf < final; buf += sizeof(__m512i))
+		{
+			val = _mm512_load_si512((const __m512i *) buf);
+			cnt = _mm512_popcnt_epi64(val);
+			accum = _mm512_add_epi64(accum, cnt);
+		}
 	}

 	/* Final iteration needs to ignore bytes that are not within the length */
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Fri, Apr 05, 2024 at 11:21:43AM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 9:57 PM Bertrand Drouvot wrote:
> Please find the attached v36 patch.

Thanks!

A few comments:

1 ===

+
+        The timeout is measured from the time since the slot has become
+        inactive (known from its
+        inactive_since value) until it gets
+        used (i.e., its active is set to true).
+

That's right except when it's invalidated during the checkpoint (as the slot is not acquired in CheckPointReplicationSlots()). So, what about adding: "or a checkpoint occurs"? That would also explain that the invalidation could occur during a checkpoint.

2 ===

+       /* If the slot has been invalidated, recalculate the resource limits */
+       if (invalidated)
+       {

s/If the slot/If a slot/?

3 ===

+ * NB - this function also runs as part of checkpoint, so avoid raising errors

s/NB - this/NB: This function/? (that looks more consistent with other comments in the code)

4 ===

+ * Note that having a new function for RS_INVAL_INACTIVE_TIMEOUT cause instead

I understand it's "the RS_INVAL_INACTIVE_TIMEOUT cause", but reading "cause instead" looks weird to me. Maybe it would make sense to reword this a bit.

5 ===

+ * considered not active as they don't actually perform logical decoding.

Not sure that's 100% accurate, as we switched to fast-forward logical decoding in 2ec005b4e2. "as they perform only fast forward logical decoding (or not at all)", maybe?

6 ===

+       if (RecoveryInProgress() && slot->data.synced)
+               return false;
+
+       if (replication_slot_inactive_timeout == 0)
+               return false;

What about just using one if? It's more a matter of taste, but it also probably reduces the object file size a bit for non-optimized builds.

7 ===

+       /*
+        * Do not invalidate the slots which are currently being synced from
+        * the primary to the standby.
+        */
+       if (RecoveryInProgress() && slot->data.synced)
+               return false;

I think we don't need this check as the exact same one is done just before.
8 ===

+sub check_for_slot_invalidation_in_server_log
+{
+       my ($node, $slot_name, $offset) = @_;
+       my $invalidated = 0;
+
+       for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+       {
+               $node->safe_psql('postgres', "CHECKPOINT");

Wouldn't it be better to wait for the replication_slot_inactive_timeout time instead of triggering all those checkpoints? (It could be passed as an extra arg to wait_for_slot_invalidation().)

9 ===

# Synced slot mustn't get invalidated on the standby, it must sync invalidation
# from the primary. So, we must not see the slot's invalidation message in server
# log.
ok( !$standby1->log_contains(
        "invalidating obsolete replication slot \"lsub1_sync_slot\"",
        $standby1_logstart),
    'check that syned slot has not been invalidated on the standby');

Would it make sense to trigger a checkpoint on the standby before this test? I mean, I think that without a checkpoint on the standby we should not see the invalidation in the log anyway.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Improve eviction algorithm in ReorderBuffer
On Fri, Apr 5, 2024 at 2:55 AM Jeff Davis wrote:
> On Thu, 2024-04-04 at 17:28 +0900, Masahiko Sawada wrote:
> > > Perhaps it's not worth the effort though, if performance is already
> > > good enough?
> >
> > Yeah, it would be better to measure the overhead first. I'll do that.
>
> I have some further comments and I believe changes are required for
> v17.
>
> An indexed binary heap API requires both a comparator and a hash
> function to be specified, and has two different kinds of keys: the heap
> key (mutable) and the hash key (immutable). It provides heap methods
> and hashtable methods, and keeps the two internal structures (heap and
> HT) in sync.

IIUC, for example in ReorderBuffer, the heap key is the transaction size and the hash key is the xid.

> The implementation in b840508644 uses the bh_node_type as the hash key,
> which is just a Datum, and it just hashes the bytes. I believe the
> implicit assumption is that the Datum is a pointer -- I'm not sure how
> one would use that API if the Datum were a value. Hashing a pointer
> seems strange to me and, while I see why you did it that way, I think
> it reflects that the API boundaries are not quite right.

I see your point. It assumes that the bh_node_type is a pointer or at least unique. So it cannot work with Datum being a value.

> One consequence of using the pointer as the hash key is that you need
> to find the pointer first: you can't change or remove elements based on
> the transaction ID, you have to get the ReorderBufferTXN pointer by
> finding it in another structure, first. Currently, that's being done by
> searching ReorderBuffer->by_txn. So we actually have two hash tables
> for essentially the same purpose: one with xid as the key, and the
> other with the pointer as the key. That makes no sense -- let's have a
> proper indexed binary heap to look things up by xid (the internal HT)
> or by transaction size (using the internal heap).
> I suggest:
>
> * Make a proper indexed binary heap API that accepts a hash function
> and provides both heap methods and HT methods that operate based on
> values (transaction size and transaction ID, respectively).
> * Get rid of ReorderBuffer->by_txn and use the indexed binary heap
> instead.
>
> This will be a net simplification in reorderbuffer.c, which is good,
> because that file makes use of a *lot* of data structures.

It sounds like a data structure that mixes the hash table and the binary heap, and we use it as the main storage (e.g. for ReorderBufferTXN) instead of using the binary heap as the secondary data structure. IIUC, with your idea, the indexed binary heap has a hash table to store elements, each of which has its index within the heap node array. I guess it's better to create it as a new data structure rather than extending the existing binaryheap, since the APIs could be very different. I might be missing something, though.

On Fri, Apr 5, 2024 at 3:55 AM Jeff Davis wrote:
> On Thu, 2024-04-04 at 10:55 -0700, Jeff Davis wrote:
> > * Make a proper indexed binary heap API that accepts a hash function
> > and provides both heap methods and HT methods that operate based on
> > values (transaction size and transaction ID, respectively).
> > * Get rid of ReorderBuffer->by_txn and use the indexed binary heap
> > instead.
>
> An alternative idea:
>
> * remove the hash table from binaryheap.c
>
> * supply a new callback to the binary heap with type like:
>
> typedef void (*binaryheap_update_index)(
>     bh_node_type node,
>     int new_element_index);
>
> * make the remove, update_up, and update_down methods take the element
> index rather than the pointer
>
> reorderbuffer.c would then do something like:
>
> void
> txn_update_heap_index(ReorderBufferTXN *txn, int new_element_index)
> {
>     txn->heap_element_index = new_element_index;
> }
>
> ...
> txn_heap = binaryheap_allocate(..., txn_update_heap_index, ...);
>
> and then binaryheap.c would effectively maintain txn->heap_element_index,
> so reorderbuffer.c can pass that to the APIs that require the element
> index.

Thank you for the idea. I was thinking of the same idea when considering your previous comment. With this idea, we still use the binaryheap for ReorderBuffer as the second data structure. Since we can implement this idea with relatively small changes to the current binaryheap, I've implemented it and measured performance.

I've attached a patch that adds an extension for benchmarking binaryheap implementations. binaryheap_bench.c is the main test module. To make the comparison between different binaryheap implementations possible, the extension includes two different binaryheap implementations. Therefore, binaryheap_bench.c uses three different binaryheap implementations in total, as the comment on top of the file says:

/*
 * This benchmark tool uses three binary heap implementations.
 *
 * "binaryheap" is the current binaryheap implementation in PostgreSQL. That
 * is, it internally has a hash table to track
Re: remaining sql/json patches
05.04.2024 10:09, Amit Langote wrote:
> Seems like it might be a pre-existing issue, because I can also
> reproduce the crash with:
>
> SELECT * FROM COALESCE(row(1)) AS (a int, b int);
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !> Backtrace:
> #0  __pthread_kill_implementation (threadid=281472845250592, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
> #1  0x806c4334 in __pthread_kill_internal (signo=6, threadid=) at pthread_kill.c:78
> #2  0x8067c73c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3  0x80669034 in __GI_abort () at abort.c:79
> #4  0x00ad9d4c in ExceptionalCondition (conditionName=0xcbb368 "!(tupdesc->natts >= colcount)", errorType=0xcbb278 "FailedAssertion", fileName=0xcbb2c8 "nodeFunctionscan.c", lineNumber=379) at assert.c:54

That's strange, because I get the error (on master, 6f132ed69). With backtrace_functions = 'tupledesc_match', I see:

2024-04-05 10:48:27.827 MSK client backend[2898632] regress ERROR:  function return row and query-specified return row do not match
2024-04-05 10:48:27.827 MSK client backend[2898632] regress DETAIL:  Returned row contains 1 attribute, but query expects 2.
2024-04-05 10:48:27.827 MSK client backend[2898632] regress BACKTRACE:
	tupledesc_match at execSRF.c:948:3
	ExecMakeTableFunctionResult at execSRF.c:427:13
	FunctionNext at nodeFunctionscan.c:94:5
	ExecScanFetch at execScan.c:131:10
	ExecScan at execScan.c:180:10
	ExecFunctionScan at nodeFunctionscan.c:272:1
	ExecProcNodeFirst at execProcnode.c:465:1
	ExecProcNode at executor.h:274:9
	 (inlined by) ExecutePlan at execMain.c:1646:10
	standard_ExecutorRun at execMain.c:363:3
	ExecutorRun at execMain.c:305:1
	PortalRunSelect at pquery.c:926:26
	PortalRun at pquery.c:775:8
	exec_simple_query at postgres.c:1282:3
	PostgresMain at postgres.c:4684:27
	BackendMain at backend_startup.c:57:2
	pgarch_die at pgarch.c:847:1
	BackendStartup at postmaster.c:3593:8
	ServerLoop at postmaster.c:1674:6
	main at main.c:184:3
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f37127f0e40]
2024-04-05 10:48:27.827 MSK client backend[2898632] regress STATEMENT:  SELECT * FROM COALESCE(row(1)) AS (a int, b int);

That's why I had attributed the failure to JSON_TABLE(). Though

SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b int);

really triggers the assert too. Sorry for the noise...

Best regards,
Alexander
Re: remaining sql/json patches
On Fri, Apr 5, 2024 at 5:00 PM Alexander Lakhin wrote:
> 05.04.2024 10:09, Amit Langote wrote:
> > Seems like it might be a pre-existing issue, because I can also
> > reproduce the crash with:
>
> That's strange, because I get the error (on master, 6f132ed69).
> With backtrace_functions = 'tupledesc_match', I see
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress ERROR:  function return row and query-specified return row do not match
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress DETAIL:  Returned row contains 1 attribute, but query expects 2.
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress BACKTRACE:
> tupledesc_match at execSRF.c:948:3
> ExecMakeTableFunctionResult at execSRF.c:427:13
> FunctionNext at nodeFunctionscan.c:94:5
> ExecScanFetch at execScan.c:131:10
> ExecScan at execScan.c:180:10
> ExecFunctionScan at nodeFunctionscan.c:272:1
> ExecProcNodeFirst at execProcnode.c:465:1
> ExecProcNode at executor.h:274:9
> (inlined by) ExecutePlan at execMain.c:1646:10
> standard_ExecutorRun at execMain.c:363:3
> ExecutorRun at execMain.c:305:1
> PortalRunSelect at pquery.c:926:26
> PortalRun at pquery.c:775:8
> exec_simple_query at postgres.c:1282:3
> PostgresMain at postgres.c:4684:27
> BackendMain at backend_startup.c:57:2
> pgarch_die at pgarch.c:847:1
> BackendStartup at postmaster.c:3593:8
> ServerLoop at postmaster.c:1674:6
> main at main.c:184:3
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f37127f0e40]
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress STATEMENT:  SELECT * FROM COALESCE(row(1)) AS (a int, b int);
>
> That's why I had attributed the failure to JSON_TABLE().
>
> Though SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b int);
> really triggers the assert too.
> Sorry for the noise...

No worries.  Let's start another thread so that this gets more attention.

--
Thanks, Amit Langote
Bad estimation for NOT IN clause with big null fraction
Hi hackers,

Discussion [1] and the relevant commit [2] improved the selectivity calculation for IN/NOT IN. This is the current logic for NOT IN selectivity calculation, looping over the array elements:

        else
        {
            s1 = s1 * s2;
            if (isInequality)
                s1disjoint += s2 - 1.0;
        }

To calculate s2 for each array element, it calls neqsel, which returns 1 - eqsel - nullfrac. If I expand the s1disjoint calculation for a NOT IN (2,5,8) clause, it eventually becomes

        1 - eqsel(2) - eqsel(5) - eqsel(8) - 3*nullfrac

If nullfrac is big, s1disjoint drops below 0 quickly as the array gets more elements, and the selectivity algorithm falls back to the one prior to commit [2], which had bad estimation for NOT IN as well. It seems to me that nullfrac should be subtracted only once.

Is it feasible to have a new variable s1disjoint2 that adds back nullfrac for each s2 result and subtracts it once at the end of the loop, as a second heuristic? We could then prefer s1disjoint2 over s1disjoint, and then s1.

Donghang Lin (ServiceNow)

[1] https://www.postgresql.org/message-id/flat/CA%2Bmi_8aPEAzBgWZpNTABGM%3DcSq7mRMyPWbMsU8eGmUfH75OTLA%40mail.gmail.com
[2] https://github.com/postgres/postgres/commit/66a7e6bae98592d1d98d9ef589753f0e953c5828
Re: LogwrtResult contended spinlock
On 2024-Apr-05, Bharath Rupireddy wrote:

> 1.
>  /*
>   * Update local copy of shared XLogCtl->log{Write,Flush}Result
> + *
> + * It's critical that Flush always trails Write, so the order of the reads is
> + * important, as is the barrier.
>   */
>  #define RefreshXLogWriteResult(_target) \
>      do { \
> -        _target.Write = XLogCtl->logWriteResult; \
> -        _target.Flush = XLogCtl->logFlushResult; \
> +        _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \
> +        pg_read_barrier(); \
> +        _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \
>      } while (0)
>
> Is it "Flush always trails Write" or "Flush always leads Write"? I
> guess the latter, no?

Well, Flush cannot lead Write. We cannot Flush what hasn't been Written! To me, "Flush trails Write" means that flush is behind. That seems supported by Merriam-Webster [1], which says in defining it as a transitive verb:

3 a : to follow upon the scent or trace of : TRACK
  b : to follow in the footsteps of : PURSUE
  c : to follow along behind
  d : to lag behind (someone, such as a competitor)

Maybe it's not super clear as a term. We could turn it around and say "Write always leads Flush".

[1] https://www.merriam-webster.com/dictionary/trail

The reason we have the barrier, and the reason we write and read them in this order, is that we must never read a Flush value that is newer than the Write value we read. That is: the values are written in pairs, (w1,f1) first, (w2,f2) next, and so on. It's okay if we obtain (w2,f1) (new write, old flush), but it's not okay if we obtain (w1,f2) (old write, new flush). If we did, we might end up with a Flush value that's ahead of Write, violating the invariant.

> 2.
> +
> +        pg_atomic_write_u64(&XLogCtl->logWriteResult, LogwrtResult.Write);
> +        pg_write_barrier();
> +        pg_atomic_write_u64(&XLogCtl->logFlushResult, LogwrtResult.Flush);
>  }
>
> Maybe add the reason as to why we had to write logWriteResult first
> and then logFlushResult, similar to the comment atop
> RefreshXLogWriteResult?
Hmm, I had one there and removed it. I'll put something back.

> > For 0002, did you consider having pg_atomic_monotonic_advance_u64()
> > return the currval?
>
> +1 for returning currval here.

Oh yeah, I had that when the monotonic stuff was used by 0001 but lost it when I moved to 0002. I'll push 0001 now and send an updated 0002 with the return value added.

--
Álvaro Herrera         48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
Re: Looking for an index that supports top-n searches by enforcing a max-n automatically
Just about as soon as I sent the above, I realized that it's unlikely to make sense in the real world in a row-store. If the goal is to keep the top-25 results and trim the rest, what happens when values are added/modified/deleted? You now have to go look at all of the data you aren't caching in the index. Unless you can *guarantee* that data is entered in perfect order, and/or never changes, I don't think what I'm looking for is likely to make sense.

On Fri, Apr 5, 2024 at 11:27 AM Morris de Oryx wrote:
> Looking for an index to support top-n searches, where n has a fixed maximum.
>
> Recently, I've been looking at strategies to handle top-n queries in
> Postgres. In my current cases, we've got definition tables, and very large
> related tables. Here's a stripped-down example; the real tables are much
> wider.
>
> CREATE TABLE IF NOT EXISTS data.inventory (
>     id uuid NOT NULL DEFAULT NULL PRIMARY KEY,
>     inv_id uuid NOT NULL DEFAULT NULL
> );
>
> CREATE TABLE IF NOT EXISTS data.scan (
>     id uuid NOT NULL DEFAULT NULL PRIMARY KEY,
>     inv_id uuid NOT NULL DEFAULT NULL,
>     scan_dts_utc timestamp NOT NULL DEFAULT NOW() -- We run our servers on UTC
> );
>
> Every item in inventory is scanned when it passes through various steps in
> a clean-dispatch-arrive-use-clean sort of a work cycle. The ratio between
> inventory and scan is 1:0-n, where n can be virtually any number. In
> another table pair like this, the average is 1:1,000. In the inventory
> example, it's roughly 0:200,000. The distribution of related row counts is
> all over the map. The reasons behind these ratios sometimes map to valid
> processes, and sometimes are a consequence of some low-quality data leaking
> into the system. In the case of inventory, the results make sense. In our
> case:
>
> * The goal value for n is often 1, and not more than up to 25.
>
> * Depending on the tables, the % of rows that are discarded because
> they're past the 25th most recent record is 97% or more.
> * Partial indexes do not work as well on huge tables as I hoped. The same
> data copied via a STATEMENT trigger into a thin, subsetted table is much
> faster. I think this has to do with the increase in random disk access
> required for a table and/or index with more pages spread around on the disk.
>
> * We can't filter in advance by *any* reasonable date range. Could get 25
> scans on one item in an hour, or a year, or five years, or never.
>
> We're finding that we need the top-n records more and more often, returned
> quickly. This gets harder to do as the table(s) grow.
>
> SELECT id, scan_dts_utc
>   FROM data.scan
>  WHERE inv_id = 'b7db5d06-8275-224d-a38a-ac263dc1c767'
>  ORDER BY scan_dts_utc DESC
>  LIMIT 25; -- Full search product might be 0, 200K, or anything
> in-between. Not on a bell curve.
>
> A compound index works really well to optimize these kinds of searches:
>
> CREATE INDEX scan_inv_id_scan_time_utc_dts_idx
>     ON ascendco.analytic_scan (inv_id, scan_time_utc_dts DESC);
>
> What I'm wondering is if there is some index option, likely not with a
> B-tree, that can *automatically* enforce a maximum-length list of top
> values, based on a defined sort:
>
> CREATE INDEX scan_inv_id_scan_time_utc_dts_idx
>     ON ascendco.analytic_scan (inv_id, scan_time_utc_dts DESC) -- This defines the ordering
>  LIMIT 25;                                                     -- This sets the hard max for n
>
> The goal is to have an automatically maintained list of the top values
> *in* the index itself. In the right situations (like ours), this reduces
> the index size by 20x or more. Smaller index, faster results. And, since
> the index is on the source table, the row references are already there.
> (Something I lose when maintaining this by hand in a side/shadow/top table.)
>
> I've looked at a ton of plans, and Postgres *clearly* goes to a lot of
> effort to recognize and optimize top-n searches already. That's
> encouraging, as it suggests that the planner takes LIMIT into account.
> (I've picked up already that maintaining the purity of the planner and
> executor abstractions is a core value to the project.)
>
> And, for sure, I can build and maintain my own custom, ordered list in
> various ways. None of them seem like they can possibly rival the trimming
> behavior being handled by an index.
>
> I'm out over my skis here, but I'm intuiting that this might be a job for
> one of the multi-value/inverted index types/frameworks. I tried some
> experiments, but only got worse results.
>
> Hope that reads as understandable... grateful for any suggestions.
Recovery of .partial WAL segments
Dear hackers,

Generating a ".partial" WAL segment is pretty common nowadays (using pg_receivewal or during standby promotion). However, we currently don't do anything with it unless the user manually removes that ".partial" extension.

The 028_pitr_timelines tests are highlighting that fact: with test data being in 00020003 and 00010003.partial, a recovery following the latest timeline (2) will succeed but fail if we follow the current timeline (1).

By simply trying to fetch the ".partial" file in XLogFileRead, we can easily recover more data and also cover that (current timeline) recovery case. So, this proposed patch makes XLogFileRead try to restore ".partial" WAL archives and adds a test to 028_pitr_timelines using current recovery_target_timeline.

As far as I've seen, the current pg_receivewal tests only seem to cover the archive generation but not actually trying to recover using it. I wasn't sure it was interesting to add such tests right now, so I didn't consider it for this patch.

Many thanks in advance for your feedback and thoughts about this,

Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

From 8c73284b72120ddf3537e1632f12455502d47f6d Mon Sep 17 00:00:00 2001
From: Stefan Fercot
Date: Fri, 5 Apr 2024 10:57:03 +0200
Subject: [PATCH v1] Make XLogFileRead try to restore .partial wal archives

Try to restore the normal archived wal segment first and, if not found, then try to restore the archived .partial wal segment. This is safe because the next completed wal segment should contain at least the same data.
---
 src/backend/access/transam/xlogrecovery.c | 20 ++--
 src/test/recovery/t/028_pitr_timelines.pl | 37 ++-
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index b2fe2d04cc..6d51b6a296 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4206,6 +4206,8 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 	char		activitymsg[MAXFNAMELEN + 16];
 	char		path[MAXPGPATH];
 	int			fd;
+	char	   *partialxlogfname;
+	bool		restoredArchivedFile;

 	XLogFileName(xlogfname, tli, segno, wal_segment_size);

@@ -4217,10 +4219,24 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);

-			if (!RestoreArchivedFile(path, xlogfname,
+			/*
+			 * Try to restore the normal wal segment first and, if not found,
+			 * then try to restore the .partial wal segment.
+			 */
+			partialxlogfname = psprintf("%s.partial", xlogfname);
+
+			restoredArchivedFile = !RestoreArchivedFile(path, xlogfname,
+														"RECOVERYXLOG",
+														wal_segment_size,
+														InRedo) &&
+				!RestoreArchivedFile(path, partialxlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
-									 InRedo))
+									 InRedo);
+
+			pfree(partialxlogfname);
+			if (restoredArchivedFile)
 				return -1;
 			break;

diff --git a/src/test/recovery/t/028_pitr_timelines.pl b/src/test/recovery/t/028_pitr_timelines.pl
index 4b7d825b71..512695cd0a 100644
--- a/src/test/recovery/t/028_pitr_timelines.pl
+++ b/src/test/recovery/t/028_pitr_timelines.pl
@@ -110,7 +110,7 @@ $node_standby->stop;
 # segment 00020003, before the timeline switching
 # record.  (They are also present in the
 # 00010003.partial file, but .partial files are not
-# used automatically.)
+# used when recovering along the latest timeline by default.)

 # Now test PITR to the recovery target.
 It should find the WAL in
 # segment 00020003, but not follow the timeline switch
@@ -173,4 +173,39 @@ $node_pitr2->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")
 $result = $node_pitr2->safe_psql('postgres', "SELECT max(i) FROM foo;");
 is($result, qq{3}, "check table contents after point-in-time recovery");

+# The 00010003.partial file could have been generated
+# by pg_receivewal without any standby node involved. In this case, we
+# wouldn't be able to recover from 00020003.
+# Now, test PITR to the initial recovery target staying on the backup's
+# current timeline, trying to fetch the data from the
+# 00010003.partial file.
+
+my $node_pitr3 = PostgreSQL::Test::Cluster->new('node_pitr3');
+$node_pitr3->init_from_backup(
+	$node_primary, $backup_name,
+	standby => 0,
+	has_restoring => 1);
+$node_pitr3->append_conf(
+	'postgresql.conf', qq{
+recovery_target_name = 'rp'
+recovery_target_action = 'promote'
+recovery_target_timeline = 'current'
+});
+
+my $log_offset = -s $node_pitr3->logfile;
+$node_pitr3->start;
+
+ok( $node_pitr3->log_contains(
+		"restored log file \"00010003.partial\" from archive",
+		$log_offset),
+	"restored 00010003.partial");
+
+# Wait until recovery finishes.
+$node_pi
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot wrote: > > What about something like? > > ereport(LOG, > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from > remote slot", > remote_slot->name), > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > LSN_FORMAT_ARGS(slot->data.restart_lsn))); > > Regards, +1. Better than earlier. I will update and post the patch. thanks Shveta
Re: CSN snapshots in hot standby
> On 5 Apr 2024, at 02:08, Kirill Reshke wrote: > > maybe we need some hooks here? Or maybe, we can take CSN here from extension > somehow. I really like the idea of CSN-provider-as-extension. But it's very important to move on with CSN, at least on standby, to make CSN actually happen some day. So, from my perspective, having LSN-as-CSN is already a huge step forward. Best regards, Andrey Borodin.
Re: Another WaitEventSet resource leakage in back branches
On Fri, Mar 22, 2024 at 9:15 PM Etsuro Fujita wrote: > While working on [1], I noticed $SUBJECT: WaitLatchOrSocket in back > branches is ignoring the possibility of failing partway through, too. > I added a PG_FINALLY block to that function, like commit 555276f85. > Patch attached. I noticed that PG_FINALLY was added in v13. I created a separate patch for v12 using PG_CATCH instead. Patch attached. I am attaching the previous patch for later versions as well. I am planning to back-patch these next week. Best regards, Etsuro Fujita fix-another-WaitEventSet-resource-leakage.patch Description: Binary data fix-another-WaitEventSet-resource-leakage-PG12.patch Description: Binary data
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 04:09:01PM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot > wrote: > > > > What about something like? > > > > ereport(LOG, > > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs > > from remote slot", > > remote_slot->name), > > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > > LSN_FORMAT_ARGS(slot->data.restart_lsn))); > > > > Regards, > > +1. Better than earlier. I will update and post the patch. > Thanks! BTW, I just realized that the LSNs I used in my example in the LSN_FORMAT_ARGS() calls are not the right ones. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila wrote: > > I think this would probably be better than the current situation but > can we think of a solution to allow toggling the value of two_phase > even when prepared transactions are present? Can you please summarize > the reason for the problems in doing that and the solutions, if any? > > -- > With Regards, > Amit Kapila. > Updated the patch, as it wasn't addressing updating of two-phase in the remote slot. Currently the main issue that needs to be handled is the handling of pending prepared transactions while two_phase is altered. I see 3 issues with the current approach. 1. Uncommitted prepared transactions when toggling two_phase from true to false When two_phase was true, prepared transactions were decoded at PREPARE time and sent to the subscriber, which then prepared them on the subscriber side with a new gid. Once two_phase is toggled to false, the COMMIT PREPARED on the publisher is converted to a commit and the entire transaction is decoded and sent to the subscriber. This will leave the previously prepared transaction pending. 2. Uncommitted prepared transactions when toggling two_phase from false to true When two_phase was false, prepared transactions were ignored and not decoded at PREPARE time on the publisher. Once two_phase is toggled to true, the apply worker and the walsender are restarted and replication is restarted from a new "start_decoding_at" LSN. Now, this new "start_decoding_at" could be past the LSN of the PREPARE record and if so, the PREPARE record is skipped and not sent to the subscriber. Look at the comments in DecodeTXNNeedSkip() for details. Later when the user issues COMMIT PREPARED, this is decoded and sent to the subscriber, but there is no prepared transaction on the subscriber, and this fails because the corresponding gid of the transaction couldn't be found. 3.
While altering the two_phase of the subscription, it is required to also alter the two_phase field of the slot on the primary. The subscription cannot remotely alter the two_phase option of the slot when the subscription is enabled, as the slot is owned by the walsender on the publisher side. Possible solutions for the 3 problems: 1. While toggling two_phase from true to false, we could probably get the list of prepared transactions for this subscriber id and rollback/abort the prepared transactions. This will allow the transactions to be re-applied like a normal transaction when the commit comes. Alternatively, if doing this isn't appropriate in the ALTER SUBSCRIPTION context, we could store the xids of all prepared transactions of this subscription in a list and when the corresponding xid is being committed by the apply worker, prior to commit, we make sure the previously prepared transaction is rolled back. But this would add the overhead of checking this list every time a transaction is committed by the apply worker. 2. No solution yet. 3. We could mandate that the altering of two_phase state only be done after disabling the subscription, just like how it is handled for the failover option. Let me know your thoughts. regards, Ajin Cherian Fujitsu Australia v2-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch Description: Binary data
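The bookkeeping in the second variant of solution 1 — remember the xids prepared under two_phase=true, and roll each back just before its commit is applied — can be sketched like this. All names here are hypothetical, for illustration only:

```python
class ApplyWorkerState:
    """Toy model of tracking prepared transactions that predate a
    two_phase=false switch (hypothetical names, not the patch's code)."""

    def __init__(self, pending_prepared_xids):
        # xids that were PREPAREd on the subscriber while two_phase
        # was still true
        self.pending = set(pending_prepared_xids)
        self.rolled_back = []

    def before_commit(self, xid):
        # If this xid was previously PREPAREd on the subscriber, roll
        # that prepared transaction back first so the full transaction
        # can be re-applied as a normal commit.
        if xid in self.pending:
            self.rolled_back.append(xid)  # stands in for ROLLBACK PREPARED
            self.pending.discard(xid)

state = ApplyWorkerState({701, 702})
state.before_commit(701)  # was prepared earlier -> rolled back first
state.before_commit(705)  # ordinary transaction -> nothing to undo
```

This also makes the stated overhead visible: the `xid in self.pending` membership check runs on every commit the apply worker processes, even when the set is empty.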
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 4:31 PM Bertrand Drouvot wrote: > > BTW, I just realized that the LSN I used in my example in the > LSN_FORMAT_ARGS() > are not the right ones. Noted. Thanks. Please find v3 with the comments addressed. thanks Shveta v3-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > There is an intermittent BF failure observed at [1] after this commit > (2ec005b). > Thanks for analyzing and providing the patch. I'll look into it. There is another BF failure [1] which I have analyzed. The main reason for failure is the following test: # Failed test 'logical slots have synced as true on standby' # at /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl line 198. # got: 'f' # expected: 't' Here, we are expecting that the two logical slots (lsub1_slot, and lsub2_slot), one created via subscription and another one via the API pg_create_logical_replication_slot(), are synced. The standby LOGs below show that the one created via the API, 'lsub2_slot', is synced but the other one, 'lsub1_slot', is not: LOG for lsub1_slot: 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL: Streaming transactions committing after 0/0, reading WAL from 0/360. 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] STATEMENT: SELECT pg_sync_replication_slots(); 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG: xmin required by slots: data 0, catalog 740 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG: could not sync slot "lsub1_slot" LOG for lsub2_slot: 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG: xmin required by slots: data 0, catalog 740 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG: newly created slot "lsub2_slot" is sync-ready now 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] STATEMENT: SELECT pg_sync_replication_slots(); We can see from the log of lsub1_slot that its restart_lsn is 0/360, which means it will start reading the WAL from that location. Now, if we check the publisher log, we have a running_xacts record at that location.
See following LOGs: 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG: statement: SELECT pg_create_logical_replication_slot('lsub2_slot', 'test_decoding', false, false, true); 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/360 oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740) The first running_xact record ends at 360 and the second one at 398. So, the start location of the second running_xact is 360, the same can be confirmed by the following LOG line of walsender: 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG: serializing snapshot to pg_logical/snapshots/0-360.snap This shows that while processing running_xact at location 360, we have serialized the snapshot. As there is no running transaction in WAL at 360 so ideally we should have reached a consistent state after processing that record on standby. But the reason standby didn't process that LOG is that the confirmed_flush LSN is also at the same location so the function LogicalSlotAdvanceAndCheckSnapState() exits without reading the WAL at that location. Now, this can be confirmed by the below walsender-specific LOG in publisher: 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write 0/360 flush 0/360 apply 0/360 reply_time 2024-04-05 04:36:59.155181+00 We update the confirmed_flush location with the flush location after receiving the above feedback. You can notice that we didn't receive the feedback for the 398 location and hence both the confirmed_flush and restart_lsn are at the same location 0/360. 
Now, the test is waiting for the subscriber to send feedback of the last WAL write location by $primary->wait_for_catchup('regress_mysub1'); As noticed from the publisher LOGs, the query we used for the wait is: SELECT '0/360' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('regress_mysub1', 'walreceiver') Here, instead of '0/360' it should have used '0/398', which is the last write location. We get this position via pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And this variable seems to be touched by commit c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely, could c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for the failure? At this stage, I am not sure, so I am just sharing with others to see if what I am saying sounds logical. I'll think more about this. [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-04-05%2004%3A34%3A27 -- With Regards, Amit Kapila.
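For readers following the LSN arithmetic above: the %X/%X notation (what LSN_FORMAT_ARGS() produces) is just the high and low 32-bit halves of the 64-bit WAL position, so comparisons like the one in the wait query are plain integer ordering. A quick sketch:

```python
def lsn_format(lsn):
    # Mirror of PostgreSQL's %X/%X output with LSN_FORMAT_ARGS():
    # high and low 32-bit halves of the 64-bit position, in hex.
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)

def lsn_parse(text):
    # Inverse: rebuild the 64-bit position from its two hex halves.
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

assert lsn_format(0x360) == "0/360"
assert lsn_parse("0/398") > lsn_parse("0/360")  # 0x398 > 0x360
```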
Re: LogwrtResult contended spinlock
Couldn't push: I tested with --disable-atomics --disable-spinlocks and the tests fail because the semaphore for the atomic variables is not always initialized. This is weird -- it's like a client process is running at a time when StartupXLOG has not initialized the variable ... so the initialization in the other place was not completely wrong. I'll investigate after lunch. Here's v16. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "XML!" Exclaimed C++. "What are you doing here? You're not a programming language." "Tell that to the people who use me," said XML. https://burningbird.net/the-parable-of-the-languages/ >From 2164778b74681910544a64ba6c2ae36e5204ed9e Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 5 Apr 2024 13:21:39 +0200 Subject: [PATCH v16 1/2] Make XLogCtl->log{Write,Flush}Result accessible with atomics This removes the need to hold both the info_lck spinlock and WALWriteLock to update them. We use stock atomic write instead, with WALWriteLock held. Readers can use atomic read, without any locking. This allows for some code to be reordered: some places were a bit contorted to avoid repeated spinlock acquisition, but that's no longer a concern, so we can turn them to more natural coding. Some further changes are likely possible with a positive performance impact, but in this commit I did rather minimal ones only, to avoid increasing the blast radius. 
Reviewed-by: Bharath Rupireddy Reviewed-by: Jeff Davis Reviewed-by: Andres Freund (earlier versions) Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql --- src/backend/access/transam/xlog.c | 105 -- 1 file changed, 57 insertions(+), 48 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3bdd9a2ddd..748ed1f2ef 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -292,12 +292,7 @@ static bool doPageWrites; * LogwrtRqst indicates a byte position that we need to write and/or fsync * the log up to (all records before that point must be written or fsynced). * The positions already written/fsynced are maintained in logWriteResult - * and logFlushResult. - * - * To read XLogCtl->logWriteResult or ->logFlushResult, you must hold either - * info_lck or WALWriteLock. To update them, you need to hold both locks. - * The point of this arrangement is that the value can be examined by code - * that already holds WALWriteLock without needing to grab info_lck as well. + * and logFlushResult using atomic access. * In addition to the shared variable, each backend has a private copy of * both in LogwrtResult, which is updated when convenient. * @@ -473,12 +468,9 @@ typedef struct XLogCtlData pg_time_t lastSegSwitchTime; XLogRecPtr lastSegSwitchLSN; - /* - * Protected by info_lck and WALWriteLock (you must hold either lock to - * read it, but both to update) - */ - XLogRecPtr logWriteResult; /* last byte + 1 written out */ - XLogRecPtr logFlushResult; /* last byte + 1 flushed */ + /* These are accessed using atomics -- info_lck not needed */ + pg_atomic_uint64 logWriteResult; /* last byte + 1 written out */ + pg_atomic_uint64 logFlushResult; /* last byte + 1 flushed */ /* * Latest initialized page in the cache (last byte position + 1). 
@@ -616,11 +608,15 @@ static XLogwrtResult LogwrtResult = {0, 0}; /* * Update local copy of shared XLogCtl->log{Write,Flush}Result + * + * It's critical that Flush always trails Write, so the order of the reads is + * important, as is the barrier. See also XLogWrite. */ #define RefreshXLogWriteResult(_target) \ do { \ - _target.Write = XLogCtl->logWriteResult; \ - _target.Flush = XLogCtl->logFlushResult; \ + _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \ + pg_read_barrier(); \ + _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \ } while (0) /* @@ -968,9 +964,8 @@ XLogInsertRecord(XLogRecData *rdata, /* advance global request to include new block(s) */ if (XLogCtl->LogwrtRqst.Write < EndPos) XLogCtl->LogwrtRqst.Write = EndPos; - /* update local result copy while I have the chance */ - RefreshXLogWriteResult(LogwrtResult); SpinLockRelease(&XLogCtl->info_lck); + RefreshXLogWriteResult(LogwrtResult); } /* @@ -1989,17 +1984,17 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic) if (opportunistic) break; - /* Before waiting, get info_lck and update LogwrtResult */ + /* Advance shared memory write request position */ SpinLockAcquire(&XLogCtl->info_lck); if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr) XLogCtl->LogwrtRqst.Write = OldPageRqstPtr; - RefreshXLogWriteResult(LogwrtResult); SpinLockRelease(&XLogCtl->info_lck); /* - * Now that we have an up-to-date LogwrtResult value, see if we - * still need to write it or if someone else already did. + * Acquire an up-to-date LogwrtResu
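The macro's comment says Flush must always trail Write, and the read order plus the barrier is what guarantees it: the writer publishes Write before Flush, and the reader loads Flush before Write, so a local snapshot can never show Flush ahead of Write. The same ordering argument, sketched in Python (the real code uses C atomics and pg_read_barrier(); CPython's effectively sequential memory model stands in for the barriers here):

```python
import threading

write_result = 0   # analogue of XLogCtl->logWriteResult
flush_result = 0   # analogue of XLogCtl->logFlushResult

def writer():
    global write_result, flush_result
    for pos in range(1, 10001):
        write_result = pos   # publish Write first...
        flush_result = pos   # ...then Flush, so Flush never leads Write

violations = []

def reader():
    for _ in range(10000):
        f = flush_result     # read Flush first...
        w = write_result     # ...then Write (pg_read_barrier() in C)
        if f > w:            # snapshot must never show Flush > Write
            violations.append((f, w))

t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader)
t1.start(); t2.start(); t1.join(); t2.join()
```

Reversing either ordering (writer updating Flush first, or reader loading Write first) would allow a snapshot where the flushed position appears ahead of the written one, which is exactly what the barrier comment warns against.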
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, 4 Apr 2024 at 12:45, Jelte Fennema-Nio wrote: > On Thu, 4 Apr 2024 at 14:50, Peter Eisentraut > wrote: > > It appears there are several different perspectives about this. My > > intuition was that a protocol version change indicates something that we > > eventually want all client libraries to support. Whereas protocol > > extensions are truly opt-in. > > > > For example, if we didn't have the ParameterStatus message and someone > > wanted to add it, then this ought to be a protocol version change, so > > that we eventually get everyone to adopt it. > > > > But, for example, the column encryption feature is not something I'd > > expect all client libraries to implement, so it can be opt-in. > > I think that is a good way of deciding between version bump vs > protocol extension parameter. But I'd like to make one > clarification/addition to this logic: if the protocol feature already > requires opt-in from the client because of how the feature works, then > there's no need for a protocol extension parameter. e.g. if your > column-encryption feature would require the client to send a > ColumnDecrypt message before the server would exhibit any behaviour > relating to the column-encryption feature, then the client can simply > "support" the feature by never sending the ColumnDecrypt message. > Thus, in such cases a protocol extension parameter would not be > necessary, even if the feature is considered opt-in. > > > (I believe we have added a number of new protocol messages since the > > beginning of the 3.0 protocol, without any version change, so "new > > protocol message" might not always be a good example to begin with.) > > Personally, I feel like this is something we should change. IMHO, we > should get to a state where protocol minor version bumps are so > low-risk that we can do them whenever we add message types. Right now > there are basically multiple versions of the 3.0 protocol, which is > very confusing to anyone implementing it.
Different servers > implementing the 3.0 protocol without the NegotiateVersion message is > a good example of that. > Totally agree. > > > I fear that if we prefer protocol extensions over version increases, > > then we'd get a very fragmented landscape of different client libraries > > supporting different combinations of things. > > +1 Dave > +1 >
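For implementers following along: the message under discussion is NegotiateProtocolVersion, which per the protocol documentation carries the newest minor protocol version the server supports plus the list of protocol options it did not recognize. A rough sketch of decoding it (a simplified illustration of the documented layout, not libpq code):

```python
import struct

def parse_negotiate_protocol_version(msg):
    assert msg[0:1] == b"v"                    # message type byte
    (length,) = struct.unpack("!i", msg[1:5])  # length, incl. itself
    body = msg[5:1 + length]
    # Int32 newest supported minor version, Int32 count of
    # unrecognized protocol options, then NUL-terminated names.
    version, n_unrecognized = struct.unpack("!ii", body[:8])
    options, rest = [], body[8:]
    for _ in range(n_unrecognized):
        name, _, rest = rest.partition(b"\x00")
        options.append(name.decode())
    return version, options

# Example: server speaks at most 3.0 (3 << 16 = 196608) and did not
# recognize one hypothetical _pq_ startup parameter.
body = struct.pack("!ii", 196608, 1) + b"_pq_.some_extension\x00"
msg = b"v" + struct.pack("!i", 4 + len(body)) + body
version, options = parse_negotiate_protocol_version(msg)
```

A client that never sees this message when it requests a higher minor version is exactly the "3.0 protocol without NegotiateVersion" confusion described above.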
Re: meson vs windows perl
On 2024-04-02 Tu 09:34, Andrew Dunstan wrote: meson.build has this code ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip() undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split() undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split() perl_ldopts = [] foreach ldopt : ldopts.split(' ') if ldopt == '' or ldopt in undesired continue endif perl_ldopts += ldopt.strip('"') endforeach message('LDFLAGS recommended by perl: "@0@"'.format(ldopts)) message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts))) This code is seriously broken if perl reports items including spaces, when a) removing the quotes is quite wrong, and b) splitting on spaces is also wrong. Here's an example from one of my colleagues: C:\Program Files\Microsoft Visual Studio\2022\Professional>perl.EXE -MExtUtils::Embed -e ldopts -nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"C:\edb\languagepack\v4\Perl-5.38\lib\CORE" -machine:AMD64 -subsystem:console,"5.02" "C:\edb\languagepack\v4\Perl-5.38\lib\CORE\perl538.lib" "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.38.33130\lib\x64\oldnames.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\kernel32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\user32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\gdi32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\winspool.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\comdlg32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\advapi32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\shell32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\ole32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\oleaut32.lib" "C:\Program Files (x86)\Windows 
Kits\10\\lib\10.0.22621.0\\um\x64\netapi32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\uuid.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\ws2_32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\mpr.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\winmm.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\version.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\odbc32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\odbccp32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\comctl32.lib" "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.38.33130\lib\x64\msvcrt.lib" "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.38.33130\lib\x64\vcruntime.lib" "C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64\ucrt.lib" And with that we get errors like cl : Command line warning D9024 : unrecognized source file type 'C:\Program', object file assumed cl : Command line warning D9024 : unrecognized source file type 'Files\Microsoft', object file assumed cl : Command line warning D9024 : unrecognized source file type 'Visual', object file assumed cl : Command line warning D9024 : unrecognized source file type 'C:\Program', object file assumed cl : Command line warning D9024 : unrecognized source file type 'Files', object file assumed cl : Command line warning D9024 : unrecognized source file type '(x86)\Windows', object file assumed It looks like we need to get smarter about how we process the ldopts and strip out the ccdlflags and ldflags Here is an attempt to fix all that. It's ugly, but I think it's more principled. First, instead of getting the ldopts and then trying to filter out the ldflags and ccdlflags, it tells perl not to include those in the first place, by overriding a couple of routines in ExtUtils::Embed. 
And second, it's smarter about splitting what's left, so that it doesn't split on a space that's in a quoted item. The perl that's used to do that second bit is not pretty, but it has been tested on the system where the problem arose and apparently cures the problem. (No doubt some perl guru could improve it.) It also works on my Ubuntu system, so I don't think we'll be breaking anything (famous last words). cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com diff --git a/meson.build b/meson.build index 87437960bc..91e87055d6 100644 --- a/meson.build +++ b/meson.build @@ -993,22 +993,11 @@ if not perlopt.disabled() # Config's ccdlflags and ldflags. (Those are the choices of those who # built the Perl installation, which are not necessarily ap
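The "smarter splitting" being described is the classic shell-lexing problem: split on whitespace except inside quotes, then strip the quotes. As an illustration of the intended behavior (not the actual perl one-liner in the patch), Python's shlex does exactly this:

```python
import shlex

ldopts = '-nologo -libpath:somedir "C:/Program Files/x64/kernel32.lib" perl538.lib'

# A naive split breaks the quoted path into two bogus "object files",
# which is exactly the cl D9024 warning shown above...
naive = ldopts.split(" ")

# ...while a quote-aware split keeps it as one item and strips the
# quotes, which is what the compiler driver expects.
proper = shlex.split(ldopts)
```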
Re: remaining sql/json patches
On Thu, Apr 4, 2024 at 9:02 PM Amit Langote wrote: > I'll post the rebased 0002 tomorrow after addressing your comments. Here's one. Main changes: * Fixed a bug in get_table_json_columns() which caused nested columns to be deparsed incorrectly, something Jian reported upthread. * Simplified the algorithm in JsonTablePlanNextRow() I'll post another revision or two maybe tomorrow, but posting what I have now in case Jian wants to do more testing. -- Thanks, Amit Langote v50-0001-JSON_TABLE-Add-support-for-NESTED-columns.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > There is an intermittent BF failure observed at [1] after this commit > > (2ec005b). > > > > Thanks for analyzing and providing the patch. I'll look into it. There > is another BF failure [1] which I have analyzed. The main reason for > failure is the following test: > > # Failed test 'logical slots have synced as true on standby' > # at > /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl > line 198. > # got: 'f' > # expected: 't' > > Here, we are expecting that the two logical slots (lsub1_slot, and > lsub2_slot), one created via subscription and another one via API > pg_create_logical_replication_slot() are synced. The standby LOGs > which are as follows show that the one created by API 'lsub2_slot' is > synced but the other one 'lsub1_slot': > > LOG for lsub1_slot: > > 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL: > Streaming transactions committing after 0/0, reading WAL from > 0/360. > 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] > STATEMENT: SELECT pg_sync_replication_slots(); > 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG: > xmin required by slots: data 0, catalog 740 > 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG: > could not sync slot "lsub1_slot" > > LOG for lsub2_slot: > > 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG: > xmin required by slots: data 0, catalog 740 > 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG: > newly created slot "lsub2_slot" is sync-ready now > 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] > STATEMENT: SELECT pg_sync_replication_slots(); > > We can see from the log of lsub1_slot that its restart_lsn is > 0/360 which means it will start reading from the WAL from that > location. 
Now, if we check the publisher log, we have running_xacts > record at that location. See following LOGs: > > 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG: > statement: SELECT pg_create_logical_replication_slot('lsub2_slot', > 'test_decoding', false, false, true); > 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG: > snapshot of 0+0 running transaction ids (lsn 0/360 oldest xid 740 > latest complete 739 next xid 740) > > > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: > snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 > latest complete 739 next xid 740) > > The first running_xact record ends at 360 and the second one at > 398. So, the start location of the second running_xact is 360, > the same can be confirmed by the following LOG line of walsender: > > 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG: > serializing snapshot to pg_logical/snapshots/0-360.snap > > This shows that while processing running_xact at location 360, we > have serialized the snapshot. As there is no running transaction in > WAL at 360 so ideally we should have reached a consistent state > after processing that record on standby. But the reason standby didn't > process that LOG is that the confirmed_flush LSN is also at the same > location so the function LogicalSlotAdvanceAndCheckSnapState() exits > without reading the WAL at that location. Now, this can be confirmed > by the below walsender-specific LOG in publisher: > > 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write > 0/360 flush 0/360 apply 0/360 reply_time 2024-04-05 > 04:36:59.155181+00 > > We update the confirmed_flush location with the flush location after > receiving the above feedback. You can notice that we didn't receive > the feedback for the 398 location and hence both the > confirmed_flush and restart_lsn are at the same location 0/360. 
> Now, the test is waiting for the subscriber to send feedback of the > last WAL write location by > $primary->wait_for_catchup('regress_mysub1'); As noticed from the > publisher LOGs, the query we used for wait is: > > SELECT '0/360' <= replay_lsn AND state = 'streaming' > FROM pg_catalog.pg_stat_replication > WHERE application_name IN ('regress_mysub1', 'walreceiver') > > Here, instead of '0/360' it should have used ''0/398' which is > the last write location. This position we get via function > pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And > this variable seems to be touched by commit > c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely could > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for failure? At > this stage, I am not sure so just sharing with others to see if what I > am saying sounds logical. I'll think more about this. > Thinking more on this, it doesn't seem related to c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit does
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, Apr 4, 2024 at 8:50 AM Peter Eisentraut wrote: > It appears there are several different perspectives about this. My > intuition was that a protocol version change indicates something that we > eventually want all client libraries to support. Whereas protocol > extensions are truly opt-in. Hmm. This doesn't seem like a bad way to make a distinction to me, except that I fear it would be mushy in practice. For example: > For example, if we didn't have the ParameterStatus message and someone > wanted to add it, then this ought to be a protocol version change, so > that we eventually get everyone to adopt it. > > But, for example, the column encryption feature is not something I'd > expect all client libraries to implement, so it can be opt-in. I agree that column encryption might not necessarily be supported by all client libraries, but equally, the ParameterStatus message is just for the benefit of the client. A client that doesn't care about the contents of such a message is free to ignore it, and would be better off if it weren't sent in the first place; it's just extra bytes on the wire that aren't needed for anything. So why would we want to force everyone to adopt that, if it didn't exist already? I also wonder how the protocol negotiation for column encryption is actually going to work. What are the actual wire protocol changes that are needed? What does the server need to know from the client, or the client from the server, about what is supported? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Allow non-superuser to cancel superuser tasks.
On Fri, Apr 05, 2024 at 02:39:05PM +0900, Michael Paquier wrote: > + /* > + * If the backend is autovacuum worker, allow user with the privileges > of > + * pg_signal_autovacuum role to signal the backend. > + */ > + if (pgstat_get_backend_type(GetNumberFromPGProc(proc)) == > B_AUTOVAC_WORKER) > + { > + if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)) > + return SIGNAL_BACKEND_NOAUTOVACUUM; > + } > > I was wondering why this is not done after we've checked that we have > a superuser-owned backend, and this is giving me a pause. @Nathan, > why do you think we should not rely on the roleId for an autovacuum > worker? In core, do_autovacuum() is only called in a process without > a role specified, and I've noticed your remark here: > https://www.postgresql.org/message-id/20240401202146.GA2354284@nathanxps13 > It's feeling more natural here to check that we have a superuser-owned > backend first, and then do a lookup of the process type. I figured since there's no reason to rely on that behavior, we might as well do a bit of future-proofing in case autovacuum workers are ever not run as InvalidOid. It'd be easy enough to fix this code if that ever happened, so I'm not too worried about this. > One thing that we should definitely not do is letting any user calling > pg_signal_backend() know that a given PID maps to an autovacuum > worker. This information is hidden in pg_stat_activity. And > actually, doesn't the patch leak this information to all users when > calling pg_signal_backend with random PID numbers because of the fact > that SIGNAL_BACKEND_NOAUTOVACUUM exists? Any role could guess which > PIDs are used by an autovacuum worker because of the granularity > required for the error related to pg_signal_autovacuum. Hm. I hadn't considered that angle. IIUC right now they'll just get the generic superuser error for autovacuum workers. 
I don't know how concerned to be about users distinguishing autovacuum workers from other superuser backends, _but_ if roles with pg_signal_autovacuum can't even figure out the PIDs for the autovacuum workers, then this feature seems kind-of useless. Perhaps we should allow roles with privileges of pg_signal_autovacuum to see the autovacuum workers in pg_stat_activity. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
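To make the information-leak concern concrete, here is a toy model of the check ordering (hypothetical names, not the patch's actual code): the distinct autovacuum-specific result must only be reachable by roles that already hold pg_signal_autovacuum, otherwise any caller could probe arbitrary PIDs to learn which ones are autovacuum workers.

```python
SUCCESS, NOSUPERUSER = "ok", "nosuperuser"

def signal_backend(target_is_autovac, target_is_superuser,
                   caller_is_superuser, caller_has_signal_autovac):
    """Toy model of pg_signal_backend()'s privilege checks."""
    if caller_is_superuser:
        return SUCCESS
    if target_is_autovac:
        # Only roles allowed to signal autovacuum workers get a
        # distinguishable outcome; everyone else sees the same
        # generic error as for any superuser-owned backend.
        if caller_has_signal_autovac:
            return SUCCESS
        return NOSUPERUSER
    if target_is_superuser:
        return NOSUPERUSER
    return SUCCESS

# An unprivileged caller probing an autovacuum worker learns nothing:
probe = signal_backend(True, True, False, False)
```

In this model the probe returns the same generic error as probing any other superuser-owned backend, so scanning PIDs reveals nothing about which are autovacuum workers.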
Re: IPC::Run::time[r|out] vs our TAP tests
Tom Lane writes:
> Erik Wienhold writes:
>> Libedit 20191025-3.1 is the first version where ":{?VERB" works as
>> expected. The previous release 20190324-3.1 still produces the escaped
>> output that Michael found. That narrows down the changes to everything
>> between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and
>> [2] (both inclusive).
>
> Hm. I just installed NetBSD 8.2 in a VM, and it passes this test:
>
> # +++ tap install-check in src/bin/psql +++
> t/001_basic.pl ... ok
> t/010_tab_completion.pl .. ok
> t/020_cancel.pl .. ok
> All tests successful.
>
> So it seems like the bug does not exist in any currently-supported
> NetBSD release. Debian has been known to ship obsolete libedit
> versions, though.

Both the current (bookworm/12) and previous (bullseye/11) versions of Debian have new enough libedits to not be affected by this bug:

libedit | 3.1-20181209-1 | oldoldstable   | source
libedit | 3.1-20191231-2 | oldstable      | source
libedit | 3.1-20221030-2 | stable         | source
libedit | 3.1-20230828-1 | testing        | source
libedit | 3.1-20230828-1 | unstable       | source
libedit | 3.1-20230828-1 | unstable-debug | source

But in bullseye they decided that OpenSSL is a system library as far as the GPL is concerned, so are linking directly to readline. And even before then their psql wrapper would LD_PRELOAD readline instead if installed, so approximately nobody actually ever used psql with libedit on Debian.

> regards, tom lane

- ilmari
Re: Fixing backslash dot for COPY FROM...CSV
Tom Lane wrote:
> I've looked over this patch and I generally agree that this is a
> reasonable solution.

Thanks for reviewing this!

> I'm also wondering why the patch adds a test for
> "PQprotocolVersion(conn) >= 3" in handleCopyIn.

I've removed this in the attached update.

> I concur with Robert's doubts about some of the doc changes though.
> In particular, since we're not changing the behavior for non-CSV
> mode, we shouldn't remove any statements that apply to non-CSV mode.

I don't quite understand the issues with the doc changes. Let me detail the changes. The first change is under CSV Format, so it does not concern non-CSV modes.

--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -761,11 +761,7 @@ COPY count
     format, \., the end-of-data marker, could also appear as a data
     value. To avoid any misinterpretation, a \. data value appearing
     as a lone entry on a line is automatically
-    quoted on output, and on input, if quoted, is not interpreted as the
-    end-of-data marker. If you are loading a file created by another
-    application that has a single unquoted column and might have a
-    value of \., you might need to quote that value in the
-    input file.
+    quoted on output.

The part about quoting on output is kept because the code still does that.

About this bit: "and on input, if quoted, is not interpreted as the end-of-data marker." The patch removes that part. The reason is that \. is now not interpreted as the end-of-data marker in either case, quoted or unquoted, conforming to the spec. Previously, what the reader should have understood by "if quoted, ..." is the implication "if not quoted, then \. will be interpreted as the end-of-data marker, even though that behavior does not conform to the CSV spec". If we documented the new behavior, it would be something like: when quoted, it works as expected, and when unquoted, it works as expected too. Isn't it simpler not to say anything?
About the next sentence: "If you are loading a file created by another application that has a single unquoted column and might have a value of \., you might need to quote that value in the input file." It's removed as well because it's no longer necessary to do that.

The other hunk is in the psql doc:

--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1119,10 +1119,6 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
     destination, because all data must pass through the client/server
     connection. For large amounts of data the SQL command might be
     preferable.
-    Also, because of this pass-through method, \copy
-    ... from in CSV mode will erroneously
-    treat a \. data value alone on a line as an
-    end-of-input marker.

That behavior no longer happens, so this gets removed as well.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

From 5a16b81f10e47fe1ac3842abd34e8bef7e639e26 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?=
Date: Fri, 5 Apr 2024 14:22:49 +0200
Subject: [PATCH v4] Support backslash-dot on a line by itself as valid data in COPY FROM...CSV

---
 doc/src/sgml/ref/copy.sgml           |  6 +--
 doc/src/sgml/ref/psql-ref.sgml       |  4 --
 src/backend/commands/copyfromparse.c | 57 +++-
 src/bin/psql/copy.c                  | 30 ++-
 src/test/regress/expected/copy.out   | 18 +
 src/test/regress/sql/copy.sql        | 12 ++
 6 files changed, 65 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 33ce7c4ea6..a63f073c1b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -814,11 +814,7 @@ COPY count
     format, \., the end-of-data marker, could also appear as a data
     value. To avoid any misinterpretation, a \. data value appearing
     as a lone entry on a line is automatically
-    quoted on output, and on input, if quoted, is not interpreted as the
-    end-of-data marker. If you are loading a file created by another
-    application that has a single unquoted column and might have a
-    value of \., you might need to quote that value in the
-    input file.
+    quoted on output.

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index cc7d797159..d94e3cacfc 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1119,10 +1119,6 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
     destination, because all data must pass through the client/server
     connection. For large amounts of data the SQL command might be
     preferable.
-    Also, because of this pass-through method, \copy
-    ... from in CSV mode will erroneously
-    treat a \. data value alone on a line as an
-    end-of-input marker.

diff --git a/src/backend/commands/copyfromparse.c
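For reference, the input-side rule the patch establishes can be sketched as the following predicate. This is an illustrative stand-in, not the copyfromparse.c code: in text format a lone backslash-dot line is still the end-of-data marker, while in CSV mode it never is, whether quoted or not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Is a whole input line the end-of-data marker?  Text format: yes, for a
 * lone backslash-dot.  CSV format, under the patch: never -- \. on a line
 * by itself is just a one-column data value.
 */
static bool
line_is_end_of_data(const char *line, bool csv_mode)
{
	if (csv_mode)
		return false;			/* patched behavior: \. is always data in CSV */

	return strcmp(line, "\\.") == 0;	/* text format keeps the marker */
}
```

In other words, CSV input simply has no in-band end-of-data marker anymore; the end of the data stream itself delimits the input, and only text format retains \. as a marker.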
Re: Popcount optimization using AVX512
On Fri, Apr 05, 2024 at 10:33:27AM +0300, Ants Aasma wrote:
> The main issue I saw was that clang was able to peel off the first
> iteration of the loop and then eliminate the mask assignment and
> replace masked load with a memory operand for vpopcnt. I was not able
> to convince gcc to do that regardless of optimization options.
> Generated code for the inner loop:
>
> clang:
> :
>   50: add rdx, 64
>   54: cmp rdx, rdi
>   57: jae
>   59: vpopcntq zmm1, zmmword ptr [rdx]
>   5f: vpaddq zmm0, zmm1, zmm0
>   65: jmp
>
> gcc:
> :
>   38: kmovq k1, rdx
>   3d: vmovdqu8 zmm0 {k1} {z}, zmmword ptr [rax]
>   43: add rax, 64
>   47: mov rdx, -1
>   4e: vpopcntq zmm0, zmm0
>   54: vpaddq zmm0, zmm0, zmm1
>   5a: vmovdqa64 zmm1, zmm0
>   60: cmp rax, rsi
>   63: jb
>
> I'm not sure how much that matters in practice. Attached is a patch to
> do this manually giving essentially the same result in gcc. As most
> distro packages are built using gcc I think it would make sense to
> have the extra code if it gives a noticeable benefit for large cases.

Yeah, I did see this, but I also wasn't sure if it was worth further complicating the code. I can test with and without your fix and see if it makes any difference in the benchmarks.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
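The transformation Ants describes can be sketched portably: handle the partial chunk once up front (the iteration clang "peels"), then run a steady-state loop over full-width chunks that needs no mask. This stand-in uses 8-byte words and the GCC/Clang builtin __builtin_popcountll rather than the 64-byte AVX-512 masked loads in the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count set bits in buf[0..len); loop shape mirrors the discussion above. */
static uint64_t
popcount_bytes(const unsigned char *buf, size_t len)
{
	uint64_t	cnt = 0;
	size_t		head = len % 8;

	if (head)					/* peeled partial iteration */
	{
		uint64_t	w = 0;

		memcpy(&w, buf, head);	/* "masked" load: remaining bytes stay zero */
		cnt += (uint64_t) __builtin_popcountll(w);
		buf += head;
		len -= head;
	}

	for (size_t i = 0; i < len; i += 8)	/* hot loop: full-width loads only */
	{
		uint64_t	w;

		memcpy(&w, buf + i, 8);
		cnt += (uint64_t) __builtin_popcountll(w);
	}

	return cnt;
}
```

The design point is the same one from the thread: once the partial chunk is dealt with outside the loop, the hot loop body contains no mask bookkeeping at all, which is the code shape clang produces on its own and gcc has to be given by hand.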
Re: broken JIT support on Fedora 40
On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro wrote:
> https://github.com/llvm/llvm-project/pull/87093

Oh, with those clues, I think I might see... It is a bit strange that we copy attributes from AttributeTemplate(), a function that returns Datum, to our void deform function. It works (I mean doesn't crash) if you just comment this line out:

    llvm_copy_attributes(AttributeTemplate, v_deform_fn);

... but I guess that disables inlining of the deform function? So perhaps we just need to teach that thing not to try to copy the return value's attributes, which also seems to work here:

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index ec0fdd49324..92b4993a98a 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -552,8 +552,11 @@ llvm_copy_attributes(LLVMValueRef v_from, LLVMValueRef v_to)
 	/* copy function attributes */
 	llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeFunctionIndex);
 
-	/* and the return value attributes */
-	llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex);
+	if (LLVMGetTypeKind(LLVMGetFunctionReturnType(v_to)) != LLVMVoidTypeKind)
+	{
+		/* and the return value attributes */
+		llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex);
+	}
Re: broken JIT support on Fedora 40
> On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote: > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro wrote: > > https://github.com/llvm/llvm-project/pull/87093 > > Oh, with those clues, I think I might see... It is a bit strange that > we copy attributes from AttributeTemplate(), a function that returns > Datum, to our void deform function. It works (I mean doesn't crash) > if you just comment this line out: > > llvm_copy_attributes(AttributeTemplate, v_deform_fn); > > ... but I guess that disables inlining of the deform function? So > perhaps we just need to teach that thing not to try to copy the return > value's attributes, which also seems to work here: Yep, I think this is it. I've spent some hours trying to understand why suddenly deform function has noundef ret attribute, when it shouldn't -- this explains it and the proposed change fixes the crash. One thing that is still not clear to me though is why this copied attribute doesn't show up in the bitcode dumped right before running inline pass (I've added this to troubleshoot the issue).
Re: Security lessons from liblzma
On Thu, Apr 4, 2024 at 4:48 PM Daniel Gustafsson wrote: > AFAIK we haven't historically enforced that installations have the openssl > binary in PATH, but it would be a pretty low bar to add. The bigger issue is > likely to find someone to port this to Windows, it probably won't be too hard > but as with all things building on Windows, we need someone skilled in that > area to do it. I wonder how hard it would be to just code up our own binary to do this. If it'd be a pain to do that, or to maintain it across SSL versions, then it's a bad plan and we shouldn't do it. But if it's not that much code, maybe it'd be worth considering. I'm also sort of afraid that we're getting sucked into thinking real hard about this SSL certificate issue rather than trying to brainstorm all the other places that might be problematic. The latter might be a more fruitful exercise (or maybe not, what do I know?). -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add bump memory context type and use it for tuplesorts
On Thu, 4 Apr 2024 at 22:43, Tom Lane wrote:
>
> Matthias van de Meent writes:
> > On Mon, 25 Mar 2024 at 22:44, Tom Lane wrote:
> >> Basically, I'm not happy with consuming the last reasonably-available
> >> pattern for a memory context type that has little claim to being the
> >> Last Context Type We Will Ever Want. Rather than making a further
> >> dent in our ability to detect corrupted chunks, we should do something
> >> towards restoring the expansibility that existed in the original
> >> design. Then we can add bump contexts and whatever else we want.
>
> > So, would something like the attached make enough IDs available so
> > that we can add the bump context anyway?
>
> > It extends memory context IDs to 5 bits (32 values), of which
> > - 8 have glibc's malloc pattern of 001/010;
> > - 1 is unused memory's 0
> > - 1 is wipe_mem's 1
> > - 4 are used by existing contexts (Aset/Generation/Slab/AlignedRedirect)
> > - 18 are newly available.
>
> This seems like it would solve the problem for a good long time
> to come; and if we ever need more IDs, we could steal one more bit
> by requiring the offset to the block header to be a multiple of 8.
> (Really, we could just about do that today at little or no cost ...
> machines with MAXALIGN less than 8 are very thin on the ground.)

Hmm, it seems like a decent idea, but I didn't want to deal with the repercussions of that this late in the cycle when these 2 bits were still relatively easy to get hold of.

> The only objection I can think of is that perhaps this would slow
> things down a tad by requiring more complicated shifting/masking.
> I wonder if we could redo the performance checks that were done
> on the way to accepting the current design.

I didn't do very extensive testing, but the light performance tests that I did with the palloc performance benchmark patch & script shared above didn't show an observable negative effect.
An adapted version of the test that uses repalloc() to check performance differences in MCXT_METHOD() doesn't show a significant performance difference from master either. That test case is attached as repalloc-performance-test-function.patch.txt. The full set of patches would then accumulate to the attached v5 of the patchset. 0001 is an update of my patch from yesterday, in which I update MemoryContextMethodID infrastructure for more IDs, and use a new naming scheme for unused/reserved IDs. 0002 and 0003 are David's patches, with minor changes to work with 0001 (rebasing, and I moved the location around to keep function declaration in order with memctx ids) Kind regards, Matthias van de Meent From b65320736cf63684b4710b3d63e243a0862d37c6 Mon Sep 17 00:00:00 2001 From: Matthias van de Meent Date: Fri, 5 Apr 2024 15:13:52 +0200 Subject: [PATCH] repalloc performance test function --- src/backend/utils/adt/mcxtfuncs.c | 231 ++ src/include/catalog/pg_proc.dat | 12 ++ 2 files changed, 243 insertions(+) diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c index 4d4a70915b..88de4bbcb5 100644 --- a/src/backend/utils/adt/mcxtfuncs.c +++ b/src/backend/utils/adt/mcxtfuncs.c @@ -15,8 +15,11 @@ #include "postgres.h" +#include + #include "funcapi.h" #include "mb/pg_wchar.h" +#include "miscadmin.h" #include "storage/proc.h" #include "storage/procarray.h" #include "utils/builtins.h" @@ -185,3 +188,231 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) PG_RETURN_BOOL(true); } + +typedef struct AllocateTestNext +{ + struct AllocateTestNext *next; /* ptr to the next allocation */ +} AllocateTestNext; + +/* #define ALLOCATE_TEST_DEBUG */ +/* + * pg_allocate_memory_test + * Used to test the performance of a memory context types + */ +Datum +pg_allocate_memory_test(PG_FUNCTION_ARGS) +{ + int32 chunk_size = PG_GETARG_INT32(0); + int64 keep_memory = PG_GETARG_INT64(1); + int64 total_alloc = PG_GETARG_INT64(2); + text *context_type_text = 
PG_GETARG_TEXT_PP(3); + char *context_type; + int64 curr_memory_use = 0; + int64 remaining_alloc_bytes = total_alloc; + MemoryContext context; + MemoryContext oldContext; + AllocateTestNext *next_free_ptr = NULL; + AllocateTestNext *last_alloc = NULL; + clock_t start, end; + + if (chunk_size < sizeof(AllocateTestNext)) + elog(ERROR, "chunk_size (%d) must be at least %ld bytes", chunk_size, +sizeof(AllocateTestNext)); + if (keep_memory > total_alloc) + elog(ERROR, "keep_memory (" INT64_FORMAT ") must be less than total_alloc (" INT64_FORMAT ")", +keep_memory, total_alloc); + + context_type = text_to_cstring(context_type_text); + + start = clock(); + + if (strcmp(context_type, "generation") == 0) + context = GenerationContextCreate(CurrentMemoryContext
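The shifting/masking cost Tom asks about amounts to recovering the method ID from the chunk header with a wider mask. The sketch below uses illustrative constants only, not the patch's actual header layout; which of the 32 values stay reserved (unused memory's 0 pattern, wipe_mem's pattern, the glibc-malloc-like 001/010 patterns) is a policy layered on top of this arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative encoding: widening the memory context method ID from 3 to
 * 5 bits means one more mask bit and otherwise identical shift/mask work
 * in the hot get/set paths -- which is why the measured overhead is small.
 */
#define MCTX_ID_SHIFT	0
#define MCTX_ID_MASK	((uint64_t) 0x1F)	/* 5 bits -> 32 method IDs */

static inline uint64_t
hdr_set_method_id(uint64_t hdr, uint64_t id)
{
	return (hdr & ~(MCTX_ID_MASK << MCTX_ID_SHIFT)) |
		((id & MCTX_ID_MASK) << MCTX_ID_SHIFT);
}

static inline uint64_t
hdr_get_method_id(uint64_t hdr)
{
	return (hdr >> MCTX_ID_SHIFT) & MCTX_ID_MASK;
}
```

Stealing one further bit by forcing the block-header offset to a multiple of 8, as Tom suggests, would change only the mask and shift constants here, not the shape of the code.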
Re: Speed up clean meson builds by ~25%
On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio wrote: > It improved clean build times on my machine (10 cores/20 threads) from ~40 > seconds to ~30 seconds. After discussing this off-list with Bilal, I realized that this gain is only happening for clang builds on my system. Because those take a long time as was also recently discussed in[1]. My builds don't take nearly as long though. I tried with clang 15 through 18 and they all took 10-22 seconds to run and clang comes from apt.llvm.org on Ubuntu 22.04 [1]: https://www.postgresql.org/message-id/flat/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com
Re: broken JIT support on Fedora 40
> On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote: > > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote: > > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro > > wrote: > > > https://github.com/llvm/llvm-project/pull/87093 > > > > Oh, with those clues, I think I might see... It is a bit strange that > > we copy attributes from AttributeTemplate(), a function that returns > > Datum, to our void deform function. It works (I mean doesn't crash) > > if you just comment this line out: > > > > llvm_copy_attributes(AttributeTemplate, v_deform_fn); > > > > ... but I guess that disables inlining of the deform function? So > > perhaps we just need to teach that thing not to try to copy the return > > value's attributes, which also seems to work here: > > Yep, I think this is it. I've spent some hours trying to understand why > suddenly deform function has noundef ret attribute, when it shouldn't -- > this explains it and the proposed change fixes the crash. One thing that > is still not clear to me though is why this copied attribute doesn't > show up in the bitcode dumped right before running inline pass (I've > added this to troubleshoot the issue). One thing to consider in this context is indeed adding "verify" pass as suggested in the PR, at least for the debugging configuration. Without the fix it immediately returns: Running analysis: VerifierAnalysis on deform_0_1 Attribute 'noundef' applied to incompatible type! llvm error: Broken function found, compilation aborted!
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, Apr 4, 2024 at 12:45 PM Jelte Fennema-Nio wrote: > Yeah, we definitely think differently here then. To me bumping the > minor protocol version shouldn't be a thing that we would need to > carefully consider. It should be easy to do, and probably done often. Often? I kind of hope that the protocol starts to evolve a bit more than it has, but I don't want a continuous stream of changes. That will be very hard to test and verify correctness, and a hassle for drivers to keep up with, and a mess for compatibility. > So, what approach of checking feature support are you envisioning > instead? A function for every feature? > Something like SupportsSetProtocolParameter, that returns an error > message if it's not supported and NULL otherwise. And then add such a > support function for every feature? Yeah, something like that. > I think I might agree with you that it would be nice for features that > depend on a protocol extension parameter, indeed because in the future > we might make them required and they don't have to update their logic > then. But for features only depending on the protocol version I > honestly don't see the point. A protocol version check is always going > to continue working. Sure, if we introduce it based on the protocol version then we don't need anything else. > I'm also not sure why you're saying a user is not entitled to this > information. We already provide users of libpq a way to see the full > Postgres version, and the major protocol version. I think allowing a > user to access this information is only a good thing. But I agree that > providing easy to use feature support functions is a better user > experience in some cases. I guess that's a fair point. But I'm worried that if we expose too much of the internals, we won't be able to change things later. > > But this is another example of a problem that can *easily* be fixed > > without using up the entirety of the _pq_ namespace. 
You can introduce > > _pq_.dont_error_on_unknown_gucs=1 or > > _pq_.dont_error_on_these_gucs='foo, bar, baz'. The distinction between > > the startup packet containing whizzbang=frob and instead containing > > _pq_.whizzbang=frob shouldn't be just whether an error is thrown if we > > don't know anything about whizzbang. > > Nice idea, but that sadly won't work well in practice. Because old PG > versions don't know about _pq_.dont_error_on_unknown_gucs. So that > would mean we'd have to wait another 5 years (and probably more) until > we could set non-_pq_-prefixed GUCs safely in the startup message, > using this approach. Hmm. I guess that is a problem. > Two side-notes: > 1. I realized there is a second benefit to using _pq_ for all GUCs > that change the protocol level that I didn't mention in my previous > email: It allows clients to assume that _pq_ prefixed GUCs will change > the wire-protocol and that they should not allow a user of the client > to set them willy-nilly. I'll go into this benefit more in the rest of > this email. > 2. To clarify: I'm suggesting that a startup packet containing > _pq_.whizzbang would actually set the _pq_.whizzbang GUC, and not the > whizzbang GUC. i.e. the _pq_ prefix is not stripped-off when parsing > the startup packet. I really intended the _pq_ prefix as a way of taking something out of the GUC namespace, not as a part of the GUC namespace that users would see. And I'm reluctant to go back on that. If we want to make pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's something to that idea [insert caveats here]. But doesn't _pq_ look like something that was intended to be internal? That's certainly how I intended it. > > I want to be able to know the state of my protocol > > parameters by calling libpq functions that answer my questions > > definitively based on libpq's own internal state. libpq itself *must* > > know what compression I'm using on my connection; the server's answer > > may be different. 
> > I think that totally makes sense that libpq should be able to answer > those questions without contacting the server, and indeed some > introspection should be added for that. But being able to introspect > what the server thinks the setting is seems quite useful too. That > still doesn't solve the problem of poolers though. How about > introducing a new ParameterGet message type too (matching the proposed > ParameterSet), so that poolers can easily parse and intercept that > message type. Wouldn't libpq already know what value it last set? Or is this needed because it doesn't know what the other side's default is? > 2. By using PGC_PROTOCOL to indicate that those GUCs can only be > changed using ParameterSet Hmm, OK. I guess if the PGC_PROTOCOL flag makes it so that the GUC can only be set using ParameterSet, and it also makes them non-transactional, then it's fine. So to be clear, I can't set these in postgresql.conf, or postgresql.auto.conf, or via ALTER $ANYTHING, or via SET, or in any other way than by sending P
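The wire format for a ParameterSet (or ParameterGet) message has not been settled in this thread, so purely as an illustration, here is one hypothetical framing a pooler could validate and intercept. Everything in it, including the 'S' type byte, is invented for the sketch; it simply follows the usual v3 message shape of one type byte, a big-endian int32 length that counts itself, then NUL-terminated name and value strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Validate a hypothetical ParameterSet frame without a full protocol parser. */
static bool
parameter_set_valid(const unsigned char *msg, size_t msglen)
{
	uint32_t	len;
	size_t		payload;
	const unsigned char *nul1;
	const unsigned char *nul2;

	if (msglen < 5 || msg[0] != 'S')	/* type byte is an assumption */
		return false;

	len = ((uint32_t) msg[1] << 24) | ((uint32_t) msg[2] << 16) |
		((uint32_t) msg[3] << 8) | (uint32_t) msg[4];
	if ((size_t) len + 1 != msglen)	/* length counts itself, not the type byte */
		return false;

	payload = msglen - 5;
	nul1 = memchr(msg + 5, '\0', payload);	/* end of the parameter name */
	if (nul1 == NULL)
		return false;

	nul2 = memchr(nul1 + 1, '\0', (size_t) (msg + msglen - (nul1 + 1)));
	return nul2 == msg + msglen - 1;	/* value must end exactly at the frame end */
}
```

A framing like this is what would let a pooler treat the message as an opaque name/value pair, which is the interception property discussed above.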
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, Apr 4, 2024 at 1:10 PM Jelte Fennema-Nio wrote: > Attached is a rebased patchset We should keep talking about this, but I think we're far too close to the wire at this point to think about committing anything for v17 at this point. These are big changes, they haven't been thoroughly reviewed by anyone AFAICT, and we don't have consensus on what we ought to be doing. I know that's probably not what you want to hear, but realistically, I think that's the only reasonable decision at this point. -- Robert Haas EDB: http://www.enterprisedb.com
Re: meson vs windows perl
On 2024-04-05 Fr 08:25, Andrew Dunstan wrote: Here is an attempt to fix all that. It's ugly, but I think it's more principled. First, instead of getting the ldopts and then trying to filter out the ldflags and ccdlflags, it tells perl not to include those in the first place, by overriding a couple of routines in ExtUtils::Embed. And second, it's smarter about splitting what's left, so that it doesn't split on a space that's in a quoted item. The perl that's used to do that second bit is not pretty, but it has been tested on the system where the problem arose and apparently cures the problem. (No doubt some perl guru could improve it.) It also works on my Ubuntu system, so I don't think we'll be breaking anything (famous last words). Apparently I spoke too soon. Please ignore the above for now. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: IPC::Run::time[r|out] vs our TAP tests
On 2024-04-04 Th 17:24, Tom Lane wrote:
> TIL that IPC::Run::timer is not the same as IPC::Run::timeout. With a
> timer object you have to check $timer->is_expired to see if the timeout
> has elapsed, but with a timeout object you don't because it will throw a
> Perl exception upon timing out, probably killing your test program.
> It appears that a good chunk of our TAP codebase has not read this memo,
> because I see plenty of places that are checking is_expired in the naive
> belief that they'll still have control after a timeout has fired.

I started having a look at these. Here are the cases I found:

./src/bin/psql/t/010_tab_completion.pl: my $okay = ($out =~ $pattern && !$h->{timeout}->is_expired);
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql startup timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless $timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm: timeout is the IPC::Run::Timeout object whose is_expired method can be tested
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless $timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Utils.pm: if ($timeout->is_expired)
./src/test/recovery/t/021_row_visibility.pl: if ($psql_timeout->is_expired)
./src/test/recovery/t/032_relfilenode_reuse.pl: if ($psql_timeout->is_expired)

Those in Cluster.pm look correct - they are doing the run() in an eval block and testing for the is_expired setting in an exception block.
The other cases look more suspect. I'll take a closer look.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: WIP Incremental JSON Parser
On 2024-04-04 Th 15:54, Nathan Bossart wrote:
> On Thu, Apr 04, 2024 at 03:31:12PM -0400, Andrew Dunstan wrote:
>> Does the attached patch fix it for you?
>
> It clears up 2 of the 3 warnings for me:
>
> ../postgresql/src/common/jsonapi.c: In function ‘IsValidJsonNumber’:
> ../postgresql/src/common/jsonapi.c:2018:30: error: ‘dummy_lex.inc_state’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>  2018 |   if (lex->incremental && !lex->inc_state->is_last_chunk &&
>       |

Thanks, please try this instead.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 5f947dd618..44dbb7f7f9 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -279,6 +279,8 @@ IsValidJsonNumber(const char *str, int len)
 		return false;
 
 	dummy_lex.incremental = false;
+	dummy_lex.inc_state = NULL;
+	dummy_lex.pstack = NULL;
 
 	/*
 	 * json_lex_number expects a leading '-' to have been eaten already.
@@ -297,6 +299,8 @@ IsValidJsonNumber(const char *str, int len)
 		dummy_lex.input_length = len;
 	}
 
+	dummy_lex.token_start = dummy_lex.input;
+
 	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
 
 	return (!numeric_error) && (total_len == dummy_lex.input_length);
@@ -2018,6 +2022,9 @@ json_lex_number(JsonLexContext *lex, char *s,
 	{
 		appendBinaryStringInfo(&lex->inc_state->partial_token,
 							   lex->token_start, s - lex->token_start);
+		if (num_err != NULL)
+			*num_err = error;
+
 		return JSON_INCOMPLETE;
 	}
 	else if (num_err != NULL)
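As a side note, an alternative to assigning the fields one by one, sketched here with an illustrative struct rather than the real JsonLexContext: zero-initializing the dummy lexer makes every pointer NULL and every flag false in one go, so a later refactor cannot reintroduce a read of an unset field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the lexer fields the warnings are about */
typedef struct DemoLexContext
{
	bool		incremental;
	void	   *inc_state;		/* only meaningful when incremental */
	void	   *pstack;
	const char *token_start;
} DemoLexContext;

static bool
demo_lex_starts_clean(void)
{
	DemoLexContext dummy = {0};	/* all fields zero/NULL at once */

	return !dummy.incremental &&
		dummy.inc_state == NULL &&
		dummy.pstack == NULL &&
		dummy.token_start == NULL;
}
```

Whether that is preferable to explicit per-field assignments is a style call; the explicit form documents exactly which fields json_lex_number() may touch.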
Re: Add bump memory context type and use it for tuplesorts
Matthias van de Meent writes:
> On Thu, 4 Apr 2024 at 22:43, Tom Lane wrote:
>> The only objection I can think of is that perhaps this would slow
>> things down a tad by requiring more complicated shifting/masking.
>> I wonder if we could redo the performance checks that were done
>> on the way to accepting the current design.

> I didn't do very extensive testing, but the light performance tests
> that I did with the palloc performance benchmark patch & script shared
> above didn't show an observable negative effect.

OK. I did not read the patch very closely, but at least in principle I have no further objections. David, are you planning to take point on getting this in?

regards, tom lane
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > Thinking more on this, it doesn't seem related to > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change > any locking or something like that which impacts write positions. Agree. > I think what has happened here is that running_xact record written by > the background writer [1] is not written to the kernel or disk (see > LogStandbySnapshot()), before pg_current_wal_lsn() checks the > current_lsn to be compared with replayed LSN. Agree, I think it's not visible through pg_current_wal_lsn() yet. Also I think that the DEBUG message in LogCurrentRunningXacts() " elog(DEBUG2, "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)", CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, LSN_FORMAT_ARGS(recptr), CurrRunningXacts->oldestRunningXid, CurrRunningXacts->latestCompletedXid, CurrRunningXacts->nextXid); " should be located after the XLogSetAsyncXactLSN() call. 
Indeed, the new LSN is visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is released, see:

\watch on Session 1 provides:

 pg_current_wal_lsn
--------------------
 0/87D110
(1 row)

Until:

Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579
2579	XLogRecPtr	WriteRqstPtr = asyncXactLSN;
(gdb) n
2581	bool	wakeup = false;
(gdb)
2584	SpinLockAcquire(&XLogCtl->info_lck);
(gdb)
2585	RefreshXLogWriteResult(LogwrtResult);
(gdb)
2586	sleeping = XLogCtl->WalWriterSleeping;
(gdb)
2587	prevAsyncXactLSN = XLogCtl->asyncXactLSN;
(gdb)
2588	if (XLogCtl->asyncXactLSN < asyncXactLSN)
(gdb)
2589	XLogCtl->asyncXactLSN = asyncXactLSN;
(gdb)
2590	SpinLockRelease(&XLogCtl->info_lck);
(gdb) p/x (uint32) XLogCtl->asyncXactLSN
$1 = 0x87d148

Then session 1 provides:

 pg_current_wal_lsn
--------------------
 0/87D148
(1 row)

So, when we see in the log:

2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740)
2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/360' <= replay_lsn AND state = 'streaming'

It's indeed possible that the new LSN was not visible yet (spinlock not released?) before the query began (because we can not rely on the time the DEBUG message has been emitted).

> Note that the reason why walsender has picked the running_xact written
> by background writer is because it has checked after
> pg_current_wal_lsn() query, see LOGs [2]. I think we can probably try
> to reproduce manually via debugger.
>
> If this theory is correct

I think it is.

> then I think we will need to use injection points to control the
> behavior of bgwriter or use the slots created via SQL API for syncing
> in tests.
>
> Thoughts?

I think that maybe as a first step we should move the "elog(DEBUG2," message as proposed above to help debugging (that could help to confirm the above theory).
If the theory is proven then I'm not sure we need the extra complexity of injection point here, maybe just relying on the slots created via SQL API could be enough. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
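The visibility theory can be modeled in miniature as follows. A pthread mutex stands in for the SpinLockAcquire/SpinLockRelease pair on XLogCtl->info_lck, and the constants mirror the 0/87D110 -> 0/87D148 progression from the gdb session; this is a toy model of the rule being discussed, not the xlog.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * The new asyncXactLSN only becomes observable to a reader (such as
 * pg_current_wal_lsn()) once the writer has released the lock, and a
 * smaller LSN never moves the stored value backwards.
 */
static pthread_mutex_t info_lck = PTHREAD_MUTEX_INITIALIZER;
static uint64_t async_xact_lsn = 0x87D110;

static void
set_async_xact_lsn(uint64_t lsn)
{
	pthread_mutex_lock(&info_lck);	/* SpinLockAcquire(&XLogCtl->info_lck) */
	if (async_xact_lsn < lsn)
		async_xact_lsn = lsn;
	pthread_mutex_unlock(&info_lck);	/* readers may observe the LSN only now */
}

static uint64_t
read_async_xact_lsn(void)
{
	uint64_t	lsn;

	pthread_mutex_lock(&info_lck);
	lsn = async_xact_lsn;
	pthread_mutex_unlock(&info_lck);
	return lsn;
}
```

The race in the test then corresponds to a reader sampling the value in the window before the writer's release: it legitimately sees the old LSN, which is exactly why the DEBUG message timing cannot be used as evidence of visibility.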
Re: Psql meta-command conninfo+
On 04.04.24 18:15, Maiquel Grassi wrote:
> Well, I can revert \conninfo to its original state and keep \conninfo+ as it is,
> perhaps removing the application name, as I believe everything else is important
> for a DBA/SysAdmin. Do you think we can proceed with the patch this way? I am
> open to ideas that make \conninfo not limited to just four or five basic pieces
> of information and easily bring everything else to the screen.

The original \conninfo was designed to report values from the libpq API about what libpq connected to. And the convention in psql is that "+" provides more or less the same information, but a bit more. So I think it is wrong to make "\conninfo+" something fundamentally different from "\conninfo".

And even more so if it contains information that isn't really "connection information". For example, current_user() doesn't make sense here, I think.

I mean, if you want to add a command \some_useful_status_information, we can talk about that, but let's leave \conninfo to what it does.

But I think there are better ways to implement this kind of server-side status, for example, with a system view.

One problem with this patch is that it has no tests. Any change in any of the involved functions or views will silently break it. (Note that plain \conninfo doesn't have this problem to this extent, because it only relies on libpq function calls. Also, a system view would not have this problem.)
Re: documentation structure
On Fri, Mar 29, 2024 at 9:40 AM Robert Haas wrote: > 2. Demote "Monitoring Disk Usage" from a chapter on its own to a > section of the "Monitoring Database Activity" chapter. I haven't seen > any objections to this, and I'd like to move ahead with it. > > 3. Merge the separate chapters on various built-in index AMs into one. > Peter didn't think this was a good idea, but Tom and Alvaro's comments > focused on how to do it mechanically, and on whether the chapters > needed to be reordered afterwards, which I took to mean that they were > OK with the basic concept. David Johnston was also clearly in favor of > it. So I'd like to move ahead with this one, too. I committed these two patches. > 4. Consolidate the "Generic WAL Records" and "Custom WAL Resource > Managers" chapters, which cover related topics, into a single one. I > didn't see anyone object to this, but David Johnston pointed out that > the patch I posted was a few bricks short of a load, because it really > needed to put some introductory text into the new chapter. I'll study > this a bit more and propose a new patch that does the same thing a bit > more carefully than my previous version did. Here is a new version of this patch. I think this is v18 material at this point, absent an outcry to the contrary. Sometimes we're flexible about doc patches. -- Robert Haas EDB: http://www.enterprisedb.com v3-0001-docs-Consolidate-into-new-WAL-for-Extensions-chap.patch Description: Binary data
Re: Synchronizing slots from primary to standby
Hi,

On Fri, Apr 05, 2024 at 02:35:42PM +, Bertrand Drouvot wrote:
> I think that maybe as a first step we should move the "elog(DEBUG2," message as
> proposed above to help debugging (that could help to confirm the above theory).

If you agree and think that makes sense, please find attached a tiny patch doing so.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From a414e81a4c5c88963b2412d1641f3de1262c15c0 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Fri, 5 Apr 2024 14:49:51 +
Subject: [PATCH v1] Move DEBUG message in LogCurrentRunningXacts()

Indeed, the new LSN is visible to other sessions (say, through
pg_current_wal_lsn()) only after the spinlock (XLogCtl->info_lck) in
XLogSetAsyncXactLSN() is released. So moving the DEBUG message after the
XLogSetAsyncXactLSN() call seems more appropriate for debugging purposes.
---
 src/backend/storage/ipc/standby.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)
 100.0% src/backend/storage/ipc/

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 87b04e51b3..ba549acf50 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -1366,6 +1366,17 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 	recptr = XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS);

+	/*
+	 * Ensure running_xacts information is synced to disk not too far in the
+	 * future. We don't want to stall anything though (i.e. use XLogFlush()),
+	 * so we let the wal writer do it during normal operation.
+	 * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced
+	 * and nudge the WALWriter into action if sleeping. Check
+	 * XLogBackgroundFlush() for details why a record might not be flushed
+	 * without it.
+	 */
+	XLogSetAsyncXactLSN(recptr);
+
 	if (CurrRunningXacts->subxid_overflow)
 		elog(DEBUG2,
 			 "snapshot of %d running transactions overflowed (lsn %X/%X oldest xid %u latest complete %u next xid %u)",
@@ -1383,17 +1394,6 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 			 CurrRunningXacts->latestCompletedXid,
 			 CurrRunningXacts->nextXid);

-	/*
-	 * Ensure running_xacts information is synced to disk not too far in the
-	 * future. We don't want to stall anything though (i.e. use XLogFlush()),
-	 * so we let the wal writer do it during normal operation.
-	 * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced
-	 * and nudge the WALWriter into action if sleeping. Check
-	 * XLogBackgroundFlush() for details why a record might not be flushed
-	 * without it.
-	 */
-	XLogSetAsyncXactLSN(recptr);
-
 	return recptr;
 }
--
2.34.1
Re: Speed up clean meson builds by ~25%
Hi, On 2024-04-05 15:36:34 +0200, Jelte Fennema-Nio wrote: > On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio wrote: > > It improved clean build times on my machine (10 cores/20 threads) from ~40 > > seconds to ~30 seconds. > > After discussing this off-list with Bilal, I realized that this gain > is only happening for clang builds on my system. Because those take a > long time as was also recently discussed in[1]. My builds don't take > nearly as long though. I tried with clang 15 through 18 and they all > took 10-22 seconds to run and clang comes from apt.llvm.org on Ubuntu > 22.04 > > [1]: > https://www.postgresql.org/message-id/flat/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com I recommend opening a bug report for clang, best with an already preprocessed input file. We're going to need to do something about this from our side as well, I suspect. The times aren't great with gcc either, even if not as bad as with clang. Greetings, Andres Freund
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 16:04, Robert Haas wrote: > > On Thu, Apr 4, 2024 at 1:10 PM Jelte Fennema-Nio wrote: > > Attached is a rebased patchset > > We should keep talking about this, but I think we're far too close to > the wire at this point to think about committing anything for v17 at > this point. These are big changes, they haven't been thoroughly > reviewed by anyone AFAICT, and we don't have consensus on what we > ought to be doing. I know that's probably not what you want to hear, > but realistically, I think that's the only reasonable decision at this > point. Agreed on not considering this for PG17 nor this commitfest anymore. I changed the commit fest entry accordingly.
Re: Streaming read-ready sequential scan code
On Thu, Apr 4, 2024 at 12:39 PM Andres Freund wrote: > > On 2024-04-04 22:37:39 +1300, Thomas Munro wrote: > > On Thu, Apr 4, 2024 at 10:31 PM Thomas Munro wrote: > > > Alright what about this? > > I think it's probably worth adding a bit more of the commit message to the > function comment. Yes, there's a bit in one of the return branches, but that's > not what you're going to look at when just checking what the function does. Agreed about the comment. I kept thinking that BAS_BULKREAD should maybe return nbuffers - 1, but I couldn't convince myself why. Otherwise v2-0001-Allow-BufferAccessStrategy-to-limit-pin-count LGTM. - Melanie
Re: Popcount optimization using AVX512
On Fri, Apr 05, 2024 at 07:58:44AM -0500, Nathan Bossart wrote:
> On Fri, Apr 05, 2024 at 10:33:27AM +0300, Ants Aasma wrote:
>> The main issue I saw was that clang was able to peel off the first
>> iteration of the loop and then eliminate the mask assignment and
>> replace masked load with a memory operand for vpopcnt. I was not able
>> to convince gcc to do that regardless of optimization options.
>> Generated code for the inner loop:
>>
>> clang:
>> :
>>   50: add      rdx, 64
>>   54: cmp      rdx, rdi
>>   57: jae
>>   59: vpopcntq zmm1, zmmword ptr [rdx]
>>   5f: vpaddq   zmm0, zmm1, zmm0
>>   65: jmp
>>
>> gcc:
>> :
>>   38: kmovq    k1, rdx
>>   3d: vmovdqu8 zmm0 {k1} {z}, zmmword ptr [rax]
>>   43: add      rax, 64
>>   47: mov      rdx, -1
>>   4e: vpopcntq zmm0, zmm0
>>   54: vpaddq   zmm0, zmm0, zmm1
>>   5a: vmovdqa64 zmm1, zmm0
>>   60: cmp      rax, rsi
>>   63: jb
>>
>> I'm not sure how much that matters in practice. Attached is a patch to
>> do this manually giving essentially the same result in gcc. As most
>> distro packages are built using gcc I think it would make sense to
>> have the extra code if it gives a noticeable benefit for large cases.
>
> Yeah, I did see this, but I also wasn't sure if it was worth further
> complicating the code. I can test with and without your fix and see if it
> makes any difference in the benchmarks.

This seems to provide a small performance boost, so I've incorporated it into v27.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

>From 9fc4b7556b72d51fce676db84b446099767efff3 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 27 Mar 2024 16:39:24 -0500
Subject: [PATCH v27 1/2] AVX512 popcount support

---
 config/c-compiler.m4                 |  58 ++
 configure                            | 252 +++
 configure.ac                         |  51 ++
 meson.build                          |  87 +
 src/Makefile.global.in               |   5 +
 src/include/pg_config.h.in           |  12 ++
 src/include/port/pg_bitutils.h       |  11 ++
 src/makefiles/meson.build            |   4 +-
 src/port/Makefile                    |  11 ++
 src/port/meson.build                 |   6 +-
 src/port/pg_bitutils.c               |   5 +
 src/port/pg_popcount_avx512.c        |  82 +
 src/port/pg_popcount_avx512_choose.c |  81 +
 src/test/regress/expected/bit.out    |  24 +++
 src/test/regress/sql/bit.sql         |   4 +
 15 files changed, 690 insertions(+), 3 deletions(-)
 create mode 100644 src/port/pg_popcount_avx512.c
 create mode 100644 src/port/pg_popcount_avx512_choose.c

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3268a780bb..892b3c9580 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -694,3 +694,61 @@ if test x"$Ac_cachevar" = x"yes"; then
 fi
 undefine([Ac_cachevar])dnl
 ])# PGAC_LOONGARCH_CRC32C_INTRINSICS
+
+# PGAC_XSAVE_INTRINSICS
+# -
+# Check if the compiler supports the XSAVE instructions using the _xgetbv
+# intrinsic function.
+#
+# An optional compiler flag can be passed as argument (e.g., -mxsave). If the
+# intrinsic is supported, sets pgac_xsave_intrinsics and CFLAGS_XSAVE.
+AC_DEFUN([PGAC_XSAVE_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _xgetbv with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
+  [return _xgetbv(0) & 0xe0;])],
+  [Ac_cachevar=yes],
+  [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+  CFLAGS_XSAVE="$1"
+  pgac_xsave_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_XSAVE_INTRINSICS
+
+# PGAC_AVX512_POPCNT_INTRINSICS
+# -
+# Check if the compiler supports the AVX512 POPCNT instructions using the
+# _mm512_setzero_si512, _mm512_maskz_loadu_epi8, _mm512_popcnt_epi64,
+# _mm512_add_epi64, and _mm512_reduce_add_epi64 intrinsic functions.
+#
+# Optional compiler flags can be passed as argument (e.g., -mavx512vpopcntdq
+# -mavx512bw). If the intrinsics are supported, sets
+# pgac_avx512_popcnt_intrinsics and CFLAGS_POPCNT.
+AC_DEFUN([PGAC_AVX512_POPCNT_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _mm512_popcnt_epi64 with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
+  [const char buf@<:@sizeof(__m512i)@:>@;
+   PG_INT64_TYPE popcnt = 0;
+   __m512i accum = _mm512_setzero_si512();
+   const __m512i val = _mm512_maskz_loadu_epi8((__mmask64) 0xf0f0f0f0f0f0f0f0, (const __m512i *) buf);
+   const __m512i cnt = _mm512_popcnt_epi64(val);
+   accum = _mm512_add_epi64(accum, cnt);
+   popcnt = _mm512_reduce_add_epi64(accum);
+   /* return computed va
Re: WIP Incremental JSON Parser
On Fri, Apr 05, 2024 at 10:15:45AM -0400, Andrew Dunstan wrote: > On 2024-04-04 Th 15:54, Nathan Bossart wrote: >> On Thu, Apr 04, 2024 at 03:31:12PM -0400, Andrew Dunstan wrote: >> > Does the attached patch fix it for you? >> It clears up 2 of the 3 warnings for me: >> >> ../postgresql/src/common/jsonapi.c: In function ‘IsValidJsonNumber’: >> ../postgresql/src/common/jsonapi.c:2018:30: error: ‘dummy_lex.inc_state’ may >> be used uninitialized in this function [-Werror=maybe-uninitialized] >> 2018 | if (lex->incremental && !lex->inc_state->is_last_chunk && >>| > > Thanks, please try this instead. LGTM, thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Table AM Interface Enhancements
Hi, hackers!

On Tue, 2 Apr 2024 at 19:17, Jeff Davis wrote:
> On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote:
> > I don't like the idea that every custom table AM reloptions should
> > begin with StdRdOptions. I would rather introduce a new data
> > structure with table options, which need to be accessed outside of
> > table AM. Then reloptions will be a black box only directly used in
> > table AM, while the table AM has freedom over what to store in reloptions
> > and how to calculate externally-visible options. What do you think?
>
> Hi Alexander!
>
> I agree with all of that. It will take some refactoring to get there,
> though.
>
> One idea is to store StdRdOptions like normal, but if an unrecognized
> option is found, ask the table AM if it understands the option. In that
> case I think we'd just use a different field in pg_class so that it can
> use whatever format it wants to represent its options.
>
> Regards,
> Jeff Davis

I tried to rework the patch regarding table AMs according to the input from Alexander and Jeff. It splits table reloptions into two categories:

- common for all tables (stored in a fixed-size structure and accessible from outside)
- table-AM-specific (variable size, parsed and accessed by the access method only)

Please find a patch attached.

v9-0001-Custom-reloptions-for-table-AM.patch
Description: Binary data
Re: documentation structure
On Mon, Mar 25, 2024 at 11:40 AM Peter Eisentraut wrote:
> I think a possible problem we need to consider with these proposals to
> combine chapters is that they could make the chapters themselves too
> deep and harder to navigate.

I looked into various options for further combining chapters and/or appendixes and found that this is indeed a huge problem. For example, I had thought of creating a Developer Information chapter in the appendix and moving various existing chapters and appendixes inside of it, but that means that the top-level sections in those chapters get demoted one level, and what used to be a whole chapter or appendix becomes a top-level section. And since you get one HTML page per top-level section, that means that instead of a bunch of individual HTML pages of very pleasant length, you suddenly get one very long HTML page that is, exactly as you say, hard to navigate.

> The rendering can be adjusted to some degree, but then we also need to
> make sure any new chunking makes sense in other chapters. (And it might
> also change a bunch of externally known HTML links.)

I looked into this and I'm unclear how much customization is possible. I gather that the current scheme comes from having chunk.section.depth of 1, and I guess you can change that to 2 to get an HTML page per second-level section, but it seems like it would take a LOT of restructuring to make that work. It would be much easier if you could vary this across different parts of the documentation; for instance, if you could say, well, in this particular chapter or appendix I want chunk.section.depth of 2, but elsewhere 1, that would be quite handy. But after several hours reading various things about DocBook on the Internet, I was still unable to determine conclusively whether this is possible. There's an interesting comment in stylesheet-speedup-xhtml.xsl that says "Since we set a fixed $chunk.section.depth, we can do away with a bunch of complicated XPath searches for the previous and next sections at various levels."
That sounds like it's suggesting that it is in fact possible for this setting to vary, but I don't know if that's true, or how to do it, and it sounds like there might be performance consequences, too.

> I think maybe more could also be done at the top-level structure, too.
> Right now, we have part -> chapter -> sect1. We could add another
> level on top of that.

Does this let you create structures of non-uniform depth? i.e. is there a way that we can group some chapters into sets while leaving others as standalone chapters, or somesuch?

I'm not 100% confident that non-uniform depth (either via sets or via chunk.section.depth or via some other mechanism) is a good idea. There's a sort of uniformity to our navigation right now that does have some real appeal. The downside, though, is that if you want something to be a single HTML page, it's got to either be a chapter (or appendix) by itself with no sections inside of it, or it's got to be a top-level section inside of a chapter, and so anything that's long enough that it should be an HTML page by itself can never be more than one level below the index. And that seems to make it quite difficult to keep the index small.

Without some kind of variable-depth structure, the only other ways that I can see to improve things are:

1. Make chunk.section.depth 2 and restructure the entire documentation until the results look reasonable. This might be possible but I bet it's difficult. We have, at present, chapters of *wildly* varying length, from a few sentences to many, many pages. That is perhaps a bad thing; you most likely wouldn't do that in a printed book. But fixing it is a huge project. We don't necessarily have the same amount of content about each topic, and there isn't necessarily a way of grouping related topics together that produces units of relatively uniform length.
I think it's sensible to try to make improvements where we can, by pushing stuff down that's short and not that important, but finding our way to a chunk.section.depth=2 world that feels good to most people compared to what we have today seems like it's going to be challenging.

2. Replace the current index with a custom index or landing page of some kind. Or keep the current index and add a new landing page alongside it. Something that isn't derived automatically from the documentation structure but is created by hand.

--
Robert Haas
EDB: http://www.enterprisedb.com
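For context, the chunk.section.depth knob under discussion is normally set once, globally, in a DocBook XSL customization layer, roughly like this. This is a sketch assuming the standard DocBook XSL chunking stylesheets; whether the value can vary per chapter is exactly the open question above:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"/>
  <!-- One HTML page per sect1 (depth 1); raising this to 2 would chunk
       at the sect2 level across the entire document, not per chapter. -->
  <xsl:param name="chunk.section.depth">1</xsl:param>
</xsl:stylesheet>
```

Because it is a single stylesheet-wide parameter, any per-chapter variation would have to come from custom templates rather than from the parameter itself.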
Re: broken JIT support on Fedora 40
> On Fri, Apr 05, 2024 at 03:50:50PM +0200, Dmitry Dolgov wrote:
> > On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote:
> > > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
> > > > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro wrote:
> > > > > https://github.com/llvm/llvm-project/pull/87093
> > > >
> > > > Oh, with those clues, I think I might see... It is a bit strange that
> > > > we copy attributes from AttributeTemplate(), a function that returns
> > > > Datum, to our void deform function. It works (I mean doesn't crash)
> > > > if you just comment this line out:
> > > >
> > > > llvm_copy_attributes(AttributeTemplate, v_deform_fn);
> > > >
> > > > ... but I guess that disables inlining of the deform function? So
> > > > perhaps we just need to teach that thing not to try to copy the return
> > > > value's attributes, which also seems to work here:
> > >
> > > Yep, I think this is it. I've spent some hours trying to understand why
> > > suddenly the deform function has a noundef ret attribute, when it
> > > shouldn't -- this explains it, and the proposed change fixes the crash.
> > > One thing that is still not clear to me though is why this copied
> > > attribute doesn't show up in the bitcode dumped right before running
> > > the inline pass (I've added this to troubleshoot the issue).
> >
> > One thing to consider in this context is indeed adding the "verify" pass
> > as suggested in the PR, at least for the debugging configuration. Without
> > the fix it immediately returns:
> >
> >     Running analysis: VerifierAnalysis on deform_0_1
> >     Attribute 'noundef' applied to incompatible type!
> >
> >     llvm error: Broken function found, compilation aborted!

Here is what I have in mind. Interestingly enough, it also shows a few more errors besides "noundef":

    Intrinsic name not mangled correctly for type arguments! Should be: llvm.lifetime.end.p0
    ptr @llvm.lifetime.end.p0i8

It refers to the function from create_LifetimeEnd; I'm not sure how important this is.
>From 7a05bd48a4270998a08e63e07cdf1cb9932bfddd Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Fri, 5 Apr 2024 17:52:06 +0200 Subject: [PATCH v1] Add jit_verify_bitcode When passing LLVM IR through optimizer, for troubleshooting purposes it's useful to apply the "verify" pass as well. It can catch an invalid IR, if such was produced by mistake, and output a meaningful message about the error, e.g.: Attribute 'noundef' applied to incompatible type! ptr @deform_0_1 Control this via jit_verify_bitcode development guc. --- doc/src/sgml/config.sgml| 19 +++ src/backend/jit/jit.c | 1 + src/backend/jit/llvm/llvmjit.c | 11 --- src/backend/utils/misc/guc_tables.c | 11 +++ src/include/jit/jit.h | 2 +- 5 files changed, 40 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 624518e0b01..5f07ae099f4 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -11931,6 +11931,25 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1) + + jit_verify_bitcode (boolean) + + jit_verify_bitcode configuration parameter + + + + +Adds a verify pass to the pipeline, when optimizing +generated LLVM IR. It helps to identify +cases when an invalid IR was generated due to errors, and is only +useful for working on the internals of the JIT implementation. +The default setting is off. +Only superusers and users with the appropriate SET +privilege can change this setting. 
+ + + + jit_expressions (boolean) diff --git a/src/backend/jit/jit.c b/src/backend/jit/jit.c index 815b58f33c5..ad94e4b55ee 100644 --- a/src/backend/jit/jit.c +++ b/src/backend/jit/jit.c @@ -39,6 +39,7 @@ bool jit_tuple_deforming = true; double jit_above_cost = 10; double jit_inline_above_cost = 50; double jit_optimize_above_cost = 50; +bool jit_verify_bitcode = false; static JitProviderCallbacks provider; static bool provider_successfully_loaded = false; diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index ec0fdd49324..a4b21abb83a 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -698,12 +698,17 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module) #else LLVMPassBuilderOptionsRef options; LLVMErrorRef err; - const char *passes; + const char *passes, *intermediate_passes; if (context->base.flags & PGJIT_OPT3) - passes = "default"; + intermediate_passes = "default"; else - passes = "default,mem2reg"; + intermediate_passes = "default,mem2reg"; + + if (jit_verify_bitcode) + passes = psprintf("verify,%s", intermediate_passes); + else + passes = intermediate_passes; options = LLVMCreatePassBuilderOptions(); diff --git a/src/backend/utils/misc/guc_table
Re: Fixing backslash dot for COPY FROM...CSV
"Daniel Verite" writes: > Tom Lane wrote: >> I've looked over this patch and I generally agree that this is a >> reasonable solution. > Thanks for reviewing this! While testing this, I tried running the tests with an updated server and non-updated psql, and not only did the new test case fail, but so did a bunch of existing ones. That's because existing psql will send the trailing "\." of inlined data to the server, and the updated server will now think that's data if it's in CSV mode. So this means that the patch introduces a rather serious cross-version compatibility problem. I doubt we can consider inlined CSV data to be a niche case that few people use; but it will fail every time if your psql is older than your server. Not sure what to do here. One idea is to install just the psql-side fix, which should break nothing now that version-2 protocol is dead, and then wait a few years before introducing the server-side change. That seems kind of sad though. An argument for not waiting is that psql may not be the only client that needs this behavioral adjustment, and if so there's going to be breakage anyway when we change the server; so we might as well get it over with. regards, tom lane
Re: Psql meta-command conninfo+
> The original \conninfo was designed to report values from the libpq API
> about what libpq connected to. And the convention in psql is that "+"
> provide more or less the same information but a bit more. So I think it
> is wrong to make "\conninfo+" something fundamentally different than
> "\conninfo".

That is a fair argument. Looking at this a bit more, it looks like we can enhance the GSSAPI output of \conninfo to get the GSSAPI fields that \conninfo+ was aiming for. Right now it just shows:

    printf(_("GSSAPI-encrypted connection\n"));

but we can use libpq to get the other GSSAPI attributes in the output.

> And even more so if it contains information that isn't really
> "connection information". For example, current_user() doesn't make
> sense here, I think.
>
> I mean, if you want to add a command \some_useful_status_information, we
> can talk about that, but let's leave \conninfo to what it does.
>
> But I think there are better ways to implement this kind of server-side
> status, for example, with a system view.

What about a \sessioninfo command that shows all the information for the current session: PID, app_name, current_user, session_user, system_user, client_address, client_port, etc.? These will be the fields that we cannot gather if the connection is broken for whatever reason. The distinction here is that \sessioninfo shows the details after the connection is established, which could possibly change during the connection.

Regards,

Sami
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 16:02, Robert Haas wrote: > Often? > > I kind of hope that the protocol starts to evolve a bit more than it > has, but I don't want a continuous stream of changes. That will be > very hard to test and verify correctness, and a hassle for drivers to > keep up with, and a mess for compatibility. I definitely think protocol changes require a lot more scrutiny than many other things, given their hard/impossible to change nature. But I do think that we shouldn't be at all averse to the act of bumping the protocol version itself. If we have a single small protocol change in one release, then imho it's no problem to bump the protocol version. Bumping the version should be painless. So we shouldn't be inclined to push an already agreed upon protocol change to the next release, because there are some more protocol changes in the pipeline that won't make it in the current one. I don't think this would be any harder for drivers to keep up with, then if we'd bulk all changes together. If driver developers only want to support changes in bulk changes, they can simply choose not to support 3.1 at all if they want and wait until 3.2 to support everything in bulk then. > > So, what approach of checking feature support are you envisioning > > instead? A function for every feature? > > Something like SupportsSetProtocolParameter, that returns an error > > message if it's not supported and NULL otherwise. And then add such a > > support function for every feature? > > Yeah, something like that. > ... > > > I'm also not sure why you're saying a user is not entitled to this > > information. We already provide users of libpq a way to see the full > > Postgres version, and the major protocol version. I think allowing a > > user to access this information is only a good thing. But I agree that > > providing easy to use feature support functions is a better user > > experience in some cases. > > I guess that's a fair point. 
But I'm worried that if we expose too > much of the internals, we won't be able to change things later. I'll take a look at redesigning the protocol parameter stuff. To work with dedicated functions instead. > I really intended the _pq_ prefix as a way of taking something out of > the GUC namespace, not as a part of the GUC namespace that users would > see. And I'm reluctant to go back on that. If we want to make > pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's > something to that idea [insert caveats here]. But doesn't _pq_ look > like something that was intended to be internal? That's certainly how > I intended it. I agree that _pq_ does look internal and doesn't clearly indicate that it's a protocol related change. But sadly the _pq_ prefix is the only one that doesn't error in startup packets, waiting another 5 years until pg_protocol is allowed in the startup packet doesn't seem like a reasonable solution either. How about naming the GUCs pg_protocol.${NAME}, but still requiring the _pq_ prefix in the StartupPacket. That way only client libraries would have to see this internal prefix and they could remap it someway. I see two options for that: 1. At the server replace the _pq_ prefix with pg_protocol. So _pq_.${NAME} would map to pg_protocol.${name} 2. At the server replace the _pq_.pg_protocol prefix with pg_protocol. So _pq_.pg_protocol.${NAME} would map to pg_protocol.${name}. I guess you prefer option 2, because that would still leave lots of space to do something with the rest of the _pq_ space, i.e. _pq_.magic_pixie_dust can still be used for something different than a GUC. Bikeshedding: I think I prefer protocol.${NAME} over pg_protocol.${NAME}, it's shorter and it seems obvious that protocol is the postgres protocol in this context. This should be a fairly simple change to make. > Wouldn't libpq already know what value it last set? Or is this needed > because it doesn't know what the other side's default is? 
libpq could/should indeed know this, but for debugging/testing purposes it is quite useful to have a facility to read the server side value. I think defaults should always be whatever was happening if the parameter wasn't specified before, so knowing the server default is not something the client needs to worry about (i.e. the default is defined as part of the protocol spec). > Hmm, OK. I guess if the PGC_PROTOCOL flag makes it so that the GUC can > only be set using ParameterSet, and it also makes them > non-transactional, then it's fine. So to be clear, I can't set these > in postgresql.conf, or postgresql.auto.conf, or via ALTER $ANYTHING, > or via SET, or in any other way than by sending ParameterStatus > messages. And when I send a ParameterStatus message, it doesn't matter > if I'm in a good transaction, an aborted transaction, or no > transaction at all, and the setting change takes effect regardless of > that and regardless of any subsequent rollbacks. Is that right? > > I feel like maybe it's not, because you seem to be thin
Re: AIX support
> What you do need to do to reproduce the described problems is
> check out the Postgres git tree and rewind to just before
> commit 0b16bb877, where we deleted AIX support. Any attempt
> to restore AIX support would have to start with reverting that
> commit (and perhaps the followup f0827b443).
>
> regards, tom lane

Hi Team, thank you for all the info. We progressed to building the source on our nodes, and the build was successful with the below configuration:

    Postgres - github-bcdfa5f2e2f
    AIX      - 71c
    Xlc      - 13.1.0
    Bison    - 3.0.5

Going ahead, we want to build the changes that were removed as part of "0b16bb8776bb8", with the latest Xlc and gcc versions. We were building the source from the postgres ftp server (https://ftp.postgresql.org/pub/source/), and would like to understand whether there are any source-level differences between the ftp server and the source on GitHub.

Regards,
Sriram.

From: Tom Lane
Date: Friday, 29 March 2024 at 9:03 AM
To: Thomas Munro
Cc: Noah Misch, Sriram RK, Alvaro Herrera, pgsql-hack...@postgresql.org
Subject: Re: AIX support

Thomas Munro writes:
> Oh, sorry, I had missed the part where newer compilers fix the issue
> too. Old out-of-support versions of AIX running old compilers, what
> fun.

Indeed. One of the topics that needs investigation if you want to pursue this is which AIX system and compiler versions still deserve support, and which of the AIX hacks we had been carrying still need to be there based on that analysis. For context, we've been pruning support for extinct-in-the-wild OS versions pretty aggressively over the past couple of years, and I'd expect to apply the same standard to AIX.

regards, tom lane
Re: documentation structure
On Fri, Apr 5, 2024 at 9:01 AM Robert Haas wrote: > > > The rendering can be adjusted to some degree, but then we also need to > > make sure any new chunking makes sense in other chapters. (And it might > > also change a bunch of externally known HTML links.) > > I looked into this and I'm unclear how much customization is possible. > > Here is a link to my attempt at this a couple of years ago. It basically "abuses" refentry. https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com I never did dive into the man page or PDF dynamics of this particular change but it seemed to solve HTML pagination without negative consequences and with minimal risk of unintended consequences since only the markup on the pages we want to alter is changed, not global configuration. David J.
Re: documentation structure
On Fri, Apr 5, 2024 at 12:15 PM David G. Johnston wrote: > Here is a link to my attempt at this a couple of years ago. It basically > "abuses" refentry. > > https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com > > I never did dive into the man page or PDF dynamics of this particular change > but it seemed to solve HTML pagination without negative consequences and with > minimal risk of unintended consequences since only the markup on the pages we > want to alter is changed, not global configuration. Hmm, but it seems like that might have generated some man page entries that we don't want? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Speed up clean meson builds by ~25%
On Fri, 5 Apr 2024 at 17:24, Andres Freund wrote: > I recommend opening a bug report for clang, best with an already preprocessed > input file. > We're going to need to do something about this from our side as well, I > suspect. The times aren't great with gcc either, even if not as bad as with > clang. Agreed. While not a full solution, I think this patch is still good to address some of the pain: I was waiting 10 seconds at the end of my build with only 1 of my 10 cores doing anything. So while this doesn't decrease the CPU time spent, it does decrease wall-clock time significantly in some cases, with AFAICT no downsides.
Re: documentation structure
On Fri, Apr 5, 2024 at 9:18 AM Robert Haas wrote: > On Fri, Apr 5, 2024 at 12:15 PM David G. Johnston > wrote: > > Here is a link to my attempt at this a couple of years ago. It > basically "abuses" refentry. > > > > > https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com > > > > I never did dive into the man page or PDF dynamics of this particular > change but it seemed to solve HTML pagination without negative consequences > and with minimal risk of unintended consequences since only the markup on > the pages we want to alter is changed, not global configuration. > > Hmm, but it seems like that might have generated some man page entries > that we don't want? > If so (didn't check) maybe just remove them in post? David J.
Re: Fixing backslash dot for COPY FROM...CSV
I wrote: > So this means that the patch introduces a rather serious cross-version > compatibility problem. I doubt we can consider inlined CSV data to be > a niche case that few people use; but it will fail every time if your > psql is older than your server. On third thought, that may not be as bad as I was thinking. We don't blink at the idea that an older psql's \d commands may malfunction with a newer server, and I think most users have internalized the idea that they want psql >= server. If the patch created an incompatibility with that direction, it'd be a problem, but I don't think it does. regards, tom lane
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 12:09, Jelte Fennema-Nio wrote: > On Fri, 5 Apr 2024 at 16:02, Robert Haas wrote: > > Often? > > > > I kind of hope that the protocol starts to evolve a bit more than it > > has, but I don't want a continuous stream of changes. That will be > > very hard to test and verify correctness, and a hassle for drivers to > > keep up with, and a mess for compatibility. > > I definitely think protocol changes require a lot more scrutiny than > many other things, given their hard/impossible-to-change nature. > > But I do think that we shouldn't be at all averse to the act of > bumping the protocol version itself. If we have a single small > protocol change in one release, then imho it's no problem to bump the > protocol version. Bumping the version should be painless. So we > shouldn't be inclined to push an already agreed upon protocol change > to the next release, because there are some more protocol changes in > the pipeline that won't make it in the current one. > > I don't think this would be any harder for drivers to keep up with > than if we bundled all changes together. If driver developers only want > to support changes in bulk, they can simply choose not to > support 3.1 at all if they want and wait until 3.2 to support > everything in bulk then. > > > > So, what approach of checking feature support are you envisioning > > > instead? A function for every feature? > > > Something like SupportsSetProtocolParameter, that returns an error > > > message if it's not supported and NULL otherwise. And then add such a > > > support function for every feature? > > > > Yeah, something like that. > > ... > > > > > I'm also not sure why you're saying a user is not entitled to this > > > information. We already provide users of libpq a way to see the full > > > Postgres version, and the major protocol version. I think allowing a > > > user to access this information is only a good thing. 
But I agree that > > > providing easy to use feature support functions is a better user > > > experience in some cases. > > > > I guess that's a fair point. But I'm worried that if we expose too > > much of the internals, we won't be able to change things later. > > I'll take a look at redesigning the protocol parameter stuff to work > with dedicated functions instead. > +1 > > > I really intended the _pq_ prefix as a way of taking something out of > > the GUC namespace, not as a part of the GUC namespace that users would > > see. And I'm reluctant to go back on that. If we want to make > > pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's > > something to that idea [insert caveats here]. But doesn't _pq_ look > > like something that was intended to be internal? That's certainly how > > I intended it. > Is this actually used in practice? If so, how? > > I agree that _pq_ does look internal and doesn't clearly indicate that > it's a protocol-related change. But sadly the _pq_ prefix is the only > one that doesn't error in startup packets; waiting another 5 years > until pg_protocol is allowed in the startup packet doesn't seem like a > reasonable solution either. > > How about naming the GUCs pg_protocol.${NAME}, but still requiring the > _pq_ prefix in the StartupPacket? That way only client libraries would > have to see this internal prefix and they could remap it somehow. I > see two options for that: > 1. At the server replace the _pq_ prefix with pg_protocol. So > _pq_.${NAME} would map to pg_protocol.${name} > 2. At the server replace the _pq_.pg_protocol prefix with pg_protocol. > So _pq_.pg_protocol.${NAME} would map to pg_protocol.${name}. > > I guess you prefer option 2, because that would still leave lots of > space to do something with the rest of the _pq_ space, i.e. > _pq_.magic_pixie_dust can still be used for something different than a > GUC. 
> > Bikeshedding: I think I prefer protocol.${NAME} over > pg_protocol.${NAME}, it's shorter and it seems obvious that protocol > is the postgres protocol in this context. > > This should be a fairly simple change to make. > > > Wouldn't libpq already know what value it last set? Or is this needed > > because it doesn't know what the other side's default is? > > libpq could/should indeed know this, but for debugging/testing > purposes it is quite useful to have a facility to read the server side > value. I think defaults should always be whatever was happening if the > parameter wasn't specified before, so knowing the server default is > not something the client needs to worry about (i.e. the default is > defined as part of the protocol spec). > > > Hmm, OK. I guess if the PGC_PROTOCOL flag makes it so that the GUC can > > only be set using ParameterSet, and it also makes them > > non-transactional, then it's fine. So to be clear, I can't set these > > in postgresql.conf, or postgresql.auto.conf, or via ALTER $ANYTHING, > > or via SET, or in any other way than by sending ParameterStatus > > messages. And when I send a Paramete
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Thu, Apr 4, 2024 at 6:37 PM Michael Paquier wrote: > From where did you pull the LibreSSL sources? Directly from the > OpenBSD tree? I've been building LibreSSL Portable: https://github.com/libressl/portable > Ah, right. OpenSSL_add_all_algorithms() is documented as having no > effect in 1.1.0. I think it still has an effect, but it's unnecessary unless you need some specific initialization other than the defaults -- for which we now have OPENSSL_init_crypto(). Best I could tell, pgcrypto had no such requirements (but extra eyes on that would be appreciated). > I would be OK to draw a line to what we test in the buildfarm if it > comes to that, down to OpenBSD 6.9. That would correspond to LibreSSL 3.3 if I'm not mistaken. Any particular reason for 6.9 as the dividing line, and not something later? And by "test in the buildfarm", do you mean across all versions, or just what we support for PG17? (For the record, I don't think there's any reason to drop older LibreSSL testing for earlier branches.) > This version is already not > supported, and we had a number of issues with older versions and > timestamps going backwards. Speaking of which: for completeness I should note that LibreSSL 3.2 (OpenBSD 6.8) is failing src/test/ssl because of alternate error messages. That failure exists on HEAD and is not introduced in this patchset. LibreSSL 3.3, which passes, has the following changelog note: * If x509_verify() fails, ensure that the error is set on both the x509_verify_ctx() and its store context to make some failures visible from SSL_get_verify_result(). So that would explain that. If we drop support for 3.2 and earlier then there's nothing to be done anyway. > -/* Define to 1 if you have the `CRYPTO_lock' function. */ > -#undef HAVE_CRYPTO_LOCK > > I'm happy to see that gone for good. +1 > + # Functions introduced in OpenSSL 1.1.0/LibreSSL 2.7.0. 
> + ['OPENSSL_init_ssl', {'required': true}], > + ['BIO_meth_new', {'required': true}], > + ['ASN1_STRING_get0_data', {'required': true}], > + ['HMAC_CTX_new', {'required': true}], > + ['HMAC_CTX_free', {'required': true}], > > These should be removed to save cycles in ./configure and meson, no? > We don't have any more of their HAVE_* flags in the tree with this > patch applied. True, but they are required, and I needed something in there to reject earlier builds. Daniel's suggested approach with _VERSION_NUMBER should be able to replace this I think. > - cdata.set('OPENSSL_API_COMPAT', '0x10002000L', > + cdata.set('OPENSSL_API_COMPAT', '0x1010L', > > Seems to me that this should also document in meson.build why 1.1.0 is > chosen, same as ./configure. Good point. > It seems to me that src/common/protocol_openssl.c could be cleaned up; > I see SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version > listed in LibreSSL (looking at some past version of > https://github.com/libressl/libressl.git that I still have around). > > There is an extra thing in pg_strong_random.c once we cut OpenSSL < > 1.1.1.. Do we still need pg_strong_random_init() and its RAND_poll() > when it comes to LibreSSL? This is a sensitive area, so we should be > careful. It would be cool if there are more cleanups that can happen. I agree we need to be careful around removal, though, especially now that we know that LibreSSL testing is spotty... I will look more into these later today. Thanks, --Jacob
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Dave Cramer writes: > On Fri, 5 Apr 2024 at 12:09, Jelte Fennema-Nio wrote: >> Setting PGC_PROTOCOL gucs would be allowed in the startup packet, >> which is fine afaict because that's also something that's part of the >> protocol level and is thus fully controlled by client libraries and >> poolers) But other than that: Yes, conf files, ALTER, and SET cannot >> change these GUCs. > +1 I don't buy that argument, actually. libpq, and pretty much every other client AFAIK, has provisions to let higher code levels insert random options into the startup packet. So to make this work libpq would have to filter or at least inspect such options, which is logic that doesn't exist and doesn't seem nice to need. The other problem with adding these things in the startup packet is that when you send that packet, you don't know what the server version is and hence don't know if it will take these options. What's so bad about insisting that these options must be sent in a separate message? regards, tom lane
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, Apr 5, 2024 at 12:09 PM Jelte Fennema-Nio wrote: > But I do think that we shouldn't be at all averse to the act of > bumping the protocol version itself. If we have a single small > protocol change in one release, then imho it's no problem to bump the > protocol version. Bumping the version should be painless. So we > shouldn't be inclined to push an already agreed upon protocol change > to the next release, because there are some more protocol changes in > the pipeline that won't make it in the current one. I think I half-agree with this. Let's say we all agree that the world will end unless we make wire protocol changes A and B, and for whatever reason we also agree that these changes should be handled via a protocol version bump rather than any other method, but only the patch for A is sufficiently stable by the end of the release cycle. Then we commit A and bump the protocol version and the next release we do the same for B, hopefully before the world ends. In this sense this is just like CATALOG_VERSION_NO or XLOG_PAGE_MAGIC. We don't postpone commits because they'd require bumping those values; we just bump the values when it's necessary. But on the other hand, I find it a bit hard to believe that the statement "Bumping the version should be painless" will ever correspond to reality. Since we've not done a hard wire protocol break in a very long time, I assume that we would not want to start now. But that also means that when the PG version with protocol version 3.5 comes out, that server is going to have to be compatible with 3.4, 3.3, 3.2, 3.1, and 3.0. How will we test that it really is? We could test against libpqs from older server versions, but that quickly becomes awkward, because it means that there won't be tests that run as part of a regular build, but it'll have to all be done in the buildfarm or CI with something like the cross-version upgrade tests we already have. 
Maybe we'd be better off adding a libpq connection option that forces the use of a specific minor protocol version, but then we'll need backward-compatibility code in libpq basically forever. But maybe we need that anyway to avoid older and newer servers being unable to communicate. Plus, you've got all of the consequences for non-core drivers, which have to both add support for the new wire protocol - if they don't want to seem outdated and eventually obsolete - and also test that they're still compatible with all supported server versions. Connection poolers have the same set of problems. The whole thing is almost a hole with no bottom. Keeping up with core changes in this area could become a massive undertaking for lots and lots of people, some of whom may be the sole maintainer of some important driver that now needs a whole bunch of work. I'm not sure how much it improves things if we imagine adding feature flags to the existing protocol versions, rather than whole new protocol versions, but at least it cuts down on the assumption that adopting new features is mandatory, and that such features are cumulative. If a driver wants to support TDE but not protocol parameters or protocol parameters but not TDE, who are we to say no? If in supporting those things we bump the protocol version to 3.2, and then 3.3 fixes a huge performance problem, are drivers going to be required to add support for features they don't care about to get the performance fixes? I see some benefit in bumping the protocol version for major changes, or for changes that we have an important reason to make mandatory, or to make previously-optional features for which support has become in practical terms universal part of the base feature set. But I'm very skeptical of the idea that we should just handle as many things as possible via a protocol version bump. 
We've been avoiding protocol version bumps like the plague since forever, and swinging all the way to the other extreme doesn't sound like the right idea to me. > How about naming the GUCs pg_protocol.${NAME}, but still requiring the > _pq_ prefix in the StartupPacket. That way only client libraries would > have to see this internal prefix and they could remap it someway. I > see two options for that: > 1. At the server replace the _pq_ prefix with pg_protocol. So > _pq_.${NAME} would map to pg_protocol.${name} > 2. At the server replace the _pq_.pg_protocol prefix with pg_protocol. > So _pq_.pg_protocol.${NAME} would map to pg_protocol.${name}. > > I guess you prefer option 2, because that would still leave lots of > space to do something with the rest of the _pq_ space, i.e. > _pq_.magic_pixie_dust can still be used for something different than a > GUC. I'm not sure what I think about this. Do we need these new GUCs to be both PGC_PROTOCOL *and also* live in a separate namespace? I see the need for the former pretty clearly: if these kinds of things are to be part of the GUC system (which wasn't my initial bias, but whatever) then they need to have some important behavioral differences
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 18:30, Dave Cramer wrote: >> > I really intended the _pq_ prefix as a way of taking something out of >> > the GUC namespace, not as a part of the GUC namespace that users would >> > see. And I'm reluctant to go back on that. If we want to make >> > pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's >> > something to that idea [insert caveats here]. But doesn't _pq_ look >> > like something that was intended to be internal? That's certainly how >> > I intended it. > > Is this actually used in practice? If so, how? No, it's not used for anything at the moment. This whole thread is basically about trying to agree on how we want to make protocol changes in the future in a somewhat standardized way, using the tools that we have available to avoid breaking connections to old postgres servers: ProtocolVersionNegotiation messages, minor version numbers, and _pq_ parameters in the startup message. All of those have so far been completely theoretical and have not appeared in any client-server communication.
Re: Fixing backslash dot for COPY FROM...CSV
Tom Lane wrote: > Not sure what to do here. One idea is to install just the psql-side > fix, which should break nothing now that version-2 protocol is dead, > and then wait a few years before introducing the server-side change. > That seems kind of sad though. Wouldn't backpatching solve this? Then only the users who don't apply the minor updates would have non-matching server and psql. Initially I thought that obsoleting the v2 protocol was a recent move, but reading older messages from the list I've got the impression that it was more or less in the pipeline since way before version 10. Also one of the cases the patch fixes, the one where imported data is silently truncated at the point of \., is quite nasty IMO. I can imagine an app where user-supplied data would be appended row-by-row into a CSV file, and would be processed periodically in batches. Under some conditions, in particular if newlines in the first column are allowed, a malevolent user could submit a \. sequence to cause the batch to miss the rest of the data without any error being raised. [1] https://www.postgresql.org/message-id/11648.1403147417%40sss.pgh.pa.us Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
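[Editor's note] The truncation scenario Daniel describes can be modeled with a toy reader — a hedged Python sketch, not PostgreSQL's actual COPY parser; the function name and sample data are illustrative only:

```python
# Toy model of the pre-fix behavior: a reader that treats a line
# consisting of "\." as the end-of-data marker even when the input is
# CSV, where such a line can legitimately occur inside a quoted field.
# Illustrative sketch only, not the server's COPY implementation.

def naive_copy_read(lines):
    """Collect rows until a bare \\. line, as the pre-fix code did."""
    rows = []
    for line in lines:
        if line == "\\.":          # end-of-data check applied blindly
            break
        rows.append(line)
    return rows

# A malevolent user submits a quoted field containing a newline followed
# by \. on its own physical line; everything after it is silently lost.
csv_lines = [
    'alice,"ok"',
    '"evil',                 # quoted field continues on the next line...
    '\\.',                   # ...which happens to be exactly \.
    'still evil",1',
    'bob,"legitimate row"',  # never imported, and no error raised
]
print(naive_copy_read(csv_lines))   # -> ['alice,"ok"', '"evil']
```

The patch under discussion makes CSV mode take such a line literally, so the marker can no longer truncate the batch without an error.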
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 18:43, Tom Lane wrote: > I don't buy that argument, actually. libpq, and pretty much every > other client AFAIK, has provisions to let higher code levels insert > random options into the startup packet. So to make this work libpq > would have to filter or at least inspect such options, which is > logic that doesn't exist and doesn't seem nice to need. libpq actually doesn't support doing this (only by putting them in the "options" parameter, but _pq_ parameters would not be allowed there), but indeed many other clients do allow this and indeed likely don't have logic to filter/disallow _pq_ prefixed parameters. This seems very easy to address though: only parse _pq_ options when protocol version 3.1 is requested by the client, and otherwise always report them as "not supported". Then, when clients upgrade to 3.1, they should filter/disallow arbitrarily-set _pq_ parameters. I don't think that's hard or unpleasant to add; it's literally a prefix check for the "_pq_." string. > The other problem with adding these things in the startup packet > is that when you send that packet, you don't know what the server > version is and hence don't know if it will take these options. (imho) the whole point of the _pq_ options is that they don't trigger an error when they are requested by the client but not supported by the server. So I don't understand your problem here. > What's so bad about insisting that these options must be sent in a > separate message? To not require an additional roundtrip waiting for the server to respond.
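[Editor's note] The prefix check being proposed really is small; here is a hedged sketch of the filtering logic (Python, with illustrative names — this is not actual server or libpq code, and "_pq_.some_protocol_param" is a made-up parameter):

```python
# Sketch of the proposed rule: parse _pq_.* startup options only when
# the client asked for protocol 3.1+; otherwise report them back as
# unsupported (e.g. via NegotiateProtocolVersion) instead of applying
# them. Names and data shapes here are assumptions for illustration.

PQ_PREFIX = "_pq_."

def classify_startup_params(params, protocol_version):
    """Split startup parameters into (accepted, unsupported)."""
    accepted, unsupported = {}, []
    for name, value in params.items():
        if name.startswith(PQ_PREFIX) and protocol_version < (3, 1):
            unsupported.append(name)
        else:
            accepted[name] = value
    return accepted, unsupported

# A 3.0 client that lets users inject arbitrary options cannot
# accidentally set protocol parameters:
print(classify_startup_params(
    {"application_name": "app", "_pq_.some_protocol_param": "on"},
    (3, 0)))
# -> ({'application_name': 'app'}, ['_pq_.some_protocol_param'])
```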
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 5 Apr 2024, at 03:37, Michael Paquier wrote: > On Thu, Apr 04, 2024 at 11:03:35AM -0700, Jacob Champion wrote: >> v3 does that by putting back checks for symbols that aren't part of >> LibreSSL (tested back to 2.7, which is where the 1.1.x APIs started to >> arrive). > > From where did you pull the LibreSSL sources? Directly from the > OpenBSD tree? > >> It also makes adjustments for the new OPENSSL_API_COMPAT >> version, getting rid of OpenSSL_add_all_algorithms() and adding a >> missing header. > > Ah, right. OpenSSL_add_all_algorithms() is documented as having no > effect in 1.1.0. This API was deprecated and made into a no-op in OpenBSD 6.4, which corresponds to LibreSSL 2.8.3. >> This patch has a deficiency where 1.1.0 itself isn't actually rejected >> at configure time; Daniel's working on an explicit check for the >> OPENSSL/LIBRESSL_VERSION_NUMBER that should fix that up. There's an >> open question about which version we should pin for LibreSSL, which >> should ultimately come down to which versions of OpenBSD we want PG17 >> to support. > > I would be OK to draw a line to what we test in the buildfarm if it > comes to that, down to OpenBSD 6.9. This version is already not > supported, and we had a number of issues with older versions and > timestamps going backwards. There is a member on 6.8 as well, and while 6.8 works fine, the tests all fail due to the error messages being different. Rather than adding alternate output for an EOL version of OpenBSD (which currently doesn't even run the ssl checks in the BF), I think using 6.9 as the minimum makes sense. > + # Functions introduced in OpenSSL 1.1.0/LibreSSL 2.7.0. 
Correct, they are removed in favor of a compile test for the OpenSSL version. > - cdata.set('OPENSSL_API_COMPAT', '0x10002000L', > + cdata.set('OPENSSL_API_COMPAT', '0x1010L', > > Seems to me that this should also document in meson.build why 1.1.0 is > chosen, same as ./configure. Done. > It seems to me that src/common/protocol_openssl.c could be cleaned up; > I see SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version > listed in LibreSSL (looking at some past version of > https://github.com/libressl/libressl.git that I still have around). Both SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version are available in at least OpenBSD 6.3, which is LibreSSL 2.7.5. With this we can thus remove the whole file. > There is an extra thing in pg_strong_random.c once we cut OpenSSL < > 1.1.1.. Do we still need pg_strong_random_init() and its RAND_poll() > when it comes to LibreSSL? This is a sensitive area, so we should be > careful. Re-reading the thread which added this comment, and the OpenSSL docs and code, I'm leaning towards leaving this in. The overhead is marginal, and fork safety has been broken at least once in OpenSSL since 1.1.1: https://github.com/openssl/openssl/issues/12377 That particular bug was thankfully caught before it shipped, but when mitigating the risk is this cheap, it seems reasonable to keep this in. Attached is a WIP patch to get more eyes on it; the Meson test for 1.1.1 fails on Windows in CI, which I will investigate next. -- Daniel Gustafsson v4-0001-Remove-support-for-OpenSSL-1.0.2.patch Description: Binary data
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 5 Apr 2024, at 18:41, Jacob Champion > wrote: > On Thu, Apr 4, 2024 at 6:37 PM Michael Paquier wrote: >> I would be OK to draw a line to what we test in the buildfarm if it >> comes to that, down to OpenBSD 6.9. > > That would correspond to LibreSSL 3.3 if I'm not mistaken. Any > particular reason for 6.9 as the dividing line, and not something > later? And by "test in the buildfarm", do you mean across all > versions, or just what we support for PG17? (For the record, I don't > think there's any reason to drop older LibreSSL testing for earlier > branches.) We should draw the line on something we can reliably test, so 6.9 seems fine to me (unless there is evidence of older versions being common in the wild). OpenBSD themselves support 2 backbranches so 6.9 is still far beyond the EOL mark upstream. -- Daniel Gustafsson
Re: AIX support
On Fri, Apr 05, 2024 at 04:12:06PM +, Sriram RK wrote: > > > What you do need to do to reproduce the described problems is > > check out the Postgres git tree and rewind to just before > > commit 0b16bb877, where we deleted AIX support. Any attempt > > to restore AIX support would have to start with reverting that > > commit (and perhaps the followup f0827b443). > Going ahead, we want to build the changes that were removed as part of > “0b16bb8776bb8”, with latest Xlc and gcc version. > > We were building the source from the postgres ftp > server(https://ftp.postgresql.org/pub/source/), would like to understand if > there are any source level changes between the ftp server and the source on > github? To succeed in this endeavor, you'll need to develop fluency in the tools to answer questions like that, or bring in someone who is fluent. In this case, GNU diff is the standard tool for answering your question. These resources cover other topics you would need to learn: https://wiki.postgresql.org/wiki/Developer_FAQ https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
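[Editor's note] Answering "are there source-level changes between the ftp tarball and the git tree?" is a mechanical tree comparison, as Noah suggests with GNU diff. As a hedged illustration of the same idea (Python's stdlib filecmp standing in for `diff -r`; the directories here are scratch paths, not real Postgres trees):

```python
import filecmp
import tempfile
from pathlib import Path

# Sketch: recursively compare two source trees and list files that
# differ or exist on only one side. In practice a release tarball
# carries a few generated files (e.g. configure artifacts) that a git
# checkout lacks, which is exactly what this surfaces.
def report_tree_diff(a, b):
    cmp = filecmp.dircmp(a, b)
    def walk(c, prefix=""):
        out = []
        out += [prefix + n for n in c.diff_files]   # present in both, differ
        out += [prefix + n for n in c.left_only]    # only in first tree
        out += [prefix + n for n in c.right_only]   # only in second tree
        for name, sub in c.subdirs.items():
            out += walk(sub, prefix + name + "/")
        return out
    return walk(cmp)

# Tiny demo with two scratch trees: one tarball-only file, one shared.
a, b = tempfile.mkdtemp(), tempfile.mkdtemp()
Path(a, "configure").write_text("generated\n")
for d in (a, b):
    Path(d, "Makefile").write_text("all:\n")
print(report_tree_diff(a, b))   # -> ['configure']
```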
Re: Can't compile PG 17 (master) from git under Msys2 autoconf
On 2024-Apr-04, Regina Obe wrote: > I think I got something not too far off from what's there now that works > under my msys2 setup again. This is partly using your idea of using > $(top_builddir) to qualify the path but in the LN_S section that is causing > me grief. > This seems to work okay building in tree and out of tree. > By changing these lines in src/backend/Makefile > > $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h > rm -f '$@' > - $(LN_S) ../../backend/$< '$@' > + $(LN_S) $(top_builddir)/src/backend/$< '$@' > > $(top_builddir)/src/include/utils/wait_event_types.h: > utils/activity/wait_event_types.h > rm -f '$@' > - $(LN_S) ../../backend/$< '$@' > + $(LN_S) $(top_builddir)/src/backend/$< '$@' > > I've also attached as a patch. Hmm, that's quite strange ... it certainly doesn't work for me. Maybe the LN_S utility is resolving the symlink at creation time, rather than letting it be a reference to be resolved later. Apparently, the only Msys2 buildfarm animal is now using Meson, so we don't have any representative animal for your situation. What does LN_S do for you anyway? Looking at https://stackoverflow.com/questions/61594025/symlink-in-msys2-copy-or-hard-link it sounds like this would work if the MSYS environment variable was set to winsymlinks (or maybe not. I don't know if a "windows shortcut" would be usable in this case.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Nadie está tan esclavizado como el que se cree libre no siéndolo" (Goethe)
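[Editor's note] The underlying portability wrinkle in the Makefile rule above: a relative symlink target is stored literally and resolved against the directory *containing the link*, not the directory where `ln -s` was run. A small POSIX sketch (hypothetical layout loosely mirroring the rule; MSYS2 symlink emulation may instead copy or resolve the target at creation time, which is where behavior can diverge, per Álvaro's hypothesis):

```python
import os
import tempfile
from pathlib import Path

# Build a miniature src/backend + src/include layout and create a
# relative symlink the way the Makefile rule does.
root = Path(tempfile.mkdtemp())
(root / "src/backend/storage").mkdir(parents=True)
(root / "src/include/storage").mkdir(parents=True)

(root / "src/backend/storage/lwlocknames.h").write_text("/* generated */\n")

link = root / "src/include/storage/lwlocknames.h"
os.symlink("../../backend/storage/lwlocknames.h", link)

# The link stores the literal relative string; the kernel resolves it
# against src/include/storage/ every time the link is opened. If a
# tool instead resolves (or copies) the target at creation time from
# the current working directory, the same relative path can break.
print(os.readlink(link))          # -> ../../backend/storage/lwlocknames.h
print(link.read_text().strip())   # -> /* generated */
```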
Re: incremental backup breakage in BlockRefTableEntryGetBlocks
On Fri, Apr 5, 2024 at 2:59 AM Jakub Wartak wrote: > And of course i'm attaching reproducer with some braindump notes in > case in future one hits similiar issue and wonders where to even start > looking (it's very primitive though but might help). Thanks. I've committed the patch now. -- Robert Haas EDB: http://www.enterprisedb.com
Re: LogwrtResult contended spinlock
Pushed 0001. Here's the patch that adds the Copy position one more time, with the monotonic_advance function returning the value. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ >From 3f5c860576245b92701e7bfc517947c418c68510 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 5 Apr 2024 13:52:44 +0200 Subject: [PATCH v17] Add logCopyResult Updated using monotonic increment --- src/backend/access/transam/xlog.c | 32 ++- src/include/port/atomics.h| 36 +++ 2 files changed, 67 insertions(+), 1 deletion(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b4499cda7b..a40c1fb79e 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -469,6 +469,7 @@ typedef struct XLogCtlData XLogRecPtr lastSegSwitchLSN; /* These are accessed using atomics -- info_lck not needed */ + pg_atomic_uint64 logCopyResult; /* last byte + 1 copied to WAL buffers */ pg_atomic_uint64 logWriteResult; /* last byte + 1 written out */ pg_atomic_uint64 logFlushResult; /* last byte + 1 flushed */ @@ -1499,6 +1500,7 @@ static XLogRecPtr WaitXLogInsertionsToFinish(XLogRecPtr upto) { uint64 bytepos; + XLogRecPtr copyptr; XLogRecPtr reservedUpto; XLogRecPtr finishedUpto; XLogCtlInsert *Insert = &XLogCtl->Insert; @@ -1507,6 +1509,11 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto) if (MyProc == NULL) elog(PANIC, "cannot wait without a PGPROC structure"); + /* check if there's any work to do */ + copyptr = pg_atomic_read_u64(&XLogCtl->logCopyResult); + if (upto <= copyptr) + return copyptr; + /* Read the current insert position */ SpinLockAcquire(&Insert->insertpos_lck); bytepos = Insert->CurrBytePos; @@ -1586,6 +1593,10 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto) if (insertingat != InvalidXLogRecPtr && insertingat < finishedUpto) finishedUpto = insertingat; } + + finishedUpto = + pg_atomic_monotonic_advance_u64(&XLogCtl->logCopyResult, finishedUpto); + return finishedUpto; } @@ -1727,13 +1738,24 @@ 
WALReadFromBuffers(char *dstbuf, XLogRecPtr startptr, Size count, { char *pdst = dstbuf; XLogRecPtr recptr = startptr; + XLogRecPtr copyptr; Size nbytes = count; if (RecoveryInProgress() || tli != GetWALInsertionTimeLine()) return 0; Assert(!XLogRecPtrIsInvalid(startptr)); - Assert(startptr + count <= LogwrtResult.Write); + + /* + * Caller should ensure that the requested data has been copied to WAL + * buffers before we try to read it. + */ + copyptr = pg_atomic_read_u64(&XLogCtl->logCopyResult); + if (startptr + count > copyptr) + ereport(ERROR, +errmsg("request to read past end of generated WAL; requested %X/%X, current position %X/%X", + LSN_FORMAT_ARGS(startptr + count), + LSN_FORMAT_ARGS(copyptr))); /* * Loop through the buffers without a lock. For each buffer, atomically @@ -2571,13 +2593,19 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) { XLogRecPtr Flush; XLogRecPtr Write; + XLogRecPtr Copy; Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); pg_read_barrier(); Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); + pg_read_barrier(); + Copy = pg_atomic_read_u64(&XLogCtl->logCopyResult); /* WAL written to disk is always ahead of WAL flushed */ Assert(Write >= Flush); + + /* WAL copied to buffers is always ahead of WAL written */ + Assert(Copy >= Write); } #endif } @@ -4951,6 +4979,7 @@ XLOGShmemInit(void) SpinLockInit(&XLogCtl->Insert.insertpos_lck); SpinLockInit(&XLogCtl->info_lck); + pg_atomic_init_u64(&XLogCtl->logCopyResult, InvalidXLogRecPtr); pg_atomic_init_u64(&XLogCtl->logWriteResult, InvalidXLogRecPtr); pg_atomic_init_u64(&XLogCtl->logFlushResult, InvalidXLogRecPtr); pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr); @@ -5979,6 +6008,7 @@ StartupXLOG(void) * because no other process can be reading or writing WAL yet. 
*/ LogwrtResult.Write = LogwrtResult.Flush = EndOfLog; + pg_atomic_write_u64(&XLogCtl->logCopyResult, EndOfLog); pg_atomic_write_u64(&XLogCtl->logWriteResult, EndOfLog); pg_atomic_write_u64(&XLogCtl->logFlushResult, EndOfLog); XLogCtl->LogwrtRqst.Write = EndOfLog; diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h index ff47782cdb..78987f3154 100644 --- a/src/include/port/atomics.h +++ b/src/include/port/atomics.h @@ -570,6 +570,42 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_) return pg_atomic_sub_fetch_u64_impl(ptr, sub_); } +/* + * Monotonically advance the given variable using only atomic operations until + * it's at least the target value. Returns the latest value observed, which + * may or may not be the target value. + * + * Full barrier semantics (even when value is unchanged). + */ +static inline uint64 +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_) +{ + uint64 c
Re: Streaming read-ready sequential scan code
On Fri, Apr 5, 2024 at 12:15 AM Thomas Munro wrote:
>
> Yeah, I plead benchmarking myopia, sorry. The fastpath as committed
> is only reached when distance goes 2->1, as pg_prewarm does. Oops.
> With the attached minor rearrangement, it works fine. I also poked
> some more at that memory prefetcher. Here are the numbers I got on a
> desktop system (Intel i9-9900 @ 3.1GHz, Linux 6.1, turbo disabled,
> cpufreq governor=performance, 2MB huge pages, SB=8GB, consumer NVMe,
> GCC -O3).
>
> create table t (i int, filler text) with (fillfactor=10);
> insert into t
> select g, repeat('x', 900) from generate_series(1, 56) g;
> vacuum freeze t;
> set max_parallel_workers_per_gather = 0;
>
> select count(*) from t;
>
> cold = must be read from actual disk (Linux drop_caches)
> warm = read from linux page cache
> hot = already in pg cache via pg_prewarm
>
>                                    cold     warm    hot
> master                            2479ms    886ms  200ms
> seqscan                           2498ms    716ms  211ms  <-- regression
> seqscan + fastpath                2493ms    711ms  200ms  <-- fixed, I think?
> seqscan + memprefetch             2499ms    716ms  182ms
> seqscan + fastpath + memprefetch  2505ms    710ms  170ms  <-- \O/
>
> Cold has no difference. That's just my disk demonstrating Linux RA at
> 128kB (default); random I/O is obviously a more interesting story.
> It's consistently a smidgen faster with Linux RA set to 2MB (as in
> blockdev --setra 4096 /dev/nvmeXXX), and I believe this effect
> probably also increases on fancier faster storage than what I have on
> hand:
>
>                                    cold
> master                            1775ms
> seqscan + fastpath + memprefetch  1700ms
>
> Warm is faster as expected (fewer system calls schlepping data
> kernel->userspace).
>
> The interesting column is hot. The 200ms->211ms regression is due to
> the extra bookkeeping in the slow path. The rejiggered fastpath code
> fixes it for me, or maybe sometimes shows an extra 1ms. Phew. Can
> you reproduce that?

I am able to reproduce the fast path solving the issue using Heikki's example here [1] but in shared buffers (hot).
master: 25 ms stream read: 29 ms stream read + fast path: 25 ms I haven't looked into or reviewed the memory prefetching part. While reviewing 0002, I realized that I don't quite see how read_stream_get_block() will be used in the fastpath -- which it claims in its comments. read_stream_next_buffer() is the only caller of read_stream_look_ahead()->read_stream_get_block(), and if fast_path is true, read_stream_next_buffer() always returns before calling read_stream_look_ahead(). Maybe I am missing something. I see fast_path uses read_stream_fill_blocknums() to invoke the callback. Oh and why does READ_STREAM_DISABLE_FAST_PATH macro exist? Otherwise 0002 looks good to me. I haven't reviewed 0003 or 0004. I attached a new version (v11) because I noticed an outdated comment in my seq scan streaming read user patch (0001). The other patches in the set are untouched from your versions besides adding author/reviewer info in commit message for 0002. - Melanie [1] https://www.postgresql.org/message-id/3b0f3701-addd-4629-9257-cf28e1a6e6a1%40iki.fi From acbd7172f10857d49d4b5d4afc4efab704b33486 Mon Sep 17 00:00:00 2001 From: David Rowley Date: Mon, 10 Jul 2023 11:22:34 +0200 Subject: [PATCH v11 3/4] Add pg_prefetch_mem() macro to load cache lines. Initially mapping to GCC, Clang and MSVC builtins. Discussion: https://postgr.es/m/CAEepm%3D2y9HM9QP%2BHhRZdQ3pU6FShSMyu%3DV1uHXhQ5gG-dketHg%40mail.gmail.com --- config/c-compiler.m4 | 17 configure | 40 ++ configure.ac | 3 +++ meson.build| 1 + src/include/c.h| 8 src/include/pg_config.h.in | 3 +++ 6 files changed, 72 insertions(+) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 3268a780bb0..4cc02f97601 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -355,6 +355,23 @@ AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1, [Define to 1 if your compiler understands $1.]) fi])# PGAC_CHECK_BUILTIN_FUNC +# PGAC_CHECK_BUILTIN_VOID_FUNC +# --- +# Variant for void functions. 
+AC_DEFUN([PGAC_CHECK_BUILTIN_VOID_FUNC], +[AC_CACHE_CHECK(for $1, pgac_cv$1, +[AC_LINK_IFELSE([AC_LANG_PROGRAM([ +void +call$1($2) +{ +$1(x); +}], [])], +[pgac_cv$1=yes], +[pgac_cv$1=no])]) +if test x"${pgac_cv$1}" = xyes ; then +AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1, + [Define to 1 if your compiler understands $1.]) +fi])# PGAC_CHECK_BUILTIN_VOID_FUNC # PGAC_CHECK_BUILTIN_FUNC_PTR diff --git a/configure b/configure index 36feeafbb23..79b78c33ddc 100755 --- a/configure +++ b/configure @@ -15543,6 +15543,46 @@ _ACEOF fi +# Can we use a built-in to prefetch memory? +{ $as_echo "$as_me:${a
RE: Can't compile PG 17 (master) from git under Msys2 autoconf
> > I think I got something not too far off from what's there now that works > under my msys2 setup again. This is partly using your idea of using > $(top_builddir) to qualify the path but in the LN_S section that is causing me > grief. > > This seems to work okay building in tree and out of tree. > > By changing these lines in src/backend/Makefile > > > > $(top_builddir)/src/include/storage/lwlocknames.h: > storage/lmgr/lwlocknames.h > > rm -f '$@' > > - $(LN_S) ../../backend/$< '$@' > > + $(LN_S) $(top_builddir)/src/backend/$< '$@' > > > > $(top_builddir)/src/include/utils/wait_event_types.h: > utils/activity/wait_event_types.h > > rm -f '$@' > > - $(LN_S) ../../backend/$< '$@' > > + $(LN_S) $(top_builddir)/src/backend/$< '$@' > > > > I've also attached as a patch. > > Hmm, that's quite strange ... it certainly doesn't work for me. Maybe the > LN_S utility is resolving the symlink at creation time, rather than letting > it be a > reference to be resolved later. Apparently, the only Msys2 buildfarm animal > is > now using Meson, so we don't have any representative animal for your > situation. > > What does LN_S do for you anyway? Looking at > https://stackoverflow.com/questions/61594025/symlink-in-msys2-copy-or-hard-link > it sounds like this would work if the MSYS environment variable was set to > winsymlinks (or maybe not. I don't know if a "windows shortcut" would be > usable in this case.) > > -- > Álvaro Herrera PostgreSQL Developer — > https://www.EnterpriseDB.com/ > "Nadie está tan esclavizado como el que se cree libre no siéndolo" (Goethe) I think it ends up doing a copy, hence the copy error in my failure logs, which doesn't correspond to anything in the Makefile: cp -pR ../../backend/storage/lmgr/lwlocknames.h Sorry for not checking on a linux system. I was thinking I should have done that first. Thanks, Regina
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
> > > Plus, you've got all of the consequences for non-core drivers, which > have to both add support for the new wire protocol - if they don't > want to seem outdated and eventually obsolete - and also test that > they're still compatible with all supported server versions. > Connection poolers have the same set of problems. The whole thing is > almost a hole with no bottom. Keeping up with core changes in this > area could become a massive undertaking for lots and lots of people, > some of whom may be the sole maintainer of some important driver that > now needs a whole bunch of work. > We already have this in many places. Headers or functions change and extensions have to fix their code. catalog changes force drivers to change their code. This argument blocks any improvement to the protocol. I don't think it's unreasonable to expect maintainers to keep up. We could make it easier by having a specific list for maintainers, but that doesn't change the work. > I'm not sure how much it improves things if we imagine adding feature > flags to the existing protocol versions, rather than whole new > protocol versions, but at least it cuts down on the assumption that > adopting new features is mandatory, and that such features are > cumulative. If a driver wants to support TDE but not protocol > parameters or protocol parameters but not TDE, who are we to say no? > If in supporting those things we bump the protocol version to 3.2, and > then 3.3 fixes a huge performance problem, are drivers going to be > required to add support for features they don't care about to get the > performance fixes? I see some benefit in bumping the protocol version > for major changes, or for changes that we have an important reason to > make mandatory, or to make previously-optional features for which > support has become in practical terms universal part of the base > feature set. 
But I'm very skeptical of the idea that we should just > handle as many things as possible via a protocol version bump. We've > been avoiding protocol version bumps like the plague since forever, > and swinging all the way to the other extreme doesn't sound like the > right idea to me. > +1 for not swinging too far here. But I don't think it should be a non starter. Dave
Re: Can't compile PG 17 (master) from git under Msys2 autoconf
On 2024-Apr-05, Regina Obe wrote: > I think it ends up doing a copy thus the copy error in my log failures which > don't exist anywhere in the Makefil > > cp -pR ../../backend/storage/lmgr/lwlocknames.h > > Sorry for not checking on a linux system. I was thinking I should have done > that first. Ah yeah, that's per configure: if ln -s conf$$.file conf$$ 2>/dev/null; then as_ln_s='ln -s' # ... but there are two gotchas: # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail. # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable. # In both cases, we have to default to `cp -pR'. ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe || as_ln_s='cp -pR' I guess we need to patch the rule so that the LN_S is called so that it'd resolve correctly in both cases. I guess the easy option is to go back to the original recipe and update the comment to indicate that we do it to placate Msys2. Here it is with the old comment: -# The point of the prereqdir incantation in some of the rules below is to -# force the symlink to use an absolute path rather than a relative path. -# For headers which are generated by make distprep, the actual header within -# src/backend will be in the source tree, while the symlink in src/include -# will be in the build tree, so a simple ../.. reference won't work. -# For headers generated during regular builds, we prefer a relative symlink. $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h - prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \ -cd '$(dir $@)' && rm -f $(notdir $@) && \ -$(LN_S) "$$prereqdir/$(notdir $<)" . Maybe it's possible to make this simpler, as it looks overly baroque, and we don't really need absolute paths anyway -- we just need the path resolved at the right time. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hello, BTW I noticed that https://coverage.postgresql.org/src/backend/commands/waitlsn.c.gcov.html says that lsn_cmp is not covered by the tests. This probably indicates that the tests are a little too light, but I'm not sure how much extra effort we want to spend. I'm still concerned that WaitLSNCleanup is only called in ProcKill. Does this mean that if a process throws an error while waiting, it'll not get cleaned up until it exits? Maybe this is not a big deal, but it seems odd. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell."
Re: LogwrtResult contended spinlock
On Fri, 2024-04-05 at 13:54 +0200, Alvaro Herrera wrote: > Couldn't push: I tested with --disable-atomics --disable-spinlocks > and > the tests fail because the semaphore for the atomic variables is not > always initialized. This is weird -- it's like a client process is > running at a time when StartupXLOG has not initialized the variable > ... > so the initialization in the other place was not completely wrong. It looks like you solved the problem and pushed the first patch. Looks good to me. Patch 0002 also looks good to me, after a mostly-trivial rebase (just remember to initialize logCopyResult). Minor comments: * You could consider using a read barrier before reading the Copy ptr, or using the pg_atomic_read_membarrier_u64() variant. I don't see a correctness problem, but it might be slightly more clear and I don't see a lot of downside. * Also, I assume that the Max() call in pg_atomic_monotonic_advance_u64() is just for clarity? Surely the currval cannot be less than _target when it returns. I'd probably just do Assert(currval >= _target) and then return currval. Regards, Jeff Davis
Re: Security lessons from liblzma
On Fri, Apr 5, 2024 at 6:24 AM Robert Haas wrote: > I wonder how hard it would be to just code up our own binary to do > this. If it'd be a pain to do that, or to maintain it across SSL > versions, then it's a bad plan and we shouldn't do it. But if it's not > that much code, maybe it'd be worth considering. I think my biggest concern, other than the maintenance costs, would be the statement "we know SSL works on Windows because we test it against some certificates we hand-rolled ourselves." We can become experts in certificate formats, of course, but... does it buy us much? If someone comes and complains that a certificate doesn't work correctly (as they have *very* recently [3]), I would like to be able to write a test that uses what OpenSSL actually generates as opposed to learning how to make it myself first. > I'm also sort of afraid that we're getting sucked into thinking real > hard about this SSL certificate issue rather than trying to brainstorm > all the other places that might be problematic. The latter might be a > more fruitful exercise (or maybe not, what do I know?). +1. Don't get me wrong: I spent a lot of time refactoring the sslfiles machinery a while back, and I would love for it to all go away. I don't really want to interrupt any lines of thought that are moving in that direction. Please continue. _And also:_ the xz attack relied on a long chain of injections, both technical and social. I'm still wrapping my head around Russ Cox's writeup [1, 2], but the "hidden blob of junk" is only a single part of all that. I'm not even sure it was a necessary part; it just happened to work well for this particular project and line of attack. I've linked Russ Cox in particular because Golang has made a bunch of language-level decisions with the supply chain in mind, including the philosophy that a build should ideally not be able to run arbitrary code at all, and therefore generated files _must_ be checked in. 
I remember $OLDJOB having buildbots that would complain if the contents of the file you checked in didn't match what was (reproducibly!) generated. I think there's a lot more to think about here. --Jacob [1] https://research.swtch.com/xz-timeline [2] https://research.swtch.com/xz-script [3] https://www.postgresql.org/message-id/flat/17760-b6c61e752ec07060%40postgresql.org
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

All three patches applied nicely. The code fits the coding standard, and the comments are relevant.

The new status of this patch is: Ready for Committer
Re: Fixing backslash dot for COPY FROM...CSV
"Daniel Verite" writes: > Tom Lane wrote: >> Not sure what to do here. One idea is to install just the psql-side >> fix, which should break nothing now that version-2 protocol is dead, >> and then wait a few years before introducing the server-side change. >> That seems kind of sad though. > Wouldn't backpatching solve this? No, it'd just reduce the surface area a bit. People on less-than- the-latest-minor-release would still have the issue. In any case back-patching further than v14 would be a nonstarter, because we didn't remove protocol v2 support till then. However, the analogy to "\d commands might fail against a newer server" reduces my level of concern quite a lot: it's hard to draw much of a line separating that kind of issue from "inline COPY CSV will fail against a newer server". It's not like such failures won't be obvious and fairly easy to diagnose. regards, tom lane
Obsolete comment in CopyReadLineText()
CopyReadLineText quoth:

 * The objective of this loop is to transfer the entire next input line
 * into line_buf. Hence, we only care for detecting newlines (\r and/or
 * \n) and the end-of-copy marker (\.).
 *
 * In CSV mode, \r and \n inside a quoted field are just part of the data
 * value and are put in line_buf. We keep just enough state to know if we
 * are currently in a quoted field or not.
 *
 * These four characters, and the CSV escape and quote characters, are
 * assumed the same in frontend and backend encodings.

When that last bit was written, it was because we were detecting newlines and end-of-copy markers before performing encoding conversion. That's not true any more: by the time CopyReadLineText sees the data, it was already converted by CopyConvertBuf. So I don't believe there actually is any such dependency anymore, and we should simply remove that last sentence. Any objections? regards, tom lane
Re: WIP Incremental JSON Parser
On 2024-04-05 Fr 11:43, Nathan Bossart wrote: On Fri, Apr 05, 2024 at 10:15:45AM -0400, Andrew Dunstan wrote: On 2024-04-04 Th 15:54, Nathan Bossart wrote: On Thu, Apr 04, 2024 at 03:31:12PM -0400, Andrew Dunstan wrote: Does the attached patch fix it for you? It clears up 2 of the 3 warnings for me: ../postgresql/src/common/jsonapi.c: In function ‘IsValidJsonNumber’: ../postgresql/src/common/jsonapi.c:2018:30: error: ‘dummy_lex.inc_state’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 2018 | if (lex->incremental && !lex->inc_state->is_last_chunk && | Thanks, please try this instead. LGTM, thanks! Thanks for checking. Pushed. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: meson vs windows perl
On 2024-04-05 Fr 10:12, Andrew Dunstan wrote: On 2024-04-05 Fr 08:25, Andrew Dunstan wrote: Here is an attempt to fix all that. It's ugly, but I think it's more principled. First, instead of getting the ldopts and then trying to filter out the ldflags and ccdlflags, it tells perl not to include those in the first place, by overriding a couple of routines in ExtUtils::Embed. And second, it's smarter about splitting what's left, so that it doesn't split on a space that's in a quoted item. The perl that's used to do that second bit is not pretty, but it has been tested on the system where the problem arose and apparently cures the problem. (No doubt some perl guru could improve it.) It also works on my Ubuntu system, so I don't think we'll be breaking anything (famous last words). Apparently I spoke too soon. Please ignore the above for now. OK, this has been fixed and checked. The attached is what I propose. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/meson.build b/meson.build index 87437960bc..be87ef6950 100644 --- a/meson.build +++ b/meson.build @@ -993,21 +993,15 @@ if not perlopt.disabled() # Config's ccdlflags and ldflags. (Those are the choices of those who # built the Perl installation, which are not necessarily appropriate # for building PostgreSQL.) 
-ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip() -undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split() -undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split() - -perl_ldopts = [] -foreach ldopt : ldopts.split(' ') - if ldopt == '' or ldopt in undesired -continue - endif - - perl_ldopts += ldopt.strip('"') -endforeach - +ldopts = run_command(perl, '-MExtUtils::Embed', '-e', '*ExtUtils::Embed::_ldflags = *Extutils::Embed::_ccdlflags = sub { return q[]; }; ldopts', + check: true).stdout().strip() +# avoid use of " char in this perl mini-program to avoid possibly +# confusing the Windows command processor +perl_ldopts = run_command(perl, '-MEnglish', '-e', + 'my $arg = shift; while ($arg =~ /\S/) { if ($arg =~ /^\s*([^\042 ]+)(?![\042])/) { print qq[$1\n]; $arg = $POSTMATCH;} elsif ($arg =~ /^\s*\042([^\042]+)\042/) { print qq[$1\n]; $arg = $POSTMATCH;} }', + '--', ldopts, check: true).stdout().strip().split('\n') message('LDFLAGS recommended by perl: "@0@"'.format(ldopts)) -message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts))) +message('LDFLAGS for embedding perl:\n"@0@"'.format('\n'.join(perl_ldopts))) perl_dep_int = declare_dependency( compile_args: perl_ccflags,
Re: Fixing backslash dot for COPY FROM...CSV
After some more poking at this topic, I realize that there is already very strange and undocumented behavior for backslash-dot even in non-CSV mode. Create a file like this:

$ cat eofdata
foobar
foobaz\.
more
\.
yet more

and try importing it with COPY:

regression=# create table eofdata(f1 text);
CREATE TABLE
regression=# copy eofdata from '/home/tgl/pgsql/eofdata';
COPY 2
regression=# table eofdata;
   f1
--------
 foobar
 foobaz
(2 rows)

That's what you get in 9.0 and earlier versions, and it's already not-as-documented, because we claim that only \. alone on a line is an EOF marker; we certainly don't suggest that what's in front of it will be taken as valid data. However, somebody broke it some more in 9.1, because 9.1 up to HEAD produce this result:

regression=# create table eofdata(f1 text);
CREATE TABLE
regression=# copy eofdata from '/home/tgl/pgsql/eofdata';
COPY 3
regression=# table eofdata;
   f1
--------
 foobar
 foobaz
 more
(3 rows)

So the current behavior is that \. that is on the end of a line, but is not the whole line, is silently discarded and we keep going. All versions throw "end-of-copy marker corrupt" if there is something after \. on the same line. This is sufficiently weird that I'm starting to come around to Daniel's original proposal that we just drop the server's recognition of \. altogether (which would allow removal of some dozens of lines of complicated and now known-buggy code). Alternatively, we could fix it so that \. at the end of a line draws "end-of-copy marker corrupt", which would at least make things consistent, but I'm not sure that has any great advantage. I surely don't want to document the current behavioral details as being the right thing that we're going to keep doing. Thoughts? regards, tom lane
Re: Improve eviction algorithm in ReorderBuffer
On Fri, 2024-04-05 at 16:58 +0900, Masahiko Sawada wrote: > IIUC for example in ReorderBuffer, the heap key is transaction size > and the hash key is xid. Yes. > > I see your point. It assumes that the bh_node_type is a pointer or at > least unique. So it cannot work with Datum being a value. Right. One option might just be to add some comments explaining the API and limitations, but in general I feel it's confusing to hash a pointer without a good reason. > > It sounds like a data structure that mixes the hash table and the > binary heap and we use it as the main storage (e.g. for > ReorderBufferTXN) instead of using the binary heap as the secondary > data structure. IIUC with your idea, the indexed binary heap has a > hash table to store elements each of which has its index within the > heap node array. I guess it's better to create it as a new data > structure rather than extending the existing binaryheap, since APIs > could be very different. I might be missing something, though. You are right that this approach starts to feel like a new data structure and is not v17 material. I am interested in this for v18 though -- we could make the API more like simplehash to be more flexible when using values (rather than pointers) and to be able to inline the comparator. > > * remove the hash table from binaryheap.c > > > > * supply a new callback to the binary heap with type like: > > > > typedef void (*binaryheap_update_index)( > > bh_node_type node, > > int new_element_index); > > > > * make the remove, update_up, and update_down methods take the > > element > > index rather than the pointer ... > This shows that the current indexed binaryheap is much slower than > the > other two implementations, and the xx_binaryheap has a good number in > spite of also being indexed. xx_binaryheap isn't indexed though, right? > When it comes to > implementing the above idea (i.e. 
changing binaryheap to
> xx_binaryheap), it was simple since we just replace the code where we
> update the hash table with the code where we call the callback, if we
> get the consensus on API change.

That seems reasonable to me.

> The fact that we use simplehash for the internal hash table might make
> this idea complex. If I understand your suggestion correctly, the
> caller needs to tell the hash table the hash function when creating a
> binaryheap, but the hash function needs to be specified at compile
> time. We can use a dynahash instead, but it would make the binaryheap
> slower still.

simplehash.h supports private_data, which makes it easier to track a callback. In binaryheap.c, that would look something like:

static inline uint32
binaryheap_hash(bh_nodeidx_hash *tab, uint32 key)
{
    binaryheap_hashfunc hashfunc = tab->private_data;

    return hashfunc(key);
}

...
#define SH_HASH_KEY(tb, key) binaryheap_hash(tb, key)
...

binaryheap_allocate(int num_nodes, binaryheap_comparator compare,
                    void *arg, binaryheap_hashfunc hashfunc)
{
    ...
    if (hashfunc != NULL)
    {
        /* could have a new structure, but we only need to
         * store one pointer, so don't bother with palloc/pfree */
        void *private_data = (void *) hashfunc;

        heap->bh_nodeidx = bh_nodeidx_create(..., private_data);
    ...

And in reorderbuffer.c, define the callback like:

static uint32
reorderbuffer_xid_hash(TransactionId xid)
{
    /* fasthash32 is 'static inline' so may
     * be faster than hash_bytes()? */
    return fasthash32(&xid, sizeof(TransactionId), 0);
}

In summary, there are two viable approaches for addressing the concerns in v17:

1. Provide a callback to update ReorderBufferTXN->heap_element_index, and use that index (rather than the pointer) for updating the heap key (transaction size) or removing elements from the heap.

2. Provide a callback for hashing, so that binaryheap.c can hash the xid value rather than the pointer.

I don't have a strong opinion about which one to use.
I prefer something closer to #1 for v18, but for v17 I suggest whichever one comes out simpler. Regards, Jeff Davis
fasthash32() returning uint64?
In hashfn_unstable.h, fasthash32() is declared as: /* like fasthash64, but returns a 32-bit hashcode */ static inline uint64 fasthash32(const char *k, size_t len, uint64 seed) Is the return type of uint64 a typo? Regards, Jeff Davis
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Fri, Apr 5, 2024 at 9:59 AM Daniel Gustafsson wrote: > Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1 fails > on Windows in CI which I will investigate next. The changes for SSL_OP_NO_CLIENT_RENEGOTIATION and SSL_R_VERSION_TOO_LOW look good to me. > -Remove support for OpenSSL 1.0.2 and 1.1.0 > +Remove support for OpenSSL 1.0.2 I modified the commit message while working on v3 and forgot to put it back before posting, sorry. > +++ b/src/include/pg_config.h.in > @@ -84,9 +84,6 @@ > /* Define to 1 if you have the header file. */ > #undef HAVE_CRTDEFS_H > > -/* Define to 1 if you have the `CRYPTO_lock' function. */ > -#undef HAVE_CRYPTO_LOCK An autoreconf run on my machine pulls in more changes (getting rid of the symbols we no longer check for). --Jacob
Re: IPC::Run::time[r|out] vs our TAP tests
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= writes: > Tom Lane writes: >> So it seems like the bug does not exist in any currently-supported >> NetBSD release. Debian has been known to ship obsolete libedit >> versions, though. > Both the current (bokworm/12) and previous (bullseye/11) versions of > Debian have new enough libedits to not be affected by this bug: > ... > But in bullseye they decided that OpenSSL is a system library as far as > the GPL is concerned, so are linking directly to readline. > And even before then their psql wrapper would LD_PRELOAD readline > instead if installed, so approximately nobody actually ever used psql > with libedit on Debian. Based on this info, I'm disinclined to put work into trying to make the case behave correctly with that old libedit version, or even to lobotomize the test case enough so it would pass. What I suggest Michael do with tanager is install the OS-version-appropriate version of GNU readline, so that the animal will test what ilmari describes as the actually common use-case. (I see that what he did for the moment is add --without-readline. Perhaps that's a decent long-term choice too, because I think we have rather little coverage of that option except on Windows.) regards, tom lane
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On 05.04.24 18:59, Daniel Gustafsson wrote: Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1 fails on Windows in CI which I will investigate next. I'm not a fan of the new PGAC_CHECK_OPENSSL. It creates a second place where the OpenSSL version number has to be updated. We had this carefully constructed so that there is only one place that OPENSSL_API_COMPAT is defined and that is the only place that needs to be updated. We put the setting of OPENSSL_API_COMPAT into configure so that the subsequent OpenSSL tests would use it, and if the API number is higher than what the library supports, the tests should fail. So I don't understand why the configure changes have to be so expansive.
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 5 Apr 2024, at 23:26, Peter Eisentraut wrote:
> On 05.04.24 18:59, Daniel Gustafsson wrote:
>> Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1
>> fails on Windows in CI which I will investigate next.
>
> I'm not a fan of the new PGAC_CHECK_OPENSSL. It creates a second place where
> the OpenSSL version number has to be updated. We had this carefully
> constructed so that there is only one place that OPENSSL_API_COMPAT is
> defined and that is the only place that needs to be updated. We put the
> setting of OPENSSL_API_COMPAT into configure so that the subsequent OpenSSL
> tests would use it, and if the API number is higher than what the library
> supports, the tests should fail.

But does that actually work? If I change the API_COMPAT to the 1.1.1
version number and run configure against 1.0.2 it passes just fine. Am I
missing some clever trick here?

The reason to expand the check is to ensure that we have the version we
want for both OpenSSL and LibreSSL, and deprecating OpenSSL versions isn't
all that commonly done, so having to change the version in the check
didn't seem that invasive to me.

--
Daniel Gustafsson
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Fri, Apr 5, 2024 at 2:48 PM Daniel Gustafsson wrote:
> But does that actually work? If I change the API_COMPAT to the 1.1.1 version
> number and run configure against 1.0.2 it passes just fine. Am I missing some
> clever trick here?

Similarly, I changed my API_COMPAT to a nonsense 0x9010L and
- a 1.1.1 build succeeds
- a 3.0 build fails
- LibreSSL doesn't appear to care or check that #define at all

--Jacob
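[Editor's note: for readers following the version-check discussion, the reason a plain numeric comparison can work at configure time is that pre-3.0 OpenSSL packs the version into one ordered number (the 0xMNNFFPPS scheme from opensslv.h), so a test such as OPENSSL_VERSION_NUMBER >= 0x10101000L accepts 1.1.1 and newer while rejecting 1.0.2. A minimal sketch of that comparison follows; the helper name is illustrative only and is not part of OpenSSL or PostgreSQL.]

```c
#include <assert.h>

/*
 * Sketch of the pre-3.0 OpenSSL version-number encoding: 0xMNNFFPPS,
 * i.e. major in bits 28-31, minor in bits 20-27, fix in bits 12-19,
 * patch and status below that.  Because the fields are ordered from
 * most to least significant, ">=" on the raw number is a valid
 * "at least version X.Y.Z" test.  Illustrative helper, not real API.
 */
int
version_at_least(unsigned long v, unsigned major, unsigned minor, unsigned fix)
{
	unsigned long want = ((unsigned long) major << 28) |
		((unsigned long) minor << 20) |
		((unsigned long) fix << 12);

	return v >= want;
}
```

For example, 1.1.1's base number is 0x10101000L, 1.0.2's is 0x10002000L, and 3.0 (which changed the low bits of the encoding but kept the major field) still compares greater, so a single threshold covers all three cases discussed above.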
Re: Add last_commit_lsn to pg_stat_database
The following review has been posted through the commitfest application:

make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hello

I think it is convenient to know the last commit LSN of a database for
troubleshooting and monitoring purposes, similar to the "pd_lsn" field in
a page header that records the last LSN that modifies this page. I am not
sure if it can help determine the WAL location to resume / catch up in
logical replication, as the "confirmed_flush_lsn" and "restart_lsn" in a
logical replication slot are already there to figure out the resume /
catchup point as I understand.

Anyhow, I had a look at the patch; in general it looks good to me. I also
ran it against the CI bot, which turned out fine. Just one small comment
though. The patch supports recording the last commit LSN from two-phase
commit as well, but the test does not seem to cover two-phase commit. In
my opinion, it should test whether the last commit LSN increments when a
prepared transaction is committed, in addition to a regular transaction.

thank you

- Cary Huang
www.highgo.ca
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 5 Apr 2024, at 22:55, Jacob Champion wrote:
> On Fri, Apr 5, 2024 at 9:59 AM Daniel Gustafsson wrote:
>> Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1
>> fails on Windows in CI which I will investigate next.

The attached version fixes the Windows 1.1.1 check which was missing the
include directory.

> The changes for SSL_OP_NO_CLIENT_RENEGOTIATION and
> SSL_R_VERSION_TOO_LOW look good to me.

Thanks!

>> +++ b/src/include/pg_config.h.in
>> @@ -84,9 +84,6 @@
>>  /* Define to 1 if you have the <crtdefs.h> header file. */
>>  #undef HAVE_CRTDEFS_H
>>
>> -/* Define to 1 if you have the `CRYPTO_lock' function. */
>> -#undef HAVE_CRYPTO_LOCK
>
> An autoreconf run on my machine pulls in more changes (getting rid of
> the symbols we no longer check for).

Ah yes, missed updating before formatting the patch. Done in the attached.

--
Daniel Gustafsson

v5-0001-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data
RE: Can't compile PG 17 (master) from git under Msys2 autoconf
> > I think it ends up doing a copy thus the copy error in my log failures
> > which don't exist anywhere in the Makefile
> >
> > cp -pR ../../backend/storage/lmgr/lwlocknames.h
> >
> > Sorry for not checking on a linux system. I was thinking I should have
> > done that first.
>
> Ah yeah, that's per configure:
>
> if ln -s conf$$.file conf$$ 2>/dev/null; then
>   as_ln_s='ln -s'
>   # ... but there are two gotchas:
>   # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail.
>   # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable.
>   # In both cases, we have to default to `cp -pR'.
>   ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe ||
>     as_ln_s='cp -pR'
>
> I guess we need to patch the rule so that LN_S is called so that it'd
> resolve correctly in both cases. I guess the easy option is to go back to
> the original recipe and update the comment to indicate that we do it to
> placate Msys2. Here it is with the old comment:
>
> -# The point of the prereqdir incantation in some of the rules below is to
> -# force the symlink to use an absolute path rather than a relative path.
> -# For headers which are generated by make distprep, the actual header within
> -# src/backend will be in the source tree, while the symlink in src/include
> -# will be in the build tree, so a simple ../.. reference won't work.
> -# For headers generated during regular builds, we prefer a relative symlink.
>
> $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h
> -	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
> -	  cd '$(dir $@)' && rm -f $(notdir $@) && \
> -	  $(LN_S) "$$prereqdir/$(notdir $<)" .
>
> Maybe it's possible to make this simpler, as it looks overly baroque, and
> we don't really need absolute paths anyway -- we just need the path
> resolved at the right time.
>
> --
> Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/

Yah I was thinking it was removed cause no one could figure out why it was
so complicated and decided to make it more readable.
Re: Popcount optimization using AVX512
On Sat, 6 Apr 2024 at 04:38, Nathan Bossart wrote:
> This seems to provide a small performance boost, so I've incorporated it
> into v27.

Won't Valgrind complain about this?

+pg_popcount_avx512(const char *buf, int bytes)

+	buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);

+	val = _mm512_maskz_loadu_epi8(mask, (const __m512i *) buf);

David
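[Editor's note: the concern above is that after rounding the pointer down, the first masked load reads bytes below buf — bytes the mask discards, but which Valgrind can still report as an access to unaddressable or uninitialized memory. A portable sketch of the same idea, with 8-byte words standing in for AVX-512's 64-byte registers; the function name is illustrative and not PostgreSQL's.]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of the align-down-and-mask pattern: round buf down to an
 * aligned boundary, load whole aligned chunks, and ignore ("mask off")
 * the lanes that fall outside [buf, buf + bytes).  The first memcpy
 * below can therefore touch bytes *before* buf, which is exactly the
 * kind of read a checker like Valgrind may flag even though those
 * bytes never influence the result.
 */
int
popcount_masked_sketch(const char *buf, size_t bytes)
{
	const char *end = buf + bytes;
	const char *p = (const char *) ((uintptr_t) buf & ~(uintptr_t) 7);
	int			count = 0;

	for (; p < end; p += 8)
	{
		unsigned char chunk[8];

		/* One aligned 8-byte load; may span bytes outside [buf, end). */
		memcpy(chunk, p, sizeof(chunk));

		/* Count only the in-range lanes -- the scalar "mask". */
		for (int i = 0; i < 8; i++)
			if (p + i >= buf && p + i < end)
				count += __builtin_popcount(chunk[i]);
	}
	return count;
}
```

For instance, `popcount_masked_sketch("\x0f\x01", 2)` counts 4 + 1 = 5 set bits even when the string starts at an unaligned address, because the out-of-range lanes of each chunk are skipped.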
DROP OWNED BY fails to clean out pg_init_privs grants
I wondered why buildfarm member copperhead has started to fail
xversion-upgrade-HEAD-HEAD tests. I soon reproduced the problem here:

pg_restore: creating ACL "regress_pg_dump_schema.TYPE "test_type""
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 4355; 0 0 ACL TYPE "test_type" buildfarm
pg_restore: error: could not execute query: ERROR: role "74603" does not exist
Command was: SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);
GRANT ALL ON TYPE "regress_pg_dump_schema"."test_type" TO "74603";
SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);
REVOKE ALL ON TYPE "regress_pg_dump_schema"."test_type" FROM "74603";

(So now I'm wondering why *only* copperhead has shown this so far. Are our
other cross-version-upgrade testing animals AWOL?)

I believe this is a longstanding problem that was exposed by accident by
commit 936e3fa37. If you run "make installcheck" in HEAD's
src/test/modules/test_pg_dump, and then poke around in the leftover
contrib_regression database, you can find dangling grants in
pg_init_privs:

contrib_regression=# table pg_init_privs;
 objoid | classoid | objsubid | privtype |                  initprivs
--------+----------+----------+----------+-----------------------------------------------
...
  43134 |     1259 |        0 | e        | {postgres=rwU/postgres,43125=U/postgres}
  43128 |     1259 |        0 | e        | {postgres=arwdDxtm/postgres,43125=r/postgres}
...

The fact that the DROP ROLE added by 936e3fa37 succeeded indicates that
these role references weren't captured in pg_shdepend. I imagine that we
also lack code that would allow DROP OWNED BY to follow up on such entries
if they existed, but I've not checked that for sure. In any case, there's
probably a nontrivial amount of code to be written to make this work.

Given the lack of field complaints, I suspect that extension scripts
simply don't grant privileges to random roles that aren't the extension's
owner. So I wonder a little bit if this is even worth fixing, as opposed
to blocking off somehow. But probably we should first try to fix it.
I doubt this is something we'll have fixed by Monday, so I will go add an open item for it. regards, tom lane
Re: Streaming read-ready sequential scan code
On Sat, Apr 6, 2024 at 6:55 AM Melanie Plageman wrote:
> On Fri, Apr 5, 2024 at 12:15 AM Thomas Munro wrote:
>> The interesting column is hot. The 200ms->211ms regression is due to
>> the extra bookkeeping in the slow path. The rejiggered fastpath code
>> fixes it for me, or maybe sometimes shows an extra 1ms. Phew. Can
>> you reproduce that?
>
> I am able to reproduce the fast path solving the issue using Heikki's
> example here [1] but in shared buffers (hot).
>
> master: 25 ms
> stream read: 29 ms
> stream read + fast path: 25 ms

Great, thanks.

> I haven't looked into or reviewed the memory prefetching part.
>
> While reviewing 0002, I realized that I don't quite see how
> read_stream_get_block() will be used in the fastpath -- which it
> claims in its comments.

Comments improved.

> Oh and why does READ_STREAM_DISABLE_FAST_PATH macro exist?

Just for testing purposes. Behaviour should be identical to external
observers either way, it's just a hand-rolled specialisation for certain
parameters, and it's useful to be able to verify that and measure the
speedup. I think it's OK to leave a bit of developer/testing scaffolding
in the tree if it's likely to be useful again and especially if, like this
case, it doesn't create any dead code. (Perhaps in later work we might
find the right way to get the compiler to do the specialisation? It's so
simple though.)

The occasional CI problem I mentioned turned out to be read_stream_reset()
remembering a little too much state across rescans. Fixed.

Thanks both for the feedback on the ring buffer tweaks. Comments updated.
Here is the full stack of patches I would like to commit very soon, though
I may leave the memory prefetching part out for a bit longer to see if I
can find any downside.

From 54efd755040507b55672092907d53b4db30f5a06 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Fri, 5 Apr 2024 12:08:24 +1300
Subject: [PATCH v12 1/6] Improve read_stream.c's fast path.
Unfortunately the "fast path" for cached scans that don't do any I/O was
accidentally coded in a way that could only be triggered by pg_prewarm's
usage pattern. We really need it to work for the streaming sequential scan
patch. Refactor.

Reviewed-by: Melanie Plageman
Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com
---
 src/backend/storage/aio/read_stream.c | 75 +++
 1 file changed, 31 insertions(+), 44 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 4f21262ff5e..b9e11a28312 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -169,7 +169,7 @@ get_per_buffer_data(ReadStream *stream, int16 buffer_index)
 /*
  * Ask the callback which block it would like us to read next, with a small
  * buffer in front to allow read_stream_unget_block() to work and to allow the
- * fast path to work in batches.
+ * fast path to skip this function and work directly from the array.
  */
 static inline BlockNumber
 read_stream_get_block(ReadStream *stream, void *per_buffer_data)
@@ -578,13 +578,12 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	if (likely(stream->fast_path))
 	{
 		BlockNumber next_blocknum;
-		bool		need_wait;

 		/* Fast path assumptions. */
 		Assert(stream->ios_in_progress == 0);
 		Assert(stream->pinned_buffers == 1);
 		Assert(stream->distance == 1);
-		Assert(stream->pending_read_nblocks == 1);
+		Assert(stream->pending_read_nblocks == 0);
 		Assert(stream->per_buffer_data_size == 0);

 		/* We're going to return the buffer we pinned last time. */
@@ -594,40 +593,29 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 		buffer = stream->buffers[oldest_buffer_index];
 		Assert(buffer != InvalidBuffer);

-		/*
-		 * Pin a buffer for the next call.  Same buffer entry, and arbitrary
-		 * I/O entry (they're all free).
-		 */
-		need_wait = StartReadBuffer(&stream->ios[0].op,
-									&stream->buffers[oldest_buffer_index],
-									stream->pending_read_blocknum,
-									stream->advice_enabled ?
-									READ_BUFFERS_ISSUE_ADVICE : 0);
-
-		/* Choose the block the next call will pin. */
+		/* Choose the next block to pin. */
 		if (unlikely(stream->blocknums_next == stream->blocknums_count))
 			read_stream_fill_blocknums(stream);
 		next_blocknum = stream->blocknums[stream->blocknums_next++];

-		/*
-		 * Fast return if the next call doesn't require I/O for the buffer we
-		 * just pinned, and we have a block number to give it as a pending
-		 * read.
-		 */
-		if (likely(!need_wait && next_blocknum != InvalidBlockNumber))
+		if (likely(next_blocknum != InvalidBlockNumber))
 		{
-			stream->pending_read_blocknum = next_blocknum;
-			return buffer;
-		}
-
-		/*
-		 * For anything more complex, set up some more state and take the slow
-		 * path next time.
-		 */
-		stream->fast_path = false;
+
Re: Is this a problem in GenericXLogFinish()?
On Fri, Apr 05, 2024 at 01:26:29PM +0900, Michael Paquier wrote:
> On Wed, Feb 07, 2024 at 06:42:21PM +0900, Michael Paquier wrote:
>> On Wed, Feb 07, 2024 at 02:08:42PM +0530, Amit Kapila wrote:
>> > Thanks for the report and looking into it. Pushed!
>>
>> Thanks Amit for the commit and Kuroda-san for the patch!
>
> I have been pinged about this thread and I should have looked a bit
> closer, but wait a minute here.

I have forgotten to mention that Zubeyr Eryilmaz, a colleague, should be
credited here.
--
Michael

signature.asc
Description: PGP signature