Re: remaining sql/json patches
Hi Alexander,

On Fri, Apr 5, 2024 at 3:00 PM Alexander Lakhin wrote:
> Hello Amit,
>
> 04.04.2024 15:02, Amit Langote wrote:
> > Pushed after fixing these and a few other issues.  I didn't include
> > the testing function you proposed in your other email.  It sounds
> > useful for testing locally but will need some work before we can
> > include it in the tree.
> >
> > I'll post the rebased 0002 tomorrow after addressing your comments.
>
> Please look at an assertion failure:
> TRAP: failed Assert("count <= tupdesc->natts"), File: "parse_relation.c", Line: 3048, PID: 1325146
>
> triggered by the following query:
> SELECT * FROM JSON_TABLE('0', '$' COLUMNS (js int PATH '$')),
>     COALESCE(row(1)) AS (a int, b int);
>
> Without JSON_TABLE() I get:
> ERROR:  function return row and query-specified return row do not match
> DETAIL:  Returned row contains 1 attribute, but query expects 2.

Thanks for the report.  Seems like it might be a pre-existing issue, because I can also reproduce the crash with:

SELECT * FROM COALESCE(row(1)) AS (a int, b int);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> Backtrace:
#0  __pthread_kill_implementation (threadid=281472845250592, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x806c4334 in __pthread_kill_internal (signo=6, threadid=) at pthread_kill.c:78
#2  0x8067c73c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x80669034 in __GI_abort () at abort.c:79
#4  0x00ad9d4c in ExceptionalCondition (conditionName=0xcbb368 "!(tupdesc->natts >= colcount)", errorType=0xcbb278 "FailedAssertion", fileName=0xcbb2c8 "nodeFunctionscan.c", lineNumber=379) at assert.c:54
#5  0x0073edec in ExecInitFunctionScan (node=0x293d4ed0, estate=0x293d51b8, eflags=16) at nodeFunctionscan.c:379
#6  0x00724bc4 in ExecInitNode (node=0x293d4ed0, estate=0x293d51b8, eflags=16) at execProcnode.c:248
#7  0x0071b1cc in InitPlan (queryDesc=0x292f5d78, eflags=16) at execMain.c:1006
#8  0x00719f6c in standard_ExecutorStart (queryDesc=0x292f5d78, eflags=16) at execMain.c:252
#9  0x00719cac in ExecutorStart (queryDesc=0x292f5d78, eflags=0) at execMain.c:134
#10 0x00945520 in PortalStart (portal=0x29399458, params=0x0, eflags=0, snapshot=0x0) at pquery.c:527
#11 0x0093ee50 in exec_simple_query (query_string=0x29332d38 "SELECT * FROM COALESCE(row(1)) AS (a int, b int);") at postgres.c:1175
#12 0x00943cb8 in PostgresMain (argc=1, argv=0x2935d610, dbname=0x2935d450 "postgres", username=0x2935d430 "amit") at postgres.c:4297
#13 0x0087e978 in BackendRun (port=0x29356c00) at postmaster.c:4517
#14 0x0087e0bc in BackendStartup (port=0x29356c00) at postmaster.c:4200
#15 0x00879638 in ServerLoop () at postmaster.c:1725
#16 0x00878eb4 in PostmasterMain (argc=3, argv=0x292eeac0) at postmaster.c:1398
#17 0x00791db8 in main (argc=3, argv=0x292eeac0) at main.c:228

Backtrace looks a bit different with a query similar to yours:

SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b int);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

!>
#0  __pthread_kill_implementation (threadid=281472845250592, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x806c4334 in __pthread_kill_internal (signo=6, threadid=) at pthread_kill.c:78
#2  0x8067c73c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x80669034 in __GI_abort () at abort.c:79
#4  0x00ad9d4c in ExceptionalCondition (conditionName=0xc903b0 "!(count <= tupdesc->natts)", errorType=0xc8f8c8 "FailedAssertion", fileName=0xc8f918 "parse_relation.c", lineNumber=2649) at assert.c:54
#5  0x00649664 in expandTupleDesc (tupdesc=0x293da188, eref=0x293d7318, count=2, offset=0, rtindex=2, sublevels_up=0, location=-1, include_dropped=true, colnames=0x0, colvars=0xc39253c8) at parse_relation.c:2649
#6  0x00648d08 in expandRTE (rte=0x293d7390, rtindex=2, sublevels_up=0, location=-1, include_dropped=true, colnames=0x0, colvars=0xc39253c8) at parse_relation.c:2361
#7  0x00849bd0 in build_physical_tlist (root=0x293d5318, rel=0x293d88e8) at plancat.c:1681
#8  0x00806ad0 in create_scan_plan (root=0x293d5318, best_path=0x293cd888, flags=0) at createplan.c:605
#9  0x0080666c in create_plan_recurse (root=0x293d5318, best_path=0x293cd888, flags=0) at createplan.c:389
#10 0x0080c4e8 in create_nestloop_plan (root=0x293d5318, best_path=0x293d96f0) at createplan.c:4056
#11 0x00807464 in create_join_plan (root=0x293d5318, best_path=0x293d96f0) at createplan.c:1037
#
Re: Popcount optimization using AVX512
On Fri, 5 Apr 2024 at 07:15, Nathan Bossart wrote:
> Here is an updated patch set.  IMHO this is in decent shape and is
> approaching committable.

I checked the code generation on various gcc and clang versions. It looks mostly fine starting from the versions where avx512 is supported, gcc-7.1 and clang-5. The main issue I saw was that clang was able to peel off the first iteration of the loop, then eliminate the mask assignment and replace the masked load with a memory operand for vpopcnt. I was not able to convince gcc to do that regardless of optimization options.

Generated code for the inner loop:

clang:
  50: add      rdx, 64
  54: cmp      rdx, rdi
  57: jae
  59: vpopcntq zmm1, zmmword ptr [rdx]
  5f: vpaddq   zmm0, zmm1, zmm0
  65: jmp

gcc:
  38: kmovq    k1, rdx
  3d: vmovdqu8 zmm0 {k1} {z}, zmmword ptr [rax]
  43: add      rax, 64
  47: mov      rdx, -1
  4e: vpopcntq zmm0, zmm0
  54: vpaddq   zmm0, zmm0, zmm1
  5a: vmovdqa64 zmm1, zmm0
  60: cmp      rax, rsi
  63: jb

I'm not sure how much that matters in practice. Attached is a patch to do this manually, giving essentially the same result in gcc. As most distro packages are built using gcc, I think it would make sense to have the extra code if it gives a noticeable benefit for large cases.

The visibility map patch has the same issue, otherwise looks good.

Regards,
Ants Aasma

diff --git a/src/port/pg_popcount_avx512.c b/src/port/pg_popcount_avx512.c
index dacc7553d29..f6e718b86e9 100644
--- a/src/port/pg_popcount_avx512.c
+++ b/src/port/pg_popcount_avx512.c
@@ -52,13 +52,21 @@ pg_popcount_avx512(const char *buf, int bytes)
 	 * Iterate through all but the final iteration.  Starting from second
 	 * iteration, the start index mask is ignored.
 	 */
-	for (; buf < final; buf += sizeof(__m512i))
+	if (buf < final)
 	{
 		val = _mm512_maskz_loadu_epi8(mask, (const __m512i *) buf);
 		cnt = _mm512_popcnt_epi64(val);
 		accum = _mm512_add_epi64(accum, cnt);
+		buf += sizeof(__m512i);
 		mask = ~UINT64CONST(0);
+
+		for (; buf < final; buf += sizeof(__m512i))
+		{
+			val = _mm512_load_si512((const __m512i *) buf);
+			cnt = _mm512_popcnt_epi64(val);
+			accum = _mm512_add_epi64(accum, cnt);
+		}
 	}

 	/* Final iteration needs to ignore bytes that are not within the length */
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Fri, Apr 05, 2024 at 11:21:43AM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 9:57 PM Bertrand Drouvot wrote:
> Please find the attached v36 patch.

Thanks!

A few comments:

1 ===

+
+        The timeout is measured from the time since the slot has become
+        inactive (known from its
+        inactive_since value) until it gets
+        used (i.e., its active is set to true).
+

That's right except when it's invalidated during the checkpoint (as the slot is not acquired in CheckPointReplicationSlots()). So, what about adding: "or a checkpoint occurs"? That would also explain that the invalidation could occur during a checkpoint.

2 ===

+       /* If the slot has been invalidated, recalculate the resource limits */
+       if (invalidated)
+       {

s/If the slot/If a slot/?

3 ===

+ * NB - this function also runs as part of checkpoint, so avoid raising errors

s/NB - this/NB: This function/? (that looks more consistent with other comments in the code)

4 ===

+ * Note that having a new function for RS_INVAL_INACTIVE_TIMEOUT cause instead

I understand it's "the RS_INVAL_INACTIVE_TIMEOUT cause", but reading "cause instead" looks weird to me. Maybe it would make sense to reword this a bit.

5 ===

+ * considered not active as they don't actually perform logical decoding.

Not sure that's 100% accurate, as we switched to fast-forward logical decoding in 2ec005b4e2. "as they perform only fast forward logical decoding (or not at all)", maybe?

6 ===

+       if (RecoveryInProgress() && slot->data.synced)
+               return false;
+
+       if (replication_slot_inactive_timeout == 0)
+               return false;

What about just using one if? It's more a matter of taste, but it also probably reduces the object file size a bit for non-optimized builds.

7 ===

+       /*
+        * Do not invalidate the slots which are currently being synced from
+        * the primary to the standby.
+        */
+       if (RecoveryInProgress() && slot->data.synced)
+               return false;

I think we don't need this check as the exact same one is done just before.
8 ===

+sub check_for_slot_invalidation_in_server_log
+{
+       my ($node, $slot_name, $offset) = @_;
+       my $invalidated = 0;
+
+       for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+       {
+               $node->safe_psql('postgres', "CHECKPOINT");

Wouldn't it be better to wait for the replication_slot_inactive_timeout time instead of triggering all those checkpoints? (It could be passed as an extra arg to wait_for_slot_invalidation().)

9 ===

# Synced slot mustn't get invalidated on the standby, it must sync invalidation
# from the primary. So, we must not see the slot's invalidation message in server
# log.
ok( !$standby1->log_contains(
        "invalidating obsolete replication slot \"lsub1_sync_slot\"",
        $standby1_logstart),
    'check that syned slot has not been invalidated on the standby');

Would it make sense to trigger a checkpoint on the standby before this test? I mean, I think that without a checkpoint on the standby we should not see the invalidation in the log anyway.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Improve eviction algorithm in ReorderBuffer
On Fri, Apr 5, 2024 at 2:55 AM Jeff Davis wrote:
> On Thu, 2024-04-04 at 17:28 +0900, Masahiko Sawada wrote:
> > > Perhaps it's not worth the effort though, if performance is already
> > > good enough?
> >
> > Yeah, it would be better to measure the overhead first. I'll do that.
>
> I have some further comments and I believe changes are required for
> v17.
>
> An indexed binary heap API requires both a comparator and a hash
> function to be specified, and has two different kinds of keys: the heap
> key (mutable) and the hash key (immutable). It provides heap methods
> and hashtable methods, and keeps the two internal structures (heap and
> HT) in sync.

IIUC, for example in ReorderBuffer, the heap key is the transaction size and the hash key is the xid.

> The implementation in b840508644 uses the bh_node_type as the hash key,
> which is just a Datum, and it just hashes the bytes. I believe the
> implicit assumption is that the Datum is a pointer -- I'm not sure how
> one would use that API if the Datum were a value. Hashing a pointer
> seems strange to me and, while I see why you did it that way, I think
> it reflects that the API boundaries are not quite right.

I see your point. It assumes that the bh_node_type is a pointer or at least unique. So it cannot work with Datum being a value.

> One consequence of using the pointer as the hash key is that you need
> to find the pointer first: you can't change or remove elements based on
> the transaction ID, you have to get the ReorderBufferTXN pointer by
> finding it in another structure, first. Currently, that's being done by
> searching ReorderBuffer->by_txn. So we actually have two hash tables
> for essentially the same purpose: one with xid as the key, and the
> other with the pointer as the key. That makes no sense -- let's have a
> proper indexed binary heap to look things up by xid (the internal HT)
> or by transaction size (using the internal heap).
> I suggest:
>
> * Make a proper indexed binary heap API that accepts a hash function
> and provides both heap methods and HT methods that operate based on
> values (transaction size and transaction ID, respectively).
> * Get rid of ReorderBuffer->by_txn and use the indexed binary heap
> instead.
>
> This will be a net simplification in reorderbuffer.c, which is good,
> because that file makes use of a *lot* of data structures.

It sounds like a data structure that mixes the hash table and the binary heap, and we use it as the main storage (e.g. for ReorderBufferTXN) instead of using the binary heap as the secondary data structure. IIUC, with your idea, the indexed binary heap has a hash table to store elements, each of which has its index within the heap node array. I guess it's better to create it as a new data structure rather than extending the existing binaryheap, since the APIs could be very different. I might be missing something, though.

On Fri, Apr 5, 2024 at 3:55 AM Jeff Davis wrote:
> On Thu, 2024-04-04 at 10:55 -0700, Jeff Davis wrote:
> > * Make a proper indexed binary heap API that accepts a hash function
> > and provides both heap methods and HT methods that operate based on
> > values (transaction size and transaction ID, respectively).
> > * Get rid of ReorderBuffer->by_txn and use the indexed binary heap
> > instead.
>
> An alternative idea:
>
> * remove the hash table from binaryheap.c
>
> * supply a new callback to the binary heap with type like:
>
> typedef void (*binaryheap_update_index)(
>     bh_node_type node,
>     int new_element_index);
>
> * make the remove, update_up, and update_down methods take the element
> index rather than the pointer
>
> reorderbuffer.c would then do something like:
>
> void
> txn_update_heap_index(ReorderBufferTXN *txn, int new_element_index)
> {
>     txn->heap_element_index = new_element_index;
> }
>
> ...
> txn_heap = binaryheap_allocate(..., txn_update_heap_index, ...);
>
> and then binaryheap.c would effectively maintain txn->heap_element_index,
> so reorderbuffer.c can pass that to the APIs that require the element
> index.

Thank you for the idea. I was thinking of the same idea when considering your previous comment. With this idea, we still use the binaryheap for ReorderBuffer as the second data structure. Since we can implement this idea with relatively small changes to the current binaryheap, I've implemented it and measured performance.

I've attached a patch that adds an extension for benchmarking binaryheap implementations. binaryheap_bench.c is the main test module. To make the comparison between different binaryheap implementations possible, the extension includes two different binaryheap implementations. Therefore, binaryheap_bench.c uses three different binaryheap implementations in total, as the comment on top of the file says:

/*
 * This benchmark tool uses three binary heap implementations.
 *
 * "binaryheap" is the current binaryheap implementation in PostgreSQL. That
 * is, it internally has a hash table to track
Re: remaining sql/json patches
05.04.2024 10:09, Amit Langote wrote:
> Seems like it might be a pre-existing issue, because I can also
> reproduce the crash with:
>
> SELECT * FROM COALESCE(row(1)) AS (a int, b int);
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !> Backtrace:
> #0  __pthread_kill_implementation (threadid=281472845250592, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
> #1  0x806c4334 in __pthread_kill_internal (signo=6, threadid=) at pthread_kill.c:78
> #2  0x8067c73c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3  0x80669034 in __GI_abort () at abort.c:79
> #4  0x00ad9d4c in ExceptionalCondition (conditionName=0xcbb368 "!(tupdesc->natts >= colcount)", errorType=0xcbb278 "FailedAssertion", fileName=0xcbb2c8 "nodeFunctionscan.c", lineNumber=379) at assert.c:54

That's strange, because I get the error (on master, 6f132ed69). With backtrace_functions = 'tupledesc_match', I see:

2024-04-05 10:48:27.827 MSK client backend[2898632] regress ERROR:  function return row and query-specified return row do not match
2024-04-05 10:48:27.827 MSK client backend[2898632] regress DETAIL:  Returned row contains 1 attribute, but query expects 2.
2024-04-05 10:48:27.827 MSK client backend[2898632] regress BACKTRACE:
	tupledesc_match at execSRF.c:948:3
	ExecMakeTableFunctionResult at execSRF.c:427:13
	FunctionNext at nodeFunctionscan.c:94:5
	ExecScanFetch at execScan.c:131:10
	ExecScan at execScan.c:180:10
	ExecFunctionScan at nodeFunctionscan.c:272:1
	ExecProcNodeFirst at execProcnode.c:465:1
	ExecProcNode at executor.h:274:9
	 (inlined by) ExecutePlan at execMain.c:1646:10
	standard_ExecutorRun at execMain.c:363:3
	ExecutorRun at execMain.c:305:1
	PortalRunSelect at pquery.c:926:26
	PortalRun at pquery.c:775:8
	exec_simple_query at postgres.c:1282:3
	PostgresMain at postgres.c:4684:27
	BackendMain at backend_startup.c:57:2
	pgarch_die at pgarch.c:847:1
	BackendStartup at postmaster.c:3593:8
	ServerLoop at postmaster.c:1674:6
	main at main.c:184:3
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f37127f0e40]
2024-04-05 10:48:27.827 MSK client backend[2898632] regress STATEMENT:  SELECT * FROM COALESCE(row(1)) AS (a int, b int);

That's why I had attributed the failure to JSON_TABLE(). Though

SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b int);

really triggers the assert too. Sorry for the noise...

Best regards,
Alexander
Re: remaining sql/json patches
On Fri, Apr 5, 2024 at 5:00 PM Alexander Lakhin wrote:
> 05.04.2024 10:09, Amit Langote wrote:
> > Seems like it might be a pre-existing issue, because I can also
> > reproduce the crash with:
>
> That's strange, because I get the error (on master, 6f132ed69).
> With backtrace_functions = 'tupledesc_match', I see
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress ERROR:  function return row and query-specified return row do not match
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress DETAIL:  Returned row contains 1 attribute, but query expects 2.
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress BACKTRACE:
> tupledesc_match at execSRF.c:948:3
> ExecMakeTableFunctionResult at execSRF.c:427:13
> FunctionNext at nodeFunctionscan.c:94:5
> ExecScanFetch at execScan.c:131:10
> ExecScan at execScan.c:180:10
> ExecFunctionScan at nodeFunctionscan.c:272:1
> ExecProcNodeFirst at execProcnode.c:465:1
> ExecProcNode at executor.h:274:9
> (inlined by) ExecutePlan at execMain.c:1646:10
> standard_ExecutorRun at execMain.c:363:3
> ExecutorRun at execMain.c:305:1
> PortalRunSelect at pquery.c:926:26
> PortalRun at pquery.c:775:8
> exec_simple_query at postgres.c:1282:3
> PostgresMain at postgres.c:4684:27
> BackendMain at backend_startup.c:57:2
> pgarch_die at pgarch.c:847:1
> BackendStartup at postmaster.c:3593:8
> ServerLoop at postmaster.c:1674:6
> main at main.c:184:3
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f37127f0e40]
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress STATEMENT:  SELECT * FROM COALESCE(row(1)) AS (a int, b int);
>
> That's why I had attributed the failure to JSON_TABLE().
>
> Though SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b int);
> really triggers the assert too.
> Sorry for the noise...

No worries.  Let's start another thread so that this gets more attention.

--
Thanks, Amit Langote
Bad estimation for NOT IN clause with big null fraction
Hi hackers,

Discussion [1] and the relevant commit [2] improved the selectivity calculation for IN/NOT IN. This is the current logic for NOT IN selectivity calculation, looping over the array elements:

        else
        {
            s1 = s1 * s2;
            if (isInequality)
                s1disjoint += s2 - 1.0;
        }

To calculate s2 for each array element, it calls neqsel, which returns 1 - eqsel - nullfrac. If I expand the s1disjoint calculation for a NOT IN (2,5,8) clause, it eventually becomes

        1 - eqsel(2) - eqsel(5) - eqsel(8) - 3*nullfrac

If nullfrac is big, s1disjoint drops below 0 quickly as the array gets more elements, and the selectivity algorithm falls back to the one prior to commit [2], which had bad estimation for NOT IN as well. It seems to me that nullfrac should be subtracted only once.

Is it feasible to have a new variable s1disjoint2 that adds back nullfrac for each s2 result and subtracts it once at the end of the loop, as a second heuristic? We could then prefer s1disjoint2 over s1disjoint, and then s1.

Donghang Lin (ServiceNow)

[1] https://www.postgresql.org/message-id/flat/CA%2Bmi_8aPEAzBgWZpNTABGM%3DcSq7mRMyPWbMsU8eGmUfH75OTLA%40mail.gmail.com
[2] https://github.com/postgres/postgres/commit/66a7e6bae98592d1d98d9ef589753f0e953c5828
Re: LogwrtResult contended spinlock
On 2024-Apr-05, Bharath Rupireddy wrote:

> 1.
>  /*
>   * Update local copy of shared XLogCtl->log{Write,Flush}Result
> + *
> + * It's critical that Flush always trails Write, so the order of the reads is
> + * important, as is the barrier.
>   */
>  #define RefreshXLogWriteResult(_target) \
>      do { \
> -        _target.Write = XLogCtl->logWriteResult; \
> -        _target.Flush = XLogCtl->logFlushResult; \
> +        _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \
> +        pg_read_barrier(); \
> +        _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \
>      } while (0)
>
> Is it "Flush always trails Write" or "Flush always leads Write"? I
> guess the latter, no?

Well, Flush cannot lead Write. We cannot Flush what hasn't been Written! To me, "Flush trails Write" means that flush is behind. That seems supported by Merriam-Webster [1], which says in defining it as a transitive verb:

3 a : to follow upon the scent or trace of : TRACK
  b : to follow in the footsteps of : PURSUE
  c : to follow along behind
  d : to lag behind (someone, such as a competitor)

Maybe it's not super clear as a term. We could turn it around and say "Write always leads Flush".

[1] https://www.merriam-webster.com/dictionary/trail

The reason we have the barrier, and the reason we write and read them in this order, is that we must never read a Flush value that is newer than the Write value we read. That is: the values are written in pairs, (w1,f1) first, (w2,f2) next, and so on. It's okay if we obtain (w2,f1) (new write, old flush), but it's not okay if we obtain (w1,f2) (old write, new flush). If we did, we might end up with a Flush value that's ahead of Write, violating the invariant.

> 2.
> +
> +        pg_atomic_write_u64(&XLogCtl->logWriteResult, LogwrtResult.Write);
> +        pg_write_barrier();
> +        pg_atomic_write_u64(&XLogCtl->logFlushResult, LogwrtResult.Flush);
>  }
>
> Maybe add the reason as to why we had to write logWriteResult first
> and then logFlushResult, similar to the comment atop
> RefreshXLogWriteResult?
Hmm, I had one there and removed it. I'll put something back.

> > For 0002, did you consider having pg_atomic_monotonic_advance_u64()
> > return the currval?
>
> +1 for returning currval here.

Oh yeah, I had that when the monotonic stuff was used by 0001 but lost it when I moved to 0002. I'll push 0001 now and send an updated 0002 with the return value added.

--
Álvaro Herrera         48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
Re: Looking for an index that supports top-n searches by enforcing a max-n automatically
Just about as soon as I sent the above, I realized that it's unlikely to make sense in the real world in a row-store. If the goal is to keep the top-25 results and trim the rest, what happens when values are added/modified/deleted? You now have to go look at all of the data you aren't caching in the index. Unless you can *guarantee* that data is entered in perfect order, and/or never changes, I don't think what I'm looking for is likely to make sense.

On Fri, Apr 5, 2024 at 11:27 AM Morris de Oryx wrote:
> Looking for an index to support top-n searches, where n has a fixed maximum.
>
> Recently, I've been looking at strategies to handle top-n queries in
> Postgres. In my current cases, we've got definition tables, and very large
> related tables. Here's a stripped-down example; the real tables are much
> wider.
>
> CREATE TABLE IF NOT EXISTS data.inventory (
>     id uuid NOT NULL DEFAULT NULL PRIMARY KEY,
>     inv_id uuid NOT NULL DEFAULT NULL
> );
>
> CREATE TABLE IF NOT EXISTS data.scan (
>     id uuid NOT NULL DEFAULT NULL PRIMARY KEY,
>     inv_id uuid NOT NULL DEFAULT NULL,
>     scan_dts_utc timestamp NOT NULL DEFAULT NOW() -- We run our servers on UTC
> );
>
> Every item in inventory is scanned when it passes through various steps in
> a clean-dispatch-arrive-use-clean sort of a work cycle. The ratio between
> inventory and scan is 1:0-n, where n can be virtually any number. In
> another table pair like this, the average is 1:1,000. In the inventory
> example, it's roughly 0:200,000. The distribution of related row counts is
> all over the map. The reasons behind these ratios sometimes map to valid
> processes, and sometimes are a consequence of some low-quality data leaking
> into the system. In the case of inventory, the results make sense. In our
> case:
>
> * The goal value for n is often 1, and not more than up to 25.
>
> * Depending on the tables, the % of rows that are discarded because
> they're past the 25th most recent record is 97% or more.
> * Partial indexes do not work as well on huge tables as I hoped. The same
> data copied via a STATEMENT trigger into a thin, subsetted table is much
> faster. I think this has to do with the increase in random disk access
> required for a table and/or index with more pages spread around on the disk.
>
> * We can't filter in advance by *any* reasonable date range. Could get 25
> scans on one item in an hour, or a year, or five years, or never.
>
> We're finding that we need the top-n records more and more often, returned
> quickly. This gets harder to do as the table(s) grow.
>
> SELECT id, scan_dts_utc
>   FROM data.scan
>  WHERE inv_id = 'b7db5d06-8275-224d-a38a-ac263dc1c767'
>  ORDER BY scan_dts_utc DESC
>  LIMIT 25; -- Full search product might be 0, 200K, or anything
> in-between. Not on a bell curve.
>
> A compound index works really well to optimize these kinds of searches:
>
> CREATE INDEX scan_inv_id_scan_time_utc_dts_idx
>     ON ascendco.analytic_scan (inv_id, scan_time_utc_dts DESC);
>
> What I'm wondering is if there is some index option, likely not with a
> B-tree, that can *automatically* enforce a maximum-length list of top
> values, based on a defined sort:
>
> CREATE INDEX scan_inv_id_scan_time_utc_dts_idx
>     ON ascendco.analytic_scan (inv_id, scan_time_utc_dts DESC) -- This defines the ordering
>  LIMIT 25;                                                     -- This sets the hard max for n
>
> The goal is to have an automatically maintained list of the top values
> *in* the index itself. In the right situations (like ours), this reduces
> the index size by 20x or more. Smaller index, faster results. And, since
> the index is on the source table, the row references are already there.
> (Something I lose when maintaining this by hand in a side/shadow/top table.)
>
> I've looked at a ton of plans, and Postgres *clearly* goes to a lot of
> effort to recognize and optimize top-n searches already. That's
> encouraging, as it suggests that the planner takes LIMIT into account.
> (I've picked up already that maintaining the purity of the planner and
> executor abstractions is a core value to the project.)
>
> And, for sure, I can build and maintain my own custom, ordered list in
> various ways. None of them seem like they can possibly rival the trimming
> behavior being handled by an index.
>
> I'm out over my skis here, but I'm intuiting that this might be a job for
> one of the multi-value/inverted index types/frameworks. I tried some
> experiments, but only got worse results.
>
> Hope that reads as understandable... grateful for any suggestions.
Recovery of .partial WAL segments
Dear hackers,

Generating a ".partial" WAL segment is pretty common nowadays (using pg_receivewal or during standby promotion). However, we currently don't do anything with it unless the user manually removes that ".partial" extension.

The 028_pitr_timelines tests are highlighting that fact: with test data being in 00020003 and 00010003.partial, a recovery following the latest timeline (2) will succeed but fail if we follow the current timeline (1).

By simply trying to fetch the ".partial" file in XLogFileRead, we can easily recover more data and also cover that (current timeline) recovery case. So, this proposed patch makes XLogFileRead try to restore ".partial" WAL archives and adds a test to 028_pitr_timelines using current recovery_target_timeline.

As far as I've seen, the current pg_receivewal tests only seem to cover the archive generation but not actually trying to recover using it. I wasn't sure it was interesting to add such tests right now, so I didn't consider it for this patch.

Many thanks in advance for your feedback and thoughts about this,

Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

From 8c73284b72120ddf3537e1632f12455502d47f6d Mon Sep 17 00:00:00 2001
From: Stefan Fercot
Date: Fri, 5 Apr 2024 10:57:03 +0200
Subject: [PATCH v1] Make XLogFileRead try to restore .partial wal archives

Try to restore the normal archived wal segment first and, if not found, then try to restore the archived .partial wal segment. This is safe because the next completed wal segment should contain at least the same data.
---
 src/backend/access/transam/xlogrecovery.c | 20 ++--
 src/test/recovery/t/028_pitr_timelines.pl | 37 ++-
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index b2fe2d04cc..6d51b6a296 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4206,6 +4206,8 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 	char		activitymsg[MAXFNAMELEN + 16];
 	char		path[MAXPGPATH];
 	int			fd;
+	char	   *partialxlogfname;
+	bool		restoredArchivedFile;

 	XLogFileName(xlogfname, tli, segno, wal_segment_size);

@@ -4217,10 +4219,24 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);

-			if (!RestoreArchivedFile(path, xlogfname,
+			/*
+			 * Try to restore the normal wal segment first and, if not found,
+			 * then try to restore the .partial wal segment.
+			 */
+			partialxlogfname = psprintf("%s.partial", xlogfname);
+
+			restoredArchivedFile = !RestoreArchivedFile(path, xlogfname,
+														"RECOVERYXLOG",
+														wal_segment_size,
+														InRedo) &&
+				!RestoreArchivedFile(path, partialxlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
-									 InRedo))
+									 InRedo);
+
+			pfree(partialxlogfname);
+			if (restoredArchivedFile)
 				return -1;
 			break;

diff --git a/src/test/recovery/t/028_pitr_timelines.pl b/src/test/recovery/t/028_pitr_timelines.pl
index 4b7d825b71..512695cd0a 100644
--- a/src/test/recovery/t/028_pitr_timelines.pl
+++ b/src/test/recovery/t/028_pitr_timelines.pl
@@ -110,7 +110,7 @@ $node_standby->stop;
 # segment 00020003, before the timeline switching
 # record.  (They are also present in the
 # 00010003.partial file, but .partial files are not
-# used automatically.)
+# used when recovering along the latest timeline by default.)

 # Now test PITR to the recovery target.
 It should find the WAL in
 # segment 00020003, but not follow the timeline switch
@@ -173,4 +173,39 @@ $node_pitr2->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")
 $result = $node_pitr2->safe_psql('postgres', "SELECT max(i) FROM foo;");
 is($result, qq{3}, "check table contents after point-in-time recovery");

+# The 00010003.partial file could have been generated
+# by pg_receivewal without any standby node involved. In this case, we
+# wouldn't be able to recover from 00020003.
+# Now, test PITR to the initial recovery target staying on the backup's
+# current timeline, trying to fetch the data from the
+# 00010003.partial file.
+
+my $node_pitr3 = PostgreSQL::Test::Cluster->new('node_pitr3');
+$node_pitr3->init_from_backup(
+	$node_primary, $backup_name,
+	standby => 0,
+	has_restoring => 1);
+$node_pitr3->append_conf(
+	'postgresql.conf', qq{
+recovery_target_name = 'rp'
+recovery_target_action = 'promote'
+recovery_target_timeline = 'current'
+});
+
+my $log_offset = -s $node_pitr3->logfile;
+$node_pitr3->start;
+
+ok( $node_pitr3->log_contains(
+		"restored log file \"00010003.partial\" from archive",
+		$log_offset),
+	"restored 00010003.partial");
+
+# Wait until recovery finishes.
+$node_pi
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot wrote: > > What about something like? > > ereport(LOG, > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from > remote slot", > remote_slot->name), > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > LSN_FORMAT_ARGS(slot->data.restart_lsn))); > > Regards, +1. Better than earlier. I will update and post the patch. thanks Shveta
Re: CSN snapshots in hot standby
> On 5 Apr 2024, at 02:08, Kirill Reshke wrote: > > maybe we need some hooks here? Or maybe, we can take CSN here from extension > somehow. I really like the idea of CSN-provider-as-extension. But it's very important to move on with CSN, at least on standby, to make CSN actually happen some day. So, from my perspective, having LSN-as-CSN is already a huge step forward. Best regards, Andrey Borodin.
Re: Another WaitEventSet resource leakage in back branches
On Fri, Mar 22, 2024 at 9:15 PM Etsuro Fujita wrote: > While working on [1], I noticed $SUBJECT: WaitLatchOrSocket in back > branches is ignoring the possibility of failing partway through, too. > I added a PG_FINALLY block to that function, like commit 555276f85. > Patch attached. I noticed that PG_FINALLY was added in v13. I created a separate patch for v12 using PG_CATCH instead. Patch attached. I am attaching the previous patch for later versions as well. I am planning to back-patch these next week. Best regards, Etsuro Fujita fix-another-WaitEventSet-resource-leakage.patch Description: Binary data fix-another-WaitEventSet-resource-leakage-PG12.patch Description: Binary data
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 04:09:01PM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot > wrote: > > > > What about something like? > > > > ereport(LOG, > > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs > > from remote slot", > > remote_slot->name), > > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > > LSN_FORMAT_ARGS(slot->data.restart_lsn))); > > > > Regards, > > +1. Better than earlier. I will update and post the patch. > Thanks! BTW, I just realized that the LSNs I used in my example in the LSN_FORMAT_ARGS() calls are not the right ones. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila wrote: > > I think this would probably be better than the current situation but > can we think of a solution to allow toggling the value of two_phase > even when prepared transactions are present? Can you please summarize > the reason for the problems in doing that and the solutions, if any? > > -- > With Regards, > Amit Kapila. > Updated the patch, as it wasn't addressing updating of two-phase in the remote slot. Currently the main issue that needs to be handled is the handling of pending prepared transactions while two_phase is altered. I see 3 issues with the current approach. 1. Uncommitted prepared transactions when toggling two_phase from true to false When two_phase was true, prepared transactions were decoded at PREPARE time and sent to the subscriber, which then prepared them on the subscriber side with a new gid. Once two_phase is toggled to false, the COMMIT PREPARED on the publisher is converted to a commit and the entire transaction is decoded and sent to the subscriber. This will leave the previously prepared transaction pending. 2. Uncommitted prepared transactions when toggling two_phase from false to true When two_phase was false, prepared transactions were ignored and not decoded at PREPARE time on the publisher. Once two_phase is toggled to true, the apply worker and the walsender are restarted and replication is restarted from a new "start_decoding_at" LSN. Now, this new "start_decoding_at" could be past the LSN of the PREPARE record and if so, the PREPARE record is skipped and not sent to the subscriber. Look at the comments in DecodeTXNNeedSkip() for details. Later when the user issues COMMIT PREPARED, this is decoded and sent to the subscriber, but there is no prepared transaction on the subscriber, and this fails because the corresponding gid of the transaction couldn't be found. 3.
While altering the two_phase of the subscription, it is required to also alter the two_phase field of the slot on the primary. The subscription cannot remotely alter the two_phase option of the slot when the subscription is enabled, as the slot is owned by the walsender on the publisher side. Possible solutions for the 3 problems: 1. While toggling two_phase from true to false, we could probably get the list of prepared transactions for this subscriber id and rollback/abort the prepared transactions. This will allow the transactions to be re-applied like a normal transaction when the commit comes. Alternatively, if doing this isn't appropriate in the ALTER SUBSCRIPTION context, we could store the xids of all prepared transactions of this subscription in a list and when the corresponding xid is being committed by the apply worker, prior to commit, we make sure the previously prepared transaction is rolled back. But this would add the overhead of checking this list every time a transaction is committed by the apply worker. 2. No solution yet. 3. We could mandate that the altering of two_phase state only be done after disabling the subscription, just like how it is handled for the failover option. Let me know your thoughts. regards, Ajin Cherian Fujitsu Australia v2-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch Description: Binary data
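The bookkeeping in the second variant of solution 1 — remember the xids prepared under two_phase=true, and roll each back just before its commit is applied — can be sketched like this. All names here are hypothetical, for illustration only:

```python
class ApplyWorkerState:
    """Toy model of tracking prepared transactions that predate a
    two_phase=false switch (hypothetical names, not the patch's code)."""

    def __init__(self, pending_prepared_xids):
        # xids that were PREPAREd on the subscriber while two_phase
        # was still true
        self.pending = set(pending_prepared_xids)
        self.rolled_back = []

    def before_commit(self, xid):
        # If this xid was previously PREPAREd on the subscriber, roll
        # that prepared transaction back first so the full transaction
        # can be re-applied as a normal commit.
        if xid in self.pending:
            self.rolled_back.append(xid)  # stands in for ROLLBACK PREPARED
            self.pending.discard(xid)

state = ApplyWorkerState({701, 702})
state.before_commit(701)  # was prepared earlier -> rolled back first
state.before_commit(705)  # ordinary transaction -> nothing to undo
```

This also makes the stated overhead visible: the `xid in self.pending` membership check runs on every commit the apply worker processes, even when the set is empty.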
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 4:31 PM Bertrand Drouvot wrote: > > BTW, I just realized that the LSN I used in my example in the > LSN_FORMAT_ARGS() > are not the right ones. Noted. Thanks. Please find v3 with the comments addressed. thanks Shveta v3-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > There is an intermittent BF failure observed at [1] after this commit > (2ec005b). > Thanks for analyzing and providing the patch. I'll look into it. There is another BF failure [1] which I have analyzed. The main reason for failure is the following test: # Failed test 'logical slots have synced as true on standby' # at /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl line 198. # got: 'f' # expected: 't' Here, we are expecting that the two logical slots (lsub1_slot, and lsub2_slot), one created via subscription and another one via the API pg_create_logical_replication_slot(), are synced. The standby LOGs below show that the one created via the API, 'lsub2_slot', is synced but the other one, 'lsub1_slot', is not: LOG for lsub1_slot: 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL: Streaming transactions committing after 0/0, reading WAL from 0/360. 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] STATEMENT: SELECT pg_sync_replication_slots(); 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG: xmin required by slots: data 0, catalog 740 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG: could not sync slot "lsub1_slot" LOG for lsub2_slot: 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG: xmin required by slots: data 0, catalog 740 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG: newly created slot "lsub2_slot" is sync-ready now 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] STATEMENT: SELECT pg_sync_replication_slots(); We can see from the log of lsub1_slot that its restart_lsn is 0/360, which means it will start reading the WAL from that location. Now, if we check the publisher log, we have a running_xacts record at that location.
See following LOGs: 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG: statement: SELECT pg_create_logical_replication_slot('lsub2_slot', 'test_decoding', false, false, true); 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/360 oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740) The first running_xact record ends at 360 and the second one at 398. So, the start location of the second running_xact is 360, the same can be confirmed by the following LOG line of walsender: 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG: serializing snapshot to pg_logical/snapshots/0-360.snap This shows that while processing running_xact at location 360, we have serialized the snapshot. As there is no running transaction in WAL at 360 so ideally we should have reached a consistent state after processing that record on standby. But the reason standby didn't process that LOG is that the confirmed_flush LSN is also at the same location so the function LogicalSlotAdvanceAndCheckSnapState() exits without reading the WAL at that location. Now, this can be confirmed by the below walsender-specific LOG in publisher: 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write 0/360 flush 0/360 apply 0/360 reply_time 2024-04-05 04:36:59.155181+00 We update the confirmed_flush location with the flush location after receiving the above feedback. You can notice that we didn't receive the feedback for the 398 location and hence both the confirmed_flush and restart_lsn are at the same location 0/360. 
Now, the test is waiting for the subscriber to send feedback of the last WAL write location by $primary->wait_for_catchup('regress_mysub1'); As noticed from the publisher LOGs, the query we used for the wait is: SELECT '0/360' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('regress_mysub1', 'walreceiver') Here, instead of '0/360' it should have used '0/398', which is the last write location. We get this position via pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And this variable seems to be touched by commit c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely, could c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for the failure? At this stage, I am not sure, so I am just sharing with others to see if what I am saying sounds logical. I'll think more about this. [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-04-05%2004%3A34%3A27 -- With Regards, Amit Kapila.
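For readers following the LSN arithmetic above: the %X/%X notation (what LSN_FORMAT_ARGS() produces) is just the high and low 32-bit halves of the 64-bit WAL position, so comparisons like the one in the wait query are plain integer ordering. A quick sketch:

```python
def lsn_format(lsn):
    # Mirror of PostgreSQL's %X/%X output with LSN_FORMAT_ARGS():
    # high and low 32-bit halves of the 64-bit position, in hex.
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)

def lsn_parse(text):
    # Inverse: rebuild the 64-bit position from its two hex halves.
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

assert lsn_format(0x360) == "0/360"
assert lsn_parse("0/398") > lsn_parse("0/360")  # 0x398 > 0x360
```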
Re: LogwrtResult contended spinlock
Couldn't push: I tested with --disable-atomics --disable-spinlocks and the tests fail because the semaphore for the atomic variables is not always initialized. This is weird -- it's like a client process is running at a time when StartupXLOG has not initialized the variable ... so the initialization in the other place was not completely wrong. I'll investigate after lunch. Here's v16. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "XML!" Exclaimed C++. "What are you doing here? You're not a programming language." "Tell that to the people who use me," said XML. https://burningbird.net/the-parable-of-the-languages/ >From 2164778b74681910544a64ba6c2ae36e5204ed9e Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 5 Apr 2024 13:21:39 +0200 Subject: [PATCH v16 1/2] Make XLogCtl->log{Write,Flush}Result accessible with atomics This removes the need to hold both the info_lck spinlock and WALWriteLock to update them. We use stock atomic write instead, with WALWriteLock held. Readers can use atomic read, without any locking. This allows for some code to be reordered: some places were a bit contorted to avoid repeated spinlock acquisition, but that's no longer a concern, so we can turn them to more natural coding. Some further changes are likely possible with a positive performance impact, but in this commit I did rather minimal ones only, to avoid increasing the blast radius. 
Reviewed-by: Bharath Rupireddy Reviewed-by: Jeff Davis Reviewed-by: Andres Freund (earlier versions) Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql --- src/backend/access/transam/xlog.c | 105 -- 1 file changed, 57 insertions(+), 48 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3bdd9a2ddd..748ed1f2ef 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -292,12 +292,7 @@ static bool doPageWrites; * LogwrtRqst indicates a byte position that we need to write and/or fsync * the log up to (all records before that point must be written or fsynced). * The positions already written/fsynced are maintained in logWriteResult - * and logFlushResult. - * - * To read XLogCtl->logWriteResult or ->logFlushResult, you must hold either - * info_lck or WALWriteLock. To update them, you need to hold both locks. - * The point of this arrangement is that the value can be examined by code - * that already holds WALWriteLock without needing to grab info_lck as well. + * and logFlushResult using atomic access. * In addition to the shared variable, each backend has a private copy of * both in LogwrtResult, which is updated when convenient. * @@ -473,12 +468,9 @@ typedef struct XLogCtlData pg_time_t lastSegSwitchTime; XLogRecPtr lastSegSwitchLSN; - /* - * Protected by info_lck and WALWriteLock (you must hold either lock to - * read it, but both to update) - */ - XLogRecPtr logWriteResult; /* last byte + 1 written out */ - XLogRecPtr logFlushResult; /* last byte + 1 flushed */ + /* These are accessed using atomics -- info_lck not needed */ + pg_atomic_uint64 logWriteResult; /* last byte + 1 written out */ + pg_atomic_uint64 logFlushResult; /* last byte + 1 flushed */ /* * Latest initialized page in the cache (last byte position + 1). 
@@ -616,11 +608,15 @@ static XLogwrtResult LogwrtResult = {0, 0}; /* * Update local copy of shared XLogCtl->log{Write,Flush}Result + * + * It's critical that Flush always trails Write, so the order of the reads is + * important, as is the barrier. See also XLogWrite. */ #define RefreshXLogWriteResult(_target) \ do { \ - _target.Write = XLogCtl->logWriteResult; \ - _target.Flush = XLogCtl->logFlushResult; \ + _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \ + pg_read_barrier(); \ + _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \ } while (0) /* @@ -968,9 +964,8 @@ XLogInsertRecord(XLogRecData *rdata, /* advance global request to include new block(s) */ if (XLogCtl->LogwrtRqst.Write < EndPos) XLogCtl->LogwrtRqst.Write = EndPos; - /* update local result copy while I have the chance */ - RefreshXLogWriteResult(LogwrtResult); SpinLockRelease(&XLogCtl->info_lck); + RefreshXLogWriteResult(LogwrtResult); } /* @@ -1989,17 +1984,17 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic) if (opportunistic) break; - /* Before waiting, get info_lck and update LogwrtResult */ + /* Advance shared memory write request position */ SpinLockAcquire(&XLogCtl->info_lck); if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr) XLogCtl->LogwrtRqst.Write = OldPageRqstPtr; - RefreshXLogWriteResult(LogwrtResult); SpinLockRelease(&XLogCtl->info_lck); /* - * Now that we have an up-to-date LogwrtResult value, see if we - * still need to write it or if someone else already did. + * Acquire an up-to-date LogwrtResu
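The macro's comment says Flush must always trail Write, and the read order plus the barrier is what guarantees it: the writer publishes Write before Flush, and the reader loads Flush before Write, so a local snapshot can never show Flush ahead of Write. The same ordering argument, sketched in Python (the real code uses C atomics and pg_read_barrier(); CPython's effectively sequential memory model stands in for the barriers here):

```python
import threading

write_result = 0   # analogue of XLogCtl->logWriteResult
flush_result = 0   # analogue of XLogCtl->logFlushResult

def writer():
    global write_result, flush_result
    for pos in range(1, 10001):
        write_result = pos   # publish Write first...
        flush_result = pos   # ...then Flush, so Flush never leads Write

violations = []

def reader():
    for _ in range(10000):
        f = flush_result     # read Flush first...
        w = write_result     # ...then Write (pg_read_barrier() in C)
        if f > w:            # snapshot must never show Flush > Write
            violations.append((f, w))

t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader)
t1.start(); t2.start(); t1.join(); t2.join()
```

Reversing either ordering (writer updating Flush first, or reader loading Write first) would allow a snapshot where the flushed position appears ahead of the written one, which is exactly what the barrier comment warns against.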
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, 4 Apr 2024 at 12:45, Jelte Fennema-Nio wrote: > On Thu, 4 Apr 2024 at 14:50, Peter Eisentraut > wrote: > > It appears there are several different perspectives about this. My > > intuition was that a protocol version change indicates something that we > > eventually want all client libraries to support. Whereas protocol > > extensions are truly opt-in. > > > > For example, if we didn't have the ParameterStatus message and someone > > wanted to add it, then this ought to be a protocol version change, so > > that we eventually get everyone to adopt it. > > > > But, for example, the column encryption feature is not something I'd > > expect all client libraries to implement, so it can be opt-in. > > I think that is a good way of deciding between version bump vs > protocol extension parameter. But I'd like to make one > clarification/addition to this logic: if the protocol feature already > requires opt-in from the client because of how the feature works, then > there's no need for a protocol extension parameter. e.g. if your > column-encryption feature would require the client to send a > ColumnDecrypt message before the server would exhibit any behaviour > relating to the column-encryption feature, then the client can simply > "support" the feature by never sending the ColumnDecrypt message. > Thus, in such cases a protocol extension parameter would not be > necessary, even if the feature is considered opt-in. > > > (I believe we have added a number of new protocol messages since the > > beginning of the 3.0 protocol, without any version change, so "new > > protocol message" might not always be a good example to begin with.) > > Personally, I feel like this is something we should change. IMHO, we > should get to a state where protocol minor version bumps are so > low-risk that we can do them whenever we add message types. Right now > there are basically multiple versions of the 3.0 protocol, which is > very confusing to anyone implementing it.
Different servers > implementing the 3.0 protocol without the NegotiateVersion message is > a good example of that. > Totally agree. > > > I fear that if we prefer protocol extensions over version increases, > > then we'd get a very fragmented landscape of different client libraries > > supporting different combinations of things. > > +1 Dave > +1 >
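For implementers following along: the message under discussion is NegotiateProtocolVersion, which per the protocol documentation carries the newest minor protocol version the server supports plus the list of protocol options it did not recognize. A rough sketch of decoding it (a simplified illustration of the documented layout, not libpq code):

```python
import struct

def parse_negotiate_protocol_version(msg):
    assert msg[0:1] == b"v"                    # message type byte
    (length,) = struct.unpack("!i", msg[1:5])  # length, incl. itself
    body = msg[5:1 + length]
    # Int32 newest supported minor version, Int32 count of
    # unrecognized protocol options, then NUL-terminated names.
    version, n_unrecognized = struct.unpack("!ii", body[:8])
    options, rest = [], body[8:]
    for _ in range(n_unrecognized):
        name, _, rest = rest.partition(b"\x00")
        options.append(name.decode())
    return version, options

# Example: server speaks at most 3.0 (3 << 16 = 196608) and did not
# recognize one hypothetical _pq_ startup parameter.
body = struct.pack("!ii", 196608, 1) + b"_pq_.some_extension\x00"
msg = b"v" + struct.pack("!i", 4 + len(body)) + body
version, options = parse_negotiate_protocol_version(msg)
```

A client that never sees this message when it requests a higher minor version is exactly the "3.0 protocol without NegotiateVersion" confusion described above.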
Re: meson vs windows perl
On 2024-04-02 Tu 09:34, Andrew Dunstan wrote: meson.build has this code ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip() undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split() undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split() perl_ldopts = [] foreach ldopt : ldopts.split(' ') if ldopt == '' or ldopt in undesired continue endif perl_ldopts += ldopt.strip('"') endforeach message('LDFLAGS recommended by perl: "@0@"'.format(ldopts)) message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts))) This code is seriously broken if perl reports items including spaces, when a) removing the quotes is quite wrong, and b) splitting on spaces is also wrong. Here's an example from one of my colleagues: C:\Program Files\Microsoft Visual Studio\2022\Professional>perl.EXE -MExtUtils::Embed -e ldopts -nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"C:\edb\languagepack\v4\Perl-5.38\lib\CORE" -machine:AMD64 -subsystem:console,"5.02" "C:\edb\languagepack\v4\Perl-5.38\lib\CORE\perl538.lib" "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.38.33130\lib\x64\oldnames.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\kernel32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\user32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\gdi32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\winspool.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\comdlg32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\advapi32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\shell32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\ole32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\oleaut32.lib" "C:\Program Files (x86)\Windows 
Kits\10\\lib\10.0.22621.0\\um\x64\netapi32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\uuid.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\ws2_32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\mpr.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\winmm.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\version.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\odbc32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\odbccp32.lib" "C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64\comctl32.lib" "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.38.33130\lib\x64\msvcrt.lib" "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.38.33130\lib\x64\vcruntime.lib" "C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64\ucrt.lib" And with that we get errors like cl : Command line warning D9024 : unrecognized source file type 'C:\Program', object file assumed cl : Command line warning D9024 : unrecognized source file type 'Files\Microsoft', object file assumed cl : Command line warning D9024 : unrecognized source file type 'Visual', object file assumed cl : Command line warning D9024 : unrecognized source file type 'C:\Program', object file assumed cl : Command line warning D9024 : unrecognized source file type 'Files', object file assumed cl : Command line warning D9024 : unrecognized source file type '(x86)\Windows', object file assumed It looks like we need to get smarter about how we process the ldopts and strip out the ccdlflags and ldflags Here is an attempt to fix all that. It's ugly, but I think it's more principled. First, instead of getting the ldopts and then trying to filter out the ldflags and ccdlflags, it tells perl not to include those in the first place, by overriding a couple of routines in ExtUtils::Embed. 
And second, it's smarter about splitting what's left, so that it doesn't split on a space that's in a quoted item. The perl that's used to do that second bit is not pretty, but it has been tested on the system where the problem arose and apparently cures the problem. (No doubt some perl guru could improve it.) It also works on my Ubuntu system, so I don't think we'll be breaking anything (famous last words). cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com diff --git a/meson.build b/meson.build index 87437960bc..91e87055d6 100644 --- a/meson.build +++ b/meson.build @@ -993,22 +993,11 @@ if not perlopt.disabled() # Config's ccdlflags and ldflags. (Those are the choices of those who # built the Perl installation, which are not necessarily ap
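The "smarter splitting" being described is the classic shell-lexing problem: split on whitespace except inside quotes, then strip the quotes. As an illustration of the intended behavior (not the actual perl one-liner in the patch), Python's shlex does exactly this:

```python
import shlex

ldopts = '-nologo -libpath:somedir "C:/Program Files/x64/kernel32.lib" perl538.lib'

# A naive split breaks the quoted path into two bogus "object files",
# which is exactly the cl D9024 warning shown above...
naive = ldopts.split(" ")

# ...while a quote-aware split keeps it as one item and strips the
# quotes, which is what the compiler driver expects.
proper = shlex.split(ldopts)
```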
Re: remaining sql/json patches
On Thu, Apr 4, 2024 at 9:02 PM Amit Langote wrote: > I'll post the rebased 0002 tomorrow after addressing your comments. Here's one. Main changes: * Fixed a bug in get_table_json_columns() which caused nested columns to be deparsed incorrectly, something Jian reported upthread. * Simplified the algorithm in JsonTablePlanNextRow() I'll post another revision or two maybe tomorrow, but posting what I have now in case Jian wants to do more testing. -- Thanks, Amit Langote v50-0001-JSON_TABLE-Add-support-for-NESTED-columns.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > There is an intermittent BF failure observed at [1] after this commit > > (2ec005b). > > > > Thanks for analyzing and providing the patch. I'll look into it. There > is another BF failure [1] which I have analyzed. The main reason for > failure is the following test: > > # Failed test 'logical slots have synced as true on standby' > # at > /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl > line 198. > # got: 'f' > # expected: 't' > > Here, we are expecting that the two logical slots (lsub1_slot, and > lsub2_slot), one created via subscription and another one via API > pg_create_logical_replication_slot() are synced. The standby LOGs > which are as follows show that the one created by API 'lsub2_slot' is > synced but the other one 'lsub1_slot': > > LOG for lsub1_slot: > > 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL: > Streaming transactions committing after 0/0, reading WAL from > 0/360. > 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] > STATEMENT: SELECT pg_sync_replication_slots(); > 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG: > xmin required by slots: data 0, catalog 740 > 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG: > could not sync slot "lsub1_slot" > > LOG for lsub2_slot: > > 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG: > xmin required by slots: data 0, catalog 740 > 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG: > newly created slot "lsub2_slot" is sync-ready now > 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] > STATEMENT: SELECT pg_sync_replication_slots(); > > We can see from the log of lsub1_slot that its restart_lsn is > 0/360 which means it will start reading from the WAL from that > location. 
Now, if we check the publisher log, we have running_xacts > record at that location. See following LOGs: > > 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG: > statement: SELECT pg_create_logical_replication_slot('lsub2_slot', > 'test_decoding', false, false, true); > 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG: > snapshot of 0+0 running transaction ids (lsn 0/360 oldest xid 740 > latest complete 739 next xid 740) > > > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: > snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 > latest complete 739 next xid 740) > > The first running_xact record ends at 360 and the second one at > 398. So, the start location of the second running_xact is 360, > the same can be confirmed by the following LOG line of walsender: > > 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG: > serializing snapshot to pg_logical/snapshots/0-360.snap > > This shows that while processing running_xact at location 360, we > have serialized the snapshot. As there is no running transaction in > WAL at 360 so ideally we should have reached a consistent state > after processing that record on standby. But the reason standby didn't > process that LOG is that the confirmed_flush LSN is also at the same > location so the function LogicalSlotAdvanceAndCheckSnapState() exits > without reading the WAL at that location. Now, this can be confirmed > by the below walsender-specific LOG in publisher: > > 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write > 0/360 flush 0/360 apply 0/360 reply_time 2024-04-05 > 04:36:59.155181+00 > > We update the confirmed_flush location with the flush location after > receiving the above feedback. You can notice that we didn't receive > the feedback for the 398 location and hence both the > confirmed_flush and restart_lsn are at the same location 0/360. 
> Now, the test is waiting for the subscriber to send feedback of the > last WAL write location by > $primary->wait_for_catchup('regress_mysub1'); As noticed from the > publisher LOGs, the query we used for wait is: > > SELECT '0/360' <= replay_lsn AND state = 'streaming' > FROM pg_catalog.pg_stat_replication > WHERE application_name IN ('regress_mysub1', 'walreceiver') > > Here, instead of '0/360' it should have used ''0/398' which is > the last write location. This position we get via function > pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And > this variable seems to be touched by commit > c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely could > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for failure? At > this stage, I am not sure so just sharing with others to see if what I > am saying sounds logical. I'll think more about this. > Thinking more on this, it doesn't seem related to c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit does
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, Apr 4, 2024 at 8:50 AM Peter Eisentraut wrote: > It appears there are several different perspectives about this. My > intuition was that a protocol version change indicates something that we > eventually want all client libraries to support. Whereas protocol > extensions are truly opt-in. Hmm. This doesn't seem like a bad way to make a distinction to me, except that I fear it would be mushy in practice. For example: > For example, if we didn't have the ParameterStatus message and someone > wanted to add it, then this ought to be a protocol version change, so > that we eventually get everyone to adopt it. > > But, for example, the column encryption feature is not something I'd > expect all client libraries to implement, so it can be opt-in. I agree that column encryption might not necessarily be supported by all client libraries, but equally, the ParameterStatus message is just for the benefit of the client. A client that doesn't care about the contents of such a message is free to ignore it, and would be better off if it weren't sent in the first place; it's just extra bytes on the wire that aren't needed for anything. So why would we want to force everyone to adopt that, if it didn't exist already? I also wonder how the protocol negotiation for column encryption is actually going to work. What are the actual wire protocol changes that are needed? What does the server need to know from the client, or the client from the server, about what is supported? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Allow non-superuser to cancel superuser tasks.
On Fri, Apr 05, 2024 at 02:39:05PM +0900, Michael Paquier wrote: > + /* > + * If the backend is autovacuum worker, allow user with the privileges > of > + * pg_signal_autovacuum role to signal the backend. > + */ > + if (pgstat_get_backend_type(GetNumberFromPGProc(proc)) == > B_AUTOVAC_WORKER) > + { > + if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)) > + return SIGNAL_BACKEND_NOAUTOVACUUM; > + } > > I was wondering why this is not done after we've checked that we have > a superuser-owned backend, and this is giving me a pause. @Nathan, > why do you think we should not rely on the roleId for an autovacuum > worker? In core, do_autovacuum() is only called in a process without > a role specified, and I've noticed your remark here: > https://www.postgresql.org/message-id/20240401202146.GA2354284@nathanxps13 > It's feeling more natural here to check that we have a superuser-owned > backend first, and then do a lookup of the process type. I figured since there's no reason to rely on that behavior, we might as well do a bit of future-proofing in case autovacuum workers are ever not run as InvalidOid. It'd be easy enough to fix this code if that ever happened, so I'm not too worried about this. > One thing that we should definitely not do is letting any user calling > pg_signal_backend() know that a given PID maps to an autovacuum > worker. This information is hidden in pg_stat_activity. And > actually, doesn't the patch leak this information to all users when > calling pg_signal_backend with random PID numbers because of the fact > that SIGNAL_BACKEND_NOAUTOVACUUM exists? Any role could guess which > PIDs are used by an autovacuum worker because of the granularity > required for the error related to pg_signal_autovacuum. Hm. I hadn't considered that angle. IIUC right now they'll just get the generic superuser error for autovacuum workers. 
I don't know how concerned to be about users distinguishing autovacuum workers from other superuser backends, _but_ if roles with pg_signal_autovacuum can't even figure out the PIDs for the autovacuum workers, then this feature seems kind-of useless. Perhaps we should allow roles with privileges of pg_signal_autovacuum to see the autovacuum workers in pg_stat_activity. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
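To make the information-leak concern concrete, here is a toy model of the check ordering (hypothetical names, not the patch's actual code): the distinct autovacuum-specific result must only be reachable by roles that already hold pg_signal_autovacuum, otherwise any caller could probe arbitrary PIDs to learn which ones are autovacuum workers.

```python
SUCCESS, NOSUPERUSER = "ok", "nosuperuser"

def signal_backend(target_is_autovac, target_is_superuser,
                   caller_is_superuser, caller_has_signal_autovac):
    """Toy model of pg_signal_backend()'s privilege checks."""
    if caller_is_superuser:
        return SUCCESS
    if target_is_autovac:
        # Only roles allowed to signal autovacuum workers get a
        # distinguishable outcome; everyone else sees the same
        # generic error as for any superuser-owned backend.
        if caller_has_signal_autovac:
            return SUCCESS
        return NOSUPERUSER
    if target_is_superuser:
        return NOSUPERUSER
    return SUCCESS

# An unprivileged caller probing an autovacuum worker learns nothing:
probe = signal_backend(True, True, False, False)
```

In this model the probe returns the same generic error as probing any other superuser-owned backend, so scanning PIDs reveals nothing about which are autovacuum workers.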
Re: IPC::Run::time[r|out] vs our TAP tests
Tom Lane writes:
> Erik Wienhold writes:
>> Libedit 20191025-3.1 is the first version where ":{?VERB" works as
>> expected. The previous release 20190324-3.1 still produces the escaped
>> output that Michael found. That narrows down the changes to everything
>> between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and
>> [2] (both inclusive).
>
> Hm. I just installed NetBSD 8.2 in a VM, and it passes this test:
>
> # +++ tap install-check in src/bin/psql +++
> t/001_basic.pl ... ok
> t/010_tab_completion.pl .. ok
> t/020_cancel.pl .. ok
> All tests successful.
>
> So it seems like the bug does not exist in any currently-supported
> NetBSD release. Debian has been known to ship obsolete libedit
> versions, though.

Both the current (bookworm/12) and previous (bullseye/11) versions of Debian have new enough libedits to not be affected by this bug:

libedit | 3.1-20181209-1 | oldoldstable   | source
libedit | 3.1-20191231-2 | oldstable      | source
libedit | 3.1-20221030-2 | stable         | source
libedit | 3.1-20230828-1 | testing        | source
libedit | 3.1-20230828-1 | unstable       | source
libedit | 3.1-20230828-1 | unstable-debug | source

But in bullseye they decided that OpenSSL is a system library as far as the GPL is concerned, so are linking directly to readline. And even before then their psql wrapper would LD_PRELOAD readline instead if installed, so approximately nobody actually ever used psql with libedit on Debian.

> regards, tom lane

- ilmari
Re: Fixing backslash dot for COPY FROM...CSV
Tom Lane wrote:
> I've looked over this patch and I generally agree that this is a
> reasonable solution.

Thanks for reviewing this!

> I'm also wondering why the patch adds a test for
> "PQprotocolVersion(conn) >= 3" in handleCopyIn.

I've removed this in the attached update.

> I concur with Robert's doubts about some of the doc changes though.
> In particular, since we're not changing the behavior for non-CSV
> mode, we shouldn't remove any statements that apply to non-CSV mode.

I don't quite understand the issues with the doc changes. Let me detail the changes. The first change is under CSV Format, so it does not concern non-CSV modes.

--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -761,11 +761,7 @@ COPY count
     format, \., the end-of-data marker, could also appear as a data
     value. To avoid any misinterpretation, a \. data value appearing
     as a lone entry on a line is automatically
-    quoted on output, and on input, if quoted, is not interpreted as the
-    end-of-data marker. If you are loading a file created by another
-    application that has a single unquoted column and might have a
-    value of \., you might need to quote that value in the
-    input file.
+    quoted on output.

The part about quoting on output is kept because the code still does that.

About this bit: "and on input, if quoted, is not interpreted as the end-of-data marker." The patch removes that part. The reason is that \. is now not interpreted as the end-of-data marker in either case, quoted or unquoted, conforming to the spec. Previously, what the reader should have understood by "if quoted, ..." is the implication "if not quoted, then \. will be interpreted as the end-of-data marker, even though that behavior does not conform to the CSV spec". If we documented the new behavior, it would be something like: when quoted, it works as expected, and when unquoted, it works as expected too. Isn't it simpler not to say anything?
About the next sentence: "If you are loading a file created by another application that has a single unquoted column and might have a value of \., you might need to quote that value in the input file." It's removed as well because it's no longer necessary to do that.

The other hunk is in the psql doc:

--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1119,10 +1119,6 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
     destination, because all data must pass through the client/server
     connection. For large amounts of data the SQL command might be
     preferable.
-    Also, because of this pass-through method, \copy
-    ... from in CSV mode will erroneously
-    treat a \. data value alone on a line as an
-    end-of-input marker.

That behavior no longer happens, so this gets removed as well.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

From 5a16b81f10e47fe1ac3842abd34e8bef7e639e26 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?=
Date: Fri, 5 Apr 2024 14:22:49 +0200
Subject: [PATCH v4] Support backslash-dot on a line by itself as valid data in COPY FROM...CSV

---
 doc/src/sgml/ref/copy.sgml           |  6 +--
 doc/src/sgml/ref/psql-ref.sgml       |  4 --
 src/backend/commands/copyfromparse.c | 57 +++-
 src/bin/psql/copy.c                  | 30 ++-
 src/test/regress/expected/copy.out   | 18 +
 src/test/regress/sql/copy.sql        | 12 ++
 6 files changed, 65 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 33ce7c4ea6..a63f073c1b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -814,11 +814,7 @@ COPY count
     format, \., the end-of-data marker, could also appear as a data
     value. To avoid any misinterpretation, a \. data value appearing
     as a lone entry on a line is automatically
-    quoted on output, and on input, if quoted, is not interpreted as the
-    end-of-data marker. If you are loading a file created by another
-    application that has a single unquoted column and might have a
-    value of \., you might need to quote that value in the
-    input file.
+    quoted on output.

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index cc7d797159..d94e3cacfc 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1119,10 +1119,6 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
     destination, because all data must pass through the client/server
     connection. For large amounts of data the SQL command might be
     preferable.
-    Also, because of this pass-through method, \copy
-    ... from in CSV mode will erroneously
-    treat a \. data value alone on a line as an
-    end-of-input marker.

diff --git a/src/backend/commands/copyfromparse.c
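For reference, the input-side rule the patch establishes can be sketched as the following predicate. This is an illustrative stand-in, not the copyfromparse.c code: in text format a lone backslash-dot line is still the end-of-data marker, while in CSV mode it never is, whether quoted or not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Is a whole input line the end-of-data marker?  Text format: yes, for a
 * lone backslash-dot.  CSV format, under the patch: never -- \. on a line
 * by itself is just a one-column data value.
 */
static bool
line_is_end_of_data(const char *line, bool csv_mode)
{
	if (csv_mode)
		return false;			/* patched behavior: \. is always data in CSV */

	return strcmp(line, "\\.") == 0;	/* text format keeps the marker */
}
```

In other words, CSV input simply has no in-band end-of-data marker anymore; the end of the data stream itself delimits the input, and only text format retains \. as a marker.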
Re: Popcount optimization using AVX512
On Fri, Apr 05, 2024 at 10:33:27AM +0300, Ants Aasma wrote:
> The main issue I saw was that clang was able to peel off the first
> iteration of the loop and then eliminate the mask assignment and
> replace masked load with a memory operand for vpopcnt. I was not able
> to convince gcc to do that regardless of optimization options.
> Generated code for the inner loop:
>
> clang:
> :
>   50: add rdx, 64
>   54: cmp rdx, rdi
>   57: jae
>   59: vpopcntq zmm1, zmmword ptr [rdx]
>   5f: vpaddq zmm0, zmm1, zmm0
>   65: jmp
>
> gcc:
> :
>   38: kmovq k1, rdx
>   3d: vmovdqu8 zmm0 {k1} {z}, zmmword ptr [rax]
>   43: add rax, 64
>   47: mov rdx, -1
>   4e: vpopcntq zmm0, zmm0
>   54: vpaddq zmm0, zmm0, zmm1
>   5a: vmovdqa64 zmm1, zmm0
>   60: cmp rax, rsi
>   63: jb
>
> I'm not sure how much that matters in practice. Attached is a patch to
> do this manually giving essentially the same result in gcc. As most
> distro packages are built using gcc I think it would make sense to
> have the extra code if it gives a noticeable benefit for large cases.

Yeah, I did see this, but I also wasn't sure if it was worth further complicating the code. I can test with and without your fix and see if it makes any difference in the benchmarks.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
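The transformation Ants describes can be sketched portably: handle the partial chunk once up front (the iteration clang "peels"), then run a steady-state loop over full-width chunks that needs no mask. This stand-in uses 8-byte words and the GCC/Clang builtin __builtin_popcountll rather than the 64-byte AVX-512 masked loads in the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count set bits in buf[0..len); loop shape mirrors the discussion above. */
static uint64_t
popcount_bytes(const unsigned char *buf, size_t len)
{
	uint64_t	cnt = 0;
	size_t		head = len % 8;

	if (head)					/* peeled partial iteration */
	{
		uint64_t	w = 0;

		memcpy(&w, buf, head);	/* "masked" load: remaining bytes stay zero */
		cnt += (uint64_t) __builtin_popcountll(w);
		buf += head;
		len -= head;
	}

	for (size_t i = 0; i < len; i += 8)	/* hot loop: full-width loads only */
	{
		uint64_t	w;

		memcpy(&w, buf + i, 8);
		cnt += (uint64_t) __builtin_popcountll(w);
	}

	return cnt;
}
```

The design point is the same one from the thread: once the partial chunk is dealt with outside the loop, the hot loop body contains no mask bookkeeping at all, which is the code shape clang produces on its own and gcc has to be given by hand.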
Re: broken JIT support on Fedora 40
On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro wrote:
> https://github.com/llvm/llvm-project/pull/87093

Oh, with those clues, I think I might see... It is a bit strange that we copy attributes from AttributeTemplate(), a function that returns Datum, to our void deform function. It works (I mean doesn't crash) if you just comment this line out:

    llvm_copy_attributes(AttributeTemplate, v_deform_fn);

... but I guess that disables inlining of the deform function? So perhaps we just need to teach that thing not to try to copy the return value's attributes, which also seems to work here:

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index ec0fdd49324..92b4993a98a 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -552,8 +552,11 @@ llvm_copy_attributes(LLVMValueRef v_from, LLVMValueRef v_to)
 	/* copy function attributes */
 	llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeFunctionIndex);
 
-	/* and the return value attributes */
-	llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex);
+	if (LLVMGetTypeKind(LLVMGetFunctionReturnType(v_to)) != LLVMVoidTypeKind)
+	{
+		/* and the return value attributes */
+		llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex);
+	}
Re: broken JIT support on Fedora 40
> On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote: > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro wrote: > > https://github.com/llvm/llvm-project/pull/87093 > > Oh, with those clues, I think I might see... It is a bit strange that > we copy attributes from AttributeTemplate(), a function that returns > Datum, to our void deform function. It works (I mean doesn't crash) > if you just comment this line out: > > llvm_copy_attributes(AttributeTemplate, v_deform_fn); > > ... but I guess that disables inlining of the deform function? So > perhaps we just need to teach that thing not to try to copy the return > value's attributes, which also seems to work here: Yep, I think this is it. I've spent some hours trying to understand why suddenly deform function has noundef ret attribute, when it shouldn't -- this explains it and the proposed change fixes the crash. One thing that is still not clear to me though is why this copied attribute doesn't show up in the bitcode dumped right before running inline pass (I've added this to troubleshoot the issue).
Re: Security lessons from liblzma
On Thu, Apr 4, 2024 at 4:48 PM Daniel Gustafsson wrote: > AFAIK we haven't historically enforced that installations have the openssl > binary in PATH, but it would be a pretty low bar to add. The bigger issue is > likely to find someone to port this to Windows, it probably won't be too hard > but as with all things building on Windows, we need someone skilled in that > area to do it. I wonder how hard it would be to just code up our own binary to do this. If it'd be a pain to do that, or to maintain it across SSL versions, then it's a bad plan and we shouldn't do it. But if it's not that much code, maybe it'd be worth considering. I'm also sort of afraid that we're getting sucked into thinking real hard about this SSL certificate issue rather than trying to brainstorm all the other places that might be problematic. The latter might be a more fruitful exercise (or maybe not, what do I know?). -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add bump memory context type and use it for tuplesorts
On Thu, 4 Apr 2024 at 22:43, Tom Lane wrote:
>
> Matthias van de Meent writes:
> > On Mon, 25 Mar 2024 at 22:44, Tom Lane wrote:
> >> Basically, I'm not happy with consuming the last reasonably-available
> >> pattern for a memory context type that has little claim to being the
> >> Last Context Type We Will Ever Want. Rather than making a further
> >> dent in our ability to detect corrupted chunks, we should do something
> >> towards restoring the expansibility that existed in the original
> >> design. Then we can add bump contexts and whatever else we want.
>
> > So, would something like the attached make enough IDs available so
> > that we can add the bump context anyway?
>
> > It extends memory context IDs to 5 bits (32 values), of which
> > - 8 have glibc's malloc pattern of 001/010;
> > - 1 is unused memory's 0
> > - 1 is wipe_mem's 1
> > - 4 are used by existing contexts (Aset/Generation/Slab/AlignedRedirect)
> > - 18 are newly available.
>
> This seems like it would solve the problem for a good long time
> to come; and if we ever need more IDs, we could steal one more bit
> by requiring the offset to the block header to be a multiple of 8.
> (Really, we could just about do that today at little or no cost ...
> machines with MAXALIGN less than 8 are very thin on the ground.)

Hmm, it seems like a decent idea, but I didn't want to deal with the repercussions of that this late in the cycle when these 2 bits were still relatively easy to get hold of.

> The only objection I can think of is that perhaps this would slow
> things down a tad by requiring more complicated shifting/masking.
> I wonder if we could redo the performance checks that were done
> on the way to accepting the current design.

I didn't do very extensive testing, but the light performance tests that I did with the palloc performance benchmark patch & script shared above didn't show an observable negative effect.
An adapted version of the test that uses repalloc() to check performance differences in MCXT_METHOD() doesn't show a significant performance difference from master either. That test case is attached as repalloc-performance-test-function.patch.txt. The full set of patches would then accumulate to the attached v5 of the patchset. 0001 is an update of my patch from yesterday, in which I update MemoryContextMethodID infrastructure for more IDs, and use a new naming scheme for unused/reserved IDs. 0002 and 0003 are David's patches, with minor changes to work with 0001 (rebasing, and I moved the location around to keep function declaration in order with memctx ids) Kind regards, Matthias van de Meent From b65320736cf63684b4710b3d63e243a0862d37c6 Mon Sep 17 00:00:00 2001 From: Matthias van de Meent Date: Fri, 5 Apr 2024 15:13:52 +0200 Subject: [PATCH] repalloc performance test function --- src/backend/utils/adt/mcxtfuncs.c | 231 ++ src/include/catalog/pg_proc.dat | 12 ++ 2 files changed, 243 insertions(+) diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c index 4d4a70915b..88de4bbcb5 100644 --- a/src/backend/utils/adt/mcxtfuncs.c +++ b/src/backend/utils/adt/mcxtfuncs.c @@ -15,8 +15,11 @@ #include "postgres.h" +#include + #include "funcapi.h" #include "mb/pg_wchar.h" +#include "miscadmin.h" #include "storage/proc.h" #include "storage/procarray.h" #include "utils/builtins.h" @@ -185,3 +188,231 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) PG_RETURN_BOOL(true); } + +typedef struct AllocateTestNext +{ + struct AllocateTestNext *next; /* ptr to the next allocation */ +} AllocateTestNext; + +/* #define ALLOCATE_TEST_DEBUG */ +/* + * pg_allocate_memory_test + * Used to test the performance of a memory context types + */ +Datum +pg_allocate_memory_test(PG_FUNCTION_ARGS) +{ + int32 chunk_size = PG_GETARG_INT32(0); + int64 keep_memory = PG_GETARG_INT64(1); + int64 total_alloc = PG_GETARG_INT64(2); + text *context_type_text = 
PG_GETARG_TEXT_PP(3); + char *context_type; + int64 curr_memory_use = 0; + int64 remaining_alloc_bytes = total_alloc; + MemoryContext context; + MemoryContext oldContext; + AllocateTestNext *next_free_ptr = NULL; + AllocateTestNext *last_alloc = NULL; + clock_t start, end; + + if (chunk_size < sizeof(AllocateTestNext)) + elog(ERROR, "chunk_size (%d) must be at least %ld bytes", chunk_size, +sizeof(AllocateTestNext)); + if (keep_memory > total_alloc) + elog(ERROR, "keep_memory (" INT64_FORMAT ") must be less than total_alloc (" INT64_FORMAT ")", +keep_memory, total_alloc); + + context_type = text_to_cstring(context_type_text); + + start = clock(); + + if (strcmp(context_type, "generation") == 0) + context = GenerationContextCreate(CurrentMemoryContext
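The shifting/masking cost Tom asks about amounts to recovering the method ID from the chunk header with a wider mask. The sketch below uses illustrative constants only, not the patch's actual header layout; which of the 32 values stay reserved (unused memory's 0 pattern, wipe_mem's pattern, the glibc-malloc-like 001/010 patterns) is a policy layered on top of this arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative encoding: widening the memory context method ID from 3 to
 * 5 bits means one more mask bit and otherwise identical shift/mask work
 * in the hot get/set paths -- which is why the measured overhead is small.
 */
#define MCTX_ID_SHIFT	0
#define MCTX_ID_MASK	((uint64_t) 0x1F)	/* 5 bits -> 32 method IDs */

static inline uint64_t
hdr_set_method_id(uint64_t hdr, uint64_t id)
{
	return (hdr & ~(MCTX_ID_MASK << MCTX_ID_SHIFT)) |
		((id & MCTX_ID_MASK) << MCTX_ID_SHIFT);
}

static inline uint64_t
hdr_get_method_id(uint64_t hdr)
{
	return (hdr >> MCTX_ID_SHIFT) & MCTX_ID_MASK;
}
```

Stealing one further bit by forcing the block-header offset to a multiple of 8, as Tom suggests, would change only the mask and shift constants here, not the shape of the code.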
Re: Speed up clean meson builds by ~25%
On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio wrote: > It improved clean build times on my machine (10 cores/20 threads) from ~40 > seconds to ~30 seconds. After discussing this off-list with Bilal, I realized that this gain is only happening for clang builds on my system. Because those take a long time as was also recently discussed in[1]. My builds don't take nearly as long though. I tried with clang 15 through 18 and they all took 10-22 seconds to run and clang comes from apt.llvm.org on Ubuntu 22.04 [1]: https://www.postgresql.org/message-id/flat/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com
Re: broken JIT support on Fedora 40
> On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote: > > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote: > > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro > > wrote: > > > https://github.com/llvm/llvm-project/pull/87093 > > > > Oh, with those clues, I think I might see... It is a bit strange that > > we copy attributes from AttributeTemplate(), a function that returns > > Datum, to our void deform function. It works (I mean doesn't crash) > > if you just comment this line out: > > > > llvm_copy_attributes(AttributeTemplate, v_deform_fn); > > > > ... but I guess that disables inlining of the deform function? So > > perhaps we just need to teach that thing not to try to copy the return > > value's attributes, which also seems to work here: > > Yep, I think this is it. I've spent some hours trying to understand why > suddenly deform function has noundef ret attribute, when it shouldn't -- > this explains it and the proposed change fixes the crash. One thing that > is still not clear to me though is why this copied attribute doesn't > show up in the bitcode dumped right before running inline pass (I've > added this to troubleshoot the issue). One thing to consider in this context is indeed adding "verify" pass as suggested in the PR, at least for the debugging configuration. Without the fix it immediately returns: Running analysis: VerifierAnalysis on deform_0_1 Attribute 'noundef' applied to incompatible type! llvm error: Broken function found, compilation aborted!
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, Apr 4, 2024 at 12:45 PM Jelte Fennema-Nio wrote: > Yeah, we definitely think differently here then. To me bumping the > minor protocol version shouldn't be a thing that we would need to > carefully consider. It should be easy to do, and probably done often. Often? I kind of hope that the protocol starts to evolve a bit more than it has, but I don't want a continuous stream of changes. That will be very hard to test and verify correctness, and a hassle for drivers to keep up with, and a mess for compatibility. > So, what approach of checking feature support are you envisioning > instead? A function for every feature? > Something like SupportsSetProtocolParameter, that returns an error > message if it's not supported and NULL otherwise. And then add such a > support function for every feature? Yeah, something like that. > I think I might agree with you that it would be nice for features that > depend on a protocol extension parameter, indeed because in the future > we might make them required and they don't have to update their logic > then. But for features only depending on the protocol version I > honestly don't see the point. A protocol version check is always going > to continue working. Sure, if we introduce it based on the protocol version then we don't need anything else. > I'm also not sure why you're saying a user is not entitled to this > information. We already provide users of libpq a way to see the full > Postgres version, and the major protocol version. I think allowing a > user to access this information is only a good thing. But I agree that > providing easy to use feature support functions is a better user > experience in some cases. I guess that's a fair point. But I'm worried that if we expose too much of the internals, we won't be able to change things later. > > But this is another example of a problem that can *easily* be fixed > > without using up the entirety of the _pq_ namespace. 
You can introduce > > _pq_.dont_error_on_unknown_gucs=1 or > > _pq_.dont_error_on_these_gucs='foo, bar, baz'. The distinction between > > the startup packet containing whizzbang=frob and instead containing > > _pq_.whizzbang=frob shouldn't be just whether an error is thrown if we > > don't know anything about whizzbang. > > Nice idea, but that sadly won't work well in practice. Because old PG > versions don't know about _pq_.dont_error_on_unknown_gucs. So that > would mean we'd have to wait another 5 years (and probably more) until > we could set non-_pq_-prefixed GUCs safely in the startup message, > using this approach. Hmm. I guess that is a problem. > Two side-notes: > 1. I realized there is a second benefit to using _pq_ for all GUCs > that change the protocol level that I didn't mention in my previous > email: It allows clients to assume that _pq_ prefixed GUCs will change > the wire-protocol and that they should not allow a user of the client > to set them willy-nilly. I'll go into this benefit more in the rest of > this email. > 2. To clarify: I'm suggesting that a startup packet containing > _pq_.whizzbang would actually set the _pq_.whizzbang GUC, and not the > whizzbang GUC. i.e. the _pq_ prefix is not stripped-off when parsing > the startup packet. I really intended the _pq_ prefix as a way of taking something out of the GUC namespace, not as a part of the GUC namespace that users would see. And I'm reluctant to go back on that. If we want to make pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's something to that idea [insert caveats here]. But doesn't _pq_ look like something that was intended to be internal? That's certainly how I intended it. > > I want to be able to know the state of my protocol > > parameters by calling libpq functions that answer my questions > > definitively based on libpq's own internal state. libpq itself *must* > > know what compression I'm using on my connection; the server's answer > > may be different. 
> > I think that totally makes sense that libpq should be able to answer > those questions without contacting the server, and indeed some > introspection should be added for that. But being able to introspect > what the server thinks the setting is seems quite useful too. That > still doesn't solve the problem of poolers though. How about > introducing a new ParameterGet message type too (matching the proposed > ParameterSet), so that poolers can easily parse and intercept that > message type. Wouldn't libpq already know what value it last set? Or is this needed because it doesn't know what the other side's default is? > 2. By using PGC_PROTOCOL to indicate that those GUCs can only be > changed using ParameterSet Hmm, OK. I guess if the PGC_PROTOCOL flag makes it so that the GUC can only be set using ParameterSet, and it also makes them non-transactional, then it's fine. So to be clear, I can't set these in postgresql.conf, or postgresql.auto.conf, or via ALTER $ANYTHING, or via SET, or in any other way than by sending P
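The wire format for a ParameterSet (or ParameterGet) message has not been settled in this thread, so purely as an illustration, here is one hypothetical framing a pooler could validate and intercept. Everything in it, including the 'S' type byte, is invented for the sketch; it simply follows the usual v3 message shape of one type byte, a big-endian int32 length that counts itself, then NUL-terminated name and value strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Validate a hypothetical ParameterSet frame without a full protocol parser. */
static bool
parameter_set_valid(const unsigned char *msg, size_t msglen)
{
	uint32_t	len;
	size_t		payload;
	const unsigned char *nul1;
	const unsigned char *nul2;

	if (msglen < 5 || msg[0] != 'S')	/* type byte is an assumption */
		return false;

	len = ((uint32_t) msg[1] << 24) | ((uint32_t) msg[2] << 16) |
		((uint32_t) msg[3] << 8) | (uint32_t) msg[4];
	if ((size_t) len + 1 != msglen)	/* length counts itself, not the type byte */
		return false;

	payload = msglen - 5;
	nul1 = memchr(msg + 5, '\0', payload);	/* end of the parameter name */
	if (nul1 == NULL)
		return false;

	nul2 = memchr(nul1 + 1, '\0', (size_t) (msg + msglen - (nul1 + 1)));
	return nul2 == msg + msglen - 1;	/* value must end exactly at the frame end */
}
```

A framing like this is what would let a pooler treat the message as an opaque name/value pair, which is the interception property discussed above.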
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, Apr 4, 2024 at 1:10 PM Jelte Fennema-Nio wrote: > Attached is a rebased patchset We should keep talking about this, but I think we're far too close to the wire at this point to think about committing anything for v17 at this point. These are big changes, they haven't been thoroughly reviewed by anyone AFAICT, and we don't have consensus on what we ought to be doing. I know that's probably not what you want to hear, but realistically, I think that's the only reasonable decision at this point. -- Robert Haas EDB: http://www.enterprisedb.com
Re: meson vs windows perl
On 2024-04-05 Fr 08:25, Andrew Dunstan wrote: Here is an attempt to fix all that. It's ugly, but I think it's more principled. First, instead of getting the ldopts and then trying to filter out the ldflags and ccdlflags, it tells perl not to include those in the first place, by overriding a couple of routines in ExtUtils::Embed. And second, it's smarter about splitting what's left, so that it doesn't split on a space that's in a quoted item. The perl that's used to do that second bit is not pretty, but it has been tested on the system where the problem arose and apparently cures the problem. (No doubt some perl guru could improve it.) It also works on my Ubuntu system, so I don't think we'll be breaking anything (famous last words). Apparently I spoke too soon. Please ignore the above for now. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: IPC::Run::time[r|out] vs our TAP tests
On 2024-04-04 Th 17:24, Tom Lane wrote:
> TIL that IPC::Run::timer is not the same as IPC::Run::timeout. With a
> timer object you have to check $timer->is_expired to see if the timeout
> has elapsed, but with a timeout object you don't because it will throw a
> Perl exception upon timing out, probably killing your test program.
> It appears that a good chunk of our TAP codebase has not read this memo,
> because I see plenty of places that are checking is_expired in the naive
> belief that they'll still have control after a timeout has fired.

I started having a look at these. Here are the cases I found:

./src/bin/psql/t/010_tab_completion.pl: my $okay = ($out =~ $pattern && !$h->{timeout}->is_expired);
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql startup timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless $timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm: timeout is the IPC::Run::Timeout object whose is_expired method can be tested
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless $timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Utils.pm: if ($timeout->is_expired)
./src/test/recovery/t/021_row_visibility.pl: if ($psql_timeout->is_expired)
./src/test/recovery/t/032_relfilenode_reuse.pl: if ($psql_timeout->is_expired)

Those in Cluster.pm look correct - they are doing the run() in an eval block and testing for the is_expired setting in an exception block.
The other cases look more suspect. I'll take a closer look.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: WIP Incremental JSON Parser
On 2024-04-04 Th 15:54, Nathan Bossart wrote:
> On Thu, Apr 04, 2024 at 03:31:12PM -0400, Andrew Dunstan wrote:
>> Does the attached patch fix it for you?
>
> It clears up 2 of the 3 warnings for me:
>
> ../postgresql/src/common/jsonapi.c: In function ‘IsValidJsonNumber’:
> ../postgresql/src/common/jsonapi.c:2018:30: error: ‘dummy_lex.inc_state’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>  2018 |   if (lex->incremental && !lex->inc_state->is_last_chunk &&
>       |

Thanks, please try this instead.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 5f947dd618..44dbb7f7f9 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -279,6 +279,8 @@ IsValidJsonNumber(const char *str, int len)
 		return false;
 
 	dummy_lex.incremental = false;
+	dummy_lex.inc_state = NULL;
+	dummy_lex.pstack = NULL;
 
 	/*
 	 * json_lex_number expects a leading '-' to have been eaten already.
@@ -297,6 +299,8 @@ IsValidJsonNumber(const char *str, int len)
 		dummy_lex.input_length = len;
 	}
 
+	dummy_lex.token_start = dummy_lex.input;
+
 	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
 
 	return (!numeric_error) && (total_len == dummy_lex.input_length);
@@ -2018,6 +2022,9 @@ json_lex_number(JsonLexContext *lex, char *s,
 	{
 		appendBinaryStringInfo(&lex->inc_state->partial_token,
 							   lex->token_start, s - lex->token_start);
+		if (num_err != NULL)
+			*num_err = error;
+
 		return JSON_INCOMPLETE;
 	}
 	else if (num_err != NULL)
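As a side note, an alternative to assigning the fields one by one, sketched here with an illustrative struct rather than the real JsonLexContext: zero-initializing the dummy lexer makes every pointer NULL and every flag false in one go, so a later refactor cannot reintroduce a read of an unset field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the lexer fields the warnings are about */
typedef struct DemoLexContext
{
	bool		incremental;
	void	   *inc_state;		/* only meaningful when incremental */
	void	   *pstack;
	const char *token_start;
} DemoLexContext;

static bool
demo_lex_starts_clean(void)
{
	DemoLexContext dummy = {0};	/* all fields zero/NULL at once */

	return !dummy.incremental &&
		dummy.inc_state == NULL &&
		dummy.pstack == NULL &&
		dummy.token_start == NULL;
}
```

Whether that is preferable to explicit per-field assignments is a style call; the explicit form documents exactly which fields json_lex_number() may touch.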
Re: Add bump memory context type and use it for tuplesorts
Matthias van de Meent writes:
> On Thu, 4 Apr 2024 at 22:43, Tom Lane wrote:
>> The only objection I can think of is that perhaps this would slow
>> things down a tad by requiring more complicated shifting/masking.
>> I wonder if we could redo the performance checks that were done
>> on the way to accepting the current design.

> I didn't do very extensive testing, but the light performance tests
> that I did with the palloc performance benchmark patch & script shared
> above didn't show an observable negative effect.

OK. I did not read the patch very closely, but at least in principle I have no further objections. David, are you planning to take point on getting this in?

regards, tom lane
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > Thinking more on this, it doesn't seem related to > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change > any locking or something like that which impacts write positions. Agree. > I think what has happened here is that running_xact record written by > the background writer [1] is not written to the kernel or disk (see > LogStandbySnapshot()), before pg_current_wal_lsn() checks the > current_lsn to be compared with replayed LSN. Agree, I think it's not visible through pg_current_wal_lsn() yet. Also I think that the DEBUG message in LogCurrentRunningXacts() " elog(DEBUG2, "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)", CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, LSN_FORMAT_ARGS(recptr), CurrRunningXacts->oldestRunningXid, CurrRunningXacts->latestCompletedXid, CurrRunningXacts->nextXid); " should be located after the XLogSetAsyncXactLSN() call. 
Indeed, the new LSN is visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is released, see:

\watch on Session 1 provides:

 pg_current_wal_lsn
--------------------
 0/87D110
(1 row)

Until:

Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579
2579	XLogRecPtr	WriteRqstPtr = asyncXactLSN;
(gdb) n
2581	bool	wakeup = false;
(gdb)
2584	SpinLockAcquire(&XLogCtl->info_lck);
(gdb)
2585	RefreshXLogWriteResult(LogwrtResult);
(gdb)
2586	sleeping = XLogCtl->WalWriterSleeping;
(gdb)
2587	prevAsyncXactLSN = XLogCtl->asyncXactLSN;
(gdb)
2588	if (XLogCtl->asyncXactLSN < asyncXactLSN)
(gdb)
2589	XLogCtl->asyncXactLSN = asyncXactLSN;
(gdb)
2590	SpinLockRelease(&XLogCtl->info_lck);
(gdb) p/x (uint32) XLogCtl->asyncXactLSN
$1 = 0x87d148

Then session 1 provides:

 pg_current_wal_lsn
--------------------
 0/87D148
(1 row)

So, when we see in the log:

2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740)
2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/360' <= replay_lsn AND state = 'streaming'

It's indeed possible that the new LSN was not visible yet (spinlock not released?) before the query began (because we can not rely on the time the DEBUG message has been emitted).

> Note that the reason why walsender has picked the running_xact written
> by background writer is because it has checked after
> pg_current_wal_lsn() query, see LOGs [2]. I think we can probably try
> to reproduce manually via debugger.
>
> If this theory is correct

I think it is.

> then I think we will need to use injection points to control the
> behavior of bgwriter or use the slots created via SQL API for syncing
> in tests.
>
> Thoughts?

I think that maybe as a first step we should move the "elog(DEBUG2," message as proposed above to help debugging (that could help to confirm the above theory).
If the theory is proven then I'm not sure we need the extra complexity of injection point here, maybe just relying on the slots created via SQL API could be enough. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
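The visibility theory can be modeled in miniature as follows. A pthread mutex stands in for the SpinLockAcquire/SpinLockRelease pair on XLogCtl->info_lck, and the constants mirror the 0/87D110 -> 0/87D148 progression from the gdb session; this is a toy model of the rule being discussed, not the xlog.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * The new asyncXactLSN only becomes observable to a reader (such as
 * pg_current_wal_lsn()) once the writer has released the lock, and a
 * smaller LSN never moves the stored value backwards.
 */
static pthread_mutex_t info_lck = PTHREAD_MUTEX_INITIALIZER;
static uint64_t async_xact_lsn = 0x87D110;

static void
set_async_xact_lsn(uint64_t lsn)
{
	pthread_mutex_lock(&info_lck);	/* SpinLockAcquire(&XLogCtl->info_lck) */
	if (async_xact_lsn < lsn)
		async_xact_lsn = lsn;
	pthread_mutex_unlock(&info_lck);	/* readers may observe the LSN only now */
}

static uint64_t
read_async_xact_lsn(void)
{
	uint64_t	lsn;

	pthread_mutex_lock(&info_lck);
	lsn = async_xact_lsn;
	pthread_mutex_unlock(&info_lck);
	return lsn;
}
```

The race in the test then corresponds to a reader sampling the value in the window before the writer's release: it legitimately sees the old LSN, which is exactly why the DEBUG message timing cannot be used as evidence of visibility.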
Re: Psql meta-command conninfo+
On 04.04.24 18:15, Maiquel Grassi wrote:
> Well, I can revert \conninfo to its original state and keep \conninfo+ as it is,
> perhaps removing the application name, as I believe everything else is important
> for a DBA/SysAdmin. Do you think we can proceed with the patch this way? I am
> open to ideas that make \conninfo not limited to just four or five basic pieces
> of information and easily bring everything else to the screen.

The original \conninfo was designed to report values from the libpq API about what libpq connected to. And the convention in psql is that "+" provides more or less the same information, but a bit more. So I think it is wrong to make "\conninfo+" something fundamentally different from "\conninfo".

And even more so if it contains information that isn't really "connection information". For example, current_user() doesn't make sense here, I think.

I mean, if you want to add a command \some_useful_status_information, we can talk about that, but let's leave \conninfo to what it does.

But I think there are better ways to implement this kind of server-side status, for example, with a system view.

One problem with this patch is that it has no tests. Any change in any of the involved functions or views will silently break it. (Note that plain \conninfo doesn't have this problem to this extent, because it only relies on libpq function calls. Also, a system view would not have this problem.)
Re: documentation structure
On Fri, Mar 29, 2024 at 9:40 AM Robert Haas wrote: > 2. Demote "Monitoring Disk Usage" from a chapter on its own to a > section of the "Monitoring Database Activity" chapter. I haven't seen > any objections to this, and I'd like to move ahead with it. > > 3. Merge the separate chapters on various built-in index AMs into one. > Peter didn't think this was a good idea, but Tom and Alvaro's comments > focused on how to do it mechanically, and on whether the chapters > needed to be reordered afterwards, which I took to mean that they were > OK with the basic concept. David Johnston was also clearly in favor of > it. So I'd like to move ahead with this one, too. I committed these two patches. > 4. Consolidate the "Generic WAL Records" and "Custom WAL Resource > Managers" chapters, which cover related topics, into a single one. I > didn't see anyone object to this, but David Johnston pointed out that > the patch I posted was a few bricks short of a load, because it really > needed to put some introductory text into the new chapter. I'll study > this a bit more and propose a new patch that does the same thing a bit > more carefully than my previous version did. Here is a new version of this patch. I think this is v18 material at this point, absent an outcry to the contrary. Sometimes we're flexible about doc patches. -- Robert Haas EDB: http://www.enterprisedb.com v3-0001-docs-Consolidate-into-new-WAL-for-Extensions-chap.patch Description: Binary data
Re: Synchronizing slots from primary to standby
Hi,

On Fri, Apr 05, 2024 at 02:35:42PM +, Bertrand Drouvot wrote:
> I think that maybe as a first step we should move the "elog(DEBUG2," message as
> proposed above to help debugging (that could help to confirm the above theory).

If you agree and think that makes sense, please find attached a tiny patch doing so.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From a414e81a4c5c88963b2412d1641f3de1262c15c0 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Fri, 5 Apr 2024 14:49:51 +
Subject: [PATCH v1] Move DEBUG message in LogCurrentRunningXacts()

Indeed, the new LSN is visible to other sessions (say, through
pg_current_wal_lsn()) only after the spinlock (XLogCtl->info_lck) in
XLogSetAsyncXactLSN() is released. So moving the DEBUG message after the
XLogSetAsyncXactLSN() call seems more appropriate for debugging purposes.
---
 src/backend/storage/ipc/standby.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)
 100.0% src/backend/storage/ipc/

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 87b04e51b3..ba549acf50 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -1366,6 +1366,17 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 	recptr = XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS);

+	/*
+	 * Ensure running_xacts information is synced to disk not too far in the
+	 * future. We don't want to stall anything though (i.e. use XLogFlush()),
+	 * so we let the wal writer do it during normal operation.
+	 * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced
+	 * and nudge the WALWriter into action if sleeping. Check
+	 * XLogBackgroundFlush() for details why a record might not be flushed
+	 * without it.
+	 */
+	XLogSetAsyncXactLSN(recptr);
+
 	if (CurrRunningXacts->subxid_overflow)
 		elog(DEBUG2,
 			 "snapshot of %d running transactions overflowed (lsn %X/%X oldest xid %u latest complete %u next xid %u)",
@@ -1383,17 +1394,6 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 			 CurrRunningXacts->latestCompletedXid,
 			 CurrRunningXacts->nextXid);

-	/*
-	 * Ensure running_xacts information is synced to disk not too far in the
-	 * future. We don't want to stall anything though (i.e. use XLogFlush()),
-	 * so we let the wal writer do it during normal operation.
-	 * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced
-	 * and nudge the WALWriter into action if sleeping. Check
-	 * XLogBackgroundFlush() for details why a record might not be flushed
-	 * without it.
-	 */
-	XLogSetAsyncXactLSN(recptr);
-
 	return recptr;
 }
--
2.34.1
Re: Speed up clean meson builds by ~25%
Hi, On 2024-04-05 15:36:34 +0200, Jelte Fennema-Nio wrote: > On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio wrote: > > It improved clean build times on my machine (10 cores/20 threads) from ~40 > > seconds to ~30 seconds. > > After discussing this off-list with Bilal, I realized that this gain > is only happening for clang builds on my system. Because those take a > long time as was also recently discussed in[1]. My builds don't take > nearly as long though. I tried with clang 15 through 18 and they all > took 10-22 seconds to run and clang comes from apt.llvm.org on Ubuntu > 22.04 > > [1]: > https://www.postgresql.org/message-id/flat/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com I recommend opening a bug report for clang, best with an already preprocessed input file. We're going to need to do something about this from our side as well, I suspect. The times aren't great with gcc either, even if not as bad as with clang. Greetings, Andres Freund
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 16:04, Robert Haas wrote: > > On Thu, Apr 4, 2024 at 1:10 PM Jelte Fennema-Nio wrote: > > Attached is a rebased patchset > > We should keep talking about this, but I think we're far too close to > the wire at this point to think about committing anything for v17 at > this point. These are big changes, they haven't been thoroughly > reviewed by anyone AFAICT, and we don't have consensus on what we > ought to be doing. I know that's probably not what you want to hear, > but realistically, I think that's the only reasonable decision at this > point. Agreed on not considering this for PG17 nor this commitfest anymore. I changed the commit fest entry accordingly.
Re: Streaming read-ready sequential scan code
On Thu, Apr 4, 2024 at 12:39 PM Andres Freund wrote: > > On 2024-04-04 22:37:39 +1300, Thomas Munro wrote: > > On Thu, Apr 4, 2024 at 10:31 PM Thomas Munro wrote: > > > Alright what about this? > > I think it's probably worth adding a bit more of the commit message to the > function comment. Yes, there's a bit in one of the return branches, but that's > not what you're going to look at when just checking what the function does. Agreed about the comment. I kept thinking that BAS_BULKREAD should maybe return nbuffers - 1, but I couldn't convince myself why. Otherwise v2-0001-Allow-BufferAccessStrategy-to-limit-pin-count LGTM. - Melanie
Re: Popcount optimization using AVX512
On Fri, Apr 05, 2024 at 07:58:44AM -0500, Nathan Bossart wrote:
> On Fri, Apr 05, 2024 at 10:33:27AM +0300, Ants Aasma wrote:
>> The main issue I saw was that clang was able to peel off the first
>> iteration of the loop and then eliminate the mask assignment and
>> replace masked load with a memory operand for vpopcnt. I was not able
>> to convince gcc to do that regardless of optimization options.
>> Generated code for the inner loop:
>>
>> clang:
>> :
>>   50: add      rdx, 64
>>   54: cmp      rdx, rdi
>>   57: jae
>>   59: vpopcntq zmm1, zmmword ptr [rdx]
>>   5f: vpaddq   zmm0, zmm1, zmm0
>>   65: jmp
>>
>> gcc:
>> :
>>   38: kmovq    k1, rdx
>>   3d: vmovdqu8 zmm0 {k1} {z}, zmmword ptr [rax]
>>   43: add      rax, 64
>>   47: mov      rdx, -1
>>   4e: vpopcntq zmm0, zmm0
>>   54: vpaddq   zmm0, zmm0, zmm1
>>   5a: vmovdqa64 zmm1, zmm0
>>   60: cmp      rax, rsi
>>   63: jb
>>
>> I'm not sure how much that matters in practice. Attached is a patch to
>> do this manually giving essentially the same result in gcc. As most
>> distro packages are built using gcc I think it would make sense to
>> have the extra code if it gives a noticeable benefit for large cases.
>
> Yeah, I did see this, but I also wasn't sure if it was worth further
> complicating the code. I can test with and without your fix and see if it
> makes any difference in the benchmarks.

This seems to provide a small performance boost, so I've incorporated it into v27.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

>From 9fc4b7556b72d51fce676db84b446099767efff3 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 27 Mar 2024 16:39:24 -0500
Subject: [PATCH v27 1/2] AVX512 popcount support

---
 config/c-compiler.m4                 |  58 ++
 configure                            | 252 +++
 configure.ac                         |  51 ++
 meson.build                          |  87 +
 src/Makefile.global.in               |   5 +
 src/include/pg_config.h.in           |  12 ++
 src/include/port/pg_bitutils.h       |  11 ++
 src/makefiles/meson.build            |   4 +-
 src/port/Makefile                    |  11 ++
 src/port/meson.build                 |   6 +-
 src/port/pg_bitutils.c               |   5 +
 src/port/pg_popcount_avx512.c        |  82 +
 src/port/pg_popcount_avx512_choose.c |  81 +
 src/test/regress/expected/bit.out    |  24 +++
 src/test/regress/sql/bit.sql         |   4 +
 15 files changed, 690 insertions(+), 3 deletions(-)
 create mode 100644 src/port/pg_popcount_avx512.c
 create mode 100644 src/port/pg_popcount_avx512_choose.c

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3268a780bb..892b3c9580 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -694,3 +694,61 @@ if test x"$Ac_cachevar" = x"yes"; then
 fi
 undefine([Ac_cachevar])dnl
 ])# PGAC_LOONGARCH_CRC32C_INTRINSICS
+
+# PGAC_XSAVE_INTRINSICS
+# -
+# Check if the compiler supports the XSAVE instructions using the _xgetbv
+# intrinsic function.
+#
+# An optional compiler flag can be passed as argument (e.g., -mxsave). If the
+# intrinsic is supported, sets pgac_xsave_intrinsics and CFLAGS_XSAVE.
+AC_DEFUN([PGAC_XSAVE_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _xgetbv with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
+  [return _xgetbv(0) & 0xe0;])],
+  [Ac_cachevar=yes],
+  [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+  CFLAGS_XSAVE="$1"
+  pgac_xsave_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_XSAVE_INTRINSICS
+
+# PGAC_AVX512_POPCNT_INTRINSICS
+# -
+# Check if the compiler supports the AVX512 POPCNT instructions using the
+# _mm512_setzero_si512, _mm512_maskz_loadu_epi8, _mm512_popcnt_epi64,
+# _mm512_add_epi64, and _mm512_reduce_add_epi64 intrinsic functions.
+#
+# Optional compiler flags can be passed as argument (e.g., -mavx512vpopcntdq
+# -mavx512bw). If the intrinsics are supported, sets
+# pgac_avx512_popcnt_intrinsics and CFLAGS_POPCNT.
+AC_DEFUN([PGAC_AVX512_POPCNT_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _mm512_popcnt_epi64 with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
+  [const char buf@<:@sizeof(__m512i)@:>@;
+   PG_INT64_TYPE popcnt = 0;
+   __m512i accum = _mm512_setzero_si512();
+   const __m512i val = _mm512_maskz_loadu_epi8((__mmask64) 0xf0f0f0f0f0f0f0f0, (const __m512i *) buf);
+   const __m512i cnt = _mm512_popcnt_epi64(val);
+   accum = _mm512_add_epi64(accum, cnt);
+   popcnt = _mm512_reduce_add_epi64(accum);
+   /* return computed va
Re: WIP Incremental JSON Parser
On Fri, Apr 05, 2024 at 10:15:45AM -0400, Andrew Dunstan wrote: > On 2024-04-04 Th 15:54, Nathan Bossart wrote: >> On Thu, Apr 04, 2024 at 03:31:12PM -0400, Andrew Dunstan wrote: >> > Does the attached patch fix it for you? >> It clears up 2 of the 3 warnings for me: >> >> ../postgresql/src/common/jsonapi.c: In function ‘IsValidJsonNumber’: >> ../postgresql/src/common/jsonapi.c:2018:30: error: ‘dummy_lex.inc_state’ may >> be used uninitialized in this function [-Werror=maybe-uninitialized] >> 2018 | if (lex->incremental && !lex->inc_state->is_last_chunk && >>| > > Thanks, please try this instead. LGTM, thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Table AM Interface Enhancements
Hi, hackers!

On Tue, 2 Apr 2024 at 19:17, Jeff Davis wrote:
> On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote:
> > I don't like the idea that every custom table AM reloptions should
> > begin with StdRdOptions. I would rather introduce a new data
> > structure with table options, which need to be accessed outside of
> > table AM. Then reloptions will be a black box only directly used in
> > table AM, while the table AM has freedom over what to store in reloptions
> > and how to calculate externally-visible options. What do you think?
>
> Hi Alexander!
>
> I agree with all of that. It will take some refactoring to get there,
> though.
>
> One idea is to store StdRdOptions like normal, but if an unrecognized
> option is found, ask the table AM if it understands the option. In that
> case I think we'd just use a different field in pg_class so that it can
> use whatever format it wants to represent its options.
>
> Regards,
> Jeff Davis

I tried to rework the patch regarding table AMs according to the input from Alexander and Jeff. It splits table reloptions into two categories:

- common for all tables (stored in a fixed-size structure and accessible from outside)
- table-AM-specific (variable size, parsed and accessed by the access method only)

Please find a patch attached.

v9-0001-Custom-reloptions-for-table-AM.patch
Description: Binary data
Re: documentation structure
On Mon, Mar 25, 2024 at 11:40 AM Peter Eisentraut wrote:
> I think a possible problem we need to consider with these proposals to
> combine chapters is that they could make the chapters themselves too
> deep and harder to navigate.

I looked into various options for further combining chapters and/or appendixes and found that this is indeed a huge problem. For example, I had thought of creating a Developer Information chapter in the appendix and moving various existing chapters and appendixes inside of it, but that means that the top-level sections in those chapters get demoted one level, and what used to be a whole chapter or appendix becomes a top-level section. And since you get one HTML page per top-level section, that means that instead of a bunch of individual HTML pages of very pleasant length, you suddenly get one very long HTML page that is, exactly as you say, hard to navigate.

> The rendering can be adjusted to some degree, but then we also need to
> make sure any new chunking makes sense in other chapters. (And it might
> also change a bunch of externally known HTML links.)

I looked into this and I'm unclear how much customization is possible. I gather that the current scheme comes from having chunk.section.depth of 1, and I guess you can change that to 2 to get an HTML page per second-level section, but it seems like it would take a LOT of restructuring to make that work. It would be much easier if you could vary this across different parts of the documentation; for instance, if you could say, well, in this particular chapter or appendix I want chunk.section.depth of 2, but elsewhere 1, that would be quite handy. But after several hours reading various things about DocBook on the Internet, I was still unable to determine conclusively whether this is possible. There's an interesting comment in stylesheet-speedup-xhtml.xsl that says "Since we set a fixed $chunk.section.depth, we can do away with a bunch of complicated XPath searches for the previous and next sections at various levels."
That sounds like it's suggesting that it is in fact possible for this setting to vary, but I don't know if that's true, or how to do it, and it sounds like there might be performance consequences, too.

> I think maybe more could also be done at the top-level structure, too.
> Right now, we have part -> chapter -> sect1. We could add another
> level on top of that.

Does this let you create structures of non-uniform depth? i.e. is there a way that we can group some chapters into sets while leaving others as standalone chapters, or somesuch?

I'm not 100% confident that non-uniform depth (either via sets or via chunk.section.depth or via some other mechanism) is a good idea. There's a sort of uniformity to our navigation right now that does have some real appeal. The downside, though, is that if you want something to be a single HTML page, it's got to either be a chapter (or appendix) by itself with no sections inside of it, or it's got to be a top-level section inside of a chapter, and so anything that's long enough that it should be an HTML page by itself can never be more than one level below the index. And that seems to make it quite difficult to keep the index small.

Without some kind of variable-depth structure, the only other ways that I can see to improve things are:

1. Make chunk.section.depth 2 and restructure the entire documentation until the results look reasonable. This might be possible but I bet it's difficult. We have, at present, chapters of *wildly* varying length, from a few sentences to many, many pages. That is perhaps a bad thing; you most likely wouldn't do that in a printed book. But fixing it is a huge project. We don't necessarily have the same amount of content about each topic, and there isn't necessarily a way of grouping related topics together that produces units of relatively uniform length.
I think it's sensible to try to make improvements where we can, by pushing stuff down that's short and not that important, but finding our way to a chunk.section.depth=2 world that feels good to most people compared to what we have today seems like it's going to be challenging.

2. Replace the current index with a custom index or landing page of some kind. Or keep the current index and add a new landing page alongside it. Something that isn't derived automatically from the documentation structure but is created by hand.

--
Robert Haas
EDB: http://www.enterprisedb.com
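For context, the chunk.section.depth knob under discussion is normally set once, globally, in a DocBook XSL customization layer, roughly like this. This is a sketch assuming the standard DocBook XSL chunking stylesheets; whether the value can vary per chapter is exactly the open question above:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"/>
  <!-- One HTML page per sect1 (depth 1); raising this to 2 would chunk
       at the sect2 level across the entire document, not per chapter. -->
  <xsl:param name="chunk.section.depth">1</xsl:param>
</xsl:stylesheet>
```

Because it is a single stylesheet-wide parameter, any per-chapter variation would have to come from custom templates rather than from the parameter itself.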
Re: broken JIT support on Fedora 40
> On Fri, Apr 05, 2024 at 03:50:50PM +0200, Dmitry Dolgov wrote:
> > On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote:
> > > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
> > > > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro wrote:
> > > > > https://github.com/llvm/llvm-project/pull/87093
> > > >
> > > > Oh, with those clues, I think I might see... It is a bit strange that
> > > > we copy attributes from AttributeTemplate(), a function that returns
> > > > Datum, to our void deform function. It works (I mean doesn't crash)
> > > > if you just comment this line out:
> > > >
> > > > llvm_copy_attributes(AttributeTemplate, v_deform_fn);
> > > >
> > > > ... but I guess that disables inlining of the deform function? So
> > > > perhaps we just need to teach that thing not to try to copy the return
> > > > value's attributes, which also seems to work here:
> > >
> > > Yep, I think this is it. I've spent some hours trying to understand why
> > > suddenly the deform function has a noundef ret attribute, when it
> > > shouldn't -- this explains it, and the proposed change fixes the crash.
> > > One thing that is still not clear to me though is why this copied
> > > attribute doesn't show up in the bitcode dumped right before running
> > > the inline pass (I've added this to troubleshoot the issue).
> >
> > One thing to consider in this context is indeed adding the "verify" pass
> > as suggested in the PR, at least for the debugging configuration. Without
> > the fix it immediately returns:
> >
> >     Running analysis: VerifierAnalysis on deform_0_1
> >     Attribute 'noundef' applied to incompatible type!
> >
> >     llvm error: Broken function found, compilation aborted!

Here is what I have in mind. Interestingly enough, it also shows a few more errors besides "noundef":

    Intrinsic name not mangled correctly for type arguments! Should be: llvm.lifetime.end.p0
    ptr @llvm.lifetime.end.p0i8

It refers to the function from create_LifetimeEnd; I'm not sure how important this is.
>From 7a05bd48a4270998a08e63e07cdf1cb9932bfddd Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Fri, 5 Apr 2024 17:52:06 +0200 Subject: [PATCH v1] Add jit_verify_bitcode When passing LLVM IR through optimizer, for troubleshooting purposes it's useful to apply the "verify" pass as well. It can catch an invalid IR, if such was produced by mistake, and output a meaningful message about the error, e.g.: Attribute 'noundef' applied to incompatible type! ptr @deform_0_1 Control this via jit_verify_bitcode development guc. --- doc/src/sgml/config.sgml| 19 +++ src/backend/jit/jit.c | 1 + src/backend/jit/llvm/llvmjit.c | 11 --- src/backend/utils/misc/guc_tables.c | 11 +++ src/include/jit/jit.h | 2 +- 5 files changed, 40 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 624518e0b01..5f07ae099f4 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -11931,6 +11931,25 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1) + + jit_verify_bitcode (boolean) + + jit_verify_bitcode configuration parameter + + + + +Adds a verify pass to the pipeline, when optimizing +generated LLVM IR. It helps to identify +cases when an invalid IR was generated due to errors, and is only +useful for working on the internals of the JIT implementation. +The default setting is off. +Only superusers and users with the appropriate SET +privilege can change this setting. 
+ + + + jit_expressions (boolean) diff --git a/src/backend/jit/jit.c b/src/backend/jit/jit.c index 815b58f33c5..ad94e4b55ee 100644 --- a/src/backend/jit/jit.c +++ b/src/backend/jit/jit.c @@ -39,6 +39,7 @@ bool jit_tuple_deforming = true; double jit_above_cost = 10; double jit_inline_above_cost = 50; double jit_optimize_above_cost = 50; +bool jit_verify_bitcode = false; static JitProviderCallbacks provider; static bool provider_successfully_loaded = false; diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index ec0fdd49324..a4b21abb83a 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -698,12 +698,17 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module) #else LLVMPassBuilderOptionsRef options; LLVMErrorRef err; - const char *passes; + const char *passes, *intermediate_passes; if (context->base.flags & PGJIT_OPT3) - passes = "default"; + intermediate_passes = "default"; else - passes = "default,mem2reg"; + intermediate_passes = "default,mem2reg"; + + if (jit_verify_bitcode) + passes = psprintf("verify,%s", intermediate_passes); + else + passes = intermediate_passes; options = LLVMCreatePassBuilderOptions(); diff --git a/src/backend/utils/misc/guc_table
Re: Fixing backslash dot for COPY FROM...CSV
"Daniel Verite" writes: > Tom Lane wrote: >> I've looked over this patch and I generally agree that this is a >> reasonable solution. > Thanks for reviewing this! While testing this, I tried running the tests with an updated server and non-updated psql, and not only did the new test case fail, but so did a bunch of existing ones. That's because existing psql will send the trailing "\." of inlined data to the server, and the updated server will now think that's data if it's in CSV mode. So this means that the patch introduces a rather serious cross-version compatibility problem. I doubt we can consider inlined CSV data to be a niche case that few people use; but it will fail every time if your psql is older than your server. Not sure what to do here. One idea is to install just the psql-side fix, which should break nothing now that version-2 protocol is dead, and then wait a few years before introducing the server-side change. That seems kind of sad though. An argument for not waiting is that psql may not be the only client that needs this behavioral adjustment, and if so there's going to be breakage anyway when we change the server; so we might as well get it over with. regards, tom lane
Re: Psql meta-command conninfo+
> The original \conninfo was designed to report values from the libpq API
> about what libpq connected to. And the convention in psql is that "+"
> provide more or less the same information but a bit more. So I think it
> is wrong to make "\conninfo+" something fundamentally different than
> "\conninfo".

That is a fair argument. Looking at this a bit more, it looks like we can enhance the GSSAPI output of \conninfo to get the GSSAPI fields that \conninfo+ was aiming for. Right now it just shows:

    printf(_("GSSAPI-encrypted connection\n"));

but we can use libpq to get the other GSSAPI attributes in the output.

> And even more so if it contains information that isn't really
> "connection information". For example, current_user() doesn't make
> sense here, I think.
>
> I mean, if you want to add a command \some_useful_status_information, we
> can talk about that, but let's leave \conninfo to what it does.
>
> But I think there are better ways to implement this kind of server-side
> status, for example, with a system view.

What about a \sessioninfo command that shows all the information for the current session: PID, app_name, current_user, session_user, system_user, client_address, client_port, etc.? These will be the fields that we cannot gather if the connection is broken for whatever reason. The distinction here is that \sessioninfo shows the details after the connection is established, which could possibly change during the connection.

Regards,

Sami
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 16:02, Robert Haas wrote: > Often? > > I kind of hope that the protocol starts to evolve a bit more than it > has, but I don't want a continuous stream of changes. That will be > very hard to test and verify correctness, and a hassle for drivers to > keep up with, and a mess for compatibility. I definitely think protocol changes require a lot more scrutiny than many other things, given their hard/impossible to change nature. But I do think that we shouldn't be at all averse to the act of bumping the protocol version itself. If we have a single small protocol change in one release, then imho it's no problem to bump the protocol version. Bumping the version should be painless. So we shouldn't be inclined to push an already agreed upon protocol change to the next release, because there are some more protocol changes in the pipeline that won't make it in the current one. I don't think this would be any harder for drivers to keep up with, then if we'd bulk all changes together. If driver developers only want to support changes in bulk changes, they can simply choose not to support 3.1 at all if they want and wait until 3.2 to support everything in bulk then. > > So, what approach of checking feature support are you envisioning > > instead? A function for every feature? > > Something like SupportsSetProtocolParameter, that returns an error > > message if it's not supported and NULL otherwise. And then add such a > > support function for every feature? > > Yeah, something like that. > ... > > > I'm also not sure why you're saying a user is not entitled to this > > information. We already provide users of libpq a way to see the full > > Postgres version, and the major protocol version. I think allowing a > > user to access this information is only a good thing. But I agree that > > providing easy to use feature support functions is a better user > > experience in some cases. > > I guess that's a fair point. 
But I'm worried that if we expose too > much of the internals, we won't be able to change things later. I'll take a look at redesigning the protocol parameter stuff. To work with dedicated functions instead. > I really intended the _pq_ prefix as a way of taking something out of > the GUC namespace, not as a part of the GUC namespace that users would > see. And I'm reluctant to go back on that. If we want to make > pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's > something to that idea [insert caveats here]. But doesn't _pq_ look > like something that was intended to be internal? That's certainly how > I intended it. I agree that _pq_ does look internal and doesn't clearly indicate that it's a protocol related change. But sadly the _pq_ prefix is the only one that doesn't error in startup packets, waiting another 5 years until pg_protocol is allowed in the startup packet doesn't seem like a reasonable solution either. How about naming the GUCs pg_protocol.${NAME}, but still requiring the _pq_ prefix in the StartupPacket. That way only client libraries would have to see this internal prefix and they could remap it someway. I see two options for that: 1. At the server replace the _pq_ prefix with pg_protocol. So _pq_.${NAME} would map to pg_protocol.${name} 2. At the server replace the _pq_.pg_protocol prefix with pg_protocol. So _pq_.pg_protocol.${NAME} would map to pg_protocol.${name}. I guess you prefer option 2, because that would still leave lots of space to do something with the rest of the _pq_ space, i.e. _pq_.magic_pixie_dust can still be used for something different than a GUC. Bikeshedding: I think I prefer protocol.${NAME} over pg_protocol.${NAME}, it's shorter and it seems obvious that protocol is the postgres protocol in this context. This should be a fairly simple change to make. > Wouldn't libpq already know what value it last set? Or is this needed > because it doesn't know what the other side's default is? 
libpq could/should indeed know this, but for debugging/testing purposes it is quite useful to have a facility to read the server side value. I think defaults should always be whatever was happening if the parameter wasn't specified before, so knowing the server default is not something the client needs to worry about (i.e. the default is defined as part of the protocol spec). > Hmm, OK. I guess if the PGC_PROTOCOL flag makes it so that the GUC can > only be set using ParameterSet, and it also makes them > non-transactional, then it's fine. So to be clear, I can't set these > in postgresql.conf, or postgresql.auto.conf, or via ALTER $ANYTHING, > or via SET, or in any other way than by sending ParameterStatus > messages. And when I send a ParameterStatus message, it doesn't matter > if I'm in a good transaction, an aborted transaction, or no > transaction at all, and the setting change takes effect regardless of > that and regardless of any subsequent rollbacks. Is that right? > > I feel like maybe it's not, because you seem to be thin
Re: AIX support
> What you do need to do to reproduce the described problems is
> check out the Postgres git tree and rewind to just before
> commit 0b16bb877, where we deleted AIX support. Any attempt
> to restore AIX support would have to start with reverting that
> commit (and perhaps the followup f0827b443).
>
> regards, tom lane

Hi Team, thank you for all the info. We progressed to building the source on our nodes, and the build was successful with the below configuration:

    Postgres - github-bcdfa5f2e2f
    AIX      - 71c
    Xlc      - 13.1.0
    Bison    - 3.0.5

Going ahead, we want to build the changes that were removed as part of "0b16bb8776bb8", with the latest Xlc and gcc versions. We were building the source from the postgres ftp server (https://ftp.postgresql.org/pub/source/), and would like to understand whether there are any source-level differences between the ftp server and the source on GitHub.

Regards,
Sriram.

From: Tom Lane
Date: Friday, 29 March 2024 at 9:03 AM
To: Thomas Munro
Cc: Noah Misch, Sriram RK, Alvaro Herrera, pgsql-hack...@postgresql.org
Subject: Re: AIX support

Thomas Munro writes:
> Oh, sorry, I had missed the part where newer compilers fix the issue
> too. Old out-of-support versions of AIX running old compilers, what
> fun.

Indeed. One of the topics that needs investigation if you want to pursue this is which AIX system and compiler versions still deserve support, and which of the AIX hacks we had been carrying still need to be there based on that analysis. For context, we've been pruning support for extinct-in-the-wild OS versions pretty aggressively over the past couple of years, and I'd expect to apply the same standard to AIX.

regards, tom lane
Re: documentation structure
On Fri, Apr 5, 2024 at 9:01 AM Robert Haas wrote: > > > The rendering can be adjusted to some degree, but then we also need to > > make sure any new chunking makes sense in other chapters. (And it might > > also change a bunch of externally known HTML links.) > > I looked into this and I'm unclear how much customization is possible. > > Here is a link to my attempt at this a couple of years ago. It basically "abuses" refentry. https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com I never did dive into the man page or PDF dynamics of this particular change but it seemed to solve HTML pagination without negative consequences and with minimal risk of unintended consequences since only the markup on the pages we want to alter is changed, not global configuration. David J.
Re: documentation structure
On Fri, Apr 5, 2024 at 12:15 PM David G. Johnston wrote: > Here is a link to my attempt at this a couple of years ago. It basically > "abuses" refentry. > > https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com > > I never did dive into the man page or PDF dynamics of this particular change > but it seemed to solve HTML pagination without negative consequences and with > minimal risk of unintended consequences since only the markup on the pages we > want to alter is changed, not global configuration. Hmm, but it seems like that might have generated some man page entries that we don't want? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Speed up clean meson builds by ~25%
On Fri, 5 Apr 2024 at 17:24, Andres Freund wrote: > I recommend opening a bug report for clang, best with an already preprocessed > input file. > We're going to need to do something about this from our side as well, I > suspect. The times aren't great with gcc either, even if not as bad as with > clang. Agreed. While not a full solution, I think this patch is still good to address some of the pain: I was waiting 10 seconds at the end of my build with only 1 of my 10 cores doing anything. So while this doesn't decrease the CPU time spent, it does decrease wall-clock time significantly in some cases, with AFAICT no downsides.
Re: documentation structure
On Fri, Apr 5, 2024 at 9:18 AM Robert Haas wrote: > On Fri, Apr 5, 2024 at 12:15 PM David G. Johnston > wrote: > > Here is a link to my attempt at this a couple of years ago. It > basically "abuses" refentry. > > > > > https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com > > > > I never did dive into the man page or PDF dynamics of this particular > change but it seemed to solve HTML pagination without negative consequences > and with minimal risk of unintended consequences since only the markup on > the pages we want to alter is changed, not global configuration. > > Hmm, but it seems like that might have generated some man page entries > that we don't want? > If so (didn't check) maybe just remove them in post? David J.
Re: Fixing backslash dot for COPY FROM...CSV
I wrote: > So this means that the patch introduces a rather serious cross-version > compatibility problem. I doubt we can consider inlined CSV data to be > a niche case that few people use; but it will fail every time if your > psql is older than your server. On third thought, that may not be as bad as I was thinking. We don't blink at the idea that an older psql's \d commands may malfunction with a newer server, and I think most users have internalized the idea that they want psql >= server. If the patch created an incompatibility with that direction, it'd be a problem, but I don't think it does. regards, tom lane
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 12:09, Jelte Fennema-Nio wrote: > On Fri, 5 Apr 2024 at 16:02, Robert Haas wrote: > > Often? > > > > I kind of hope that the protocol starts to evolve a bit more than it > > has, but I don't want a continuous stream of changes. That will be > > very hard to test and verify correctness, and a hassle for drivers to > > keep up with, and a mess for compatibility. > > I definitely think protocol changes require a lot more scrutiny than > many other things, given their hard/impossible-to-change nature. > > But I do think that we shouldn't be at all averse to the act of > bumping the protocol version itself. If we have a single small > protocol change in one release, then imho it's no problem to bump the > protocol version. Bumping the version should be painless. So we > shouldn't be inclined to push an already agreed upon protocol change > to the next release, because there are some more protocol changes in > the pipeline that won't make it in the current one. > > I don't think this would be any harder for drivers to keep up with > than if we bundled all changes together. If driver developers only want > to support changes in bulk, they can simply choose not to > support 3.1 at all if they want and wait until 3.2 to support > everything in bulk then. > > > > So, what approach of checking feature support are you envisioning > > > instead? A function for every feature? > > > Something like SupportsSetProtocolParameter, that returns an error > > > message if it's not supported and NULL otherwise. And then add such a > > > support function for every feature? > > > > Yeah, something like that. > > ... > > > > > I'm also not sure why you're saying a user is not entitled to this > > > information. We already provide users of libpq a way to see the full > > > Postgres version, and the major protocol version. I think allowing a > > > user to access this information is only a good thing. 
But I agree that > > > providing easy to use feature support functions is a better user > > > experience in some cases. > > > > I guess that's a fair point. But I'm worried that if we expose too > > much of the internals, we won't be able to change things later. > > I'll take a look at redesigning the protocol parameter stuff to work > with dedicated functions instead. > +1 > > > I really intended the _pq_ prefix as a way of taking something out of > > the GUC namespace, not as a part of the GUC namespace that users would > > see. And I'm reluctant to go back on that. If we want to make > > pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's > > something to that idea [insert caveats here]. But doesn't _pq_ look > > like something that was intended to be internal? That's certainly how > > I intended it. > Is this actually used in practice? If so, how? > > I agree that _pq_ does look internal and doesn't clearly indicate that > it's a protocol-related change. But sadly the _pq_ prefix is the only > one that doesn't error in startup packets; waiting another 5 years > until pg_protocol is allowed in the startup packet doesn't seem like a > reasonable solution either. > > How about naming the GUCs pg_protocol.${NAME}, but still requiring the > _pq_ prefix in the StartupPacket? That way only client libraries would > have to see this internal prefix and they could remap it somehow. I > see two options for that: > 1. At the server replace the _pq_ prefix with pg_protocol. So > _pq_.${NAME} would map to pg_protocol.${name} > 2. At the server replace the _pq_.pg_protocol prefix with pg_protocol. > So _pq_.pg_protocol.${NAME} would map to pg_protocol.${name}. > > I guess you prefer option 2, because that would still leave lots of > space to do something with the rest of the _pq_ space, i.e. > _pq_.magic_pixie_dust can still be used for something different than a > GUC. 
> > Bikeshedding: I think I prefer protocol.${NAME} over > pg_protocol.${NAME}, it's shorter and it seems obvious that protocol > is the postgres protocol in this context. > > This should be a fairly simple change to make. > > > Wouldn't libpq already know what value it last set? Or is this needed > > because it doesn't know what the other side's default is? > > libpq could/should indeed know this, but for debugging/testing > purposes it is quite useful to have a facility to read the server side > value. I think defaults should always be whatever was happening if the > parameter wasn't specified before, so knowing the server default is > not something the client needs to worry about (i.e. the default is > defined as part of the protocol spec). > > > Hmm, OK. I guess if the PGC_PROTOCOL flag makes it so that the GUC can > > only be set using ParameterSet, and it also makes them > > non-transactional, then it's fine. So to be clear, I can't set these > > in postgresql.conf, or postgresql.auto.conf, or via ALTER $ANYTHING, > > or via SET, or in any other way than by sending ParameterStatus > > messages. And when I send a Paramete
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Thu, Apr 4, 2024 at 6:37 PM Michael Paquier wrote: > From where did you pull the LibreSSL sources? Directly from the > OpenBSD tree? I've been building LibreSSL Portable: https://github.com/libressl/portable > Ah, right. OpenSSL_add_all_algorithms() is documented as having no > effect in 1.1.0. I think it still has an effect, but it's unnecessary unless you need some specific initialization other than the defaults -- for which we now have OPENSSL_init_crypto(). Best I could tell, pgcrypto had no such requirements (but extra eyes on that would be appreciated). > I would be OK to draw a line to what we test in the buildfarm if it > comes to that, down to OpenBSD 6.9. That would correspond to LibreSSL 3.3 if I'm not mistaken. Any particular reason for 6.9 as the dividing line, and not something later? And by "test in the buildfarm", do you mean across all versions, or just what we support for PG17? (For the record, I don't think there's any reason to drop older LibreSSL testing for earlier branches.) > This version is already not > supported, and we had a number of issues with older versions and > timestamps going backwards. Speaking of which: for completeness I should note that LibreSSL 3.2 (OpenBSD 6.8) is failing src/test/ssl because of alternate error messages. That failure exists on HEAD and is not introduced in this patchset. LibreSSL 3.3, which passes, has the following changelog note: * If x509_verify() fails, ensure that the error is set on both the x509_verify_ctx() and its store context to make some failures visible from SSL_get_verify_result(). So that would explain that. If we drop support for 3.2 and earlier then there's nothing to be done anyway. > -/* Define to 1 if you have the `CRYPTO_lock' function. */ > -#undef HAVE_CRYPTO_LOCK > > I'm happy to see that gone for good. +1 > + # Functions introduced in OpenSSL 1.1.0/LibreSSL 2.7.0. 
> + ['OPENSSL_init_ssl', {'required': true}], > + ['BIO_meth_new', {'required': true}], > + ['ASN1_STRING_get0_data', {'required': true}], > + ['HMAC_CTX_new', {'required': true}], > + ['HMAC_CTX_free', {'required': true}], > > These should be removed to save cycles in ./configure and meson, no? > We don't have any more of their HAVE_* flags in the tree with this > patch applied. True, but they are required, and I needed something in there to reject earlier builds. Daniel's suggested approach with _VERSION_NUMBER should be able to replace this I think. > - cdata.set('OPENSSL_API_COMPAT', '0x10002000L', > + cdata.set('OPENSSL_API_COMPAT', '0x1010L', > > Seems to me that this should also document in meson.build why 1.1.0 is > chosen, same as ./configure. Good point. > It seems to me that src/common/protocol_openssl.c could be cleaned up; > I see SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version > listed in LibreSSL (looking at some past version of > https://github.com/libressl/libressl.git that I still have around). > > There is an extra thing in pg_strong_random.c once we cut OpenSSL < > 1.1.1.. Do we still need pg_strong_random_init() and its RAND_poll() > when it comes to LibreSSL? This is a sensitive area, so we should be > careful. It would be cool if there are more cleanups that can happen. I agree we need to be careful around removal, though, especially now that we know that LibreSSL testing is spotty... I will look more into these later today. Thanks, --Jacob
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Dave Cramer writes: > On Fri, 5 Apr 2024 at 12:09, Jelte Fennema-Nio wrote: >> Setting PGC_PROTOCOL gucs would be allowed in the startup packet, >> which is fine afaict because that's also something that's part of the >> protocol level and is thus fully controlled by client libraries and >> poolers) But other than that: Yes, conf files, ALTER, and SET cannot >> change these GUCs. > +1 I don't buy that argument, actually. libpq, and pretty much every other client AFAIK, has provisions to let higher code levels insert random options into the startup packet. So to make this work libpq would have to filter or at least inspect such options, which is logic that doesn't exist and doesn't seem nice to need. The other problem with adding these things in the startup packet is that when you send that packet, you don't know what the server version is and hence don't know if it will take these options. What's so bad about insisting that these options must be sent in a separate message? regards, tom lane
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, Apr 5, 2024 at 12:09 PM Jelte Fennema-Nio wrote: > But I do think that we shouldn't be at all averse to the act of > bumping the protocol version itself. If we have a single small > protocol change in one release, then imho it's no problem to bump the > protocol version. Bumping the version should be painless. So we > shouldn't be inclined to push an already agreed upon protocol change > to the next release, because there are some more protocol changes in > the pipeline that won't make it in the current one. I think I half-agree with this. Let's say we all agree that the world will end unless we make wire protocol changes A and B, and for whatever reason we also agree that these changes should be handled via a protocol version bump rather than any other method, but only the patch for A is sufficiently stable by the end of the release cycle. Then we commit A and bump the protocol version and the next release we do the same for B, hopefully before the world ends. In this sense this is just like CATALOG_VERSION_NO or XLOG_PAGE_MAGIC. We don't postpone commits because they'd require bumping those values; we just bump the values when it's necessary. But on the other hand, I find it a bit hard to believe that the statement "Bumping the version should be painless" will ever correspond to reality. Since we've not done a hard wire protocol break in a very long time, I assume that we would not want to start now. But that also means that when the PG version with protocol version 3.5 comes out, that server is going to have to be compatible with 3.4, 3.3, 3.2, 3.1, and 3.0. How will we test that it really is? We could test against libpqs from older server versions, but that quickly becomes awkward, because it means that there won't be tests that run as part of a regular build, but it'll have to all be done in the buildfarm or CI with something like the cross-version upgrade tests we already have. 
Maybe we'd be better off adding a libpq connection option that forces the use of a specific minor protocol version, but then we'll need backward-compatibility code in libpq basically forever. But maybe we need that anyway to avoid older and newer servers being unable to communicate. Plus, you've got all of the consequences for non-core drivers, which have to both add support for the new wire protocol - if they don't want to seem outdated and eventually obsolete - and also test that they're still compatible with all supported server versions. Connection poolers have the same set of problems. The whole thing is almost a hole with no bottom. Keeping up with core changes in this area could become a massive undertaking for lots and lots of people, some of whom may be the sole maintainer of some important driver that now needs a whole bunch of work. I'm not sure how much it improves things if we imagine adding feature flags to the existing protocol versions, rather than whole new protocol versions, but at least it cuts down on the assumption that adopting new features is mandatory, and that such features are cumulative. If a driver wants to support TDE but not protocol parameters or protocol parameters but not TDE, who are we to say no? If in supporting those things we bump the protocol version to 3.2, and then 3.3 fixes a huge performance problem, are drivers going to be required to add support for features they don't care about to get the performance fixes? I see some benefit in bumping the protocol version for major changes, or for changes that we have an important reason to make mandatory, or to make previously-optional features for which support has become in practical terms universal part of the base feature set. But I'm very skeptical of the idea that we should just handle as many things as possible via a protocol version bump. 
We've been avoiding protocol version bumps like the plague since forever, and swinging all the way to the other extreme doesn't sound like the right idea to me. > How about naming the GUCs pg_protocol.${NAME}, but still requiring the > _pq_ prefix in the StartupPacket. That way only client libraries would > have to see this internal prefix and they could remap it someway. I > see two options for that: > 1. At the server replace the _pq_ prefix with pg_protocol. So > _pq_.${NAME} would map to pg_protocol.${name} > 2. At the server replace the _pq_.pg_protocol prefix with pg_protocol. > So _pq_.pg_protocol.${NAME} would map to pg_protocol.${name}. > > I guess you prefer option 2, because that would still leave lots of > space to do something with the rest of the _pq_ space, i.e. > _pq_.magic_pixie_dust can still be used for something different than a > GUC. I'm not sure what I think about this. Do we need these new GUCs to be both PGC_PROTOCOL *and also* live in a separate namespace? I see the need for the former pretty clearly: if these kinds of things are to be part of the GUC system (which wasn't my initial bias, but whatever) then they need to have some important behavioral differences
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 18:30, Dave Cramer wrote: >> > I really intended the _pq_ prefix as a way of taking something out of >> > the GUC namespace, not as a part of the GUC namespace that users would >> > see. And I'm reluctant to go back on that. If we want to make >> > pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's >> > something to that idea [insert caveats here]. But doesn't _pq_ look >> > like something that was intended to be internal? That's certainly how >> > I intended it. > > Is this actually used in practice? If so, how? No, it's not used for anything at the moment. This whole thread is basically about trying to agree on how we want to make protocol changes in the future in a somewhat standardized way, using the tools that we have available to avoid breaking connections to old postgres servers: ProtocolVersionNegotiation messages, minor version numbers, and _pq_ parameters in the startup message. All of those have so far been completely theoretical and have not appeared in any client-server communication.
Re: Fixing backslash dot for COPY FROM...CSV
Tom Lane wrote: > Not sure what to do here. One idea is to install just the psql-side > fix, which should break nothing now that version-2 protocol is dead, > and then wait a few years before introducing the server-side change. > That seems kind of sad though. Wouldn't backpatching solve this? Then only the users who don't apply the minor updates would have non-matching server and psql. Initially I thought that obsoleting the v2 protocol was a recent move, but reading older messages from the list I've got the impression that it was more or less in the pipeline since way before version 10. Also one of the cases the patch fixes, the one where imported data is silently truncated at the point of \., is quite nasty IMO. I can imagine an app where user-supplied data would be appended row-by-row into a CSV file, and would be processed periodically in batches. Under some conditions, in particular if newlines in the first column are allowed, a malevolent user could submit a \. sequence to cause the batch to miss the rest of the data without any error being raised. [1] https://www.postgresql.org/message-id/11648.1403147417%40sss.pgh.pa.us Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
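[Editor's note] The truncation scenario Daniel describes can be modeled with a toy reader — a hedged Python sketch, not PostgreSQL's actual COPY parser; the function name and sample data are illustrative only:

```python
# Toy model of the pre-fix behavior: a reader that treats a line
# consisting of "\." as the end-of-data marker even when the input is
# CSV, where such a line can legitimately occur inside a quoted field.
# Illustrative sketch only, not the server's COPY implementation.

def naive_copy_read(lines):
    """Collect rows until a bare \\. line, as the pre-fix code did."""
    rows = []
    for line in lines:
        if line == "\\.":          # end-of-data check applied blindly
            break
        rows.append(line)
    return rows

# A malevolent user submits a quoted field containing a newline followed
# by \. on its own physical line; everything after it is silently lost.
csv_lines = [
    'alice,"ok"',
    '"evil',                 # quoted field continues on the next line...
    '\\.',                   # ...which happens to be exactly \.
    'still evil",1',
    'bob,"legitimate row"',  # never imported, and no error raised
]
print(naive_copy_read(csv_lines))   # -> ['alice,"ok"', '"evil']
```

The patch under discussion makes CSV mode take such a line literally, so the marker can no longer truncate the batch without an error.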
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 18:43, Tom Lane wrote: > I don't buy that argument, actually. libpq, and pretty much every > other client AFAIK, has provisions to let higher code levels insert > random options into the startup packet. So to make this work libpq > would have to filter or at least inspect such options, which is > logic that doesn't exist and doesn't seem nice to need. libpq actually doesn't support doing this (only by putting them in the "options" parameter, but _pq_ parameters would not be allowed there), but indeed many other clients do allow this and indeed likely don't have logic to filter/disallow _pq_ prefixed parameters. This seems very easy to address though: only parse _pq_ options when protocol version 3.1 is requested by the client, and otherwise always report them as "not supported". Then, when clients upgrade to 3.1, they should filter/disallow arbitrarily-set _pq_ parameters. I don't think that's hard or unpleasant to add; it's literally a prefix check for the "_pq_." string. > The other problem with adding these things in the startup packet > is that when you send that packet, you don't know what the server > version is and hence don't know if it will take these options. (imho) the whole point of the _pq_ options is that they don't trigger an error when they are requested by the client but not supported by the server. So I don't understand your problem here. > What's so bad about insisting that these options must be sent in a > separate message? To not require an additional roundtrip waiting for the server to respond.
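[Editor's note] The prefix check being proposed really is small; here is a hedged sketch of the filtering logic (Python, with illustrative names — this is not actual server or libpq code, and "_pq_.some_protocol_param" is a made-up parameter):

```python
# Sketch of the proposed rule: parse _pq_.* startup options only when
# the client asked for protocol 3.1+; otherwise report them back as
# unsupported (e.g. via NegotiateProtocolVersion) instead of applying
# them. Names and data shapes here are assumptions for illustration.

PQ_PREFIX = "_pq_."

def classify_startup_params(params, protocol_version):
    """Split startup parameters into (accepted, unsupported)."""
    accepted, unsupported = {}, []
    for name, value in params.items():
        if name.startswith(PQ_PREFIX) and protocol_version < (3, 1):
            unsupported.append(name)
        else:
            accepted[name] = value
    return accepted, unsupported

# A 3.0 client that lets users inject arbitrary options cannot
# accidentally set protocol parameters:
print(classify_startup_params(
    {"application_name": "app", "_pq_.some_protocol_param": "on"},
    (3, 0)))
# -> ({'application_name': 'app'}, ['_pq_.some_protocol_param'])
```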
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 5 Apr 2024, at 03:37, Michael Paquier wrote: > On Thu, Apr 04, 2024 at 11:03:35AM -0700, Jacob Champion wrote: >> v3 does that by putting back checks for symbols that aren't part of >> LibreSSL (tested back to 2.7, which is where the 1.1.x APIs started to >> arrive). > > From where did you pull the LibreSSL sources? Directly from the > OpenBSD tree? > >> It also makes adjustments for the new OPENSSL_API_COMPAT >> version, getting rid of OpenSSL_add_all_algorithms() and adding a >> missing header. > > Ah, right. OpenSSL_add_all_algorithms() is documented as having no > effect in 1.1.0. This API was deprecated and made into a no-op in OpenBSD 6.4, which corresponds to LibreSSL 2.8.3. >> This patch has a deficiency where 1.1.0 itself isn't actually rejected >> at configure time; Daniel's working on an explicit check for the >> OPENSSL/LIBRESSL_VERSION_NUMBER that should fix that up. There's an >> open question about which version we should pin for LibreSSL, which >> should ultimately come down to which versions of OpenBSD we want PG17 >> to support. > > I would be OK to draw a line to what we test in the buildfarm if it > comes to that, down to OpenBSD 6.9. This version is already not > supported, and we had a number of issues with older versions and > timestamps going backwards. There is a member on 6.8 as well, and while 6.8 works fine, the tests all fail due to the error messages being different. Rather than adding alternate output for an EOL version of OpenBSD (which currently doesn't even run the ssl checks in the BF), I think using 6.9 as the minimum makes sense. > + # Functions introduced in OpenSSL 1.1.0/LibreSSL 2.7.0. 
Correct, they are removed in favor of a compile test for the OpenSSL version. > - cdata.set('OPENSSL_API_COMPAT', '0x10002000L', > + cdata.set('OPENSSL_API_COMPAT', '0x1010L', > > Seems to me that this should also document in meson.build why 1.1.0 is > chosen, same as ./configure. Done. > It seems to me that src/common/protocol_openssl.c could be cleaned up; > I see SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version > listed in LibreSSL (looking at some past version of > https://github.com/libressl/libressl.git that I still have around). Both SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version are available in at least OpenBSD 6.3, which is LibreSSL 2.7.5. With this we can thus remove the whole file. > There is an extra thing in pg_strong_random.c once we cut OpenSSL < > 1.1.1.. Do we still need pg_strong_random_init() and its RAND_poll() > when it comes to LibreSSL? This is a sensitive area, so we should be > careful. Re-reading the thread which added this comment, and the OpenSSL docs and code, I'm leaning towards leaving this in. The overhead is marginal, and fork safety has been broken at least once in OpenSSL since 1.1.1: https://github.com/openssl/openssl/issues/12377 That particular bug was thankfully caught before it shipped, but when mitigating the risk is this cheap, it seems reasonable to keep this in. Attached is a WIP patch to get more eyes on it; the Meson test for 1.1.1 fails on Windows in CI, which I will investigate next. -- Daniel Gustafsson v4-0001-Remove-support-for-OpenSSL-1.0.2.patch Description: Binary data
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 5 Apr 2024, at 18:41, Jacob Champion > wrote: > On Thu, Apr 4, 2024 at 6:37 PM Michael Paquier wrote: >> I would be OK to draw a line to what we test in the buildfarm if it >> comes to that, down to OpenBSD 6.9. > > That would correspond to LibreSSL 3.3 if I'm not mistaken. Any > particular reason for 6.9 as the dividing line, and not something > later? And by "test in the buildfarm", do you mean across all > versions, or just what we support for PG17? (For the record, I don't > think there's any reason to drop older LibreSSL testing for earlier > branches.) We should draw the line on something we can reliably test, so 6.9 seems fine to me (unless there is evidence of older versions being common in the wild). OpenBSD themselves support 2 backbranches so 6.9 is still far beyond the EOL mark upstream. -- Daniel Gustafsson
Re: AIX support
On Fri, Apr 05, 2024 at 04:12:06PM +, Sriram RK wrote: > > > What you do need to do to reproduce the described problems is > > check out the Postgres git tree and rewind to just before > > commit 0b16bb877, where we deleted AIX support. Any attempt > > to restore AIX support would have to start with reverting that > > commit (and perhaps the followup f0827b443). > Going ahead, we want to build the changes that were removed as part of > “0b16bb8776bb8”, with latest Xlc and gcc version. > > We were building the source from the postgres ftp > server(https://ftp.postgresql.org/pub/source/), would like to understand if > there are any source level changes between the ftp server and the source on > github? To succeed in this endeavor, you'll need to develop fluency in the tools to answer questions like that, or bring in someone who is fluent. In this case, GNU diff is the standard tool for answering your question. These resources cover other topics you would need to learn: https://wiki.postgresql.org/wiki/Developer_FAQ https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
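[Editor's note] Answering "are there source-level changes between the ftp tarball and the git tree?" is a mechanical tree comparison, as Noah suggests with GNU diff. As a hedged illustration of the same idea (Python's stdlib filecmp standing in for `diff -r`; the directories here are scratch paths, not real Postgres trees):

```python
import filecmp
import tempfile
from pathlib import Path

# Sketch: recursively compare two source trees and list files that
# differ or exist on only one side. In practice a release tarball
# carries a few generated files (e.g. configure artifacts) that a git
# checkout lacks, which is exactly what this surfaces.
def report_tree_diff(a, b):
    cmp = filecmp.dircmp(a, b)
    def walk(c, prefix=""):
        out = []
        out += [prefix + n for n in c.diff_files]   # present in both, differ
        out += [prefix + n for n in c.left_only]    # only in first tree
        out += [prefix + n for n in c.right_only]   # only in second tree
        for name, sub in c.subdirs.items():
            out += walk(sub, prefix + name + "/")
        return out
    return walk(cmp)

# Tiny demo with two scratch trees: one tarball-only file, one shared.
a, b = tempfile.mkdtemp(), tempfile.mkdtemp()
Path(a, "configure").write_text("generated\n")
for d in (a, b):
    Path(d, "Makefile").write_text("all:\n")
print(report_tree_diff(a, b))   # -> ['configure']
```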
Re: Can't compile PG 17 (master) from git under Msys2 autoconf
On 2024-Apr-04, Regina Obe wrote: > I think I got something not too far off from what's there now that works > under my msys2 setup again. This is partly using your idea of using > $(top_builddir) to qualify the path but in the LN_S section that is causing > me grief. > This seems to work okay building in tree and out of tree. > By changing these lines in src/backend/Makefile > > $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h > rm -f '$@' > - $(LN_S) ../../backend/$< '$@' > + $(LN_S) $(top_builddir)/src/backend/$< '$@' > > $(top_builddir)/src/include/utils/wait_event_types.h: > utils/activity/wait_event_types.h > rm -f '$@' > - $(LN_S) ../../backend/$< '$@' > + $(LN_S) $(top_builddir)/src/backend/$< '$@' > > I've also attached as a patch. Hmm, that's quite strange ... it certainly doesn't work for me. Maybe the LN_S utility is resolving the symlink at creation time, rather than letting it be a reference to be resolved later. Apparently, the only Msys2 buildfarm animal is now using Meson, so we don't have any representative animal for your situation. What does LN_S do for you anyway? Looking at https://stackoverflow.com/questions/61594025/symlink-in-msys2-copy-or-hard-link it sounds like this would work if the MSYS environment variable was set to winsymlinks (or maybe not. I don't know if a "windows shortcut" would be usable in this case.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Nadie está tan esclavizado como el que se cree libre no siéndolo" (Goethe)
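[Editor's note] The underlying portability wrinkle in the Makefile rule above: a relative symlink target is stored literally and resolved against the directory *containing the link*, not the directory where `ln -s` was run. A small POSIX sketch (hypothetical layout loosely mirroring the rule; MSYS2 symlink emulation may instead copy or resolve the target at creation time, which is where behavior can diverge, per Álvaro's hypothesis):

```python
import os
import tempfile
from pathlib import Path

# Build a miniature src/backend + src/include layout and create a
# relative symlink the way the Makefile rule does.
root = Path(tempfile.mkdtemp())
(root / "src/backend/storage").mkdir(parents=True)
(root / "src/include/storage").mkdir(parents=True)

(root / "src/backend/storage/lwlocknames.h").write_text("/* generated */\n")

link = root / "src/include/storage/lwlocknames.h"
os.symlink("../../backend/storage/lwlocknames.h", link)

# The link stores the literal relative string; the kernel resolves it
# against src/include/storage/ every time the link is opened. If a
# tool instead resolves (or copies) the target at creation time from
# the current working directory, the same relative path can break.
print(os.readlink(link))          # -> ../../backend/storage/lwlocknames.h
print(link.read_text().strip())   # -> /* generated */
```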
Re: incremental backup breakage in BlockRefTableEntryGetBlocks
On Fri, Apr 5, 2024 at 2:59 AM Jakub Wartak wrote: > And of course i'm attaching reproducer with some braindump notes in > case in future one hits similiar issue and wonders where to even start > looking (it's very primitive though but might help). Thanks. I've committed the patch now. -- Robert Haas EDB: http://www.enterprisedb.com
Re: LogwrtResult contended spinlock
Pushed 0001. Here's the patch that adds the Copy position one more time, with the monotonic_advance function returning the value. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ >From 3f5c860576245b92701e7bfc517947c418c68510 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 5 Apr 2024 13:52:44 +0200 Subject: [PATCH v17] Add logCopyResult Updated using monotonic increment --- src/backend/access/transam/xlog.c | 32 ++- src/include/port/atomics.h| 36 +++ 2 files changed, 67 insertions(+), 1 deletion(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b4499cda7b..a40c1fb79e 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -469,6 +469,7 @@ typedef struct XLogCtlData XLogRecPtr lastSegSwitchLSN; /* These are accessed using atomics -- info_lck not needed */ + pg_atomic_uint64 logCopyResult; /* last byte + 1 copied to WAL buffers */ pg_atomic_uint64 logWriteResult; /* last byte + 1 written out */ pg_atomic_uint64 logFlushResult; /* last byte + 1 flushed */ @@ -1499,6 +1500,7 @@ static XLogRecPtr WaitXLogInsertionsToFinish(XLogRecPtr upto) { uint64 bytepos; + XLogRecPtr copyptr; XLogRecPtr reservedUpto; XLogRecPtr finishedUpto; XLogCtlInsert *Insert = &XLogCtl->Insert; @@ -1507,6 +1509,11 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto) if (MyProc == NULL) elog(PANIC, "cannot wait without a PGPROC structure"); + /* check if there's any work to do */ + copyptr = pg_atomic_read_u64(&XLogCtl->logCopyResult); + if (upto <= copyptr) + return copyptr; + /* Read the current insert position */ SpinLockAcquire(&Insert->insertpos_lck); bytepos = Insert->CurrBytePos; @@ -1586,6 +1593,10 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto) if (insertingat != InvalidXLogRecPtr && insertingat < finishedUpto) finishedUpto = insertingat; } + + finishedUpto = + pg_atomic_monotonic_advance_u64(&XLogCtl->logCopyResult, finishedUpto); + return finishedUpto; } @@ -1727,13 +1738,24 @@ 
WALReadFromBuffers(char *dstbuf, XLogRecPtr startptr, Size count, { char *pdst = dstbuf; XLogRecPtr recptr = startptr; + XLogRecPtr copyptr; Size nbytes = count; if (RecoveryInProgress() || tli != GetWALInsertionTimeLine()) return 0; Assert(!XLogRecPtrIsInvalid(startptr)); - Assert(startptr + count <= LogwrtResult.Write); + + /* + * Caller should ensure that the requested data has been copied to WAL + * buffers before we try to read it. + */ + copyptr = pg_atomic_read_u64(&XLogCtl->logCopyResult); + if (startptr + count > copyptr) + ereport(ERROR, +errmsg("request to read past end of generated WAL; requested %X/%X, current position %X/%X", + LSN_FORMAT_ARGS(startptr + count), + LSN_FORMAT_ARGS(copyptr))); /* * Loop through the buffers without a lock. For each buffer, atomically @@ -2571,13 +2593,19 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) { XLogRecPtr Flush; XLogRecPtr Write; + XLogRecPtr Copy; Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); pg_read_barrier(); Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); + pg_read_barrier(); + Copy = pg_atomic_read_u64(&XLogCtl->logCopyResult); /* WAL written to disk is always ahead of WAL flushed */ Assert(Write >= Flush); + + /* WAL copied to buffers is always ahead of WAL written */ + Assert(Copy >= Write); } #endif } @@ -4951,6 +4979,7 @@ XLOGShmemInit(void) SpinLockInit(&XLogCtl->Insert.insertpos_lck); SpinLockInit(&XLogCtl->info_lck); + pg_atomic_init_u64(&XLogCtl->logCopyResult, InvalidXLogRecPtr); pg_atomic_init_u64(&XLogCtl->logWriteResult, InvalidXLogRecPtr); pg_atomic_init_u64(&XLogCtl->logFlushResult, InvalidXLogRecPtr); pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr); @@ -5979,6 +6008,7 @@ StartupXLOG(void) * because no other process can be reading or writing WAL yet. 
*/ LogwrtResult.Write = LogwrtResult.Flush = EndOfLog; + pg_atomic_write_u64(&XLogCtl->logCopyResult, EndOfLog); pg_atomic_write_u64(&XLogCtl->logWriteResult, EndOfLog); pg_atomic_write_u64(&XLogCtl->logFlushResult, EndOfLog); XLogCtl->LogwrtRqst.Write = EndOfLog; diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h index ff47782cdb..78987f3154 100644 --- a/src/include/port/atomics.h +++ b/src/include/port/atomics.h @@ -570,6 +570,42 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_) return pg_atomic_sub_fetch_u64_impl(ptr, sub_); } +/* + * Monotonically advance the given variable using only atomic operations until + * it's at least the target value. Returns the latest value observed, which + * may or may not be the target value. + * + * Full barrier semantics (even when value is unchanged). + */ +static inline uint64 +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_) +{ + uint64 c
Re: Streaming read-ready sequential scan code
On Fri, Apr 5, 2024 at 12:15 AM Thomas Munro wrote:
>
> Yeah, I plead benchmarking myopia, sorry. The fastpath as committed
> is only reached when distance goes 2->1, as pg_prewarm does. Oops.
> With the attached minor rearrangement, it works fine. I also poked
> some more at that memory prefetcher. Here are the numbers I got on a
> desktop system (Intel i9-9900 @ 3.1GHz, Linux 6.1, turbo disabled,
> cpufreq governor=performance, 2MB huge pages, SB=8GB, consumer NVMe,
> GCC -O3).
>
> create table t (i int, filler text) with (fillfactor=10);
> insert into t
> select g, repeat('x', 900) from generate_series(1, 56) g;
> vacuum freeze t;
> set max_parallel_workers_per_gather = 0;
>
> select count(*) from t;
>
> cold = must be read from actual disk (Linux drop_caches)
> warm = read from linux page cache
> hot = already in pg cache via pg_prewarm
>
>                                    cold     warm    hot
> master                            2479ms    886ms  200ms
> seqscan                           2498ms    716ms  211ms  <-- regression
> seqscan + fastpath                2493ms    711ms  200ms  <-- fixed, I think?
> seqscan + memprefetch             2499ms    716ms  182ms
> seqscan + fastpath + memprefetch  2505ms    710ms  170ms  <-- \O/
>
> Cold has no difference. That's just my disk demonstrating Linux RA at
> 128kB (default); random I/O is obviously a more interesting story.
> It's consistently a smidgen faster with Linux RA set to 2MB (as in
> blockdev --setra 4096 /dev/nvmeXXX), and I believe this effect
> probably also increases on fancier faster storage than what I have on
> hand:
>
>                                    cold
> master                            1775ms
> seqscan + fastpath + memprefetch  1700ms
>
> Warm is faster as expected (fewer system calls schlepping data
> kernel->userspace).
>
> The interesting column is hot. The 200ms->211ms regression is due to
> the extra bookkeeping in the slow path. The rejiggered fastpath code
> fixes it for me, or maybe sometimes shows an extra 1ms. Phew. Can
> you reproduce that?

I am able to reproduce the fast path solving the issue using Heikki's example here [1] but in shared buffers (hot).
master: 25 ms stream read: 29 ms stream read + fast path: 25 ms I haven't looked into or reviewed the memory prefetching part. While reviewing 0002, I realized that I don't quite see how read_stream_get_block() will be used in the fastpath -- which it claims in its comments. read_stream_next_buffer() is the only caller of read_stream_look_ahead()->read_stream_get_block(), and if fast_path is true, read_stream_next_buffer() always returns before calling read_stream_look_ahead(). Maybe I am missing something. I see fast_path uses read_stream_fill_blocknums() to invoke the callback. Oh and why does READ_STREAM_DISABLE_FAST_PATH macro exist? Otherwise 0002 looks good to me. I haven't reviewed 0003 or 0004. I attached a new version (v11) because I noticed an outdated comment in my seq scan streaming read user patch (0001). The other patches in the set are untouched from your versions besides adding author/reviewer info in commit message for 0002. - Melanie [1] https://www.postgresql.org/message-id/3b0f3701-addd-4629-9257-cf28e1a6e6a1%40iki.fi From acbd7172f10857d49d4b5d4afc4efab704b33486 Mon Sep 17 00:00:00 2001 From: David Rowley Date: Mon, 10 Jul 2023 11:22:34 +0200 Subject: [PATCH v11 3/4] Add pg_prefetch_mem() macro to load cache lines. Initially mapping to GCC, Clang and MSVC builtins. Discussion: https://postgr.es/m/CAEepm%3D2y9HM9QP%2BHhRZdQ3pU6FShSMyu%3DV1uHXhQ5gG-dketHg%40mail.gmail.com --- config/c-compiler.m4 | 17 configure | 40 ++ configure.ac | 3 +++ meson.build| 1 + src/include/c.h| 8 src/include/pg_config.h.in | 3 +++ 6 files changed, 72 insertions(+) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 3268a780bb0..4cc02f97601 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -355,6 +355,23 @@ AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1, [Define to 1 if your compiler understands $1.]) fi])# PGAC_CHECK_BUILTIN_FUNC +# PGAC_CHECK_BUILTIN_VOID_FUNC +# --- +# Variant for void functions. 
+AC_DEFUN([PGAC_CHECK_BUILTIN_VOID_FUNC], +[AC_CACHE_CHECK(for $1, pgac_cv$1, +[AC_LINK_IFELSE([AC_LANG_PROGRAM([ +void +call$1($2) +{ +$1(x); +}], [])], +[pgac_cv$1=yes], +[pgac_cv$1=no])]) +if test x"${pgac_cv$1}" = xyes ; then +AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1, + [Define to 1 if your compiler understands $1.]) +fi])# PGAC_CHECK_BUILTIN_VOID_FUNC # PGAC_CHECK_BUILTIN_FUNC_PTR diff --git a/configure b/configure index 36feeafbb23..79b78c33ddc 100755 --- a/configure +++ b/configure @@ -15543,6 +15543,46 @@ _ACEOF fi +# Can we use a built-in to prefetch memory? +{ $as_echo "$as_me:${a
RE: Can't compile PG 17 (master) from git under Msys2 autoconf
> > I think I got something not too far off from what's there now that works > under my msys2 setup again. This is partly using your idea of using > $(top_builddir) to qualify the path but in the LN_S section that is causing me > grief. > > This seems to work okay building in tree and out of tree. > > By changing these lines in src/backend/Makefile > > > > $(top_builddir)/src/include/storage/lwlocknames.h: > storage/lmgr/lwlocknames.h > > rm -f '$@' > > - $(LN_S) ../../backend/$< '$@' > > + $(LN_S) $(top_builddir)/src/backend/$< '$@' > > > > $(top_builddir)/src/include/utils/wait_event_types.h: > utils/activity/wait_event_types.h > > rm -f '$@' > > - $(LN_S) ../../backend/$< '$@' > > + $(LN_S) $(top_builddir)/src/backend/$< '$@' > > > > I've also attached as a patch. > > Hmm, that's quite strange ... it certainly doesn't work for me. Maybe the > LN_S utility is resolving the symlink at creation time, rather than letting > it be a > reference to be resolved later. Apparently, the only Msys2 buildfarm animal > is > now using Meson, so we don't have any representative animal for your > situation. > > What does LN_S do for you anyway? Looking at > https://stackoverflow.com/questions/61594025/symlink-in-msys2-copy-or-hard-link > it sounds like this would work if the MSYS environment variable was set to > winsymlinks (or maybe not. I don't know if a "windows shortcut" would be > usable in this case.) > > -- > Álvaro Herrera PostgreSQL Developer — > https://www.EnterpriseDB.com/ > "Nadie está tan esclavizado como el que se cree libre no siéndolo" (Goethe) I think it ends up doing a copy, hence the copy error in my failure logs, which doesn't correspond to anything in the Makefile: cp -pR ../../backend/storage/lmgr/lwlocknames.h Sorry for not checking on a linux system. I was thinking I should have done that first. Thanks, Regina
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
> > > Plus, you've got all of the consequences for non-core drivers, which > have to both add support for the new wire protocol - if they don't > want to seem outdated and eventually obsolete - and also test that > they're still compatible with all supported server versions. > Connection poolers have the same set of problems. The whole thing is > almost a hole with no bottom. Keeping up with core changes in this > area could become a massive undertaking for lots and lots of people, > some of whom may be the sole maintainer of some important driver that > now needs a whole bunch of work. > We already have this in many places. Headers or functions change and extensions have to fix their code. catalog changes force drivers to change their code. This argument blocks any improvement to the protocol. I don't think it's unreasonable to expect maintainers to keep up. We could make it easier by having a specific list for maintainers, but that doesn't change the work. > I'm not sure how much it improves things if we imagine adding feature > flags to the existing protocol versions, rather than whole new > protocol versions, but at least it cuts down on the assumption that > adopting new features is mandatory, and that such features are > cumulative. If a driver wants to support TDE but not protocol > parameters or protocol parameters but not TDE, who are we to say no? > If in supporting those things we bump the protocol version to 3.2, and > then 3.3 fixes a huge performance problem, are drivers going to be > required to add support for features they don't care about to get the > performance fixes? I see some benefit in bumping the protocol version > for major changes, or for changes that we have an important reason to > make mandatory, or to make previously-optional features for which > support has become in practical terms universal part of the base > feature set. 
But I'm very skeptical of the idea that we should just > handle as many things as possible via a protocol version bump. We've > been avoiding protocol version bumps like the plague since forever, > and swinging all the way to the other extreme doesn't sound like the > right idea to me. > +1 for not swinging too far here. But I don't think it should be a non starter. Dave
Re: Can't compile PG 17 (master) from git under Msys2 autoconf
On 2024-Apr-05, Regina Obe wrote: > I think it ends up doing a copy thus the copy error in my log failures which > don't exist anywhere in the Makefil > > cp -pR ../../backend/storage/lmgr/lwlocknames.h > > Sorry for not checking on a linux system. I was thinking I should have done > that first. Ah yeah, that's per configure: if ln -s conf$$.file conf$$ 2>/dev/null; then as_ln_s='ln -s' # ... but there are two gotchas: # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail. # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable. # In both cases, we have to default to `cp -pR'. ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe || as_ln_s='cp -pR' I guess we need to patch the rule so that the LN_S is called so that it'd resolve correctly in both cases. I guess the easy option is to go back to the original recipe and update the comment to indicate that we do it to placate Msys2. Here it is with the old comment: -# The point of the prereqdir incantation in some of the rules below is to -# force the symlink to use an absolute path rather than a relative path. -# For headers which are generated by make distprep, the actual header within -# src/backend will be in the source tree, while the symlink in src/include -# will be in the build tree, so a simple ../.. reference won't work. -# For headers generated during regular builds, we prefer a relative symlink. $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h - prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \ -cd '$(dir $@)' && rm -f $(notdir $@) && \ -$(LN_S) "$$prereqdir/$(notdir $<)" . Maybe it's possible to make this simpler, as it looks overly baroque, and we don't really need absolute paths anyway -- we just need the path resolved at the right time. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hello, BTW I noticed that https://coverage.postgresql.org/src/backend/commands/waitlsn.c.gcov.html says that lsn_cmp is not covered by the tests. This probably indicates that the tests are a little too light, but I'm not sure how much extra effort we want to spend. I'm still concerned that WaitLSNCleanup is only called in ProcKill. Does this mean that if a process throws an error while waiting, it'll not get cleaned up until it exits? Maybe this is not a big deal, but it seems odd. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell."
Re: LogwrtResult contended spinlock
On Fri, 2024-04-05 at 13:54 +0200, Alvaro Herrera wrote: > Couldn't push: I tested with --disable-atomics --disable-spinlocks > and > the tests fail because the semaphore for the atomic variables is not > always initialized. This is weird -- it's like a client process is > running at a time when StartupXLOG has not initialized the variable > ... > so the initialization in the other place was not completely wrong. It looks like you solved the problem and pushed the first patch. Looks good to me. Patch 0002 also looks good to me, after a mostly-trivial rebase (just remember to initialize logCopyResult). Minor comments: * You could consider using a read barrier before reading the Copy ptr, or using the pg_atomic_read_membarrier_u64() variant. I don't see a correctness problem, but it might be slightly more clear and I don't see a lot of downside. * Also, I assume that the Max() call in pg_atomic_monotonic_advance_u64() is just for clarity? Surely the currval cannot be less than _target when it returns. I'd probably just do Assert(currval >= _target) and then return currval. Regards, Jeff Davis
Re: Security lessons from liblzma
On Fri, Apr 5, 2024 at 6:24 AM Robert Haas wrote: > I wonder how hard it would be to just code up our own binary to do > this. If it'd be a pain to do that, or to maintain it across SSL > versions, then it's a bad plan and we shouldn't do it. But if it's not > that much code, maybe it'd be worth considering. I think my biggest concern, other than the maintenance costs, would be the statement "we know SSL works on Windows because we test it against some certificates we hand-rolled ourselves." We can become experts in certificate formats, of course, but... does it buy us much? If someone comes and complains that a certificate doesn't work correctly (as they have *very* recently [3]), I would like to be able to write a test that uses what OpenSSL actually generates as opposed to learning how to make it myself first. > I'm also sort of afraid that we're getting sucked into thinking real > hard about this SSL certificate issue rather than trying to brainstorm > all the other places that might be problematic. The latter might be a > more fruitful exercise (or maybe not, what do I know?). +1. Don't get me wrong: I spent a lot of time refactoring the sslfiles machinery a while back, and I would love for it to all go away. I don't really want to interrupt any lines of thought that are moving in that direction. Please continue. _And also:_ the xz attack relied on a long chain of injections, both technical and social. I'm still wrapping my head around Russ Cox's writeup [1, 2], but the "hidden blob of junk" is only a single part of all that. I'm not even sure it was a necessary part; it just happened to work well for this particular project and line of attack. I've linked Russ Cox in particular because Golang has made a bunch of language-level decisions with the supply chain in mind, including the philosophy that a build should ideally not be able to run arbitrary code at all, and therefore generated files _must_ be checked in. 
I remember $OLDJOB having buildbots that would complain if the contents of the file you checked in didn't match what was (reproducibly!) generated. I think there's a lot more to think about here. --Jacob [1] https://research.swtch.com/xz-timeline [2] https://research.swtch.com/xz-script [3] https://www.postgresql.org/message-id/flat/17760-b6c61e752ec07060%40postgresql.org
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

All three patches applied nicely. The code fits the coding standard, and the comments are relevant.

The new status of this patch is: Ready for Committer
Re: Fixing backslash dot for COPY FROM...CSV
"Daniel Verite" writes: > Tom Lane wrote: >> Not sure what to do here. One idea is to install just the psql-side >> fix, which should break nothing now that version-2 protocol is dead, >> and then wait a few years before introducing the server-side change. >> That seems kind of sad though. > Wouldn't backpatching solve this? No, it'd just reduce the surface area a bit. People on less-than- the-latest-minor-release would still have the issue. In any case back-patching further than v14 would be a nonstarter, because we didn't remove protocol v2 support till then. However, the analogy to "\d commands might fail against a newer server" reduces my level of concern quite a lot: it's hard to draw much of a line separating that kind of issue from "inline COPY CSV will fail against a newer server". It's not like such failures won't be obvious and fairly easy to diagnose. regards, tom lane
Obsolete comment in CopyReadLineText()
CopyReadLineText quoth:

 * The objective of this loop is to transfer the entire next input line
 * into line_buf. Hence, we only care for detecting newlines (\r and/or
 * \n) and the end-of-copy marker (\.).
 *
 * In CSV mode, \r and \n inside a quoted field are just part of the data
 * value and are put in line_buf. We keep just enough state to know if we
 * are currently in a quoted field or not.
 *
 * These four characters, and the CSV escape and quote characters, are
 * assumed the same in frontend and backend encodings.

When that last bit was written, it was because we were detecting newlines and end-of-copy markers before performing encoding conversion. That's not true any more: by the time CopyReadLineText sees the data, it was already converted by CopyConvertBuf. So I don't believe there actually is any such dependency anymore, and we should simply remove that last sentence. Any objections? regards, tom lane
Re: WIP Incremental JSON Parser
On 2024-04-05 Fr 11:43, Nathan Bossart wrote: On Fri, Apr 05, 2024 at 10:15:45AM -0400, Andrew Dunstan wrote: On 2024-04-04 Th 15:54, Nathan Bossart wrote: On Thu, Apr 04, 2024 at 03:31:12PM -0400, Andrew Dunstan wrote: Does the attached patch fix it for you? It clears up 2 of the 3 warnings for me: ../postgresql/src/common/jsonapi.c: In function ‘IsValidJsonNumber’: ../postgresql/src/common/jsonapi.c:2018:30: error: ‘dummy_lex.inc_state’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 2018 | if (lex->incremental && !lex->inc_state->is_last_chunk && | Thanks, please try this instead. LGTM, thanks! Thanks for checking. Pushed. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: meson vs windows perl
On 2024-04-05 Fr 10:12, Andrew Dunstan wrote: On 2024-04-05 Fr 08:25, Andrew Dunstan wrote: Here is an attempt to fix all that. It's ugly, but I think it's more principled. First, instead of getting the ldopts and then trying to filter out the ldflags and ccdlflags, it tells perl not to include those in the first place, by overriding a couple of routines in ExtUtils::Embed. And second, it's smarter about splitting what's left, so that it doesn't split on a space that's in a quoted item. The perl that's used to do that second bit is not pretty, but it has been tested on the system where the problem arose and apparently cures the problem. (No doubt some perl guru could improve it.) It also works on my Ubuntu system, so I don't think we'll be breaking anything (famous last words). Apparently I spoke too soon. Please ignore the above for now. OK, this has been fixed and checked. The attached is what I propose. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/meson.build b/meson.build index 87437960bc..be87ef6950 100644 --- a/meson.build +++ b/meson.build @@ -993,21 +993,15 @@ if not perlopt.disabled() # Config's ccdlflags and ldflags. (Those are the choices of those who # built the Perl installation, which are not necessarily appropriate # for building PostgreSQL.) 
-ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip() -undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split() -undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split() - -perl_ldopts = [] -foreach ldopt : ldopts.split(' ') - if ldopt == '' or ldopt in undesired -continue - endif - - perl_ldopts += ldopt.strip('"') -endforeach - +ldopts = run_command(perl, '-MExtUtils::Embed', '-e', '*ExtUtils::Embed::_ldflags = *Extutils::Embed::_ccdlflags = sub { return q[]; }; ldopts', + check: true).stdout().strip() +# avoid use of " char in this perl mini-program to avoid possibly +# confusing the Windows command processor +perl_ldopts = run_command(perl, '-MEnglish', '-e', + 'my $arg = shift; while ($arg =~ /\S/) { if ($arg =~ /^\s*([^\042 ]+)(?![\042])/) { print qq[$1\n]; $arg = $POSTMATCH;} elsif ($arg =~ /^\s*\042([^\042]+)\042/) { print qq[$1\n]; $arg = $POSTMATCH;} }', + '--', ldopts, check: true).stdout().strip().split('\n') message('LDFLAGS recommended by perl: "@0@"'.format(ldopts)) -message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts))) +message('LDFLAGS for embedding perl:\n"@0@"'.format('\n'.join(perl_ldopts))) perl_dep_int = declare_dependency( compile_args: perl_ccflags,
Re: Fixing backslash dot for COPY FROM...CSV
After some more poking at this topic, I realize that there is already very strange and undocumented behavior for backslash-dot even in non-CSV mode. Create a file like this:

$ cat eofdata
foobar
foobaz\.
more
\.
yet more

and try importing it with COPY:

regression=# create table eofdata(f1 text);
CREATE TABLE
regression=# copy eofdata from '/home/tgl/pgsql/eofdata';
COPY 2
regression=# table eofdata;
   f1
--------
 foobar
 foobaz
(2 rows)

That's what you get in 9.0 and earlier versions, and it's already not-as-documented, because we claim that only \. alone on a line is an EOF marker; we certainly don't suggest that what's in front of it will be taken as valid data. However, somebody broke it some more in 9.1, because 9.1 up to HEAD produce this result:

regression=# create table eofdata(f1 text);
CREATE TABLE
regression=# copy eofdata from '/home/tgl/pgsql/eofdata';
COPY 3
regression=# table eofdata;
   f1
--------
 foobar
 foobaz
 more
(3 rows)

So the current behavior is that \. that is on the end of a line, but is not the whole line, is silently discarded and we keep going. All versions throw "end-of-copy marker corrupt" if there is something after \. on the same line. This is sufficiently weird that I'm starting to come around to Daniel's original proposal that we just drop the server's recognition of \. altogether (which would allow removal of some dozens of lines of complicated and now known-buggy code). Alternatively, we could fix it so that \. at the end of a line draws "end-of-copy marker corrupt", which would at least make things consistent, but I'm not sure that has any great advantage. I surely don't want to document the current behavioral details as being the right thing that we're going to keep doing. Thoughts? regards, tom lane
Re: Improve eviction algorithm in ReorderBuffer
On Fri, 2024-04-05 at 16:58 +0900, Masahiko Sawada wrote: > IIUC for example in ReorderBuffer, the heap key is transaction size > and the hash key is xid. Yes. > > I see your point. It assumes that the bh_node_type is a pointer or at > least unique. So it cannot work with Datum being a value. Right. One option might just be to add some comments explaining the API and limitations, but in general I feel it's confusing to hash a pointer without a good reason. > > It sounds like a data structure that mixes the hash table and the > binary heap and we use it as the main storage (e.g. for > ReorderBufferTXN) instead of using the binary heap as the secondary > data structure. IIUC with your idea, the indexed binary heap has a > hash table to store elements each of which has its index within the > heap node array. I guess it's better to create it as a new data > structure rather than extending the existing binaryheap, since APIs > could be very different. I might be missing something, though. You are right that this approach starts to feel like a new data structure and is not v17 material. I am interested in this for v18 though -- we could make the API more like simplehash to be more flexible when using values (rather than pointers) and to be able to inline the comparator. > > * remove the hash table from binaryheap.c > > > > * supply a new callback to the binary heap with type like: > > > > typedef void (*binaryheap_update_index)( > > bh_node_type node, > > int new_element_index); > > > > * make the remove, update_up, and update_down methods take the > > element > > index rather than the pointer ... > This shows that the current indexed binaryheap is much slower than > the > other two implementations, and the xx_binaryheap has a good number in > spite of also being indexed. xx_binaryheap isn't indexed though, right? > When it comes to > implementing the above idea (i.e. 
changing binaryheap to
> xx_binaryheap), it was simple since we just replace the code where we
> update the hash table with the code where we call the callback, if we
> get the consensus on API change.

That seems reasonable to me.

> The fact that we use simplehash for the internal hash table might make
> this idea complex. If I understand your suggestion correctly, the
> caller needs to tell the hash table the hash function when creating a
> binaryheap, but the hash function needs to be specified at compile
> time. We can use a dynahash instead, but it would make the binaryheap
> slower still.

simplehash.h supports private_data, which makes it easier to track a callback. In binaryheap.c, that would look something like:

static inline uint32
binaryheap_hash(bh_nodeidx_hash *tab, uint32 key)
{
    binaryheap_hashfunc hashfunc = tab->private_data;

    return hashfunc(key);
}

...
#define SH_HASH_KEY(tb, key) binaryheap_hash(tb, key)
...

binaryheap_allocate(int num_nodes, binaryheap_comparator compare,
                    void *arg, binaryheap_hashfunc hashfunc)
{
    ...
    if (hashfunc != NULL)
    {
        /* could have a new structure, but we only need to
         * store one pointer, so don't bother with palloc/pfree */
        void *private_data = (void *) hashfunc;

        heap->bh_nodeidx = bh_nodeidx_create(..., private_data);
    ...

And in reorderbuffer.c, define the callback like:

static uint32
reorderbuffer_xid_hash(TransactionId xid)
{
    /* fasthash32 is 'static inline' so may
     * be faster than hash_bytes()? */
    return fasthash32(&xid, sizeof(TransactionId), 0);
}

In summary, there are two viable approaches for addressing the concerns in v17:

1. Provide a callback to update ReorderBufferTXN->heap_element_index, and use that index (rather than the pointer) for updating the heap key (transaction size) or removing elements from the heap.

2. Provide a callback for hashing, so that binaryheap.c can hash the xid value rather than the pointer.

I don't have a strong opinion about which one to use.
I prefer something closer to #1 for v18, but for v17 I suggest whichever one comes out simpler. Regards, Jeff Davis
fasthash32() returning uint64?
In hashfn_unstable.h, fasthash32() is declared as: /* like fasthash64, but returns a 32-bit hashcode */ static inline uint64 fasthash32(const char *k, size_t len, uint64 seed) Is the return type of uint64 a typo? Regards, Jeff Davis
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Fri, Apr 5, 2024 at 9:59 AM Daniel Gustafsson wrote: > Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1 fails > on Windows in CI which I will investigate next. The changes for SSL_OP_NO_CLIENT_RENEGOTIATION and SSL_R_VERSION_TOO_LOW look good to me. > -Remove support for OpenSSL 1.0.2 and 1.1.0 > +Remove support for OpenSSL 1.0.2 I modified the commit message while working on v3 and forgot to put it back before posting, sorry. > +++ b/src/include/pg_config.h.in > @@ -84,9 +84,6 @@ > /* Define to 1 if you have the header file. */ > #undef HAVE_CRTDEFS_H > > -/* Define to 1 if you have the `CRYPTO_lock' function. */ > -#undef HAVE_CRYPTO_LOCK An autoreconf run on my machine pulls in more changes (getting rid of the symbols we no longer check for). --Jacob
Re: IPC::Run::time[r|out] vs our TAP tests
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= writes: > Tom Lane writes: >> So it seems like the bug does not exist in any currently-supported >> NetBSD release. Debian has been known to ship obsolete libedit >> versions, though. > Both the current (bokworm/12) and previous (bullseye/11) versions of > Debian have new enough libedits to not be affected by this bug: > ... > But in bullseye they decided that OpenSSL is a system library as far as > the GPL is concerned, so are linking directly to readline. > And even before then their psql wrapper would LD_PRELOAD readline > instead if installed, so approximately nobody actually ever used psql > with libedit on Debian. Based on this info, I'm disinclined to put work into trying to make the case behave correctly with that old libedit version, or even to lobotomize the test case enough so it would pass. What I suggest Michael do with tanager is install the OS-version-appropriate version of GNU readline, so that the animal will test what ilmari describes as the actually common use-case. (I see that what he did for the moment is add --without-readline. Perhaps that's a decent long-term choice too, because I think we have rather little coverage of that option except on Windows.) regards, tom lane
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On 05.04.24 18:59, Daniel Gustafsson wrote: Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1 fails on Windows in CI which I will investigate next. I'm not a fan of the new PGAC_CHECK_OPENSSL. It creates a second place where the OpenSSL version number has to be updated. We had this carefully constructed so that there is only one place that OPENSSL_API_COMPAT is defined and that is the only place that needs to be updated. We put the setting of OPENSSL_API_COMPAT into configure so that the subsequent OpenSSL tests would use it, and if the API number is higher than what the library supports, the tests should fail. So I don't understand why the configure changes have to be so expansive.
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 5 Apr 2024, at 23:26, Peter Eisentraut wrote:
> On 05.04.24 18:59, Daniel Gustafsson wrote:
>> Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1
>> fails on Windows in CI which I will investigate next.
>
> I'm not a fan of the new PGAC_CHECK_OPENSSL. It creates a second place where
> the OpenSSL version number has to be updated. We had this carefully
> constructed so that there is only one place that OPENSSL_API_COMPAT is
> defined and that is the only place that needs to be updated. We put the
> setting of OPENSSL_API_COMPAT into configure so that the subsequent OpenSSL
> tests would use it, and if the API number is higher than what the library
> supports, the tests should fail.

But does that actually work? If I change the API_COMPAT to the 1.1.1
version number and run configure against 1.0.2 it passes just fine. Am I
missing some clever trick here?

The reason to expand the check is to ensure that we have the version we
want for both OpenSSL and LibreSSL, and deprecating OpenSSL versions isn't
all that commonly done, so having to change the version in the check
didn't seem that invasive to me.

--
Daniel Gustafsson
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Fri, Apr 5, 2024 at 2:48 PM Daniel Gustafsson wrote:
> But does that actually work? If I change the API_COMPAT to the 1.1.1 version
> number and run configure against 1.0.2 it passes just fine. Am I missing some
> clever trick here?

Similarly, I changed my API_COMPAT to a nonsense 0x9010L and
- a 1.1.1 build succeeds
- a 3.0 build fails
- LibreSSL doesn't appear to care or check that #define at all

--Jacob
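[Editor's note: for readers following the version-check discussion, the reason a plain numeric comparison can work at configure time is that pre-3.0 OpenSSL packs the version into one ordered number (the 0xMNNFFPPS scheme from opensslv.h), so a test such as OPENSSL_VERSION_NUMBER >= 0x10101000L accepts 1.1.1 and newer while rejecting 1.0.2. A minimal sketch of that comparison follows; the helper name is illustrative only and is not part of OpenSSL or PostgreSQL.]

```c
#include <assert.h>

/*
 * Sketch of the pre-3.0 OpenSSL version-number encoding: 0xMNNFFPPS,
 * i.e. major in bits 28-31, minor in bits 20-27, fix in bits 12-19,
 * patch and status below that.  Because the fields are ordered from
 * most to least significant, ">=" on the raw number is a valid
 * "at least version X.Y.Z" test.  Illustrative helper, not real API.
 */
int
version_at_least(unsigned long v, unsigned major, unsigned minor, unsigned fix)
{
	unsigned long want = ((unsigned long) major << 28) |
		((unsigned long) minor << 20) |
		((unsigned long) fix << 12);

	return v >= want;
}
```

For example, 1.1.1's base number is 0x10101000L, 1.0.2's is 0x10002000L, and 3.0 (which changed the low bits of the encoding but kept the major field) still compares greater, so a single threshold covers all three cases discussed above.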
Re: Add last_commit_lsn to pg_stat_database
The following review has been posted through the commitfest application:

make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hello

I think it is convenient to know the last commit LSN of a database for
troubleshooting and monitoring purposes, similar to the "pd_lsn" field in
a page header that records the last LSN that modifies this page. I am not
sure if it can help determine the WAL location to resume / catch up in
logical replication, as the "confirmed_flush_lsn" and "restart_lsn" in a
logical replication slot are already there to figure out the resume /
catchup point as I understand.

Anyhow, I had a look at the patch; in general it looks good to me. I also
ran it against the CI bot, which turned out fine. Just one small comment
though. The patch supports recording the last commit LSN from two-phase
commit as well, but the test does not seem to cover two-phase commit. In
my opinion, it should test whether the last commit LSN increments when a
prepared transaction is committed, in addition to a regular transaction.

thank you

- Cary Huang
www.highgo.ca
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 5 Apr 2024, at 22:55, Jacob Champion wrote:
> On Fri, Apr 5, 2024 at 9:59 AM Daniel Gustafsson wrote:
>> Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1
>> fails on Windows in CI which I will investigate next.

The attached version fixes the Windows 1.1.1 check which was missing the
include directory.

> The changes for SSL_OP_NO_CLIENT_RENEGOTIATION and
> SSL_R_VERSION_TOO_LOW look good to me.

Thanks!

>> +++ b/src/include/pg_config.h.in
>> @@ -84,9 +84,6 @@
>>  /* Define to 1 if you have the <crtdefs.h> header file. */
>>  #undef HAVE_CRTDEFS_H
>>
>> -/* Define to 1 if you have the `CRYPTO_lock' function. */
>> -#undef HAVE_CRYPTO_LOCK
>
> An autoreconf run on my machine pulls in more changes (getting rid of
> the symbols we no longer check for).

Ah yes, missed updating before formatting the patch. Done in the attached.

--
Daniel Gustafsson

v5-0001-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data
RE: Can't compile PG 17 (master) from git under Msys2 autoconf
> > I think it ends up doing a copy thus the copy error in my log failures
> > which don't exist anywhere in the Makefile
> >
> > cp -pR ../../backend/storage/lmgr/lwlocknames.h
> >
> > Sorry for not checking on a linux system. I was thinking I should have
> > done that first.
>
> Ah yeah, that's per configure:
>
> if ln -s conf$$.file conf$$ 2>/dev/null; then
>   as_ln_s='ln -s'
>   # ... but there are two gotchas:
>   # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail.
>   # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable.
>   # In both cases, we have to default to `cp -pR'.
>   ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe ||
>     as_ln_s='cp -pR'
>
> I guess we need to patch the rule so that LN_S is called so that it'd
> resolve correctly in both cases. I guess the easy option is to go back to
> the original recipe and update the comment to indicate that we do it to
> placate Msys2. Here it is with the old comment:
>
> -# The point of the prereqdir incantation in some of the rules below is to
> -# force the symlink to use an absolute path rather than a relative path.
> -# For headers which are generated by make distprep, the actual header within
> -# src/backend will be in the source tree, while the symlink in src/include
> -# will be in the build tree, so a simple ../.. reference won't work.
> -# For headers generated during regular builds, we prefer a relative symlink.
>
> $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h
> -	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
> -	  cd '$(dir $@)' && rm -f $(notdir $@) && \
> -	  $(LN_S) "$$prereqdir/$(notdir $<)" .
>
> Maybe it's possible to make this simpler, as it looks overly baroque, and
> we don't really need absolute paths anyway -- we just need the path
> resolved at the right time.
>
> --
> Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/

Yah I was thinking it was removed cause no one could figure out why it was
so complicated and decided to make it more readable.
Re: Popcount optimization using AVX512
On Sat, 6 Apr 2024 at 04:38, Nathan Bossart wrote:
> This seems to provide a small performance boost, so I've incorporated it
> into v27.

Won't Valgrind complain about this?

+pg_popcount_avx512(const char *buf, int bytes)

+	buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);

+	val = _mm512_maskz_loadu_epi8(mask, (const __m512i *) buf);

David
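[Editor's note: the concern above is that after rounding the pointer down, the first masked load reads bytes below buf — bytes the mask discards, but which Valgrind can still report as an access to unaddressable or uninitialized memory. A portable sketch of the same idea, with 8-byte words standing in for AVX-512's 64-byte registers; the function name is illustrative and not PostgreSQL's.]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of the align-down-and-mask pattern: round buf down to an
 * aligned boundary, load whole aligned chunks, and ignore ("mask off")
 * the lanes that fall outside [buf, buf + bytes).  The first memcpy
 * below can therefore touch bytes *before* buf, which is exactly the
 * kind of read a checker like Valgrind may flag even though those
 * bytes never influence the result.
 */
int
popcount_masked_sketch(const char *buf, size_t bytes)
{
	const char *end = buf + bytes;
	const char *p = (const char *) ((uintptr_t) buf & ~(uintptr_t) 7);
	int			count = 0;

	for (; p < end; p += 8)
	{
		unsigned char chunk[8];

		/* One aligned 8-byte load; may span bytes outside [buf, end). */
		memcpy(chunk, p, sizeof(chunk));

		/* Count only the in-range lanes -- the scalar "mask". */
		for (int i = 0; i < 8; i++)
			if (p + i >= buf && p + i < end)
				count += __builtin_popcount(chunk[i]);
	}
	return count;
}
```

For instance, `popcount_masked_sketch("\x0f\x01", 2)` counts 4 + 1 = 5 set bits even when the string starts at an unaligned address, because the out-of-range lanes of each chunk are skipped.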
DROP OWNED BY fails to clean out pg_init_privs grants
I wondered why buildfarm member copperhead has started to fail
xversion-upgrade-HEAD-HEAD tests. I soon reproduced the problem here:

pg_restore: creating ACL "regress_pg_dump_schema.TYPE "test_type""
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 4355; 0 0 ACL TYPE "test_type" buildfarm
pg_restore: error: could not execute query: ERROR: role "74603" does not exist
Command was: SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);
GRANT ALL ON TYPE "regress_pg_dump_schema"."test_type" TO "74603";
SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);
REVOKE ALL ON TYPE "regress_pg_dump_schema"."test_type" FROM "74603";

(So now I'm wondering why *only* copperhead has shown this so far. Are our
other cross-version-upgrade testing animals AWOL?)

I believe this is a longstanding problem that was exposed by accident by
commit 936e3fa37. If you run "make installcheck" in HEAD's
src/test/modules/test_pg_dump, and then poke around in the leftover
contrib_regression database, you can find dangling grants in
pg_init_privs:

contrib_regression=# table pg_init_privs;
 objoid | classoid | objsubid | privtype |                  initprivs
--------+----------+----------+----------+-----------------------------------------------
...
  43134 |     1259 |        0 | e        | {postgres=rwU/postgres,43125=U/postgres}
  43128 |     1259 |        0 | e        | {postgres=arwdDxtm/postgres,43125=r/postgres}
...

The fact that the DROP ROLE added by 936e3fa37 succeeded indicates that
these role references weren't captured in pg_shdepend. I imagine that we
also lack code that would allow DROP OWNED BY to follow up on such entries
if they existed, but I've not checked that for sure. In any case, there's
probably a nontrivial amount of code to be written to make this work.

Given the lack of field complaints, I suspect that extension scripts
simply don't grant privileges to random roles that aren't the extension's
owner. So I wonder a little bit if this is even worth fixing, as opposed
to blocking off somehow. But probably we should first try to fix it.
I doubt this is something we'll have fixed by Monday, so I will go add an open item for it. regards, tom lane
Re: Streaming read-ready sequential scan code
On Sat, Apr 6, 2024 at 6:55 AM Melanie Plageman wrote:
> On Fri, Apr 5, 2024 at 12:15 AM Thomas Munro wrote:
>> The interesting column is hot. The 200ms->211ms regression is due to
>> the extra bookkeeping in the slow path. The rejiggered fastpath code
>> fixes it for me, or maybe sometimes shows an extra 1ms. Phew. Can
>> you reproduce that?
>
> I am able to reproduce the fast path solving the issue using Heikki's
> example here [1] but in shared buffers (hot).
>
> master: 25 ms
> stream read: 29 ms
> stream read + fast path: 25 ms

Great, thanks.

> I haven't looked into or reviewed the memory prefetching part.
>
> While reviewing 0002, I realized that I don't quite see how
> read_stream_get_block() will be used in the fastpath -- which it
> claims in its comments.

Comments improved.

> Oh and why does READ_STREAM_DISABLE_FAST_PATH macro exist?

Just for testing purposes. Behaviour should be identical to external
observers either way, it's just a hand-rolled specialisation for certain
parameters, and it's useful to be able to verify that and measure the
speedup. I think it's OK to leave a bit of developer/testing scaffolding
in the tree if it's likely to be useful again and especially if, like this
case, it doesn't create any dead code. (Perhaps in later work we might
find the right way to get the compiler to do the specialisation? It's so
simple though.)

The occasional CI problem I mentioned turned out to be read_stream_reset()
remembering a little too much state across rescans. Fixed.

Thanks both for the feedback on the ring buffer tweaks. Comments updated.
Here is the full stack of patches I would like to commit very soon, though
I may leave the memory prefetching part out for a bit longer to see if I
can find any downside.

From 54efd755040507b55672092907d53b4db30f5a06 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Fri, 5 Apr 2024 12:08:24 +1300
Subject: [PATCH v12 1/6] Improve read_stream.c's fast path.
Unfortunately the "fast path" for cached scans that don't do any I/O was
accidentally coded in a way that could only be triggered by pg_prewarm's
usage pattern. We really need it to work for the streaming sequential scan
patch. Refactor.

Reviewed-by: Melanie Plageman
Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com
---
 src/backend/storage/aio/read_stream.c | 75 +++
 1 file changed, 31 insertions(+), 44 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 4f21262ff5e..b9e11a28312 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -169,7 +169,7 @@ get_per_buffer_data(ReadStream *stream, int16 buffer_index)
 /*
  * Ask the callback which block it would like us to read next, with a small
  * buffer in front to allow read_stream_unget_block() to work and to allow the
- * fast path to work in batches.
+ * fast path to skip this function and work directly from the array.
  */
 static inline BlockNumber
 read_stream_get_block(ReadStream *stream, void *per_buffer_data)
@@ -578,13 +578,12 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	if (likely(stream->fast_path))
 	{
 		BlockNumber next_blocknum;
-		bool		need_wait;

 		/* Fast path assumptions. */
 		Assert(stream->ios_in_progress == 0);
 		Assert(stream->pinned_buffers == 1);
 		Assert(stream->distance == 1);
-		Assert(stream->pending_read_nblocks == 1);
+		Assert(stream->pending_read_nblocks == 0);
 		Assert(stream->per_buffer_data_size == 0);

 		/* We're going to return the buffer we pinned last time. */
@@ -594,40 +593,29 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 		buffer = stream->buffers[oldest_buffer_index];
 		Assert(buffer != InvalidBuffer);

-		/*
-		 * Pin a buffer for the next call.  Same buffer entry, and arbitrary
-		 * I/O entry (they're all free).
-		 */
-		need_wait = StartReadBuffer(&stream->ios[0].op,
-									&stream->buffers[oldest_buffer_index],
-									stream->pending_read_blocknum,
-									stream->advice_enabled ?
-									READ_BUFFERS_ISSUE_ADVICE : 0);
-
-		/* Choose the block the next call will pin. */
+		/* Choose the next block to pin. */
 		if (unlikely(stream->blocknums_next == stream->blocknums_count))
 			read_stream_fill_blocknums(stream);
 		next_blocknum = stream->blocknums[stream->blocknums_next++];

-		/*
-		 * Fast return if the next call doesn't require I/O for the buffer we
-		 * just pinned, and we have a block number to give it as a pending
-		 * read.
-		 */
-		if (likely(!need_wait && next_blocknum != InvalidBlockNumber))
+		if (likely(next_blocknum != InvalidBlockNumber))
 		{
-			stream->pending_read_blocknum = next_blocknum;
-			return buffer;
-		}
-
-		/*
-		 * For anything more complex, set up some more state and take the slow
-		 * path next time.
-		 */
-		stream->fast_path = false;
+
Re: Is this a problem in GenericXLogFinish()?
On Fri, Apr 05, 2024 at 01:26:29PM +0900, Michael Paquier wrote:
> On Wed, Feb 07, 2024 at 06:42:21PM +0900, Michael Paquier wrote:
>> On Wed, Feb 07, 2024 at 02:08:42PM +0530, Amit Kapila wrote:
>> > Thanks for the report and looking into it. Pushed!
>>
>> Thanks Amit for the commit and Kuroda-san for the patch!
>
> I have been pinged about this thread and I should have looked a bit
> closer, but wait a minute here.

I have forgotten to mention that Zubeyr Eryilmaz, a colleague, should be
credited here.
--
Michael

signature.asc
Description: PGP signature