Re: pg_walfile_name_offset can return inconsistent values
Hi,

On 2023-11-09 16:14:07 -0500, Bruce Momjian wrote:
> On Thu, Nov 9, 2023 at 09:49:48PM +0100, Matthias van de Meent wrote:
> > Either way, I think fix #1 is most correct (as was attached in
> > offset2.diff, and quoted verbatim here), because that has no chance of
> > having surprising underflowing behaviour when you use '0/0'::lsn as
> > input.
>
> Attached is the full patch that changes pg_walfile_name_offset() and
> pg_walfile_name().  There is no need for doc changes.

I think this needs to add tests "documenting" the changed behaviour and
perhaps also for a few other edge cases.

You could e.g. test

  SELECT * FROM pg_walfile_name_offset('0/0')

which today returns bogus values, and which is independent of the wal
segment size.

And with

  SELECT setting::int8 AS segment_size FROM pg_settings WHERE name = 'wal_segment_size' \gset

you can test real things too, e.g.:

  SELECT segment_number, file_offset
  FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size),
       pg_split_walfile_name(file_name);

  SELECT segment_number, file_offset
  FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size + 1),
       pg_split_walfile_name(file_name);

  SELECT segment_number, file_offset = :segment_size - 1
  FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size - 1),
       pg_split_walfile_name(file_name);

Greetings,

Andres Freund
Re: remaining sql/json patches
Hi Erik,

On Sat, Nov 11, 2023 at 11:52 Erik Rijkers wrote:
> Hi,
>
> At the moment, what is the patchset to be tested? The latest SQL/JSON
> server I have is from September, and it's become unclear to me what
> belongs to the SQL/JSON patchset. It seems to me cfbot erroneously
> shows green because it successfully compiles later detail-patches (i.e.,
> not the SQL/JSON set itself). Please correct me if I'm wrong and it is
> in fact possible to derive from cfbot a patchset that are the ones to
> use to build the latest SQL/JSON server.

I'll be posting a new set that addresses Andres' comments early next week.
Re: locked reads for atomics
On Fri, Nov 10, 2023 at 06:48:39PM -0800, Andres Freund wrote:
> Yes. We should optimize pg_atomic_exchange_u32() one of these days - it can be
> done *far* faster than a cmpxchg. When I was adding the atomic abstraction
> there was concern with utilizing too many different atomic instructions. I
> didn't really agree back then, but these days I really don't see a reason to
> not use a few more intrinsics.

I might give this a try, if for no other reason than it'd force me to
improve my mental model of this stuff. :)

>> It refers to "unlocked writes," but this isn't
>> pg_atomic_unlocked_write_u32_impl().  The original thread for this comment
>> [0] doesn't offer any hints, either.  Does "unlocked" mean something
>> different here, such as "write without any barrier semantics?"
>
> It's just about not using the spinlock. If we were to *not* use a spinlock
> here, we'd break pg_atomic_compare_exchange_u32(), because the
> spinlock-implementation of pg_atomic_compare_exchange_u32() needs to actually
> be able to rely on no concurrent changes to the value to happen.

Thanks for clarifying.  I thought it might've been hinting at something
beyond the compare/exchange implications.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: remaining sql/json patches
Hi, At the moment, what is the patchset to be tested? The latest SQL/JSON server I have is from September, and it's become unclear to me what belongs to the SQL/JSON patchset. It seems to me cfbot erroneously shows green because it successfully compiles later detail-patches (i.e., not the SQL/JSON set itself). Please correct me if I'm wrong and it is in fact possible to derive from cfbot a patchset that are the ones to use to build the latest SQL/JSON server. Thanks! Erik
Re: locked reads for atomics
Hi,

On 2023-11-10 20:38:13 -0600, Nathan Bossart wrote:
> On Fri, Nov 10, 2023 at 03:11:50PM -0800, Andres Freund wrote:
> > On 2023-11-10 14:51:28 -0600, Nathan Bossart wrote:
> >> + * This read is guaranteed to read the current value,
> >
> > It doesn't guarantee that *at all*. What it guarantees is solely that the
> > current CPU won't be doing something that could lead to reading an outdated
> > value. To actually ensure the value is up2date, the modifying side also needs
> > to have used a form of barrier (in the form of fetch_add, compare_exchange,
> > etc or an explicit barrier).
>
> Okay, I think I was missing that this doesn't work along with
> pg_atomic_write_u32() because that doesn't have any barrier semantics
> (probably because the spinlock version does).  IIUC you'd want to use
> pg_atomic_exchange_u32() to write the value instead, which seems to really
> just be another compare/exchange under the hood.

Yes. We should optimize pg_atomic_exchange_u32() one of these days - it can be
done *far* faster than a cmpxchg. When I was adding the atomic abstraction
there was concern with utilizing too many different atomic instructions. I
didn't really agree back then, but these days I really don't see a reason to
not use a few more intrinsics.

> Speaking of the spinlock implementation of pg_atomic_write_u32(), I've been
> staring at this comment for a while and can't make sense of it:
>
> 	void
> 	pg_atomic_write_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 val)
> 	{
> 		/*
> 		 * One might think that an unlocked write doesn't need to acquire the
> 		 * spinlock, but one would be wrong.  Even an unlocked write has to cause a
> 		 * concurrent pg_atomic_compare_exchange_u32() (et al) to fail.
> 		 */
> 		SpinLockAcquire((slock_t *) &ptr->sema);
> 		ptr->value = val;
> 		SpinLockRelease((slock_t *) &ptr->sema);
> 	}
>
> It refers to "unlocked writes," but this isn't
> pg_atomic_unlocked_write_u32_impl().  The original thread for this comment
> [0] doesn't offer any hints, either.  Does "unlocked" mean something
> different here, such as "write without any barrier semantics?"

It's just about not using the spinlock. If we were to *not* use a spinlock
here, we'd break pg_atomic_compare_exchange_u32(), because the
spinlock-implementation of pg_atomic_compare_exchange_u32() needs to actually
be able to rely on no concurrent changes to the value to happen.

Greetings,

Andres Freund
Re: locked reads for atomics
On Fri, Nov 10, 2023 at 03:11:50PM -0800, Andres Freund wrote:
> On 2023-11-10 14:51:28 -0600, Nathan Bossart wrote:
>> + * This read is guaranteed to read the current value,
>
> It doesn't guarantee that *at all*. What it guarantees is solely that the
> current CPU won't be doing something that could lead to reading an outdated
> value. To actually ensure the value is up2date, the modifying side also needs
> to have used a form of barrier (in the form of fetch_add, compare_exchange,
> etc or an explicit barrier).

Okay, I think I was missing that this doesn't work along with
pg_atomic_write_u32() because that doesn't have any barrier semantics
(probably because the spinlock version does).  IIUC you'd want to use
pg_atomic_exchange_u32() to write the value instead, which seems to really
just be another compare/exchange under the hood.

Speaking of the spinlock implementation of pg_atomic_write_u32(), I've been
staring at this comment for a while and can't make sense of it:

	void
	pg_atomic_write_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 val)
	{
		/*
		 * One might think that an unlocked write doesn't need to acquire the
		 * spinlock, but one would be wrong.  Even an unlocked write has to cause a
		 * concurrent pg_atomic_compare_exchange_u32() (et al) to fail.
		 */
		SpinLockAcquire((slock_t *) &ptr->sema);
		ptr->value = val;
		SpinLockRelease((slock_t *) &ptr->sema);
	}

It refers to "unlocked writes," but this isn't
pg_atomic_unlocked_write_u32_impl().  The original thread for this comment
[0] doesn't offer any hints, either.  Does "unlocked" mean something
different here, such as "write without any barrier semantics?"

[0] https://postgr.es/m/14947.1475690465%40sss.pgh.pa.us

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
Hi,

On 2023-10-25 13:13:38 +0900, Michael Paquier wrote:
> So, please find attached a patch set that introduces an in-core
> facility to be able to set what I'm calling here an "injection point",
> that consists of being able to register in shared memory a callback
> that can be run within a defined location of the code.  It means that
> it is not possible to trigger a callback before shared memory is set,
> but I've faced far more the case where I wanted to trigger something
> after shmem is set anyway.  Persisting an injection point across
> restarts is also possible by adding some through an extension's shmem
> hook, as we do for custom LWLocks for example, as long as a library is
> loaded.

I would like to see a few example tests using this facility - without that
it's a bit hard to judge how the impact on core code would be and how easy
tests are to write.

It also seems like there's a few bits and pieces missing to actually be
able to write interesting tests.  It's one thing to be able to inject code,
but what you commonly want to do for tests is to actually wait for such a
spot in the code to be reached, then perhaps wait inside the "modified"
code, and do something else in the test script.  But as-is a decent amount
of C code would need to be written to write such a test, from what I can
tell?

Greetings,

Andres Freund
Re: Why do indexes and sorts use the database collation?
Hi,

On 2023-11-10 16:03:16 -0800, Jeff Davis wrote:
> An "en_US" user doing:
>
>    CREATE TABLE foo(t TEXT PRIMARY KEY);
>
> is providing no indication that they want an index tailored to their
> locale. Yet we are creating the index with the "en_US" collation and
> therefore imposing huge performance costs (something like 2X slower
> index build time than the "C" locale), and also huge dependency
> versioning risks that could lead to index corruption and/or wrong
> results.

I guess you are arguing that the user didn't intend to create an index
here?  I don't think that is true - users know that pkeys create indexes.

If we used C here, users would often need to create a second index on the
same column using the actual database collation - I think we'd very
commonly end up with complaints that the pkey index doesn't work, because
it's been automatically created with a different collation than the
column.

Also, wouldn't the intent to use a different collation for the column be
expressed by changing the column's collation?

> Similarly, a user doing:
>
>    SELECT DISTINCT t FROM bar;
>
> is providing no indication that they care about the collation of "t"
> (we are free to choose a HashAgg which makes no ordering guarantee at
> all). Yet if we choose Sort+GroupAgg, the Sort will be performed in the
> "en_US" locale, which is something like 2X slower than the "C" locale.

OTOH, if we are choosing a groupagg, we might be able to implement that
using an index, which is more likely to exist in the database's collation.
Looks like we even just look for indexes that are in the database
collation.  Might be worth teaching the planner additional smarts here.

> Thoughts?

I seriously doubt it's a good idea to change which collations primary keys
use by default.  But I think there's a decent bit of work we could do in
the planner, e.g.:

- Teach the planner to take collation costs into account for costing -
  right now index scans with "C" cost the same as index scans with more
  expensive collations.  That seems wrong even for equality lookups and
  would make it hard to make improvements to prefer cheaper collations in
  other situations.

- Teach the planner to use cheaper collations when ordering for reasons
  other than the user's direct request (e.g. DISTINCT/GROUP BY, merge
  joins).

I think we should also explain in our docs that C can be considerably
faster - I couldn't find anything in a quick look.

Greetings,

Andres Freund
Re: maybe a type_sanity. sql bug
On Sat, Nov 11, 2023 at 08:00:00AM +0800, jian he wrote:
> I am not sure the pg_class "relam" description part is correct. since
> partitioned indexes (relkind "I") also have the access method, but no
> storage.
> "
> If this is a table or an index, the access method used (heap, B-tree,
> hash, etc.); otherwise zero (zero occurs for sequences, as well as
> relations without storage, such as views)
> "

This should be adjusted as well in the docs, IMO.  I would propose
something slightly more complicated:

"
If this is a table, index, materialized view or partitioned index, the
access method used (heap, B-tree, hash, etc.); otherwise zero (zero occurs
for sequences, as well as relations without storage, like views).
"

> diff --git a/src/test/regress/sql/type_sanity.sql
> b/src/test/regress/sql/type_sanity.sql
> index a546ba89..6d806941 100644
> --- a/src/test/regress/sql/type_sanity.sql
> +++ b/src/test/regress/sql/type_sanity.sql

Ahah, nice catches.  I'll go adjust that on HEAD like the other one you
pointed out.  Just note that materialized views have a relam defined, so
the first comment you have changed is not completely correct.

-- 
Michael
Re: A recent message added to pg_upgade
On Fri, Nov 10, 2023 at 03:27:25PM +0530, Amit Kapila wrote: > I don't think this comment is correct because there won't be any apply > activity on the new cluster as after restoration subscriptions should > be disabled. On the old cluster, I think one problem is that the > origins may move forward after we copy them which can cause data > inconsistency issues. The other is that we may not prefer to generate > additional data and WAL during the upgrade. Also, I am not completely > sure about using the word 'corruption' in this context. What is your suggestion here? Would it be better to just aim for simplicity and just say that we don't want it to run because "it can lead to some inconsistent behaviors"? -- Michael signature.asc Description: PGP signature
Why do indexes and sorts use the database collation?
An "en_US" user doing:

   CREATE TABLE foo(t TEXT PRIMARY KEY);

is providing no indication that they want an index tailored to their
locale.  Yet we are creating the index with the "en_US" collation and
therefore imposing huge performance costs (something like 2X slower index
build time than the "C" locale), and also huge dependency versioning risks
that could lead to index corruption and/or wrong results.

Similarly, a user doing:

   SELECT DISTINCT t FROM bar;

is providing no indication that they care about the collation of "t" (we
are free to choose a HashAgg which makes no ordering guarantee at all).
Yet if we choose Sort+GroupAgg, the Sort will be performed in the "en_US"
locale, which is something like 2X slower than the "C" locale.

One of the strongest arguments for using a non-C collation in these cases
is the chance to use a non-deterministic collation, like a case-insensitive
one.  But the database collation is always deterministic, and all
deterministic collations have exactly the same definition of equality, so
there's no reason not to use "C".

Another argument is that, if the column is the database collation and the
index is "C", then the index is unusable for text range scans, etc.  But
there are two ways to solve that problem:

  1. Set the column collation to "C"; or
  2. Set the index collation to the database collation.

Range scans are often most useful when the text is not actually natural
language, but instead is some kind of formatted text representing another
type of thing, often in ASCII.  In that case, the range scan is really some
kind of prefix search or partitioning, and the "C" locale is probably the
right thing to use, and #1 wins.

Granted, there are reasons to want an index to have a particular collation,
in which case it makes sense to opt-in to #2.  But in the common case, the
high performance costs and dependency versioning risks aren't worth it.

Thoughts?

Regards,

Jeff Davis
Re: maybe a type_sanity. sql bug
Looking around, I found three other minor issues; attached.

I am not sure the pg_class "relam" description part is correct, since
partitioned indexes (relkind "I") also have the access method, but no
storage.
"
If this is a table or an index, the access method used (heap, B-tree,
hash, etc.); otherwise zero (zero occurs for sequences, as well as
relations without storage, such as views)
"

diff --git a/src/test/regress/sql/type_sanity.sql b/src/test/regress/sql/type_sanity.sql
index a546ba89..6d806941 100644
--- a/src/test/regress/sql/type_sanity.sql
+++ b/src/test/regress/sql/type_sanity.sql
@@ -364,22 +364,22 @@ WHERE relkind NOT IN ('r', 'i', 'S', 't', 'v', 'm', 'c', 'f', 'p', 'I') OR
     relpersistence NOT IN ('p', 'u', 't') OR
     relreplident NOT IN ('d', 'n', 'f', 'i');
 
--- All tables and indexes should have an access method.
+-- All tables and indexes, partitioned indexes should have an access method.
 SELECT c1.oid, c1.relname
 FROM pg_class as c1
-WHERE c1.relkind NOT IN ('S', 'v', 'f', 'c') and
+WHERE c1.relkind NOT IN ('S', 'v', 'f', 'c','p') and
     c1.relam = 0;
 
--- Conversely, sequences, views, types shouldn't have them
+-- Conversely, sequences, views, types, partitioned table shouldn't have them
 SELECT c1.oid, c1.relname
 FROM pg_class as c1
-WHERE c1.relkind IN ('S', 'v', 'f', 'c') and
+WHERE c1.relkind IN ('S', 'v', 'f', 'c','p') and
     c1.relam != 0;
 
--- Indexes should have AMs of type 'i'
+-- Indexes, partitioned indexes should have AMs of type 'i'
 SELECT pc.oid, pc.relname, pa.amname, pa.amtype
 FROM pg_class as pc JOIN pg_am AS pa ON (pc.relam = pa.oid)
-WHERE pc.relkind IN ('i') and
+WHERE pc.relkind IN ('i','I') and
     pa.amtype != 'i';
 
 -- Tables, matviews etc should have AMs of type 't'
Re: Force the old transactions logs cleanup even if checkpoint is skipped
Hi,

On 2023-11-09 11:50:10 +0000, Zakhlystov, Daniil (Nebius) wrote:
> > On 9 Nov 2023, at 01:30, Michael Paquier wrote:
> >
> > I am not really convinced that this is worth complicating the skipped
> > path for this goal.  In my experience, I've seen complaints where WAL
> > archiving bloat was coming from the archive command not able to keep
> > up with the amount generated by the backend, particularly because the
> > command invocation was taking longer than it takes to generate a new
> > segment.  Even if there is a hole of activity in the server, if too
> > much WAL has been generated it may not be enough to catch up depending
> > on the number of segments that need to be processed.  Others are free
> > to chime in with extra opinions, of course.
>
> I agree that there might be multiple reasons for pg_wal bloat.  Please
> note that I am not addressing the WAL archiving issue at all.  My proposal
> is to add a small improvement to the WAL cleanup routine for WALs that
> have been already archived successfully, to free the disk space.
>
> Yes, it might be not a common case, but a fairly realistic one.  It
> occurred multiple times in our production when we had temporary issues
> with archiving.  This small complication of the skipped path will help
> Postgres to return to a normal operational state without any human
> operator / external control routine intervention.

I agree that the scenario is worth addressing - it's quite a nasty
situation.  But I'm not sure this is the way to address it.  If a
checkpoint does have to happen, we might not get to the point of removing
the old segments, because we might fail to emit the WAL record due to
running out of space.  And if that doesn't happen - do we really want to
wait till a checkpoint finishes to free up space?

What if we instead made the archiver delete WAL files after archiving, if
they're old enough?  Some care would be needed to avoid checkpointer and
archiver trampling on each other, but that doesn't seem too hard.

Greetings,

Andres Freund
Re: ResourceOwner refactoring
Hi,

On 2023-11-10 16:26:51 +0200, Heikki Linnakangas wrote:
> The quick, straightforward fix is to move the "CurrentResourceOwner = NULL"
> line earlier in CommitTransaction, per attached
> 0003-Clear-CurrentResourceOwner-earlier-in-CommitTransact.patch.  You're
> not allowed to use the resource owner after you start to release it; it
> was a bit iffy even before the ResourceOwner rewrite but now it's
> explicitly forbidden.  By clearing CurrentResourceOwner as soon as we
> start releasing it, we can prevent any accidental use.
>
> When CurrentResourceOwner == NULL, dsm_attach() returns a handle that's
> not associated with any ResourceOwner.  That's appropriate for the pgstat
> case.  The DSA is "pinned", so the handle is forgotten from the
> ResourceOwner right after calling dsm_attach() anyway.

Yea - I wonder if we should have a version of dsm_attach() that pins at the
same time.  It's pretty silly to first attach (remembering the dsm) and
then immediately pin (forgetting the dsm).

> Looking closer at dsa.c, I think this is a wider problem though.  The
> comments don't make it very clear how it's supposed to interact with
> ResourceOwners.  There's just this brief comment in dsa_pin_mapping():
>
> > * By default, areas are owned by the current resource owner, which means they
> > * are detached automatically when that scope ends.
>
> The dsa_area struct isn't directly owned by any ResourceOwner though.  The
> DSM segments created by dsa_create() or dsa_attach() are.

But there's a relationship from the dsa too, via on_dsm_detach().

> But the functions dsa_allocate() and dsa_get_address() can create or
> attach more DSM segments to the area, and they will be owned by the
> current resource owner *at the time of the call*.

Yea, that doesn't seem great.  It shouldn't affect the stats code though,
due to the dsa being pinned.

> I think that is surprising behavior from the DSA facility.  When you make
> allocations with dsa_allocate() or just call dsa_get_address() on an
> existing dsa_pointer, you wouldn't expect the current resource owner to
> matter.  I think dsa_create/attach() should store the current resource
> owner in the dsa_area, for use in subsequent operations on the DSA, per
> attached patch (0002-Fix-dsa.c-with-different-resource-owners.patch).

Yea, that seems the right direction from here.

Greetings,

Andres Freund
Re: locked reads for atomics
Hi,

On 2023-11-10 21:49:06 +0000, John Morris wrote:
> > I wonder if it's worth providing a set of "locked read" functions.
>
> Most out-of-order machines include “read acquire” and “write release”
> which are pretty close to what you’re suggesting.

Is that really true?  It's IA64 lingo.  X86 doesn't have them, and while
arm has more granular barriers, they don't neatly map onto acquire/release
either.

I don't think an acquire here would actually be equivalent to making this a
full barrier - an acquire barrier allows moving reads or stores from
*before* the barrier to be moved after the barrier.  It just prevents the
opposite.

And for proper use of acquire/release semantics we'd need to pair
operations much more closely.  Right now we often rely on another preceding
memory barrier to ensure correct ordering; having to use paired operations
everywhere would lead to slower code.

I thoroughly dislike how strongly C++11/C11 prefer paired atomics *on the
same address* over "global" fences.  It often leads to substantially slower
code.  And they don't at all map neatly on hardware, where largely barrier
semantics are *not* tied to individual addresses.  And the fence
specification is just about unreadable (although I think they did fix some
of the worst issues).

Greetings,

Andres Freund
Re: locked reads for atomics
Hi,

On 2023-11-10 14:51:28 -0600, Nathan Bossart wrote:
> Moving this to a new thread and adding it to the January commitfest.
>
> On Thu, Nov 09, 2023 at 03:27:33PM -0600, Nathan Bossart wrote:
> > On Tue, Nov 07, 2023 at 04:58:16PM -0800, Andres Freund wrote:
> >> However, even if there's likely some other implied memory barrier that we
> >> could piggyback on, the patch much simpler to understand if it doesn't change
> >> coherency rules. There's no way the overhead could matter.
> >
> > I wonder if it's worth providing a set of "locked read" functions.  Those
> > could just do a compare/exchange with 0 in the generic implementation.  For
> > patches like this one where the overhead really shouldn't matter, I'd
> > encourage folks to use those to make it easy to reason about correctness.
>
> Concretely, like this.

I don't think "locked" is a good name - there's no locking.  I think
that'll deter their use, because it will make it sound more expensive than
they are.

pg_atomic_read_membarrier_u32()?

> @@ -228,7 +228,8 @@ pg_atomic_init_u32(volatile pg_atomic_uint32 *ptr, uint32 val)
>   * The read is guaranteed to return a value as it has been written by this or
>   * another process at some point in the past. There's however no cache
>   * coherency interaction guaranteeing the value hasn't since been written to
> - * again.
> + * again.  Consider using pg_atomic_locked_read_u32() unless you have a strong
> + * reason (e.g., performance) to use unlocked reads.

I think that's too strong an endorsement.  Often there will be no
difference in difficulty analysing correctness, because the barrier pairing
/ the interaction with the surrounding code needs to be analyzed just as
much.

>   * No barrier semantics.
>   */
> @@ -239,6 +240,24 @@ pg_atomic_read_u32(volatile pg_atomic_uint32 *ptr)
>  	return pg_atomic_read_u32_impl(ptr);
>  }
>
> +/*
> + * pg_atomic_read_u32 - locked read from atomic variable.

Un-updated name...

> + * This read is guaranteed to read the current value,

It doesn't guarantee that *at all*.  What it guarantees is solely that the
current CPU won't be doing something that could lead to reading an outdated
value.  To actually ensure the value is up2date, the modifying side also
needs to have used a form of barrier (in the form of fetch_add,
compare_exchange, etc or an explicit barrier).

> +#ifndef PG_HAVE_ATOMIC_LOCKED_READ_U32
> +#define PG_HAVE_ATOMIC_LOCKED_READ_U32
> +static inline uint32
> +pg_atomic_locked_read_u32_impl(volatile pg_atomic_uint32 *ptr)
> +{
> +	uint32		old = 0;
> +
> +	/*
> +	 * In the generic implementation, locked reads are implemented as a
> +	 * compare/exchange with 0.  That'll fail or succeed, but always return the
> +	 * most up-to-date value.  It might also store a 0, but only if the
> +	 * previous value was also a zero, i.e., harmless.
> +	 */
> +	pg_atomic_compare_exchange_u32_impl(ptr, &old, 0);
> +
> +	return old;
> +}
> +#endif

I suspect implementing it with an atomic fetch_add of 0 would be faster...

Greetings,

Andres Freund
Re: locked reads for atomics
On Fri, Nov 10, 2023 at 09:49:06PM +, John Morris wrote: > Most out-of-order machines include “read acquire” and “write release” > which are pretty close to what you’re suggesting. With the current > routines, we only have “read relaxed” and “write relaxed”. I think > implementing acquire/release semantics is a very good idea, We do have both pg_atomic_write_u32() and pg_atomic_unlocked_write_u32() (see commit b0779ab), but AFAICT those only differ in the fallback/spinlock implementations. I suppose there could be an unlocked 64-bit write on platforms that have 8-byte single-copy atomicity but still need to use the fallback/spinlock implementation for some reason, but that might be a bit of a stretch, and the use-cases might be few and far between... > I would also like to clarify the properties of atomics. One very > important question: Are atomics also volatile? The PostgreSQL atomics support appears to ensure they are volatile. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Failure during Building Postgres in Windows with Meson
Hi, On November 10, 2023 10:53:12 AM PST, Tristan Partin wrote: >On Thu Nov 9, 2023 at 9:31 AM CST, Nazir Bilal Yavuz wrote: >> Hi, >> >> On Thu, 9 Nov 2023 at 18:27, Tristan Partin wrote: >> > >> > Can you try with Meson v1.2.3? >> >> I tried with Meson v1.2.3 and upstream, both failed with the same error. >An employee at Collabora produced a fix[0]. If I understand correctly, you can thus work around the problem by enabling use of precompiled headers. Which also explains why CI didn't show this - normally on Windows you want to use pch. > It might still be worthwhile however to see why it happens in one shell and > not the other. It's gcc vs msvc due to path. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: pgsql: Don't trust unvalidated xl_tot_len.
On Sat, Nov 11, 2023 at 10:42 AM Christoph Berg wrote: > > I haven't investigated the details yet, and it's not affecting the > > builds on apt.postgresql.org, but the Debian amd64 and i386 regression > > tests just failed this test on PG13 (11 and 15 are ok): > > 12 and 14 are also failing, now on Debian unstable. (Again, only in > the salsa.debian.org tests, not on apt.postgresql.org's buildds.) > > 12 amd64: https://salsa.debian.org/postgresql/postgresql/-/jobs/4898857 > 12 i386: https://salsa.debian.org/postgresql/postgresql/-/jobs/4898858 > > The tests there are running in Docker containers. Hmm. regress_log_039_end_of_wal says: No such file or directory at /builds/postgresql/postgresql/debian/output/source_dir/build/../src/test/perl/TestLib.pm line 655. In the 13 branch we see that's in the new scan_server_header() subroutine where it tries to open the header, after asking pg_config --includedir-server where it lives. Hmm...
Re: locked reads for atomics
> I wonder if it's worth providing a set of "locked read" functions.

Most out-of-order machines include “read acquire” and “write release”,
which are pretty close to what you’re suggesting.  With the current
routines, we only have “read relaxed” and “write relaxed”.  I think
implementing acquire/release semantics is a very good idea.

I would also like to clarify the properties of atomics.  One very important
question: are atomics also volatile?  If so, the compiler has very limited
ability to move them around.  If not, it is difficult to tell when or where
they will take place unless the surrounding code is peppered with barriers.
Re: pgsql: Don't trust unvalidated xl_tot_len.
Re: To Thomas Munro > > src/test/recovery/t/039_end_of_wal.pl | 460 > > > > I haven't investigated the details yet, and it's not affecting the > builds on apt.postgresql.org, but the Debian amd64 and i386 regression > tests just failed this test on PG13 (11 and 15 are ok): 12 and 14 are also failing, now on Debian unstable. (Again, only in the salsa.debian.org tests, not on apt.postgresql.org's buildds.) 12 amd64: https://salsa.debian.org/postgresql/postgresql/-/jobs/4898857 12 i386: https://salsa.debian.org/postgresql/postgresql/-/jobs/4898858 The tests there are running in Docker containers. Christoph
Re: Atomic ops for unlogged LSN
On Thu, Nov 09, 2023 at 03:27:33PM -0600, Nathan Bossart wrote: > I wonder if it's worth providing a set of "locked read" functions. Those > could just do a compare/exchange with 0 in the generic implementation. For > patches like this one where the overhead really shouldn't matter, I'd > encourage folks to use those to make it easy to reason about correctness. I moved this proposal to a new thread [0]. [0] https://postgr.es/m/20231110205128.GB1315705%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
locked reads for atomics
Moving this to a new thread and adding it to the January commitfest.

On Thu, Nov 09, 2023 at 03:27:33PM -0600, Nathan Bossart wrote:
> On Tue, Nov 07, 2023 at 04:58:16PM -0800, Andres Freund wrote:
>> However, even if there's likely some other implied memory barrier that we
>> could piggyback on, the patch much simpler to understand if it doesn't change
>> coherency rules. There's no way the overhead could matter.
>
> I wonder if it's worth providing a set of "locked read" functions.  Those
> could just do a compare/exchange with 0 in the generic implementation.  For
> patches like this one where the overhead really shouldn't matter, I'd
> encourage folks to use those to make it easy to reason about correctness.

Concretely, like this.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index bbff945eba..21a6edac3e 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -228,7 +228,8 @@ pg_atomic_init_u32(volatile pg_atomic_uint32 *ptr, uint32 val)
  * The read is guaranteed to return a value as it has been written by this or
  * another process at some point in the past. There's however no cache
  * coherency interaction guaranteeing the value hasn't since been written to
- * again.
+ * again.  Consider using pg_atomic_locked_read_u32() unless you have a strong
+ * reason (e.g., performance) to use unlocked reads.
  *
  * No barrier semantics.
  */
@@ -239,6 +240,24 @@ pg_atomic_read_u32(volatile pg_atomic_uint32 *ptr)
 	return pg_atomic_read_u32_impl(ptr);
 }
 
+/*
+ * pg_atomic_read_u32 - locked read from atomic variable.
+ *
+ * This read is guaranteed to read the current value, but note that there's no
+ * guarantee that the value isn't updated between when this function returns
+ * and when you try to use it.  Note that this is less performant than an
+ * unlocked read (i.e., pg_atomic_read_u32()), but it is generally much easier
+ * to reason about correctness with locked reads.
+ *
+ * Full barrier semantics.
+ */
+static inline uint32
+pg_atomic_locked_read_u32(volatile pg_atomic_uint32 *ptr)
+{
+	AssertPointerAlignment(ptr, 4);
+	return pg_atomic_locked_read_u32_impl(ptr);
+}
+
 /*
  * pg_atomic_write_u32 - write to atomic variable.
  *
@@ -429,6 +448,15 @@ pg_atomic_read_u64(volatile pg_atomic_uint64 *ptr)
 	return pg_atomic_read_u64_impl(ptr);
 }
 
+static inline uint64
+pg_atomic_locked_read_u64(volatile pg_atomic_uint64 *ptr)
+{
+#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
+	AssertPointerAlignment(ptr, 8);
+#endif
+	return pg_atomic_locked_read_u64_impl(ptr);
+}
+
 static inline void
 pg_atomic_write_u64(volatile pg_atomic_uint64 *ptr, uint64 val)
 {
diff --git a/src/include/port/atomics/generic.h b/src/include/port/atomics/generic.h
index cb5804adbf..5bb3aea9b7 100644
--- a/src/include/port/atomics/generic.h
+++ b/src/include/port/atomics/generic.h
@@ -49,6 +49,25 @@ pg_atomic_read_u32_impl(volatile pg_atomic_uint32 *ptr)
 }
 #endif
 
+#ifndef PG_HAVE_ATOMIC_LOCKED_READ_U32
+#define PG_HAVE_ATOMIC_LOCKED_READ_U32
+static inline uint32
+pg_atomic_locked_read_u32_impl(volatile pg_atomic_uint32 *ptr)
+{
+	uint32		old = 0;
+
+	/*
+	 * In the generic implementation, locked reads are implemented as a
+	 * compare/exchange with 0.  That'll fail or succeed, but always return the
+	 * most up-to-date value.  It might also store a 0, but only if the
+	 * previous value was also a zero, i.e., harmless.
+	 */
+	pg_atomic_compare_exchange_u32_impl(ptr, &old, 0);
+
+	return old;
+}
+#endif
+
 #ifndef PG_HAVE_ATOMIC_WRITE_U32
 #define PG_HAVE_ATOMIC_WRITE_U32
 static inline void
@@ -325,6 +344,25 @@ pg_atomic_read_u64_impl(volatile pg_atomic_uint64 *ptr)
 #endif							/* PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY && !PG_HAVE_ATOMIC_U64_SIMULATION */
 #endif							/* PG_HAVE_ATOMIC_READ_U64 */
 
+#ifndef PG_HAVE_ATOMIC_LOCKED_READ_U64
+#define PG_HAVE_ATOMIC_LOCKED_READ_U64
+static inline uint64
+pg_atomic_locked_read_u64_impl(volatile pg_atomic_uint64 *ptr)
+{
+	uint64		old = 0;
+
+	/*
+	 * In the generic implementation, locked reads are implemented as a
+	 * compare/exchange with 0.  That'll fail or succeed, but always return the
+	 * most up-to-date value.  It might also store a 0, but only if the
+	 * previous value was also a zero, i.e., harmless.
+	 */
+	pg_atomic_compare_exchange_u64_impl(ptr, &old, 0);
+
+	return old;
+}
+#endif
+
 #ifndef PG_HAVE_ATOMIC_INIT_U64
 #define PG_HAVE_ATOMIC_INIT_U64
 static inline void
Re: pg_dump needs SELECT privileges on irrelevant extension table
On Thu, Nov 9, 2023 at 11:02 AM Tom Lane wrote: > I'm hearing nothing but crickets :-( Yeah :/ Based on your arguments above, it sounds like your patch may improve several other corner cases when backported, so that sounds good overall to me. My best guess is that Timescale will be happy with this patch's approach. But I can't speak with any authority. Aleks -- anything to add? --Jacob
Re: Adding facility for injection points (or probe points?) for more advanced tests
On Tue, Nov 07, 2023 at 05:01:16PM +0900, Michael Paquier wrote: > On Mon, Nov 06, 2023 at 10:28:14PM +0300, Nazir Bilal Yavuz wrote: >> I liked the idea; thanks for working on this! +1, this seems very useful. > +#ifdef USE_INJECTION_POINTS > +#define INJECTION_POINT_RUN(name) InjectionPointRun(name) > +#else > +#define INJECTION_POINT_RUN(name) ((void) name) > +#endif nitpick: Why is the non-injection-point version "(void) name"? I see "(void) true" used elsewhere for this purpose. > + > + Here is an example of callback for > + InjectionPointCallback: > + > +static void > +custom_injection_callback(const char *name) > +{ > +elog(NOTICE, "%s: executed custom callback", name); > +} > + Why do we provide the name to the callback functions? Overall, the design looks pretty good to me. I think it's a good idea to keep it simple to start with. Since this is really only intended for special tests that run in special builds, it seems like we ought to be able to change it easily in the future as needed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Add recovery to pg_control and remove backup_label
On 11/10/23 00:37, Michael Paquier wrote: On Tue, Nov 07, 2023 at 05:20:27PM +0900, Michael Paquier wrote: On Mon, Nov 06, 2023 at 05:39:02PM -0400, David Steele wrote: I've retested today, and miss the failure. I'll let you know if I see this again. I've done a few more dozen runs, and still nothing. I am wondering what this disturbance was. OK, hopefully it was just a blip. If we set these fields where backup_label was renamed, the logic would not be exactly the same since pg_control won't be updated until the next time through the loop. Since the fields should be updated before UpdateControlFile() I thought it made sense to keep all the updates together. Overall I think it is simpler, and we don't need to acquire a lock on ControlFile. What you are proposing is the same as what we already do for backupEndRequired or backupStartPoint in the control file when initializing recovery, so objection withdrawn. Thomas included this change in his pg_basebackup changes so I did the same. Maybe wait a bit before we split this out? Seems like a pretty small change... Seems like a pretty good argument for refactoring that now, and let any other patches rely on it. Would you like to send a separate patch? The split still looks worth doing seen from here, so I am switching the patch as WoA for now. This has been split out. Actually, grouping backupStartPointTLI and minRecoveryPointTLI should reduce more the size with some alignment magic, no? I thought about this, but it seemed to me that existing fields had been positioned to make the grouping logical rather than to optimize alignment, e.g. minRecoveryPointTLI. Ideally that would have been placed near backupEndRequired (or vice versa). But if the general opinion is to rearrange for alignment, I'm OK with that. I've not tested, but it looks like moving backupStartPointTLI after backupEndPoint should shave 8 bytes, if you want to maintain a more coherent group for the LSNs. OK, I have moved backupStartPointTLI. 
+* backupFromStandby indicates that the backup was taken on a standby. It is +* require to initialize recovery and set to false afterwards. s/require/required/. Fixed. The term "backup recovery", that we've never used in the tree until now as far as I know. Perhaps this recovery method should just be referred as "recovery from backup"? Well, "backup recovery" is less awkward, I think. For instance "backup recovery field" vs "recovery from backup field". By the way, there is another thing that this patch has forgotten: the SQL functions that display data from the control file. Shouldn't pg_control_recovery() be extended with the new fields? These fields may be less critical than the other ones related to recovery, but I suspect that showing them can become handy at least for debugging and monitoring purposes. I guess that depends on whether we reset them or not (discussion below). Right now they would not be visible since by the time the user could log on they would be reset. Something in this area is that backupRecoveryRequired is the switch controlling if the fields set by the recovery initialization. Could it be actually useful to leave the other fields as they are and only reset backupRecoveryRequired before the first control file update? This would leave a trace of the backup history directly in the control file. Since the other recovery fields are cleared in ReachedEndOfBackup() this would be a change from what we do now. None of these fields are ever visible (with the exception of minRecoveryPoint/TLI) since they are reset when the database becomes consistent and before logons are allowed. Viewing them with pg_controldata makes sense, but I'm not sure pg_control_recovery() does. In fact, are backup_start_lsn, backup_end_lsn, and end_of_backup_record_required ever non-zero when logged onto Postgres? Maybe I'm missing something? What about pg_resetwal and RewriteControlFile()? Shouldn't these recovery fields be reset as well? Done. 
git diff --check is complaining a bit. Fixed. New patches attached based on eb81e8e790. Regards, -David From d241a0e8aa589b3abf66331f8a3af0aabe87c214 Mon Sep 17 00:00:00 2001 From: David Steele Date: Fri, 10 Nov 2023 17:50:54 + Subject: Allow content size to be passed to sendFileWithContent(). sendFileWithContent() previously got the content length by using strlen(), but it is also possible to pass binary content. Use len == -1 to indicate that strlen() should be used to get the content length, otherwise honor the value in len. --- src/backend/backup/basebackup.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index b537f462197..f216b588422 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -94,7 +94,7 @@ static bool verify_page_checksum(Page page, XLogRecPtr start_lsn,
Re: Failure during Building Postgres in Windows with Meson
On Thu Nov 9, 2023 at 9:31 AM CST, Nazir Bilal Yavuz wrote: Hi, On Thu, 9 Nov 2023 at 18:27, Tristan Partin wrote: > > Can you try with Meson v1.2.3? I tried with Meson v1.2.3 and upstream, both failed with the same error. An employee at Collabora produced a fix[0]. It might still be worthwhile however to see why it happens in one shell and not the other. [0]: https://github.com/mesonbuild/meson/pull/12498 -- Tristan Partin Neon (https://neon.tech)
Re: Move bki file pre-processing from initdb to bootstrap
Thank you for the review, Peter. Makes sense on the split part. I was starting to think along the same lines at the end of the last iteration. Will come back shortly. On Fri, Nov 10, 2023 at 12:48 AM Peter Eisentraut wrote: > On 17.10.23 03:32, Krishnakumar R wrote: > >> The version comparison has been moved from initdb to bootstrap. This > >> created some compatibility problems with windows tests. For now I kept > >> the version check to not have \n added, which worked fine and serves > >> the purpose. However hoping to have something better in v3 in addition > >> to addressing any other comments. > > > > With help from Thomas, figured out that on windows fopen uses binary > > mode in the backend which causes issues with EOL. Please find the > > attached patch updated with a fix for this. > > I suggest that this patch set be split up into three incremental parts: > > 1. Move some build-time settings from initdb to postgres.bki. > 2. The database locale handling. > 3. The bki file handling. > > Each of these topics really needs a separate detailed consideration. > >
Re: pgsql: Don't trust unvalidated xl_tot_len.
Re: To Thomas Munro > I haven't investigated the details yet, and it's not affecting the > builds on apt.postgresql.org, but the Debian amd64 and i386 regression > tests just failed this test on PG13 (11 and 15 are ok): That's on Debian bullseye, fwiw. (But the 13 build on apt.pg.o/bullseye passed.) Christoph
Re: pgsql: Don't trust unvalidated xl_tot_len.
Re: Thomas Munro > Don't trust unvalidated xl_tot_len. > src/test/recovery/t/039_end_of_wal.pl | 460 I haven't investigated the details yet, and it's not affecting the builds on apt.postgresql.org, but the Debian amd64 and i386 regression tests just failed this test on PG13 (11 and 15 are ok): t/039_end_of_wal.pl .. Dubious, test returned 2 (wstat 512, 0x200) No subtests run Test Summary Report --- t/039_end_of_wal.pl(Wstat: 512 Tests: 0 Failed: 0) Non-zero exit status: 2 Parse errors: No plan found in TAP output Files=26, Tests=254, 105 wallclock secs ( 0.10 usr 0.02 sys + 19.58 cusr 11.02 csys = 30.72 CPU) Result: FAIL make[2]: *** [Makefile:23: check] Error 1 https://salsa.debian.org/postgresql/postgresql/-/jobs/4910354 https://salsa.debian.org/postgresql/postgresql/-/jobs/4910355 Christoph
Re: Improvements in pg_dump/pg_restore toc format and performances
On Tue, Oct 03, 2023 at 03:17:57PM +0530, vignesh C wrote: > Few comments: Pierre, do you plan to submit a new revision of this patch set for the November commitfest? If not, the commitfest entry may be marked as returned-with-feedback soon. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: SET ROLE documentation improvement
On Tue, Sep 26, 2023 at 08:33:25AM -0700, Yurii Rashkovskii wrote: > This is a good start, indeed. I've amended my patch to include it. Thanks for the new patch. Looking again, I'm kind of hesitant to add too much qualification to this note about losing superuser privileges. If we changed it to Note that when a superuser chooses to SET ROLE to a non-superuser role, they lose their superuser privileges, except for the privilege to change to another role again using SET ROLE or RESET ROLE. it almost seems to imply that a non-superuser role could obtain the ability to switch to any role if they first SET ROLE to a superuser. In practice, that's true because they could just give the session role SUPERUSER, but I don't think that's the intent of this section. I thought about changing it to something like Note that when a superuser chooses to SET ROLE to a non-superuser role, they lose their superuser privileges. However, if the current session user is a superuser, they retain the ability to set the current user identifier to any role via SET ROLE and RESET ROLE. but it seemed weird to me to single out superusers here when it's always true that the current session user retains the ability to SET ROLE to any role they have the SET option on. That is already covered above in the "Description" section, so I don't really see the need to belabor the point by adding qualifications to the "Notes" section. ISTM the point of these couple of paragraphs in the "Notes" section is to explain the effects on privileges for schemas, tables, etc. I still think we should update the existing note about privileges for SET/RESET ROLE to something like the following: diff --git a/doc/src/sgml/ref/set_role.sgml b/doc/src/sgml/ref/set_role.sgml index 13bad1bf66..c91a95f5af 100644 --- a/doc/src/sgml/ref/set_role.sgml +++ b/doc/src/sgml/ref/set_role.sgml @@ -41,8 +41,10 @@ RESET ROLE - The specified role_name - must be a role that the current session user is a member of. 
+ The current session user must have the SET option for the + specified role_name, either + directly or indirectly via a chain of memberships with the + SET option. (If the session user is a superuser, any role can be selected.) -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: AdvanceXLInsertBuffers() vs wal_sync_method=open_datasync
Hi, On 2023-11-10 17:16:35 +0200, Heikki Linnakangas wrote: > On 10/11/2023 05:54, Andres Freund wrote: > > In this case I had used wal_sync_method=open_datasync - it's often faster > > and > > if we want to scale WAL writes more we'll have to use it more widely (you > > can't have multiple fdatasyncs in progress and reason about which one > > affects > > what, but you can have multiple DSYNC writes in progress at the same time). > > Not sure I understand that. If you issue an fdatasync, it will sync all > writes that were complete before the fdatasync started. Right? If you have > multiple fdatasyncs in progress, that's true for each fdatasync. Or is there > a bottleneck in the kernel with multiple in-progress fdatasyncs or > something? Many filesystems only allow a single fdatasync to really be in progress at the same time; they eventually acquire an inode-specific lock. More problematic cases include things like a write followed by an fdatasync, followed by a write of the same block in another process/thread - there's very little guarantee about which contents of that block are now durable. But more importantly, using fdatasync doesn't scale because it effectively has to flush the entire write cache on the device - which often contains plenty of other dirty data. Whereas O_DSYNC can use FUA writes, which just makes the individual WAL writes write through the cache, while leaving the rest of the cache "unaffected". > > After a bit of confused staring and debugging I figured out that the problem > > is that the RequestXLogSwitch() within the code for starting a basebackup > > was > > triggering writing back the WAL in individual 8kB writes via > > GetXLogBuffer()->AdvanceXLInsertBuffer(). With open_datasync each of these > > writes is durable - on this drive each take about 1ms. > > I see. So the assumption in AdvanceXLInsertBuffer() is that XLogWrite() is > relatively fast. But with open_datasync, it's not. 
I'm not sure that was an explicit assumption rather than just how it worked out. > > To fix this, I suspect we need to make > > GetXLogBuffer()->AdvanceXLInsertBuffer() flush more aggressively. In this > > specific case, we even know for sure that we are going to fill a lot more > > buffers, so no heuristic would be needed. In other cases however we need > > some > > heuristic to know how much to write out. > > +1. Maybe use the same logic as in XLogFlush(). I've actually been wondering about moving all the handling of WALWriteLock to XLogWrite() and/or a new function called from all the places calling XLogWrite(). I suspect we can't quite use the same logic in AdvanceXLInsertBuffer() as we do in XLogFlush() - we e.g. don't ever want to trigger flushing out a partially filled page, for example. Or really ever want to unnecessarily wait for a WAL insertion to complete when we don't have to. > I wonder if the 'flexible' argument to XLogWrite() is too inflexible. It > would be nice to pass a hard minimum XLogRecPtr that it must write up to, > but still allow it to write more than that if it's convenient. Yes, I've also thought that. In the AIOified WAL code I ended up tracking "minimum" and "optimal" write/flush locations. Greetings, Andres Freund
Re: Buffer Cache Problem
Hi, I have 3 questions here: 1. I see these comments in buf_internals.h: * Also, in places we do one-time reads of the flags without bothering to * lock the buffer header; this is generally for situations where we don't * expect the flag bit being tested to be changing. In fact, the flags live in the state field, which is an atomic_u32, so we don't need to acquire the buffer header lock in any case; but this comment seems to say that, in general, we need to hold a buffer header lock when reading a flag. 2. Another question: * We can't physically remove items from a disk page if another backend has * the buffer pinned. Hence, a backend may need to wait for all other pins * to go away. This is signaled by storing its own pgprocno into * wait_backend_pgprocno and setting flag bit BM_PIN_COUNT_WAITER. At present, * there can be only one such waiter per buffer. Regarding the comment above: if a backend plans to remove items from a disk page, that is a mutation, so the backend must hold an exclusive lock on the buffer page. In that case there are no other backends pinning the buffer, so the pin refcount must be 1 (held by this backend itself), and the backend can remove the items safely without waiting for any other backends. So my question is: isn't the operation of "storing its own pgprocno into wait_backend_pgprocno and setting flag bit BM_PIN_COUNT_WAITER" too expensive? We should not do it like this, right? 3. Where is the array mentioned here? * Per-buffer I/O condition variables are currently kept outside this struct in * a separate array. They could be moved in here and still fit within that * limit on common systems, but for now that is not done.
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
On Fri, Nov 10, 2023 at 10:17:49AM +0530, Dilip Kumar wrote: > On Thu, Nov 9, 2023 at 4:55 PM Alvaro Herrera wrote: >> The only point on which we do not have full consensus yet is the need to >> have one GUC per SLRU, and a lot of effort seems focused on trying to >> fix the problem without adding so many GUCs (for example, using shared >> buffers instead, or use a single "scaling" GUC). I think that hinders >> progress. Let's just add multiple GUCs, and users can leave most of >> them alone and only adjust the one with which they have a performance >> problems; it's not going to be the same one for everybody. > > +1 +1 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: CRC32C Parallel Computation Optimization on ARM
On Tue, Nov 07, 2023 at 08:05:45AM +, Xiang Gao wrote: > I think I understand what you mean, this is the latest patch. Thank you! Thanks for the new patch. +# PGAC_ARMV8_VMULL_INTRINSICS +# +# Check if the compiler supports the vmull_p64 +# intrinsic functions. These instructions +# were first introduced in ARMv8 crypto Extension. I wonder if it'd be better to call this PGAC_ARMV8_CRYPTO_INTRINSICS since this check seems to indicate the presence of +crypto. Presumably there are other instructions in this extension that could be used elsewhere, in which case we could reuse this. +# Use ARM VMULL if available and ARM CRC32C intrinsic is available too. +if test x"$USE_ARMV8_VMULL" = x"" && test x"$USE_ARMV8_VMULL_WITH_RUNTIME_CHECK" = x"" && (test x"$USE_ARMV8_CRC32C" = x"1" || test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x"1"); then + if test x"$pgac_armv8_vmull_intrinsics" = x"yes" && test x"$CFLAGS_VMULL" = x""; then +USE_ARMV8_VMULL=1 + else +if test x"$pgac_armv8_vmull_intrinsics" = x"yes"; then + USE_ARMV8_VMULL_WITH_RUNTIME_CHECK=1 +fi + fi +fi I'm not sure I see the need to check USE_ARMV8_CRC32C* when setting these. Couldn't we set them solely on the results of our PGAC_ARMV8_VMULL_INTRINSICS check? It looks like this is what you are doing in meson.build already. +extern pg_crc32c pg_comp_crc32c_with_vmull_armv8(pg_crc32c crc, const void *data, size_t len); nitpick: Maybe pg_comp_crc32c_armv8_parallel()? -# all versions of pg_crc32c_armv8.o need CFLAGS_CRC -pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC) -pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC) -pg_crc32c_armv8_srv.o: CFLAGS+=$(CFLAGS_CRC) Why are these lines deleted? - ['pg_crc32c_armv8', 'USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 'crc'], + ['pg_crc32c_armv8', 'USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK'], What is the purpose of this change? +__attribute__((target("+crc+crypto"))) I'm not sure we can assume that all compilers will understand this, and I'm not sure we need it. 
+ if (use_vmull) + { +/* + * Crc32c parallel computation. Input data is divided into three + * equal-sized blocks. Block length : 42 words(42 * 8 bytes). + * CRC0: 0 ~ 41 * 8, + * CRC1: 42 * 8 ~ (42 * 2 - 1) * 8, + * CRC2: 42 * 2 * 8 ~ (42 * 3 - 1) * 8. + */ Shouldn't we surround this with #ifdefs for USE_ARMV8_VMULL*? if (pg_crc32c_armv8_available()) + { +#if defined(USE_ARMV8_VMULL) + pg_comp_crc32c = pg_comp_crc32c_with_vmull_armv8; +#elif defined(USE_ARMV8_VMULL_WITH_RUNTIME_CHECK) + if (pg_vmull_armv8_available()) + { + pg_comp_crc32c = pg_comp_crc32c_with_vmull_armv8; + } + else + { + pg_comp_crc32c = pg_comp_crc32c_armv8; + } +#else pg_comp_crc32c = pg_comp_crc32c_armv8; +#endif + } IMO it'd be better to move the #ifdefs into the functions so that we can simplify this to something like if (pg_crc32c_armv8_available()) { if (pg_crc32c_armv8_crypto_available()) pg_comp_crc32c = pg_comp_crc32c_armv8_parallel; else pg_comp_crc32c = pg_comp_crc32c_armv8; } else pg_comp_crc32c = pg_comp_crc32c_sb8; -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: AdvanceXLInsertBuffers() vs wal_sync_method=open_datasync
On 10/11/2023 05:54, Andres Freund wrote: In this case I had used wal_sync_method=open_datasync - it's often faster and if we want to scale WAL writes more we'll have to use it more widely (you can't have multiple fdatasyncs in progress and reason about which one affects what, but you can have multiple DSYNC writes in progress at the same time). Not sure I understand that. If you issue an fdatasync, it will sync all writes that were complete before the fdatasync started. Right? If you have multiple fdatasyncs in progress, that's true for each fdatasync. Or is there a bottleneck in the kernel with multiple in-progress fdatasyncs or something? After a bit of confused staring and debugging I figured out that the problem is that the RequestXLogSwitch() within the code for starting a basebackup was triggering writing back the WAL in individual 8kB writes via GetXLogBuffer()->AdvanceXLInsertBuffer(). With open_datasync each of these writes is durable - on this drive each take about 1ms. I see. So the assumption in AdvanceXLInsertBuffer() is that XLogWrite() is relatively fast. But with open_datasync, it's not. To fix this, I suspect we need to make GetXLogBuffer()->AdvanceXLInsertBuffer() flush more aggressively. In this specific case, we even know for sure that we are going to fill a lot more buffers, so no heuristic would be needed. In other cases however we need some heuristic to know how much to write out. +1. Maybe use the same logic as in XLogFlush(). I wonder if the 'flexible' argument to XLogWrite() is too inflexible. It would be nice to pass a hard minimum XLogRecPtr that it must write up to, but still allow it to write more than that if it's convenient. -- Heikki Linnakangas Neon (https://neon.tech)
Re: EXCLUDE COLLATE in CREATE/ALTER TABLE document
On Wed, Nov 1, 2023 at 10:30 AM shihao zhong wrote: > > Thank you for your feedback on my previous patch. I have fixed the issue and > attached a new patch for your review. Could you please take a look for it if > you have a sec? Thanks > Your patch works fine. You can see it here: https://cirrus-ci.com/task/6481922939944960 In an ideal world, since the doc is already built, we could probably view it as a plain HTML file by just clicking the CI test result. In src/sgml/ref/create_table.sgml: "Each exclude_element can optionally specify an operator class and/or ordering options; these are described fully under CREATE INDEX." You may need to update this sentence to reflect that exclude_element can also optionally specify collation.
Re: Buffer Cache Problem
> On Nov 10, 2023, at 22:31, jacktby jacktby wrote: > > In buf_internals.h, I see > > Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change tag, state or wait_backend_pgprocno fields. > > As we all know, this buffer header lock is implemented by a bit in the state field, and this state field is an atomic_u32 type, so in fact we don't need to hold a buffer lock when we update state. This comment has an error, right? Oh, sorry, this is true: in fact we never acquire a spin lock when updating the state.
Re: Buffer Cache Problem
In buf_internals.h, I see Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change tag, state or wait_backend_pgprocno fields. As we all know, this buffer header lock is implemented by a bit in the state field, and this state field is an atomic_u32 type, so in fact we don't need to hold a buffer lock when we update state. This comment has an error, right?
Re: ResourceOwner refactoring
Thanks for the testing again! On 10/11/2023 11:00, Alexander Lakhin wrote: I could see two failure modes: 2023-11-10 08:42:28.870 UTC [1163274] ERROR: ResourceOwnerEnlarge called after release started 2023-11-10 08:42:28.870 UTC [1163274] STATEMENT: drop table t; 2023-11-10 08:42:28.870 UTC [1163274] WARNING: AbortTransaction while in COMMIT state 2023-11-10 08:42:28.870 UTC [1163274] PANIC: cannot abort transaction 906, it was already committed 2023-11-10 08:43:27.897 UTC [1164148] ERROR: ResourceOwnerEnlarge called after release started 2023-11-10 08:43:27.897 UTC [1164148] STATEMENT: DROP DATABASE db69; 2023-11-10 08:43:27.897 UTC [1164148] WARNING: AbortTransaction while in COMMIT state 2023-11-10 08:43:27.897 UTC [1164148] PANIC: cannot abort transaction 1043, it was already committed The stack trace for the second ERROR (ResourceOwnerEnlarge called ...) is: ... #6 0x558af5b2f35c in ResourceOwnerEnlarge (owner=0x558af716f3c8) at resowner.c:455 #7 0x558af5888f18 in dsm_create_descriptor () at dsm.c:1207 #8 0x558af5889205 in dsm_attach (h=3172038420) at dsm.c:697 #9 0x558af5b1ebed in get_segment_by_index (area=0x558af711da18, index=2) at dsa.c:1764 #10 0x558af5b1ea4b in dsa_get_address (area=0x558af711da18, dp=2199023329568) at dsa.c:970 #11 0x558af5669366 in dshash_seq_next (status=0x7ffdd5912fd0) at dshash.c:687 #12 0x558af5901998 in pgstat_drop_database_and_contents (dboid=16444) at pgstat_shmem.c:830 #13 0x558af59016f0 in pgstat_drop_entry (kind=PGSTAT_KIND_DATABASE, dboid=16444, objoid=0) at pgstat_shmem.c:888 #14 0x558af59044eb in AtEOXact_PgStat_DroppedStats (xact_state=0x558af7111ee0, isCommit=true) at pgstat_xact.c:88 #15 0x558af59043c7 in AtEOXact_PgStat (isCommit=true, parallel=false) at pgstat_xact.c:55 #16 0x558af53c782e in CommitTransaction () at xact.c:2371 #17 0x558af53c709e in CommitTransactionCommand () at xact.c:306 ... 
The quick, straightforward fix is to move the "CurrentResourceOwner = NULL" line earlier in CommitTransaction, per attached 0003-Clear-CurrentResourceOwner-earlier-in-CommitTransact.patch. You're not allowed to use the resource owner after you start to release it; it was a bit iffy even before the ResourceOwner rewrite but now it's explicitly forbidden. By clearing CurrentResourceOwner as soon as we start releasing it, we can prevent any accidental use. When CurrentResourceOwner == NULL, dsm_attach() returns a handle that's not associated with any ResourceOwner. That's appropriate for the pgstat case. The DSA is "pinned", so the handle is forgotten from the ResourceOwner right after calling dsm_attach() anyway. Looking closer at dsa.c, I think this is a wider problem though. The comments don't make it very clear how it's supposed to interact with ResourceOwners. There's just this brief comment in dsa_pin_mapping(): * By default, areas are owned by the current resource owner, which means they * are detached automatically when that scope ends. The dsa_area struct isn't directly owned by any ResourceOwner though. The DSM segments created by dsa_create() or dsa_attach() are. But the functions dsa_allocate() and dsa_get_address() can create or attach more DSM segments to the area, and they will be owned by the current resource owner *at the time of the call*. So if you call dsa_get_address() while in a different resource owner, things get very confusing. 
The attached new test module demonstrates that (0001-Add-test_dsa-module.patch), here's a shortened version: a = dsa_create(tranche_id); /* Switch to new resource owner */ oldowner = CurrentResourceOwner; childowner = ResourceOwnerCreate(oldowner, "temp owner"); CurrentResourceOwner = childowner; /* make a bunch of allocations */ for (int i = 0; i < 1; i++) p[i] = dsa_allocate(a, 1000); /* Release the child resource owner */ CurrentResourceOwner = oldowner; ResourceOwnerRelease(childowner, RESOURCE_RELEASE_BEFORE_LOCKS, true, false); ResourceOwnerRelease(childowner, RESOURCE_RELEASE_LOCKS, true, false); ResourceOwnerRelease(childowner, RESOURCE_RELEASE_AFTER_LOCKS, true, false); ResourceOwnerDelete(childowner); dsa_detach(a); This first prints warnings on The ResourceOwnerRelease() calls: 2023-11-10 13:57:21.475 EET [745346] WARNING: resource was not closed: dynamic shared memory segment 2395813396 2023-11-10 13:57:21.475 EET [745346] WARNING: resource was not closed: dynamic shared memory segment 3922992700 2023-11-10 13:57:21.475 EET [745346] WARNING: resource was not closed
Re: pg_upgrade and logical replication
On Thu, 9 Nov 2023 at 07:44, Peter Smith wrote: > > Thanks for addressing my previous review comments. > > I re-checked the latest patch v12-0001 and found the following: > > == > Commit message > > 1. > The new SQL binary_upgrade_create_sub_rel_state function has the following > syntax: > SELECT binary_upgrade_create_sub_rel_state(subname text, relid oid, > state char [,sublsn pg_lsn]) > > ~ > > Looks like v12 accidentally forgot to update this to the modified > function name 'binary_upgrade_add_sub_rel_state' This is handled in the v13 version patch posted at: https://www.postgresql.org/message-id/CALDaNm0mGz6_69BiJTmEqC8Q0U0x2nMZOs3w9btKOHZZpfC2ow%40mail.gmail.com Regards, Vignesh
Re: pg_upgrade and logical replication
On Thu, 9 Nov 2023 at 12:23, Michael Paquier wrote: > > On Thu, Nov 09, 2023 at 01:14:05PM +1100, Peter Smith wrote: > > Looks like v12 accidentally forgot to update this to the modified > > function name 'binary_upgrade_add_sub_rel_state' > > This v12 is overall cleaner than its predecessors. Nice to see. > > +my $result = $publisher->safe_psql('postgres', "SELECT count(*) FROM t1"); > +is($result, qq(1), "check initial t1 table data on publisher"); > +$result = $publisher->safe_psql('postgres', "SELECT count(*) FROM t2"); > +is($result, qq(1), "check initial t1 table data on publisher"); > +$result = $old_sub->safe_psql('postgres', "SELECT count(*) FROM t1"); > +is($result, qq(1), "check initial t1 table data on the old subscriber"); > +$result = $old_sub->safe_psql('postgres', "SELECT count(*) FROM t2"); > > I'd argue that t1 and t2 should have less generic names. t1 is used > to check that the upgrade process works, while t2 is added to the > publication after upgrading the subscriber. Say something like > tab_upgraded or tab_not_upgraded? Modified > +my $synced_query = > + "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN > ('r');"; > > Perhaps it would be safer to use a query that checks the number of > relations in 'r' state? This query would return true if > pg_subscription_rel has no tuples. Modified > +# Table will be in 'd' (data is being copied) state as table sync will fail > +# because of primary key constraint error. > +my $started_query = > + "SELECT count(1) = 1 FROM pg_subscription_rel WHERE srsubstate = 'd';"; > > Relying on a pkey error to enforce an incorrect state is a good trick. > Nice. 
That was a better way to get the data sync state without manually changing the pg_subscription_rel catalog

> +command_fails(
> +[
> +'pg_upgrade', '--no-sync','-d', $old_sub->data_dir,
> +'-D', $new_sub->data_dir, '-b', $bindir,
> +'-B', $bindir,'-s', $new_sub->host,
> +'-p', $old_sub->port, '-P', $new_sub->port,
> +$mode,'--check',
> +],
> +'run of pg_upgrade --check for old instance with relation in \'d\' datasync(invalid) state');
> +rmtree($new_sub->data_dir . "/pg_upgrade_output.d");
>
> Okay by me to not stop the cluster for the --check to shave a few
> cycles. It's a bit sad that we don't cross-check the contents of
> subscription_state.txt before removing pg_upgrade_output.d. Finding
> the file is easy even if the subdir where it is included is not a
> constant name. Then it is possible to apply a regexp with the
> contents consumed by a slurp_file().

Modified

> +my $remote_lsn = $old_sub->safe_psql('postgres',
> +"SELECT remote_lsn FROM pg_replication_origin_status");
> Perhaps you've not noticed, but this would be 0/0 most of the time.
> However the intention is to check after a valid LSN to make sure that
> the origin is set, no?

I have added a few more inserts to make remote_lsn not be 0/0

> I am wondering whether this should use a bit more data than just one
> tuple, say at least two transaction, one of them with a multi-value
> INSERT?

Added one more multi-insert

> +# --
> +# Check that pg_upgrade is successful when all tables are in ready state.
> +# --
> This comment is a bit inconsistent with the state that are accepted,
> but why not, at least that's predictible.

The key test validation is mentioned in this style of comment

> +* relation to pg_subscripion_rel table. This will be used only in
>
> Typo: s/pg_subscripion_rel/pg_subscription_rel/.
Modified > This needs some word-smithing to explain the reasons why a state is > not needed: > > +/* > + * The subscription relation should be in either i (initialize), > + * r (ready) or s (synchronized) state as either the replication slot > + * is not created or the replication slot is already dropped and the > + * required WAL files will be present in the publisher. The other > + * states are not ok as the worker has dependency on the replication > + * slot/origin in these case: > > A slot not created yet refers to the 'i' state, while 'r' and 's' > refer to a slot created previously but already dropped, right? > Shouldn't this comment tell that rather than mixing the assumptions? Modified > + * a) SUBREL_STATE_DATASYNC: In this case, the table sync worker will > + * try to drop the replication slot but as the replication slots will > + * be created with old subscription id in the publisher and the > + * upgraded subscriber will not be able to clean the slots in this > + * case. > > Proposal: A relation upgraded while in this state would retain a > replication slot, which could not be dropped by the sync worker > spawned after the upgrade because the subscription ID tracked by the >
Re: Bug: RLS policy FOR SELECT is used to check new rows
On Fri, 2023-11-10 at 09:39 +, Dean Rasheed wrote: > On Thu, 9 Nov 2023 at 18:55, Laurenz Albe wrote: > > I think it can be useful to allow a user an UPDATE where the result > > does not satisfy the USING clause of the FOR SELECT policy. > > > > The idea that an UPDATE should only produce rows you can SELECT is not > > true today: if you run an UPDATE without a WHERE clause, you can > > create rows you cannot see. The restriction is only on UPDATEs with > > a WHERE clause. Weird, isn't it? > > That's true, but only if the UPDATE also doesn't have a RETURNING > clause. What I find weird about your proposal is that it would allow > an UPDATE ... RETURNING command to return something that would be > visible just that once, but then subsequently disappear. That seems > like a cure that's worse than the original disease that kicked off > this discussion. What kicked off the discussion was my complaint that FOR SELECT rules mess with UPDATE, so that's exactly what I would have liked: an UPDATE that makes the rows vanish. My naïve expectation was that FOR SELECT policies govern SELECT and FOR UPDATE policies govern UPDATE. After all, there is a WITH CHECK clause for FOR UPDATE policies that checks the result rows. So, from my perspective, we should never have let FOR SELECT policies mess with an UPDATE. But I am too late for that; such a change would be way too invasive now. So I'd like to introduce a "back door" by creating a FOR SELECT policy with WITH CHECK (TRUE). > As mentioned by others, the intention was that RLS behave like WITH > CHECK OPTION on an updatable view, so that new rows can't just > disappear. There are, however, 2 differences between the way it > currently works for RLS, and an updatable view: > > 1). RLS only does this for UPDATE commands. INSERT commands *can* > insert new rows that aren't visible, and so disappear. > > 2). It can't be turned off. The WITH CHECK OPTION on an updatable view > is an option that the user can choose to turn on or off. 
That's not > possible with RLS. Right. Plus the above-mentioned fact that you can make rows vanish with an UPDATE that has no WHERE. > It might be possible to change (2) though, by adding a new table-level > option (similar to a view's WITH CHECK OPTION) that enabled or > disabled the checking of new rows for that table, and whose default > matched the current behaviour. That would be a viable solution. Pro: it doesn't make the already hideously complicated RLS system even more complicated. Con: yet another storage option... > Before going too far down that route though, it is perhaps worth > asking whether this is something users really want. Is there a real > use-case for being able to UPDATE rows and have them disappear? What triggered my investigation was this question: https://stackoverflow.com/q/77346757/6464308 I personally don't have any stake in this. I just wanted a way to make RLS behave more like I think it should. Yours, Laurenz Albe
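For readers following along, here is a minimal sketch of the behaviour under discussion (hypothetical table and policy names; the outcomes noted in the comments are as described in this thread, not verified here):

```sql
CREATE TABLE t (id int, visible boolean);
ALTER TABLE t ENABLE ROW LEVEL SECURITY;
-- FOR SELECT policy: only rows with visible = true can be read
CREATE POLICY t_sel ON t FOR SELECT USING (visible);
-- FOR UPDATE policy: any visible row may be updated, to anything
CREATE POLICY t_upd ON t FOR UPDATE USING (true);

-- As a role subject to RLS:
UPDATE t SET visible = false;               -- no WHERE, no RETURNING: rows can "vanish"
UPDATE t SET visible = false WHERE id = 1;  -- with WHERE: new row is checked
                                            -- against the FOR SELECT policy
```

This is the inconsistency Laurenz points out: whether the FOR SELECT policy is applied to the updated rows depends on the shape of the UPDATE statement.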
Re: Parallel aggregates in PG 16.1
On Fri, 10 Nov 2023 at 11:47, ZIMANYI Esteban wrote: > > In MobilityDB > https://github.com/MobilityDB/MobilityDB > we have defined a tstzspan type which is a fixed-size equivalent of the > tstzrange type in PostgreSQL. > > We have a span_union aggregate function which is the equivalent of the > range_agg function in PostgreSQL defined as follows > > CREATE FUNCTION tstzspan_union_finalfn(internal) > RETURNS tstzspanset > AS 'MODULE_PATHNAME', 'Span_union_finalfn' > LANGUAGE C IMMUTABLE PARALLEL SAFE; > > CREATE AGGREGATE span_union(tstzspan) ( > SFUNC = array_agg_transfn, > STYPE = internal, > COMBINEFUNC = array_agg_combine, > SERIALFUNC = array_agg_serialize, > DESERIALFUNC = array_agg_deserialize, > FINALFUNC = tstzspan_union_finalfn > ); > > As can be seen, we reuse the array_agg function to accumulate the values in > an array and the final function just does similar work as the > range_agg_finalfn to merge the overlapping spans. Did you note the following section in the CREATE AGGREGATE documentation [0]? """ An aggregate can optionally support partial aggregation, as described in Section 38.12.4. This requires specifying the COMBINEFUNC parameter. If the state_data_type is internal, it's usually also appropriate to provide the SERIALFUNC and DESERIALFUNC parameters so that parallel aggregation is possible. Note that the aggregate must also be marked PARALLEL SAFE to enable parallel aggregation. """ >From this, it seems like the PARALLEL = SAFE argument is missing from your aggregate definition as provided above. Kind regards, Matthias van de Meent Neon (https://neon.tech) [0] https://www.postgresql.org/docs/16/sql-createaggregate.html
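If it helps, this is what the aggregate definition from the original mail would look like with the missing marking added (a sketch based on the documentation quoted above; the MobilityDB function names are taken as-is from the original post and untested here):

```sql
CREATE AGGREGATE span_union(tstzspan) (
  SFUNC = array_agg_transfn,
  STYPE = internal,
  COMBINEFUNC = array_agg_combine,
  SERIALFUNC = array_agg_serialize,
  DESERIALFUNC = array_agg_deserialize,
  FINALFUNC = tstzspan_union_finalfn,
  PARALLEL = SAFE
);
```

Without PARALLEL = SAFE the planner treats the aggregate as parallel-restricted, which is consistent with the serial plan shown in the original post.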
Re: Add new option 'all' to pg_stat_reset_shared()
On 2023-11-10 13:18, Andres Freund wrote: Hi, On November 8, 2023 11:28:08 PM PST, Michael Paquier wrote: On Thu, Nov 09, 2023 at 01:50:34PM +0900, torikoshia wrote: PGSTAT_KIND_SLRU cannot be reset by pg_stat_reset_shared(), so I feel uncomfortable to delete it all together. It might be better after pg_stat_reset_shared() has been modified to take 'slru' as an argument, though. Not sure how to feel about that, TBH, but I would not include SLRUs here if we have already a separate function. I thought it would be better to reset statistics even when null is specified so that users are not confused with the behavior of pg_stat_reset_slru(). Attached patch added pg_stat_reset_shared() in system_functions.sql mainly for this reason. I'm OK with doing what your patch does, aka do the work if the value is NULL or if there's no argument given. -Resets some cluster-wide statistics counters to zero, depending on the +Resets cluster-wide statistics counters to zero, depending on the This does not need to change, aka SLRUs are for example still global and not included here. +If the argument is NULL or not specified, all counters shown in all +of these views are reset. Missing a markup around NULL. I know, we're not consistent about that, either, but if we are tweaking the area let's be right at least. Perhaps "all the counters from the views listed above are reset"? I see no reason to not include slrus. We should never have added the ability to reset them individually, particularly not without a use case - I couldn't find one skimming some discussion. And what's the point in not allowing to reset them via pg_stat_reset_shared()? When including SLRUs, do you think it's better to add 'slrus' argument which enables pg_stat_reset_shared() to reset all SLRUs? As described above, since SLRUs cannot be reset by pg_stat_reset_shared(), I feel a bit uncomfortable to delete it all together. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
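For reference, a sketch of the resulting call patterns (the NULL and no-argument forms reflect the behaviour proposed by the patch under discussion, not current PostgreSQL):

```sql
SELECT pg_stat_reset_shared('bgwriter');   -- existing: reset one set of shared counters
SELECT pg_stat_reset_shared(NULL);         -- proposed: reset all shared statistics counters
SELECT pg_stat_reset_shared();             -- proposed: same as passing NULL
```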
Re: trying again to get incremental backup
On Tue, Nov 7, 2023 at 2:06 AM Robert Haas wrote:
>
> On Mon, Oct 30, 2023 at 2:46 PM Andres Freund wrote:
> > After playing with this for a while, I don't see a reason for wal_summarize_mb
> > from a memory usage POV at least.
>
> Here's v8. Changes:

Review comments, based on what I reviewed so far.

- I think 0001 looks like a good improvement irrespective of the patch series.

- review 0003

1.
+ be enabled either on a primary or on a standby. WAL summarization can
+ cannot be enabled when wal_level is set to
+ minimal.

Grammatical error: "WAL summarization can cannot" -> WAL summarization cannot

2.
+ wal_summarize_keep_time (boolean)
+ wal_summarize_keep_time configuration parameter

I feel the name of the GUC should be either wal_summarizer_keep_time or wal_summaries_keep_time, I mean either we should refer to the summarizer process or to the WAL summaries files.

3.
+XLogGetOldestSegno(TimeLineID tli)
+{
+
+ /* Ignore files that are not XLOG segments */
+ if (!IsXLogFileName(xlde->d_name))
+ continue;
+
+ /* Parse filename to get TLI and segno. */
+ XLogFromFileName(xlde->d_name, &file_tli, &file_segno,
+ wal_segment_size);
+
+ /* Ignore anything that's not from the TLI of interest. */
+ if (tli != file_tli)
+ continue;
+
+ /* If it's the oldest so far, update oldest_segno. */

Some of the single-line comments end with a full stop whereas others do not, so better to be consistent.

4.
+ * If start_lsn != InvalidXLogRecPtr, only summaries that end before the
+ * indicated LSN will be included.
+ *
+ * If end_lsn != InvalidXLogRecPtr, only summaries that start before the
+ * indicated LSN will be included.
+ *
+ * The intent is that you can call GetWalSummaries(tli, start_lsn, end_lsn)
+ * to get all WAL summaries on the indicated timeline that overlap the
+ * specified LSN range.
+ */
+List *
+GetWalSummaries(TimeLineID tli, XLogRecPtr start_lsn, XLogRecPtr end_lsn)

Instead of "If start_lsn != InvalidXLogRecPtr, only summaries that end before the" it should be "If start_lsn != InvalidXLogRecPtr, only summaries that end after the", because only if the summary files end after the start_lsn will they have some overlap and need to be returned; if they end before start_lsn then those files are not overlapping at all, right?

5. In the FilterWalSummaries() header the comment is wrong in the same way as for the GetWalSummaries() function.

6.
+ * If the whole range of LSNs is covered, returns true, otherwise false.
+ * If false is returned, *missing_lsn is set either to InvalidXLogRecPtr
+ * if there are no WAL summary files in the input list, or to the first LSN
+ * in the range that is not covered by a WAL summary file in the input list.
+ */
+bool
+WalSummariesAreComplete(List *wslist, XLogRecPtr start_lsn,

I did not see the usage of this function, but I think if the whole range is not covered why not keep the behavior uniform w.r.t. what we set for '*missing_lsn', I mean suppose there is no file then missing_lsn is the start_lsn because the very first LSN is missing.

7.
+ nbytes = FileRead(io->file, data, length, io->filepos,
+ WAIT_EVENT_WAL_SUMMARY_READ);
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(io->file;

/could not write file/ could not read file

8.
+/*
+ * Comparator to sort a List of WalSummaryFile objects by start_lsn.
+ */
+static int
+ListComparatorForWalSummaryFiles(const ListCell *a, const ListCell *b)
+{

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Synchronizing slots from primary to standby
Hi,

On 11/10/23 4:31 AM, shveta malik wrote:

On Thu, Nov 9, 2023 at 9:15 PM Drouvot, Bertrand wrote:

Yeah I think so, because there is a time window when one could "use" the slot after the promotion and before it is removed. Producing things like:

"
2023-11-09 15:16:50.294 UTC [2580462] LOG: dropped replication slot "logical_slot2" of dbid 5 as it was not sync-ready
2023-11-09 15:16:50.295 UTC [2580462] LOG: dropped replication slot "logical_slot3" of dbid 5 as it was not sync-ready
2023-11-09 15:16:50.297 UTC [2580462] LOG: dropped replication slot "logical_slot4" of dbid 5 as it was not sync-ready
2023-11-09 15:16:50.297 UTC [2580462] ERROR: replication slot "logical_slot5" is active for PID 2594628
"

After the promotion one was able to use logical_slot5 and now we can now drop it.

Yes, I was suspicious about this small window which may allow others to use this slot, that is why I was thinking of putting it in the promotion flow and thus asked that question earlier. But the slot-sync worker may end up creating it again in case it has not exited.

Sorry, there is a typo up-thread, I meant "After the promotion one was able to use logical_slot5 and now we can NOT drop it.". We can not drop it because it is in use.

So we need to carefully decide at what all places we need to put 'not-in recovery' checks in slot-sync workers. In the previous version, synchronize_one_slot() had that check and it was skipping sync if '!RecoveryInProgress'. But I have removed that check in v32 thinking that the slots which the worker has already fetched from the primary, let them all get synced and exit after that instead of syncing half and leaving the rest. But now on rethinking, was the previous behaviour correct i.e. skip sync at that point onward where we see it is no longer in standby-mode while a few of the slots have already been synced in that sync-cycle. Thoughts?

I think we still need to think/discuss the promotion flow.
I think we would need to have the slot sync worker shutdown during the promotion (as suggested by Amit in [1]) but before that let the sync slot worker knows it is now acting during promotion. Something like: - let the sync worker know it is now acting under promotion - do what needs to be done while acting under promotion - shutdown the sync worker That way we would avoid any "risk" of having the sync worker doing something we don't expect while not in recovery anymore. Regarding "do what needs to be done while acting under promotion": - Ensure all slots in 'r' state are synced - drop slots that are in 'i' state Thoughts? [1]: https://www.postgresql.org/message-id/CAA4eK1J2Pc%3D5TOgty5u4bp--y7ZHaQx3_2eWPL%3DVPJ7A_0JF2g%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Parallel aggregates in PG 16.1
In MobilityDB
https://github.com/MobilityDB/MobilityDB
we have defined a tstzspan type which is a fixed-size equivalent of the tstzrange type in PostgreSQL.

We have a span_union aggregate function which is the equivalent of the range_agg function in PostgreSQL defined as follows

CREATE FUNCTION tstzspan_union_finalfn(internal)
  RETURNS tstzspanset
  AS 'MODULE_PATHNAME', 'Span_union_finalfn'
  LANGUAGE C IMMUTABLE PARALLEL SAFE;

CREATE AGGREGATE span_union(tstzspan) (
  SFUNC = array_agg_transfn,
  STYPE = internal,
  COMBINEFUNC = array_agg_combine,
  SERIALFUNC = array_agg_serialize,
  DESERIALFUNC = array_agg_deserialize,
  FINALFUNC = tstzspan_union_finalfn
);

As can be seen, we reuse the array_agg function to accumulate the values in an array and the final function just does similar work as the range_agg_finalfn to merge the overlapping spans.

I am testing the parallel aggregate features of PG 16.1

test=# select version();
                                                   version
---
 PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit

I create a table with 1M random spans and another one with the same data converted to tstzrange

CREATE TABLE tbl_tstzspan_1M AS
SELECT k, random_tstzspan('2001-01-01', '2002-12-31', 10) AS t
FROM generate_series(1, 1e6) AS k;

CREATE TABLE tbl_tstzrange_1M AS
SELECT k, t::tstzrange FROM tbl_tstzspan_1M;

test=# analyze;
ANALYZE
test=#

The tstzrange DOES NOT support parallel aggregates

test=# EXPLAIN SELECT k%10, range_agg(t) AS t
FROM tbl_tstzrange_1M group by k%10 order by k%10;
                                          QUERY PLAN
-
 GroupAggregate  (cost=66706.17..203172.65 rows=100 width=64)
   Group Key: ((k % '10'::numeric))
   ->  Gather Merge  (cost=66706.17..183172.65 rows=100 width=54)
         Workers Planned: 2
         ->  Sort  (cost=65706.15..66747.81 rows=416667 width=54)
               Sort Key: ((k % '10'::numeric))
               ->  Parallel Seq Scan on tbl_tstzrange_1m  (cost=0.00..12568.33 rows=416667 width=54)
(7 rows)

The array_agg function supports parallel aggregates

test=# EXPLAIN SELECT k%10, array_agg(t) AS t
FROM tbl_tstzspan_1M group by k%10 order by k%10;
                                             QUERY PLAN
--
 Finalize GroupAggregate  (cost=66706.17..193518.60 rows=100 width=64)
   Group Key: ((k % '10'::numeric))
   ->  Gather Merge  (cost=66706.17..172268.60 rows=84 width=64)
         Workers Planned: 2
         ->  Partial GroupAggregate  (cost=65706.15..75081.15 rows=416667 width=64)
               Group Key: ((k % '10'::numeric))
               ->  Sort  (cost=65706.15..66747.81 rows=416667 width=56)
                     Sort Key: ((k % '10'::numeric))
                     ->  Parallel Seq Scan on tbl_tstzspan_1m  (cost=0.00..12568.33 rows=416667 width=56)
(9 rows)

We are not able to make span_union aggregate support parallel aggregates

test=# EXPLAIN SELECT k%10, span_union(t) AS t
FROM tbl_tstzspan_1M group by k%10 order by k%10;
                                     QUERY PLAN
--
 GroupAggregate  (cost=187879.84..210379.84 rows=100 width=64)
   Group Key: ((k % '10'::numeric))
   ->  Sort  (cost=187879.84..190379.84 rows=100 width=56)
         Sort Key: ((k % '10'::numeric))
         ->  Seq Scan on tbl_tstzspan_1m  (cost=0.00..19860.00 rows=100 width=56)

Any suggestion?

Thanks

Esteban
Re: A recent message added to pg_upgade
On Fri, Nov 10, 2023 at 7:50 AM Michael Paquier wrote: > > On Thu, Nov 09, 2023 at 04:52:32PM +0900, Michael Paquier wrote: > > Thanks! > > Also, please see also a patch about switching the logirep launcher to > rely on IsBinaryUpgrade to prevent its startup. Any thoughts about > that? > Preventing these + * processes from starting while upgrading avoids any activity on the new + * cluster before the physical files are put in place, which could cause + * corruption on the new cluster upgrading to. I don't think this comment is correct because there won't be any apply activity on the new cluster as after restoration subscriptions should be disabled. On the old cluster, I think one problem is that the origins may move forward after we copy them which can cause data inconsistency issues. The other is that we may not prefer to generate additional data and WAL during the upgrade. Also, I am not completely sure about using the word 'corruption' in this context. -- With Regards, Amit Kapila.
Re: Add the ability to limit the amount of memory that can be allocated to backends.
hi.

+static void checkAllocations();
should be "static void checkAllocations(void);" ?

PgStatShared_Memtrack there is a lock, but seems not initialized, and not used. Can you expand on it?

So in view pg_stat_global_memory_tracking, column "total_memory_reserved" is a point of time, total memory the whole server reserved/malloced? will it change every time you call it?

the function pg_stat_get_global_memory_tracking provolatile => 's'. should be a VOLATILE function?

pg_stat_get_memory_reservation, pg_stat_get_global_memory_tracking should be proretset => 'f'.

+{ oid => '9891',
+  descr => 'statistics: memory utilized by current backend',
+  proname => 'pg_get_backend_memory_allocation', prorows => '1', proisstrict => 'f',
+  proretset => 't', provolatile => 's', proparallel => 'r',

you declared
+void pgstat_backend_memory_reservation_cb(void);
but seems there is no definition.

this part is unnecessary since you already declared src/include/catalog/pg_proc.dat?
+/* SQL Callable functions */
+extern Datum pg_stat_get_memory_reservation(PG_FUNCTION_ARGS);
+extern Datum pg_get_backend_memory_allocation(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_global_memory_tracking(PG_FUNCTION_ARGS);

The last sentence is just a plain link, no explanation. something is missing?

+ Reports how much memory remains available to the server. If a
+ backend process attempts to allocate more memory than remains,
+ the process will fail with an out of memory error, resulting in
+ cancellation of the process's active query/transaction.
+ If memory is not being limited (ie. max_total_memory is zero or not set),
+ this column returns NULL.
+ .

+ static_shared_memory bigint
+ Reports how much static shared memory (non-DSM shared memory) is being used by
+ the server. Static shared memory is configured by the postmaster at
+ at server startup.
+ .
Re: Bug: RLS policy FOR SELECT is used to check new rows
On Thu, 9 Nov 2023 at 18:55, Laurenz Albe wrote: > > I think it can be useful to allow a user an UPDATE where the result > does not satisfy the USING clause of the FOR SELECT policy. > > The idea that an UPDATE should only produce rows you can SELECT is not > true today: if you run an UPDATE without a WHERE clause, you can > create rows you cannot see. The restriction is only on UPDATEs with > a WHERE clause. Weird, isn't it? > That's true, but only if the UPDATE also doesn't have a RETURNING clause. What I find weird about your proposal is that it would allow an UPDATE ... RETURNING command to return something that would be visible just that once, but then subsequently disappear. That seems like a cure that's worse than the original disease that kicked off this discussion. As mentioned by others, the intention was that RLS behave like WITH CHECK OPTION on an updatable view, so that new rows can't just disappear. There are, however, 2 differences between the way it currently works for RLS, and an updatable view: 1). RLS only does this for UPDATE commands. INSERT commands *can* insert new rows that aren't visible, and so disappear. 2). It can't be turned off. The WITH CHECK OPTION on an updatable view is an option that the user can choose to turn on or off. That's not possible with RLS. In a green field, I would say that it would be better to fix (1), so that INSERT and UPDATE are consistent. However, I fear that it may be too late for that, because any such change would risk breaking existing RLS policy setups in subtle ways. It might be possible to change (2) though, by adding a new table-level option (similar to a view's WITH CHECK OPTION) that enabled or disabled the checking of new rows for that table, and whose default matched the current behaviour. Before going too far down that route though, it is perhaps worth asking whether this is something users really want. Is there a real use-case for being able to UPDATE rows and have them disappear? Regards, Dean
Re: POC, WIP: OR-clause support for indexes
On 06.11.2023 16:51, Alena Rybakina wrote:

I also support this approach. I have almost finished writing a patch that fixes the first problem related to the quadratic complexity of processing expressions by adding a hash table. I also added a check: if the number of groups is equal to the number of OR expressions, we assume that no expressions need to be converted and interrupt further execution.

Now I am trying to fix the last problem in this patch: three tests have indicated a problem related to incorrect conversion. I don't think it can be serious, but I haven't figured out where the mistake is yet. I added log like that: ERROR: unrecognized node type: 0.

I fixed this issue and added some cosmetic refactoring. The changes are presented in the or_patch_changes.diff file.

--
Regards,
Alena Rybakina
Postgres Professional

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 25a4235dbd9..46212a77c64 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -44,7 +44,7 @@

 /* GUC parameters */
 bool		Transform_null_equals = false;
-bool		or_transform_limit = false;
+bool		enable_or_transformation = false;

 static Node *transformExprRecurse(ParseState *pstate, Node *expr);
@@ -108,7 +108,7 @@ typedef struct OrClauseGroupEntry
 	List	   *consts;
 	Oid			scalar_type;
 	Oid			opno;
-	Expr	   *expr;
+	Node	   *expr;
 } OrClauseGroupEntry;

 static int
@@ -189,16 +189,16 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
 							HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);

 	/* If this is not an 'OR' expression, skip the transformation */
-	if (expr_orig->boolop != OR_EXPR || !or_transform_limit || len_ors == 1 || !or_group_htab)
+	if (expr_orig->boolop != OR_EXPR || !enable_or_transformation || len_ors == 1 || !or_group_htab)
 		return transformBoolExpr(pstate, (BoolExpr *) expr_orig);

 	foreach(lc, expr_orig->args)
 	{
 		Node	   *arg = lfirst(lc);
-		Node	   *orqual;
-		Node	   *const_expr;
-		Node	   *nconst_expr;
-		OrClauseGroupEntry *gentry;
+		Node	   *orqual = NULL;
+		Node	   *const_expr = NULL;
+		Node	   *nconst_expr = NULL;
+		OrClauseGroupEntry *gentry = NULL;
 		bool		found;
 		char	   *str;
@@ -270,7 +270,7 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
 			gentry = hash_search(or_group_htab, &str, HASH_ENTER, &found);
 			gentry->node = nconst_expr;
 			gentry->consts = list_make1(const_expr);
-			gentry->expr = (Expr *) orqual;
+			gentry->expr = orqual;
 			gentry->hash_leftvar_key = str;
 		}
@@ -283,7 +283,6 @@
 		 * transformed bool expression.
 		 */
 		hash_destroy(or_group_htab);
-		list_free(or_list);
 		return (Node *) makeBoolExpr(OR_EXPR, or_list, expr_orig->location);
 	}
 	else
@@ -345,9 +344,9 @@
 			 * OK: coerce all the right-hand non-Var inputs to the common
 			 * type and build an ArrayExpr for them.
 			 */
-			List	   *aexprs;
-			ArrayExpr  *newa;
-			ScalarArrayOpExpr *saopexpr;
+			List	   *aexprs = NIL;
+			ArrayExpr  *newa = NULL;
+			ScalarArrayOpExpr *saopexpr = NULL;
 			ListCell   *l;

 			aexprs = NIL;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 54fd09abde7..3411f023df8 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1049,12 +1049,12 @@ struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
-		{"or_transform_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+		{"enable_or_transformation", PGC_USERSET, QUERY_TUNING_OTHER,
			gettext_noop("Transform a sequence of OR clauses to an IN expression."),
			gettext_noop("The planner will replace clauses like 'x=c1 OR x=c2 .."
						 "to the clause 'x IN (c1,c2,...)'")
		},
-		&or_transform_limit,
+		&enable_or_transformation,
		false,
		NULL, NULL, NULL
	},
diff --git a/src/include/parser/parse_expr.h b/src/include/parser/parse_expr.h
index 7a6943c116c..3a87de02859 100644
--- a/src/include/parser/parse_expr.h
+++ b/src/include/parser/parse_expr.h
@@ -17,7 +17,7 @@

 /* GUC parameters */
 extern PGDLLIMPORT bool Transform_null_equals;
-extern PGDLLIMPORT bool or_transform_limit;
+extern PGDLLIMPORT bool enable_or_transformation;

 extern Node *transformExpr(ParseState *pstate, Node *expr,
						   ParseExprKind exprKind);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 29c2bc6a2b2..dbc8bc3bed0 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1883,7 +1883,7 @@ SELECT count(*) FROM tenk1
 10
(1 row)

-SET or_transform_limit = on;
+SET enable_or_transformation = on;
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
@@ -1997,7 +1997,7 @@ SELECT count(*) FROM tenk1
Re: ResourceOwner refactoring
Hello Heikki,

09.11.2023 02:48, Heikki Linnakangas wrote:

Thanks for the testing! Fixed. ...

Thank you for the fix! Please look at one more failure caused by the new implementation of ResourceOwners:

numdbs=80
for ((i=1;i<=10;i++)); do
echo "ITERATION $i"
for ((d=1;d<=$numdbs;d++)); do createdb db$d; done
for ((d=1;d<=$numdbs;d++)); do
echo "
create table t(t1 text);
drop table t;
" | psql -d db$d >psql-$d.log 2>&1 &
done
wait
grep 'PANIC' server.log && break;
for ((d=1;d<=$numdbs;d++)); do dropdb db$d; done
grep 'PANIC' server.log && break;
done

I could see two failure modes:

2023-11-10 08:42:28.870 UTC [1163274] ERROR:  ResourceOwnerEnlarge called after release started
2023-11-10 08:42:28.870 UTC [1163274] STATEMENT:  drop table t;
2023-11-10 08:42:28.870 UTC [1163274] WARNING:  AbortTransaction while in COMMIT state
2023-11-10 08:42:28.870 UTC [1163274] PANIC:  cannot abort transaction 906, it was already committed

2023-11-10 08:43:27.897 UTC [1164148] ERROR:  ResourceOwnerEnlarge called after release started
2023-11-10 08:43:27.897 UTC [1164148] STATEMENT:  DROP DATABASE db69;
2023-11-10 08:43:27.897 UTC [1164148] WARNING:  AbortTransaction while in COMMIT state
2023-11-10 08:43:27.897 UTC [1164148] PANIC:  cannot abort transaction 1043, it was already committed

The stack trace for the second ERROR (ResourceOwnerEnlarge called ...) is:
...
#6  0x558af5b2f35c in ResourceOwnerEnlarge (owner=0x558af716f3c8) at resowner.c:455
#7  0x558af5888f18 in dsm_create_descriptor () at dsm.c:1207
#8  0x558af5889205 in dsm_attach (h=3172038420) at dsm.c:697
#9  0x558af5b1ebed in get_segment_by_index (area=0x558af711da18, index=2) at dsa.c:1764
#10 0x558af5b1ea4b in dsa_get_address (area=0x558af711da18, dp=2199023329568) at dsa.c:970
#11 0x558af5669366 in dshash_seq_next (status=0x7ffdd5912fd0) at dshash.c:687
#12 0x558af5901998 in pgstat_drop_database_and_contents (dboid=16444) at pgstat_shmem.c:830
#13 0x558af59016f0 in pgstat_drop_entry (kind=PGSTAT_KIND_DATABASE, dboid=16444, objoid=0) at pgstat_shmem.c:888
#14 0x558af59044eb in AtEOXact_PgStat_DroppedStats (xact_state=0x558af7111ee0, isCommit=true) at pgstat_xact.c:88
#15 0x558af59043c7 in AtEOXact_PgStat (isCommit=true, parallel=false) at pgstat_xact.c:55
#16 0x558af53c782e in CommitTransaction () at xact.c:2371
#17 0x558af53c709e in CommitTransactionCommand () at xact.c:306
...

Best regards,
Alexander
Re: Move bki file pre-processing from initdb to bootstrap
On 17.10.23 03:32, Krishnakumar R wrote:
>> The version comparison has been moved from initdb to bootstrap. This
>> created some compatibility problems with windows tests. For now I kept
>> the version check to not have \n added, which worked fine and serves
>> the purpose. However hoping to have something better in v3 in addition
>> to addressing any other comments.
>
> With help from Thomas, figured out that on windows fopen uses binary
> mode in the backend which causes issues with EOL. Please find the
> attached patch updated with a fix for this.

I suggest that this patch set be split up into three incremental parts:

1. Move some build-time settings from initdb to postgres.bki.
2. The database locale handling.
3. The bki file handling.

Each of these topics really needs a separate detailed consideration.
Re: Move bki file pre-processing from initdb to bootstrap
On 06.10.23 02:24, Krishnakumar R wrote:
>>>          elog(INFO, "Open bki file %s\n", bki_file);
>>> +        boot_yyin = fopen(bki_file, "r");
>>
>> Why is this needed?  It already reads the bki file from stdin?
>
> We no longer open the bki file in initdb and pass to postgres to parse
> from stdin, instead we open the bki file directly in bootstrap and pass
> the file stream to the parser. Hence the need to switch the yyin. Have
> added a comment in the commit logs to capture this.

Why this change?  I mean, there is nothing wrong with it, but I don't
follow how changing from reading from stdin to reading from a named file
is related to moving the parameter substitution from initdb to the
backend.

One effect of this is that we would now have two different ways initdb
interacts with the backend.  In bootstrap mode, it reads from a named
file, and the second run (the one that loads the system views etc.)
reads from stdin.  It's already confusing enough, so any further
divergence should be adequately explained.
Re: Synchronizing slots from primary to standby
Hi,

On 11/10/23 8:55 AM, Amit Kapila wrote:
> On Fri, Nov 10, 2023 at 12:50 PM Drouvot, Bertrand wrote:
>>
>> But even if we ERROR out instead of emitting a WARNING, the user would
>> still need to be notified/monitor such errors.
>
> I agree that then probably they will come to know earlier because the
> slot sync mechanism would be stopped but still
>
>> it is not "guaranteed" (specially if there is no others "working"
>> synced slots around.) And if they do not, then there is still a risk
>> to use this slot after a failover thinking this is a "synced" slot.
>
> I think this is another reason that probably giving ERROR has better
> chances for the user to notice before failover. IF knowing such errors
> user still proceeds with the failover, the onus is on her.

Agree. My concern is more when they don't know about the error.

> We can probably document this hazard along with the failover feature so
> that users are aware that they either need to be careful while creating
> slots on standby or consult ERROR logs. I guess we can even make it
> visible in the view also.

Yeah.

>> Giving more thoughts, what about using a dedicated/reserved naming
>> convention for synced slots like synced_<slot_name> or such and then:
>>
>> - prevent user to create sync_ slots on standby
>> - sync <slot_name> on primary to sync_<slot_name> on standby
>> - during failover, rename sync_<slot_name> to <slot_name> and if
>>   <slot_name> exists then emit a WARNING and keep sync_<slot_name> in
>>   place.
>>
>> That way both slots are still in place (the manually created
>> <slot_name> and the sync_<slot_name>).
>
> Hmm, I think after failover, users need to rename all slots or we need
> to provide a way to rename them so that they can be used by subscribers
> which sounds like much more work.

Agree that's much more work for the subscriber case. Maybe that's not
worth the extra work.

> Also, the current coding doesn't ensure we will always give WARNING. If
> we see the below code that deals with this WARNING,
>
> +	/* User created slot with the same name exists, emit WARNING. */
> +	else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE)
> +	{
> +		ereport(WARNING,
> +				errmsg("not synchronizing slot %s; it is a user created slot",
> +					   remote_slot->name));
> +	}
> +	/* Otherwise create the slot first. */
> +	else
> +	{
> +		TransactionId xmin_horizon = InvalidTransactionId;
> +		ReplicationSlot *slot;
> +
> +		ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL,
> +							  remote_slot->two_phase, false);
>
> I think this is not a solid check to ensure that the slot existed
> before. Because it could be created as soon as the slot sync worker
> invokes ReplicationSlotCreate() here.

Agree.

> So, having a concrete check to give WARNING would require some more
> logic which I don't think is a good idea to handle this boundary case.

Yeah good point, agree to just error out in all the cases then (if we
discard the sync_ reserved wording proposal, which seems to be the case
as probably not worth the extra work).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
RE: MinGW compiler warnings in ecpg tests
Dear Peter, Michael,

Sorry for reviving the old thread. While trying to build postgres on
msys2 with meson, I faced the same warning. The OS is Windows 10.

```
$ ninja
[2378/2402] Compiling C object src/interfaces/ecpg/test/sql/sqlda.exe.p/meson-generated_.._sqlda.c.obj
../postgres/src/interfaces/ecpg/test/sql/sqlda.pgc: In function 'dump_sqlda':
../postgres/src/interfaces/ecpg/test/sql/sqlda.pgc:45:33: warning: format '%d' expects argument of type 'int', but argument 3 has type 'long long int' [-Wformat=]
   45 |            "name sqlda descriptor: '%s' value %I64d\n",
      |                                 ^~~
..
   49 |            sqlda->sqlvar[i].sqlname.data, *(long long int *)sqlda->sqlvar[i].sqldata);
      |            ~~
      |            |
      |            long long int
```

Before building, I did the steps below:

1. Installed the required software listed in [1].
2. Ran `meson setup -Dcassert=true -Ddebug=true /path/to/builddir`.
3. Moved to /path/to/builddir.
4. Ran `ninja`.
5. Got the above warning.

The attached file summarizes the result of the meson command, which was
output at the end of it. Also, below are the versions of meson/ninja.

```
$ ninja --version
1.11.1
$ meson -v
1.2.3
```

I am not quite sure about the Windows build, but I could see that the
gcc compiler was used here. Does it mean that the compiler might not
like the format string "%I64d"? I modified the code as below and it
compiled without warnings.
```
--- a/src/interfaces/ecpg/test/sql/sqlda.pgc
+++ b/src/interfaces/ecpg/test/sql/sqlda.pgc
@@ -41,7 +41,7 @@ dump_sqlda(sqlda_t *sqlda)
 				break;
 			case ECPGt_long_long:
 				printf(
-#ifdef _WIN32
+#if !defined(__GNUC__)
 					   "name sqlda descriptor: '%s' value %I64d\n",
 #else
 					   "name sqlda descriptor: '%s' value %lld\n",
```

[1]: https://www.postgresql.org/message-id/9f4f22be-f9f1-b350-bc06-521226b87f7a%40dunslane.net

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

[attached meson summary]

postgresql 17devel

  Data layout
    data block size : 8 kB
    WAL block size  : 8 kB
    segment size    : 1 GB

  System
    host system  : windows x86_64
    build system : windows x86_64

  Compiler
    linker     : ld.bfd
    C compiler : gcc 13.2.0

  Compiler Flags
    CPP FLAGS              :
    C FLAGS, functional    : -fno-strict-aliasing -fwrapv -fexcess-precision=standard
    C FLAGS, warnings      : -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels
                             -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
                             -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement
                             -Wno-format-truncation -Wno-stringop-truncation
    C FLAGS, modules       : -fvisibility=hidden
    C FLAGS, user specified:
    LD FLAGS               : -Wl,--stack,4194304 -Wl,--allow-multiple-definition -Wl,--disable-auto-import

  Programs
    bison  : C:\msys64\usr\bin/bison.EXE 3.8.2
    dtrace : NO
    flex   : C:\msys64\usr\bin/flex.EXE 2.6.4

  External libraries
    bonjour  : NO
    bsd_auth : NO
    docs     : YES
    docs_pdf : NO
    gss      : NO
    icu      : NO
    ldap     : YES
    libxml   : NO
    libxslt  : NO
    llvm     : NO
    lz4      : NO
    nls      : YES
    openssl  : YES 3.1.4
    pam      : NO
    plperl   : NO
    plpython : NO
    pltcl    : YES 8.6.12
    readline : YES
    selinux  : NO
    systemd  : NO
    uuid     : NO
    zlib     : YES 1.3
    zstd     : YES 1.5.5

  User defined options
    cassert : true
    debug   : true

Found ninja-1.11.1 at C:\msys64\mingw64\bin/ninja.EXE