Re: Postgresql OOM
Hi, On 2024-06-06 13:58:24 +0100, Pantelis Theodosiou wrote: > I am not qualified to answer on the OOM issue but why are you joining the > same table (outputrequest) 4 times (using an identical join condition)? The conditions aren't actually the same: rpl_rec_tro.input_sequence = r.input_sequence, rpl_snd_tro.reply_input_sequence = r.input_sequence, snd_tro.reply_input_sequence = t.input_sequence. The first two join r.input_sequence to different columns; the third one also uses reply_input_sequence but joins to t, not r. Greetings, Andres Freund
Re: Postgresql OOM
Hi, On 2024-06-06 15:25:25 +0300, Radu Radutiu wrote: > I have a query that forces an out of memory error, where the OS will kill > the postgresql process. FWIW, it can be useful to configure the OS with strict memory overcommit. That causes postgres to fail more gracefully, because the OOM killer won't be invoked. > The query plan (run immediately after a vacuum analyze) is at > https://explain.depesz.com/s/ITQI#html . Can you get EXPLAIN (ANALYZE, BUFFERS) to complete if you reduce the number of workers? It'd be useful to get some of the information about the actual numbers of tuples etc. Greetings, Andres Freund
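For readers who want to try the strict-overcommit suggestion, it is a Linux sysctl; a minimal sketch using the standard kernel knobs (the specific values are not something specified in this thread):

```shell
# vm.overcommit_memory=2 makes the kernel refuse allocations it cannot
# back, so postgres sees an allocation failure (which it handles
# gracefully) instead of being killed by the OOM killer:
sysctl -w vm.overcommit_memory=2

# Optionally tune how much memory may be committed, as a percentage of
# RAM plus swap (the kernel default ratio is 50):
sysctl -w vm.overcommit_ratio=80

# Persist across reboots (the file name is arbitrary):
echo 'vm.overcommit_memory = 2' > /etc/sysctl.d/99-overcommit.conf
```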
Re: XACT_EVENT for 'commit prepared'
Hi, On 2024-06-07 11:19:40 -0400, Tom Lane wrote: > Xiaoran Wang writes: > > I found that in enum XactEvent, there is 'XACT_EVENT_PREPARE' for > > 'prepare transaction', but there is no event for 'commit prepared' or > > 'rollback prepared'. > > On the whole, it seems like a good idea to me that those commands > don't invoke event triggers. It is a core principle of 2PC that > if 'prepare' succeeded, 'commit prepared' must not fail. Invoking a > trigger during the second step would add failure cases and I'm not > sure what value it has. Event triggers? Isn't this about RegisterXactCallback? XACT_EVENT_COMMIT is called after the commit record has been flushed and the procarray has been modified. Thus a failure in the hook has somewhat limited consequences. I'd assume XACT_EVENT_COMMIT_PREPARED would do something similar. I suspect the reason we don't issue a callback for 2PC commit/rollback prepared is simple: The code for doing a 2PC commit prepared lives in twophase.c, not xact.c... Greetings, Andres Freund
Re: PgStat_KindInfo.named_on_disk not required in shared stats
Hi, On 2024-06-07 14:07:33 +0900, Michael Paquier wrote: > While hacking on the area of pgstat_*.c, I have noticed the existence > of named_on_disk in PgStat_KindInfo, that is here to track the fact > that replication slots are a particular case in the PgStat_HashKey for > the dshash table of the stats because this kind of stats requires a > mapping between the replication slot name and the hash key. > > As far as I can see, this field is not required and is used nowhere, > because the code relies on the existence of the to_serialized_name and > from_serialized_name callbacks to do the mapping. > > Wouldn't it make sense to remove it? This field is defined since > 5891c7a8ed8f that introduced the shmem stats, and has never been used > since. Yes, makes sense. Looks like we changed direction during development a bunch of times... > This frees an extra bit in PgStat_KindInfo, which is going to help me > a bit with what I'm doing with this area of the code while keeping the > structure size the same. Note it's just a single bit, not a full byte. So unless you need precisely 30 bits, rather than 29, I don't really see why it'd help? And I don't see a reason to strictly keep the structure size the same. Greetings, Andres Freund
Re: relfilenode statistics
Hi, On 2024-06-03 11:11:46 +, Bertrand Drouvot wrote: > The main argument is that we currently don’t have writes counters for > relations. > The reason is that we don’t have the relation OID when writing buffers out. > Tracking writes per relfilenode would allow us to track/consolidate writes per > relation (example in the v1 patch and in the message up-thread). > > I think that adding instrumentation in this area (writes counters) could be > beneficial (like it is for the ones we currently have for reads). > > Second argument is that this is also beneficial for the "Split index and > table statistics into different types of stats" thread (mentioned in the > previous > message). It would allow us to avoid additional branches in some situations > (like > the one mentioned by Andres in the link I provided up-thread). I think there's another *very* significant benefit: Right now physical replication doesn't populate statistics fields like n_dead_tup, which can be a huge issue after failovers, because there's little information about what autovacuum needs to do. Auto-analyze *partially* can fix it at times, if it's lucky enough to see enough dead tuples - but that's not a given and even if it works, is often wildly inaccurate. Once we put things like n_dead_tup into per-relfilenode stats, we can populate them during WAL replay. Thus after a promotion autovacuum has much better data. This also is important when we crash: We've been talking about storing a snapshot of the stats alongside each REDO pointer. Combined with updating stats during crash recovery, we'll have accurate dead-tuple stats once recovery has finished. Greetings, Andres Freund
Re: relfilenode statistics
Hi, On 2024-06-06 12:27:49 -0400, Robert Haas wrote: > On Wed, Jun 5, 2024 at 1:52 AM Bertrand Drouvot > wrote: > > I think we should keep the stats in the relation during relfilenode changes. > > As a POC, v1 implemented a way to do so during TRUNCATE (see the changes in > > table_relation_set_new_filelocator() and in pg_statio_all_tables): as you > > can > > see in the example provided up-thread the new heap_blks_written statistic > > has > > been preserved during the TRUNCATE. > > Yeah, I think there's something weird about this design. Somehow we're > ending up with both per-relation and per-relfilenode counters: > > + pg_stat_get_blocks_written(C.oid) + > pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN > C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END, > C.relfilenode) AS heap_blks_written, > > I'll defer to Andres if he thinks that's awesome, but to me it does > not seem right to track some blocks written in a per-relation counter > and others in a per-relfilenode counter. It doesn't immediately sound awesome. Nor really necessary? If we just want to keep prior stats upon a relation rewrite, we can just copy the stats from the old relfilenode. Or we can decide that those stats don't really make sense anymore, and start from scratch. I *guess* I could see an occasional benefit in having both counters for "prior relfilenodes" and "current relfilenode" - except that stats get reset manually and upon crash anyway, making this less useful than if it were really "lifetime" stats. Greetings, Andres Freund
Re: [multithreading] extension compatibility
Hi, On 2024-06-05 21:59:42 -0400, Robert Haas wrote: > On Wed, Jun 5, 2024 at 9:50 PM Andres Freund wrote: > > Depending on the architecture / ABI / compiler options it's often not > > meaningfully more expensive to access a thread local variable than a > > "normal" > > variable. > > > > I think these days it's e.g. more expensive on x86-64 windows, but not > > linux. On arm the overhead of TLS is more noticeable, across platforms, > > afaict. > > I mean, to me, this still sounds like we want multithreading to be a > build-time option. Maybe. I think shipping a mode where users can fairly simply toggle between threaded and process mode will allow us to get this stable a *lot* quicker than if we distribute two builds. Most users don't build from source, distros will have to pick the mode. If they don't choose threaded mode, we'll not find problems. If they do choose threaded mode, we can't ask users to switch to a process based mode to check if the problem is related. We have been talking in a bunch of threads about having a mode where the main postgres binary chooses a build optimized for the current architecture, to be able to use SIMD instructions without a runtime check/dispatch. I guess we could add threadedness to such a matrix... Greetings, Andres Freund
Re: [multithreading] extension compatibility
Hi, On 2024-06-05 21:10:01 -0400, Robert Haas wrote: > On Wed, Jun 5, 2024 at 8:01 PM Heikki Linnakangas wrote: > > I'm very much in favor of a runtime toggle. To be precise, a > > PGC_POSTMASTER setting. We'll get a lot more testing if you can easily > > turn it on/off, and so far I haven't seen anything that would require it > > to be a compile time option. > > I was thinking about global variable annotations. If someone wants to > build without multithreading, I think that they won't want to still > end up with a ton of variables being changed to thread-local. Depending on the architecture / ABI / compiler options it's often not meaningfully more expensive to access a thread local variable than a "normal" variable. I think these days it's e.g. more expensive on x86-64 windows, but not linux. On arm the overhead of TLS is more noticeable, across platforms, afaict. Example compiler output for x86-64 and armv8: https://godbolt.org/z/K369eG5MM Cycle analysis of the linux x86-64 output: https://godbolt.org/z/KK57vM1of This shows that for the linux x86-64 case there's no difference in efficiency between the tls/non-tls case. The reason it's so fast on x86-64 linux is that they reused one of the "old" segment registers to serve as the index register differing between each thread. For x86-64, most code is compiled position independent, and *also* uses an indexed mode (but relative to the instruction pointer). I think we might be able to gain some small performance benefits via the annotations, which actually might make it viable to just apply the annotations regardless of using threads or not. Greetings, Andres Freund
Re: Use the same Windows image on both VS and MinGW tasks
Hi, On 2023-08-29 15:18:29 +0300, Nazir Bilal Yavuz wrote: > The VS and MinGW Windows images are merged on Andres' pg-vm-images > repository now [1]. So, the old pg-ci-windows-ci-vs-2019 and > pg-ci-windows-ci-mingw64 images will not be updated from now on. This > new merged image (pg-ci-windows-ci) needs to be used on both VS and > MinGW tasks. > I attached a patch for using pg-ci-windows-ci Windows image on VS and > MinGW tasks. Thanks! Pushed to 15, 16 and master. Greetings, Andres Freund
Re: allow changing autovacuum_max_workers without restarting
Hi, On 2024-06-03 14:28:13 -0500, Nathan Bossart wrote: > On Mon, Jun 03, 2024 at 12:08:52PM -0700, Andres Freund wrote: > > Why do we think that increasing the number of PGPROC slots, heavyweight > > locks > > etc by 256 isn't going to cause issues? That's not an insubstantial amount > > of > > memory to dedicate to something that will practically never be used. > > I personally have not observed problems with these kinds of bumps in > resource usage, although I may be biased towards larger systems where it > doesn't matter as much. IME it matters *more* on larger systems. Or at least used to, I haven't experimented with this in quite a while. It's possible that we improved a bunch of things sufficiently for this to not matter anymore. Greetings, Andres Freund
Re: Fix an incorrect assertion condition in mdwritev().
Hi, On 2024-06-04 07:17:51 +0900, Michael Paquier wrote: > On Sun, May 26, 2024 at 12:08:46AM -0400, Tom Lane wrote: > > After a few minutes' thought, how about: > > > > Assert((uint64) blocknum + (uint64) nblocks <= (uint64) mdnblocks(reln, > > forknum)); > > > > This'd stop being helpful if we ever widen BlockNumber to 64 bits, > > but I think that's unlikely. (Partitioning seems like a better answer > > for giant tables.) > > No idea if this will happen or not, but that's not the only area where > we are going to need a native uint128 implementation to control the > overflows with uint64. I'm confused - isn't using common/int.h entirely sufficient for that? Nearly all architectures have more efficient ways to check for 64bit overflows than doing actual 128 bit math. Greetings, Andres Freund
Re: Proposal: Document ABI Compatibility
Hi, On 2024-06-03 15:21:04 -0400, David E. Wheeler wrote: > > Extensions in general can do lots of stuff, guaranteeing that bug fixes > > don't > > cause any problems is just not feasible. > > > > It'd be interesting to see a few examples of actual minor-version-upgrade > > extension breakages, so we can judge what caused them. > > In the community Slack[4], Matthias van de Meent writes[5]: > > > Citus’ pg_version_compat.h[7] where it re-implements a low-level function > > that was newly introduced in PG14.7. If you build against PG14.7+ headers, > > you may get random crashes when running on 14.6. I don't see how this would trigger random crashes. Unfortunately [4] doesn't seem to take me to a relevant message (pruned chat history?), so I can't infer more from that context. > I suppose it would work fine on 14.7 if compiled on 14.6 though. I suspect > there aren’t many examples, though, mostly just a lot of anxiety, and some > have decided that extensions must be recompiled for every minor release in > order to avoid the issue. StackGres[7] is one example, but I suspect Omni > (Yurii’s company) may follow. Regardless of ABI issues, it's probably a good idea to continually run tests against in-development minor versions, just to prevent something breaking from creeping in. IIRC there were a handful of cases where we accidentally broke some extension, because they relied on some implementation details. > >> Unless, that is, we can provide a complete list of things not to do (like > >> make use of padding) to avoid it. Is that feasible? > > > > You can't really rely on the contents of padding, in general. So I don't > > think > > this is really something that needs to be called out. > > Sure, probably not a problem, but if that’s the sole qualifier for making > binary changes, I think it’s worth saying, as opposed to “we don’t make > any”. > Something like “Only changes to padding, which you never used anyway, > right?” :-) IDK, to me something like this seems to promise more than we actually can. Greetings, Andres Freund
Re: allow changing autovacuum_max_workers without restarting
Hi, On 2024-06-03 13:52:29 -0500, Nathan Bossart wrote: > Here is an updated patch that uses 256 as the upper limit. I don't have time to read through the entire thread right now - it'd be good for the commit message of a patch like this to include justification for why it's ok to make such a change. Even before actually committing it, so reviewers have an easier time catching up. Why do we think that increasing the number of PGPROC slots, heavyweight locks etc by 256 isn't going to cause issues? That's not an insubstantial amount of memory to dedicate to something that will practically never be used. ISTM that at the very least we ought to exclude the reserved slots from the computation of things like the number of locks resulting from max_locks_per_transaction. It's very common to increase max_locks_per_transaction substantially; adding ~250 to the multiplier can add up to a good amount of memory. And AV workers should never need a meaningful number. Increasing e.g. the size of the heavyweight lock table has consequences besides the increase in memory usage: the size increase can make it less likely for the table to fit largely into L3, thus decreasing performance. Greetings, Andres Freund
Re: Proposal: Document ABI Compatibility
Hi, On 2024-06-03 14:43:17 -0400, David E. Wheeler wrote: > At the PGConf Unconference session on improving extension support in core, > we talked quite a bit about the recent anxiety among extension developers > about a lack of an ABI compatibility guarantee in Postgres. Are there notes for the session? > Yurii Rashkovskii did a little header file spelunking and talked[1] about a > few changes in minor version releases, including to apparent field order in > structs. It'd be nice if the slides for the talk could be uploaded... > > You must be referring to my commit 714780dc. The new field is stored within > > alignment padding (though only on back branches). Has this been tied to a > > known problem? > > At the Unconference, Tom Lane said that this approach is pretty well drilled > into the heads of every committer, and new ones pick it up through > experience. The goal, IIUC, is to never introduce binary incompatibilities > into the C APIs in minor releases. This standard would be good to document, > to let extension developers know exactly what the guarantees are. I don't think we can really make this a hard guarantee. Yes, we try hard to avoid ABI breaks, but IIRC there have been a few cases over the years where that wasn't practical for some reason. If we have to decide between a bad bug and causing an ABI issue that's unlikely to affect anybody, we'll probably choose the ABI issue. > * The ABI is guaranteed to change only in backward compatible ways in minor > releases. If for some reason it doesn’t it’s a bug that will need to be > fixed. Thus I am not really on board with this statement as-is. Extensions in general can do lots of stuff, guaranteeing that bug fixes don't cause any problems is just not feasible. It'd be interesting to see a few examples of actual minor-version-upgrade extension breakages, so we can judge what caused them.
> But if I understand correctly, the use of techniques like adding a new field > in padding does not mean that extensions compiled on later minor releases > will work on earlier minor releases of the same major version. I don't think it's common for such new-fields-in-padding to cause problems when using an earlier minor PG version. For that the extension would need to actually rely on the presence of the new field, but typically that'd not be the case when we introduce a new field in a minor version. > Unless, that is, we can provide a complete list of things not to do (like > make use of padding) to avoid it. Is that feasible? You can't really rely on the contents of padding, in general. So I don't think this is really something that needs to be called out. Greetings, Andres Freund
Re: Build with LTO / -flto on macOS
Hi, On 2024-06-03 17:07:22 +0200, Wolfgang Walther wrote: > Peter Eisentraut: > > It's probably worth clarifying that this option is needed on macOS only > > if LTO is also enabled. For standard (non-LTO) builds, the > > export-dynamic behavior is already the default on macOS (otherwise > > nothing in PostgreSQL would work). > > Right, man page says this: > > > Preserves all global symbols in main executables during LTO. Without this > option, Link Time Optimization is allowed to inline and remove global > functions. This option is used when a main executable may load a plug-in > which requires certain symbols from the main executable. Gah. Apple's tendency to just break stuff that has worked across *nix-y platforms for decades is pretty annoying. They could just have made --export-dynamic an alias for -export_dynamic, but no, everyone needs a special macos thingy in their build scripts. > Peter: > > I don't think we explicitly offer LTO builds as part of the make build > > system, so anyone trying this would do it sort of self-service, by > > passing additional options to configure or make. In which case they > > might as well pass the -export_dynamic option along in the same way? > > The challenge is that it defeats the purpose of LTO to pass this along to > everything, e.g. via CFLAGS. The Makefiles set this in LDFLAGS_EX_BE only, > so it only affects the backend binary. This is not at all obvious and took > me quite a while to figure out why LTO silently didn't strip symbols from > other binaries. It does work to explicitly set LDFLAGS_EX_BE, though. > > Also, passing the LTO flag on Linux "just works" (clang, not GCC > necessarily). It should just work on gcc, or at least has in the recent past. ISTM if we want to test for -export_dynamic like what you proposed, we should do so only if --export-dynamic wasn't found. No need to incur the overhead on !macos. Greetings, Andres Freund
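For anyone doing this self-service, a hypothetical make invocation putting the pieces above together (untested; the flag spelling follows the Apple linker discussion in the message):

```shell
# Only the backend link needs to keep global symbols for extensions,
# so the export flag goes into LDFLAGS_EX_BE rather than LDFLAGS --
# that way LTO can still strip symbols from the other binaries:
make CFLAGS="-O2 -flto" LDFLAGS="-flto" LDFLAGS_EX_BE="-Wl,-export_dynamic"
```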
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Hi, At some point this patch switched from rdtsc to rdtscp, which imo largely negates the point of it. What led to that? Greetings, Andres Freund
Re: meson "experimental"?
Hi, On May 30, 2024 8:03:33 AM PDT, Andrew Dunstan wrote: >On Thu, May 30, 2024 at 6:32 AM Aleksander Alekseev < >aleksan...@timescale.com> wrote: > >> >> >> By a quick look on the buildfarm we seem to use Ninja >= 1.11.1. >> However since Meson can use both Ninja and VS as a backend I'm not >> certain which section would be most appropriate for naming the minimal >> required version of Ninja. >> >> >When I tried building with the VS backend it blew up, I don't recall the >details. I think we should just use ninja everywhere. That keeps things >simple. VS should work, and if not, we should fix it. It's slow, so I'd not use it for scheduled builds, but for people developing using visual studio. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: meson vs windows perl
Hi, On 2024-04-05 16:12:12 -0400, Andrew Dunstan wrote: > OK, this has been fixed and checked. The attached is what I propose. The perl command is pretty hard to read. What about using python's shlex module instead? Rough draft attached. Still not very pretty, but seems easier to read? It'd be even better if we could just get perl to print out the flags in an easier to parse way, but I couldn't immediately see a way. Greetings, Andres Freund

diff --git i/meson.build w/meson.build
index d6401fb8e30..191a051defb 100644
--- i/meson.build
+++ w/meson.build
@@ -997,13 +997,20 @@ if not perlopt.disabled()
     undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split()
     undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split()

+    ldopts_split = run_command(python, '-c', '''
+import shlex
+import sys
+
+print('\n'.join(shlex.split(sys.argv[1])), end='')
+''',
+      ldopts, check: true).stdout().split('\n')
+
     perl_ldopts = []
-    foreach ldopt : ldopts.split(' ')
-      if ldopt == '' or ldopt in undesired
+    foreach ldopt : ldopts_split
+      if ldopt in undesired
         continue
       endif
-
-      perl_ldopts += ldopt.strip('"')
+      perl_ldopts += ldopt
     endforeach

     message('LDFLAGS recommended by perl: "@0@"'.format(ldopts))
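The draft above leans on shlex.split(); a quick standalone illustration of why that beats a naive split(' ') for shell-quoted flag strings (the flag values here are invented):

```python
import shlex

# ldflags as one shell-quoted string, as Perl's Config might report it:
ldopts = '-L"C:/Program Files/perl/lib" -lperl -O2'

# shlex.split honors the quoting, keeping the space inside the -L path:
print(shlex.split(ldopts))

# ...whereas a naive split breaks the quoted path into two tokens and
# leaves the quote characters in place:
print(ldopts.split(' '))
```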
Re: First draft of PG 17 release notes
Hi, On 2024-05-22 18:33:03 -0400, Bruce Momjian wrote: > On Tue, May 21, 2024 at 09:40:28AM -0700, Andres Freund wrote: > > On 2024-05-18 11:13:54 -0400, Bruce Momjian wrote: > > I agree keeping things reasonably short is important. But I don't think > > you're > > evenly applying it as a goal. > > > > Just skimming the notes from the end, I see > > - an 8 entries long pg_stat_statements section > > What item did you want to remove? Those are all user-visible changes. My point here was not that we necessarily need to remove those, but that their impact to users is smaller than many of the performance impacts you disregard. > > - multiple entries about "Create custom wait events for ..." > > Well, those are all in different sections, so how can they be merged, > unless I create a "wait event section", I guess. They're not, all are in "Additional Modules". Instead of - Create custom wait events for postgres_fdw (Masahiro Ikeda) - Create custom wait events for dblink (Masahiro Ikeda) - Allow extensions to define custom wait events (Masahiro Ikeda) I'd make it: - Allow extensions to define custom wait events and create custom wait events for postgres_fdw, dblink (Masahiro Ikeda) > > - three entries about adding --all to {reindexdb,vacuumdb,clusterdb}. > > The problem with merging these is that the "Specifically, --all can now > be used with" is different for all three of them. You said you were worried about the length of the release notes, because it discourages users from actually reading the release notes, due to getting bored. Having three instances of almost the same entry, with just minor changes between them, seems precisely the kind of thing that risks boring readers. I'd probably just go for - Add --all option to clusterdb, reindexdb, vacuumdb to process objects in all databases matching a pattern (Nathan Bossart) or such.
The precise details of how the option works for the different commands don't need to be stated in the release notes; that's more of a reference documentation thing. But if you want to include it, we can do something like Specifically, --all can now be used with --table (all commands), --schema (reindexdb, vacuumdb), and --exclude-schema (reindexdb, vacuumdb). > > - an entry about adding long options to pg_archivecleanup > > Well, that is a user-visible change. Should it not be listed? If you are concerned about the length of the release notes and as a consequence not including more impactful performance changes, then no, it shouldn't. It doesn't break anyone's current scripts, it doesn't enable anything new. > > - two entries about grantable maintenance rights, once via pg_maintain, once > > per-table > > Well, one is a GRANT and another is a role, so merging them seemed like > it would be too confusing. I don't think it has to be. Maybe something roughly like - Allow granting the right to perform maintenance operations (Nathan Bossart) The permission can be granted on a per-table basis using the MAINTAIN privilege and on a system wide basis via the pg_maintain role. Operations that can be controlled are VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZED VIEW, CLUSTER, and LOCK TABLE. I'm again mostly reacting to your concern that the release notes are getting too boring to read. Repeated content, like in the current formulation, imo does endanger that. Currently it is: - Add per-table GRANT permission MAINTAIN to control maintenance operations (Nathan Bossart) The operations are VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZED VIEW, CLUSTER, and LOCK TABLE. - Add user-grantable role pg_maintain to control maintenance operations (Nathan Bossart) The operations are VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZED VIEW, CLUSTER, and LOCK TABLE.
> > - separate entries about pg_stat_reset_slru(), pg_stat_reset_shared("slru"), > > They are different functions with different detail text. So what? You can change their text. Making it three entries makes it harder for a reader that doesn't care about resetting stats to skip over the details. Make it something like - Improve control over resetting statistics (Atsushi Torikoshi, Bharath Rupireddy) pg_stat_reset_shared() can now reset all shared statistics, by passing NULL; pg_stat_reset_shared(NULL) also resets SLRU statistics; pg_stat_reset_shared("slru") resets SLRU statistics, which was already possible using pg_stat_reset_slru(NULL). Greetings, Andres Freund
Re: First draft of PG 17 release notes
Hi, On 2024-05-23 23:27:04 -0400, Bruce Momjian wrote: > On Thu, May 23, 2024 at 11:11:10PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > I am not sure Bruce that you realize that your disregard for > > performance improvements is shared by nobody. Arguably, > > performance is 90% of what we do these days, and it's also > > 90% of what users care about. > > Please stop saying I don't document performance. I have already > explained enough which performance items I choose. Please address my > criteria or suggest new criteria. Bruce, just about everyone seems to disagree with your current approach. And not just this year, this has been a discussion in most if not all release note threads of the last few years. People, including me, *have* addressed your criteria, but you just waved those concerns away. It's hard to continue discussing criteria when it doesn't at all feel like a conversation. In the end, these are patches to the source code, I don't think you can just wave away widespread disagreement with your changes. That's not how we do postgres development. Greetings, Andres Freund
Re: Upgrade Debian CI images to Bookworm
Hi, On 2024-05-24 16:17:37 +0200, Peter Eisentraut wrote: > I'm not sure what the backpatching expectations of this kind of thing is. > The history of this CI setup is relatively short, so this hasn't been > stressed too much. I see that we once backpatched the macOS update, but > that might have been all. I've backpatched a few other changes too. > If we start backpatching this kind of thing, then this will grow as a job > over time. We'll have 5 or 6 branches to keep up to date, with several > operating systems. And once in a while we'll have to make additional > changes like this warning fix you mention here. I'm not sure how much we > want to take this on. Is there ongoing value in the CI setup in > backbranches? I find it extremely useful to run CI on backbranches before backpatching. Enough so that I've thought about proposing backpatching CI all the way. I don't think it's that much work to fix this kind of thing in the backbranches. We don't need to backpatch new tasks or such. Just enough stuff to keep e.g. the base image the same - otherwise we end up running CI on unsupported distros, which doesn't help anybody. > With these patches, we could do either of the following: > 5) We update master, PG16, and PG15, but we hold all of them until the > warning in PG15 is fixed. I think we should apply the fix in <= 15 - IMO it's a correct compiler warning, what we do right now is wrong. Greetings, Andres Freund
Re: Convert node test compile-time settings into run-time parameters
Hi, On 2024-05-20 09:28:39 +0200, Peter Eisentraut wrote: > - Performance? Looking for example at pg_parse_query() and its siblings, > they also check for other debugging settings like log_parser_stats in the > main code path, so it doesn't seem to be a concern. I don't think we can conclude that. Just because we've not been that careful about performance in a few spots doesn't mean we shouldn't be careful in other areas. And I think something like log_parser_stats is a lot more generally useful than debug_copy_parse_plan_trees. The branch itself isn't necessarily the issue, the branch predictor can handle that to a good degree. The reduction in code density is a bigger concern - and also very hard to measure, because the cost is very incremental and distributed. At the very least I'd add unlikely() to all of the branches, so the debug code can be placed separately from the "normal" portions. Where I'd be more concerned about performance is the added branch in READ_LOCATION_FIELD. There are a lot of calls to that, and adding runtime branches to each, with external function calls inside, is somewhat likely to be measurable. > - Access control? I have these settings as PGC_USERSET for now. Maybe they > should be PGC_SUSET? That probably would be right. > Another thought: Do we really need three separate settings? Maybe not three settings, but a single setting, with multiple values, like debug_io_direct? Greetings, Andres Freund
Re: zlib detection in Meson on Windows broken?
Hi, On 2024-05-20 11:58:05 +0100, Dave Page wrote: > I then attempt to build PostgreSQL: > > meson setup build > -Dextra_include_dirs=C:/build64/openssl/include,C:/build64/zlib/include > -Dextra_lib_dirs=C:/build64/openssl/lib,C:/build64/zlib/lib -Dssl=openssl > -Dzlib=enabled --prefix=c:/build64/pgsql > > Which results in the output in output.txt, indicating that OpenSSL was > correctly found, but zlib was not. I've also attached the meson log. I forgot to mention that earlier: Assuming you're building something to be distributed, I'd recommend --auto-features=enabled/disabled and specifying specifically which dependencies you want to be used. Greetings, Andres Freund
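Concretely, the quoted invocation with pinned features might look like this (a sketch reusing the paths from the message; untested):

```shell
# --auto-features=disabled turns every optional dependency off unless
# explicitly enabled, so a missing library fails setup loudly instead
# of being silently skipped:
meson setup build \
  --auto-features=disabled \
  -Dssl=openssl -Dzlib=enabled \
  -Dextra_include_dirs=C:/build64/openssl/include,C:/build64/zlib/include \
  -Dextra_lib_dirs=C:/build64/openssl/lib,C:/build64/zlib/lib \
  --prefix=c:/build64/pgsql
```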
Re: First draft of PG 17 release notes
Hi, On 2024-05-21 09:27:20 -0700, Andres Freund wrote: > Also, the release notes are also not just important to users. I often go back > and look in the release notes to see when some some important change was made, > because sometimes it's harder to find it in the git log, due to sheer > volume. And even just keeping up with all the changes between two releases is > hard, it's useful to be able to read the release notes and see what happened. > > [...] > > [1] I've wondered if we should have one more level of TOC on the release note > page, so it's easier to jump to specific sections. Which reminds me: Eventually I'd like to add links to the most important commits related to release note entries. We already do much of the work of building that list of commits for each entry. That'd allow a reader to find more details if interested. Right now one either has to open the sgml file (which no user will know to do), or find the git entries manually, the latter of which is often hard, because the git commits often will use different wording etc. Admittedly doing so within the constraints of docbook and not wanting to overly decrease density (both in .sgml and the resulting html) isn't a trivial task. That's entirely independent of my concern around noting performance improvements in the release notes, of course. Greetings, Andres Freund
Re: First draft of PG 17 release notes
Hi, On 2024-05-18 11:13:54 -0400, Bruce Momjian wrote: > Please see the email I just posted. There are three goals we have to > adjust for: > > 1. short release notes so they are readable > 2. giving people credit for performance improvements > 3. showing people Postgres cares about performance > > I would like to achieve 2 & 3 without harming #1. My experience is if I > am reading a long document, and I get to a section where I start to > wonder, "Why should I care about this?", I start to skim the rest of > the document. I agree keeping things reasonably short is important. But I don't think you're evenly applying it as a goal. Just skimming the notes from the end, I see - an 8 entries long pg_stat_statements section - multiple entries about "Create custom wait events for ..." - three entries about adding --all to {reindexdb,vacuumdb,clusterdb}. - an entry about adding long options to pg_archivecleanup - two entries about grantable maintenance rights, once via pg_maintain, once per-table - separate entries about pg_stat_reset_slru(), pg_stat_reset_shared("slru"), If you're concerned about brevity, we can make things shorter without skipping over most performance improvements. > I am particularly critical if I start to wonder, "Why > does the author _think_ I should care about this?" because it feels like > the author is writing for him/herself and not the audience. FWIW, I think it's a good thing for somebody other than the author to have a hand in writing a release notes entry for a change. The primary author(s) are often too deep into some issue to have a good view of the right level of detail and understandability. Greetings, Andres Freund
Re: First draft of PG 17 release notes
Hi, On 2024-05-18 10:59:47 -0400, Bruce Momjian wrote: > On Wed, May 15, 2024 at 08:48:02PM -0700, Andres Freund wrote: > > +many. > > > > We're having this debate every release. I think the ongoing reticence to > > note > > performance improvements in the release notes is hurting Postgres. > > > > For one, performance improvements are one of the prime reasons users > > upgrade. Without them being noted anywhere more dense than the commit log, > > it's very hard to figure out what improved for users. A halfway widely > > applicable performance improvement is far more impactful than many of the > > feature changes we do list in the release notes. > > I agree the impact of performance improvements is often greater than > the average release note item. However, if people expect Postgres to be > faster, is it important for them to know _why_ it is faster? Yes, it very often is. Performance improvements typically aren't "make everything 3% faster", they're more "make this special thing 20% faster". Without knowing what got faster, users don't know whether a) the upgrade will improve their production situation, b) they need to change something to take advantage of the improvement. > On the flip side, a performance improvement that makes everything 10% > faster has little behavioral change for users, and in fact I think we > get ~6% faster in every major release. I cannot recall many "make everything faster" improvements, if any. And even if it's "make everything faster" - that's useful for users to know, it might solve their production problem! It's also good for PR. Given how expensive postgres upgrades still are, we can't expect production workloads to upgrade to every major version. The set of performance improvements and feature additions between major versions can help users make an informed decision. The release notes are also not just important to users. 
I often go back and look in the release notes to see when some important change was made, because sometimes it's harder to find it in the git log, due to sheer volume. And even just keeping up with all the changes between two releases is hard; it's useful to be able to read the release notes and see what happened. > > For another, it's also very frustrating for developers that focus on > > performance. The reticence to note their work, while noting other, far > > smaller, things in the release notes, pretty much tells us that our work > > isn't > > valued. > Yes, but are we willing to add text that every user will have to read > just for this purpose? Of course it's a tradeoff. We shouldn't verbosely note down every small change just because of the recognition, that'd make the release notes unreadable. And it'd just duplicate the commit log. But that's not the same as defaulting to not noting performance improvements, even if the performance improvement is more impactful than many other features that are noted. > One thing we _could_ do is to create a generic performance release note > item saying performance has been improved in the following areas, with > no more details, but we can list the authors on the item. To me that's the "General Performance" section. If somebody reading the release notes doesn't care about performance, they can just skip that section ([1]). I don't see why we wouldn't want to include the same level of detail as for other changes. Greetings, Andres Freund [1] I've wondered if we should have one more level of TOC on the release note page, so it's easier to jump to specific sections.
Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
On 2024-05-17 16:03:09 -0400, Peter Geoghegan wrote: > On Fri, May 17, 2024 at 3:50 PM Andres Freund wrote: > > You're saying that the data is correctly accessible on primaries, but broken > > on standbys? Is there any difference in what the page looks like on the > > primary > > vs standby? > > There clearly is. The relevant infomask bits are different. I didn't > examine it much closer than that, though. That could also just be because of a different replay position, hence my question about that somewhere else in the email...
Re: zlib detection in Meson on Windows broken?
Hi, On 2024-05-20 11:58:05 +0100, Dave Page wrote: > I have very little experience with Meson, and even less interpreting its > logs, but it seems to me that it's not including the extra lib and include > directories when it runs the test compile, given the command line it's > reporting: > > cl C:\Users\dpage\git\postgresql\build\meson-private\tmpg_h4xcue\testfile.c > /nologo /showIncludes /utf-8 /EP /nologo /showIncludes /utf-8 /EP /Od /Oi- > > Bug, or am I doing something silly? It's a buglet. We rely on meson's internal fallback detection of zlib, if it's not provided via pkg-config or cmake. But it doesn't know about our extra_include_dirs parameter. We should probably fix that... Greetings, Andres Freund
Re: Speed up clean meson builds by ~25%
On 2024-05-20 09:37:46 -0400, Robert Haas wrote: > On Sat, May 18, 2024 at 6:09 PM Andres Freund wrote: > > A few tests with ccache disabled: > > These tests seem to show no difference between the two releases, so I > wonder what you're intending to demonstrate here. They show a few seconds of win for 'real' time. -O0: 0m21.577s->0m19.529s -O3: 0m59.730s->0m54.853s
Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
Hi, On 2024-05-17 15:12:31 +0200, Pavel Stehule wrote: > after migration to PostgreSQL 16 I have seen broken tables on replica nodes > 3 times (about every week). The query fails with error Migrating from what version? You're saying that the data is correctly accessible on primaries, but broken on standbys? Is there any difference in what the page looks like on the primary vs standby?

> ERROR: could not access status of transaction 1442871302
> DETAIL: Could not open file "pg_xact/0560": No such file or directory
>
> verify_heapam reports
>
> prd=# select * from verify_heapam('account_login_history') where blkno = 179036;
>  blkno  | offnum | attnum | msg
> --------+--------+--------+-----------------------------------------------------------------
>  179036 |     30 |        | xmin 1393743382 precedes oldest valid transaction ID 3:1687012112

So that's not just a narrow race...

> master
>
> (2024-05-17 14:36:57) prd=# SELECT * FROM page_header(get_raw_page('account_login_history', 179036));
>       lsn      │ checksum │ flags │ lower │ upper │ special │ pagesize │ version │ prune_xid
> ───────────────┼──────────┼───────┼───────┼───────┼─────────┼──────────┼─────────┼───────────
>  A576/810F4CE0 │        0 │     4 │   296 │   296 │    8192 │     8192 │       4 │         0
> (1 row)
>
> replica
>
> prd_aukro=# SELECT * FROM page_header(get_raw_page('account_login_history', 179036));
>       lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
> ---------------+----------+-------+-------+-------+---------+----------+---------+-----------
>  A56C/63979DA0 |        0 |     0 |   296 |   296 |    8192 |     8192 |       4 |         0
> (1 row)

Is the replica behind the primary? Or did we somehow end up with diverging data? The page LSNs differ by about 40GB... Is there evidence of failed truncations of the relation in the log? From autovacuum? Does the data in the readable versions of the tuples on that page actually look valid? Is it possibly duplicated data? I'm basically wondering whether it's possible that we errored out during truncation (e.g. due to a file permission issue or such), due to some brokenness in RelationTruncate() that can lead to data divergence between primary and standby and to old tuples re-appearing on either. 
Another question: Do you use pg_repack or such? Greetings, Andres Freund
Re: problems with "Shared Memory and Semaphores" section of docs
Hi, On 2024-05-17 18:30:08 +0000, Imseih (AWS), Sami wrote: > > The advantage of the GUC is that its value could be seen before trying to > > actually start the server. > > Only if they have a sample in postgresql.conf file, right? > A GUC like shared_memory_size_in_huge_pages will not be. You can query GUCs with -C. E.g.

postgres -D pgdev-dev -c shared_buffers=16MB -C shared_memory_size_in_huge_pages
13

postgres -D pgdev-dev -c shared_buffers=16MB -c huge_page_size=1GB -C shared_memory_size_in_huge_pages
1

Which is very useful to be able to actually configure that number of huge pages. I don't think a system view or such would help here. Greetings, Andres Freund
Lowering the minimum value for maintenance_work_mem
Hi, In the subthread at [1] I needed to trigger multiple rounds of index vacuuming within one vacuum. It turns out that with the new dead tuple implementation, that actually got somewhat expensive. Particularly if all tuples on all pages get deleted, the representation is just "too dense". Normally that's obviously very good, but for testing, not so much: With the minimum setting of maintenance_work_mem=1024kB, a simple table with narrow rows, where all rows are deleted, the first cleanup happens after 3697812 dead tids. The table for that has to be > ~128MB. Needing a ~128MB table to be able to test multiple cleanup passes makes it much more expensive to test and consequently will lead to worse test coverage. I think we should consider lowering the minimum setting of maintenance_work_mem to the minimum of work_mem. For real-life workloads maintenance_work_mem=1024kB is going to already be quite bad, so we don't protect users much by forbidding a setting lower than 1MB. Just for comparison, with a limit of 1MB, versions < 17 needed to do the first cleanup pass after 174472 dead tuples. That's a 20x improvement. Really nice. Greetings, Andres Freund [1] https://postgr.es/m/20240516193953.zdj545efq6vabymd%40awork3.anarazel.de
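For context, the pre-17 figure falls out of simple arithmetic: older versions accumulated dead TIDs in a flat array of 6-byte ItemPointers, so 1MB of maintenance_work_mem holds roughly 174k of them, while the 17+ TidStore packs offsets into per-page bitmaps in a radix tree. A back-of-the-envelope sketch (the real limit also subtracts some allocator/header overhead, hence the small discrepancy with the 174472 quoted above):

```python
# Rough arithmetic for the numbers discussed above.
ITEM_POINTER_BYTES = 6        # sizeof(ItemPointerData) in PostgreSQL
ONE_MB = 1024 * 1024          # maintenance_work_mem = 1024kB

# Pre-17: a flat array of dead TIDs forces the first index cleanup pass
# after roughly ONE_MB / 6 tuples.
old_limit = ONE_MB // ITEM_POINTER_BYTES
print(old_limit)              # ~174762, close to the 174472 quoted

# 17+: the TidStore's denser representation holds ~3.7 million TIDs in
# the same 1MB, which is the ~20x difference described in the email.
new_limit = 3_697_812
print(round(new_limit / old_limit, 1))
```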
gist index builds try to open FSM over and over
./../../../../home/andres/src/postgresql/src/backend/access/heap/heapam_handler.c:1706
#18 0x00847996 in table_index_build_scan (table_rel=0x7f5b87979688, index_rel=0x7f5b87977f38, index_info=0x2fd9f50, allow_sync=true, progress=true, callback=0x848b6b , callback_state=0x7ffd3ce87340, scan=0x0) at ../../../../../home/andres/src/postgresql/src/include/access/tableam.h:1794
#19 0x00847da8 in gistbuild (heap=0x7f5b87979688, index=0x7f5b87977f38, indexInfo=0x2fd9f50) at ../../../../../home/andres/src/postgresql/src/backend/access/gist/gistbuild.c:313
#20 0x0094c945 in index_build (heapRelation=0x7f5b87979688, indexRelation=0x7f5b87977f38, indexInfo=0x2fd9f50, isreindex=false, parallel=true) at ../../../../../home/andres/src/postgresql/src/backend/catalog/index.c:3021
#21 0x0094970f in index_create (heapRelation=0x7f5b87979688, indexRelationName=0x2f2f798 "foo_i_idx1", indexRelationId=17747, parentIndexRelid=0, parentConstraintId=0, relFileNumber=0, indexInfo=0x2fd9f50, indexColNames=0x2f2fc60, accessMethodId=783, tableSpaceId=0, collationIds=0x2f32340, opclassIds=0x2f32358, opclassOptions=0x2f32370, coloptions=0x2f32388, stattargets=0x0, reloptions=0, flags=0, constr_flags=0, allow_system_table_mods=false, is_internal=false, constraintId=0x7ffd3ce876f4) at ../../../../../home/andres/src/postgresql/src/backend/catalog/index.c:1275

The reason we reopen over and over is that we close the file in mdexists():

	/*
	 * Close it first, to ensure that we notice if the fork has been unlinked
	 * since we opened it. As an optimization, we can skip that in recovery,
	 * which already closes relations when dropping them.
	 */
	if (!InRecovery)
		mdclose(reln, forknum);

We call smgrexists as part of this code:

static Buffer
fsm_readbuf(Relation rel, FSMAddress addr, bool extend)
...
	/*
	 * If we haven't cached the size of the FSM yet, check it first. Also
	 * recheck if the requested block seems to be past end, since our cached
	 * value might be stale. (We send smgr inval messages on truncation, but
	 * not on extension.)
	 */
	if (reln->smgr_cached_nblocks[FSM_FORKNUM] == InvalidBlockNumber ||
		blkno >= reln->smgr_cached_nblocks[FSM_FORKNUM])
	{
		/* Invalidate the cache so smgrnblocks asks the kernel. */
		reln->smgr_cached_nblocks[FSM_FORKNUM] = InvalidBlockNumber;
		if (smgrexists(reln, FSM_FORKNUM))
			smgrnblocks(reln, FSM_FORKNUM);
		else
			reln->smgr_cached_nblocks[FSM_FORKNUM] = 0;
	}

Because we set the size to 0 if the fork doesn't exist, we'll reenter during the next call, and then do the same thing again. I don't think this is a huge performance issue or anything, but somehow it seems indicative of something being "wrong". It seems likely we encounter this issue not just with gist, but I haven't checked yet. Greetings, Andres Freund
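To illustrate the loop described above, here is a toy model (the names are illustrative, this is not the real smgr API): once a missing fork's size is cached as 0, every later access to block 0 satisfies the "blkno is at/past the cached size" recheck, invalidates the cache, and probes for the fork again, so the cache never helps.

```python
# Toy model of the fsm_readbuf()/smgrexists() interaction described above.
# All names are illustrative; this is not PostgreSQL's actual smgr API.
INVALID_BLOCK_NUMBER = None

class FakeSMgr:
    def __init__(self, fork_exists):
        self.fork_exists = fork_exists
        self.cached_nblocks = INVALID_BLOCK_NUMBER
        self.exists_calls = 0            # each one is a close+reopen round trip

    def smgrexists(self):
        self.exists_calls += 1           # mdexists() closes the file first
        return self.fork_exists

    def fsm_readbuf(self, blkno):
        # Mirrors the quoted code: recheck if the cache is unset or blkno
        # seems to be at/past the cached end of the fork.
        if (self.cached_nblocks is INVALID_BLOCK_NUMBER
                or blkno >= self.cached_nblocks):
            self.cached_nblocks = INVALID_BLOCK_NUMBER
            if self.smgrexists():
                self.cached_nblocks = 4  # pretend smgrnblocks() returned 4
            else:
                self.cached_nblocks = 0  # missing fork cached as size 0

smgr = FakeSMgr(fork_exists=False)
for _ in range(1000):                    # e.g. one probe per tuple inserted
    smgr.fsm_readbuf(0)
print(smgr.exists_calls)                 # 1000: blkno 0 >= 0 retriggers every time
```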
Re: race condition when writing pg_control
Hi, On 2024-05-16 15:01:31 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2024-05-16 14:50:50 -0400, Tom Lane wrote: > >> The intention was certainly always that it be atomic. If it isn't > >> we have got *big* trouble. > > > We unfortunately do *know* that on several systems e.g. basebackup can read > > a > > partially written control file, while the control file is being > > updated. > > Yeah, but can't we just retry that if we get a bad checksum? Retry what/where precisely? We can avoid the issue in basebackup.c by taking ControlFileLock at the right moment - but that doesn't address pg_start/stop_backup based backups. Hence the patch in the referenced thread moving to replacing the control file by atomic-rename if there are base backups ongoing. > What had better be atomic is the write to disk. That is still true to my knowledge. > Systems that can't manage POSIX semantics for concurrent reads and writes > are annoying, but not fatal ... I think part of the issue is that people don't agree on what POSIX says about a read that's concurrent to a write... See e.g. https://utcc.utoronto.ca/~cks/space/blog/unix/WriteNotVeryAtomic Greetings, Andres Freund
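The atomic-rename approach referenced above can be sketched in a few lines, relying on POSIX rename() atomically swapping the directory entry so a concurrent reader sees either the complete old file or the complete new one, never a torn mix. This is an illustration of the general technique, not PostgreSQL's actual code.

```python
import os
import tempfile

def atomic_replace(path, data):
    """Write data to a temp file, fsync it, and rename() it over path.

    On POSIX, rename() replaces the directory entry atomically, so a
    reader opening `path` concurrently gets either the old contents or
    the new contents in full -- a torn read of a half-written file is
    not possible.  (Sketch of the approach discussed, not pg_control's
    actual update path.)
    """
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the new contents durable first
        os.replace(tmp, path)     # the atomic step
    except BaseException:
        os.unlink(tmp)
        raise

with tempfile.TemporaryDirectory() as d:
    ctl = os.path.join(d, "pg_control")
    atomic_replace(ctl, b"old" * 100)
    atomic_replace(ctl, b"new" * 100)
    with open(ctl, "rb") as f:
        print(f.read(3))          # a reader only ever sees b'old' or b'new'
```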
Re: race condition when writing pg_control
Hi, On 2024-05-16 14:50:50 -0400, Tom Lane wrote: > Nathan Bossart writes: > > I suspect it will be difficult to investigate this one too much further > > unless we can track down a copy of the control file with the bad checksum. > > Other than searching for any new code that isn't doing the appropriate > > locking, maybe we could search the buildfarm for any other occurrences. I > > also see some threads concerning whether the way we are reading/writing > > the control file is atomic. > > The intention was certainly always that it be atomic. If it isn't > we have got *big* trouble. We unfortunately do *know* that on several systems e.g. basebackup can read a partially written control file, while the control file is being updated. Thomas addressed this partially for frontend code, but not yet for backend code. See https://postgr.es/m/CA%2BhUKGLhLGCV67NuTiE%3Detdcw5ChMkYgpgFsa9PtrXm-984FYA%40mail.gmail.com Greetings, Andres Freund
Re: First draft of PG 17 release notes
Hi, On 2024-05-15 10:38:20 +0200, Alvaro Herrera wrote: > I disagree with this. IMO the impact of the Sawada/Naylor change is > likely to be enormous for people with large tables and large numbers of > tuples to clean up (I know we've had a number of customers in this > situation, I can't imagine any Postgres service provider that doesn't). > The fact that maintenance_work_mem is no longer capped at 1GB is very > important and I think we should mention that explicitly in the release > notes, as setting it higher could make a big difference in vacuum run > times. +many. We're having this debate every release. I think the ongoing reticence to note performance improvements in the release notes is hurting Postgres. For one, performance improvements are one of the prime reasons users upgrade. Without them being noted anywhere more dense than the commit log, it's very hard to figure out what improved for users. A halfway widely applicable performance improvement is far more impactful than many of the feature changes we do list in the release notes. For another, it's also very frustrating for developers that focus on performance. The reticence to note their work, while noting other, far smaller, things in the release notes, pretty much tells us that our work isn't valued. Greetings, Andres Freund
Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Hi, On 2024-05-15 11:14:14 +0200, Alvaro Herrera wrote: > On 2024-May-15, Bharath Rupireddy wrote: > > > It looks like with the use of the new multi insert table access method > > (TAM) for COPY (v20-0005), pgbench regressed about 35% [1]. > > Where does this acronym "TAM" comes from for "table access method"? I > find it thoroughly horrible and wish we didn't use it. What's wrong > with using "table AM"? It's not that much longer, much clearer and > reuses our well-established acronym AM. Strongly agreed. I don't know why I dislike TAM so much though. Greetings, Andres Freund
Re: Adding the extension name to EData / log_line_prefix
Hi, On 2024-05-15 13:45:30 -0400, Tom Lane wrote: > There is one advantage over my suggestion of changing PG_MODULE_MAGIC: > if we tell people to write > >PG_MODULE_MAGIC; >#undef TEXTDOMAIN >#define TEXTDOMAIN PG_TEXTDOMAIN("hstore") > > then that's 100% backwards compatible and they don't need any > version-testing ifdef's. > > I still think that the kind of infrastructure Andres proposes > is way overkill compared to the value, plus it's almost certainly > going to have a bunch of platform-specific problems to solve. Maybe I'm missing something here. Even adding those two lines to the extensions in core and contrib is going to end up being more lines than what I proposed? What portability issues do you foresee? We already look up the same symbol in all the shared libraries ("Pg_magic_func"), so we know that we can deal with duplicate function names. Are you thinking that somehow we'd end up with symbol interposition or something? Greetings, Andres Freund
Re: Adding the extension name to EData / log_line_prefix
Hi, On 2024-05-15 12:54:45 -0400, Chapman Flack wrote: > On 05/15/24 11:50, Tom Lane wrote: > > Hmm, cute idea, but it'd only help for extensions that are > > NLS-enabled. Which I bet is a tiny fraction of the population. > > So far as I can find, we don't even document how to set up > > TEXTDOMAIN for an extension --- you have to cargo-cult the > > But I'd bet, within the fraction of the population that does use it, > it is already a short string that looks a whole lot like the name > of the extension. So maybe enhancing the documentation and making it > easy to set up would achieve much of the objective here. The likely outcome would IMO be that some extensions will have the data, others not. Whereas inferring the information from our side will give you something reliable. But I also just don't think it's something that architecturally fits together that well. If we either had TEXTDOMAIN reliably set across extensions or it'd architecturally be pretty, I'd go for it, but imo it's neither. Greetings, Andres Freund
Re: Adding the extension name to EData / log_line_prefix
Hi, On 2024-05-13 19:11:53 -0400, Tom Lane wrote: > The mechanism that Andres describes for sourcing the name seems a bit > overcomplex though. Why not just allow/require each extension to > specify its name as a constant string? We could force the matter by > redefining PG_MODULE_MAGIC as taking an argument: > PG_MODULE_MAGIC("hstore"); Mostly because it seemed somewhat sad to require every extension to have version-specific ifdefs around that, particularly because it's not hard for us to infer. I think there might be other use cases for the backend to provide "extension scoped" information, FWIW. Even just providing the full path to the extension library could be useful. Greetings, Andres Freund
Re: Adding the extension name to EData / log_line_prefix
Hi, On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote: > On Mon, May 13, 2024 at 5:51 PM Andres Freund wrote: > > It's not entirely trivial to provide errfinish() with a parameter > indicating > > the extension, but it's doable: > > > > 1) Have PG_MODULE_MAGIC also define a new variable for the extension name, > >empty at that point > > > > 2) In internal_load_library(), look up that new variable, and fill it > with a, > >mangled, libname. > > > > 4) in elog.h, define a new macro depending on BUILDING_DLL (if it is set, > >we're in the server, otherwise an extension). In the backend itself, > define > >it to NULL, otherwise to the variable created by PG_MODULE_MAGIC. > > > > 5) In elog/ereport/errsave/... pass this new variable to > >errfinish/errsave_finish. > > > > Then every extension should define their own Pg_extension_filename, right? It'd be automatically set by postgres when loading libraries. > > I've attached a *very rough* prototype of this idea. My goal at this > stage was > > just to show that it's possible, not for the code to be in a reviewable > state. > > > > > > Here's e.g. what this produces with log_line_prefix='%m [%E] ' > > > > 2024-05-13 13:50:17.518 PDT [postgres] LOG: database system is ready to > accept connections > > 2024-05-13 13:50:19.138 PDT [cube] ERROR: invalid input syntax for cube > at character 13 > > 2024-05-13 13:50:19.138 PDT [cube] DETAIL: syntax error at or near "f" > > 2024-05-13 13:50:19.138 PDT [cube] STATEMENT: SELECT cube('foo'); > > > > 2024-05-13 13:43:07.484 PDT [postgres] LOG: database system is ready to > accept connections > > 2024-05-13 13:43:11.699 PDT [hstore] ERROR: syntax error in hstore: > unexpected end of string at character 15 > > 2024-05-13 13:43:11.699 PDT [hstore] STATEMENT: SELECT hstore('foo'); > > > > > > Was not able to build your patch by simply: Oh, turns out it only builds with meson right now. I forgot that, with autoconf, for some unknown reason, we only set BUILDING_DLL on some OSs. 
I attached a crude patch changing that. > > It's worth pointing out that this, quite fundamentally, can only work > when the > > log message is triggered directly by the extension. If the extension code > > calls some postgres function and that function then errors out, it'll be > seen > > as being part of postgres. > > > > But I think that's ok - they're going to be properly errcode-ified etc. > > > > Hmmm, depending on the extension it can extensively call/use postgres code, > so it would be nice if we could differentiate whether the code is called from > Postgres itself or from an extension. I think that's not realistically possible. It's also very fuzzy what that'd mean. If there's a planner hook and then the query executes normally, what do you report for an execution time error? And even the simpler case - should use of pg_stat_statements cause everything within to be logged as a pg_stat_statement message? I think the best we can do is to actually say where the error is directly triggered from. 
Greetings, Andres Freund

>From 1c59c465f2d359e8c47cf91d1ea458ea3b64ec84 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Mon, 13 May 2024 13:47:41 -0700
Subject: [PATCH v2 1/2] WIP: Track shared library in which elog/ereport is called

---
 src/include/fmgr.h             |  1 +
 src/include/utils/elog.h       | 18 +-
 src/backend/utils/error/elog.c | 33 -
 src/backend/utils/fmgr/dfmgr.c | 30 ++
 4 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a251..3c7cfd7fee9 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -504,6 +504,7 @@ PG_MAGIC_FUNCTION_NAME(void) \
 	static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \
 	return &Pg_magic_data; \
 } \
+PGDLLEXPORT const char *Pg_extension_filename = NULL; \
 extern int no_such_variable

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62f..5e57f01823d 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -137,6 +137,13 @@ struct Node;
  * prevents gcc from making the unreachability deduction at optlevel -O0.
  *--
  */
+#ifdef BUILDING_DLL
+#define ELOG_EXTNAME NULL
+#else
+extern PGDLLEXPORT const char *Pg_extension_filename;
+#define ELOG_EXTNAME Pg_extension_filename
+#endif
+
 #ifdef HAVE__BUILTIN_CONSTANT_P
 #define ereport_domain(elevel, domain, ...) \
 	do { \
@@ -144,7 +151,7 @@ struct Node;
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
 			errstart_cold(elevel, domain) : \
 			errs
Adding the extension name to EData / log_line_prefix
Hi, It can be very useful to look at the log messages emitted by a larger number of postgres instances to see if anything unusual is happening. E.g. checking whether there is an increased number of internal, IO, corruption errors (and LOGs too, because we emit plenty of bad things as LOG). One difficulty is that extensions tend to not categorize their errors. But unfortunately errors in extensions are hard to distinguish from errors emitted by postgres. A related issue is that it'd be useful to be able to group log messages by extension, to e.g. see which extensions are emitting disproportionally many log messages. Therefore I'd like to collect the extension name in elog/ereport and add a matching log_line_prefix escape code. It's not entirely trivial to provide errfinish() with a parameter indicating the extension, but it's doable:

1) Have PG_MODULE_MAGIC also define a new variable for the extension name, empty at that point.

2) In internal_load_library(), look up that new variable, and fill it with a mangled libname.

4) In elog.h, define a new macro depending on BUILDING_DLL (if it is set, we're in the server, otherwise an extension). In the backend itself, define it to NULL, otherwise to the variable created by PG_MODULE_MAGIC.

5) In elog/ereport/errsave/... pass this new variable to errfinish/errsave_finish.

I've attached a *very rough* prototype of this idea. My goal at this stage was just to show that it's possible, not for the code to be in a reviewable state. Here's e.g. 
what this produces with log_line_prefix='%m [%E] '

2024-05-13 13:50:17.518 PDT [postgres] LOG: database system is ready to accept connections
2024-05-13 13:50:19.138 PDT [cube] ERROR: invalid input syntax for cube at character 13
2024-05-13 13:50:19.138 PDT [cube] DETAIL: syntax error at or near "f"
2024-05-13 13:50:19.138 PDT [cube] STATEMENT: SELECT cube('foo');

2024-05-13 13:43:07.484 PDT [postgres] LOG: database system is ready to accept connections
2024-05-13 13:43:11.699 PDT [hstore] ERROR: syntax error in hstore: unexpected end of string at character 15
2024-05-13 13:43:11.699 PDT [hstore] STATEMENT: SELECT hstore('foo');

It's worth pointing out that this, quite fundamentally, can only work when the log message is triggered directly by the extension. If the extension code calls some postgres function and that function then errors out, it'll be seen as being part of postgres. But I think that's ok - they're going to be properly errcode-ified etc.

Thoughts?

Greetings, Andres Freund

>From 1c59c465f2d359e8c47cf91d1ea458ea3b64ec84 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Mon, 13 May 2024 13:47:41 -0700
Subject: [PATCH v1] WIP: Track shared library in which elog/ereport is called

---
 src/include/fmgr.h             |  1 +
 src/include/utils/elog.h       | 18 +-
 src/backend/utils/error/elog.c | 33 -
 src/backend/utils/fmgr/dfmgr.c | 30 ++
 4 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a251..3c7cfd7fee9 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -504,6 +504,7 @@ PG_MAGIC_FUNCTION_NAME(void) \
 	static const Pg_magic_struct Pg_magic_data = PG_MODULE_MAGIC_DATA; \
 	return &Pg_magic_data; \
 } \
+PGDLLEXPORT const char *Pg_extension_filename = NULL; \
 extern int no_such_variable

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62f..5e57f01823d 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -137,6 +137,13 @@ struct Node;
  * prevents gcc from making the unreachability deduction at optlevel -O0.
  *--
  */
+#ifdef BUILDING_DLL
+#define ELOG_EXTNAME NULL
+#else
+extern PGDLLEXPORT const char *Pg_extension_filename;
+#define ELOG_EXTNAME Pg_extension_filename
+#endif
+
 #ifdef HAVE__BUILTIN_CONSTANT_P
 #define ereport_domain(elevel, domain, ...) \
 	do { \
@@ -144,7 +151,7 @@ struct Node;
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
 			errstart_cold(elevel, domain) : \
 			errstart(elevel, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
@@ -154,7 +161,7 @@ struct Node;
 		const int elevel_ = (elevel); \
 		pg_prevent_errno_in_scope(); \
 		if (errstart(elevel_, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__, ELOG_EXTNAME); \
 		if (elevel_ >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
@@ -169,7 +176,7 @@ extern bool message_level_is_interesting(int elevel);
 extern bool errstart(int elevel, const char *domain);
 extern pg_attribute_cold bool errstart_cold(int elevel, const char *domain);
-extern void errfinish(const
Re: WAL record CRC calculated incorrectly because of underlying buffer modification
Hi, On 2024-05-10 16:00:01 +0300, Alexander Lakhin wrote: > and discovered that XLogRecordAssemble() calculates CRC over a buffer, > that might be modified by another process. If, with "might", you mean that it's legitimate for that buffer to be modified, I don't think so. The bug is that something is modifying the buffer despite it being exclusively locked. I.e. what we likely have here is a bug somewhere in the hash index code. Greetings, Andres Freund
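The failure mode is easy to model in miniature: the CRC is computed while the buffer is supposed to be stable (exclusively locked), but a concurrent modification slips in before the record hits disk, so the stored CRC no longer matches the data. A minimal sketch, with zlib's CRC-32 standing in for postgres' CRC-32C:

```python
import zlib

# Sketch of the bug class discussed above: checksum computed at record
# assembly, buffer mutated afterwards, checksum verified at replay.
buf = bytearray(b"hash index page contents")
crc_at_assembly = zlib.crc32(bytes(buf))   # XLogRecordAssemble()-time CRC

buf[0] ^= 0xFF                             # illegitimate concurrent modification

crc_at_write = zlib.crc32(bytes(buf))      # what the written data now hashes to
print(crc_at_assembly == crc_at_write)     # mismatch: replay would reject the record
```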
Re: Is there an undocumented Syntax Check in Meson?
Hi, On 2024-05-09 20:53:27 +0100, Dagfinn Ilmari Mannsåker wrote: > Andres Freund writes: > > On 2024-05-09 20:12:38 +0100, Dagfinn Ilmari Mannsåker wrote: > >> Attached is a patch which adds a check-docs target for meson, which > >> takes 0.3s on my laptop. > >> +checkdocs = custom_target('check-docs', > >> + input: 'postgres.sgml', > >> + output: 'check-docs', > >> + depfile: 'postgres-full.xml.d', > >> + command: [xmllint, '--nonet', '--valid', '--noout', > >> +'--path', '@OUTDIR@', '@INPUT@'], > >> + depends: doc_generated, > >> + build_by_default: false, > >> +) > >> +alias_target('check-docs', checkdocs) > > > > Isn't the custom target redundant with postgres_full_xml? I.e. you could > > just > > have the alias_target depend on postgres_full_xml? > > We could, but that would actually rebuild postgres-full.xml, not just > check the syntax (but that only takes 0.1-0.2s longer), I don't think this is performance critical enough to worry about 0.1s. If anything I think the performance argument goes the other way round - doing the validation work multiple times is a waste of time... > and only run if the docs have been modified since it was last built (which I > guess is fine, since if you haven't modified the docs you can't have > introduced any syntax errors). That actually seems good to me. > It's already possible to run that target directly, i.e. > > ninja doc/src/sgml/postgres-full.xml > > We could just document that in the list of meson doc targets, but a > shortcut alias would roll off the fingers more easily and be more > discoverable. Agreed. Greetings, Andres Freund
Re: Is there an undocumented Syntax Check in Meson?
Hi, On 2024-05-09 09:23:37 -0700, David G. Johnston wrote: > This needs updating: > https://www.postgresql.org/docs/current/docguide-build-meson.html You mean it should have a syntax target? Or that something else is out of date? > Also, as a sanity check, running that command takes my system 1 minute. Any > idea what percentile that falls into? I think that's on the longer end - what OS/environment is this on? Even on ~5yo CPU with turbo boost disabled it's 48s for me. FWIW, the single-page html is a good bit faster, 29s on the same system. I remember the build being a lot slower on windows, fwiw, due to the number of files being opened/created. I guess that might also be the case on slow storage, due to filesystem journaling. Greetings, Andres Freund
Re: Is there an undocumented Syntax Check in Meson?
Hi, On 2024-05-09 20:12:38 +0100, Dagfinn Ilmari Mannsåker wrote: > Attached is a patch which adds a check-docs target for meson, which > takes 0.3s on my laptop. Nice. > +checkdocs = custom_target('check-docs', > + input: 'postgres.sgml', > + output: 'check-docs', > + depfile: 'postgres-full.xml.d', > + command: [xmllint, '--nonet', '--valid', '--noout', > +'--path', '@OUTDIR@', '@INPUT@'], > + depends: doc_generated, > + build_by_default: false, > +) > +alias_target('check-docs', checkdocs) Isn't the custom target redundant with postgres_full_xml? I.e. you could just have the alias_target depend on postgres_full_xml? Greetings, Andres Freund
Re: request for database identifier in the startup packet
Hi, On 2024-05-09 14:20:49 -0400, Dave Cramer wrote: > On Thu, 9 May 2024 at 12:22, Robert Haas wrote: > > On Thu, May 9, 2024 at 8:06 AM Dave Cramer wrote: > > > The JDBC driver is currently keeping a per connection cache of types in > > the driver. We are seeing cases where the number of columns is quite high. > > In one case Prevent fetchFieldMetaData() from being run when unnecessary. · > > Issue #3241 · pgjdbc/pgjdbc (github.com) 2.6 Million columns. > > > > > > If we knew that we were connecting to the same database we could use a > > single cache across connections. > > > > > > I think we would require a server/database identifier in the startup > > message. > > > > I understand the desire to share the cache, but not why that would > > require any kind of change to the wire protocol. > > > > The server identity is actually useful for many things such as knowing > which instance of a cluster you are connected to. > For the cache however we can't use the IP address to determine which server > we are connected to as we could be connected to a pooler. > Knowing exactly which server/database makes it relatively easy to have a > common cache across connections. Getting that in the startup message seems > like a good place ISTM that you could just as well query the information you'd like after connecting. And that's going to be a lot more flexible than having to have precisely the right information in the startup message, and most clients not needing it. Greetings, Andres Freund
Re: Use pgstat_kind_infos to read fixed shared stats structs
Hi, On 2024-05-06 14:07:53 -0500, Tristan Partin wrote: > Instead of needing to be explicit, we can just iterate the > pgstat_kind_infos array to find the memory locations to read into. > This was originally thought of by Andres in > 5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd. > > Not a fix, per se, but it does remove a comment. Perhaps the discussion will > just lead to someone deleting the comment, and keeping the code as is. > Either way, a win :). I think it's a good idea. I'd really like to allow extensions to register new types of stats eventually. Stuff like pg_stat_statements having its own, fairly ... mediocre, stats storage shouldn't be necessary. Do we need to increase the stats version? I didn't check whether the order we currently store things in matches the numerical order of the stats IDs. Greetings, Andres Freund
Re: backend stuck in DataFileExtend
Hi, On 2024-05-06 12:37:26 -0500, Justin Pryzby wrote: > On Mon, May 06, 2024 at 10:04:13AM -0700, Andres Freund wrote: > > Hi, > > > > On 2024-05-06 09:05:38 -0500, Justin Pryzby wrote: > > > In March, I noticed that a backend got stuck overnight doing: > > > > > > backend_start| 2024-03-27 22:34:12.774195-07 > > > xact_start | 2024-03-27 22:34:39.741933-07 > > > query_start | 2024-03-27 22:34:41.997276-07 > > > state_change | 2024-03-27 22:34:41.997307-07 > > > wait_event_type | IO > > > wait_event | DataFileExtend > > > state| active > > > backend_xid | 330896991 > > > backend_xmin | 330896991 > > > query_id | -3255287420265634540 > > > query| PREPARE mml_1 AS INSERT INTO child.un... > > > > > > The backend was spinning at 100% CPU: > > > > > > [pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881 > > >PID WCHAN %CPU S TTY TIME COMMAND > > > 7881 ? 99.4 R ?08:14:55 postgres: telsasoft ts [local] > > > INSERT > > > > > > This was postgres 16 STABLE compiled at 14e991db8. > > > > > > It's possible that this is a rare issue that we haven't hit before. > > > It's also possible this this is a recent regression. We previously > > > compiled at b2c9936a7 without hitting any issue (possibly by chance). > > > > > > I could neither strace the process nor attach a debugger. They got > > > stuck attaching. Maybe it's possible there's a kernel issue. This is a > > > VM running centos7 / 3.10.0-1160.49.1.el7.x86_64. > > > > > $ awk '{print $14, $15}' /proc/7881/stat # usr / sys > > > 229 3088448 > > > > > > When I tried to shut down postgres (hoping to finally be able to attach > > > a debugger), instead it got stuck: > > > > That strongly indicates either a kernel bug or storage having an issue. It > > can't be postgres' fault if an IO never completes. > > Is that for sure even though wchan=? (which I take to mean "not in a system > call"), and the process is stuck in user mode ? 
Postgres doesn't do anything to prevent a debugger from working, so this is just indicative that the kernel is stuck somewhere that it didn't set up information about being blocked - because it's busy doing something. > > What do /proc/$pid/stack say? > > [pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/18468/stat > 18468 (postgres) R 2274 18468 18468 0 -1 4857920 91836 0 3985 0 364 3794271 0 > 0 20 0 1 0 6092292660 941846528 10 18446744073709551615 4194304 12848820 > 140732995870240 140732995857304 139726958536394 0 4194304 19929088 536896135 > 0 0 0 17 3 0 0 1682 0 0 14949632 15052146 34668544 140732995874457 > 140732995874511 140732995874511 140732995874781 0 stack, not stat... > > > Full disclosure: the VM that hit this issue today has had storage-level > > > errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3 > > > days ago. > > > > So indeed, my suspicion from above is confirmed. > > I'd be fine with that conclusion (as in the earlier thread), except this > has now happened on 2 different VMs, and the first one has no I/O > issues. If this were another symptom of a storage failure, and hadn't > previously happened on another VM, I wouldn't be re-reporting it. Is it the same VM hosting environment? And the same old distro? Greetings, Andres Freund
Re: backend stuck in DataFileExtend
Hi, On 2024-05-06 09:05:38 -0500, Justin Pryzby wrote: > In March, I noticed that a backend got stuck overnight doing: > > backend_start| 2024-03-27 22:34:12.774195-07 > xact_start | 2024-03-27 22:34:39.741933-07 > query_start | 2024-03-27 22:34:41.997276-07 > state_change | 2024-03-27 22:34:41.997307-07 > wait_event_type | IO > wait_event | DataFileExtend > state| active > backend_xid | 330896991 > backend_xmin | 330896991 > query_id | -3255287420265634540 > query| PREPARE mml_1 AS INSERT INTO child.un... > > The backend was spinning at 100% CPU: > > [pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881 >PID WCHAN %CPU S TTY TIME COMMAND > 7881 ? 99.4 R ?08:14:55 postgres: telsasoft ts [local] INSERT > > This was postgres 16 STABLE compiled at 14e991db8. > > It's possible that this is a rare issue that we haven't hit before. > It's also possible this this is a recent regression. We previously > compiled at b2c9936a7 without hitting any issue (possibly by chance). > > I could neither strace the process nor attach a debugger. They got > stuck attaching. Maybe it's possible there's a kernel issue. This is a > VM running centos7 / 3.10.0-1160.49.1.el7.x86_64. > $ awk '{print $14, $15}' /proc/7881/stat # usr / sys > 229 3088448 > > When I tried to shut down postgres (hoping to finally be able to attach > a debugger), instead it got stuck: That strongly indicates either a kernel bug or storage having an issue. It can't be postgres' fault if an IO never completes. What do /proc/$pid/stack say? > In both cases, the backend got stuck after 10pm, which is when a backup > job kicks off, followed by other DB maintenance. Our backup job uses > pg_export_snapshot() + pg_dump --snapshot. In today's case, the pg_dump > would've finished and snapshot closed at 2023-05-05 22:15. 
The backup > job did some more pg_dumps involving historic tables without snapshots > and finished at 01:11:40, at which point a reindex job started, but it > looks like the DB was already stuck for the purpose of reindex, and so > the script ended after a handful of commands were "[canceled] due to > statement timeout". Is it possible that you're "just" waiting for very slow IO? Is there a lot of dirty memory? Particularly on these old kernels that can lead to very extreme delays. grep -Ei 'dirty|writeback' /proc/meminfo > [...] > Full disclosure: the VM that hit this issue today has had storage-level > errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3 > days ago. So indeed, my suspicion from above is confirmed. Greetings, Andres Freund
Re: Support a wildcard in backtrace_functions
On 2024-04-26 14:39:16 -0400, Tom Lane wrote: > Andres Freund writes: > > I don't think enabling backtraces without a way to disable them is a good > > idea > > - security vulnerablilities in backtrace generation code are far from > > unheard > > of and can make error handling a lot slower... > > Well, in that case we have to have some kind of control GUC, and > I think the consensus is that the one we have now is under-designed. > So I also vote for a full revert and try again in v18. Yea, makes sense. I just wanted to point out that some level of control is needed, not say that what we have now is right.
Re: Support a wildcard in backtrace_functions
Hi, On 2024-04-19 15:24:17 -0400, Tom Lane wrote: > I can't say that I care for "backtrace_on_internal_error". > Re-reading that thread, I see I argued for having *no* GUC and > just enabling that behavior all the time. I lost that fight, > but it should have been clear that a GUC of this exact shape > is a design dead end --- and that's what we're seeing now. I don't think enabling backtraces without a way to disable them is a good idea - security vulnerablilities in backtrace generation code are far from unheard of and can make error handling a lot slower... Greetings, Andres Freund
Re: New committers: Melanie Plageman, Richard Guo
On 2024-04-26 06:54:26 -0500, Jonathan S. Katz wrote: > The Core Team would like to extend our congratulations to Melanie Plageman > and Richard Guo, who have accepted invitations to become our newest > PostgreSQL committers. > > Please join us in wishing them much success and few reverts! Congratulations!
Re: AIX support
Hi, On 2024-04-25 00:20:05 -0400, Tom Lane wrote: > Something I've been mulling over is whether to suggest that the > proposed "new port" should only target building with gcc. Yes. I also wonder if such a port should only support building with sysv style shared library support, rather than the AIX (and windows) style. That'd make it considerably less impactful on the buildsystem level. I don't know what the performance difference is these days. Greetings, Andres Freund
Re: pgsql: meson: Add initial version of meson based build system
Hi, On 2024-04-18 10:54:18 +0200, Christoph Berg wrote: > Re: Andres Freund > > > This commit broke VPATH builds when the original source directory > > > contains symlinks. > > Argh, this is missing spaces around the '=', leading to the branch always > > being entered. > > Glad I found at least something :) Yep :). I pushed a fix to that now. > I've been using this directory layout for years, apparently so far > I've always only used non-VPATH builds or dpkg-buildpackage, which > probably canonicalizes the path before building, given it works. I wonder if perhaps find's behaviour might have changed at some point? > Since no one else has been complaining, it might not be worth fixing. I'm personally not excited about working on fixing this, but if somebody else wants to spend the cycles to make this work reliably... It's certainly interesting that we have some code worrying about symlinks in configure.ac: # prepare build tree if outside source tree # Note 1: test -ef might not exist, but it's more reliable than `pwd`. # Note 2: /bin/pwd might be better than shell's built-in at getting # a symlink-free name. But we only use this to determine if we're doing a vpath build, not as the path passed to prep_buildtree... Greetings, Andres Freund
Re: GUC-ify walsender MAX_SEND_SIZE constant
Hi, On 2024-04-23 14:47:31 +0200, Jakub Wartak wrote: > On Tue, Apr 23, 2024 at 2:24 AM Michael Paquier wrote: > > > > > Any news, comments, etc. about this thread? > > > > FWIW, I'd still be in favor of doing a GUC-ification of this part, but > > at this stage I'd need more time to do a proper study of a case where > > this shows benefits to prove your point, or somebody else could come > > in and show it. > > > > Andres has objected to this change, on the ground that this was not > > worth it, though you are telling the contrary. I would be curious to > > hear from others, first, so as we gather more opinions to reach a > > consensus. I think it's a bad idea to make it configurable. It's just one more guc that nobody has a chance of realistically tuning. I'm not saying we shouldn't improve the code - just that making MAX_SEND_SIZE configurable doesn't really seem like a good answer. FWIW, I have a hard time believing that MAX_SEND_SIZE is going to be the only or even primary issue with high latency, high bandwidth storage devices. > First: it's very hard to get *reliable* replication setup for > benchmark, where one could demonstrate correlation between e.g. > increasing MAX_SEND_SIZE and observing benefits (in sync rep it is > easier, as you are simply stalled in pgbench). Part of the problem are > the following things: Depending on the workload, it's possible to measure streaming-out performance without actually regenerating WAL. E.g. by using pg_receivewal to stream the data out multiple times. Another way to get fairly reproducible WAL workloads is to drive pg_logical_emit_message() from pgbench, that tends to have a lot less variability than running tpcb-like or such. > Second: once you perform above and ensure that there are no network or > I/O stalls back then I *think* I couldn't see any impact of playing > with MAX_SEND_SIZE from what I remember as probably something else is > saturated first. 
My understanding of Majid's use-case for tuning MAX_SEND_SIZE is that the bottleneck is storage, not network. The reason MAX_SEND_SIZE affects that is that it determines the max size passed to WALRead(), which in turn determines how much we read from the OS at once. If the storage has high latency but also high throughput, and readahead is disabled or just not aggressive enough after crossing segment boundaries, larger reads reduce the number of times you're likely to be blocked waiting for read IO. Which is also why I think that making MAX_SEND_SIZE configurable is a really poor proxy for improving the situation. We're imo much better off working on read_stream.[ch] support for reading WAL. Greetings, Andres Freund
Re: gcc 12.1.0 warning
Hi, On 2024-04-15 11:25:05 +0300, Nazir Bilal Yavuz wrote: > I am able to reproduce this. I regenerated the debian bookworm image > and ran CI on REL_15_STABLE with this image. > > CI Run: https://cirrus-ci.com/task/4978799442395136 Hm, not sure why I wasn't able to repro - now I can. It actually seems like a legitimate warning: The caller allocates the key as

static struct config_generic *
find_option(const char *name, bool create_placeholders, bool skip_errors, int elevel)
{
    const char **key = &name;

and then does

    res = (struct config_generic **) bsearch((void *) &key,
                                             (void *) guc_variables,
                                             num_guc_variables,
                                             sizeof(struct config_generic *),
                                             guc_var_compare);

while guc_var_compare() assumes it's being passed a full config_generic:

static int
guc_var_compare(const void *a, const void *b)
{
    const struct config_generic *confa = *(struct config_generic *const *) a;
    const struct config_generic *confb = *(struct config_generic *const *) b;

    return guc_name_compare(confa->name, confb->name);
}

which several versions of gcc then complain about:

In function ‘guc_var_compare’,
    inlined from ‘bsearch’ at /usr/include/x86_64-linux-gnu/bits/stdlib-bsearch.h:33:23,
    inlined from ‘find_option’ at /home/andres/src/postgresql-15/src/backend/utils/misc/guc.c:5640:35:
/home/andres/src/postgresql-15/src/backend/utils/misc/guc.c:5727:38: warning: array subscript ‘const struct config_generic[0]’ is partly outside array bounds of ‘const char[8]’ [-Warray-bounds=]
 5727 |         return guc_name_compare(confa->name, confb->name);
      |                                 ~^~
/home/andres/src/postgresql-15/src/backend/utils/misc/guc.c: In function ‘find_option’:
/home/andres/src/postgresql-15/src/backend/utils/misc/guc.c:5627:25: note: object ‘name’ of size 8
 5627 | find_option(const char *name, bool create_placeholders, bool skip_errors,

Which seems entirely legitimate. ISTM that guc_var_compare() ought to only cast the pointers to the key type, i.e. char *. And incidentally that does prevent the warning. 
The reason it doesn't happen in newer versions of postgres is that we aren't using guc_var_compare() in the relevant places anymore... Greetings, Andres Freund
Re: fix tablespace handling in pg_combinebackup
Hi, On 2024-04-22 18:10:17 -0400, Robert Haas wrote: > cfbot is giving me a bunch of green check marks, so I plan to commit > this version, barring objections. Makes sense. > I shall leave redesign of the symlink mess as a matter for others to > ponder; I'm not in love with what we have, but I think it will be > tricky to do better, and I don't think I want to spend time on it, at > least not now. Oh, yea, that's clearly a much bigger and separate project... Greetings, Andres Freund
Re: fix tablespace handling in pg_combinebackup
Hi, On 2024-04-23 09:15:27 +1200, Thomas Munro wrote: > I find myself wondering if symlinks should go on the list of "things > we pretended Windows had out of convenience, that turned out to be > more inconvenient than we expected, and we'd do better to tackle > head-on with a more portable idea". Yes, I think the symlink design was pretty clearly a mistake. > Perhaps we could just use a tablespace map file instead to do our own path > construction, or something like that. I suspect that would be one of those > changes that is technically easy, but community-wise hard as it affects a > load of backup tools and procedures...) Yea. I wonder if we could do a somewhat transparent upgrade by creating a file alongside each tablespace that contains the path or such. Greetings, Andres
Re: subscription/026_stats test is intermittently slow?
Hi, On 2024-04-19 13:57:41 -0400, Robert Haas wrote: > Have others seen this? Is there something we can/should do about it? Yes, I've also seen this - but never quite reproducible enough to properly tackle it. The first thing I'd like to do is to make the wait_for_catchup routine regularly log the current state, so we can in retrospect analyze e.g. whether there was continual, but slow, replay progress, or whether replay was entirely stuck. wait_for_catchup() not being debuggable has been a problem in many different tests, so I think it's high time to fix that. Greetings, Andres Freund
Re: ECPG cleanup and fix for clang compile-time problem
On 2024-04-18 23:11:52 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2024-04-18 22:18:34 -0400, Tom Lane wrote: > >> (If our code coverage tools worked on bison/flex stuff, > >> maybe this'd be less scary ... but they don't.) > > > For bison coverage seems to work, see e.g.: > > Yeah, I'd just noticed that --- I had it in my head that we'd put > LCOV_EXCL_START/STOP into bison files too, but nope they are only > in flex files. That's good for this specific problem, because the > code I'm worried about is all in the bison file. At least locally the coverage seems to make sense too, both for the main grammar and for ecpg's. > > around the scanner "body". Without that I get reasonable-looking, albeit > not > > very comforting, coverage for pgc.l as well. > I was just looking locally at what I got by removing that, and sadly > I don't think I believe it: there are a lot of places where it claims > we hit lines we don't, and vice versa. That might be partially > blamable on old tools on my RHEL8 workstation, but it sure seems > that flex output confuses lcov to some extent. Hm. Here it mostly looks reasonable, except that at least things seem off by 1. And sure enough, if I look at pgc.l it has code like

case 2:
YY_RULE_SETUP
#line 465 "/home/andres/src/postgresql/src/interfaces/ecpg/preproc/pgc.l"
{
        token_start = yytext;
        state_before_str_start = YYSTATE;

However line 465 is actually the "token_start" line. Further down this seems to get worse, by "<>" it's off by 4 lines.

$ apt policy flex
flex:
  Installed: 2.6.4-8.2+b2
  Candidate: 2.6.4-8.2+b2
  Version table:
 *** 2.6.4-8.2+b2 500
        500 http://mirrors.ocf.berkeley.edu/debian unstable/main amd64 Packages
        100 /var/lib/dpkg/status

Greetings, Andres Freund
Re: ECPG cleanup and fix for clang compile-time problem
Hi, On 2024-04-18 22:18:34 -0400, Tom Lane wrote: > Here is a patch series that aims to clean up ecpg's preprocessor > code a little and fix the compile time problems we're seeing with > late-model clang [1]. I guess whether it's a cleanup is in the eye of > the beholder, but it definitely succeeds at fixing compile time: for > me, the time needed to compile preproc.o with clang 16 drops from > 104 seconds to less than 1 second. It might be a little faster at > processing input too, though that wasn't the primary goal. Nice! I'll look at this more later. For now I just wanted to point out one minor detail: > (If our code coverage tools worked on bison/flex stuff, > maybe this'd be less scary ... but they don't.) For bison coverage seems to work, see e.g.: https://coverage.postgresql.org/src/interfaces/ecpg/preproc/preproc.y.gcov.html#10638 I think the only reason it doesn't work for flex is that we have /* LCOV_EXCL_START */ /* LCOV_EXCL_STOP */ around the scanner "body". Without that I get reasonable-looking, albeit not very comforting, coverage for pgc.l as well.

                                      |Lines       |Functions|Branches
Filename                              |Rate     Num|Rate  Num|Rate Num
src/interfaces/ecpg/preproc/pgc.l     |65.9%    748|87.5%   8|-      0
src/interfaces/ecpg/preproc/preproc.y |29.9%   4964|66.7%  15|-      0

This has been introduced by

commit 421167362242ce1fb46d6d720798787e7cd65aad
Author: Peter Eisentraut
Date:   2017-08-10 23:33:47 -0400

    Exclude flex-generated code from coverage testing

    Flex generates a lot of functions that are not actually used. In order
    to avoid coverage figures being ruined by that, mark up the part of the
    .l files where the generated code appears by lcov exclusion markers.
    That way, lcov will typically only reported on coverage for the .l
    file, which is under our control, but not for the .c file.

    Reviewed-by: Michael Paquier

but I don't think it's working as intended, as it's also preventing coverage for the actual scanner definition. Greetings, Andres Freund
Re: AIX support
Hi, On 2024-04-18 11:15:43 +, Sriram RK wrote: > We (IBM-AIX team) looked into this issue > > https://www.postgresql.org/message-id/20240225194322...@rfd.leadboat.com > > This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0) > have issues. But we verified that this issue is resolved with the newer > compiler versions openXL(xlc17.1) and gcc(12.0) onwards. The reason we used these compilers was that those were the only ones we had kinda somewhat reasonable access to, via the gcc project's "compile farm" https://portal.cfarm.net/ We have to rely on whatever the aix machines there provide. They're not particularly plentiful resource-wise either. This is generally one of the big issues with AIX support. There are other niche-y OSs that don't have a lot of users, e.g. openbsd, but in contrast to AIX I can just start an openbsd VM within a few minutes and iron out whatever portability issue I'm dealing with. Not being AIX customers we also can't raise bugs about compiler bugs, so we're stuck doing bad workarounds. > Also as part of the support, we will help in fixing all the issues related > to AIX and continue to support AIX for Postgres. If we need another CI > environment we can work to make one available. But for time being can we > start reverting the patch that has removed AIX support. The state the AIX support was in when it was removed is not one that I am OK with adding back. > We want to make a note that postgres is used extensively in our IBM product > and is being exploited by multiple customers. To be blunt: Then it'd have been nice to see some investment in that before now. Both on the code level and the infrastructure level (i.e. access to machines etc). Greetings, Andres Freund
Re: fix tablespace handling in pg_combinebackup
Hi, On 2024-04-18 09:03:21 -0400, Robert Haas wrote: > On Wed, Apr 17, 2024 at 5:50 PM Andres Freund wrote: > > > +If there are tablespace present in the backup, include tablespace_map as > > > +a keyword parameter whose values is a hash. When tar_program is used, the > > > +hash keys are tablespace OIDs; otherwise, they are the tablespace > > > pathnames > > > +used in the backup. In either case, the values are the tablespace > > > pathnames > > > +that should be used for the target cluster. > > > > Where would one get these oids? > > You pretty much have to pick them out of the tar file names. It sucks, > but it's not this patch's fault. I was really just remarking on this from the angle of a test writer. I know that our interfaces around this suck... For tests, do we really need to set anything on a per-tablespace basis? Can't we instead just reparent all of them to a new directory? > I wonder if we (as a project) would consider a patch that redesigned > this whole mechanism. Produce ${TABLESPACE_NAME}.tar in tar-format, > instead of ${OID}.tar. In directory-format, relocate via > -T${TABLESPACE_NAME}=${DIR} instead of -T${SERVERDIR}=${DIR}. That > would be a significant compatibility break, and you'd somehow need to > solve the problem of what to put in the tablespace_map file, which > requires OIDs. But it seems like if you could finesse that issue in > some elegant way, the result would just be a heck of a lot more usable > than what we have today. For some things that'd definitely be nicer - but not sure it work well for everything. Consider cases where you actually have external directories on different disks, and you want to restore a backup after some data loss. Now you need to list all the tablespaces separately, to put them back into their own location. One thing I've been wondering about is an option to put the "new" tablespaces into a location relative to each of the old ones. 
--tablespace-relative-location=../restore-2024-04-18 which would rewrite all the old tablespaces to that new location. > > Could some of this be simplified by using allow_in_place_tablespaces > > instead? > > Looks like it'd simplify at least the extended test somewhat? > > I don't think we can afford to assume that allow_in_place_tablespaces > doesn't change the behavior. I think we can't assume that absolutely everywhere, but we don't need to test it in a lot of places. > I said (at least off-list) when that feature was introduced that there was > no way it was going to remain an isolated development hack, and I think > that's proved to be 100% correct. Hm, I guess I kinda agree. But not because I think it wasn't good for development, but because it'd often be much saner to use relative tablespaces than absolute ones even for prod. My only point here was that the test would be simpler if you a) didn't need to create a temp directory for the tablespace, both for primary and standby b) didn't need to "gin up" a tablespace map, because all the paths are relative Just to be clear: I don't want the above to block merging your test. If you think you want to do it the way you did, please do. Greetings, Andres Freund
Re: plenty code is confused about function level static
Hi, On 2024-04-18 09:07:43 -0300, Ranier Vilela wrote: > On 18/04/2024 00:39, Andres Freund wrote: > >There are lots of places that could benefit from adding 'static > >const'. > > I found a few more places. Good catches. > Patch 004 > > The opposite would also help, adding static. > In these places, I believe it is safe to add static, > allowing the compiler to transform into read-only, definitively. I don't think this would even compile? E.g. LockTagTypeNames, pg_wchar_table are declared in a header and used across translation units. Greetings, Andres Freund
Re: plenty code is confused about function level static
Hi, On 2024-04-18 10:33:30 +0200, Peter Eisentraut wrote: > > Attached are fixes for struct option and a few more occurrences I've found > > with a bit of grepping. > > These look good to me. Thoughts about when to apply these? Arguably they're fixing mildly broken code, making it appropriate to fix in 17, but it's also something that we could end up fixing for a while... There are some variations of this that are a bit harder to fix, btw. We have

objdump -j .data -t src/backend/postgres|sort -k5
...
01474d00 g     O .data  15f0 ConfigureNamesReal
01479a80 g     O .data  1fb0 ConfigureNamesEnum
01476300 g     O .data  3778 ConfigureNamesString
...
014682e0 g     O .data  5848 ConfigureNamesBool
0146db40 g     O .data  71c0 ConfigureNamesInt

Not that that's all *that* much these days, but it's still pretty silly to use ~80kB of memory in every postgres instance just because we didn't set conf->gen.vartype = PGC_BOOL; etc at compile time. Large modifiable arrays with callbacks are also quite useful for exploitation, as one doesn't need to figure out precise addresses. Greetings, Andres Freund
Re: Speed up clean meson builds by ~25%
On 2024-04-17 23:10:53 -0400, Tom Lane wrote: > Jelte Fennema-Nio writes: > > As I expected this problem was indeed fairly easy to address by still > > building "backend/parser" before "interfaces". See attached patch. > > I think we should hold off on this. I found a simpler way to address > ecpg's problem than what I sketched upthread. I have a not-ready-to- > show-yet patch that allows the vast majority of ecpg's grammar > productions to use the default semantic action. Testing on my M1 > Macbook with clang 16.0.6 from MacPorts, I see the compile time for > preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. That's pretty amazing.
Re: pgsql: meson: Add initial version of meson based build system
Hi,

Uh, huh.

On 2024-04-17 15:42:28 +0200, Christoph Berg wrote:
> Re: Andres Freund
> > https://git.postgresql.org/pg/commitdiff/e6927270cd18d535b77cbe79c55c6584351524be
>
> This commit broke VPATH builds when the original source directory
> contains symlinks.

I.e. a symlink to the source directory, not a symlink inside the source
directory.

> Given there are no other changes I think this bit is at fault:
>
> > Modified Files
> > --
> > configure.ac | 6 +
>
> +# Ensure that any meson build directories would reconfigure and see that
> +# there's a conflicting in-tree build and can error out.
> +if test "$vpath_build"="no"; then
> +	touch meson.build
> +fi

Argh, this is missing spaces around the '=', leading to the branch always
being entered.

What I don't understand is how that could possibly affect the
prep_buildtree step, which happens earlier.

Hm. Uh, I don't think it does? Afaict this failure is entirely unrelated to
'touch meson.build'? From what I can tell the problem is that
config/prep_buildtree is invoked with the symlinked path, and that that
doesn't seem to work:

bash -x /home/andres/src/postgresql-via-symlink/config/prep_buildtree /home/andres/src/postgresql-via-symlink .
++ basename /home/andres/src/postgresql-via-symlink/config/prep_buildtree
+ me=prep_buildtree
+ help='Usage: prep_buildtree sourcetree [buildtree]'
+ test -z /home/andres/src/postgresql-via-symlink
+ test x/home/andres/src/postgresql-via-symlink = x--help
+ unset CDPATH
++ cd /home/andres/src/postgresql-via-symlink
++ pwd
+ sourcetree=/home/andres/src/postgresql-via-symlink
++ cd .
++ pwd
+ buildtree=/tmp/pgs
++ find /home/andres/src/postgresql-via-symlink -type d '(' '(' -name CVS -prune ')' -o '(' -name .git -prune ')' -o -print ')'
++ grep -v '/home/andres/src/postgresql-via-symlink/doc/src/sgml/\+'
++ find /home/andres/src/postgresql-via-symlink -name Makefile -print -o -name GNUmakefile -print
++ grep -v /home/andres/src/postgresql-via-symlink/doc/src/sgml/images/
+ exit 0

Note that the find does not return anything.

Greetings,

Andres Freund
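For readers unfamiliar with the shell pitfall in the quoted configure.ac
hunk: test only performs a comparison when the '=' is a separate word. A
standalone sketch (the variable name is reused from the quote purely for
illustration):

```shell
vpath_build=yes

# Without spaces, "$vpath_build"="no" expands to the single word
# "yes=no"; test with one non-empty argument is always true, so the
# branch is entered regardless of the variable's value.
if test "$vpath_build"="no"; then
    echo "entered (buggy)"
fi

# With spaces, test sees three arguments and actually compares them.
if test "$vpath_build" = "no"; then
    echo "entered"
else
    echo "not entered (correct)"
fi
```

This is why shellcheck and friends flag unspaced '=' inside test.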
Re: fix tablespace handling in pg_combinebackup
Hi,

On 2024-04-17 16:16:55 -0400, Robert Haas wrote:
> In the "Differential code coverage between 16 and HEAD" thread, Andres
> pointed out that there wasn't test case coverage for
> pg_combinebackup's code to handle files in tablespaces. I looked at
> adding that, and as nobody could possibly have predicted, found a bug.

Ha ;)

> @@ -787,8 +787,13 @@ Does not start the node after initializing it.
>
>  By default, the backup is assumed to be plain format.  To restore from
>  a tar-format backup, pass the name of the tar program to use in the
> -keyword parameter tar_program.  Note that tablespace tar files aren't
> -handled here.
> +keyword parameter tar_program.
> +
> +If there are tablespace present in the backup, include tablespace_map as
> +a keyword parameter whose values is a hash. When tar_program is used, the
> +hash keys are tablespace OIDs; otherwise, they are the tablespace pathnames
> +used in the backup. In either case, the values are the tablespace pathnames
> +that should be used for the target cluster.

Where would one get these oids?

Could some of this be simplified by using allow_in_place_tablespaces
instead? Looks like it'd simplify at least the extended test somewhat?

Greetings,

Andres Freund
plenty code is confused about function level static
Hi,

We have a fair amount of code that uses non-constant function-level static
variables for read-only data. Which makes little sense - it prevents the
compiler from understanding

a) that the data is read only and can thus be put into a segment that's
   shared between all invocations of the program

b) that the data will be the same on every invocation, and thus from
   optimizing based on that.

The most common example of this is that all our binaries use

    static struct option long_options[] = { ... };

which prevents long_options from being put into read-only memory.

Is there some reason we went for this pattern in a fair number of places? I
assume it's mostly copy-pasta, but...

In practice it often is useful to use 'static const' instead of just
'const'. At least gcc otherwise sometimes fills in the data on the stack,
instead of having a read-only data member that's already initialized. I'm
not sure why, tbh.

Attached are fixes for struct option and a few more occurrences I've found
with a bit of grepping.

There are lots of places that could benefit from adding 'static const'.
E.g. most, if not all, HASHCTLs should be that, but that's more verbose to
change, so I didn't do that.
Greetings,

Andres Freund

>From d43a10b5ba46b010c8e075b1062f9f30eb013498 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Wed, 17 Apr 2024 14:27:03 -0700
Subject: [PATCH v1 1/3] static const: Convert struct option arrays

---
 src/bin/initdb/initdb.c                       | 2 +-
 src/bin/pg_amcheck/pg_amcheck.c               | 2 +-
 src/bin/pg_archivecleanup/pg_archivecleanup.c | 2 +-
 src/bin/pg_basebackup/pg_basebackup.c         | 2 +-
 src/bin/pg_basebackup/pg_createsubscriber.c   | 2 +-
 src/bin/pg_basebackup/pg_receivewal.c         | 2 +-
 src/bin/pg_basebackup/pg_recvlogical.c        | 2 +-
 src/bin/pg_checksums/pg_checksums.c           | 2 +-
 src/bin/pg_combinebackup/pg_combinebackup.c   | 2 +-
 src/bin/pg_controldata/pg_controldata.c       | 2 +-
 src/bin/pg_ctl/pg_ctl.c                       | 2 +-
 src/bin/pg_dump/pg_dump.c                     | 2 +-
 src/bin/pg_dump/pg_dumpall.c                  | 2 +-
 src/bin/pg_resetwal/pg_resetwal.c             | 2 +-
 src/bin/pg_rewind/pg_rewind.c                 | 2 +-
 src/bin/pg_test_fsync/pg_test_fsync.c         | 2 +-
 src/bin/pg_test_timing/pg_test_timing.c       | 2 +-
 src/bin/pg_upgrade/option.c                   | 2 +-
 src/bin/pg_verifybackup/pg_verifybackup.c     | 2 +-
 src/bin/pg_waldump/pg_waldump.c               | 2 +-
 src/bin/pg_walsummary/pg_walsummary.c         | 2 +-
 src/bin/pgbench/pgbench.c                     | 2 +-
 src/bin/psql/startup.c                        | 2 +-
 src/bin/scripts/clusterdb.c                   | 2 +-
 src/bin/scripts/createdb.c                    | 2 +-
 src/bin/scripts/createuser.c                  | 2 +-
 src/bin/scripts/dropdb.c                      | 2 +-
 src/bin/scripts/dropuser.c                    | 2 +-
 src/bin/scripts/pg_isready.c                  | 2 +-
 src/bin/scripts/reindexdb.c                   | 2 +-
 src/bin/scripts/vacuumdb.c                    | 2 +-
 contrib/btree_gist/btree_interval.c           | 2 +-
 contrib/oid2name/oid2name.c                   | 2 +-
 contrib/vacuumlo/vacuumlo.c                   | 2 +-
 src/interfaces/ecpg/preproc/ecpg.c            | 2 +-
 src/test/regress/pg_regress.c                 | 2 +-
 36 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 30e17bd1d1e..5d4044d5a90 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -3093,7 +3093,7 @@ initialize_data_directory(void)
 int
 main(int argc, char *argv[])
 {
-	static struct option long_options[] = {
+	static const struct option long_options[] = {
 		{"pgdata", required_argument, NULL, 'D'},
 		{"encoding", required_argument, NULL, 'E'},
 		{"locale", required_argument, NULL, 1},
diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 7e3101704d4..bb100022a0d 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c
@@ -233,7 +233,7 @@ main(int argc, char *argv[])
 	uint64		relprogress = 0;
 	int			pattern_id;
 
-	static struct option long_options[] = {
+	static const struct option long_options[] = {
 		/* Connection options */
 		{"host", required_argument, NULL, 'h'},
 		{"port", required_argument, NULL, 'p'},
diff --git a/src/bin/pg_archivecleanup/pg_archivecleanup.c b/src/bin/pg_archivecleanup/pg_archivecleanup.c
index 07bf356b70c..6cdf471597e 100644
--- a/src/bin/pg_archivecleanup/pg_archivecleanup.c
+++ b/src/bin/pg_archivecleanup/pg_archivecleanup.c
@@ -285,7 +285,7 @@ usage(void)
 int
 main(int argc, char **argv)
 {
-	static struct option long_options[] = {
+	static const struct option long_options[] = {
 		{"clean-backup-history", no_argument, NULL, 'b'},
 		{"debug", no_argument, NULL, 'd'},
 		{"dry-run&q
Re: Removing GlobalVisTestNonRemovableHorizon
On 2024-04-16 07:32:55 +0900, Michael Paquier wrote:
> On Mon, Apr 15, 2024 at 11:57:20AM -0700, Andres Freund wrote:
> > GlobalVisTestNonRemovableHorizon()/GlobalVisTestNonRemovableFullHorizon()
> > only existed for snapshot_too_old - but that was removed in f691f5b80a8.
> > I'm inclined to think we should remove those functions for 17. No new
> > code should use them.
>
> RMT hat on. Feel free to go ahead and clean up that now. No
> objections from here as we don't want to take the risk of this stuff
> getting more used in the wild.

Cool. Pushed the removal.
Re: Removing GlobalVisTestNonRemovableHorizon
Hi, On 2024-04-15 15:13:51 -0400, Robert Haas wrote: > It would of course have been nice to have done this sooner, but I don't > think waiting for next release cycle will make anything better. I don't really know how it could have been discovered sooner. We don't have any infrastructure for finding code that's not used anymore. And even if we had something finding symbols that aren't referenced within the backend, we have lots of functions that are just used by extensions, which would thus appear unused. In my local build we have several hundred functions that are not used within the backend, according to -Wl,--gc-sections,--print-gc-sections. A lot of that is entirely expected stuff, like RegisterXactCallback(). But there are also long-unused things like TransactionIdIsActive(). Greetings, Andres Freund
Re: Stack overflow issue
Hi,

On 2024-04-17 14:39:14 +0300, Alexander Korotkov wrote:
> On Wed, Apr 17, 2024 at 2:37 PM Alexander Korotkov wrote:
> > I've invested more time into polishing this. I'm intended to push
> > this. Could you, please, take a look before?
>
> Just after sending this I spotted a typo s/untill/until/. The updated
> version is attached.

Nice, I see you moved the code back to "where it was", the diff to 16 got
smaller this way.

> +	/*
> +	 * Repeatedly call CommitTransactionCommandInternal() until all the work
> +	 * is done.
> +	 */
> +	while (!CommitTransactionCommandInternal());

Personally I'd use { } instead of just ; here. The above scans weirdly for
me. But it's also not important.

Greetings,

Andres Freund
Re: documentation structure
Hi,

On 2024-04-17 12:07:24 +0100, Dagfinn Ilmari Mannsåker wrote:
> Andres Freund writes:
> > I think the manual work for writing signatures in sgml is not
> > insignificant, nor is the volume of sgml for them. Manually maintaining
> > the signatures makes it impractical to significantly improve the
> > presentation - which I don't think is all that great today.
>
> And it's very inconsistent. For example, some functions use
> tags for optional parameters, others use square brackets, and some use
> VARIADIC to indicate variadic parameters, others use
> ellipses (sometimes in tags or brackets).

That seems almost inevitably the outcome of many people having to manually
infer the recommended semantics, for writing something boring but
nontrivial, from a 30k line file.

> > And the lack of argument names in the pg_proc entries is occasionally
> > fairly annoying, because a \df+ doesn't provide enough information to
> > use functions.
>
> I was also annoyed by this the other day (specifically wrt. the boolean
> arguments to pg_ls_dir),

My bane is regexp_match et al, I have given up on remembering the argument
order.

> and started whipping up a Perl script to parse func.sgml and generate
> missing proargnames values for pg_proc.dat, which is how I discovered the
> above.

Nice.

> The script currently has a pile of hacky regexes to cope with that,
> so I'd be happy to submit a doc patch to turn it into actual markup to get
> rid of that, if people think that's a worhtwhile use of time and won't
> clash with any other plans for the documentation.

I guess it's a bit hard to say without knowing how voluminous the changes
would be. If we end up rewriting the whole file the tradeoff is less clear
than if it's a dozen inconsistent entries.

> > It'd also be quite useful if clients could render more of the
> > documentation for functions. People are used to language servers
> > providing full documentation for functions etc...
>
> A more user-friendly version of \df+ (maybe spelled \hf, for symmetry
> with \h for commands?) would certainly be nice.

Indeed.

Greetings,

Andres Freund
Re: documentation structure
Hi, On 2024-04-17 02:46:53 -0400, Corey Huinker wrote: > > > This sounds to me like it would be a painful exercise with not a > > > lot of benefit in the end. > > > > Maybe we could _verify_ the contents of func.sgml against pg_proc. > > > > All of the functions redefined in catalog/system_functions.sql complicate > using pg_proc.dat as a doc generator or source of validation. We'd probably > do better to validate against a live instance, and even then the benefit > wouldn't be great. There are 80 'CREATE OR REPLACE's in system_functions.sql, 1016 occurrences of func_table_entry in funcs.sgml and 3.3k functions in pg_proc. I'm not saying that differences due to system_functions.sql wouldn't be annoying to deal with, but it'd also be far from the end of the world. Greetings, Andres Freund
Re: documentation structure
Hi,

On 2024-04-16 15:05:32 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I think we should work on generating a lot of func.sgml. Particularly
> > the signature etc should just come from pg_proc.dat, it's pointlessly
> > painful to generate that by hand. And for a lot of the functions we
> > should probably move the existing func.sgml comments to the description
> > in pg_proc.dat.
>
> Where are you going to get the examples and text descriptions from?

I think there are a few different ways to do that. E.g. having long_desc,
example fields in pg_proc.dat. Or having examples and description in a
separate file and "enriching" that with auto-generated function signatures.

> (And no, I don't agree that the pg_description string should match
> what's in the docs. The description string has to be a short
> one-liner in just about every case.)

Definitely shouldn't be the same in all cases, but I think there's a decent
number of cases where they can be the same. The differences between the two
are often minimal today.

Entirely randomly chosen example:

{ oid => '2825',
  descr => 'slope of the least-squares-fit linear equation determined by the (X, Y) pairs',
  proname => 'regr_slope', prokind => 'a', proisstrict => 'f',
  prorettype => 'float8', proargtypes => 'float8 float8',
  prosrc => 'aggregate_dummy' },

and

    regression slope
    regr_slope

    regr_slope ( Y double precision, X double precision )
    double precision

    Computes the slope of the least-squares-fit linear equation determined
    by the (X, Y) pairs.

    Yes

The description is quite similar; the pg_proc entry lacks argument names.

> This sounds to me like it would be a painful exercise with not a
> lot of benefit in the end.

I think the manual work for writing signatures in sgml is not
insignificant, nor is the volume of sgml for them. Manually maintaining the
signatures makes it impractical to significantly improve the presentation -
which I don't think is all that great today.

And the lack of argument names in the pg_proc entries is occasionally
fairly annoying, because a \df+ doesn't provide enough information to use
functions.

It'd also be quite useful if clients could render more of the documentation
for functions. People are used to language servers providing full
documentation for functions etc...

Greetings,

Andres Freund
Re: Differential code coverage between 16 and HEAD
Hi, On 2024-04-15 18:23:21 -0700, Jeff Davis wrote: > On Mon, 2024-04-15 at 17:05 -0700, Andres Freund wrote: > > Can't we test this as part of the normal testsuite? > > One thing that complicates things a bit is that the test compares the > results against ICU, so a mismatch in Unicode version between ICU and > Postgres can cause test failures. The test ignores unassigned code > points, so normally it just results in less-exhaustive test coverage. > But sometimes things really do change, and that would cause a failure. Hm, that seems annoying, even for update-unicode :/. But I guess it won't be very common to have such failures? > Stepping back a moment, my top worry is really not to test those C > functions, but to test the perl code that parses the text files and > generates those arrays. Imagine a future Unicode version does something > that the perl scripts didn't anticipate, and they fail to add array > entries for half the code points, or something like that. By testing > the arrays generated from freshly-parsed files exhaustively against > ICU, then we have a good defense against that. That situation really > only comes up when updating Unicode. That's a good point. > That's not to say that the C code shouldn't be tested, of course. Maybe > we can just do some spot checks for the functions that are reachable > via SQL and get rid of the functions that aren't yet reachable (and re- > add them when they are)? Yes, I think that'd be a good start. I don't think we necessarily need exhaustive coverage, just a bit more coverage than we have. > > I don't at all like that the tests depend on downloading new unicode > > data. What if there was an update but I just want to test the current > > state? > > I was mostly following the precedent for normalization. Should we > change that, also? Yea, I think we should. But I think it's less urgent if we end up testing more of the code without those test binaries. 
I don't immediately know what dependencies would be best, tbh. Greetings, Andres Freund
Re: documentation structure
Hi, On 2024-03-19 17:39:39 -0400, Andrew Dunstan wrote: > My own pet docs peeve is a purely editorial one: func.sgml is a 30k line > beast, and I think there's a good case for splitting out at least the > larger chunks of it. I think we should work on generating a lot of func.sgml. Particularly the signature etc should just come from pg_proc.dat, it's pointlessly painful to generate that by hand. And for a lot of the functions we should probably move the existing func.sgml comments to the description in pg_proc.dat. I suspect that we can't just generate all the documentation from pg_proc, because of xrefs etc. Although perhaps we could just strip those out for pg_proc. We'd need to add some more metadata to pg_proc, for grouping kinds of functions together. But that seems doable. Greetings, Andres Freund
Re: Table AM Interface Enhancements
Hi, On 2024-04-16 08:31:24 -0400, Robert Haas wrote: > On Tue, Apr 16, 2024 at 6:52 AM Alexander Korotkov > wrote: > > Taking a closer look at acquire_sample_rows(), I think it would be > > good if table AM implementation would care about block-level (or > > whatever-level) sampling. So that acquire_sample_rows() just fetches > > tuples one-by-one from table AM implementation without any care about > > blocks. Possible table_beginscan_analyze() could take an argument of > > target number of tuples, then those tuples are just fetches with > > table_scan_analyze_next_tuple(). What do you think? > > Andres is the expert here, but FWIW, that plan seems reasonable to me. > One downside is that every block-based tableam is going to end up with > a very similar implementation, which is kind of something I don't like > about the tableam API in general: if you want to make something that > is basically heap plus a little bit of special sauce, you have to copy > a mountain of code. Right now we don't really care about that problem, > because we don't have any other tableams in core, but if we ever do, I > think we're going to find ourselves very unhappy with that aspect of > things. But maybe now is not the time to start worrying. That problem > isn't unique to analyze, and giving out-of-core tableams the > flexibility to do what they want is better than not. I think that can partially be addressed by having more "block oriented AM" helpers in core, like we have for table_block_parallelscan*. Doesn't work for everything, but should for something like analyze. Greetings, Andres Freund
Re: Table AM Interface Enhancements
On 2024-04-16 13:33:53 +0300, Alexander Korotkov wrote: > Reverted. Thanks!
Re: Issue with the PRNG used by Postgres
Hi, On 2024-04-15 10:54:16 -0400, Robert Haas wrote: > On Fri, Apr 12, 2024 at 3:33 PM Andres Freund wrote: > > Here's a patch implementing this approach. I confirmed that before we > > trigger > > the stuck spinlock logic very quickly and after we don't. However, if most > > sleeps are interrupted, it can delay the stuck spinlock detection a good > > bit. But that seems much better than triggering it too quickly. > > +1 for doing something about this. I'm not sure if it goes far enough, > but it definitely seems much better than doing nothing. One thing I started to be worried about is whether a patch ought to prevent the timeout used by perform_spin_delay() from increasing when interrupted. Otherwise a few signals can trigger quite long waits. But as a I can't quite see a way to make this accurate in the backbranches, I suspect something like what I posted is still a good first version. > Given your findings, I'm honestly kind of surprised that I haven't seen > problems of this type more frequently. Same. I did a bunch of searches for the error, but found surprisingly little. I think in practice most spinlocks just aren't contended enough to reach perform_spin_delay(). And we have improved some on that over time. Greetings, Andres Freund
Re: Stack overflow issue
Hi, On 2024-04-16 15:45:42 +0300, Alexander Korotkov wrote: > On Tue, Apr 16, 2024 at 1:48 AM Andres Freund wrote: > > On 2024-03-06 14:17:23 +0200, Alexander Korotkov wrote: > > > 0001 Turn tail recursion into iteration in CommitTransactionCommand() > > > I did minor revision of comments and code blocks order to improve the > > > readability. > > > > After sending > > https://www.postgresql.org/message-id/20240414223305.m3i5eju6zylabvln%40awork3.anarazel.de > > I looked some more at important areas where changes didn't have code > > coverage. One thing I noticed was that the "non-internal" part of > > AbortCurrentTransaction() is uncovered: > > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/transam/xact.c.gcov.html#L3403 > > > > Which made me try to understand fefd9a3fed2. I'm a bit confused about why > > some parts are handled in > > CommitCurrentTransaction()/AbortCurrentTransaction() > > and others are in the *Internal functions. > > > > I understand that fefd9a3fed2 needed to remove the recursion in > > CommitTransactionCommand()/AbortCurrentTransaction(). But I don't understand > > why that means having some code in in the non-internal and some in the > > internal functions? Wouldn't it be easier to just have all the state > > handling > > code in the Internal() function and just break after the > > CleanupSubTransaction() calls? > > I'm not sure I correctly get what you mean. Do you think the attached > patch matches the direction you're pointing? The patch itself is not > final, it requires cleanup and comments revision, just to check the > direction. Something like that, yea. The split does seem less confusing that way to me, but also not 100% certain. Greetings, Andres Freund
Re: Differential code coverage between 16 and HEAD
Hi,

On 2024-04-16 13:50:14 +1200, David Rowley wrote:
> I think primarily it's good to exercise that code just to make sure it
> does not crash. Looking at the output of the above on my machine:

Agreed.

>      name      | ident |     parent     | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes
> ---------------+-------+----------------+-------+-------------+---------------+------------+-------------+------------
>  Caller tuples |       | TupleSort sort |     6 |     1056848 |             3 |       8040 |           0 |    1048808
> (1 row)
>
> I'd say:
>
> Name: stable
> ident: stable
> parent: stable
> level: could change from a refactor of code
> total_bytes: could be different on other platforms or dependent on
>   MEMORY_CONTEXT_CHECKING
> total_nblocks: stable enough
> free_bytes: could be different on other platforms or dependent on
>   MEMORY_CONTEXT_CHECKING
> free_chunks: always 0
> used_bytes: could be different on other platforms or dependent on
>   MEMORY_CONTEXT_CHECKING

I think total_nblocks might also not be entirely stable? How about just
checking if total_bytes, total_nblocks, free_bytes and used_bytes are
bigger than 0?

> I cut the 2nd row down to just 512 bytes as I didn't see the need to
> add two large datums.

Agreed, I just quickly hacked the statement up based on your earlier one.
Looks good to me, either testing the other columns with > 0 or as you have
it.

> > Hm, independent of this, seems a bit odd that we don't include the
> > memory context type in pg_backend_memory_contexts?
>
> That seems like useful information to include. It sure would be
> useful to have in there to verify that I'm testing BumpStats(). I've
> written a patch [2].

Nice!

Greetings,

Andres Freund
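A stability-oriented version of the test query along those lines might look
like this (a sketch; column names as shown in the quoted output):

```sql
-- Assert only properties expected to be stable across platforms and
-- MEMORY_CONTEXT_CHECKING settings.
SELECT name, parent,
       total_bytes   > 0 AS total_bytes_gt_0,
       total_nblocks > 0 AS total_nblocks_gt_0,
       free_bytes    > 0 AS free_bytes_gt_0,
       used_bytes    > 0 AS used_bytes_gt_0
FROM pg_backend_memory_contexts
WHERE name = 'Caller tuples';
```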
Re: Bugs in ecpg's macro mechanism
Hi,

On 2024-04-15 20:47:16 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2024-04-15 17:48:32 -0400, Tom Lane wrote:
> >> But I have no idea about making it work in meson. Any suggestions?
>
> > So you just want to compile define.c twice? The below should suffice:
>
> > -  'define': ['-DCMDLINESYM=123'],
> > +  'define': ['-DCMDLINESYM=123', files('define.pgc')],
>
> Ah, thanks. I guess this depends on getopt_long reordering arguments
> (since the "-o outfile" bit will come later). That is safe enough
> in HEAD since 411b72034, but it might fail on weird platforms in v16.
> How much do we care about that? (We can avoid that hazard in the
> makefile build easily enough.)

Oh, I didn't even think of that. If we do care, we can just move the -o to
earlier. Or just officially add it as another input, that'd just be a bit
of notational overhead.

As moving the arguments around would just be the following, I see no reason
not to just do so.

diff --git i/src/interfaces/ecpg/test/meson.build w/src/interfaces/ecpg/test/meson.build
index c1e508ccc82..d7c0e9de7d6 100644
--- i/src/interfaces/ecpg/test/meson.build
+++ w/src/interfaces/ecpg/test/meson.build
@@ -45,9 +45,10 @@ ecpg_preproc_test_command_start = [
   '--regression',
   '-I@CURRENT_SOURCE_DIR@',
   '-I@SOURCE_ROOT@' + '/src/interfaces/ecpg/include/',
+  '-o', '@OUTPUT@',
 ]
 ecpg_preproc_test_command_end = [
-  '-o', '@OUTPUT@', '@INPUT@'
+  '@INPUT@',
 ]
 
 ecpg_test_dependencies = []

Greetings,

Andres Freund
Re: What's our minimum ninja version?
Hi,

On 2024-04-15 20:26:49 -0400, Tom Lane wrote:
> installation.sgml says our minimum meson version is 0.54, but it's
> silent on what the minimum ninja version is. What RHEL8 supplies
> doesn't work for me:

Yea. There was some thread where we'd noticed that, I think you were on
that too.

We could probably work around the issue, if we needed to, possibly at the
price of a bit more awkward notation. But I'm inclined to just document it.

> $ meson setup build
> ...
> Found ninja-1.8.2 at /usr/bin/ninja
> ninja: error: build.ninja:7140: multiple outputs aren't (yet?) supported by
> depslog; bring this up on the mailing list if it affects you
>
> WARNING: Could not create compilation database.
>
> That's not a huge problem in itself, but I think we should figure
> out what's the minimum version that works, and document that.

Looks like it's 1.10, released 2020-01-27.

Greetings,

Andres Freund
Re: Differential code coverage between 16 and HEAD
Hi,

On 2024-04-15 16:53:48 -0700, Jeff Davis wrote:
> On Sun, 2024-04-14 at 15:33 -0700, Andres Freund wrote:
> > - Coverage for some of the new unicode code is pretty poor:
> >
> >   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/unicode_category.c.gcov.html#L122
>
> Thank you for looking. Those functions are tested by category_test.c
> which is run with the 'update-unicode' target.

Testing just during update-unicode doesn't strike me as great - that way
portability issues wouldn't be found. And if it were tested that way,
coverage would understand it too. I can just include update-unicode when
running coverage, but that doesn't seem great.

Can't we test this as part of the normal testsuite?

I don't at all like that the tests depend on downloading new unicode data.
What if there was an update but I just want to test the current state?

Greetings,

Andres Freund
Re: Time to back-patch libxml deprecation fixes?
Hi, On 2024-04-15 19:14:22 -0400, Tom Lane wrote: > I think the right answer is to > back-patch Michael's 65c5864d7 (xml2: Replace deprecated routines with > recommended ones). We speculated about that at the time (see e.g., > 400928b83) but didn't pull the trigger. I think 65c5864d7 has now > baked long enough that it'd be safe to back-patch. Looks like a reasonable plan to me. Greetings, Andres Freund
Re: Bugs in ecpg's macro mechanism
Hi,

On 2024-04-15 17:48:32 -0400, Tom Lane wrote:
> I started looking into the ideas discussed at [1] about reimplementing
> ecpg's string handling. Before I could make any progress I needed
> to understand the existing input code, part of which is the macro
> expansion mechanism ... and the more I looked at that the more bugs
> I found, not to mention that it uses misleading field names and is
> next door to uncommented.

As part of the discussion leading to [1] I had looked at parse.pl and found
it fairly impressively obfuscated and devoid of helpful comments.

> I found two ways to crash ecpg outright and several more cases in which
> it'd produce surprising behavior.

:/

> One thing it's missing is any test of the behavior when command-line
> macro definitions are carried from one file to the next one. To test
> that, we'd need to compile more than one ecpg input file at a time. I
> can see how to kluge the Makefiles to make that happen, basically this'd
> do:
>
> define.c: define.pgc $(ECPG_TEST_DEPENDENCIES)
> -	$(ECPG) -DCMDLINESYM=123 -o $@ $<
> +	$(ECPG) -DCMDLINESYM=123 -o $@ $< $<
>
> But I have no idea about making it work in meson. Any suggestions?

So you just want to compile define.c twice? The below should suffice:

diff --git i/src/interfaces/ecpg/test/sql/meson.build w/src/interfaces/ecpg/test/sql/meson.build
index e04684065b0..202dc69c6ea 100644
--- i/src/interfaces/ecpg/test/sql/meson.build
+++ w/src/interfaces/ecpg/test/sql/meson.build
@@ -31,7 +31,7 @@ pgc_files = [
 ]
 
 pgc_extra_flags = {
-  'define': ['-DCMDLINESYM=123'],
+  'define': ['-DCMDLINESYM=123', files('define.pgc')],
   'oldexec': ['-r', 'questionmarks'],
 }

I assume that was just a test hack, because it leads to the build failing
because of main being duplicated. But it'd work the same with another,
"non overlapping", file.

Greetings,

Andres Freund
Re: Differential code coverage between 16 and HEAD
Hi,

On 2024-04-16 10:26:57 +1200, David Rowley wrote:
> On Mon, 15 Apr 2024 at 10:33, Andres Freund wrote:
> > - The new bump allocator has a fair amount of uncovered functionality:
> >
> >   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L293
>
> The attached adds a test to tuplesort to exercise BumpAllocLarge()

Cool.

> >   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L613
>
> I don't see a way to exercise those. They're meant to be "can't
> happen" ERRORs. I could delete them and use BogusFree, BogusRealloc,
> BogusGetChunkContext, BogusGetChunkSpace instead, but the ERROR
> message would be misleading. I think it's best just to leave this.

I guess I was thinking more about BumpIsEmpty() and BumpStats() than the
"bogus" cases. But BumpIsEmpty() likely is unreachable as well.

BumpStats() is reachable, but perhaps it's not worth it?

BEGIN;
DECLARE foo CURSOR FOR
  SELECT LEFT(a,10),b
  FROM (VALUES(REPEAT('a', 512 * 1024),1),(REPEAT('b', 512 * 1024),2)) v(a,b)
  ORDER BY v.a DESC;
FETCH 1 FROM foo;
SELECT * FROM pg_backend_memory_contexts WHERE name = 'Caller tuples';

Hm, independent of this, seems a bit odd that we don't include the memory
context type in pg_backend_memory_contexts?

Greetings,

Andres Freund
Re: Stack overflow issue
Hi,

On 2024-03-06 14:17:23 +0200, Alexander Korotkov wrote:
> 0001 Turn tail recursion into iteration in CommitTransactionCommand()
> I did minor revision of comments and code blocks order to improve the
> readability.

After sending
https://www.postgresql.org/message-id/20240414223305.m3i5eju6zylabvln%40awork3.anarazel.de
I looked some more at important areas where changes didn't have code
coverage. One thing I noticed was that the "non-internal" part of
AbortCurrentTransaction() is uncovered:
https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/transam/xact.c.gcov.html#L3403

Which made me try to understand fefd9a3fed2. I'm a bit confused about why
some parts are handled in
CommitCurrentTransaction()/AbortCurrentTransaction() and others are in the
*Internal functions.

I understand that fefd9a3fed2 needed to remove the recursion in
CommitTransactionCommand()/AbortCurrentTransaction(). But I don't
understand why that means having some code in the non-internal and some in
the internal functions? Wouldn't it be easier to just have all the state
handling code in the Internal() function and just break after the
CleanupSubTransaction() calls?

That's of course largely unrelated to the coverage aspects. I just got
curious.

Greetings,

Andres Freund
Re: Differential code coverage between 16 and HEAD
Hi, On 2024-04-15 15:36:04 -0400, Robert Haas wrote: > On Sun, Apr 14, 2024 at 6:33 PM Andres Freund wrote: > > - Some of the new walsummary code could use more tests. > > > > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/backup/walsummaryfuncs.c.gcov.html#L69 > > So this is pg_wal_summary_contents() and > pg_get_wal_summarizer_state(). I was reluctant to try to cover these > because I thought it would be hard to get the tests to be stable. The > difficulties in stabilizing src/bin/pg_walsummary/t/002_blocks.pl seem > to demonstrate that this concern wasn't entire unfounded, but as far > as I know that test is now stable, so we could probably use the same > technique to test pg_wal_summary_contents(), maybe even as part of the > same test case. I don't really know what a good test for > pg_get_wal_summarizer_state() would look like, though. I think even just reaching the code, without a lot of of verification of the returned data, is better than not reaching the code at all. I.e. the test could just check that the pid is set, the tli is right. That'd also add at least some coverage of https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/postmaster/walsummarizer.c.gcov.html#L433 > > > > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_combinebackup/pg_combinebackup.c.gcov.html#L424 > > I guess we could test this by adding a tablespace, and a tablespace > mapping, to one of the pg_combinebackup tests. Seems worthwhile to me. > > > > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/blkreftable.c.gcov.html#L790 > > This is dead code. I thought we might need to use this as a way of > managing memory pressure, but it didn't end up being needed. We could > remove it, or mark it #if NOT_USED, or whatever. Don't really have an opinion on that. How likely do you think we'll need it going forward? 
Note that I didn't look exhaustively through the coverage of the walsummarizer code - I just looked at a few things that stood out. I looked for a few minutes more: - It seems worth explicitly covering the various record types that walsummarizer needs to understand: https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/postmaster/walsummarizer.c.gcov.html#L1184 i.e. XLOG_SMGR_TRUNCATE, XLOG_XACT_COMMIT_PREPARED, XLOG_XACT_ABORT, XLOG_XACT_ABORT_PREPARED. - Another thing that looks to be not covered is dealing with enabling/disabling summarize_wal, that also seems worth testing? Greetings, Andres Freund
Re: Table AM Interface Enhancements
Hi, On 2024-04-15 16:02:00 -0400, Robert Haas wrote: > On Mon, Apr 15, 2024 at 3:47 PM Andres Freund wrote: > > That said, I don't like the state after applying > > https://postgr.es/m/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com > > because there's too much coupling. Hence talking about needing to iterate on > > the interface in some form, earlier in the thread. > > Mmph, I can't follow what the actual state of things is here. Are we > waiting for Alexander to push that patch? Is he waiting for somebody > to sign off on that patch? I think Alexander is arguing that we shouldn't revert 27bc1772fc & dd1f6b0c17 in 17. I already didn't think that was an option, because I didn't like the added interfaces, but now am even more certain, given how broken dd1f6b0c17 seems to be: https://postgr.es/m/20240415201057.khoyxbwwxfgzomeo%40awork3.anarazel.de > Do you want that patch applied, not applied, or applied with some set of > modifications? I think we should apply Alexander's proposed revert and then separately discuss what we should do about 041b96802ef. > I find the discussion of "too much coupling" too abstract. I want to > get down to specific proposals for what we should change, or not > change. I think it's a bit hard to propose something concrete until we've decided whether we'll revert 27bc1772fc & dd1f6b0c17. Greetings, Andres Freund
Re: Table AM Interface Enhancements
Hi, On 2024-04-12 01:04:03 +0300, Alexander Korotkov wrote: > 1) If we just apply my revert patch and leave c6fc50cb4028 and > 041b96802ef in the tree, then we get our table AM API narrowed. As > you expressed the current API requires block numbers to be 1:1 with > the actual physical on-disk location [2]. Not a secret I think the > current API is quite restrictive. And we're getting the ANALYZE > interface narrower than it was since 737a292b5de. Frankly speaking, I > don't think this is acceptable. As others already pointed out, c6fc50cb4028 was committed quite a while ago. I'm fairly unhappy about c6fc50cb4028, fwiw, but didn't realize that until it was too late. > In token of all of the above, is the in-tree state that bad? (if we > abstract the way 27bc1772fc and dd1f6b0c17 were committed). To me, 27bc1772fc doesn't make much sense on its own. You added calls directly to heapam internals in a file in src/backend/commands/; that just doesn't make sense. Leaving that aside, I think the interface isn't good on its own: table_relation_analyze() doesn't actually do anything, it just sets callbacks that are then later called from analyze.c, which doesn't at all fit the name of the callback/function. I realize that this is kinda cribbed from the FDW code, but I don't think that is a particularly good excuse. I don't think dd1f6b0c17 improves the situation at all. 
It sets global variables to redirect how an individual acquire_sample_rows invocation works:

void
block_level_table_analyze(Relation relation,
                          AcquireSampleRowsFunc *func,
                          BlockNumber *totalpages,
                          BufferAccessStrategy bstrategy,
                          ScanAnalyzeNextBlockFunc scan_analyze_next_block_cb,
                          ScanAnalyzeNextTupleFunc scan_analyze_next_tuple_cb)
{
    *func = acquire_sample_rows;
    *totalpages = RelationGetNumberOfBlocks(relation);
    vac_strategy = bstrategy;
    scan_analyze_next_block = scan_analyze_next_block_cb;
    scan_analyze_next_tuple = scan_analyze_next_tuple_cb;
}

Notably it does so within the ->relation_analyze tableam callback, which does *not* actually do anything other than returning a callback. So if ->relation_analyze() for another relation is called, the acquire_sample_rows() for the earlier relation will do something different. Note that this isn't a theoretical risk: acquire_inherited_sample_rows() actually collects the acquirefunc for all the inherited relations before calling any of them.

This is honestly leaving me somewhat speechless.

Greetings,

Andres Freund
Re: Table AM Interface Enhancements
Hi, On 2024-04-15 23:14:01 +0400, Pavel Borisov wrote: > Why it makes a difference looks a little bit unclear to me, I can't comment > on this. I noticed that before 041b96802ef we had a block number and block > sampler state that tied acquire_sample_rows() to the actual block > structure. That, and the prefetch calls actually translating the block numbers 1:1 to physical locations within the underlying file. And before 041b96802ef they were tied much more closely by the direct calls to heapam added in 27bc1772fc81. > After we have the whole struct ReadStream which doesn't comprise just a > wrapper for the same variables, but the state that ties > acquire_sample_rows() to the streaming read algorithm (and heap). Yes ... ? I don't see how that is a meaningful difference to the state as of 27bc1772fc81. Nor fundamentally worse than the state 27bc1772fc81^, given that we already issued requests for specific blocks in the file. That said, I don't like the state after applying https://postgr.es/m/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com because there's too much coupling. Hence talking about needing to iterate on the interface in some form, earlier in the thread. What are you actually arguing for here? Greetings, Andres Freund
Removing GlobalVisTestNonRemovableHorizon
Hi, GlobalVisTestNonRemovableHorizon()/GlobalVisTestNonRemovableFullHorizon() only existed for snapshot_too_old - but that was removed in f691f5b80a8. I'm inclined to think we should remove those functions for 17. No new code should use them. Greetings, Andres Freund
Differential code coverage between 16 and HEAD
Hi,

To see how well we're doing testing newly introduced code, I computed the differential code coverage between REL_16_STABLE and HEAD. While arguably comparing HEAD to the merge-base between REL_16_STABLE and HEAD would be more accurate, I chose REL_16_STABLE because we've backpatched bugfixes with tests etc.

I first got some nonsensical differences. That turns out to be due to immediate shutdowns in the tests, which
a) can lose coverage, e.g. there were no hits for most of walsummarizer.c, because the test always shuts it down immediately
b) can cause corrupted coverage files if a process is shut down while writing out coverage files

I partially worked around a) by writing out coverage files during abnormal shutdowns. That requires some care, I'll send a separate email about that. I worked around b) by rerunning tests until that didn't occur.

The differential code coverage view in lcov is still somewhat raw. I had to weaken two error checks to get it to succeed in postgres. You can hover over the code coverage columns to get more details about what the various three letter acronyms mean. The most important ones are:
- UNC - uncovered new code, i.e. we added code that's not tested
- LBC - lost baseline coverage, i.e. previously tested code isn't anymore
- UBC - untested baseline, i.e. code that's still not tested
- GBC - gained baseline coverage - unchanged code that's now tested
- GNC - gained new coverage - new code that's tested

https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/

This includes "branch coverage" - I'm not sure that's worth the additional clutter it generates.

Looking at the differential coverage results, at least the following seem notable:

- We added a bit less uncovered code than last year, but it's not quite a fair comparison, because I ran the numbers for 16 on 2023-04-08. Since the feature freeze, 17's coverage has improved by a few hundred lines (8225c2fd40c). 
- A good bit of the newly uncovered code is in branches that are legitimately hard to reach (unlikely errors etc).

- Some of the new walsummary code could use more tests.
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/backup/walsummaryfuncs.c.gcov.html#L69
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_combinebackup/pg_combinebackup.c.gcov.html#L424
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/blkreftable.c.gcov.html#L790

- The new buffer eviction paths aren't tested at all:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/storage/buffer/bufmgr.c.gcov.html#L6023
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/contrib/pg_buffercache/pg_buffercache_pages.c.gcov.html#L356
  It looks like it should be fairly trivial to test at least the basics?

- Coverage for some of the new unicode code is pretty poor:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/unicode_category.c.gcov.html#L122

- Some of the new nbtree code could use a bit more tests:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/nbtree/nbtutils.c.gcov.html#L1468

- Our coverage of the non-default copy modes of pg_upgrade, pg_combinebackup is nonexistent, and that got worse with the introduction of a new method this release:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_upgrade/file.c.gcov.html#L360
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_upgrade/file.c.gcov.html#L400
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_combinebackup/copy_file.c.gcov.html#L209

- Code coverage of acl.c is atrocious and got worse. 
- The new bump allocator has a fair amount of uncovered functionality:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L293
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L613

- A lot of the new resowner functions aren't covered, but I guess the equivalent functionality wasn't covered before, either:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/cache/catcache.c.gcov.html#L2317
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/cache/relcache.c.gcov.html#L6868
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/storage/buffer/bufmgr.c.gcov.html#L3608
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/storage/buffer/bufmgr.c.gcov.html#L5978

...

Greetings,

Andres Freund
Re: Parallel CREATE INDEX for BRIN indexes
Hi, While preparing a differential code coverage report between 16 and HEAD, one thing that stands out is the parallel brin build code. Neither on coverage.postgresql.org nor locally is that code reached during our tests. https://coverage.postgresql.org/src/backend/access/brin/brin.c.gcov.html#2333 Greetings, Andres Freund
Re: gcc 12.1.0 warning
Hi, On 2023-10-27 13:09:01 +0300, Nazir Bilal Yavuz wrote: > I was testing 'upgrading CI Debian images to bookworm'. I tested the > bookworm on REL_15, REL_16 and upstream. REL_16 and upstream finished > successfully but the CompilerWarnings task failed on REL_15 with the > same error. Is that still the case? I briefly tried to repro this outside of CI and couldn't reproduce the warning. I'd really like to upgrade the CI images, it doesn't seem great to continue using oldstable. > gcc version: 12.2.0 > > CI Run: https://cirrus-ci.com/task/6151742664998912 Unfortunately the logs aren't accessible anymore, so I can't check the precise patch level of the compiler and/or the precise invocation used. Greetings, Andres Freund
ci: Allow running mingw tests by default via environment variable
Hi, We have CI support for mingw, but don't run the task by default, as it eats up precious CI credits. However, for cfbot we are using custom compute resources (currently donated by google), so we can afford to run the mingw tests. Right now that'd require cfbot to patch .cirrus.tasks.yml. While one can manually trigger a manual task in one's own repo, most won't have the permissions to do so for cfbot. I propose that we instead run the task automatically if $REPO_MINGW_TRIGGER_BY_DEFAULT is set, typically in cirrus' per-repository configuration. Unfortunately that's somewhat awkward to do in the cirrus-ci yaml configuration, as the set of conditional expressions supported is very simplistic. To deal with that, I extended .cirrus.star to compute the required environment variable. If $REPO_MINGW_TRIGGER_BY_DEFAULT is set, CI_MINGW_TRIGGER_TYPE is set to 'automatic'; if not, it's 'manual'. We've also talked in other threads about adding CI support for 1) windows, building with visual studio 2) linux, with musl libc 3) free/netbsd That becomes more enticing if we can enable them by default on cfbot but not elsewhere. With this change, it'd be easy to add further variables to control such future tasks. I also attached a second commit that makes the "code" dealing with ci-os-only in .cirrus.tasks.yml simpler. While I think it does nicely simplify .cirrus.tasks.yml, overall it adds lines, possibly making this not worth it. I'm somewhat on the fence. Thoughts? On the code level, I thought it'd be good to have a common prefix for all the automatically set variables. Right now that's CI_, but I'm not at all wedded to that. 
Greetings,

Andres Freund

>From 5d26ecfedbcbc83b4cb6e41a34c1af9a3ab24cdb Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Fri, 12 Apr 2024 15:04:32 -0700
Subject: [PATCH v2 1/2] ci: Allow running mingw tests by default via
 environment variable

---
 .cirrus.yml         | 12 ++--
 .cirrus.star        | 39 +++
 .cirrus.tasks.yml   | 12 ++--
 src/tools/ci/README | 12
 4 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index a83129ae46d..f270f61241f 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -10,12 +10,20 @@
 #
 # 1) the contents of this file
 #
-# 2) if defined, the contents of the file referenced by the, repository
+# 2) computed environment variables
+#
+#    Used to enable/disable tasks based on the execution environment. See
+#    .cirrus.star: compute_environment_vars()
+#
+# 3) if defined, the contents of the file referenced by the, repository
 #    level, REPO_CI_CONFIG_GIT_URL variable (see
 #    https://cirrus-ci.org/guide/programming-tasks/#fs for the accepted
 #    format)
 #
-# 3) .cirrus.tasks.yml
+#    This allows running tasks in a different execution environment than the
+#    default, e.g. to have sufficient resources for cfbot.
+#
+# 4) .cirrus.tasks.yml
 #
 # This composition is done by .cirrus.star

diff --git a/.cirrus.star b/.cirrus.star
index 36233872d1e..7a164bb00de 100644
--- a/.cirrus.star
+++ b/.cirrus.star
@@ -7,7 +7,7 @@ https://github.com/bazelbuild/starlark/blob/master/spec.md
 See also .cirrus.yml and src/tools/ci/README
 """

-load("cirrus", "env", "fs")
+load("cirrus", "env", "fs", "yaml")


 def main():
@@ -18,19 +18,36 @@ def main():

     1) the contents of .cirrus.yml

-    2) if defined, the contents of the file referenced by the, repository
+    2) computed environment variables
+
+    3) if defined, the contents of the file referenced by the, repository
        level, REPO_CI_CONFIG_GIT_URL variable (see
        https://cirrus-ci.org/guide/programming-tasks/#fs for the accepted
        format)

-    3) .cirrus.tasks.yml
+    4) .cirrus.tasks.yml
     """

     output = ""

     # 1) is evaluated implicitly

+    # Add 2)
+    additional_env = compute_environment_vars()
+    env_fmt = """
+###
+# Computed environment variables start here
+###
+{0}
+###
+# Computed environment variables end here
+###
+"""
+    output += env_fmt.format(yaml.dumps({'env': additional_env}))
+
+
+    # Add 3)
     repo_config_url = env.get("REPO_CI_CONFIG_GIT_URL")
     if repo_config_url != None:
         print("loading additional configuration from \"{}\"".format(repo_config_url))
@@ -38,12 +55,26 @@ def main():
     else:
         output += "\n# REPO_CI_CONFIG_URL was not set\n"

-    # Add 3)
+
+    # Add 4)
     output += config_from(".cirrus.tasks.yml")

+
     return output


+def compute_environment_vars():
+    cenv = {}
+
+    # See definition of mingw task in .cirrus.tasks.yml
+    if env.get("REPO_MINGW_TRIGGER_BY_DEFAULT") in ['true', '1', 'yes']:
+        cenv['CI_MINGW_TRIGGER_TYPE'] = 'automatic'
Re: Simplify documentation related to Windows builds
Hi, On 2024-04-12 10:27:01 +0900, Michael Paquier wrote: > Yeah. These days I personally just go through stuff like Chocolatey > or msys2 to get all my dependencies, or even a minimal set of them. I > suspect that most folks hanging around on pgsql-hackers do that as > well. Did that work with openssl for you? Because it didn't for me, when I tried that for CI. I didn't find it easy to find a working openssl for msvc, and when I did, it was on a page that could easily just have been some phishing attempt. Because of that I don't think we should remove the link to https://slproweb.com/products/Win32OpenSSL.html > So, yes, you're right that removing completely this list may be too > aggressive for the end-user. As far as I can see, there are a few > things that stand out: > - Diff is not mentioned in the list of dependencies on the meson page, > and it may not exist by default on Windows. I think that we should > add it. That seems quite basic compared to everything else. But I'm also not opposed. I guess it might be worth checking if diff is present during meson configure, so a missing diff doesn't just show up as some weird error. I didn't really think about that when writing the meson stuff, because it's just a hardcoded command in pg_regress.c, not something that's visible to src/tools/msvc, configure or such. > - We don't use activeperl anymore in the buildfarm, and recommending > it is not a good idea based on the state of the project. If we don't > remove the entry, I would switch it to point to msys perl or even > strawberry perl. Andres has expressed concerns about the strawberry > part, so perhaps mentioning only msys perl would be enough? It's nonobvious enough to install that I think it's worth keeping something. I tried at some point, and unfortunately the perl from the git-for-windows install doesn't quite work. It needs to be a perl targeting ucrt (or perhaps some other native target). 
> > So I think that it's pretty darn helpful to have some installation > > instructions in the documentation for stuff like this, just like I > > think it's useful that in the documentation index we tell people how > > to get the doc toolchain working on various platforms. FWIW, here are the mingw install commands to install a suitable environment for building postgres on windows with mingw, from the automated image generation for CI: https://github.com/anarazel/pg-vm-images/blob/main/scripts/windows_install_mingw64.ps1#L21-L22 I wonder if we should maintain something like that list somewhere in the postgres repo instead... Greetings, Andres Freund