Re: commitfest.postgresql.org is no longer fit for purpose
On Sun, May 19, 2024 at 10:50 PM Thomas Munro wrote:
> Sometimes I question the sanity of the whole thing. Considering
> cfbot's original "zero-effort CI" goal (or I guess "zero-extra-effort"
> would be better), I was curious about what other projects had the same
> idea, or whether we're really just starting at the "wrong end", and
> came up with:
>
> https://github.com/getpatchwork/patchwork
> http://vger.kernel.org/bpfconf2022_material/lsfmmbpf2022-bpf-ci.pdf <-- example user
> https://github.com/patchew-project/patchew
>
> Actually cfbot requires more effort than those, because it's driven
> first by Commitfest app registration. Those projects are extremists
> IIUC: just write to a mailing list, no other bureaucracy at all (at
> least for most participants, presumably administrators can adjust the
> status in some database when things go wrong?). We're actually
> halfway to Gitlab et al already, with a web account and interaction
> required to start the process of submitting a patch for consideration.
> What I'm less clear on is who else has come up with the "bitrot" test
> idea, either at the mailing list or web extremist ends of the scale.
> Those are also generic tools, and cfbot obviously knows lots of things
> about PostgreSQL, like the "highlights" and probably more things I'm
> forgetting.

For what it's worth, a few years before cfbot, I had privately attempted a similar idea for Postgres [1]. The project is basically a very simple API and infrastructure for running builds and make check. A previous version [2] subscribed to the mailing lists and used Travis CI (and accidentally spammed some Postgres committers [3]). The project petered out as my work responsibilities shifted (and, to be honest, after I felt sheepish about the spamming). I think cfbot is way, way ahead of where my project got at this point.
But since you asked about other similar projects, I'm happy to discuss further if it's helpful to bounce ideas off someone who's thought about the same problem (though not for a while now, I admit).

Thanks,
Maciek

[1]: https://github.com/msakrejda/pg-quilter
[2]: https://github.com/msakrejda/pg-quilter/blob/2038d9493f9aa7d43d3eb0aec1d299b94624602e/lib/pg-quilter/git_harness.rb
[3]: https://www.postgresql.org/message-id/flat/CAM3SWZQboGoVYAJNoPMx%3DuDLE%2BZh5k2MQa4dWk91YPGDxuY-gQ%40mail.gmail.com#e24bf57b77cfb6c440c999c018c46e92
Re: commitfest.postgresql.org is no longer fit for purpose
Thanks for raising this. As someone who is only modestly familiar with Postgres internals or even C, but would still like to contribute through review, I find the current process of finding a suitable patch both tedious and daunting. The Reviewing a Patch article on the wiki [0] says reviews like mine are still welcome, but it's hard to get started. I'd love to see this be more approachable.

Thanks,
Maciek

[0]: https://wiki.postgresql.org/wiki/Reviewing_a_Patch
Re: Possibility to disable `ALTER SYSTEM`
On Wed, Mar 27, 2024, 11:46 Robert Haas wrote:
> On Wed, Mar 27, 2024 at 1:12 PM Isaac Morland wrote:
> > On Wed, 27 Mar 2024 at 13:05, Greg Sabino Mullane wrote:
> > > > The purpose of the setting is to prevent accidental modifications
> > > > via ALTER SYSTEM in environments where
> > >
> > > The emphasis on 'accidental' seems a bit heavy here, and odd. Surely,
> > > just "to prevent modifications via ALTER SYSTEM in environments
> > > where..." is enough?
> >
> > Not necessarily disagreeing, but it's very important nobody ever mistake
> > this for a security feature. I don't know if the extra word "accidental"
> > is necessary, but I think that's the motivation.
>
> I think the emphasis is entirely warranted in this case.

+1. And while "non-malicious" may technically be more correct, I don't think it's any clearer.
Re: Possibility to disable `ALTER SYSTEM`
On Mon, Mar 18, 2024 at 10:27 AM Robert Haas wrote:
> Right, we're adding this because of environments like Kubernetes,
> which isn't a version, but it is a platform, or at least a deployment
> mode, which is why I thought of that section. I think for now we
> should just file this under "Other platforms and clients," which only
> has one existing setting. If the number of settings of this type
> grows, we can split it out.

Fair enough, +1.

> Using enable_* as code for "this is a planner GUC" is a pretty stupid
> pattern, honestly, but I agree with you that it's long-established and
> we probably shouldn't deviate from it lightly. Perhaps just rename to
> allow_alter_system?

+1
Re: Possibility to disable `ALTER SYSTEM`
On Mon, Mar 18, 2024 at 7:12 AM Jelte Fennema-Nio wrote:
> On Mon, 18 Mar 2024 at 13:57, Robert Haas wrote:
> > I would have been somewhat inclined to find an existing section
> > of postgresql.auto.conf for this parameter, perhaps "platform and
> > version compatibility".
>
> I tried to find an existing section, but I couldn't find any that this
> new GUC would fit into naturally. "Version and Platform Compatibility
> / Previous PostgreSQL Versions" (the one you suggested) seems wrong
> too. The GUCs there are to get back to Postgres behaviour from
> previous versions. So that section would only make sense if we'd turn
> enable_alter_system off by default (which obviously no-one in this
> thread suggests/wants).
>
> If you have another suggestion for an existing category that we should
> use, feel free to share. But imho, none of the existing ones are a
> good fit.

+1 on Version and Platform Compatibility. Maybe it just needs a new subsection there? This is for compatibility with a "deployment platform". The "Platform and Client Compatibility" subsection has just one entry, so a new subsection with also just one entry seems defensible — maybe just "Deployment Compatibility"? I think it's also plausible that there will be other similar settings for managed deployments in the future.

> > Even if that is what we're going to do, do we want to call them "guard
> > rails"? I'm not sure I'd find that name terribly clear, as a user.
>
> If anyone has a better suggestion, I'm happy to change it.

No better suggestion at the moment, but while I used the term to explain the feature, I also don't think it's a great official name. For one thing, the section could easily be misinterpreted as guard rails for end users who are new to Postgres. Also, I think it's more colloquial in tone than Postgres docs conventions allow. Further, I think we may want to change the GUC name itself.
All the other GUCs that start with enable_ control planner behavior:

maciek=# select name from pg_settings where name like 'enable_%';
              name
--------------------------------
 enable_async_append
 enable_bitmapscan
 enable_gathermerge
 enable_hashagg
 enable_hashjoin
 enable_incremental_sort
 enable_indexonlyscan
 enable_indexscan
 enable_material
 enable_memoize
 enable_mergejoin
 enable_nestloop
 enable_parallel_append
 enable_parallel_hash
 enable_partition_pruning
 enable_partitionwise_aggregate
 enable_partitionwise_join
 enable_presorted_aggregate
 enable_seqscan
 enable_sort
 enable_tidscan
(21 rows)

Do we really want to break that pattern?
Re: Possibility to disable `ALTER SYSTEM`
On Thu, Mar 14, 2024 at 1:38 PM Robert Haas wrote:
> On Thu, Mar 14, 2024 at 4:08 PM Tom Lane wrote:
> > The patch-of-record contains no such wording.
>
> I plan to fix that, if nobody else beats me to it.
>
> > And if this isn't a
> > security feature, then what is it? If you have to say to your
> > (super) users "please don't mess with the system configuration",
> > you might as well just trust them not to do it the easy way as not
> > to do it the hard way. If they're untrustworthy, why have they
> > got superuser?
>
> I mean, I feel like this question has been asked and answered before,
> multiple times, on this thread. If you sincerely don't understand the
> use case, I can try again to explain it. But somehow I feel like it's
> more that you just don't like the idea, which is fair, but it seems
> like a considerable number of people feel otherwise.

I know I'm jumping into a long thread here, but I've been following it out of interest. I'm sympathetic to the use case: I used to work at a Postgres cloud provider, and while our system intentionally did not give our end users superuser privileges, I can imagine other managed environments where that's not an issue.

I'd like to give answering this question another shot, because I think this has been a persistent misunderstanding in this thread, and I don't think the answer has been made all that clear: it's not a security feature; it's a usability feature.

It's a usability feature because, when Postgres configuration is managed by an outside mechanism (e.g., as in a Kubernetes environment), ALTER SYSTEM currently allows a superuser to make changes that appear to work, but may be discarded at some point in the future when that outside mechanism updates the config. They may also be represented incorrectly in a management dashboard if that dashboard is based on the values in the outside configuration mechanism, rather than on values directly from Postgres.
In this case, the end user with access to Postgres superuser privileges presumably also has access to the outside configuration mechanism. The goal is not to prevent them from changing settings, but to offer guard rails that prevent them from changing settings in a way that will be unstable (revertible by a future update) or confusing (not showing up in a management UI).

There are challenges here in making sure this is _not_ seen as a security feature. But I do think the feature itself is sensible and worthwhile.

Thanks,
Maciek
Re: Printing backtrace of postgres processes
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

I'm not sure if this actually still needs review, but it's marked as such in the CF app, so I'm reviewing it in the hopes of moving it along.

The feature works as documented. The docs say "This function is supported only if PostgreSQL was built with the ability to capture backtraces, otherwise it will emit a warning." I'm not sure what building with the ability to capture backtraces entails, but it worked with no special config on my machine. I don't have much C experience, so I don't know if this is something that should have more context in a README somewhere, or if it's likely someone who's interested in this will already know what to do.

The code looks fine to me.

I tried running make installcheck-world, but it failed on 17 tests. However, master also fails here on 17 tests. A normal make check-world passes on both branches. I assume I'm doing something wrong and would appreciate any pointers [0].

Based on my review and Daniel's comment above, I'm marking this as Ready for Committer.

Thanks,
Maciek

[0] My regression.diffs has errors like

```
diff -U3 /home/maciek/code/aux/postgres/src/test/regress/expected/copyselect.out /home/maciek/code/aux/postgres/src/test/regress/results/copyselect.out
--- /home/maciek/code/aux/postgres/src/test/regress/expected/copyselect.out	2023-01-02 12:21:10.792646101 -0800
+++ /home/maciek/code/aux/postgres/src/test/regress/results/copyselect.out	2024-01-14 15:04:07.513887866 -0800
@@ -131,11 +131,6 @@
 2
  ?column? 
 ----------
-        3
-(1 row)
-
- ?column? 
-----------
         4
 (1 row)
```

and

```
diff -U3 /home/maciek/code/aux/postgres/src/test/regress/expected/create_table.out /home/maciek/code/aux/postgres/src/test/regress/results/create_table.out
--- /home/maciek/code/aux/postgres/src/test/regress/expected/create_table.out	2023-10-02 22:14:02.583377845 -0700
+++ /home/maciek/code/aux/postgres/src/test/regress/results/create_table.out	2024-01-14 15:04:09.037890710 -0800
@@ -854,8 +854,6 @@
  b      | integer |           | not null | 1       | plain   |              | 
 Partition of: parted FOR VALUES IN ('b')
 Partition constraint: ((a IS NOT NULL) AND (a = 'b'::text))
-Not-null constraints:
-    "part_b_b_not_null" NOT NULL "b" (local, inherited)
 
 -- Both partition bound and partition key in describe output
 \d+ part_c
```

I'm on Ubuntu 22.04 with Postgres 11, 12, 13, and 16 installed from PGDG.

The new status of this patch is: Ready for Committer
Re: Pdadmin open on Macbook issue
This is not the right mailing list for your question. Try the pgadmin-support [1] mailing list. You may also want to include more details in your question, because it's not really possible to tell what's going wrong from your description. [1]: https://www.postgresql.org/list/pgadmin-support/
Re: Proposal: In-flight explain logging
Have you seen the other recent patch regarding this? [1] The mailing list thread was active pretty recently. The submission is marked as Needs Review. I haven't looked at either patch, but the proposals are very similar as I understand it. [1]: https://commitfest.postgresql.org/45/4345/
Re: Emitting JSON to file using COPY TO
On Fri, Dec 1, 2023 at 11:32 AM Joe Conway wrote:
> 1. Is supporting JSON array format sufficient, or does it need to
> support some other options? How flexible does the support scheme need to be?

"JSON Lines" is a semi-standard format [1] that's basically just newline-separated JSON values. (In fact, this is what log_destination=jsonlog gives you for Postgres logs, no?) It might be worthwhile to support that, too.

[1]: https://jsonlines.org/
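To make the difference concrete, here's a quick sketch in plain Python (the rows are made up, and this is not tied to the COPY implementation at all) of the same rows serialized as a single JSON array versus as JSON Lines:

```python
import json

rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# JSON array format: one document enclosing all rows
array_output = json.dumps(rows)
print(array_output)
# [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# JSON Lines: one JSON value per line, no enclosing array or commas,
# so a consumer can stream rows without parsing the whole document
lines_output = "\n".join(json.dumps(row) for row in rows)
print(lines_output)
# {"id": 1, "name": "alice"}
# {"id": 2, "name": "bob"}
```

The streaming-friendliness is the main argument for supporting the lines format: each line is independently parseable, the same way each jsonlog log line is.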
Re: run pgindent on a regular basis / scripted manner
On Tue, Oct 17, 2023, 09:22 Robert Haas wrote:
> On Tue, Oct 17, 2023 at 12:18 PM Tom Lane wrote:
> > Robert Haas writes:
> > Is that actually possible? I had the idea that "git push" is an
> > atomic operation, ie 100% or nothing. Is it only atomic per-branch?
>
> I believe so.

Git push does have an --atomic flag to treat the entire push as a single operation.
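A small self-contained demonstration (using throwaway repos under /tmp; the paths are made up for illustration) — with --atomic, if the server rejects any one ref update, none of the refs are updated:

```shell
set -e
demo=/tmp/git-atomic-demo
rm -rf "$demo" && mkdir -p "$demo"

git init -q --bare "$demo/origin.git"
git clone -q "$demo/origin.git" "$demo/work" 2>/dev/null
cd "$demo/work"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git branch b1
git branch b2

# Push both branches as one all-or-nothing transaction:
git push -q --atomic origin b1 b2

git ls-remote --heads origin   # shows refs/heads/b1 and refs/heads/b2
```

Without --atomic, a push of multiple refs can partially succeed, which is presumably the failure mode Tom was asking about.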
Re: Differences between = ANY and IN?
Great, thanks for the guidance!
Re: pg_stat_statements and "IN" conditions
I've also tried the patch and I see the same results as Jakub, which make sense to me. I did have issues getting it to apply, though: `git am` complains about a conflict, though `patch` itself was able to apply it.
Differences between = ANY and IN?
Hello,

My colleague has been working on submitting a patch [1] to the Ruby on Rails framework to address some of the problems discussed in [2]. Regardless of whether that change lands, the change in Rails would be useful since people will be running Postgres versions without this patch for a while.

My colleague's patch changes the SQL generated from Ruby expressions like `where(id: [1, 2])`. This is currently translated to roughly `WHERE id IN (1, 2)` and would be changed to `WHERE id = ANY('{1,2}')`. As far as we know, the expressions are equivalent, but we wanted to double-check: are there any edge cases to consider here (other than the pg_stat_statements behavior, of course)?

Thanks,
Maciek

[1]: https://github.com/rails/rails/pull/49388
[2]: https://www.postgresql.org/message-id/flat/20230209172651.cfgrebpyyr72h7fv%40alvherre.pgsql#eef3c77bc28b9922ea6b9660b0221b5d
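For reference, against a hypothetical table t, the two generated forms look like this (as far as I know, Postgres internally normalizes a constant IN list to an = ANY over an array anyway, which is part of why we expect them to be equivalent):

```sql
-- Current Rails translation: one literal per list element
SELECT * FROM t WHERE id IN (1, 2);

-- Proposed translation: a single array literal
SELECT * FROM t WHERE id = ANY('{1,2}');

-- The array form can also take a single bind parameter, so the SQL text
-- stays the same no matter how many values are in the list:
-- SELECT * FROM t WHERE id = ANY($1);
```

The stable query text regardless of list length is what makes the array form friendlier to pg_stat_statements in unpatched versions.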
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On Tue, Sep 19, 2023, 20:23 Maciek Sakrejda wrote:
> I wonder if something like CURRENT (i.e., the search path at function
> creation time) might be a useful keyword addition. I can see some uses
> (more forgiving than SYSTEM but not as loose as SESSION), but I don't
> know if it would justify its presence.

I realize now this is exactly what SET search_path FROM CURRENT does. Sorry for the noise.

Regarding extensions installed in the public schema throwing a wrench in the works: is that still a problem if the public schema is not writable? I know that that's a new default, but it won't be forever.
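For anyone following along, a minimal example of the existing behavior I was describing (the schema and function names here are made up):

```sql
SET search_path = myapp, public;

-- FROM CURRENT captures the creation-time search_path and attaches it to
-- the function, so later calls don't depend on the caller's setting:
CREATE FUNCTION myapp.widget_count() RETURNS bigint
    LANGUAGE sql
    SET search_path FROM CURRENT
    AS 'SELECT count(*) FROM widgets';

-- \df+ (or pg_proc.proconfig) now shows: search_path=myapp, public
```

So the CURRENT semantics I was imagining already exist; they're just spelled as a per-function SET clause rather than a keyword.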
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis wrote:
>...
> On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
> > That leads to a second idea, which is having it continue
> > to be a GUC but only affect directly-entered SQL, with all
> > indirectly-entered SQL either being stored as a node tree or having a
> > search_path property attached somewhere.
>
> That's not too far from the proposed patch and I'd certainly be
> interested to hear more and/or adapt my patch towards this idea.

As an interested bystander, that's the same thing I was thinking when reading this. I reread your original e-mail, Jeff, and I still think that.

I wonder if something like CURRENT (i.e., the search path at function creation time) might be a useful keyword addition. I can see some uses (more forgiving than SYSTEM but not as loose as SESSION), but I don't know if it would justify its presence.

Thanks for working on this.

Thanks,
Maciek
Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
On Mon, May 1, 2023, 18:08 Robert Haas wrote:
> I am saying that, while wraparound is perhaps not a perfect term
> for what's happening, it is not, in my opinion, a bad term either.

I don't want to put words into Peter's mouth, but I think he's arguing that the term "wraparound" suggests that there is something special about the transition between xid 2^32 and xid 0 (or, well, 3). There isn't. There's only something special about the transition, as your current xid advances, between the xid that's half the xid space ahead of your current xid and the xid that's half the xid space behind the current xid, if the latter is not frozen. I don't think that's what most users think of when they hear "wraparound".
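To make that circularity concrete, here's a rough Python sketch of the modular-arithmetic comparison the server uses (modeled on TransactionIdPrecedes; this deliberately ignores the permanent special xids below FirstNormalTransactionId):

```python
XID_MASK = 0xFFFFFFFF  # xids are 32 bits

def xid_precedes(a: int, b: int) -> bool:
    """True if xid a logically precedes xid b.

    Mirrors the server's signed-difference trick: a precedes b when the
    32-bit difference (a - b), reinterpreted as signed, is negative --
    i.e., when b falls in the half of the xid space "ahead of" a.
    """
    diff = (a - b) & XID_MASK
    return diff >= 0x80000000  # signed 32-bit (a - b) < 0

# Nothing special happens at the 2^32 -> 0 boundary itself: an xid just
# before the boundary simply precedes one just after it.
print(xid_precedes(0xFFFFFFF0, 0x10))  # True
```

The only "special" transition is the one half the xid space away from wherever you currently are, which is why the boundary-crossing mental image of "wraparound" can mislead.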
Re: doc: add missing "id" attributes to extension packaging page
For what it's worth, I think having the anchors be always-visible when CSS is disabled is a feature. The content is still perfectly readable, and the core feature from this patch is available. Introducing JavaScript to lose that functionality seems like a step backwards.

By the way, the latest patch attachment was not the full patch series, which I think confused cfbot [1] (unless I'm misunderstanding the state of the patch series).

And thanks for working on this. I've hunted in the page source for ids to link to a number of times. I look forward to not doing that anymore.

Thanks,
Maciek

[1]: https://commitfest.postgresql.org/42/4042/
Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)
On Thu, Feb 23, 2023, 09:55 Nikolay Samokhvalov wrote:
> On Thu, Feb 23, 2023 at 9:05 AM Maciek Sakrejda wrote:
> > I think Heikki's solution is probably more practical since (1) ..
>
> Note that these ideas target two *different* problems:
> - what was the duration of the last query
> - when was the last query executed
>
> So, having both solved would be ideal.

Fair point, but since the duration solution needs to capture two timestamps anyway, it could print the start time as well as the duration. The prompt timestamp could still be handy for more intricate session forensics, but I don't know if that's a common-enough use case.

Thanks,
Maciek
Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)
+1 on solving the general problem of "I forgot to set \timing--how long did this run?". I could have used that more than once in the past, and I'm sure it will come up again.

I think Heikki's solution is probably more practical since (1) even if we add the prompt parameter originally proposed, I don't see it being included in the default, so it would require users to change their prompt before they can benefit from it and (2) even if we commit to never allowing tweaks to the format, I foresee a slow but endless trickle of requests and patches to do so.

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Feb 14, 2023 at 11:08 AM Andres Freund wrote:
> One thing I started to wonder about since is whether we should remove the io_
> prefix from io_object, io_context. The prefixes make sense on the C level, but
> it's not clear to me that that's also the case on the table level.

Yeah, +1. It's hard to argue that there would be any confusion, considering `io_` is in the name of the view. (Unless, I suppose, some other, non-I/O, "some_object" or "some_context" column were to be introduced to this view in the future. But that doesn't seem likely?)
Re: ANY_VALUE aggregate
I could have used such an aggregate in the past, so +1.

This is maybe getting into nit-picking, but perhaps it should be documented as returning an "arbitrary" value instead of a "non-deterministic" one? Technically the value is deterministic: there's a concrete algorithm specifying how it's selected. However, the algorithm is reserved as an implementation detail, since the function is designed for cases in which the caller should not care which value is returned.

Thanks,
Maciek
Re: Something is wrong with wal_compression
On Fri, Jan 27, 2023, 18:58 Andres Freund wrote:
> Hi,
>
> On 2023-01-27 16:15:08 +1300, Thomas Munro wrote:
> > It would be pg_current_xact_id() that would have to pay the cost of
> > the WAL flush, not pg_xact_status() itself, but yeah that's what the
> > patch does (with some optimisations). I guess one question is whether
> > there are any other reasonable real world uses of
> > pg_current_xact_id(), other than the original goal[1].
>
> txid_current() is a lot older than pg_current_xact_id(), and they're backed by
> the same code afaict. 8.4 I think.
>
> Unfortunately txid_current() is used in plenty monitoring setups IME.
>
> I don't think it's a good idea to make a function that was quite cheap for 15
> years, suddenly be several orders of magnitude more expensive...

As someone working on a monitoring tool that uses it (well, both), +1. We'd have to rethink a few things if this becomes a performance concern.

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Jan 17, 2023 at 9:22 AM Melanie Plageman wrote:
> On Mon, Jan 16, 2023 at 4:42 PM Maciek Sakrejda wrote:
> > I missed a couple of versions, but I think the docs are clearer now.
> > I'm torn on losing some of the detail, but overall I do think it's a
> > good trade-off. Moving some details out to after the table does keep
> > the bulk of the view documentation more readable, and the "inform
> > database tuning" part is great. I really like the idea of a separate
> > Interpreting Statistics section, but for now this works.
> >
> > >+        vacuum: I/O operations performed outside of shared
> > >+        buffers while vacuuming and analyzing permanent relations.
> >
> > Why only permanent relations? Are temporary relations treated
> > differently? I imagine if someone has a temp-table-heavy workload that
> > requires regularly vacuuming and analyzing those relations, this point
> > may be confusing without some additional explanation.
>
> Ah, yes. This is a bit confusing. We don't use buffer access strategies
> when operating on temp relations, so vacuuming them is counted in IO
> Context normal. I've added this information to the docs but now that
> definition is a bit long. Perhaps it should be a note? That seems like
> it would draw too much attention to this detail, though...

Thanks for clarifying. I think the updated definition still works: it's still shorter than the `normal` context definition.
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Fri, Jan 13, 2023 at 10:38 AM Melanie Plageman wrote:
>
> Attached is v47.

I missed a couple of versions, but I think the docs are clearer now. I'm torn on losing some of the detail, but overall I do think it's a good trade-off. Moving some details out to after the table does keep the bulk of the view documentation more readable, and the "inform database tuning" part is great. I really like the idea of a separate Interpreting Statistics section, but for now this works.

>+        vacuum: I/O operations performed outside of shared
>+        buffers while vacuuming and analyzing permanent relations.

Why only permanent relations? Are temporary relations treated differently? I imagine if someone has a temp-table-heavy workload that requires regularly vacuuming and analyzing those relations, this point may be confusing without some additional explanation.

Other than that, this looks great.

Thanks,
Maciek
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
On Fri, Jul 15, 2022 at 11:21 AM Maciek Sakrejda wrote:
> On Fri, Jul 1, 2022 at 10:26 AM Andres Freund wrote:
> > On 2022-07-01 01:23:01 -0700, Lukas Fittl wrote:
> >...
> > > Known WIP problems with this patch version:
> > >
> > > * There appears to be a timing discrepancy I haven't yet worked out, where
> > >   the \timing data reported by psql doesn't match what EXPLAIN ANALYZE is
> > >   reporting. With Andres' earlier test case, I'm seeing a consistent ~700ms
> > >   higher for \timing than for the EXPLAIN ANALYZE time reported on the
> > >   server side, only when rdtsc measurement is used -- its likely there is
> > >   a problem somewhere with how we perform the cycles to time conversion
> >
> > Could you explain a bit more what you're seeing? I just tested your patches
> > and didn't see that here.
>
> I did not see this either, but I did see that the execution time
> reported by \timing is (for this test case) consistently 0.5-1ms
> *lower* than the Execution Time reported by EXPLAIN. I did not see
> that on master. Is that expected?

For what it's worth, I can no longer reproduce this. In fact, I went back to master as of around then and applied Lukas' v2 patches again, and I still can't reproduce it. I do remember it happening consistently across several executions, but now \timing consistently shows 0.5-1ms slower, as expected. This does not explain the different timing issue Lukas was seeing in his tests, but I think we can assume what I reported originally here is not an issue.
Re: GetNewObjectId question
Sure. My C is pretty limited, but I think it's just the attached? I patterned the usage on the way this is done in CreateRole. It passes check-world here.

From c7cae5d3e8d179505f26851f1241436a3748f9a8 Mon Sep 17 00:00:00 2001
From: Maciek Sakrejda
Date: Sat, 10 Dec 2022 22:51:02 -0800
Subject: [PATCH] Replace GetNewObjectId call with GetNewOidWithIndex

GetNewObjectId is not intended to be used directly in most situations,
since it can lead to duplicate OIDs unless that is guarded against.
---
 src/backend/commands/user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 8b6543edee..0686807fa0 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -1850,7 +1850,8 @@ AddRoleMems(const char *rolename, Oid roleid,
 	}
 
 	/* get an OID for the new row and insert it */
-	objectId = GetNewObjectId();
+	objectId = GetNewOidWithIndex(pg_authmem_rel, AuthMemOidIndexId,
+								  Anum_pg_auth_members_oid);
 	new_record[Anum_pg_auth_members_oid - 1] = objectId;
 	tuple = heap_form_tuple(pg_authmem_dsc, new_record, new_record_nulls);
-- 
2.25.1
GetNewObjectId question
While browsing through varsup.c, I noticed this comment on GetNewObjectId:

 * Hence, this routine should generally not be used directly. The only direct
 * callers should be GetNewOidWithIndex() and GetNewRelFileNumber() in
 * catalog/catalog.c.

But AddRoleMems in user.c appears to also call the function directly:

	/* get an OID for the new row and insert it */
	objectId = GetNewObjectId();
	new_record[Anum_pg_auth_members_oid - 1] = objectId;
	tuple = heap_form_tuple(pg_authmem_dsc, new_record, new_record_nulls);
	CatalogTupleInsert(pg_authmem_rel, tuple);

I'm not sure if that call is right, but this seems inconsistent. Should that caller be using GetNewOidWithIndex instead? Or should the comment be updated?

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
In the pg_stat_statements docs, there are several column descriptions like

    Total number of ... by the statement

You added an additional sentence to some describing the equivalent pg_stat_io values, but you only added a period to the previous sentence for shared_blks_read (for other columns, the additional description just follows directly). These should be consistent.

Otherwise, the docs look good to me.
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Nov 29, 2022 at 5:13 PM Melanie Plageman wrote:
> Thanks for the review, Maciek!
>
> I've attached a new version 39 of the patch which addresses your docs
> feedback from this email as well as docs feedback from Andres in [1] and
> Justin in [2].

This looks great! Just a couple of minor comments.

> You are right: reused is a normal, expected part of strategy
> execution. And you are correct: the idea behind reusing existing
> strategy buffers instead of taking buffers off the freelist is to leave
> those buffers for blocks that we might expect to be accessed more than
> once.
>
> In practice, however, if you happen to not be using many shared buffers,
> and then do a large COPY, for example, you will end up doing a bunch of
> writes (in order to reuse the strategy buffers) that you perhaps didn't
> need to do at that time had you leveraged the freelist. I think the
> decision about which tradeoff to make is quite contentious, though.

Thanks for the explanation--that makes sense.

> On Mon, Nov 7, 2022 at 1:26 PM Maciek Sakrejda wrote:
> > Alternately, what do you think about pulling equivalencies to existing
> > views out of the main column descriptions, and adding them after the
> > main table as a sort of footnote? Most view docs don't have anything
> > like that, but pg_stat_replication does and it might be a good pattern
> > to follow.
> >
> > Thoughts?
>
> Thanks for including a patch!
> In the attached v39, I've taken your suggestion of flattening some of
> the lists and done some rewording as well. I have also moved the note
> about equivalence with pg_stat_statements columns to the
> pg_stat_statements documentation. The result is quite a bit different
> than what I had before, so I would be interested to hear your thoughts.
>
> My concern with the blue "note" section like you mentioned is that it
> would be harder to read the lists of backend types than it was in the
> tabular format.
Oh, I wasn't thinking of doing a separate "note": just additional paragraphs of text after the table (like what pg_stat_replication has before its "note", or the brief comment after the pg_stat_archiver table). But I think the updated docs work also.

+        The context or location of an IO operation.

maybe "...of an IO operation:" (colon) instead?

+        default. Future values could include those derived from
+        XLOG_BLCKSZ, once WAL IO is tracked in this view, and
+        constant multipliers once non-block-oriented IO (e.g. temporary file IO)
+        is tracked here.

I know Lukas had commented that we should communicate that the goal is to eventually provide relatively comprehensive I/O stats in this view (you do that in the view description and I think that works), and this is sort of along those lines, but I think speculative documentation like this is not all that helpful. I'd drop this last sentence. Just my two cents.

+        evicted in io_context
+        buffer pool and io_object
+        temp relation counts the number of times a block of
+        data from an existing local buffer was evicted in order to replace it
+        with another block, also in local buffers.

Doesn't this follow from the first sentence of the column description? I think we could drop this, no?

Otherwise, the docs look good to me.

Thanks,
Maciek
Odd behavior with pg_stat_statements and queries called from SQL functions
I noticed an odd behavior today in pg_stat_statements query normalization for queries called from SQL-language functions. If I have three functions that call an essentially identical query (the functions are only marked SECURITY DEFINER to prevent inlining):

maciek=# create or replace function f1(f1param text) returns text language sql as 'select f1param' security definer;
CREATE FUNCTION
maciek=# create or replace function f2(f2param text) returns text language sql as 'select f2param' security definer;
CREATE FUNCTION
maciek=# create or replace function f3(text) returns text language sql as 'select $1' security definer;
CREATE FUNCTION

and I have pg_stat_statements.track = 'all', so that queries called from functions are tracked, these all end up with the same query id in pg_stat_statements, but the query text includes the parameter name (if one is referenced in the query in the function). E.g., if I call f1 first, then f2 and f3, I get:

maciek=# select queryid, query, calls from pg_stat_statements where queryid = 6741491046520556186;
       queryid       |     query      | calls
---------------------+----------------+-------
 6741491046520556186 | select f1param |     3
(1 row)

If I call f3 first, then f2 and f1, I get:

maciek=# select queryid, query, calls from pg_stat_statements where queryid = 6741491046520556186;
       queryid       |   query   | calls
---------------------+-----------+-------
 6741491046520556186 | select $1 |     3
(1 row)

I understand that the query text may be captured differently for different queries that map to the same id, but it seems confusing that parameter names referenced in queries called from functions are not normalized away, since they're not germane to the query execution itself, and the context of the function is otherwise stripped away by this point. I would expect all three of these queries to end up in pg_stat_statements with the query text "select $1". Thoughts?

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Thu, Nov 3, 2022 at 10:00 AM Melanie Plageman wrote:
>
> I decided not to call it pg_statio because all of the other stats views
> have an underscore after stat and I thought it was an opportunity to be
> consistent with them.

Oh, got it. Makes sense.

> > I'm reviewing the rendered docs now, and I noticed sentences like this
> > are a bit hard to scan: they force the reader to parse a big list of
> > backend types before even getting to the meat of what this is talking
> > about. Should we maybe reword this so that the backend list comes at
> > the end of the sentence? Or maybe even use a list (e.g., like in the
> > "state" column description in pg_stat_activity)?
>
> Good idea with the bullet points.
> For the lengthy lists, I've added bullet point lists to the docs for
> several of the columns. It is quite long now but, hopefully, clearer?
> Let me know if you think it improves the readability.

Hmm, I should have tried this before suggesting it. I think the lists break up the flow of the column description too much. What do you think about the attached (on top of your patches--attaching it as a .diff to hopefully not confuse cfbot)? I kept the lists for backend types but inlined the others as a middle ground. I also added a few omitted periods and reworded "read plus extended" to avoid starting the sentence with a (lowercase) varname (I think in general it's fine to do that, but the more complicated sentence structure here makes it easier to follow if the sentence starts with a capital).

Alternately, what do you think about pulling equivalencies to existing views out of the main column descriptions, and adding them after the main table as a sort of footnote? Most view docs don't have anything like that, but pg_stat_replication does and it might be a good pattern to follow. Thoughts?

> > Also, is this (in the middle of the table) the right place for this
> > column? I would have expected to see it before or after all the actual
> > I/O op cells.
> I put it after read, write, and extend columns because it applies to
> them. It doesn't apply to files_synced. For reused and evicted, I didn't
> think bytes reused and evicted made sense. Also, when we add non-block
> oriented IO, reused and evicted won't be used but op_bytes will be. So I
> thought it made more sense to place it after the operations it applies
> to.

Got it, makes sense.

> > + io_contexts. When a Buffer Access
> > + Strategy reuses a buffer in the strategy ring, it must evict its
> > + contents, incrementing reused. When a Buffer
> > + Access Strategy adds a new shared buffer to the strategy ring
> > + and this shared buffer is occupied, the Buffer Access
> > + Strategy must evict the contents of the shared buffer,
> > + incrementing evicted.
> >
> > I think the parallel phrasing here makes this a little hard to follow.
> > Specifically, I think "must evict its contents" for the strategy case
> > sounds like a bad thing, but in fact this is a totally normal thing
> > that happens as part of strategy access, no? The idea is you probably
> > won't need that buffer again, so it's fine to evict it. I'm not sure
> > how to reword, but I think the current phrasing is misleading.
>
> I had trouble rephrasing this. I changed a few words. I see what you
> mean. It is worth noting that reusing strategy buffers when there are
> buffers on the freelist may not be the best behavior, so I wouldn't
> necessarily consider "reused" a good thing. However, I'm not sure how
> much the user could really do about this. I would at least like this
> phrasing to be clear (evicted is for shared buffers, reused is for
> strategy buffers), so, perhaps this section requires more work.

Oh, I see. I think the updated wording works better. Although I think we can drop the quotes around "Buffer Access Strategy" here. They're useful when defining the term originally, but after that I think it's clearer to use the term unquoted.
Just to understand this better myself, though: can you clarify when "reused" is not a normal, expected part of the strategy execution? I was under the impression that a ring buffer is used because each page is needed only "once" (i.e., for one set of operations) for the command using the strategy ring buffer. Naively, in that situation, it seems better to reuse a no-longer-needed buffer than to claim another buffer from the freelist (where other commands may eventually make better use of it).

> > + again. A high number of repossessions is a sign of contention for the
> > + blocks operated on by the strategy operation.
> >
> > This (and in general the repossession description) makes sense, but
> > I'm not sure what to do with the information. Maybe Andres is right
> > that we could skip this in the first version?
>
> I've removed repossessed and rejected in attached v37. I am a bit sad
> about this because I don't see a good way forward and I think those
> could be useful for users.

I can see that, but I think as long as we're not doing
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Wed, Oct 26, 2022 at 10:55 AM Melanie Plageman wrote:

+ The pg_statio_ and
+ pg_stat_io views are primarily useful to determine
+ the effectiveness of the buffer cache. When the number of actual disk reads

Totally nitpicking, but this reads a little funny to me. Previously the trailing underscore suggested this is a group, and now with pg_stat_io itself added (stupid question: should this be "pg_statio"?), it sounds like we're talking about two views: pg_stat_io and "pg_statio_". Maybe something like "The pg_stat_io view and the pg_statio_ set of views are primarily..."?

+ by that backend type in that IO context. Currently only a subset of IO
+ operations are tracked here. WAL IO, IO on temporary files, and some forms
+ of IO outside of shared buffers (such as when building indexes or moving a
+ table from one tablespace to another) could be added in the future.

Again nitpicking, but should this be "may be added"? I think "could" suggests the possibility of implementation, whereas "may" feels more like a hint as to how the feature could evolve.

+ portion of the main shared buffer pool. This pattern is called a
+ Buffer Access Strategy in the
+ PostgreSQL source code and the fixed-size
+ ring buffer is referred to as a strategy ring buffer for
+ the purposes of this view's documentation.
+

Nice, I think this explanation is very helpful. You also use the term "strategy context" and "strategy operation" below. I think it's fairly obvious what those mean, but pointing it out in case we want to note that here, too.

+ read and extended for

Maybe "plus" instead of "and" here for clarity (I'm assuming that's what the "and" means)?
+ backend_types autovacuum launcher,
+ autovacuum worker, client backend,
+ standalone backend, background
+ worker, and walsender for all
+ io_contexts is similar to the sum of

I'm reviewing the rendered docs now, and I noticed sentences like this are a bit hard to scan: they force the reader to parse a big list of backend types before even getting to the meat of what this is talking about. Should we maybe reword this so that the backend list comes at the end of the sentence? Or maybe even use a list (e.g., like in the "state" column description in pg_stat_activity)?

+ heap_blks_read, idx_blks_read,
+ tidx_blks_read, and
+ toast_blks_read in
+ pg_statio_all_tables. and
+ blks_read from
+ If using the PostgreSQL extension,
+ ,
+ read for
+ backend_types autovacuum launcher,
+ autovacuum worker, client backend,
+ standalone backend, background
+ worker, and walsender for all
+ io_contexts is equivalent to

Same comment as above re: the lengthy list.

+ Normal client backends should be able to rely on maintenance processes
+ like the checkpointer and background writer to write out dirty data as

Nice--it's great to see this mentioned. But I think these are generally referred to as "auxiliary" not "maintenance" processes, no?

+ If using the PostgreSQL extension,
+ , written and
+ extended for backend_types

Again, should this be "plus" instead of "and"?

+
+ bytes_conversion bigint
+

I think this general approach works (instead of unit). I'm not wild about the name, but I don't really have a better suggestion. Maybe "op_bytes" (since each cell is counting the number of I/O operations)? But I think bytes_conversion is okay.

Also, is this (in the middle of the table) the right place for this column? I would have expected to see it before or after all the actual I/O op cells.

+ io_contexts. When a Buffer Access
+ Strategy reuses a buffer in the strategy ring, it must evict its
+ contents, incrementing reused.
When a Buffer
+ Access Strategy adds a new shared buffer to the strategy ring
+ and this shared buffer is occupied, the Buffer Access
+ Strategy must evict the contents of the shared buffer,
+ incrementing evicted.

I think the parallel phrasing here makes this a little hard to follow. Specifically, I think "must evict its contents" for the strategy case sounds like a bad thing, but in fact this is a totally normal thing that happens as part of strategy access, no? The idea is you probably won't need that buffer again, so it's fine to evict it. I'm not sure how to reword, but I think the current phrasing is misleading.

+ The number of times a bulkread found the current
+ buffer in the fixed-size strategy ring dirty and requiring flush.

Maybe "...found ... to be dirty..."?

+ frequent vacuuming or more aggressive autovacuum settings, as buffers are
+ dirtied during a bulkread operation when updating the hint bit or when
+ performing on-access pruning.

Are there docs to cross-reference here, especially for pruning? I couldn't find much except a few un-explained mentions in the page layout docs [2], and most of the search results refer to partition pruning. Searching for hint bits at least gives some info in blog posts and the wiki.

+ again. A high number of repossessions is a sign of contention for the
+ blocks operated on by the strategy operation.

This (and in
Re: warn if GUC set to an invalid shared library
On Sat, Oct 29, 2022 at 10:40 AM Justin Pryzby wrote:
>
> On Fri, Sep 02, 2022 at 05:24:58PM -0500, Justin Pryzby wrote:
> > It caused no issue when I changed:
> >
> >                 /* Check that it's acceptable for the indicated parameter */
> >                 if (!parse_and_validate_value(record, name, value,
> > -                                             PGC_S_FILE, ERROR,
> > +                                             PGC_S_TEST, ERROR,
> >                                               , ))
> >
> > I'm not sure where to go from here.
>
> I'm hoping for some guidance; this simple change may be naive, but I'm not
> sure what a wider change would look like.

I assume you mean guidance on implementation details here, and not on design. I still think this is a useful patch and I'd be happy to review and try out future iterations, but I don't have any useful input here.

Also, for what it's worth, I think requiring the libraries to be in place before running ALTER SYSTEM does not really seem that onerous. I can't really think of use cases it precludes.

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Thu, Oct 20, 2022 at 10:31 AM Andres Freund wrote:
> - "repossession" is a very unintuitive name for me. If we want something like
>   it, can't we just name it reuse_failed or such?

+1, I think "repossessed" is awkward. I think "reuse_failed" works, but no strong opinions on an alternate name.

> - Wonder if the column names should be reads, writes, extends, etc instead of
>   the current naming pattern

Why? Lukas suggested alignment with existing views like pg_stat_database and pg_stat_statements. It doesn't make sense to use the blks_ prefix since it's not all blocks, but otherwise it seems like we should be consistent, no?

> > "freelist_acquired" is confusing for local buffers but I wanted to
> > distinguish between reuse/eviction of local buffers and initial
> > allocation. "freelist_acquired" seemed more fitting because there is a
> > clocksweep to find a local buffer and if it hasn't been allocated yet it
> > is allocated in a place similar to where shared buffers acquire a buffer
> > from the freelist. If I didn't count it here, I would need to make a new
> > column only for local buffers called "allocated" or something like that.
>
> I think you're making this too granular. We need to have more detail than
> today. But we don't necessarily need to catch every nuance.

In general I agree that coarser granularity here may be easier to use. I do think the current docs explain what's going on pretty well, though, and I worry if merging too many concepts will make that harder to follow. But if a less detailed breakdown still communicates potential problems, +1.

> > This seems not too bad at first, but if you consider that later we will
> > add other kinds of IO -- eg WAL IO or temporary file IO, we won't be
> > able to use these existing columns and will need to add even more
> > columns describing the exact behavior in those cases.
>
> I think it's clearly not the right direction.

+1, I think the existing approach makes more sense.
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Wed, Oct 19, 2022 at 12:27 PM Melanie Plageman wrote:
>
> v34 is attached.
> I think the column names need discussion. Also, the docs need more work
> (I added a lot of new content there). I could use feedback on the column
> names and definitions and review/rephrasing ideas for the docs
> additions.

Nice! I think the expanded docs are great, and make this information much easier to interpret.

>+ io_context bulkread, existing
>+ dirty buffers in the ring requirng flush are

"requiring"

>+ shared buffers were acquired from the freelist and added to the
>+ fixed-size strategy ring buffer. Shared buffers are added to the
>+ strategy ring lazily. If the current buffer in the ring is pinned or in

This is the first mention of the term "strategy" in these docs. It's not totally opaque, since there's some context, but maybe we should either try to avoid that term or define it more explicitly?

>+ io_contexts. This is equivalent to
>+ evicted for shared buffers in
>+ io_context shared, as the contents
>+ of the buffer are evicted but refers to the case when the

I don't quite follow this: does this mean that I should expect 'reused' and 'evicted' to be equal in the 'shared' context, because they represent the same thing? Or will 'reused' just be null because it's not distinct from 'evicted'? It looks like it's null right now, but I find the wording here confusing.

>+ future with a new shared buffer. A high number of
>+ bulkread rejections can indicate a need for more
>+ frequent vacuuming or more aggressive autovacuum settings, as buffers are
>+ dirtied during a bulkread operation when updating the hint bit or when
>+ performing on-access pruning.

This is great. Just wanted to re-iterate that notes like this are really helpful to understanding this view.

> I've implemented a change using the same function pg_settings uses to
> turn the build-time parameter BLCKSZ into 8kB (get_config_unit_name())
> using the flag GUC_UNIT_BLOCKS.
> I am unsure if this is better or worse
> than "block_size". I am feeling very conflicted about this column.

Yeah, I guess it feels less natural here than in pg_settings, but it still kind of feels like one way of doing this is better than two...
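To make that concrete: with a bytes-conversion-style column, getting byte totals in a script is a simple multiplication, whereas a pg_settings-style '8kB' unit string would have to be parsed first. A sketch (column names as in the version under discussion, so purely illustrative and subject to change):

```sql
-- Illustrative only: converting per-op counts to bytes using the
-- proposed conversion column (names may change before commit).
SELECT backend_type, io_context,
       read * op_bytes    AS bytes_read,
       written * op_bytes AS bytes_written
FROM pg_stat_io;
```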
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Thu, Oct 13, 2022 at 10:29 AM Melanie Plageman wrote:
> I think that it makes sense to count both the initial buffers added to
> the ring and subsequent shared buffers added to the ring (either when
> the current strategy buffer is pinned or in use or when a bulkread
> rejects dirty strategy buffers in favor of new shared buffers) as
> strategy clocksweeps because of how the statistic would be used.
>
> Clocksweeps give you an idea of how much of your working set is cached
> (setting aside initially reading data into shared buffers when you are
> warming up the db). You may use clocksweeps to determine if you need to
> make shared buffers larger.
>
> Distinguishing strategy buffer clocksweeps from shared buffer
> clocksweeps allows us to avoid enlarging shared buffers if most of the
> clocksweeps are to bring in blocks for the strategy operation.
>
> However, I could see an argument that discounting strategy clocksweeps
> done because the current strategy buffer is pinned makes the number of
> shared buffer clocksweeps artificially low since those other queries
> using the buffer would have suffered a cache miss were it not for the
> strategy. And, in this case, you would take strategy clocksweeps
> together with shared clocksweeps to make your decision. And if we
> include buffers initially added to the strategy ring in the strategy
> clocksweep statistic, this number may be off because those blocks may
> not be needed in the main shared working set. But you won't know that
> until you try to reuse the buffer and it is pinned. So, I think we don't
> have a better option than counting initial buffers added to the ring as
> strategy clocksweeps (as opposed to as reuses).
>
> So, in answer to your question, no, I cannot think of a scenario like
> that.

That analysis makes sense to me; thanks.

> It also made me remember that I am incorrectly counting rejected buffers
> as reused. I'm not sure if it is a good idea to subtract from reuses
> when a buffer is rejected.
> Waiting until after it is rejected to count
> the reuse will take some other code changes. Perhaps we could also count
> rejections in the stats?

I'm not sure what makes sense here.

> > Not critical, but is there a list of backend types we could
> > cross-reference elsewhere in the docs?
>
> The most I could find was this longer explanation (with exhaustive list
> of types) in pg_stat_activity docs [1]. I could duplicate what it says
> or I could link to the view and say "see pg_stat_activity" for a
> description of backend_type" or something like that (to keep them from
> getting out of sync as new backend_types are added). I suppose I could
> also add docs on backend_types, but I'm not sure where something like
> that would go.

I think linking pg_stat_activity is reasonable for now. A separate section for this might be nice at some point, but that seems out of scope.

> > From the io_context column description:
> >
> > + The autovacuum daemon, explicit VACUUM, explicit
> > + ANALYZE, many bulk reads, and many bulk writes use a
> > + fixed amount of memory, acquiring the equivalent number of shared
> > + buffers and reusing them circularly to avoid occupying an undue portion
> > + of the main shared buffer pool.
> > +
> >
> > I don't understand how this is relevant to the io_context column.
> > Could you expand on that, or am I just missing something obvious?
>
> I'm trying to explain why those other IO Contexts exist (bulkread,
> bulkwrite, vacuum) and why they are separate from shared buffers.
> Should I cut it altogether or preface it with something like: these are
> counted separate from shared buffers because...?

Oh I see. That makes sense; it just wasn't obvious to me this was talking about the last three values of io_context. I think a brief preface like that would be helpful (maybe explicitly with "these last three values", and I think "counted separately").
> > +
> > + role="column_definition">
> > + extended bigint
> > +
> > +
> > + Extends of relations done by this backend_type in
> > + order to write data in this io_context.
> > +
> > +
> >
> > I understand what this is, but not why this is something I might want
> > to know about.
>
> Unlike writes, backends largely have to do their own extends, so
> separating this from writes lets us determine whether or not we need to
> change checkpointer/bgwriter to be more aggressive using the writes
> without the distraction of the extends. Should I mention this in the
> docs? The other stats views don't seem to editorialize at all, and I
> wasn't sure if this was an objective enough point to include in docs.

Thanks for the clarification. Just to make sure I understand, you mean that if I see a high extended count, that may be interesting in terms of write activity, but I can't fix that by tuning--it's just the nature of my workload? I think you're right that this is
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Thanks for working on this! Like Lukas, I'm excited to see more visibility into important parts of the system like this.

On Mon, Oct 10, 2022 at 11:49 AM Melanie Plageman wrote:
>
> I've gone ahead and implemented option 1 (commented below).

No strong opinion on 1 versus 2, but I guess at least partly because I don't understand the implications (I do understand the difference, just not when it might be important in terms of stats). Can we think of a situation where combining stats about initial additions with pinned additions hides some behavior that might be good to understand and hard to pinpoint otherwise?

I took a look at the latest docs (as someone mostly familiar with internals at only a pretty high level, so probably somewhat close to the target audience) and have some feedback.

+
+
+ backend_type text
+
+
+ Type of backend (e.g. background worker, autovacuum worker).
+
+

Not critical, but is there a list of backend types we could cross-reference elsewhere in the docs?

From the io_context column description:

+ The autovacuum daemon, explicit VACUUM, explicit
+ ANALYZE, many bulk reads, and many bulk writes use a
+ fixed amount of memory, acquiring the equivalent number of shared
+ buffers and reusing them circularly to avoid occupying an undue portion
+ of the main shared buffer pool.
+

I don't understand how this is relevant to the io_context column. Could you expand on that, or am I just missing something obvious?

+
+
+ extended bigint
+
+
+ Extends of relations done by this backend_type in
+ order to write data in this io_context.
+
+

I understand what this is, but not why this is something I might want to know about.

And from your earlier e-mail:

On Thu, Oct 6, 2022 at 10:42 AM Melanie Plageman wrote:
>
> Because we want to add non-block-oriented IO in the future (like
> temporary file IO) to this view and want to use the same "read",
> "written", "extended" columns, I would prefer not to prefix the columns
> with "blks_".
> I have added a column "unit" which would contain the unit
> in which read, written, and extended are in. Unfortunately, fsyncs are
> not per block, so "unit" doesn't really work for this. I documented
> this.
>
> The most correct thing to do to accommodate block-oriented and
> non-block-oriented IO would be to specify all the values in bytes.
> However, I would like this view to be usable visually (as opposed to
> just in scripts and by tools). The only current value of unit is
> "block_size" which could potentially be combined with the value of the
> GUC to get bytes.
>
> I've hard-coded the string "block_size" into the view generation
> function pg_stat_get_io(), so, if this idea makes sense, perhaps I
> should do something better there.

That seems broadly reasonable, but pg_settings also has a 'unit' field, and in that view, unit is '8kB' on my system--i.e., it (presumably) reflects the block size. Is that something we should try to be consistent with (not sure if that's a good idea, but thought it was worth asking)?

> On Fri, Sep 30, 2022 at 7:18 PM Lukas Fittl wrote:
> > - Overall it would be helpful if we had a dedicated documentation page on
> >   I/O statistics that's linked from the pg_stat_io view description, and
> >   explains how the I/O statistics tie into the various concepts of shared
> >   buffers / buffer access strategies / etc (and what is not tracked today)
>
> I haven't done this yet. How specific were you thinking -- like
> interpretations of all the combinations and what to do with what you
> see? Like you should run pg_prewarm if you see X? Specific checkpointer
> or bgwriter GUCs to change? Or just links to other docs pages on
> recommended tunings?
>
> Were you imagining the other IO statistics views (like
> pg_statio_all_tables and pg_stat_database) also being included in this
> page? Like would it be a comprehensive guide to IO statistics and what
> their significance/purposes are?
I can't speak for Lukas here, but I encouraged him to suggest more thorough documentation in general, so I can speak to my concerns: in general, these stats should be usable for someone who does not know much about Postgres internals. It's pretty low-level information, sure, so I think you need some understanding of how the system broadly works to make sense of it. But ideally you should be able to find what you need to understand the concepts involved within the docs. I think your updated docs are much clearer (with the caveats of my specific comments above). It would still probably be helpful to have a dedicated page on I/O stats (and yeah, something with a broad scope, along the lines of a comprehensive guide), but I think that can wait until a future patch. Thanks, Maciek
Re: warn if GUC set to an invalid shared library
Thanks for picking this back up, Justin.

>I've started to think that we should really WARN whenever a (set of) GUC is set
>in a manner that the server will fail to start - not just for shared libraries.

+0.5. I think it's a reasonable change, but I've never broken my server with anything other than shared_preload_libraries, so I'd rather see an improvement here first rather than expanding scope. I think shared_preload_libraries (and friends) is especially tricky due to the syntax, and more likely to lead to problems.

On the updated patch itself, I have some minor feedback about message wording:

postgres=# set local_preload_libraries=xyz;
SET

Great, it's nice that this no longer gives a warning.

postgres=# alter role bob set local_preload_libraries = xyz;
WARNING:  could not access file "xyz"
DETAIL:  New sessions will currently fail to connect with the new setting.
ALTER ROLE

The warning makes sense, but the detail feels a little awkward. I think "currently" is sort of redundant with "new setting". And it could be clearer that the setting did in fact take effect (I know the ALTER ROLE command tag echo tells you that, but we could reinforce that in the warning).

Also, I know I said last time that the scope of the warning is clear from the setting, but looking at it again, I think we could do better. I guess because when we're generating the error, we don't know whether the source was ALTER DATABASE or ALTER ROLE, we can't give a more specific message? Ideally, I think the DETAIL would be something like "New sessions for this role will fail to connect due to this setting". Maybe even with a HINT of "Run ALTER ROLE again with a valid value to fix this"? If that's not feasible, maybe "New sessions for this role or database will fail to connect due to this setting"? That message is not as clear about the impact of the change as it could be, but hopefully you know what command you just ran, so that should make it unambiguous.
I do think without qualifying that, it suggests that all new sessions are affected.

Hmm, or maybe just "New sessions affected by this setting will fail to connect"? That also makes the scope clear without the warning having to be aware of the scope: if you just ran ALTER DATABASE it's pretty clear that what is affected by the setting is the database. I think this is probably the way to go, but leaving my thought process above for context.

postgres=# alter system set shared_preload_libraries = lol;
WARNING:  could not access file "$libdir/plugins/lol"
DETAIL:  The server will currently fail to start with this setting.
HINT:  If the server is shut down, it will be necessary to manually edit the postgresql.auto.conf file to allow it to start again.
ALTER SYSTEM

I think this works. Tom's copy edit above omitted "currently" from the DETAIL and did not include the "$libdir/plugins/" prefix. I don't feel strongly about these either way.

2022-07-22 10:37:50.217 PDT [1131187] LOG:  database system is shut down
2022-07-22 10:37:50.306 PDT [1134058] WARNING:  could not access file "$libdir/plugins/lol"
2022-07-22 10:37:50.306 PDT [1134058] DETAIL:  The server will currently fail to start with this setting.
2022-07-22 10:37:50.306 PDT [1134058] HINT:  If the server is shut down, it will be necessary to manually edit the postgresql.auto.conf file to allow it to start again.
2022-07-22 10:37:50.312 PDT [1134058] FATAL:  could not access file "lol": No such file or directory
2022-07-22 10:37:50.312 PDT [1134058] CONTEXT:  while loading shared libraries for setting "shared_preload_libraries" from /home/maciek/code/aux/postgres/tmpdb/postgresql.auto.conf:3
2022-07-22 10:37:50.312 PDT [1134058] LOG:  database system is shut down

Hmm, I guess this is a side effect of where these messages are emitted, but at this point, lines 4-6 here are a little confusing, no? The server was already shut down, and we're trying to start it back up.
If there's no reasonable way to avoid that output, I think it's okay, but it'd be better to skip it (or adjust it to the new context). Thanks, Maciek
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
I ran that original test case with and without the patch. Here are the numbers I'm seeing:

master (best of three):

postgres=# SELECT count(*) FROM lotsarows;
Time: 582.423 ms
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM lotsarows;
Time: 616.102 ms
postgres=# EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
Time: 1068.700 ms (00:01.069)

patched (best of three):

postgres=# SELECT count(*) FROM lotsarows;
Time: 550.822 ms
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM lotsarows;
Time: 612.572 ms
postgres=# EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
Time: 690.875 ms

On Fri, Jul 1, 2022 at 10:26 AM Andres Freund wrote:
> On 2022-07-01 01:23:01 -0700, Lukas Fittl wrote:
> ...
> > Known WIP problems with this patch version:
> >
> > * There appears to be a timing discrepancy I haven't yet worked out, where
> >   the \timing data reported by psql doesn't match what EXPLAIN ANALYZE is
> >   reporting. With Andres' earlier test case, I'm seeing a consistent ~700ms
> >   higher for \timing than for the EXPLAIN ANALYZE time reported on the server
> >   side, only when rdtsc measurement is used -- its likely there is a problem
> >   somewhere with how we perform the cycles to time conversion
>
> Could you explain a bit more what you're seeing? I just tested your patches
> and didn't see that here.

I did not see this either, but I did see that the execution time reported by \timing is (for this test case) consistently 0.5-1ms *lower* than the Execution Time reported by EXPLAIN. I did not see that on master. Is that expected?

Thanks,
Maciek
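P.S. In case anyone wants to reproduce this, my setup was along these lines (I believe this matches the original test case from the thread, though the exact row count there may differ):

```sql
-- Benchmark setup sketch: a single-column table large enough to make
-- per-tuple instrumentation overhead visible.
CREATE TABLE lotsarows AS SELECT generate_series(1, 10000000) AS i;
VACUUM ANALYZE lotsarows;
-- Then, with \timing on in psql:
SELECT count(*) FROM lotsarows;
EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM lotsarows;
EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
```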
Re: Add id's to various elements in protocol.sgml
On Thu, Feb 24, 2022, 16:52 Kyotaro Horiguchi wrote:
> At Tue, 21 Dec 2021 08:47:27 +0100, Brar Piening wrote in
> > On 20.12.2021 at 16:09, Robert Haas wrote:
> > > As a data point, this is something I have also wanted to do, from time
> > > to time. I am generally of the opinion that any place the
>
> +1 from me. When I put an URL in the answer for inquiries, I always
> look into the html for name/id tags so that the inquirer quickly find
> the information source (or the backing or reference point) on the
> page.

+1 here as well. I often do the exact same thing. It's not hard, but it's a little tedious, especially considering most modern doc systems support linkable sections.
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Andrew made a good case above for avoiding LOG:

>I do think we should be wary of any name starting with "LOG", though.
>Long experience tells us that's something that confuses users when it refers to the WAL.
Re: warn if GUC set to an invalid shared library
On Wed, Feb 9, 2022 at 5:58 PM Justin Pryzby wrote:
> FYI, it has said "while..." and hasn't said "guc" since the 2nd revision of the
> patch.

The v3-0001 attached above had "while... for GUC..."--sorry I wasn't clear.

In v4, the message looks fine to me for shared_preload_libraries (except there is a doubled "is"). However, I also get the message for a simple SET with local_preload_libraries:

postgres=# set local_preload_libraries=xyz;
WARNING:  could not access file "xyz"
HINT:  The server will fail to start with the existing configuration. If it is is shut down, it will be necessary to manually edit the postgresql.auto.conf file to allow it to start.
SET

I'm not familiar with that setting (reading the docs, it's like a non-superuser session_preload_libraries for compatible modules?), but given nothing is being persisted here with ALTER SYSTEM, this seems incorrect.

Changing session_preload_libraries emits a similar warning:

postgres=# set session_preload_libraries = foo;
WARNING:  could not access file "$libdir/plugins/foo"
HINT:  New sessions will fail with the existing configuration.
SET

This is also not persisted, so I think this is also incorrect, right? (I'm not sure what setting session_preload_libraries without an ALTER ROLE or ALTER DATABASE accomplishes, given a new session is required for the change to take effect, but I thought I'd point this out.)

I'm guessing this may be due to trying to have the warning for ALTER ROLE?

postgres=# alter role bob set session_preload_libraries = foo;
WARNING:  could not access file "$libdir/plugins/foo"
HINT:  New sessions will fail with the existing configuration.
ALTER ROLE

This is great. Ideally, we'd qualify this with "New sessions for user..." or "New sessions for database..." but given you get the warning right after running the relevant command, maybe that's clear enough.
> $ ./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data -clogging_collector=on
> 2022-02-09 19:53:48.034 CST postmaster[30979] FATAL:  could not access file "a": No such file or directory
> 2022-02-09 19:53:48.034 CST postmaster[30979] CONTEXT:  while loading shared libraries for setting "shared_preload_libraries" from /home/pryzbyj/src/postgres/src/test/regress/tmp_check/data/postgresql.auto.conf:3
> 2022-02-09 19:53:48.034 CST postmaster[30979] LOG:  database system is shut down
>
> Maybe it's enough to show the GucSource rather than file:line...

This is great. I think the file:line output is helpful here.

Thanks,
Maciek
Re: CREATEROLE and role ownership hierarchies
I'm chiming in a little late here, but as someone who worked on a system to basically work around the lack of unprivileged CREATE ROLE for a cloud provider (I worked on the Heroku Data team for several years), I thought it might be useful to offer my perspective. This is, of course, not the only use case, but maybe it's useful to have something concrete. As a caveat, I don't know how current this still is (I no longer work there, though the docs [1] seem to still describe the same system), or if there are better ways to achieve the goals of a service provider. Broadly, the general use case is something like what Robert has sketched out in his e-mails. Heroku took care of setting up the database, archiving, replication, and any other system-level config. It would then keep the superuser credentials private, create a database, and a user that owned that database and had all the permissions we could grant it without compromising the integrity of the system. (We did not want customers to break their databases, both to ensure a better user experience and to avoid getting paged.) Initially, this meant customers got just the one database user because of CREATE ROLE's limitations. To work around that, at some point, we added an API that would CREATE ROLE for you, accessible through a dashboard and the Heroku CLI. This ran CREATE ROLE (or DROP ROLE) for you, but otherwise it largely let you configure the resulting roles as you pleased (using the original role we create for you). We wanted to avoid reinventing the wheel as much as possible, and the customer database (including the role configuration) was mostly a black box for us (we did manage some predefined permissions configurations through our dashboard, but the Postgres catalogs were the source of truth for that). 
Thinking about how this would fit into a potential non-superuser CREATE ROLE world, the sandbox superuser model discussed above covers this pretty well, though I share some of Robert's concerns around how this fits into existing systems. Hope this is useful feedback. Thanks for working on this! [1]: https://devcenter.heroku.com/articles/heroku-postgresql-credentials
Re: warn if GUC set to an invalid shared library
On Wed, Feb 2, 2022 at 7:39 AM David G. Johnston wrote:
> I would at least consider having the UX go something like:
>
> postgres=# ALTER SYSTEM SET shared_preload_libraries = not_such_library;
> ERROR: that library is not present.
> HINT: to bypass the error please add FORCE before SET
> postgres=# ALTER SYSTEM FORCE SET shared_preload_libraries = no_such_library;
> NOTICE: Error suppressed while setting shared_preload_libraries.
>
> That is, have the user express their desire to leave the system in a
> precarious state explicitly before actually doing so.

While I don't have a problem with that behavior, given that there are currently no such facilities for asserting "yes, really" with ALTER SYSTEM, I don't think it's worth introducing that just for this patch. A warning seems like a reasonable first step. This can always be expanded later. I'd rather see a warning ship than move the goalposts out of reach.
Re: warn if GUC set to an invalid shared library
The following review has been posted through the commitfest application:

make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

I tried the latest version of the patch, and it works as discussed. There is no documentation, but I think that's moot for this warning (we may want to note something in the setting docs, but even if so, I think we should figure out the message first and then decide if it merits additional explanation in the docs). I do not know whether it is spec-compliant, but I doubt the spec has much to say on something like this.

I tried running ALTER SYSTEM and got the warnings as expected:

postgres=# alter system set shared_preload_libraries = no_such_library,not_this_one_either;
WARNING:  could not access file "$libdir/plugins/no_such_library"
WARNING:  could not access file "$libdir/plugins/not_this_one_either"
ALTER SYSTEM

I think this is great, but it would be really helpful to also indicate that at this point the server will fail to come back up after a restart. In my mind, that's a big part of the reason for having a warning here. Having made this mistake a couple of times, I would be able to read between the lines, as would many other users, but if you're not familiar with Postgres this might still be pretty opaque. If I'm reading the code correctly, this warning path is shared between ALTER SYSTEM and a SET of local_preload_libraries, so it might be tricky to word this in a way that works in all situations, but it could make the precarious situation a lot clearer. I don't really know a good wording here, but maybe a hint like "The server or session will not be able to start if it has been configured to use libraries that cannot be loaded."?

Also, there are two sides to this: one is actually applying the possibly-bogus setting, and the other is when that setting takes effect (e.g., attempting to start the server or to start a new session).
I think Robert had good feedback regarding the latter:

On Fri, Jan 28, 2022 at 6:42 AM Robert Haas wrote:
> On Tue, Dec 28, 2021 at 12:45 PM Justin Pryzby wrote:
> > 0002 adds context when failing to start.
> >
> > 2021-12-27 17:01:12.996 CST postmaster[1403] WARNING:  could not load library: $libdir/plugins/asdf: cannot open shared object file: No such file or directory
> > 2021-12-27 17:01:14.938 CST postmaster[1403] FATAL:  could not access file "asdf": No such file or directory
> > 2021-12-27 17:01:14.938 CST postmaster[1403] CONTEXT:  guc "shared_preload_libraries"
> > 2021-12-27 17:01:14.939 CST postmaster[1403] LOG:  database system is shut down
>
> -1 from me on using "guc" in any user-facing error message. And even
> guc -> setting isn't a big improvement. If we're going to structure
> the reporting this way there, we should try to use a meaningful phrase
> there, probably beginning with the word "while"; see "git grep
> errcontext.*while" for interesting precedents.
>
> That said, that series of messages seems a bit suspect to me, because
> the WARNING seems to be stating the same problem as the subsequent
> FATAL and CONTEXT lines. Ideally we'd tighten that somehow.

Maybe we don't even need the WARNING in this case? At this point, it's clear what the problem is. I think the CONTEXT line does actually help, because otherwise it's not clear why the server failed to start, but the warning does seem superfluous here.

I do agree that GUC is awkward here, and I like the "while..." wording suggested, both for consistency with other messages and for how it could work here:

CONTEXT:  while loading "shared_preload_libraries"

I think that would be pretty clear. In the ALTER SYSTEM case, you still need to know to edit the file in spite of the warning telling you not to edit it, but I think that's still better.
Based on Cary's feedback, maybe that could be clearer, too (like you, I'm not sure if I understood what he did correctly), but I think that could certainly be future work. Thanks, Maciek
Re: warn if GUC set to an invalid shared library
On Sat, Jan 8, 2022 at 2:07 PM Justin Pryzby wrote:
> Unfortunately, the output for dlopen() is not portable, which (I think) means
> most of what I wrote can't be made to work.. Since it doesn't work to call
> dlopen() when using SET, I tried using just stat(). But that also fails on
> windows, since one of the regression tests has an invalid filename involving
> unbalanced quotes, which cause it to return EINVAL rather than ENOENT. So SET
> cannot warn portably, unless it includes no details at all (or we specially
> handle the windows case), or change the pre-existing regression test. But
> there's a 2nd instability, too, apparently having to do with timing. So I'm
> planning to drop the 0001 patch.

Hmm. I think 0001 is a big part of the usability improvement here. Could we not at least warn generically, without relaying the underlying error? The notable thing in this situation is that the specified library could not be loaded (and that it will almost certainly cause problems on restart). The specific error would be nice to have, but it's less important. What is the timing instability?

> > On Tue, Dec 28, 2021 at 9:45 AM Justin Pryzby wrote:
> > For whatever reason, I get slightly different (and somewhat redundant)
> > output on failing to start:
> >
> > 2022-01-08 12:59:36.784 PST [324482] WARNING:  could not load library: $libdir/plugins/totally bogus: cannot open shared object file: No such file or directory
> > 2022-01-08 12:59:36.787 PST [324482] FATAL:  could not load library: totally bogus: cannot open shared object file: No such file or directory
> > 2022-01-08 12:59:36.787 PST [324482] LOG:  database system is shut down
>
> I think the first WARNING is from the GUC mechanism "setting" the library.
> And then the FATAL is from trying to apply the GUC.
> It looks like you didn't apply the 0002 patch for that test so got no CONTEXT?
I still had the terminal open where I tested this, and the scrollback did show me applying the patch (and building after). I tried a make clean and applying the patch again, and I do see the CONTEXT line now. I'm not sure what the problem was, but it seems like PEBKAC--sorry about that.

Thanks,
Maciek
Re: warn if GUC set to an invalid shared library
On Thu, Dec 30, 2021 at 12:21 AM Bharath Rupireddy wrote:
> Overall the idea looks good to me. A warning on ALTER SYSTEM SET seems
> reasonable than nothing. ERROR isn't the way to go as it limits the
> users of setting the extensions in shared_preload_libraries first,
> installing them later. Is NOTICE here a better idea than WARNING?

I don't think so--I'm skeptical that "update shared_preload_libraries first, then install the libraries later" is much more than a theoretical use case. We may not want to block that off completely, but I think a warning is reasonable here, because you're *probably* doing something wrong if you get to this message at all (and if you're not, you're probably familiar enough with Postgres to know to ignore the warning).

Thanks,
Maciek
Re: warn if GUC set to an invalid shared library
Thanks for working on this! I tried it out and it worked for me. I reviewed the patch and didn't see any problems, but I'm not much of a C programmer.

On Tue, Dec 28, 2021 at 9:45 AM Justin Pryzby wrote:
> 0002 adds context when failing to start.
>
> 2021-12-27 17:01:12.996 CST postmaster[1403] WARNING:  could not load library: $libdir/plugins/asdf: cannot open shared object file: No such file or directory
> 2021-12-27 17:01:14.938 CST postmaster[1403] FATAL:  could not access file "asdf": No such file or directory
> 2021-12-27 17:01:14.938 CST postmaster[1403] CONTEXT:  guc "shared_preload_libraries"
> 2021-12-27 17:01:14.939 CST postmaster[1403] LOG:  database system is shut down

For whatever reason, I get slightly different (and somewhat redundant) output on failing to start:

2022-01-08 12:59:36.784 PST [324482] WARNING:  could not load library: $libdir/plugins/totally bogus: cannot open shared object file: No such file or directory
2022-01-08 12:59:36.787 PST [324482] FATAL:  could not load library: totally bogus: cannot open shared object file: No such file or directory
2022-01-08 12:59:36.787 PST [324482] LOG:  database system is shut down

I'm on a pretty standard Ubuntu 20.04 (with PGDG packages installed for 11, 12, and 13). I did configure to a --prefix and set LD_LIBRARY_PATH and PATH. Not sure if this is an issue in my environment or a platform difference?

Also, regarding the original warning:

2022-01-08 12:57:24.953 PST [324338] WARNING:  could not load library: $libdir/plugins/totally bogus: cannot open shared object file: No such file or directory

I think this is pretty clear to users familiar with shared_preload_libraries. However, for someone less experienced with Postgres and just following instructions on how to set up, e.g., auto_explain (and making a typo), it's not clear from this message that your server will fail to start again after this if you shut it down (or it crashes!), and how to get out of this situation.
Should we add a HINT to that effect?

Similarly, for

> 0001 adds WARNINGs when doing SET:
>
> postgres=# SET local_preload_libraries=xyz;
> WARNING:  could not load library: xyz: cannot open shared object file: No such file or directory
> SET

This works for me, but should this explain the impact (especially if used with something like ALTER ROLE)? I guess that's probably because the context may not be easily accessible. I think shared_preload_libraries is much more common, though, so I'm more interested in a warning there.

Thanks,
Maciek
Re: [EXTERNAL] Re: should we document an example to set multiple libraries in shared_preload_libraries?
On Fri, Dec 10, 2021 at 10:10 AM Godfrin, Philippe E wrote: > I may have missed something in this stream, but is this a system controlled > by Patroni? In my case, no: it's a pretty vanilla PGDG install on Ubuntu 20.04. Thanks for the context, though. Thanks, Maciek
Re: should we document an example to set multiple libraries in shared_preload_libraries?
On Thu, Dec 9, 2021 at 7:42 AM Bharath Rupireddy wrote:
> How about ALTER SYSTEM SET shared_preload_libraries command itself
> checking for availability of the specified libraries (after fetching
> library names from the parsed string value) with stat() and then
> report an error if any of the libraries doesn't exist? Currently, it
> just accepts if the specified value passes the parsing.

That certainly would have helped me. I guess it technically breaks the theoretical use case of "first change the shared_preload_libraries setting, then install those libraries, then restart Postgres," but I don't see why anyone would do that in practice. For what it's worth, I don't even feel strongly that this needs to be an error--just that the current flow around this is error-prone due to several sharp edges and could be improved. I would be happy with an error, but a warning or other mechanism could work, too. I do think better documentation is not enough: the differences between a working setting value and a broken one are pretty subtle.
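To make the idea concrete, here is a minimal sketch (in Python, for illustration only) of the stat()-based existence check Bharath describes. The actual patch would be C code inside the GUC machinery and would also need to handle quoting, $libdir expansion, and platform-specific errno quirks (such as the Windows EINVAL-vs-ENOENT issue raised elsewhere in this thread); the function name here is made up.

```python
import os

def check_libraries(value):
    """Given a comma-separated shared_preload_libraries-style value,
    return the entries that do not exist on disk, using the same
    stat() call the proposed C check would use."""
    missing = []
    for name in value.split(","):
        name = name.strip()
        try:
            os.stat(name)
        except OSError:
            missing.append(name)
    return missing

# A warning (rather than a hard error) would preserve the
# install-the-library-later use case discussed above:
for lib in check_libraries("/bin/sh, no_such_library"):
    print(f'WARNING:  could not access file "{lib}"')
```

Note that a check at ALTER SYSTEM time is inherently a point-in-time check: the library could still disappear (or appear) between the check and the next restart, which is another argument for a warning rather than an error.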
Re: should we document an example to set multiple libraries in shared_preload_libraries?
On Wed, Dec 1, 2021 at 5:15 AM Tom Lane wrote: > > Justin Pryzby writes: > > +1 to document it, but it seems like the worse problem is allowing the > > admin to > > write a configuration which causes the server to fail to start, without > > having > > issued a warning. > > > I think you could fix that with a GUC check hook to emit a warning. > > ... > > Considering the vanishingly small number of actual complaints we've > seen about this, that sounds ridiculously over-engineered. > A documentation example should be sufficient. I don't know if this will tip the scales, but I'd like to lodge a belated complaint. I've gotten myself in this server-fails-to-start situation several times (in development, for what it's worth). The syntax (as Bharath pointed out in the original message) is pretty picky, there are no guard rails, and if you got there through ALTER SYSTEM, you can't fix it with ALTER SYSTEM (because the server isn't up). If you go to fix it manually, you get a scary "Do not edit this file manually!" warning that you have to know to ignore in this case (that's if you find the file after you realize what the fairly generic "FATAL: ... No such file or directory" error in the log is telling you). Plus you have to get the (different!) quoting syntax right or cut your losses and delete the change. I'm over-dramatizing this a bit, but I do think there are a lot of opportunities to make mistakes here, and this behavior could be more user-friendly beyond just documentation changes. If a config change is bogus most likely due to a quoting mistake or a typo, a warning would be fantastic (i.e., the stat() check Justin suggested). Or maybe the FATAL log message on a failed startup could include the source of the problem? Thanks, Maciek
TOAST questions
Hello,

(I hope it's okay to ask general internals questions here; if this list is strictly for development, I'll keep my questions on -general, but since I'm asking about internal behavior, this seemed more appropriate.)

I was playing around with inspecting TOAST tables in order to understand the mechanism better, and I ran across a strange issue: I've created a table that has a text column, inserted and then deleted some data, and the TOAST table still has some entries even though the owning table is now empty:

maciek=# SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'users';
       reltoastrelid
---------------------------
 pg_toast.pg_toast_4398034
(1 row)

maciek=# select chunk_id, chunk_seq from pg_toast.pg_toast_4398034;
 chunk_id | chunk_seq
----------+-----------
  4721665 |         0
  4721665 |         1
(2 rows)

maciek=# select * from users;
 id | created_at | is_admin | username
----+------------+----------+----------
(0 rows)

I've tried to reproduce this with a new table in the same database, but could not see the same behavior (the TOAST table entries are deleted when I delete rows from the main table). This is 11.12. Is this expected?

In case it's relevant, this table originally had the first three columns. I inserted one row, then added the text column, set its STORAGE to EXTERNAL, and set toast_tuple_target to the minimum of 128. I inserted a few more rows until I found one large enough to go in the TOAST table. (It looks like Postgres attempts to compress values and store them inline first even when STORAGE is EXTERNAL? I don't recall the exact size, but I had to use a value much larger than 128 before it hit the TOAST table. The TOAST docs allude to this behavior, but just making sure I understand correctly.) Then I deleted the rows with non-NULL values in the text column, and noticed the TOAST table entries were still there. So I deleted everything in the users table and still see the two rows above in the TOAST table. I've tried this sequence of steps again with a new table and could not reproduce the issue.

Thanks,
Maciek
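As background for the chunk_seq values above: out-of-line TOAST storage splits a value into fixed-size chunks, numbered by chunk_seq under a single chunk_id, so two chunks (seq 0 and 1) imply a stored value of up to roughly two chunks' worth of bytes. A rough sketch of the arithmetic, assuming the approximate default chunk size for 8 kB pages (the exact constant, TOAST_MAX_CHUNK_SIZE, depends on the build; the helper name here is made up, and the real chunking happens in C in toast_save_datum):

```python
import math

TOAST_CHUNK_SIZE = 1996  # approximate default for 8 kB pages; build-dependent

def toast_chunk_count(value_len):
    """Number of TOAST chunks (chunk_seq entries) needed to store a
    value of value_len bytes out of line, under the assumed chunk size."""
    return math.ceil(value_len / TOAST_CHUNK_SIZE)

print(toast_chunk_count(3000))  # a ~3 kB value would need 2 chunks
```

This is only the storage arithmetic, not an explanation of the lingering entries above; stale TOAST rows would normally be reclaimed by VACUUM like any other dead tuples.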
Re: compute_query_id and pg_stat_statements
On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud wrote:
>
> On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> >
> > The warning makes it clear that there's something wrong, but not how
> > to fix it
>
> I'm confused, are we talking about the new warning in v2 as suggested by Pavel?
> As it should make things quite clear:
>
> +SELECT count(*) FROM pg_stat_statements;
> +WARNING:  Query identifier calculation seems to be disabled
> +HINT:  If you don't want to use a third-party module to compute query
> identifiers, you may want to enable compute_query_id
>
> The wording can of course be improved.

I meant that no warning can be as clear as things just working, but I do have feedback on the specific message here:

* "seems to" be disabled? Is it? Any reason not to be more definitive here?
* On reading the beginning of the hint, I can see users asking themselves, "Do I want to use a third-party module to compute query identifiers?"
* "may want to enable"? Are there any situations where I don't want to use a third-party module *and* I don't want to enable compute_query_id?

So maybe something like

> +SELECT count(*) FROM pg_stat_statements;
> +WARNING:  Query identifier calculation is disabled
> +HINT:  You must enable compute_query_id or configure a third-party module to
> compute query identifiers in order to use pg_stat_statements.

(I admit "configure a third-party module" is pretty vague, but I think that suggests it's only an option to consider if you know what you're doing.)

Also, if you're configuring this for usage with a tool like pganalyze and neglect to run a manual query (we guide users to do that, but they may skip that step), the warnings may not even be visible (the Go driver we are using does not surface client warnings). Should this be an error instead of a warning? Is it ever useful to get an empty result set from querying pg_stat_statements?
Using an error here would parallel the behavior of shared_preload_libraries not including pg_stat_statements.
Re: compute_query_id and pg_stat_statements
On Thu, May 13, 2021 at 7:42 AM Bruce Momjian wrote:
>
> On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote:
> > On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> > I don't know what to say. So here is a summary of the complaints that I'm
> > aware of:
> >
> > - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.ca...@moonset.ru
> >   That was only a couple of days after the commit just before the feature freeze,
> >   so it may be the less relevant complaint.
> >
> > - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
> >   Did a git bisect to find the commit that changed the behavior and somehow
> >   didn't notice the new setting
> >
> > - this thread, with Fuji-san saying:
> >
> > > I'm afraid that users may easily forget to enable compute_query_id when using
> > > pg_stat_statements (because this setting was not necessary in v13 or before)
> >
> > - this thread, with Peter E. saying:
> >
> > > Now there is the additional burden of turning on this weird setting that
> > > no one understands. That's a 50% increase in burden. And almost no one will
> > > want to use a nondefault setting. pg_stat_statements is pretty popular. I
> > > think leaving in this requirement will lead to widespread confusion and
> > > complaints.
> >
> > - this thread, with Pavel saying:
> >
> > > Until now, the pg_stat_statements was zero-config. So the change is not user
> > > friendly.
> >
> > So it's a mix of "it's changing something that didn't change in a long time"
> > and "it's adding extra footgun and/or burden as it's not doing by default what
> > the majority of users will want", with an overwhelming majority of people
> > supporting the "we don't want that extra burden".
>
> Well, now that we have clear warnings when it is misconfigured,
> especially when querying the pg_stat_statements view, are these
> complaints still valid?
> Also, how is modifying shared_preload_libraries zero-config, but modifying
> shared_preload_libraries and compute_query_id a huge burden?

The warning makes it clear that there's something wrong, but not how to fix it (as I noted in another message in this thread, a web search for pg_stat_statements docs still leads to an obsolete version). I don't think anyone is arguing that this is insurmountable for all users, but it is additional friction, and every bit of friction makes Postgres harder to use. Users don't read documentation, or misread documentation, or just are not sure what the documentation or the warning is telling them, in spite of our best efforts. And you're right, modifying shared_preload_libraries is not zero-config--I would love it if we could drop that requirement ;). Still, adding another setting is clearly one more thing to get wrong.

> I am personally not comfortable committing a patch to add an auto option
> the way it is implemented, so another committer will need to take
> ownership of this, or the entire feature can be removed.

Assuming we do want to avoid additional configuration requirements for pg_stat_statements, is there another mechanism you feel would work better? Or is that constraint incompatible with sane behavior for this feature?
Re: compute_query_id and pg_stat_statements
On Wed, May 12, 2021 at 9:58 PM Julien Rouhaud wrote:
> On Thu, May 13, 2021 at 12:52 PM, Maciek Sakrejda wrote:
>>
>> For what it's worth, I don't think the actual changing of an extra
>> setting is that big a burden: it's the figuring out that you need to
>> change it, and how you should configure it, that is the problem.
>> Especially since all major search engines still seem to return 9.4 (!)
>> documentation as the first hit for a "pg_stat_statements" search. The
>> common case (installing pg_stat_statements but not tweaking query id
>> generation) should be simple.
>
> the v2 patch I sent should address both your requirements.

Yes, thanks--I just tried it and this is great. I just wanted to argue against reversing course here.
Re: compute_query_id and pg_stat_statements
On Wed, May 12, 2021 at 9:03 PM Julien Rouhaud wrote:
>
> On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> > How do they know to set shared_preload_libraries then? We change the
> > user API all the time, especially in GUCs, and even rename them, but for
> > some reason we don't think people using pg_stat_statements can be
> > trusted to read the release notes and change their behavior. I just
> > don't get it.
>
> I don't know what to say. So here is a summary of the complaints that I'm
> aware of:
>
> - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.ca...@moonset.ru
>   That was only a couple of days after the commit just before the feature freeze,
>   so it may be the less relevant complaint.
>
> - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
>   Did a git bisect to find the commit that changed the behavior and somehow
>   didn't notice the new setting
>
> - this thread, with Fuji-san saying:
>
> > I'm afraid that users may easily forget to enable compute_query_id when using
> > pg_stat_statements (because this setting was not necessary in v13 or before)
>
> - this thread, with Peter E. saying:
>
> > Now there is the additional burden of turning on this weird setting that
> > no one understands. That's a 50% increase in burden. And almost no one will
> > want to use a nondefault setting. pg_stat_statements is pretty popular. I
> > think leaving in this requirement will lead to widespread confusion and
> > complaints.
>
> - this thread, with Pavel saying:
>
> > Until now, the pg_stat_statements was zero-config. So the change is not user
> > friendly.
>
> So it's a mix of "it's changing something that didn't change in a long time"
> and "it's adding extra footgun and/or burden as it's not doing by default what
> the majority of users will want", with an overwhelming majority of people
> supporting the "we don't want that extra burden".
For what it's worth, I don't think the actual changing of an extra setting is that big a burden: it's the figuring out that you need to change it, and how you should configure it, that is the problem. Especially since all major search engines still seem to return 9.4 (!) documentation as the first hit for a "pg_stat_statements" search. The common case (installing pg_stat_statements but not tweaking query id generation) should be simple.
Re: pg_stat_statements requires compute_query_id
On Mon, May 10, 2021 at 7:43 AM Julien Rouhaud wrote:
> On Mon, May 10, 2021 at 04:36:16PM +0200, Pavel Stehule wrote:
> > I expected just an empty column query_id and workable extension. This
> > doesn't look well.
> >
> > More, it increases the (little bit) complexity of migration to Postgres 14.
>
> This was already raised multiple times, and the latest discussion can be found
> at [1].
>
> Multiple options have been suggested, but AFAICT there isn't a clear consensus
> on what we should do exactly, so I've not been able to send a fix yet.
>
> [1]: https://www.postgresql.org/message-id/flat/35457b09-36f8-add3-1d07-6034fa585ca8%40oss.nttdata.com

Before it petered out, the thread seemed to be converging toward consensus that the situation could be improved. I work on pganalyze, and our product requires pg_stat_statements to be enabled for a lot of its functionality. We guide our users through enabling it, but if additional steps are required in 14, that may be confusing. I don't have any strong feelings on the particular mechanism that would work best here, but it would be nice if enabling pg_stat_statements in 14 did not require more work than in 13. Even if it's just one extra setting, it's another potential thing to get wrong and have to troubleshoot, plus it means all existing pg_stat_statements guides out there would become out of date.

Thanks,
Maciek
Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
On Wed, Jun 24, 2020 at 2:44 AM Amit Kapila wrote:
> So, that leads to loops as 2 on "Parallel Seq Scan" and "Nested Loop" nodes.
> Does this make sense now?

Yes, I think we're on the same page. Thanks for the additional details.

It turns out that the plan I sent at the top of the thread is actually an older plan we had saved, all the way from April 2018. We're fairly certain this was Postgres 10, but not sure what point release. I tried to reproduce this on 10, 11, 12, and 13 beta, but I am now seeing similar results to yours: Buffers and I/O Timings are rolled up into the parallel leader, and that is propagated as expected to the Gather. Sorry for the confusion.

On Wed, Jun 24, 2020 at 3:18 AM Maciek Sakrejda wrote:
> So we should be seeing an average, not a sum, right?

And here I missed that the documentation specifies rows and actual time as per-loop, but other metrics are not--they're just cumulative. So actual time and rows are still per-"loop" values, but while rows values are additive (the Gather combines rows from the parallel leader and the workers), the actual time is not, because the whole point is that this work happens in parallel.

I'll report back if I can reproduce the weird numbers we saw in that original plan or find out exactly what Postgres version it was from.

Thanks,
Maciek
Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
On Tue, Jun 23, 2020 at 7:55 PM Amit Kapila wrote:
> > I don't see any other reason for
> > looping over the NL node itself in this plan. The Gather itself
> > doesn't do any real looping, right?
>
> It is right that Gather doesn't do looping but Parallel Seq Scan node does so.

Sorry, I still don't follow. How does a Parallel Seq Scan do looping? I looked at the parallel plan docs but I don't see looping mentioned anywhere [1]. Also, is looping not normally indicated on children, rather than on the node doing the looping? E.g., with a standard Nested Loop, the outer child will have loops=1 and the inner child will have loops equal to the row count produced by the outer child (and the Nested Loop itself will have loops=1 unless it also is being looped over by a parent node), right? But even aside from that, why do I need to divide by the number of loops here, when normally Postgres presents a per-loop average?

[1]: https://www.postgresql.org/docs/current/parallel-plans.html
Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
On Tue, Jun 23, 2020 at 2:57 AM Amit Kapila wrote:
> I don't think this is an odd situation because in this case, child
> nodes like "Nested Loop" and "Parallel Seq Scan" has a value of
> 'loops' as 3. So, to get the correct stats at those nodes, you need
> to divide it by 3 whereas, at Gather node, the value of 'loops' is 1.
> If you want to verify this thing then try with a plan where loops
> should be 1 for child nodes as well, you should get the same value at
> both Gather and Parallel Seq Scan nodes.

Thanks for the response, but I still don't follow. I had assumed that loops=3 was just from loops=1 for the parallel leader plus loops=1 for each worker--is that not right? I don't see any other reason for looping over the NL node itself in this plan. The Gather itself doesn't do any real looping, right?

But even so, the documentation [1] states:

> In some query plans, it is possible for a subplan node to be executed more
> than once. For example, the inner index scan will be executed once per outer
> row in the above nested-loop plan. In such cases, the loops value reports the
> total number of executions of the node, and the actual time and rows values
> shown are averages per-execution. This is done to make the numbers comparable
> with the way that the cost estimates are shown. Multiply by the loops value to
> get the total time actually spent in the node.

So we should be seeing an average, not a sum, right?

[1]: https://www.postgresql.org/docs/current/using-explain.html#USING-EXPLAIN-ANALYZE
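Under the documented convention quoted above, recovering totals from EXPLAIN's per-loop averages is just multiplication by loops. A small illustration (the helper name is made up; the numbers are taken from the index scan in the plan under discussion in this thread, which reports actual time=0.011 and loops=95):

```python
def per_loop_to_total(actual_time_avg_ms, rows_avg, loops):
    """EXPLAIN ANALYZE reports actual time and rows as per-loop
    averages; per the documentation, multiply by loops to get totals."""
    return actual_time_avg_ms * loops, rows_avg * loops

# Index scan from the plan in this thread: ~0.011 ms per loop, 95 loops
total_ms, total_rows = per_loop_to_total(actual_time_avg_ms=0.011,
                                         rows_avg=0, loops=95)
print(total_ms, total_rows)  # roughly 1.045 ms total, 0 rows
```

This is exactly the convention at issue in the discussion: it holds for actual time and rows, but the parallel-worker rollup behavior for Buffers and I/O Timings is what the thread is trying to pin down.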
EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
Hello, I had some questions about the behavior of some accounting in parallel EXPLAIN plans. Take the following plan: ``` Gather (cost=1000.43..750173.74 rows=2 width=235) (actual time=1665.122..1665.122 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=27683 read=239573 I/O Timings: read=687.358 -> Nested Loop (cost=0.43..749173.54 rows=1 width=235) (actual time=1660.095..1660.095 rows=0 loops=3) Inner Unique: true Buffers: shared hit=77579 read=657847 I/O Timings: read=2090.189 Worker 0: actual time=1657.443..1657.443 rows=0 loops=1 Buffers: shared hit=23645 read=208365 I/O Timings: read=702.270 Worker 1: actual time=1658.277..1658.277 rows=0 loops=1 Buffers: shared hit=26251 read=209909 I/O Timings: read=700.560 -> Parallel Seq Scan on public.schema_indices (cost=0.00..748877.88 rows=35 width=235) (actual time=136.744..1659.629 rows=32 loops=3) Filter: ((schema_indices.invalidated_at_snapshot_id IS NULL) AND (NOT schema_indices.is_valid)) Rows Removed by Filter: 703421 Buffers: shared hit=77193 read=657847 I/O Timings: read=2090.189 Worker 0: actual time=69.248..1656.950 rows=32 loops=1 Buffers: shared hit=23516 read=208365 I/O Timings: read=702.270 Worker 1: actual time=260.188..1657.875 rows=27 loops=1 Buffers: shared hit=26140 read=209909 I/O Timings: read=700.560 -> Index Scan using schema_tables_pkey on public.schema_tables (cost=0.43..8.45 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=95) Index Cond: (schema_tables.id = schema_indices.table_id) Filter: (schema_tables.database_id = 123) Rows Removed by Filter: 1 Buffers: shared hit=386 Worker 0: actual time=0.011..0.011 rows=0 loops=32 Buffers: shared hit=129 Worker 1: actual time=0.011..0.011 rows=0 loops=27 Buffers: shared hit=111 Planning Time: 0.429 ms Execution Time: 1667.373 ms ``` The Nested Loop here aggregates data for metrics like `buffers read` from its workers, and to calculate a metric like `buffers read` for the parallel leader, we can subtract the values 
recorded in each individual worker. This happens in the Seq Scan and Index Scan children, as well. However, the Gather node appears to only include values from its direct parallel leader child (excluding that child's workers). This leads to the odd situation that the Gather has lower values for some of these metrics than its child (because the child node reporting includes the worker metrics) even though the values are supposed to be cumulative. This is even more surprising for something like I/O timing, where the Gather has a lower `read` value than one of the Nested Loop workers, which doesn't make sense in terms of wall-clock time. Is this behavior intentional? If so, is there an explanation of the reasoning or the trade-offs involved? Would it not make sense to propagate those cumulative parallel costs up the tree all the way to the root, instead of only using the parallel leader metrics under Gather? Thanks, Maciek
Re: JIT performance bug/regression & JIT EXPLAIN
On Mon, Jan 27, 2020 at 11:01 AM Robert Haas wrote: > On Fri, Nov 15, 2019 at 8:05 PM Maciek Sakrejda wrote: > > On Fri, Nov 15, 2019 at 5:49 AM Robert Haas wrote: > > > Personally, I don't care very much about backward-compatibility, or > > > about how hard it is for tools to parse. I want it to be possible, but > > > if it takes a little extra effort, so be it. > > > > I think these are two separate issues. I agree on > > backward-compatibility (especially if we can embed a server version in > > structured EXPLAIN output to make it easier for tools to track format > > differences), but not caring how hard it is for tools to parse? What's > > the point of structured formats, then? > > To make the data easy to parse. :-) > > I mean, it's clear that, on the one hand, having a format like JSON > that, as has recently been pointed out elsewhere, is parsable by a > wide variety of tools, is advantageous. However, I don't think it > really matters whether the somebody's got to look at a tag called > Flump and match it up with the data in another tag called JIT-Flump, > or whether there's a Flump group that has RegularStuff and JIT tags > inside of it. There's just not much difference in the effort involved. > Being able to parse the JSON or XML using generic code is enough of a > win that the details shouldn't matter that much. Having a structured EXPLAIN schema that's semantically consistent is still valuable. At the end of the day, it's humans who are writing the tools that consume that structured output. Given the sparse structured EXPLAIN schema documentation, as someone who currently works on EXPLAIN tooling, I'd prefer a trend toward consistency at the expense of backward compatibility. (Of course, we should avoid gratuitous changes.) But I take back the version number suggestion after reading Tom's response; that was naïve. 
> I think if you were going to complain about the limitations of our > current EXPLAIN output format, it'd make a lot more sense to focus on > the way we output expressions. That would be nice to have, but for what it's worth, my main complaint would be about documentation (especially around structured formats). The "Using EXPLAIN" section covers the basics, but understanding what node types exist, and what fields show up for what nodes and what they mean--that seems to be a big missing piece (I don't feel entitled to this documentation; as a structured format consumer, I'm just pointing out a deficiency). Contrast that with the great wire protocol documentation. In some ways it's easier to work on native drivers than on EXPLAIN tooling because the docs are thorough and well organized.
Re: Duplicate Workers entries in some EXPLAIN plans
Thanks for the thorough review. I obviously missed some critical issues. I recognized some of the other maintainability problems, but did not have a sense of how to best avoid them, being unfamiliar with the code. For what it's worth, this version makes sense to me.
Re: Duplicate Workers entries in some EXPLAIN plans
Okay. Does "not getting as many workers as it asks for" include sometimes getting zero, which would completely change the actual output? If so, I'll submit a new version of the patch removing all the tests--I was hoping to improve coverage, but I guess this is not the way to start. If not, can we at least keep the JSON tests if we only consider the first worker?
Re: Duplicate Workers entries in some EXPLAIN plans
Great, thank you. I noticed that in the CF patch tester app, the build fails on Windows [1]. Investigating, I realized I had failed to fully strip volatile EXPLAIN info (namely buffers) in TEXT mode due to a bad regexp_replace. I've fixed this in the attached patch (which I also rebased against current master again).

[1]: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.76313

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..b4108f82ec 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 						   Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -131,7 +131,9 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
-
+static void ExplainOpenWorker(StringInfo worker_str, ExplainState *es);
+static void ExplainCloseWorker(ExplainState *es);
+static void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, ExplainState *es);
 
 /*
@@ -289,6 +291,7 @@ NewExplainState(void)
 	es->costs = true;
 
 	/* Prepare output buffer. */
 	es->str = makeStringInfo();
+	es->root_str = es->str;
 
 	return es;
 }
@@ -1090,9 +1093,19 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	StringInfo *worker_strs = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/* Prepare per-worker output */
+	if (es->analyze && planstate->worker_instrument)
+	{
+		int			num_workers = planstate->worker_instrument->num_workers;
+		worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+		for (int n = 0; n < num_workers; n++)
+			worker_strs[n] = makeStringInfo();
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1357,6 +1370,55 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
 	}
 
+	/* Prepare worker general execution details */
+	if (es->analyze && es->verbose && !es->hide_workers && planstate->worker_instrument)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+
+		for (int n = 0; n < w->num_workers; ++n)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			ExplainOpenWorker(worker_strs[n], es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+			}
+			else
+			{
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+		}
+
+		ExplainCloseWorker(es);
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_SeqScan:
@@ -1856,7 +1918,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			break;
 		case T_Sort:
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
-			show_sort_info(castNode(SortState, planstate), es);
+			show_sort_info(castNode(SortState, planstate), worker_strs, es);
 			break;
 		case T_MergeAppend:
 			show_merge_append_keys(castNode(MergeAppendState, planstate),
@@ -1885,76 +1947,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && !es->hide_workers &&
-		planstate->worker_instrument)
+	/* Prepare worker buffer usage */
+	if (es->buffers && es->analyze && es->verbose && !es->hide_workers
+		&& planstate->worker_instrument)
 	{
 		WorkerInstrumentation *w = planstate->worker_instrument;
-		bool		opened_group = false;
 		int			n;
 
 		for (n = 0; n < w->num_workers; ++n)
 		{
Re: Duplicate Workers entries in some EXPLAIN plans
On Wed, Jan 22, 2020 at 9:37 AM Georgios Kokolatos wrote: > My bad, I should have been more clear. I meant that it is preferable to use > the C99 standard which calls for declaring variables in the scope that you > need them. Ah, I see. I think I got that from ExplainPrintSettings. I've corrected my usage--thanks for pointing it out. I appreciate the effort to maintain a consistent style. > > >> int indent; /* current indentation level */ > >> List *grouping_stack; /* format-specific grouping state */ > >> + boolprint_workers; /* whether current node has worker > >> metadata */ > >> > >> Hmm.. commit introduced `hide_workers` in the struct. Having > >> both > >> names in the struct so far apart even seems a bit confusing and sloppy. Do > >> you > >> think it would be possible to combine or rename? > > > > > > I noticed that. I was thinking about combining them, but > > "hide_workers" seems to be about "pretend there is no worker output > > even if there is" and "print_workers" is "keep track of whether or not > > there is worker output to print". Maybe I'll rename to > > "has_worker_output"? > > The rename sounds a bit better in my humble opinion. Thanks. Also, reviewing my code again, I noticed that when I moved the general worker output earlier, I missed part of the merge conflict: I had replaced - /* Show worker detail */ - if (es->analyze && es->verbose && !es->hide_workers && - planstate->worker_instrument) with + /* Prepare worker general execution details */ + if (es->analyze && es->verbose && planstate->worker_instrument) which ignores the es->hide_workers flag (it did not fail the tests, but the intent is pretty clear). I've corrected this in the current patch. 
I also noticed that we can now handle worker buffer output more consistently across TEXT and structured formats, so I made that small change too: diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 140d0be426..b23b015594 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1401,8 +1401,6 @@ ExplainNode(PlanState *planstate, List *ancestors, appendStringInfo(es->str, "actual rows=%.0f loops=%.0f\n", rows, nloops); - if (es->buffers) - show_buffer_usage(es, >bufusage); } else { @@ -1951,7 +1949,7 @@ ExplainNode(PlanState *planstate, List *ancestors, /* Prepare worker buffer usage */ if (es->buffers && es->analyze && es->verbose && !es->hide_workers - && planstate->worker_instrument && es->format != EXPLAIN_FORMAT_TEXT) + && planstate->worker_instrument) { WorkerInstrumentation *w = planstate->worker_instrument; int n; diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out index 8034a4e0db..a4eed3067f 100644 --- a/src/test/regress/expected/explain.out +++ b/src/test/regress/expected/explain.out @@ -103,8 +103,8 @@ $$, 'verbose', 'analyze', 'buffers', 'timing off', 'costs off', 'summary off'), Sort Key: (ROW("*VALUES*".column1)) + Buffers: shared hit=114 + Worker 0: actual rows=2 loops=1 + - Buffers: shared hit=114+ Sort Method: xxx + + Buffers: shared hit=114+ -> Values Scan on "*VALUES*" (actual rows=2 loops=1)+ Output: "*VALUES*".column1, ROW("*VALUES*".column1)+ Worker 0: actual rows=2 loops=1+ I think the "producing plan output for a worker" process is easier to reason about now, and while it changes TEXT format worker output order, the other changes in this patch are more drastic so this probably does not matter. I've also addressed the other feedback above, and reworded a couple of comments slightly. 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..b4108f82ec 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 						   Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
Re: Duplicate Workers entries in some EXPLAIN plans
Thanks! I'll fix the brace issues. Re: the other items: > + int num_workers = planstate->worker_instrument->num_workers; > + int n; > + worker_strs = (StringInfo *) palloc0(num_workers * > sizeof(StringInfo)); > + for (n = 0; n < num_workers; n++) { > > I think C99 would be better here. Also no parenthesis needed. Pardon my C illiteracy, but what am I doing that's not valid C99 here? > + /* Prepare worker general execution details */ > + if (es->analyze && es->verbose && planstate->worker_instrument) > + { > + WorkerInstrumentation *w = planstate->worker_instrument; > + int n; > + > + for (n = 0; n < w->num_workers; ++n) > > I think C99 would be better here. And here (if it's not the same problem)? > + ExplainCloseGroup("Workers", "Workers", false, es); > + // do we have any other cleanup to do? > > This comment does not really explain anything. Either remove > or rephrase. Also C style comments. Good catch, thanks--I had put this in to remind myself (and reviewers) about cleanup, but I don't think there's anything else to do, so I'll just drop it. > int indent; /* current indentation level */ > List *grouping_stack; /* format-specific grouping state */ > + boolprint_workers; /* whether current node has worker metadata */ > > Hmm.. commit introduced `hide_workers` in the struct. Having > both > names in the struct so far apart even seems a bit confusing and sloppy. Do you > think it would be possible to combine or rename? I noticed that. I was thinking about combining them, but "hide_workers" seems to be about "pretend there is no worker output even if there is" and "print_workers" is "keep track of whether or not there is worker output to print". Maybe I'll rename to "has_worker_output"? > +extern void ExplainOpenWorker(StringInfo worker_str, ExplainState *es); > +extern void ExplainCloseWorker(ExplainState *es); > +extern void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, > ExplainState *es); > > No need to expose those, is there? 
> I feel there should be static.

Good call, I'll update.
Re: Duplicate Workers entries in some EXPLAIN plans
TEXT format was tricky due to its inconsistencies, but I think I have something working reasonably well. I added a simple test for TEXT format output as well, using a similar approach as the JSON format test, and liberally regexp_replacing away any volatile output. I suppose in theory we could do this for YAML, too, but I think it's gross enough not to be worth it, especially given the high similarity of all the structured outputs. diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index d189b8d573..c3c27e13a1 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr, Oid sortOperator, Oid collation, bool nullsFirst); static void show_tablesample(TableSampleClause *tsc, PlanState *planstate, List *ancestors, ExplainState *es); -static void show_sort_info(SortState *sortstate, ExplainState *es); +static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es); static void show_hash_info(HashState *hashstate, ExplainState *es); static void show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es); @@ -289,6 +289,7 @@ NewExplainState(void) es->costs = true; /* Prepare output buffer. 
*/ es->str = makeStringInfo(); + es->root_str = es->str; return es; } @@ -1090,9 +1091,20 @@ ExplainNode(PlanState *planstate, List *ancestors, const char *partialmode = NULL; const char *operation = NULL; const char *custom_name = NULL; + StringInfo *worker_strs = NULL; int save_indent = es->indent; bool haschildren; + /* Prepare per-worker output */ + if (es->analyze && planstate->worker_instrument) { + int num_workers = planstate->worker_instrument->num_workers; + int n; + worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo)); + for (n = 0; n < num_workers; n++) { + worker_strs[n] = makeStringInfo(); + } + } + switch (nodeTag(plan)) { case T_Result: @@ -1357,6 +1369,58 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es); } + /* Prepare worker general execution details */ + if (es->analyze && es->verbose && planstate->worker_instrument) + { + WorkerInstrumentation *w = planstate->worker_instrument; + int n; + + for (n = 0; n < w->num_workers; ++n) + { + Instrumentation *instrument = >instrument[n]; + double nloops = instrument->nloops; + double startup_ms; + double total_ms; + double rows; + + if (nloops <= 0) +continue; + startup_ms = 1000.0 * instrument->startup / nloops; + total_ms = 1000.0 * instrument->total / nloops; + rows = instrument->ntuples / nloops; + + ExplainOpenWorker(worker_strs[n], es); + + if (es->format == EXPLAIN_FORMAT_TEXT) + { +if (es->timing) + appendStringInfo(es->str, + "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n", + startup_ms, total_ms, rows, nloops); +else + appendStringInfo(es->str, + "actual rows=%.0f loops=%.0f\n", + rows, nloops); +if (es->buffers) + show_buffer_usage(es, >bufusage); + } + else + { +if (es->timing) +{ + ExplainPropertyFloat("Actual Startup Time", "ms", + startup_ms, 3, es); + ExplainPropertyFloat("Actual Total Time", "ms", + total_ms, 3, es); +} +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); 
+ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } + } + + ExplainCloseWorker(es); + } + switch (nodeTag(plan)) { case T_SeqScan: @@ -1856,7 +1920,7 @@ ExplainNode(PlanState *planstate, List *ancestors, break; case T_Sort: show_sort_keys(castNode(SortState, planstate), ancestors, es); - show_sort_info(castNode(SortState, planstate), es); + show_sort_info(castNode(SortState, planstate), worker_strs, es); break; case T_MergeAppend: show_merge_append_keys(castNode(MergeAppendState, planstate), @@ -1885,74 +1949,30 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->buffers && planstate->instrument) show_buffer_usage(es, >instrument->bufusage); - /* Show worker detail */ - if (es->analyze && es->verbose && !es->hide_workers && - planstate->worker_instrument) + /* Prepare worker buffer usage */ + if (es->buffers && es->analyze && es->verbose && !es->hide_workers + && planstate->worker_instrument && es->format != EXPLAIN_FORMAT_TEXT) { WorkerInstrumentation *w = planstate->worker_instrument; - bool opened_group = false; int n; for (n = 0; n < w->num_workers; ++n) { Instrumentation *instrument = >instrument[n]; double nloops = instrument->nloops; - double startup_ms; - double total_ms; - double rows; if (nloops <= 0) continue; - startup_ms = 1000.0 * instrument->startup / nloops; - total_ms = 1000.0 * instrument->total / nloops; - rows = instrument->ntuples / nloops; - - if
Re: Duplicate Workers entries in some EXPLAIN plans
Sounds good, I'll try that format. Any idea how to test YAML? With the JSON format, I was able to rely on Postgres' own JSON-manipulating functions to strip or canonicalize fields that can vary across executions--I can't really do that with YAML. Or should I run EXPLAIN with COSTS OFF, TIMING OFF, SUMMARY OFF and assume that for simple queries the BUFFERS output (and other fields I can't turn off like Sort Space Used) *is* going to be stable?
Re: Duplicate Workers entries in some EXPLAIN plans
Thanks for the review! I looked at and rebased the patch on current master, ac5bdf6. I introduced a new test file because this bug is specifically about EXPLAIN output (as opposed to query execution or planning functionality), and it didn't seem like a test would fit in any of the other files. I focused on testing just the behavior around this specific bug (and fix). I think eventually we should probably test other more fundamental EXPLAIN features (and I'm happy to contribute to that) in that file, but that seems outside of the scope of this patch. Any thoughts on what we should do with text mode output (which is untouched right now)? The output Andres proposed above makes sense to me, but I'd like to get more input. diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index d189b8d573..96a8973884 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr, Oid sortOperator, Oid collation, bool nullsFirst); static void show_tablesample(TableSampleClause *tsc, PlanState *planstate, List *ancestors, ExplainState *es); -static void show_sort_info(SortState *sortstate, ExplainState *es); +static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es); static void show_hash_info(HashState *hashstate, ExplainState *es); static void show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es); @@ -289,6 +289,7 @@ NewExplainState(void) es->costs = true; /* Prepare output buffer. 
*/ es->str = makeStringInfo(); + es->root_str = es->str; return es; } @@ -1090,9 +1091,20 @@ ExplainNode(PlanState *planstate, List *ancestors, const char *partialmode = NULL; const char *operation = NULL; const char *custom_name = NULL; + StringInfo *worker_strs = NULL; int save_indent = es->indent; bool haschildren; + /* Prepare per-worker output */ + if (es->analyze && planstate->worker_instrument) { + int num_workers = planstate->worker_instrument->num_workers; + int n; + worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo)); + for (n = 0; n < num_workers; n++) { + worker_strs[n] = makeStringInfo(); + } + } + switch (nodeTag(plan)) { case T_Result: @@ -1357,6 +1369,64 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es); } + /* Prepare worker general execution details */ + if (es->analyze && es->verbose && planstate->worker_instrument) + { + WorkerInstrumentation *w = planstate->worker_instrument; + int n; + + for (n = 0; n < w->num_workers; ++n) + { + Instrumentation *instrument = >instrument[n]; + double nloops = instrument->nloops; + double startup_ms; + double total_ms; + double rows; + + if (nloops <= 0) +continue; + startup_ms = 1000.0 * instrument->startup / nloops; + total_ms = 1000.0 * instrument->total / nloops; + rows = instrument->ntuples / nloops; + + if (es->format == EXPLAIN_FORMAT_TEXT) + { +appendStringInfoSpaces(es->str, es->indent * 2); +appendStringInfo(es->str, "Worker %d: ", n); +if (es->timing) + appendStringInfo(es->str, + "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n", + startup_ms, total_ms, rows, nloops); +else + appendStringInfo(es->str, + "actual rows=%.0f loops=%.0f\n", + rows, nloops); +es->indent++; +if (es->buffers) + show_buffer_usage(es, >bufusage); +es->indent--; + } + else + { +ExplainOpenWorker(worker_strs[n], es); + +if (es->timing) +{ + ExplainPropertyFloat("Actual Startup Time", "ms", + startup_ms, 3, es); + ExplainPropertyFloat("Actual 
Total Time", "ms", + total_ms, 3, es); +} +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } + } + + if (es->format != EXPLAIN_FORMAT_TEXT) { + ExplainCloseWorker(es); + } + } + switch (nodeTag(plan)) { case T_SeqScan: @@ -1856,7 +1926,7 @@ ExplainNode(PlanState *planstate, List *ancestors, break; case T_Sort: show_sort_keys(castNode(SortState, planstate), ancestors, es); - show_sort_info(castNode(SortState, planstate), es); + show_sort_info(castNode(SortState, planstate), worker_strs, es); break; case T_MergeAppend: show_merge_append_keys(castNode(MergeAppendState, planstate), @@ -1885,74 +1955,30 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->buffers && planstate->instrument) show_buffer_usage(es, >instrument->bufusage); - /* Show worker detail */ - if (es->analyze && es->verbose && !es->hide_workers && - planstate->worker_instrument) + /* Prepare worker buffer usage */ + if (es->buffers && es->analyze && es->verbose && !es->hide_workers + && planstate->worker_instrument && es->format != EXPLAIN_FORMAT_TEXT) {
Re: Duplicate Workers entries in some EXPLAIN plans
Done! Thanks!
Re: Duplicate Workers entries in some EXPLAIN plans
I wanted to follow up on this patch since I received no feedback. What should my next steps be (besides rebasing, though I want to confirm there's interest before I do that)?
Re: Duplicate Workers entries in some EXPLAIN plans
On Thu, Oct 24, 2019 at 6:48 PM Andres Freund wrote:
> Unfortunately I think the fix isn't all that trivial, due to the way we
> output the per-worker information at the end of ExplainNode(), by just
> dumping things into a string. It seems to me that a step in the right
> direction would be for ExplainNode() to create
> planstate->worker_instrument StringInfos, which can be handed to
> routines like show_sort_info(), which would print the per-node
> information into that, rather than directly dumping into
> es->output. Most of the current "Show worker detail" would be done
> earlier in ExplainNode(), at the place where we currently display the
> "actual rows" bit.
>
> ISTM that should include removing the duplication of the contents of
> show_sort_info(), and probably also for the Gather, GatherMerge blocks
> (I've apparently skipped adding the JIT information to the latter, not
> sure if we ought to fix that in the stable branches).
>
> Any chance you want to take a stab at that?

It took me a while, but I did take a stab at it (thanks for your off-list help). Attached is my patch that changes the structured formats to merge sort worker output in with costs/timing/buffers worker output. I have not touched any other worker output yet, since it's not under a Workers group as far as I can tell (so it does not exhibit the problem I originally reported). I can try to make further changes here if the approach is deemed sound.
I also have not touched text output; above you had proposed

> Sort Method: external merge Disk: 4920kB
> Buffers: shared hit=682 read=10188, temp read=1415 written=2101
> Worker 0: actual time=130.058..130.324 rows=1324 loops=1
>   Sort Method: external merge Disk: 5880kB
>   Buffers: shared hit=337 read=3489, temp read=505 written=739
> Worker 1: actual time=130.273..130.512 rows=1297 loops=1
>   Buffers: shared hit=345 read=3507, temp read=505 written=744
>   Sort Method: external merge Disk: 5920kB

which makes sense to me, but I'd like to confirm this is the approach we want before I add it to the patch. This is my first C in close to a decade (and I was never much of a C programmer to begin with), so please be gentle. As Andres suggested off-list, I also changed the worker output to order fields that also occur in the parent node in the same way as the parent node.

I've also added a test for the patch, and because this is really an EXPLAIN issue rather than a query feature issue, I added src/test/regress/sql/explain.sql for the test. I added a couple of utility functions for munging json-formatted EXPLAIN plans into something we can repeatably verify in regression tests (the functions use json rather than jsonb to preserve field order). I have not added this for YAML or XML (even though they should behave the same way), since I'm not familiar with the functions to manipulate those data types in a similar way (if they exist). My hunch is that, given the similarity of the structured formats, just testing JSON is enough, but I can expand/adjust tests as necessary.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 62fb3434a3..8f898fc20c 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr, Oid sortOperator, Oid collation, bool nullsFirst); static void show_tablesample(TableSampleClause *tsc, PlanState *planstate, List *ancestors, ExplainState *es); -static void show_sort_info(SortState *sortstate, ExplainState *es); +static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es); static void show_hash_info(HashState *hashstate, ExplainState *es); static void show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es); @@ -290,6 +290,7 @@ NewExplainState(void) es->costs = true; /* Prepare output buffer. */ es->str = makeStringInfo(); + es->root_str = es->str; return es; } @@ -1073,9 +1074,20 @@ ExplainNode(PlanState *planstate, List *ancestors, const char *partialmode = NULL; const char *operation = NULL; const char *custom_name = NULL; + StringInfo *worker_strs = NULL; int save_indent = es->indent; bool haschildren; + /* Prepare per-worker output */ + if (es->analyze && planstate->worker_instrument) { + int num_workers = planstate->worker_instrument->num_workers; + int n; + worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo)); + for (n = 0; n < num_workers; n++) { + worker_strs[n] = makeStringInfo(); + } + } + switch (nodeTag(plan)) { case T_Result: @@ -1340,6 +1352,64 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es); } + /* Prepare worker general execution details */ + if (es->analyze && es->verbose && planstate->worker_instrument) + { + WorkerInstrumentation *w = planstate->worker_instrument; + int n; + + for (n = 0;
Re: JIT performance bug/regression & JIT EXPLAIN
On Fri, Nov 15, 2019 at 5:49 AM Robert Haas wrote:

> Personally, I don't care very much about backward-compatibility, or
> about how hard it is for tools to parse. I want it to be possible, but
> if it takes a little extra effort, so be it.

I think these are two separate issues. I agree on backward-compatibility (especially if we can embed a server version in structured EXPLAIN output to make it easier for tools to track format differences), but not caring how hard it is for tools to parse? What's the point of structured formats, then?

> My main concern is having
> the text output look good to human beings, because that is the primary
> format and they are the primary consumers.

Structured output is also for human beings, albeit indirectly. The fact that text is the primary format may be more a reflection of the difficulty of building and integrating EXPLAIN tools than of its inherent superiority (that said, I'll concede it's a concise and elegant format for what it does). What if psql supported an EXPLAINER, like it does EDITOR?

For what it's worth, after thinking about this a bit, I'd like to see structured EXPLAIN evolve into a more consistent format, even if it means breaking changes (and I do think a version specifier at the root of the plan would make this easier).
Re: JIT performance bug/regression & JIT EXPLAIN
On Mon, Oct 28, 2019 at 5:02 PM Andres Freund wrote:

> What I dislike about that is that it basically again is introducing

"again"? Am I missing some history here? I'd love to read up on this if there are mistakes to learn from.

> something that requires either pattern matching on key names (i.e. a key
> of '(.*) JIT' is one that has information about JIT, and the associated
> expresssion is in key $1), or knowing all the potential keys an
> expression could be in.

That still seems less awkward than having to handle a Filter field that's either a scalar or a group. Most current EXPLAIN options just add additional fields to the structured plan instead of modifying it, no? If that output is better enough, though, maybe we should just always make Filter a group and accept the breaking change? If tooling authors need to treat this case specially anyway, we might as well evolve the format.

> Another alternative would be to just remove the 'Output' line when a
> node doesn't project - it can't really carry meaning in those cases
> anyway?

¯\_(ツ)_/¯ For what it's worth, I certainly wouldn't miss it.
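To make the scalar-or-group objection concrete, here is a hedged sketch of what a consuming tool would have to do if "Filter" could be either a plain string or a group depending on options. The nested key names ("Expr", "JIT Expr") are hypothetical, not anything the server emits:

```python
import json

def filter_expression(node):
    """Return the filter expression whether "Filter" is a plain string
    or a group with the expression nested inside (hypothetical shape)."""
    f = node.get("Filter")
    if isinstance(f, dict):
        # Group shape: every consumer must know the nested key name.
        return f.get("Expr")
    # Scalar shape: the value is the expression itself.
    return f

scalar = json.loads('{"Filter": "(a > 1)"}')
grouped = json.loads('{"Filter": {"Expr": "(a > 1)", "JIT Expr": "evalexpr_0_2"}}')
print(filter_expression(scalar), filter_expression(grouped))
```

Every tool that reads plans would need a branch like this for every field whose shape can change, which is the awkwardness being objected to.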
Re: JIT performance bug/regression & JIT EXPLAIN
> But that's pretty crappy, because it means that the 'shape' of the output
> depends on the jit_details option.

Yeah, that would be hard to work with. What about adding it as a sibling group?

  "Filter": "(lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp without time zone)",
  "Filter JIT": {
    "Expr": "evalexpr_0_2",
    "Deform Scan": "deform_0_3",
    "Deform Outer": null,
    "Deform Inner": null
  }

Also not that pretty, but at least it's easier to work with (I also changed the dashes to spaces, since that's what the rest of EXPLAIN does as a matter of style).

> But the compat break due to that change is not small- perhaps we could instead
> mark that in another way?

We could add a "Projects" boolean key instead? Of course that's more awkward in text mode. Maybe the compat break is less of an issue in text mode, and we can treat this differently?

> I'm not sure that 'TRANS' is the best placeholder for the transition value
> here. Maybe $TRANS would be clearer?

+1, I think the `$` makes it clearer that this is not a literal expression.

> For HashJoin/Hash I've added 'Outer Hash Key' and 'Hash Key' for each key, but
> only in verbose mode.

That reads pretty well to me. What does the structured output look like?
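With the sibling-group shape, a consumer never has to branch on the type of "Filter": the expression stays a plain string, and the JIT details hang off a predictable sibling key. A quick sketch (assuming the key names proposed above):

```python
import json

node = json.loads("""
{
  "Filter": "(lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp without time zone)",
  "Filter JIT": {"Expr": "evalexpr_0_2", "Deform Scan": "deform_0_3",
                 "Deform Outer": null, "Deform Inner": null}
}
""")

# "Filter" is always a plain string, with or without JIT details...
expr = node["Filter"]
# ...and the JIT details, when present, live under a sibling key;
# tools that don't care about JIT can simply ignore it.
jit = node.get("Filter JIT", {})
print(expr)
print(jit.get("Expr"))
```

Existing consumers keep working unchanged, and JIT-aware consumers just look for the extra key.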
Duplicate Workers entries in some EXPLAIN plans
Hello,

I originally reported this in pgsql-bugs [0], but there wasn't much feedback there, so I'm moving the discussion here.

When using JSON, YAML, or XML-format EXPLAIN on a plan that uses a parallelized sort, the Sort nodes list two different entries for "Workers": one for the sort-related info and one for general worker info. This is what it looks like in JSON (some details elided):

{
  "Node Type": "Sort",
  ...
  "Workers": [
    {
      "Worker Number": 0,
      "Sort Method": "external merge",
      "Sort Space Used": 20128,
      "Sort Space Type": "Disk"
    },
    {
      "Worker Number": 1,
      "Sort Method": "external merge",
      "Sort Space Used": 20128,
      "Sort Space Type": "Disk"
    }
  ],
  ...
  "Workers": [
    {
      "Worker Number": 0,
      "Actual Startup Time": 309.726,
      "Actual Total Time": 310.179,
      "Actual Rows": 4128,
      "Actual Loops": 1,
      "Shared Hit Blocks": 2872,
      "Shared Read Blocks": 7584,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 490,
      "Temp Written Blocks": 2529
    },
    {
      "Worker Number": 1,
      "Actual Startup Time": 306.523,
      "Actual Total Time": 307.001,
      "Actual Rows": 4128,
      "Actual Loops": 1,
      "Shared Hit Blocks": 3356,
      "Shared Read Blocks": 7100,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 490,
      "Temp Written Blocks": 2529
    }
  ],
  "Plans": ...
}

This is technically valid JSON, but it's extremely difficult to work with: default JSON parsing in Ruby, Node, Python, Go, and even Postgres' own jsonb keeps only the last value for a duplicated key, so the sort information is discarded (other languages probably don't fare much better; these are just the ones I had on hand).

As Tom Lane pointed out in my pgsql-bugs thread, this has been reported before [1], and in that earlier thread, Andrew Dunstan suggested that perhaps the simplest solution is to just rename the sort-related Workers node.
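To illustrate the duplicate-key problem (Python shown here; Ruby, Node, and Go behave similarly): the standard json module keeps only the last value for a repeated object key, so the sort details vanish without any parse error:

```python
import json

# A pared-down version of the EXPLAIN output above, with the
# duplicated "Workers" key. This parses without complaint...
doc = """
{
  "Node Type": "Sort",
  "Workers": [{"Worker Number": 0, "Sort Method": "external merge"}],
  "Workers": [{"Worker Number": 0, "Actual Rows": 4128}]
}
"""
plan = json.loads(doc)

# ...but only the second "Workers" list survives; the sort info is gone.
print(plan["Workers"])
```

A tool author gets no error and no warning, just silently incomplete data, which is why so few consumers are likely handling this correctly.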
Tom expressed some concerns about a breaking change here, though I think the current behavior means vanishingly few users are parsing this data correctly today. Thoughts?

Thanks,
Maciek

[0]: https://www.postgresql.org/message-id/CADXhmgSr807j2Pc9aUjW2JOzOBe3FeYnQBe_f9U%2B-Mm4b1HRUw%40mail.gmail.com
[1]: https://www.postgresql.org/message-id/flat/41ee53a5-a36e-cc8f-1bee-63f6565bb...@dalibo.com