Re: commitfest.postgresql.org is no longer fit for purpose
On Sun, May 19, 2024 at 10:50 PM Thomas Munro wrote:
> Sometimes I question the sanity of the whole thing. Considering
> cfbot's original "zero-effort CI" goal (or I guess "zero-extra-effort"
> would be better), I was curious about what other projects had the same
> idea, or whether we're really just starting at the "wrong end", and
> came up with:
>
> https://github.com/getpatchwork/patchwork
> http://vger.kernel.org/bpfconf2022_material/lsfmmbpf2022-bpf-ci.pdf <-- example user
> https://github.com/patchew-project/patchew
>
> Actually cfbot requires more effort than those, because it's driven
> first by Commitfest app registration. Those projects are extremists
> IIUC: just write to a mailing list, no other bureaucracy at all (at
> least for most participants, presumably administrators can adjust the
> status in some database when things go wrong?). We're actually
> halfway to Gitlab et al already, with a web account and interaction
> required to start the process of submitting a patch for consideration.
> What I'm less clear on is who else has come up with the "bitrot" test
> idea, either at the mailing list or web extremist ends of the scale.
> Those are also generic tools, and cfbot obviously knows lots of things
> about PostgreSQL, like the "highlights" and probably more things I'm
> forgetting.

For what it's worth, a few years before cfbot, I had privately attempted a similar idea for Postgres [1]. The project is basically a very simple API and infrastructure for running builds and make check. A previous version [2] subscribed to the mailing lists and used Travis CI (and accidentally spammed some Postgres committers [3]). The project petered out as my work responsibilities shifted (and, to be honest, after I felt sheepish about the spamming). I think cfbot is way, way ahead of where my project got at this point.
But since you asked about other similar projects, I'm happy to discuss further if it's helpful to bounce ideas off someone who's thought about the same problem (though not for a while now, I admit).

Thanks,
Maciek

[1]: https://github.com/msakrejda/pg-quilter
[2]: https://github.com/msakrejda/pg-quilter/blob/2038d9493f9aa7d43d3eb0aec1d299b94624602e/lib/pg-quilter/git_harness.rb
[3]: https://www.postgresql.org/message-id/flat/CAM3SWZQboGoVYAJNoPMx%3DuDLE%2BZh5k2MQa4dWk91YPGDxuY-gQ%40mail.gmail.com#e24bf57b77cfb6c440c999c018c46e92
Re: commitfest.postgresql.org is no longer fit for purpose
Thanks for raising this. As someone who is only modestly familiar with Postgres internals or even C, but would still like to contribute through review, I find the current process of finding a suitable patch both tedious and daunting. The Reviewing a Patch article on the wiki [0] says reviews like mine are still welcome, but it's hard to get started. I'd love to see this be more approachable.

Thanks,
Maciek

[0]: https://wiki.postgresql.org/wiki/Reviewing_a_Patch
Re: Possibility to disable `ALTER SYSTEM`
On Wed, Mar 27, 2024, 11:46 Robert Haas wrote:
> On Wed, Mar 27, 2024 at 1:12 PM Isaac Morland wrote:
> > On Wed, 27 Mar 2024 at 13:05, Greg Sabino Mullane wrote:
> > > > The purpose of the setting is to prevent accidental modifications
> > > > via ALTER SYSTEM in environments where
> > >
> > > The emphasis on 'accidental' seems a bit heavy here, and odd. Surely,
> > > just "to prevent modifications via ALTER SYSTEM in environments
> > > where..." is enough?
> >
> > Not necessarily disagreeing, but it's very important nobody ever mistake
> > this for a security feature. I don't know if the extra word "accidental"
> > is necessary, but I think that's the motivation.
>
> I think the emphasis is entirely warranted in this case.

+1. And while "non-malicious" may technically be more correct, I don't think it's any clearer.
Re: Possibility to disable `ALTER SYSTEM`
On Mon, Mar 18, 2024 at 10:27 AM Robert Haas wrote:
> Right, we're adding this because of environments like Kubernetes,
> which isn't a version, but it is a platform, or at least a deployment
> mode, which is why I thought of that section. I think for now we
> should just file this under "Other platforms and clients," which only
> has one existing setting. If the number of settings of this type
> grows, we can split it out.

Fair enough, +1.

> Using enable_* as code for "this is a planner GUC" is a pretty stupid
> pattern, honestly, but I agree with you that it's long-established and
> we probably shouldn't deviate from it lightly. Perhaps just rename to
> allow_alter_system?

+1
Re: Possibility to disable `ALTER SYSTEM`
On Mon, Mar 18, 2024 at 7:12 AM Jelte Fennema-Nio wrote:
> On Mon, 18 Mar 2024 at 13:57, Robert Haas wrote:
> > I would have been somewhat inclined to find an existing section
> > of postgresql.auto.conf for this parameter, perhaps "platform and
> > version compatibility".
>
> I tried to find an existing section, but I couldn't find any that this
> new GUC would fit into naturally. "Version and Platform Compatibility
> / Previous PostgreSQL Versions" (the one you suggested) seems wrong
> too. The GUCs there are to get back to Postgres behaviour from
> previous versions. So that section would only make sense if we'd turn
> enable_alter_system off by default (which obviously no-one in this
> thread suggests/wants).
>
> If you have another suggestion for an existing category that we should
> use, feel free to share. But imho, none of the existing ones are a
> good fit.

+1 on Version and Platform Compatibility. Maybe it just needs a new subsection there? This is for compatibility with a "deployment platform". The "Platform and Client Compatibility" subsection has just one entry, so a new subsection with also just one entry seems defensible — maybe just "Deployment Compatibility"? I think it's also plausible that there will be other similar settings for managed deployments in the future.

> > Even if that is what we're going to do, do we want to call them "guard
> > rails"? I'm not sure I'd find that name terribly clear, as a user.
>
> If anyone has a better suggestion, I'm happy to change it.

No better suggestion at the moment, but while I used the term to explain the feature, I also don't think it's a great official name. For one thing, the section could easily be misinterpreted as guard rails for end users who are new to Postgres. Also, I think it's more colloquial in tone than Postgres docs conventions allow. Further, I think we may want to change the GUC name itself.
All the other GUCs that start with enable_ control planner behavior:

maciek=# select name from pg_settings where name like 'enable_%';
              name
--------------------------------
 enable_async_append
 enable_bitmapscan
 enable_gathermerge
 enable_hashagg
 enable_hashjoin
 enable_incremental_sort
 enable_indexonlyscan
 enable_indexscan
 enable_material
 enable_memoize
 enable_mergejoin
 enable_nestloop
 enable_parallel_append
 enable_parallel_hash
 enable_partition_pruning
 enable_partitionwise_aggregate
 enable_partitionwise_join
 enable_presorted_aggregate
 enable_seqscan
 enable_sort
 enable_tidscan
(21 rows)

Do we really want to break that pattern?
Re: Possibility to disable `ALTER SYSTEM`
On Thu, Mar 14, 2024 at 1:38 PM Robert Haas wrote:
> On Thu, Mar 14, 2024 at 4:08 PM Tom Lane wrote:
> > The patch-of-record contains no such wording.
>
> I plan to fix that, if nobody else beats me to it.
>
> > And if this isn't a
> > security feature, then what is it? If you have to say to your
> > (super) users "please don't mess with the system configuration",
> > you might as well just trust them not to do it the easy way as not
> > to do it the hard way. If they're untrustworthy, why have they
> > got superuser?
>
> I mean, I feel like this question has been asked and answered before,
> multiple times, on this thread. If you sincerely don't understand the
> use case, I can try again to explain it. But somehow I feel like it's
> more that you just don't like the idea, which is fair, but it seems
> like a considerable number of people feel otherwise.

I know I'm jumping into a long thread here, but I've been following it out of interest. I'm sympathetic to the use case: I used to work at a Postgres cloud provider, and while our system intentionally did not give our end users superuser privileges, I can imagine other managed environments where that's not an issue.

I'd like to give answering this question another shot, because I think this has been a persistent misunderstanding in this thread, and I don't think the answer has been made all that clear: it's not a security feature; it's a usability feature.

It's a usability feature because, when Postgres configuration is managed by an outside mechanism (e.g., as in a Kubernetes environment), ALTER SYSTEM currently allows a superuser to make changes that appear to work, but may be discarded at some point in the future when that outside mechanism updates the config. They may also be represented incorrectly in a management dashboard if that dashboard is based on the values in the outside configuration mechanism, rather than on values directly from Postgres.
In this case, the end user with access to Postgres superuser privileges presumably also has access to the outside configuration mechanism. The goal is not to prevent them from changing settings, but to offer guard rails that prevent them from changing settings in a way that will be unstable (revertible by a future update) or confusing (not showing up in a management UI).

There are challenges here in making sure this is _not_ seen as a security feature. But I do think the feature itself is sensible and worthwhile.

Thanks,
Maciek
Re: Printing backtrace of postgres processes
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

I'm not sure if this actually still needs review, but it's marked as such in the CF app, so I'm reviewing it in the hopes of moving it along.

The feature works as documented. The docs say "This function is supported only if PostgreSQL was built with the ability to capture backtraces, otherwise it will emit a warning." I'm not sure what building with the ability to capture backtraces entails, but it worked with no special config on my machine. I don't have much C experience, so I don't know if this is something that should have more context in a README somewhere, or if it's likely someone who's interested in this will already know what to do.

The code looks fine to me.

I tried running make installcheck-world, but it failed on 17 tests. However, master also fails here on 17 tests. A normal make check-world passes on both branches. I assume I'm doing something wrong and would appreciate any pointers [0].

Based on my review and Daniel's comment above, I'm marking this as Ready for Committer.

Thanks,
Maciek

[0] My regression.diffs has errors like

```
diff -U3 /home/maciek/code/aux/postgres/src/test/regress/expected/copyselect.out /home/maciek/code/aux/postgres/src/test/regress/results/copyselect.out
--- /home/maciek/code/aux/postgres/src/test/regress/expected/copyselect.out	2023-01-02 12:21:10.792646101 -0800
+++ /home/maciek/code/aux/postgres/src/test/regress/results/copyselect.out	2024-01-14 15:04:07.513887866 -0800
@@ -131,11 +131,6 @@
 2
  ?column? 
 ----------
-        3
-(1 row)
-
- ?column? 
-----------
         4
 (1 row)
```

and

```
diff -U3 /home/maciek/code/aux/postgres/src/test/regress/expected/create_table.out /home/maciek/code/aux/postgres/src/test/regress/results/create_table.out
--- /home/maciek/code/aux/postgres/src/test/regress/expected/create_table.out	2023-10-02 22:14:02.583377845 -0700
+++ /home/maciek/code/aux/postgres/src/test/regress/results/create_table.out	2024-01-14 15:04:09.037890710 -0800
@@ -854,8 +854,6 @@
  b      | integer |           | not null | 1       | plain   |              | 
 Partition of: parted FOR VALUES IN ('b')
 Partition constraint: ((a IS NOT NULL) AND (a = 'b'::text))
-Not-null constraints:
-    "part_b_b_not_null" NOT NULL "b" (local, inherited)
 
 -- Both partition bound and partition key in describe output
 \d+ part_c
```

I'm on Ubuntu 22.04 with Postgres 11, 12, 13, and 16 installed from PGDG.

The new status of this patch is: Ready for Committer
Re: Pdadmin open on Macbook issue
This is not the right mailing list for your question. Try the pgadmin-support [1] mailing list. You may also want to include more details in your question, because it's not really possible to tell what's going wrong from your description. [1]: https://www.postgresql.org/list/pgadmin-support/
Re: Proposal: In-flight explain logging
Have you seen the other recent patch regarding this? [1] The mailing list thread was active pretty recently. The submission is marked as Needs Review. I haven't looked at either patch, but the proposals are very similar as I understand it. [1]: https://commitfest.postgresql.org/45/4345/
Re: Emitting JSON to file using COPY TO
On Fri, Dec 1, 2023 at 11:32 AM Joe Conway wrote:
> 1. Is supporting JSON array format sufficient, or does it need to
> support some other options? How flexible does the support scheme need to be?

"JSON Lines" is a semi-standard format [1] that's basically just newline-separated JSON values. (In fact, this is what log_destination=jsonlog gives you for Postgres logs, no?) It might be worthwhile to support that, too.

[1]: https://jsonlines.org/
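To make the difference concrete, here's a quick sketch in plain Python (the rows are made up, and this is not tied to the COPY implementation at all) of the same rows serialized as a single JSON array versus as JSON Lines:

```python
import json

rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# JSON array format: one document enclosing all rows
array_output = json.dumps(rows)
print(array_output)
# [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# JSON Lines: one JSON value per line, no enclosing array or commas,
# so a consumer can stream rows without parsing the whole document
lines_output = "\n".join(json.dumps(row) for row in rows)
print(lines_output)
# {"id": 1, "name": "alice"}
# {"id": 2, "name": "bob"}
```

The streaming-friendliness is the main argument for supporting the lines format: each line is independently parseable, the same way each jsonlog log line is.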
Re: run pgindent on a regular basis / scripted manner
On Tue, Oct 17, 2023, 09:22 Robert Haas wrote:
> On Tue, Oct 17, 2023 at 12:18 PM Tom Lane wrote:
> > Robert Haas writes:
> > Is that actually possible? I had the idea that "git push" is an
> > atomic operation, ie 100% or nothing. Is it only atomic per-branch?
>
> I believe so.

Git push does have an --atomic flag to treat the entire push as a single operation.
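A small self-contained demonstration (using throwaway repos under /tmp; the paths are made up for illustration) — with --atomic, if the server rejects any one ref update, none of the refs are updated:

```shell
set -e
demo=/tmp/git-atomic-demo
rm -rf "$demo" && mkdir -p "$demo"

git init -q --bare "$demo/origin.git"
git clone -q "$demo/origin.git" "$demo/work" 2>/dev/null
cd "$demo/work"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git branch b1
git branch b2

# Push both branches as one all-or-nothing transaction:
git push -q --atomic origin b1 b2

git ls-remote --heads origin   # shows refs/heads/b1 and refs/heads/b2
```

Without --atomic, a push of multiple refs can partially succeed, which is presumably the failure mode Tom was asking about.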
Re: Differences between = ANY and IN?
Great, thanks for the guidance!
Re: pg_stat_statements and "IN" conditions
I've also tried the patch and I see the same results as Jakub, which make sense to me. I did have issues getting it to apply, though: `git am` complains about a conflict, though `patch` itself was able to apply it.
Differences between = ANY and IN?
Hello,

My colleague has been working on submitting a patch [1] to the Ruby on Rails framework to address some of the problems discussed in [2]. Regardless of whether that change lands, the change in Rails would be useful since people will be running Postgres versions without this patch for a while.

My colleague's patch changes the SQL generated from Ruby expressions like `where(id: [1, 2])`. This is currently translated to roughly `WHERE id IN (1, 2)` and would be changed to `WHERE id = ANY('{1,2}')`. As far as we know, the expressions are equivalent, but we wanted to double-check: are there any edge cases to consider here (other than the pg_stat_statements behavior, of course)?

Thanks,
Maciek

[1]: https://github.com/rails/rails/pull/49388
[2]: https://www.postgresql.org/message-id/flat/20230209172651.cfgrebpyyr72h7fv%40alvherre.pgsql#eef3c77bc28b9922ea6b9660b0221b5d
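For reference, against a hypothetical table t, the two generated forms look like this (as far as I know, Postgres internally normalizes a constant IN list to an = ANY over an array anyway, which is part of why we expect them to be equivalent):

```sql
-- Current Rails translation: one literal per list element
SELECT * FROM t WHERE id IN (1, 2);

-- Proposed translation: a single array literal
SELECT * FROM t WHERE id = ANY('{1,2}');

-- The array form can also take a single bind parameter, so the SQL text
-- stays the same no matter how many values are in the list:
-- SELECT * FROM t WHERE id = ANY($1);
```

The stable query text regardless of list length is what makes the array form friendlier to pg_stat_statements in unpatched versions.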
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On Tue, Sep 19, 2023, 20:23 Maciek Sakrejda wrote:
> I wonder if something like CURRENT (i.e., the search path at function
> creation time) might be a useful keyword addition. I can see some uses
> (more forgiving than SYSTEM but not as loose as SESSION), but I don't
> know if it would justify its presence.

I realize now this is exactly what SET search_path FROM CURRENT does. Sorry for the noise.

Regarding extensions installed in the public schema throwing a wrench in the works: is that still a problem if the public schema is not writable? I know that that's a new default, but it won't be forever.
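For anyone following along, a minimal example of the existing behavior I was describing (the schema and function names here are made up):

```sql
SET search_path = myapp, public;

-- FROM CURRENT captures the creation-time search_path and attaches it to
-- the function, so later calls don't depend on the caller's setting:
CREATE FUNCTION myapp.widget_count() RETURNS bigint
    LANGUAGE sql
    SET search_path FROM CURRENT
    AS 'SELECT count(*) FROM widgets';

-- \df+ (or pg_proc.proconfig) now shows: search_path=myapp, public
```

So the CURRENT semantics I was imagining already exist; they're just spelled as a per-function SET clause rather than a keyword.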
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis wrote:
>...
> On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
> > That leads to a second idea, which is having it continue
> > to be a GUC but only affect directly-entered SQL, with all
> > indirectly-entered SQL either being stored as a node tree or having a
> > search_path property attached somewhere.
>
> That's not too far from the proposed patch and I'd certainly be
> interested to hear more and/or adapt my patch towards this idea.

As an interested bystander, that's the same thing I was thinking when reading this. I reread your original e-mail, Jeff, and I still think that.

I wonder if something like CURRENT (i.e., the search path at function creation time) might be a useful keyword addition. I can see some uses (more forgiving than SYSTEM but not as loose as SESSION), but I don't know if it would justify its presence.

Thanks for working on this.

Thanks,
Maciek
Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
On Mon, May 1, 2023, 18:08 Robert Haas wrote:
> I am saying that, while wraparound is perhaps not a perfect term
> for what's happening, it is not, in my opinion, a bad term either.

I don't want to put words into Peter's mouth, but I think he's arguing that the term "wraparound" suggests that there is something special about the transition between xid 2^32 and xid 0 (or, well, 3). There isn't. There's only something special about the transition, as your current xid advances, between the xid that's half the xid space ahead of your current xid and the xid that's half the xid space behind the current xid, if the latter is not frozen. I don't think that's what most users think of when they hear "wraparound".
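To make that circularity concrete, here's a rough Python sketch of the modular-arithmetic comparison the server uses (modeled on TransactionIdPrecedes; this deliberately ignores the permanent special xids below FirstNormalTransactionId):

```python
XID_MASK = 0xFFFFFFFF  # xids are 32 bits

def xid_precedes(a: int, b: int) -> bool:
    """True if xid a logically precedes xid b.

    Mirrors the server's signed-difference trick: a precedes b when the
    32-bit difference (a - b), reinterpreted as signed, is negative --
    i.e., when b falls in the half of the xid space "ahead of" a.
    """
    diff = (a - b) & XID_MASK
    return diff >= 0x80000000  # signed 32-bit (a - b) < 0

# Nothing special happens at the 2^32 -> 0 boundary itself: an xid just
# before the boundary simply precedes one just after it.
print(xid_precedes(0xFFFFFFF0, 0x10))  # True
```

The only "special" transition is the one half the xid space away from wherever you currently are, which is why the boundary-crossing mental image of "wraparound" can mislead.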
Re: doc: add missing "id" attributes to extension packaging page
For what it's worth, I think having the anchors be always-visible when CSS is disabled is a feature. The content is still perfectly readable, and the core feature from this patch is available. Introducing JavaScript to lose that functionality seems like a step backwards.

By the way, the latest patch attachment was not the full patch series, which I think confused cfbot [1] (unless I'm misunderstanding the state of the patch series).

And thanks for working on this. I've hunted in the page source for ids to link to a number of times. I look forward to not doing that anymore.

Thanks,
Maciek

[1]: https://commitfest.postgresql.org/42/4042/
Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)
On Thu, Feb 23, 2023, 09:55 Nikolay Samokhvalov wrote:
> On Thu, Feb 23, 2023 at 9:05 AM Maciek Sakrejda wrote:
> > I think Heikki's solution is probably more practical since (1) ..
>
> Note that these ideas target two *different* problems:
> - what was the duration of the last query
> - when was the last query executed
>
> So, having both solved would be ideal.

Fair point, but since the duration solution needs to capture two timestamps anyway, it could print the start time as well as the duration. The prompt timestamp could still be handy for more intricate session forensics, but I don't know if that's a common-enough use case.

Thanks,
Maciek
Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)
+1 on solving the general problem of "I forgot to set \timing--how long did this run?". I could have used that more than once in the past, and I'm sure it will come up again.

I think Heikki's solution is probably more practical since (1) even if we add the prompt parameter originally proposed, I don't see it being included in the default, so it would require users to change their prompt before they can benefit from it and (2) even if we commit to never allowing tweaks to the format, I foresee a slow but endless trickle of requests and patches to do so.

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Feb 14, 2023 at 11:08 AM Andres Freund wrote:
> One thing I started to wonder about since is whether we should remove the io_
> prefix from io_object, io_context. The prefixes make sense on the C level, but
> it's not clear to me that that's also the case on the table level.

Yeah, +1. It's hard to argue that there would be any confusion, considering `io_` is in the name of the view. (Unless, I suppose, some other, non-I/O, "some_object" or "some_context" column were to be introduced to this view in the future. But that doesn't seem likely?)
Re: ANY_VALUE aggregate
I could have used such an aggregate in the past, so +1.

This is maybe getting into nit-picking, but perhaps it should be documented as returning an "arbitrary" value instead of a "non-deterministic" one? Technically the value is deterministic: there's a concrete algorithm specifying how it's selected. However, the algorithm is reserved as an implementation detail, since the function is designed for cases in which the caller should not care which value is returned.

Thanks,
Maciek
Re: Something is wrong with wal_compression
On Fri, Jan 27, 2023, 18:58 Andres Freund wrote:
> Hi,
>
> On 2023-01-27 16:15:08 +1300, Thomas Munro wrote:
> > It would be pg_current_xact_id() that would have to pay the cost of
> > the WAL flush, not pg_xact_status() itself, but yeah that's what the
> > patch does (with some optimisations). I guess one question is whether
> > there are any other reasonable real world uses of
> > pg_current_xact_id(), other than the original goal[1].
>
> txid_current() is a lot older than pg_current_xact_id(), and they're backed by
> the same code afaict. 8.4 I think.
>
> Unfortunately txid_current() is used in plenty monitoring setups IME.
>
> I don't think it's a good idea to make a function that was quite cheap for 15
> years, suddenly be several orders of magnitude more expensive...

As someone working on a monitoring tool that uses it (well, both), +1. We'd have to rethink a few things if this becomes a performance concern.

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Jan 17, 2023 at 9:22 AM Melanie Plageman wrote:
> On Mon, Jan 16, 2023 at 4:42 PM Maciek Sakrejda wrote:
> > I missed a couple of versions, but I think the docs are clearer now.
> > I'm torn on losing some of the detail, but overall I do think it's a
> > good trade-off. Moving some details out to after the table does keep
> > the bulk of the view documentation more readable, and the "inform
> > database tuning" part is great. I really like the idea of a separate
> > Interpreting Statistics section, but for now this works.
> >
> > >+        vacuum: I/O operations performed outside of shared
> > >+        buffers while vacuuming and analyzing permanent relations.
> >
> > Why only permanent relations? Are temporary relations treated
> > differently? I imagine if someone has a temp-table-heavy workload that
> > requires regularly vacuuming and analyzing those relations, this point
> > may be confusing without some additional explanation.
>
> Ah, yes. This is a bit confusing. We don't use buffer access strategies
> when operating on temp relations, so vacuuming them is counted in IO
> Context normal. I've added this information to the docs but now that
> definition is a bit long. Perhaps it should be a note? That seems like
> it would draw too much attention to this detail, though...

Thanks for clarifying. I think the updated definition still works: it's still shorter than the `normal` context definition.
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Fri, Jan 13, 2023 at 10:38 AM Melanie Plageman wrote:
>
> Attached is v47.

I missed a couple of versions, but I think the docs are clearer now. I'm torn on losing some of the detail, but overall I do think it's a good trade-off. Moving some details out to after the table does keep the bulk of the view documentation more readable, and the "inform database tuning" part is great. I really like the idea of a separate Interpreting Statistics section, but for now this works.

>+        vacuum: I/O operations performed outside of shared
>+        buffers while vacuuming and analyzing permanent relations.

Why only permanent relations? Are temporary relations treated differently? I imagine if someone has a temp-table-heavy workload that requires regularly vacuuming and analyzing those relations, this point may be confusing without some additional explanation.

Other than that, this looks great.

Thanks,
Maciek
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
On Fri, Jul 15, 2022 at 11:21 AM Maciek Sakrejda wrote:
> On Fri, Jul 1, 2022 at 10:26 AM Andres Freund wrote:
> > On 2022-07-01 01:23:01 -0700, Lukas Fittl wrote:
> >...
> > > Known WIP problems with this patch version:
> > >
> > > * There appears to be a timing discrepancy I haven't yet worked out, where
> > >   the \timing data reported by psql doesn't match what EXPLAIN ANALYZE is
> > >   reporting. With Andres' earlier test case, I'm seeing a consistent ~700ms
> > >   higher for \timing than for the EXPLAIN ANALYZE time reported on the
> > >   server side, only when rdtsc measurement is used -- its likely there is
> > >   a problem somewhere with how we perform the cycles to time conversion
> >
> > Could you explain a bit more what you're seeing? I just tested your patches
> > and didn't see that here.
>
> I did not see this either, but I did see that the execution time
> reported by \timing is (for this test case) consistently 0.5-1ms
> *lower* than the Execution Time reported by EXPLAIN. I did not see
> that on master. Is that expected?

For what it's worth, I can no longer reproduce this. In fact, I went back to master as of around then and applied Lukas' v2 patches again, and I still can't reproduce it. I do remember it happening consistently across several executions, but now \timing consistently shows 0.5-1ms slower, as expected. This does not explain the different timing issue Lukas was seeing in his tests, but I think we can assume what I reported originally here is not an issue.
Re: GetNewObjectId question
Sure. My C is pretty limited, but I think it's just the attached? I patterned the usage on the way this is done in CreateRole. It passes check-world here.

From c7cae5d3e8d179505f26851f1241436a3748f9a8 Mon Sep 17 00:00:00 2001
From: Maciek Sakrejda
Date: Sat, 10 Dec 2022 22:51:02 -0800
Subject: [PATCH] Replace GetNewObjectId call with GetNewOidWithIndex

GetNewObjectId is not intended to be used directly in most situations,
since it can lead to duplicate OIDs unless that is guarded against.
---
 src/backend/commands/user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 8b6543edee..0686807fa0 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -1850,7 +1850,8 @@ AddRoleMems(const char *rolename, Oid roleid,
 	}
 
 	/* get an OID for the new row and insert it */
-	objectId = GetNewObjectId();
+	objectId = GetNewOidWithIndex(pg_authmem_rel, AuthMemOidIndexId,
+								  Anum_pg_auth_members_oid);
 	new_record[Anum_pg_auth_members_oid - 1] = objectId;
 	tuple = heap_form_tuple(pg_authmem_dsc, new_record, new_record_nulls);
-- 
2.25.1
GetNewObjectId question
While browsing through varsup.c, I noticed this comment on GetNewObjectId:

 * Hence, this routine should generally not be used directly. The only direct
 * callers should be GetNewOidWithIndex() and GetNewRelFileNumber() in
 * catalog/catalog.c.

But AddRoleMems in user.c appears to also call the function directly:

	/* get an OID for the new row and insert it */
	objectId = GetNewObjectId();
	new_record[Anum_pg_auth_members_oid - 1] = objectId;
	tuple = heap_form_tuple(pg_authmem_dsc, new_record, new_record_nulls);
	CatalogTupleInsert(pg_authmem_rel, tuple);

I'm not sure if that call is right, but this seems inconsistent. Should that caller be using GetNewOidWithIndex instead? Or should the comment be updated?

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
In the pg_stat_statements docs, there are several column descriptions like

    Total number of ... by the statement

You added an additional sentence to some describing the equivalent pg_stat_io values, but you only added a period to the previous sentence for shared_blks_read (for other columns, the additional description just follows directly). These should be consistent.

Otherwise, the docs look good to me.
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Nov 29, 2022 at 5:13 PM Melanie Plageman wrote:
> Thanks for the review, Maciek!
>
> I've attached a new version 39 of the patch which addresses your docs
> feedback from this email as well as docs feedback from Andres in [1] and
> Justin in [2].

This looks great! Just a couple of minor comments.

> You are right: reused is a normal, expected part of strategy
> execution. And you are correct: the idea behind reusing existing
> strategy buffers instead of taking buffers off the freelist is to leave
> those buffers for blocks that we might expect to be accessed more than
> once.
>
> In practice, however, if you happen to not be using many shared buffers,
> and then do a large COPY, for example, you will end up doing a bunch of
> writes (in order to reuse the strategy buffers) that you perhaps didn't
> need to do at that time had you leveraged the freelist. I think the
> decision about which tradeoff to make is quite contentious, though.

Thanks for the explanation--that makes sense.

> On Mon, Nov 7, 2022 at 1:26 PM Maciek Sakrejda wrote:
> > Alternately, what do you think about pulling equivalencies to existing
> > views out of the main column descriptions, and adding them after the
> > main table as a sort of footnote? Most view docs don't have anything
> > like that, but pg_stat_replication does and it might be a good pattern
> > to follow.
> >
> > Thoughts?
>
> Thanks for including a patch!
> In the attached v39, I've taken your suggestion of flattening some of
> the lists and done some rewording as well. I have also moved the note
> about equivalence with pg_stat_statements columns to the
> pg_stat_statements documentation. The result is quite a bit different
> than what I had before, so I would be interested to hear your thoughts.
>
> My concern with the blue "note" section like you mentioned is that it
> would be harder to read the lists of backend types than it was in the
> tabular format.
Oh, I wasn't thinking of doing a separate "note": just additional paragraphs of text after the table (like what pg_stat_replication has before its "note", or the brief comment after the pg_stat_archiver table). But I think the updated docs work also.

+        The context or location of an IO operation.

maybe "...of an IO operation:" (colon) instead?

+        default. Future values could include those derived from
+        XLOG_BLCKSZ, once WAL IO is tracked in this view, and
+        constant multipliers once non-block-oriented IO (e.g. temporary file IO)
+        is tracked here.

I know Lukas had commented that we should communicate that the goal is to eventually provide relatively comprehensive I/O stats in this view (you do that in the view description and I think that works), and this is sort of along those lines, but I think speculative documentation like this is not all that helpful. I'd drop this last sentence. Just my two cents.

+        evicted in io_context
+        buffer pool and io_object
+        temp relation counts the number of times a block of
+        data from an existing local buffer was evicted in order to replace it
+        with another block, also in local buffers.

Doesn't this follow from the first sentence of the column description? I think we could drop this, no?

Otherwise, the docs look good to me.

Thanks,
Maciek
Odd behavior with pg_stat_statements and queries called from SQL functions
I noticed an odd behavior today in pg_stat_statements query normalization for queries called from SQL-language functions. If I have three functions that call an essentially identical query (the functions are only marked SECURITY DEFINER to prevent inlining):

maciek=# create or replace function f1(f1param text) returns text language sql as 'select f1param' security definer;
CREATE FUNCTION
maciek=# create or replace function f2(f2param text) returns text language sql as 'select f2param' security definer;
CREATE FUNCTION
maciek=# create or replace function f3(text) returns text language sql as 'select $1' security definer;
CREATE FUNCTION

and I have pg_stat_statements.track = 'all', so that queries called from functions are tracked, these all end up with the same query id in pg_stat_statements, but the query text includes the parameter name (if one is referenced in the query in the function). E.g., if I call f1 first, then f2 and f3, I get:

maciek=# select queryid, query, calls from pg_stat_statements where queryid = 6741491046520556186;
       queryid       |     query      | calls
---------------------+----------------+-------
 6741491046520556186 | select f1param |     3
(1 row)

If I call f3 first, then f2 and f1, I get:

maciek=# select queryid, query, calls from pg_stat_statements where queryid = 6741491046520556186;
       queryid       |   query   | calls
---------------------+-----------+-------
 6741491046520556186 | select $1 |     3
(1 row)

I understand that the query text may be captured differently for different queries that map to the same id, but it seems confusing that parameter names referenced in queries called from functions are not normalized away, since they're not germane to the query execution itself, and the context of the function is otherwise stripped away by this point. I would expect all three of these queries to end up in pg_stat_statements with the query text "select $1". Thoughts?

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Thu, Nov 3, 2022 at 10:00 AM Melanie Plageman wrote:
>
> I decided not to call it pg_statio because all of the other stats views
> have an underscore after stat and I thought it was an opportunity to be
> consistent with them.

Oh, got it. Makes sense.

> > I'm reviewing the rendered docs now, and I noticed sentences like this
> > are a bit hard to scan: they force the reader to parse a big list of
> > backend types before even getting to the meat of what this is talking
> > about. Should we maybe reword this so that the backend list comes at
> > the end of the sentence? Or maybe even use a list (e.g., like in the
> > "state" column description in pg_stat_activity)?
>
> Good idea with the bullet points.
> For the lengthy lists, I've added bullet point lists to the docs for
> several of the columns. It is quite long now but, hopefully, clearer?
> Let me know if you think it improves the readability.

Hmm, I should have tried this before suggesting it. I think the lists break up the flow of the column description too much. What do you think about the attached (on top of your patches--attaching it as a .diff to hopefully not confuse cfbot)? I kept the lists for backend types but inlined the others as a middle ground. I also added a few omitted periods and reworded "read plus extended" to avoid starting the sentence with a (lowercase) varname (I think in general it's fine to do that, but the more complicated sentence structure here makes it easier to follow if the sentence starts with a capital).

Alternately, what do you think about pulling equivalencies to existing views out of the main column descriptions, and adding them after the main table as a sort of footnote? Most view docs don't have anything like that, but pg_stat_replication does and it might be a good pattern to follow. Thoughts?

> > Also, is this (in the middle of the table) the right place for this
> > column? I would have expected to see it before or after all the actual
> > I/O op cells.
> I put it after read, write, and extend columns because it applies to
> them. It doesn't apply to files_synced. For reused and evicted, I didn't
> think bytes reused and evicted made sense. Also, when we add non-block
> oriented IO, reused and evicted won't be used but op_bytes will be. So I
> thought it made more sense to place it after the operations it applies
> to.

Got it, makes sense.

> > + io_contexts. When a Buffer Access
> > + Strategy reuses a buffer in the strategy ring, it must evict its
> > + contents, incrementing reused. When a Buffer
> > + Access Strategy adds a new shared buffer to the strategy ring
> > + and this shared buffer is occupied, the Buffer Access
> > + Strategy must evict the contents of the shared buffer,
> > + incrementing evicted.
> >
> > I think the parallel phrasing here makes this a little hard to follow.
> > Specifically, I think "must evict its contents" for the strategy case
> > sounds like a bad thing, but in fact this is a totally normal thing
> > that happens as part of strategy access, no? The idea is you probably
> > won't need that buffer again, so it's fine to evict it. I'm not sure
> > how to reword, but I think the current phrasing is misleading.
>
> I had trouble rephrasing this. I changed a few words. I see what you
> mean. It is worth noting that reusing strategy buffers when there are
> buffers on the freelist may not be the best behavior, so I wouldn't
> necessarily consider "reused" a good thing. However, I'm not sure how
> much the user could really do about this. I would at least like this
> phrasing to be clear (evicted is for shared buffers, reused is for
> strategy buffers), so, perhaps this section requires more work.

Oh, I see. I think the updated wording works better. Although I think we can drop the quotes around "Buffer Access Strategy" here. They're useful when defining the term originally, but after that I think it's clearer to use the term unquoted.
Just to understand this better myself, though: can you clarify when "reused" is not a normal, expected part of the strategy execution? I was under the impression that a ring buffer is used because each page is needed only "once" (i.e., for one set of operations) for the command using the strategy ring buffer. Naively, in that situation, it seems better to reuse a no-longer-needed buffer than to claim another buffer from the freelist (where other commands may eventually make better use of it).

> > + again. A high number of repossessions is a sign of contention for the
> > + blocks operated on by the strategy operation.
> >
> > This (and in general the repossession description) makes sense, but
> > I'm not sure what to do with the information. Maybe Andres is right
> > that we could skip this in the first version?
>
> I've removed repossessed and rejected in attached v37. I am a bit sad
> about this because I don't see a good way forward and I think those
> could be useful for users.

I can see that, but I think as long as we're not doing
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Wed, Oct 26, 2022 at 10:55 AM Melanie Plageman wrote:

+ The pg_statio_ and
+ pg_stat_io views are primarily useful to determine
+ the effectiveness of the buffer cache. When the number of actual disk reads

Totally nitpicking, but this reads a little funny to me. Previously the trailing underscore suggested this is a group, and now with pg_stat_io itself added (stupid question: should this be "pg_statio"?), it sounds like we're talking about two views: pg_stat_io and "pg_statio_". Maybe something like "The pg_stat_io view and the pg_statio_ set of views are primarily..."?

+ by that backend type in that IO context. Currently only a subset of IO
+ operations are tracked here. WAL IO, IO on temporary files, and some forms
+ of IO outside of shared buffers (such as when building indexes or moving a
+ table from one tablespace to another) could be added in the future.

Again nitpicking, but should this be "may be added"? I think "could" suggests the possibility of implementation, whereas "may" feels more like a hint as to how the feature could evolve.

+ portion of the main shared buffer pool. This pattern is called a
+ Buffer Access Strategy in the
+ PostgreSQL source code and the fixed-size
+ ring buffer is referred to as a strategy ring buffer for
+ the purposes of this view's documentation.
+

Nice, I think this explanation is very helpful. You also use the term "strategy context" and "strategy operation" below. I think it's fairly obvious what those mean, but pointing it out in case we want to note that here, too.

+ read and extended for

Maybe "plus" instead of "and" here for clarity (I'm assuming that's what the "and" means)?
+ backend_types autovacuum launcher,
+ autovacuum worker, client backend,
+ standalone backend, background
+ worker, and walsender for all
+ io_contexts is similar to the sum of

I'm reviewing the rendered docs now, and I noticed sentences like this are a bit hard to scan: they force the reader to parse a big list of backend types before even getting to the meat of what this is talking about. Should we maybe reword this so that the backend list comes at the end of the sentence? Or maybe even use a list (e.g., like in the "state" column description in pg_stat_activity)?

+ heap_blks_read, idx_blks_read,
+ tidx_blks_read, and
+ toast_blks_read in
+ pg_statio_all_tables. and
+ blks_read from
+ If using the PostgreSQL extension,
+ ,
+ read for
+ backend_types autovacuum launcher,
+ autovacuum worker, client backend,
+ standalone backend, background
+ worker, and walsender for all
+ io_contexts is equivalent to

Same comment as above re: the lengthy list.

+ Normal client backends should be able to rely on maintenance processes
+ like the checkpointer and background writer to write out dirty data as

Nice--it's great to see this mentioned. But I think these are generally referred to as "auxiliary" not "maintenance" processes, no?

+ If using the PostgreSQL extension,
+ , written and
+ extended for backend_types

Again, should this be "plus" instead of "and"?

+
+ bytes_conversion bigint
+

I think this general approach works (instead of unit). I'm not wild about the name, but I don't really have a better suggestion. Maybe "op_bytes" (since each cell is counting the number of I/O operations)? But I think bytes_conversion is okay.

Also, is this (in the middle of the table) the right place for this column? I would have expected to see it before or after all the actual I/O op cells.

+ io_contexts. When a Buffer Access
+ Strategy reuses a buffer in the strategy ring, it must evict its
+ contents, incrementing reused.
When a Buffer
+ Access Strategy adds a new shared buffer to the strategy ring
+ and this shared buffer is occupied, the Buffer Access
+ Strategy must evict the contents of the shared buffer,
+ incrementing evicted.

I think the parallel phrasing here makes this a little hard to follow. Specifically, I think "must evict its contents" for the strategy case sounds like a bad thing, but in fact this is a totally normal thing that happens as part of strategy access, no? The idea is you probably won't need that buffer again, so it's fine to evict it. I'm not sure how to reword, but I think the current phrasing is misleading.

+ The number of times a bulkread found the current
+ buffer in the fixed-size strategy ring dirty and requiring flush.

Maybe "...found ... to be dirty..."?

+ frequent vacuuming or more aggressive autovacuum settings, as buffers are
+ dirtied during a bulkread operation when updating the hint bit or when
+ performing on-access pruning.

Are there docs to cross-reference here, especially for pruning? I couldn't find much except a few un-explained mentions in the page layout docs [2], and most of the search results refer to partition pruning. Searching for hint bits at least gives some info in blog posts and the wiki.

+ again. A high number of repossessions is a sign of contention for the
+ blocks operated on by the strategy operation.

This (and in
Re: warn if GUC set to an invalid shared library
On Sat, Oct 29, 2022 at 10:40 AM Justin Pryzby wrote:
>
> On Fri, Sep 02, 2022 at 05:24:58PM -0500, Justin Pryzby wrote:
> > It caused no issue when I changed:
> >
> >                 /* Check that it's acceptable for the indicated parameter */
> >                 if (!parse_and_validate_value(record, name, value,
> > -                                             PGC_S_FILE, ERROR,
> > +                                             PGC_S_TEST, ERROR,
> >                                               , ))
> >
> > I'm not sure where to go from here.
>
> I'm hoping for some guidance; this simple change may be naive, but I'm not
> sure what a wider change would look like.

I assume you mean guidance on implementation details here, and not on design. I still think this is a useful patch and I'd be happy to review and try out future iterations, but I don't have any useful input here.

Also, for what it's worth, I think requiring the libraries to be in place before running ALTER SYSTEM does not really seem that onerous. I can't really think of use cases it precludes.

Thanks,
Maciek
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Thu, Oct 20, 2022 at 10:31 AM Andres Freund wrote:
> - "repossession" is a very unintuitive name for me. If we want something like
>   it, can't we just name it reuse_failed or such?

+1, I think "repossessed" is awkward. I think "reuse_failed" works, but no strong opinions on an alternate name.

> - Wonder if the column names should be reads, writes, extends, etc instead of
>   the current naming pattern

Why? Lukas suggested alignment with existing views like pg_stat_database and pg_stat_statements. It doesn't make sense to use the blks_ prefix since it's not all blocks, but otherwise it seems like we should be consistent, no?

> > "freelist_acquired" is confusing for local buffers but I wanted to
> > distinguish between reuse/eviction of local buffers and initial
> > allocation. "freelist_acquired" seemed more fitting because there is a
> > clocksweep to find a local buffer and if it hasn't been allocated yet it
> > is allocated in a place similar to where shared buffers acquire a buffer
> > from the freelist. If I didn't count it here, I would need to make a new
> > column only for local buffers called "allocated" or something like that.
>
> I think you're making this too granular. We need to have more detail than
> today. But we don't necessarily need to catch every nuance.

In general I agree that coarser granularity here may be easier to use. I do think the current docs explain what's going on pretty well, though, and I worry if merging too many concepts will make that harder to follow. But if a less detailed breakdown still communicates potential problems, +1.

> > This seems not too bad at first, but if you consider that later we will
> > add other kinds of IO -- eg WAL IO or temporary file IO, we won't be
> > able to use these existing columns and will need to add even more
> > columns describing the exact behavior in those cases.
>
> I think it's clearly not the right direction.

+1, I think the existing approach makes more sense.
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Wed, Oct 19, 2022 at 12:27 PM Melanie Plageman wrote:
>
> v34 is attached.
> I think the column names need discussion. Also, the docs need more work
> (I added a lot of new content there). I could use feedback on the column
> names and definitions and review/rephrasing ideas for the docs
> additions.

Nice! I think the expanded docs are great, and make this information much easier to interpret.

>+ io_context bulkread, existing
>+ dirty buffers in the ring requirng flush are

"requiring"

>+ shared buffers were acquired from the freelist and added to the
>+ fixed-size strategy ring buffer. Shared buffers are added to the
>+ strategy ring lazily. If the current buffer in the ring is pinned or in

This is the first mention of the term "strategy" in these docs. It's not totally opaque, since there's some context, but maybe we should either try to avoid that term or define it more explicitly?

>+ io_contexts. This is equivalent to
>+ evicted for shared buffers in
>+ io_context shared, as the contents
>+ of the buffer are evicted but refers to the case when the

I don't quite follow this: does this mean that I should expect 'reused' and 'evicted' to be equal in the 'shared' context, because they represent the same thing? Or will 'reused' just be null because it's not distinct from 'evicted'? It looks like it's null right now, but I find the wording here confusing.

>+ future with a new shared buffer. A high number of
>+ bulkread rejections can indicate a need for more
>+ frequent vacuuming or more aggressive autovacuum settings, as buffers are
>+ dirtied during a bulkread operation when updating the hint bit or when
>+ performing on-access pruning.

This is great. Just wanted to re-iterate that notes like this are really helpful to understanding this view.

> I've implemented a change using the same function pg_settings uses to
> turn the build-time parameter BLCKSZ into 8kB (get_config_unit_name())
> using the flag GUC_UNIT_BLOCKS.
> I am unsure if this is better or worse
> than "block_size". I am feeling very conflicted about this column.

Yeah, I guess it feels less natural here than in pg_settings, but it still kind of feels like one way of doing this is better than two...
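To make that concrete: with a bytes-conversion-style column, getting byte totals in a script is a simple multiplication, whereas a pg_settings-style '8kB' unit string would have to be parsed first. A sketch (column names as in the version under discussion, so purely illustrative and subject to change):

```sql
-- Illustrative only: converting per-op counts to bytes using the
-- proposed conversion column (names may change before commit).
SELECT backend_type, io_context,
       read * op_bytes    AS bytes_read,
       written * op_bytes AS bytes_written
FROM pg_stat_io;
```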
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Thu, Oct 13, 2022 at 10:29 AM Melanie Plageman wrote:
> I think that it makes sense to count both the initial buffers added to
> the ring and subsequent shared buffers added to the ring (either when
> the current strategy buffer is pinned or in use or when a bulkread
> rejects dirty strategy buffers in favor of new shared buffers) as
> strategy clocksweeps because of how the statistic would be used.
>
> Clocksweeps give you an idea of how much of your working set is cached
> (setting aside initially reading data into shared buffers when you are
> warming up the db). You may use clocksweeps to determine if you need to
> make shared buffers larger.
>
> Distinguishing strategy buffer clocksweeps from shared buffer
> clocksweeps allows us to avoid enlarging shared buffers if most of the
> clocksweeps are to bring in blocks for the strategy operation.
>
> However, I could see an argument that discounting strategy clocksweeps
> done because the current strategy buffer is pinned makes the number of
> shared buffer clocksweeps artificially low since those other queries
> using the buffer would have suffered a cache miss were it not for the
> strategy. And, in this case, you would take strategy clocksweeps
> together with shared clocksweeps to make your decision. And if we
> include buffers initially added to the strategy ring in the strategy
> clocksweep statistic, this number may be off because those blocks may
> not be needed in the main shared working set. But you won't know that
> until you try to reuse the buffer and it is pinned. So, I think we don't
> have a better option than counting initial buffers added to the ring as
> strategy clocksweeps (as opposed to as reuses).
>
> So, in answer to your question, no, I cannot think of a scenario like
> that.

That analysis makes sense to me; thanks.

> It also made me remember that I am incorrectly counting rejected buffers
> as reused. I'm not sure if it is a good idea to subtract from reuses
> when a buffer is rejected.
> Waiting until after it is rejected to count
> the reuse will take some other code changes. Perhaps we could also count
> rejections in the stats?

I'm not sure what makes sense here.

> > Not critical, but is there a list of backend types we could
> > cross-reference elsewhere in the docs?
>
> The most I could find was this longer explanation (with exhaustive list
> of types) in pg_stat_activity docs [1]. I could duplicate what it says
> or I could link to the view and say "see pg_stat_activity" for a
> description of backend_type" or something like that (to keep them from
> getting out of sync as new backend_types are added). I suppose I could
> also add docs on backend_types, but I'm not sure where something like
> that would go.

I think linking pg_stat_activity is reasonable for now. A separate section for this might be nice at some point, but that seems out of scope.

> > From the io_context column description:
> >
> > + The autovacuum daemon, explicit VACUUM, explicit
> > + ANALYZE, many bulk reads, and many bulk writes use a
> > + fixed amount of memory, acquiring the equivalent number of shared
> > + buffers and reusing them circularly to avoid occupying an undue portion
> > + of the main shared buffer pool.
> > +
> >
> > I don't understand how this is relevant to the io_context column.
> > Could you expand on that, or am I just missing something obvious?
>
> I'm trying to explain why those other IO Contexts exist (bulkread,
> bulkwrite, vacuum) and why they are separate from shared buffers.
> Should I cut it altogether or preface it with something like: these are
> counted separate from shared buffers because...?

Oh I see. That makes sense; it just wasn't obvious to me this was talking about the last three values of io_context. I think a brief preface like that would be helpful (maybe explicitly with "these last three values", and I think "counted separately").
> > +
> > + role="column_definition">
> > + extended bigint
> > +
> > +
> > + Extends of relations done by this backend_type in
> > + order to write data in this io_context.
> > +
> > +
> >
> > I understand what this is, but not why this is something I might want
> > to know about.
>
> Unlike writes, backends largely have to do their own extends, so
> separating this from writes lets us determine whether or not we need to
> change checkpointer/bgwriter to be more aggressive using the writes
> without the distraction of the extends. Should I mention this in the
> docs? The other stats views don't seem to editorialize at all, and I
> wasn't sure if this was an objective enough point to include in docs.

Thanks for the clarification. Just to make sure I understand, you mean that if I see a high extended count, that may be interesting in terms of write activity, but I can't fix that by tuning--it's just the nature of my workload? I think you're right that this is
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Thanks for working on this! Like Lukas, I'm excited to see more visibility into important parts of the system like this.

On Mon, Oct 10, 2022 at 11:49 AM Melanie Plageman wrote:
>
> I've gone ahead and implemented option 1 (commented below).

No strong opinion on 1 versus 2, but I guess at least partly because I don't understand the implications (I do understand the difference, just not when it might be important in terms of stats). Can we think of a situation where combining stats about initial additions with pinned additions hides some behavior that might be good to understand and hard to pinpoint otherwise?

I took a look at the latest docs (as someone mostly familiar with internals at only a pretty high level, so probably somewhat close to the target audience) and have some feedback.

+
+
+ backend_type text
+
+
+ Type of backend (e.g. background worker, autovacuum worker).
+
+

Not critical, but is there a list of backend types we could cross-reference elsewhere in the docs?

From the io_context column description:

+ The autovacuum daemon, explicit VACUUM, explicit
+ ANALYZE, many bulk reads, and many bulk writes use a
+ fixed amount of memory, acquiring the equivalent number of shared
+ buffers and reusing them circularly to avoid occupying an undue portion
+ of the main shared buffer pool.
+

I don't understand how this is relevant to the io_context column. Could you expand on that, or am I just missing something obvious?

+
+
+ extended bigint
+
+
+ Extends of relations done by this backend_type in
+ order to write data in this io_context.
+
+

I understand what this is, but not why this is something I might want to know about.

And from your earlier e-mail:

On Thu, Oct 6, 2022 at 10:42 AM Melanie Plageman wrote:
>
> Because we want to add non-block-oriented IO in the future (like
> temporary file IO) to this view and want to use the same "read",
> "written", "extended" columns, I would prefer not to prefix the columns
> with "blks_".
> I have added a column "unit" which would contain the unit
> in which read, written, and extended are in. Unfortunately, fsyncs are
> not per block, so "unit" doesn't really work for this. I documented
> this.
>
> The most correct thing to do to accommodate block-oriented and
> non-block-oriented IO would be to specify all the values in bytes.
> However, I would like this view to be usable visually (as opposed to
> just in scripts and by tools). The only current value of unit is
> "block_size" which could potentially be combined with the value of the
> GUC to get bytes.
>
> I've hard-coded the string "block_size" into the view generation
> function pg_stat_get_io(), so, if this idea makes sense, perhaps I
> should do something better there.

That seems broadly reasonable, but pg_settings also has a 'unit' field, and in that view, unit is '8kB' on my system--i.e., it (presumably) reflects the block size. Is that something we should try to be consistent with (not sure if that's a good idea, but thought it was worth asking)?

> On Fri, Sep 30, 2022 at 7:18 PM Lukas Fittl wrote:
> > - Overall it would be helpful if we had a dedicated documentation page on
> >   I/O statistics that's linked from the pg_stat_io view description, and
> >   explains how the I/O statistics tie into the various concepts of shared
> >   buffers / buffer access strategies / etc (and what is not tracked today)
>
> I haven't done this yet. How specific were you thinking -- like
> interpretations of all the combinations and what to do with what you
> see? Like you should run pg_prewarm if you see X? Specific checkpointer
> or bgwriter GUCs to change? Or just links to other docs pages on
> recommended tunings?
>
> Were you imagining the other IO statistics views (like
> pg_statio_all_tables and pg_stat_database) also being included in this
> page? Like would it be a comprehensive guide to IO statistics and what
> their significance/purposes are?
I can't speak for Lukas here, but I encouraged him to suggest more thorough documentation in general, so I can speak to my concerns: in general, these stats should be usable for someone who does not know much about Postgres internals. It's pretty low-level information, sure, so I think you need some understanding of how the system broadly works to make sense of it. But ideally you should be able to find what you need to understand the concepts involved within the docs. I think your updated docs are much clearer (with the caveats of my specific comments above). It would still probably be helpful to have a dedicated page on I/O stats (and yeah, something with a broad scope, along the lines of a comprehensive guide), but I think that can wait until a future patch. Thanks, Maciek
Re: warn if GUC set to an invalid shared library
Thanks for picking this back up, Justin.

>I've started to think that we should really WARN whenever a (set of) GUC is set
>in a manner that the server will fail to start - not just for shared libraries.

+0.5. I think it's a reasonable change, but I've never broken my server with anything other than shared_preload_libraries, so I'd rather see an improvement here first rather than expanding scope. I think shared_preload_libraries (and friends) is especially tricky due to the syntax, and more likely to lead to problems.

On the updated patch itself, I have some minor feedback about message wording:

postgres=# set local_preload_libraries=xyz;
SET

Great, it's nice that this no longer gives a warning.

postgres=# alter role bob set local_preload_libraries = xyz;
WARNING:  could not access file "xyz"
DETAIL:  New sessions will currently fail to connect with the new setting.
ALTER ROLE

The warning makes sense, but the detail feels a little awkward. I think "currently" is sort of redundant with "new setting". And it could be clearer that the setting did in fact take effect (I know the ALTER ROLE command tag echo tells you that, but we could reinforce that in the warning).

Also, I know I said last time that the scope of the warning is clear from the setting, but looking at it again, I think we could do better. I guess because when we're generating the error, we don't know whether the source was ALTER DATABASE or ALTER ROLE, we can't give a more specific message? Ideally, I think the DETAIL would be something like "New sessions for this role will fail to connect due to this setting". Maybe even with a HINT of "Run ALTER ROLE again with a valid value to fix this"? If that's not feasible, maybe "New sessions for this role or database will fail to connect due to this setting"? That message is not as clear about the impact of the change as it could be, but hopefully you know what command you just ran, so that should make it unambiguous.
I do think without qualifying that, it suggests that all new sessions are affected.

Hmm, or maybe just "New sessions affected by this setting will fail to connect"? That also makes the scope clear without the warning having to be aware of the scope: if you just ran ALTER DATABASE it's pretty clear that what is affected by the setting is the database. I think this is probably the way to go, but leaving my thought process above for context.

postgres=# alter system set shared_preload_libraries = lol;
WARNING:  could not access file "$libdir/plugins/lol"
DETAIL:  The server will currently fail to start with this setting.
HINT:  If the server is shut down, it will be necessary to manually edit the postgresql.auto.conf file to allow it to start again.
ALTER SYSTEM

I think this works. Tom's copy edit above omitted "currently" from the DETAIL and did not include the "$libdir/plugins/" prefix. I don't feel strongly about these either way.

2022-07-22 10:37:50.217 PDT [1131187] LOG:  database system is shut down
2022-07-22 10:37:50.306 PDT [1134058] WARNING:  could not access file "$libdir/plugins/lol"
2022-07-22 10:37:50.306 PDT [1134058] DETAIL:  The server will currently fail to start with this setting.
2022-07-22 10:37:50.306 PDT [1134058] HINT:  If the server is shut down, it will be necessary to manually edit the postgresql.auto.conf file to allow it to start again.
2022-07-22 10:37:50.312 PDT [1134058] FATAL:  could not access file "lol": No such file or directory
2022-07-22 10:37:50.312 PDT [1134058] CONTEXT:  while loading shared libraries for setting "shared_preload_libraries" from /home/maciek/code/aux/postgres/tmpdb/postgresql.auto.conf:3
2022-07-22 10:37:50.312 PDT [1134058] LOG:  database system is shut down

Hmm, I guess this is a side effect of where these messages are emitted, but at this point, lines 4-6 here are a little confusing, no? The server was already shut down, and we're trying to start it back up.
If there's no reasonable way to avoid that output, I think it's okay, but it'd be better to skip it (or adjust it to the new context). Thanks, Maciek
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
I ran that original test case with and without the patch. Here are the numbers I'm seeing:

master (best of three):

postgres=# SELECT count(*) FROM lotsarows;
Time: 582.423 ms
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM lotsarows;
Time: 616.102 ms
postgres=# EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
Time: 1068.700 ms (00:01.069)

patched (best of three):

postgres=# SELECT count(*) FROM lotsarows;
Time: 550.822 ms
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM lotsarows;
Time: 612.572 ms
postgres=# EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
Time: 690.875 ms

On Fri, Jul 1, 2022 at 10:26 AM Andres Freund wrote:
> On 2022-07-01 01:23:01 -0700, Lukas Fittl wrote:
> ...
> > Known WIP problems with this patch version:
> >
> > * There appears to be a timing discrepancy I haven't yet worked out, where
> >   the \timing data reported by psql doesn't match what EXPLAIN ANALYZE is
> >   reporting. With Andres' earlier test case, I'm seeing a consistent ~700ms
> >   higher for \timing than for the EXPLAIN ANALYZE time reported on the server
> >   side, only when rdtsc measurement is used -- its likely there is a problem
> >   somewhere with how we perform the cycles to time conversion
>
> Could you explain a bit more what you're seeing? I just tested your patches
> and didn't see that here.

I did not see this either, but I did see that the execution time reported by \timing is (for this test case) consistently 0.5-1ms *lower* than the Execution Time reported by EXPLAIN. I did not see that on master. Is that expected?

Thanks,
Maciek
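P.S. In case anyone wants to reproduce this, my setup was along these lines (I believe this matches the original test case from the thread, though the exact row count there may differ):

```sql
-- Benchmark setup sketch: a single-column table large enough to make
-- per-tuple instrumentation overhead visible.
CREATE TABLE lotsarows AS SELECT generate_series(1, 10000000) AS i;
VACUUM ANALYZE lotsarows;
-- Then, with \timing on in psql:
SELECT count(*) FROM lotsarows;
EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM lotsarows;
EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
```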
Re: Add id's to various elements in protocol.sgml
On Thu, Feb 24, 2022, 16:52 Kyotaro Horiguchi wrote:
> At Tue, 21 Dec 2021 08:47:27 +0100, Brar Piening wrote in
> > On 20.12.2021 at 16:09, Robert Haas wrote:
> > > As a data point, this is something I have also wanted to do, from time
> > > to time. I am generally of the opinion that any place the
>
> +1 from me. When I put an URL in the answer for inquiries, I always
> look into the html for name/id tags so that the inquirer quickly find
> the information source (or the backing or reference point) on the
> page.

+1 here as well. I often do the exact same thing. It's not hard, but it's a little tedious, especially considering most modern doc systems support linkable sections.
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Andrew made a good case above for avoiding LOG:

>I do think we should be wary of any name starting with "LOG", though.
>Long experience tells us that's something that confuses users when it refers to the WAL.
Re: warn if GUC set to an invalid shared library
On Wed, Feb 9, 2022 at 5:58 PM Justin Pryzby wrote:
> FYI, it has said "while..." and hasn't said "guc" since the 2nd revision of the
> patch.

The v3-0001 attached above had "while... for GUC..."--sorry I wasn't clear.

In v4, the message looks fine to me for shared_preload_libraries (except there is a doubled "is"). However, I also get the message for a simple SET with local_preload_libraries:

postgres=# set local_preload_libraries=xyz;
WARNING:  could not access file "xyz"
HINT:  The server will fail to start with the existing configuration. If it is is shut down, it will be necessary to manually edit the postgresql.auto.conf file to allow it to start.
SET

I'm not familiar with that setting (reading the docs, it's like a non-superuser session_preload_libraries for compatible modules?), but given nothing is being persisted here with ALTER SYSTEM, this seems incorrect.

Changing session_preload_libraries emits a similar warning:

postgres=# set session_preload_libraries = foo;
WARNING:  could not access file "$libdir/plugins/foo"
HINT:  New sessions will fail with the existing configuration.
SET

This is also not persisted, so I think this is also incorrect, right? (I'm not sure what setting session_preload_libraries without an ALTER ROLE or ALTER DATABASE accomplishes, given a new session is required for the change to take effect, but I thought I'd point this out.)

I'm guessing this may be due to trying to have the warning for ALTER ROLE?

postgres=# alter role bob set session_preload_libraries = foo;
WARNING:  could not access file "$libdir/plugins/foo"
HINT:  New sessions will fail with the existing configuration.
ALTER ROLE

This is great. Ideally, we'd qualify this with "New sessions for user..." or "New sessions for database..." but given you get the warning right after running the relevant command, maybe that's clear enough.
> $ ./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data -clogging_collector=on
> 2022-02-09 19:53:48.034 CST postmaster[30979] FATAL:  could not access file "a": No such file or directory
> 2022-02-09 19:53:48.034 CST postmaster[30979] CONTEXT:  while loading shared libraries for setting "shared_preload_libraries" from /home/pryzbyj/src/postgres/src/test/regress/tmp_check/data/postgresql.auto.conf:3
> 2022-02-09 19:53:48.034 CST postmaster[30979] LOG:  database system is shut down
>
> Maybe it's enough to show the GucSource rather than file:line...

This is great. I think the file:line output is helpful here.

Thanks,
Maciek
Re: CREATEROLE and role ownership hierarchies
I'm chiming in a little late here, but as someone who worked on a system to basically work around the lack of unprivileged CREATE ROLE for a cloud provider (I worked on the Heroku Data team for several years), I thought it might be useful to offer my perspective. This is, of course, not the only use case, but maybe it's useful to have something concrete. As a caveat, I don't know how current this still is (I no longer work there, though the docs [1] seem to still describe the same system), or if there are better ways to achieve the goals of a service provider. Broadly, the general use case is something like what Robert has sketched out in his e-mails. Heroku took care of setting up the database, archiving, replication, and any other system-level config. It would then keep the superuser credentials private, create a database, and a user that owned that database and had all the permissions we could grant it without compromising the integrity of the system. (We did not want customers to break their databases, both to ensure a better user experience and to avoid getting paged.) Initially, this meant customers got just the one database user because of CREATE ROLE's limitations. To work around that, at some point, we added an API that would CREATE ROLE for you, accessible through a dashboard and the Heroku CLI. This ran CREATE ROLE (or DROP ROLE) for you, but otherwise it largely let you configure the resulting roles as you pleased (using the original role we create for you). We wanted to avoid reinventing the wheel as much as possible, and the customer database (including the role configuration) was mostly a black box for us (we did manage some predefined permissions configurations through our dashboard, but the Postgres catalogs were the source of truth for that). 
Thinking about how this would fit into a potential non-superuser CREATE ROLE world, the sandbox superuser model discussed above covers this pretty well, though I share some of Robert's concerns around how this fits into existing systems. Hope this is useful feedback. Thanks for working on this! [1]: https://devcenter.heroku.com/articles/heroku-postgresql-credentials
Re: warn if GUC set to an invalid shared library
On Wed, Feb 2, 2022 at 7:39 AM David G. Johnston wrote:
> I would at least consider having the UX go something like:
>
> postgres=# ALTER SYSTEM SET shared_preload_libraries = not_such_library;
> ERROR: that library is not present.
> HINT: to bypass the error please add FORCE before SET
> postgres=# ALTER SYSTEM FORCE SET shared_preload_libraries = no_such_library;
> NOTICE: Error suppressed while setting shared_preload_libraries.
>
> That is, have the user express their desire to leave the system in a
> precarious state explicitly before actually doing so.

While I don't have a problem with that behavior, given that there are currently no such facilities for asserting "yes, really" with ALTER SYSTEM, I don't think it's worth introducing that just for this patch. A warning seems like a reasonable first step. This can always be expanded later. I'd rather see a warning ship than move the goalposts out of reach.
Re: warn if GUC set to an invalid shared library
The following review has been posted through the commitfest application:

make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

I tried the latest version of the patch, and it works as discussed. There is no documentation, but I think that's moot for this warning (we may want to note something in the setting docs, but even if so, I think we should figure out the message first and then decide if it merits additional explanation in the docs). I do not know whether it is spec-compliant, but I doubt the spec has much to say on something like this.

I tried running ALTER SYSTEM and got the warnings as expected:

postgres=# alter system set shared_preload_libraries = no_such_library,not_this_one_either;
WARNING:  could not access file "$libdir/plugins/no_such_library"
WARNING:  could not access file "$libdir/plugins/not_this_one_either"
ALTER SYSTEM

I think this is great, but it would be really helpful to also indicate that at this point the server will fail to come back up after a restart. In my mind, that's a big part of the reason for having a warning here. Having made this mistake a couple of times, I would be able to read between the lines, as would many other users, but if you're not familiar with Postgres this might still be pretty opaque. If I'm reading the code correctly, this warning path is shared between ALTER SYSTEM and a SET of local_preload_libraries, so it might be tricky to word this in a way that works in all situations, but it could make the precarious situation a lot clearer. I don't really know a good wording here, but maybe a hint like "The server or session will not be able to start if it has been configured to use libraries that cannot be loaded."?

Also, there are two sides to this: one is actually applying the possibly-bogus setting, and the other is when that setting takes effect (e.g., attempting to start the server or to start a new session).
I think Robert had good feedback regarding the latter:

On Fri, Jan 28, 2022 at 6:42 AM Robert Haas wrote:
> On Tue, Dec 28, 2021 at 12:45 PM Justin Pryzby wrote:
> > 0002 adds context when failing to start.
> >
> > 2021-12-27 17:01:12.996 CST postmaster[1403] WARNING:  could not load library: $libdir/plugins/asdf: cannot open shared object file: No such file or directory
> > 2021-12-27 17:01:14.938 CST postmaster[1403] FATAL:  could not access file "asdf": No such file or directory
> > 2021-12-27 17:01:14.938 CST postmaster[1403] CONTEXT:  guc "shared_preload_libraries"
> > 2021-12-27 17:01:14.939 CST postmaster[1403] LOG:  database system is shut down
>
> -1 from me on using "guc" in any user-facing error message. And even
> guc -> setting isn't a big improvement. If we're going to structure
> the reporting this way there, we should try to use a meaningful phrase
> there, probably beginning with the word "while"; see "git grep
> errcontext.*while" for interesting precedents.
>
> That said, that series of messages seems a bit suspect to me, because
> the WARNING seems to be stating the same problem as the subsequent
> FATAL and CONTEXT lines. Ideally we'd tighten that somehow.

Maybe we don't even need the WARNING in this case? At this point, it's clear what the problem is. I think the CONTEXT line does actually help, because otherwise it's not clear why the server failed to start, but the warning does seem superfluous here.

I do agree that GUC is awkward here, and I like the "while..." wording suggested, both for consistency with other messages and for how it could work here:

CONTEXT:  while loading "shared_preload_libraries"

I think that would be pretty clear. In the ALTER SYSTEM case, you still need to know to edit the file in spite of the warning telling you not to edit it, but I think that's still better.
Based on Cary's feedback, maybe that could be clearer, too (like you, I'm not sure if I understood what he did correctly), but I think that could certainly be future work. Thanks, Maciek
Re: warn if GUC set to an invalid shared library
On Sat, Jan 8, 2022 at 2:07 PM Justin Pryzby wrote:
> Unfortunately, the output for dlopen() is not portable, which (I think) means
> most of what I wrote can't be made to work.. Since it doesn't work to call
> dlopen() when using SET, I tried using just stat(). But that also fails on
> windows, since one of the regression tests has an invalid filename involving
> unbalanced quotes, which cause it to return EINVAL rather than ENOENT. So SET
> cannot warn portably, unless it includes no details at all (or we specially
> handle the windows case), or change the pre-existing regression test. But
> there's a 2nd instability, too, apparently having to do with timing. So I'm
> planning to drop the 0001 patch.

Hmm. I think 0001 is a big part of the usability improvement here. Could we not at least warn generically, without relaying the underlying error? The notable thing in this situation is that the specified library could not be loaded (and that it will almost certainly cause problems on restart). The specific error would be nice to have, but it's less important. What is the timing instability?

> > On Tue, Dec 28, 2021 at 9:45 AM Justin Pryzby wrote:
> > For whatever reason, I get slightly different (and somewhat redundant)
> > output on failing to start:
> >
> > 2022-01-08 12:59:36.784 PST [324482] WARNING:  could not load library: $libdir/plugins/totally bogus: cannot open shared object file: No such file or directory
> > 2022-01-08 12:59:36.787 PST [324482] FATAL:  could not load library: totally bogus: cannot open shared object file: No such file or directory
> > 2022-01-08 12:59:36.787 PST [324482] LOG:  database system is shut down
>
> I think the first WARNING is from the GUC mechanism "setting" the library.
> And then the FATAL is from trying to apply the GUC.
> It looks like you didn't apply the 0002 patch for that test so got no CONTEXT?
I still had the terminal open where I tested this, and the scrollback did show me applying the patch (and building after). I tried a make clean and applying the patch again, and I do see the CONTEXT line now. I'm not sure what the problem was, but it seems like PEBKAC--sorry about that.

Thanks,
Maciek
Re: warn if GUC set to an invalid shared library
On Thu, Dec 30, 2021 at 12:21 AM Bharath Rupireddy wrote:
> Overall the idea looks good to me. A warning on ALTER SYSTEM SET seems
> reasonable than nothing. ERROR isn't the way to go as it limits the
> users of setting the extensions in shared_preload_libraries first,
> installing them later. Is NOTICE here a better idea than WARNING?

I don't think so--I'm skeptical that "update shared_preload_libraries first, then install the libraries later" is much more than a theoretical use case. We may not want to block that off completely, but I think a warning is reasonable here, because you're *probably* doing something wrong if you get to this message at all (and if you're not, you're probably familiar enough with Postgres to know to ignore the warning).

Thanks,
Maciek
Re: warn if GUC set to an invalid shared library
Thanks for working on this! I tried it out and it worked for me. I reviewed the patch and didn't see any problems, but I'm not much of a C programmer.

On Tue, Dec 28, 2021 at 9:45 AM Justin Pryzby wrote:
> 0002 adds context when failing to start.
>
> 2021-12-27 17:01:12.996 CST postmaster[1403] WARNING:  could not load library: $libdir/plugins/asdf: cannot open shared object file: No such file or directory
> 2021-12-27 17:01:14.938 CST postmaster[1403] FATAL:  could not access file "asdf": No such file or directory
> 2021-12-27 17:01:14.938 CST postmaster[1403] CONTEXT:  guc "shared_preload_libraries"
> 2021-12-27 17:01:14.939 CST postmaster[1403] LOG:  database system is shut down

For whatever reason, I get slightly different (and somewhat redundant) output on failing to start:

2022-01-08 12:59:36.784 PST [324482] WARNING:  could not load library: $libdir/plugins/totally bogus: cannot open shared object file: No such file or directory
2022-01-08 12:59:36.787 PST [324482] FATAL:  could not load library: totally bogus: cannot open shared object file: No such file or directory
2022-01-08 12:59:36.787 PST [324482] LOG:  database system is shut down

I'm on a pretty standard Ubuntu 20.04 (with PGDG packages installed for 11, 12, and 13). I did configure to a --prefix and set LD_LIBRARY_PATH and PATH. Not sure if this is an issue in my environment or a platform difference?

Also, regarding the original warning:

2022-01-08 12:57:24.953 PST [324338] WARNING:  could not load library: $libdir/plugins/totally bogus: cannot open shared object file: No such file or directory

I think this is pretty clear to users familiar with shared_preload_libraries. However, for someone less experienced with Postgres and just following instructions on how to set up, e.g., auto_explain (and making a typo), it's not clear from this message that your server will fail to start again after this if you shut it down (or it crashes!), and how to get out of this situation.
Should we add a HINT to that effect?

Similarly, for

> 0001 adds WARNINGs when doing SET:
>
> postgres=# SET local_preload_libraries=xyz;
> WARNING:  could not load library: xyz: cannot open shared object file: No such file or directory
> SET

This works for me, but should this explain the impact (especially if used with something like ALTER ROLE)? I guess that's probably because the context may not be easily accessible. I think shared_preload_libraries is much more common, though, so I'm more interested in a warning there.

Thanks,
Maciek
Re: [EXTERNAL] Re: should we document an example to set multiple libraries in shared_preload_libraries?
On Fri, Dec 10, 2021 at 10:10 AM Godfrin, Philippe E wrote: > I may have missed something in this stream, but is this a system controlled > by Patroni? In my case, no: it's a pretty vanilla PGDG install on Ubuntu 20.04. Thanks for the context, though. Thanks, Maciek
Re: should we document an example to set multiple libraries in shared_preload_libraries?
On Thu, Dec 9, 2021 at 7:42 AM Bharath Rupireddy wrote:
> How about ALTER SYSTEM SET shared_preload_libraries command itself
> checking for availability of the specified libraries (after fetching
> library names from the parsed string value) with stat() and then
> report an error if any of the libraries doesn't exist? Currently, it
> just accepts if the specified value passes the parsing.

That certainly would have helped me. I guess it technically breaks the theoretical use case of "first change the shared_preload_libraries setting, then install those libraries, then restart Postgres," but I don't see why anyone would do that in practice. For what it's worth, I don't even feel strongly that this needs to be an error--just that the current flow around this is error-prone due to several sharp edges and could be improved. I would be happy with an error, but a warning or other mechanism could work, too. I do think better documentation is not enough: the differences between a working setting value and a broken one are pretty subtle.
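To make the idea concrete, here is a minimal sketch (in Python, for illustration only) of the stat()-based existence check Bharath describes. The actual patch would be C code inside the GUC machinery and would also need to handle quoting, $libdir expansion, and platform-specific errno quirks (such as the Windows EINVAL-vs-ENOENT issue raised elsewhere in this thread); the function name here is made up.

```python
import os

def check_libraries(value):
    """Given a comma-separated shared_preload_libraries-style value,
    return the entries that do not exist on disk, using the same
    stat() call the proposed C check would use."""
    missing = []
    for name in value.split(","):
        name = name.strip()
        try:
            os.stat(name)
        except OSError:
            missing.append(name)
    return missing

# A warning (rather than a hard error) would preserve the
# install-the-library-later use case discussed above:
for lib in check_libraries("/bin/sh, no_such_library"):
    print(f'WARNING:  could not access file "{lib}"')
```

Note that a check at ALTER SYSTEM time is inherently a point-in-time check: the library could still disappear (or appear) between the check and the next restart, which is another argument for a warning rather than an error.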
Re: should we document an example to set multiple libraries in shared_preload_libraries?
On Wed, Dec 1, 2021 at 5:15 AM Tom Lane wrote: > > Justin Pryzby writes: > > +1 to document it, but it seems like the worse problem is allowing the > > admin to > > write a configuration which causes the server to fail to start, without > > having > > issued a warning. > > > I think you could fix that with a GUC check hook to emit a warning. > > ... > > Considering the vanishingly small number of actual complaints we've > seen about this, that sounds ridiculously over-engineered. > A documentation example should be sufficient. I don't know if this will tip the scales, but I'd like to lodge a belated complaint. I've gotten myself in this server-fails-to-start situation several times (in development, for what it's worth). The syntax (as Bharath pointed out in the original message) is pretty picky, there are no guard rails, and if you got there through ALTER SYSTEM, you can't fix it with ALTER SYSTEM (because the server isn't up). If you go to fix it manually, you get a scary "Do not edit this file manually!" warning that you have to know to ignore in this case (that's if you find the file after you realize what the fairly generic "FATAL: ... No such file or directory" error in the log is telling you). Plus you have to get the (different!) quoting syntax right or cut your losses and delete the change. I'm over-dramatizing this a bit, but I do think there are a lot of opportunities to make mistakes here, and this behavior could be more user-friendly beyond just documentation changes. If a config change is bogus most likely due to a quoting mistake or a typo, a warning would be fantastic (i.e., the stat() check Justin suggested). Or maybe the FATAL log message on a failed startup could include the source of the problem? Thanks, Maciek
TOAST questions
Hello,

(I hope it's okay to ask general internals questions here; if this list is strictly for development, I'll keep my questions on -general, but since I'm asking about internal behavior, this seemed more appropriate.)

I was playing around with inspecting TOAST tables in order to understand the mechanism better, and I ran across a strange issue: I've created a table that has a text column, inserted and then deleted some data, and the TOAST table still has some entries even though the owning table is now empty:

maciek=# SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'users';
       reltoastrelid
---------------------------
 pg_toast.pg_toast_4398034
(1 row)

maciek=# select chunk_id, chunk_seq from pg_toast.pg_toast_4398034;
 chunk_id | chunk_seq
----------+-----------
  4721665 |         0
  4721665 |         1
(2 rows)

maciek=# select * from users;
 id | created_at | is_admin | username
----+------------+----------+----------
(0 rows)

I've tried to reproduce this with a new table in the same database, but could not see the same behavior (the TOAST table entries are deleted when I delete rows from the main table). This is 11.12. Is this expected?

In case it's relevant, this table originally had the first three columns. I inserted one row, then added the text column, set its STORAGE to EXTERNAL, and set toast_tuple_target to the minimum of 128. I inserted a few more rows until I found one large enough to go in the TOAST table. (It looks like Postgres attempts to compress values and store them inline first even when STORAGE is EXTERNAL? I don't recall the exact size, but I had to use a value much larger than 128 before it hit the TOAST table. The TOAST docs allude to this behavior, but just making sure I understand correctly.) Then I deleted the rows with non-NULL values in the text column, and noticed the TOAST table entries were still there. So I deleted everything in the users table and still see the two rows above in the TOAST table. I've tried this sequence of steps again with a new table and could not reproduce the issue.

Thanks,
Maciek
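As background for the chunk_seq values above: out-of-line TOAST storage splits a value into fixed-size chunks, numbered by chunk_seq under a single chunk_id, so two chunks (seq 0 and 1) imply a stored value of up to roughly two chunks' worth of bytes. A rough sketch of the arithmetic, assuming the approximate default chunk size for 8 kB pages (the exact constant, TOAST_MAX_CHUNK_SIZE, depends on the build; the helper name here is made up, and the real chunking happens in C in toast_save_datum):

```python
import math

TOAST_CHUNK_SIZE = 1996  # approximate default for 8 kB pages; build-dependent

def toast_chunk_count(value_len):
    """Number of TOAST chunks (chunk_seq entries) needed to store a
    value of value_len bytes out of line, under the assumed chunk size."""
    return math.ceil(value_len / TOAST_CHUNK_SIZE)

print(toast_chunk_count(3000))  # a ~3 kB value would need 2 chunks
```

This is only the storage arithmetic, not an explanation of the lingering entries above; stale TOAST rows would normally be reclaimed by VACUUM like any other dead tuples.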
Re: compute_query_id and pg_stat_statements
On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud wrote:
>
> On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> >
> > The warning makes it clear that there's something wrong, but not how
> > to fix it
>
> I'm confused, are we talking about the new warning in v2 as suggested by Pavel?
> As it should make things quite clear:
>
> +SELECT count(*) FROM pg_stat_statements;
> +WARNING:  Query identifier calculation seems to be disabled
> +HINT:  If you don't want to use a third-party module to compute query
> identifiers, you may want to enable compute_query_id
>
> The wording can of course be improved.

I meant that no warning can be as clear as things just working, but I do have feedback on the specific message here:

* "seems to" be disabled? Is it? Any reason not to be more definitive here?
* On reading the beginning of the hint, I can see users asking themselves, "Do I want to use a third-party module to compute query identifiers?"
* "may want to enable"? Are there any situations where I don't want to use a third-party module *and* I don't want to enable compute_query_id?

So maybe something like

> +SELECT count(*) FROM pg_stat_statements;
> +WARNING:  Query identifier calculation is disabled
> +HINT:  You must enable compute_query_id or configure a third-party module to
> compute query identifiers in order to use pg_stat_statements.

(I admit "configure a third-party module" is pretty vague, but I think that suggests it's only an option to consider if you know what you're doing.)

Also, if you're configuring this for usage with a tool like pganalyze and neglect to run a manual query (we guide users to do that, but they may skip that step), the warnings may not even be visible (the Go driver we are using does not surface client warnings). Should this be an error instead of a warning? Is it ever useful to get an empty result set from querying pg_stat_statements?
Using an error here would parallel the behavior of shared_preload_libraries not including pg_stat_statements.
Re: compute_query_id and pg_stat_statements
On Thu, May 13, 2021 at 7:42 AM Bruce Momjian wrote:
>
> On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote:
> > On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> > I don't know what to say. So here is a summary of the complaints that I'm
> > aware of:
> >
> > - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.ca...@moonset.ru
> >   That was only a couple of days after the commit just before the feature freeze,
> >   so it may be the less relevant complaint.
> >
> > - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
> >   Did a git bisect to find the commit that changed the behavior and somehow
> >   didn't notice the new setting
> >
> > - this thread, with Fuji-san saying:
> >
> > > I'm afraid that users may easily forget to enable compute_query_id when using
> > > pg_stat_statements (because this setting was not necessary in v13 or before)
> >
> > - this thread, with Peter E. saying:
> >
> > > Now there is the additional burden of turning on this weird setting that
> > > no one understands. That's a 50% increase in burden. And almost no one will
> > > want to use a nondefault setting. pg_stat_statements is pretty popular. I
> > > think leaving in this requirement will lead to widespread confusion and
> > > complaints.
> >
> > - this thread, with Pavel saying:
> >
> > > Until now, the pg_stat_statements was zero-config. So the change is not user
> > > friendly.
> >
> > So it's a mix of "it's changing something that didn't change in a long time"
> > and "it's adding extra footgun and/or burden as it's not doing by default what
> > the majority of users will want", with an overwhelming majority of people
> > supporting the "we don't want that extra burden".
>
> Well, now that we have clear warnings when it is misconfigured,
> especially when querying the pg_stat_statements view, are these
> complaints still valid?
> Also, how is modifying shared_preload_libraries zero-config, but modifying
> shared_preload_libraries and compute_query_id a huge burden?

The warning makes it clear that there's something wrong, but not how to fix it (as I noted in another message in this thread, a web search for pg_stat_statements docs still leads to an obsolete version). I don't think anyone is arguing that this is insurmountable for all users, but it is additional friction, and every bit of friction makes Postgres harder to use. Users don't read documentation, or misread documentation, or just are not sure what the documentation or the warning is telling them, in spite of our best efforts. And you're right, modifying shared_preload_libraries is not zero-config--I would love it if we could drop that requirement ;). Still, adding another setting is clearly one more thing to get wrong.

> I am personally not comfortable committing a patch to add an auto option
> the way it is implemented, so another committer will need to take
> ownership of this, or the entire feature can be removed.

Assuming we do want to avoid additional configuration requirements for pg_stat_statements, is there another mechanism you feel would work better? Or is that constraint incompatible with sane behavior for this feature?
Re: compute_query_id and pg_stat_statements
On Wed, May 12, 2021 at 9:58 PM Julien Rouhaud wrote:
> On Thu, May 13, 2021 at 12:52 PM, Maciek Sakrejda wrote:
>>
>> For what it's worth, I don't think the actual changing of an extra
>> setting is that big a burden: it's the figuring out that you need to
>> change it, and how you should configure it, that is the problem.
>> Especially since all major search engines still seem to return 9.4 (!)
>> documentation as the first hit for a "pg_stat_statements" search. The
>> common case (installing pg_stat_statements but not tweaking query id
>> generation) should be simple.
>
> the v2 patch I sent should address both your requirements.

Yes, thanks--I just tried it and this is great. I just wanted to argue against reversing course here.
Re: compute_query_id and pg_stat_statements
On Wed, May 12, 2021 at 9:03 PM Julien Rouhaud wrote:
>
> On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> > How do they know to set shared_preload_libraries then? We change the
> > user API all the time, especially in GUCs, and even rename them, but for
> > some reason we don't think people using pg_stat_statements can be
> > trusted to read the release notes and change their behavior. I just
> > don't get it.
>
> I don't know what to say. So here is a summary of the complaints that I'm
> aware of:
>
> - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.ca...@moonset.ru
>   That was only a couple of days after the commit just before the feature freeze,
>   so it may be the less relevant complaint.
>
> - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
>   Did a git bisect to find the commit that changed the behavior and somehow
>   didn't notice the new setting
>
> - this thread, with Fuji-san saying:
>
> > I'm afraid that users may easily forget to enable compute_query_id when using
> > pg_stat_statements (because this setting was not necessary in v13 or before)
>
> - this thread, with Peter E. saying:
>
> > Now there is the additional burden of turning on this weird setting that
> > no one understands. That's a 50% increase in burden. And almost no one will
> > want to use a nondefault setting. pg_stat_statements is pretty popular. I
> > think leaving in this requirement will lead to widespread confusion and
> > complaints.
>
> - this thread, with Pavel saying:
>
> > Until now, the pg_stat_statements was zero-config. So the change is not user
> > friendly.
>
> So it's a mix of "it's changing something that didn't change in a long time"
> and "it's adding extra footgun and/or burden as it's not doing by default what
> the majority of users will want", with an overwhelming majority of people
> supporting the "we don't want that extra burden".
For what it's worth, I don't think the actual changing of an extra setting is that big a burden: it's the figuring out that you need to change it, and how you should configure it, that is the problem. Especially since all major search engines still seem to return 9.4 (!) documentation as the first hit for a "pg_stat_statements" search. The common case (installing pg_stat_statements but not tweaking query id generation) should be simple.
Re: pg_stat_statements requires compute_query_id
On Mon, May 10, 2021 at 7:43 AM Julien Rouhaud wrote:
> On Mon, May 10, 2021 at 04:36:16PM +0200, Pavel Stehule wrote:
> > I expected just an empty column query_id and workable extension. This
> > doesn't look well.
> >
> > More, it increases the (little bit) complexity of migration to Postgres 14.
>
> This was already raised multiple times, and the latest discussion can be found
> at [1].
>
> Multiple options have been suggested, but AFAICT there isn't a clear consensus
> on what we should do exactly, so I've not been able to send a fix yet.
>
> [1]: https://www.postgresql.org/message-id/flat/35457b09-36f8-add3-1d07-6034fa585ca8%40oss.nttdata.com

Before it petered out, the thread seemed to be converging toward consensus that the situation could be improved. I work on pganalyze, and our product requires pg_stat_statements to be enabled for a lot of its functionality. We guide our users through enabling it, but if additional steps are required in 14, that may be confusing. I don't have any strong feelings on the particular mechanism that would work best here, but it would be nice if enabling pg_stat_statements in 14 did not require more work than in 13. Even if it's just one extra setting, it's another potential thing to get wrong and have to troubleshoot, plus it means all existing pg_stat_statements guides out there would become out of date.

Thanks,
Maciek
Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
On Wed, Jun 24, 2020 at 2:44 AM Amit Kapila wrote:
> So, that leads to loops as 2 on "Parallel Seq Scan" and "Nested Loop" nodes.
> Does this make sense now?

Yes, I think we're on the same page. Thanks for the additional details.

It turns out that the plan I sent at the top of the thread is actually an older plan we had saved, all the way from April 2018. We're fairly certain this was Postgres 10, but not sure what point release. I tried to reproduce this on 10, 11, 12, and 13 beta, but I am now seeing similar results to yours: Buffers and I/O Timings are rolled up into the parallel leader, and that is propagated as expected to the Gather. Sorry for the confusion.

On Wed, Jun 24, 2020 at 3:18 AM Maciek Sakrejda wrote:
> So we should be seeing an average, not a sum, right?

And here I missed that the documentation specifies rows and actual time as per-loop, but other metrics are not--they're just cumulative. So actual time and rows are still per-"loop" values, but while rows values are additive (the Gather combines rows from the parallel leader and the workers), the actual time is not, because the whole point is that this work happens in parallel.

I'll report back if I can reproduce the weird numbers we saw in that original plan or find out exactly what Postgres version it was from.

Thanks,
Maciek
Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
On Tue, Jun 23, 2020 at 7:55 PM Amit Kapila wrote:
> > I don't see any other reason for
> > looping over the NL node itself in this plan. The Gather itself
> > doesn't do any real looping, right?
>
> It is right that Gather doesn't do looping but Parallel Seq Scan node does so.

Sorry, I still don't follow. How does a Parallel Seq Scan do looping? I looked at the parallel plan docs but I don't see looping mentioned anywhere [1]. Also, is looping not normally indicated on children, rather than on the node doing the looping? E.g., with a standard Nested Loop, the outer child will have loops=1 and the inner child will have loops equal to the row count produced by the outer child (and the Nested Loop itself will have loops=1 unless it also is being looped over by a parent node), right? But even aside from that, why do I need to divide by the number of loops here, when normally Postgres presents a per-loop average?

[1]: https://www.postgresql.org/docs/current/parallel-plans.html
Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
On Tue, Jun 23, 2020 at 2:57 AM Amit Kapila wrote:
> I don't think this is an odd situation because in this case, child
> nodes like "Nested Loop" and "Parallel Seq Scan" has a value of
> 'loops' as 3. So, to get the correct stats at those nodes, you need
> to divide it by 3 whereas, at Gather node, the value of 'loops' is 1.
> If you want to verify this thing then try with a plan where loops
> should be 1 for child nodes as well, you should get the same value at
> both Gather and Parallel Seq Scan nodes.

Thanks for the response, but I still don't follow. I had assumed that loops=3 was just from loops=1 for the parallel leader plus loops=1 for each worker--is that not right? I don't see any other reason for looping over the NL node itself in this plan. The Gather itself doesn't do any real looping, right?

But even so, the documentation [1] states:

> In some query plans, it is possible for a subplan node to be executed more
> than once. For example, the inner index scan will be executed once per outer
> row in the above nested-loop plan. In such cases, the loops value reports the
> total number of executions of the node, and the actual time and rows values
> shown are averages per-execution. This is done to make the numbers comparable
> with the way that the cost estimates are shown. Multiply by the loops value to
> get the total time actually spent in the node.

So we should be seeing an average, not a sum, right?

[1]: https://www.postgresql.org/docs/current/using-explain.html#USING-EXPLAIN-ANALYZE
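Under the documented convention quoted above, recovering totals from EXPLAIN's per-loop averages is just multiplication by loops. A small illustration (the helper name is made up; the numbers are taken from the index scan in the plan under discussion in this thread, which reports actual time=0.011 and loops=95):

```python
def per_loop_to_total(actual_time_avg_ms, rows_avg, loops):
    """EXPLAIN ANALYZE reports actual time and rows as per-loop
    averages; per the documentation, multiply by loops to get totals."""
    return actual_time_avg_ms * loops, rows_avg * loops

# Index scan from the plan in this thread: ~0.011 ms per loop, 95 loops
total_ms, total_rows = per_loop_to_total(actual_time_avg_ms=0.011,
                                         rows_avg=0, loops=95)
print(total_ms, total_rows)  # roughly 1.045 ms total, 0 rows
```

This is exactly the convention at issue in the discussion: it holds for actual time and rows, but the parallel-worker rollup behavior for Buffers and I/O Timings is what the thread is trying to pin down.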
EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
Hello, I had some questions about the behavior of some accounting in parallel EXPLAIN plans. Take the following plan: ``` Gather (cost=1000.43..750173.74 rows=2 width=235) (actual time=1665.122..1665.122 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=27683 read=239573 I/O Timings: read=687.358 -> Nested Loop (cost=0.43..749173.54 rows=1 width=235) (actual time=1660.095..1660.095 rows=0 loops=3) Inner Unique: true Buffers: shared hit=77579 read=657847 I/O Timings: read=2090.189 Worker 0: actual time=1657.443..1657.443 rows=0 loops=1 Buffers: shared hit=23645 read=208365 I/O Timings: read=702.270 Worker 1: actual time=1658.277..1658.277 rows=0 loops=1 Buffers: shared hit=26251 read=209909 I/O Timings: read=700.560 -> Parallel Seq Scan on public.schema_indices (cost=0.00..748877.88 rows=35 width=235) (actual time=136.744..1659.629 rows=32 loops=3) Filter: ((schema_indices.invalidated_at_snapshot_id IS NULL) AND (NOT schema_indices.is_valid)) Rows Removed by Filter: 703421 Buffers: shared hit=77193 read=657847 I/O Timings: read=2090.189 Worker 0: actual time=69.248..1656.950 rows=32 loops=1 Buffers: shared hit=23516 read=208365 I/O Timings: read=702.270 Worker 1: actual time=260.188..1657.875 rows=27 loops=1 Buffers: shared hit=26140 read=209909 I/O Timings: read=700.560 -> Index Scan using schema_tables_pkey on public.schema_tables (cost=0.43..8.45 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=95) Index Cond: (schema_tables.id = schema_indices.table_id) Filter: (schema_tables.database_id = 123) Rows Removed by Filter: 1 Buffers: shared hit=386 Worker 0: actual time=0.011..0.011 rows=0 loops=32 Buffers: shared hit=129 Worker 1: actual time=0.011..0.011 rows=0 loops=27 Buffers: shared hit=111 Planning Time: 0.429 ms Execution Time: 1667.373 ms ``` The Nested Loop here aggregates data for metrics like `buffers read` from its workers, and to calculate a metric like `buffers read` for the parallel leader, we can subtract the values 
recorded in each individual worker. This happens in the Seq Scan and Index Scan children, as well. However, the Gather node appears to only include values from its direct parallel leader child (excluding that child's workers). This leads to the odd situation that the Gather has lower values for some of these metrics than its child (because the child node reporting includes the worker metrics) even though the values are supposed to be cumulative. This is even more surprising for something like I/O timing, where the Gather has a lower `read` value than one of the Nested Loop workers, which doesn't make sense in terms of wall-clock time. Is this behavior intentional? If so, is there an explanation of the reasoning or the trade-offs involved? Would it not make sense to propagate those cumulative parallel costs up the tree all the way to the root, instead of only using the parallel leader metrics under Gather? Thanks, Maciek
Re: JIT performance bug/regression & JIT EXPLAIN
On Mon, Jan 27, 2020 at 11:01 AM Robert Haas wrote: > On Fri, Nov 15, 2019 at 8:05 PM Maciek Sakrejda wrote: > > On Fri, Nov 15, 2019 at 5:49 AM Robert Haas wrote: > > > Personally, I don't care very much about backward-compatibility, or > > > about how hard it is for tools to parse. I want it to be possible, but > > > if it takes a little extra effort, so be it. > > > > I think these are two separate issues. I agree on > > backward-compatibility (especially if we can embed a server version in > > structured EXPLAIN output to make it easier for tools to track format > > differences), but not caring how hard it is for tools to parse? What's > > the point of structured formats, then? > > To make the data easy to parse. :-) > > I mean, it's clear that, on the one hand, having a format like JSON > that, as has recently been pointed out elsewhere, is parsable by a > wide variety of tools, is advantageous. However, I don't think it > really matters whether the somebody's got to look at a tag called > Flump and match it up with the data in another tag called JIT-Flump, > or whether there's a Flump group that has RegularStuff and JIT tags > inside of it. There's just not much difference in the effort involved. > Being able to parse the JSON or XML using generic code is enough of a > win that the details shouldn't matter that much. Having a structured EXPLAIN schema that's semantically consistent is still valuable. At the end of the day, it's humans who are writing the tools that consume that structured output. Given the sparse structured EXPLAIN schema documentation, as someone who currently works on EXPLAIN tooling, I'd prefer a trend toward consistency at the expense of backward compatibility. (Of course, we should avoid gratuitous changes.) But I take back the version number suggestion after reading Tom's response; that was naïve. 
> I think if you were going to complain about the limitations of our > current EXPLAIN output format, it'd make a lot more sense to focus on > the way we output expressions. That would be nice to have, but for what it's worth, my main complaint would be about documentation (especially around structured formats). The "Using EXPLAIN" section covers the basics, but understanding what node types exist, and what fields show up for what nodes and what they mean--that seems to be a big missing piece (I don't feel entitled to this documentation; as a structured format consumer, I'm just pointing out a deficiency). Contrast that with the great wire protocol documentation. In some ways it's easier to work on native drivers than on EXPLAIN tooling because the docs are thorough and well organized.
Re: Duplicate Workers entries in some EXPLAIN plans
Thanks for the thorough review. I obviously missed some critical issues. I recognized some of the other maintainability problems, but did not have a sense of how to best avoid them, being unfamiliar with the code. For what it's worth, this version makes sense to me.
Re: Duplicate Workers entries in some EXPLAIN plans
Okay. Does "not getting as many workers as it asks for" include sometimes getting zero, which would completely change the actual output? If so, I'll submit a new version of the patch removing all the tests--I was hoping to improve coverage, but I guess this is not the way to start. If not, can we at least keep the JSON tests if we only consider the first worker?
Re: Duplicate Workers entries in some EXPLAIN plans
Great, thank you. I noticed that in the CF patch tester app, the build fails on Windows [1]. Investigating, I realized I had failed to fully strip volatile EXPLAIN info (namely buffers) in TEXT mode due to a bad regexp_replace. I've fixed this in the attached patch (which I also rebased against current master again).

[1]: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.76313

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..b4108f82ec 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 						   Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -131,7 +131,9 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
-
+static void ExplainOpenWorker(StringInfo worker_str, ExplainState *es);
+static void ExplainCloseWorker(ExplainState *es);
+static void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, ExplainState *es);
 
 /*
@@ -289,6 +291,7 @@ NewExplainState(void)
 	es->costs = true;
 
 	/* Prepare output buffer. */
 	es->str = makeStringInfo();
+	es->root_str = es->str;
 
 	return es;
 }
@@ -1090,9 +1093,19 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	StringInfo *worker_strs = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/* Prepare per-worker output */
+	if (es->analyze && planstate->worker_instrument)
+	{
+		int			num_workers = planstate->worker_instrument->num_workers;
+		worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+		for (int n = 0; n < num_workers; n++)
+			worker_strs[n] = makeStringInfo();
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1357,6 +1370,55 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
 	}
 
+	/* Prepare worker general execution details */
+	if (es->analyze && es->verbose && !es->hide_workers && planstate->worker_instrument)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+
+		for (int n = 0; n < w->num_workers; ++n)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			ExplainOpenWorker(worker_strs[n], es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+			}
+			else
+			{
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+		}
+
+		ExplainCloseWorker(es);
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_SeqScan:
@@ -1856,7 +1918,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			break;
 		case T_Sort:
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
-			show_sort_info(castNode(SortState, planstate), es);
+			show_sort_info(castNode(SortState, planstate), worker_strs, es);
 			break;
 		case T_MergeAppend:
 			show_merge_append_keys(castNode(MergeAppendState, planstate),
@@ -1885,76 +1947,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && !es->hide_workers &&
-		planstate->worker_instrument)
+	/* Prepare worker buffer usage */
+	if (es->buffers && es->analyze && es->verbose && !es->hide_workers
+		&& planstate->worker_instrument)
 	{
 		WorkerInstrumentation *w = planstate->worker_instrument;
-		bool		opened_group = false;
 		int			n;
 
 		for (n = 0; n < w->num_workers; ++n)
 		{
Re: Duplicate Workers entries in some EXPLAIN plans
On Wed, Jan 22, 2020 at 9:37 AM Georgios Kokolatos wrote: > My bad, I should have been more clear. I meant that it is preferable to use > the C99 standard which calls for declaring variables in the scope that you > need them. Ah, I see. I think I got that from ExplainPrintSettings. I've corrected my usage--thanks for pointing it out. I appreciate the effort to maintain a consistent style. > > >> int indent; /* current indentation level */ > >> List *grouping_stack; /* format-specific grouping state */ > >> + boolprint_workers; /* whether current node has worker > >> metadata */ > >> > >> Hmm.. commit introduced `hide_workers` in the struct. Having > >> both > >> names in the struct so far apart even seems a bit confusing and sloppy. Do > >> you > >> think it would be possible to combine or rename? > > > > > > I noticed that. I was thinking about combining them, but > > "hide_workers" seems to be about "pretend there is no worker output > > even if there is" and "print_workers" is "keep track of whether or not > > there is worker output to print". Maybe I'll rename to > > "has_worker_output"? > > The rename sounds a bit better in my humble opinion. Thanks. Also, reviewing my code again, I noticed that when I moved the general worker output earlier, I missed part of the merge conflict: I had replaced - /* Show worker detail */ - if (es->analyze && es->verbose && !es->hide_workers && - planstate->worker_instrument) with + /* Prepare worker general execution details */ + if (es->analyze && es->verbose && planstate->worker_instrument) which ignores the es->hide_workers flag (it did not fail the tests, but the intent is pretty clear). I've corrected this in the current patch. 
I also noticed that we can now handle worker buffer output more consistently across TEXT and structured formats, so I made that small change too: diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 140d0be426..b23b015594 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1401,8 +1401,6 @@ ExplainNode(PlanState *planstate, List *ancestors, appendStringInfo(es->str, "actual rows=%.0f loops=%.0f\n", rows, nloops); - if (es->buffers) - show_buffer_usage(es, >bufusage); } else { @@ -1951,7 +1949,7 @@ ExplainNode(PlanState *planstate, List *ancestors, /* Prepare worker buffer usage */ if (es->buffers && es->analyze && es->verbose && !es->hide_workers - && planstate->worker_instrument && es->format != EXPLAIN_FORMAT_TEXT) + && planstate->worker_instrument) { WorkerInstrumentation *w = planstate->worker_instrument; int n; diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out index 8034a4e0db..a4eed3067f 100644 --- a/src/test/regress/expected/explain.out +++ b/src/test/regress/expected/explain.out @@ -103,8 +103,8 @@ $$, 'verbose', 'analyze', 'buffers', 'timing off', 'costs off', 'summary off'), Sort Key: (ROW("*VALUES*".column1)) + Buffers: shared hit=114 + Worker 0: actual rows=2 loops=1 + - Buffers: shared hit=114+ Sort Method: xxx + + Buffers: shared hit=114+ -> Values Scan on "*VALUES*" (actual rows=2 loops=1)+ Output: "*VALUES*".column1, ROW("*VALUES*".column1)+ Worker 0: actual rows=2 loops=1+ I think the "producing plan output for a worker" process is easier to reason about now, and while it changes TEXT format worker output order, the other changes in this patch are more drastic so this probably does not matter. I've also addressed the other feedback above, and reworded a couple of comments slightly. 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..b4108f82ec 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 						   Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
Re: Duplicate Workers entries in some EXPLAIN plans
Thanks! I'll fix the brace issues. Re: the other items: > + int num_workers = planstate->worker_instrument->num_workers; > + int n; > + worker_strs = (StringInfo *) palloc0(num_workers * > sizeof(StringInfo)); > + for (n = 0; n < num_workers; n++) { > > I think C99 would be better here. Also no parenthesis needed. Pardon my C illiteracy, but what am I doing that's not valid C99 here? > + /* Prepare worker general execution details */ > + if (es->analyze && es->verbose && planstate->worker_instrument) > + { > + WorkerInstrumentation *w = planstate->worker_instrument; > + int n; > + > + for (n = 0; n < w->num_workers; ++n) > > I think C99 would be better here. And here (if it's not the same problem)? > + ExplainCloseGroup("Workers", "Workers", false, es); > + // do we have any other cleanup to do? > > This comment does not really explain anything. Either remove > or rephrase. Also C style comments. Good catch, thanks--I had put this in to remind myself (and reviewers) about cleanup, but I don't think there's anything else to do, so I'll just drop it. > int indent; /* current indentation level */ > List *grouping_stack; /* format-specific grouping state */ > + boolprint_workers; /* whether current node has worker metadata */ > > Hmm.. commit introduced `hide_workers` in the struct. Having > both > names in the struct so far apart even seems a bit confusing and sloppy. Do you > think it would be possible to combine or rename? I noticed that. I was thinking about combining them, but "hide_workers" seems to be about "pretend there is no worker output even if there is" and "print_workers" is "keep track of whether or not there is worker output to print". Maybe I'll rename to "has_worker_output"? > +extern void ExplainOpenWorker(StringInfo worker_str, ExplainState *es); > +extern void ExplainCloseWorker(ExplainState *es); > +extern void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, > ExplainState *es); > > No need to expose those, is there? 
> I feel there should be static.

Good call, I'll update.
Re: Duplicate Workers entries in some EXPLAIN plans
TEXT format was tricky due to its inconsistencies, but I think I have something working reasonably well. I added a simple test for TEXT format output as well, using a similar approach as the JSON format test, and liberally regexp_replacing away any volatile output. I suppose in theory we could do this for YAML, too, but I think it's gross enough not to be worth it, especially given the high similarity of all the structured outputs. diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index d189b8d573..c3c27e13a1 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr, Oid sortOperator, Oid collation, bool nullsFirst); static void show_tablesample(TableSampleClause *tsc, PlanState *planstate, List *ancestors, ExplainState *es); -static void show_sort_info(SortState *sortstate, ExplainState *es); +static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es); static void show_hash_info(HashState *hashstate, ExplainState *es); static void show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es); @@ -289,6 +289,7 @@ NewExplainState(void) es->costs = true; /* Prepare output buffer. 
*/ es->str = makeStringInfo(); + es->root_str = es->str; return es; } @@ -1090,9 +1091,20 @@ ExplainNode(PlanState *planstate, List *ancestors, const char *partialmode = NULL; const char *operation = NULL; const char *custom_name = NULL; + StringInfo *worker_strs = NULL; int save_indent = es->indent; bool haschildren; + /* Prepare per-worker output */ + if (es->analyze && planstate->worker_instrument) { + int num_workers = planstate->worker_instrument->num_workers; + int n; + worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo)); + for (n = 0; n < num_workers; n++) { + worker_strs[n] = makeStringInfo(); + } + } + switch (nodeTag(plan)) { case T_Result: @@ -1357,6 +1369,58 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es); } + /* Prepare worker general execution details */ + if (es->analyze && es->verbose && planstate->worker_instrument) + { + WorkerInstrumentation *w = planstate->worker_instrument; + int n; + + for (n = 0; n < w->num_workers; ++n) + { + Instrumentation *instrument = >instrument[n]; + double nloops = instrument->nloops; + double startup_ms; + double total_ms; + double rows; + + if (nloops <= 0) +continue; + startup_ms = 1000.0 * instrument->startup / nloops; + total_ms = 1000.0 * instrument->total / nloops; + rows = instrument->ntuples / nloops; + + ExplainOpenWorker(worker_strs[n], es); + + if (es->format == EXPLAIN_FORMAT_TEXT) + { +if (es->timing) + appendStringInfo(es->str, + "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n", + startup_ms, total_ms, rows, nloops); +else + appendStringInfo(es->str, + "actual rows=%.0f loops=%.0f\n", + rows, nloops); +if (es->buffers) + show_buffer_usage(es, >bufusage); + } + else + { +if (es->timing) +{ + ExplainPropertyFloat("Actual Startup Time", "ms", + startup_ms, 3, es); + ExplainPropertyFloat("Actual Total Time", "ms", + total_ms, 3, es); +} +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); 
+ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } + } + + ExplainCloseWorker(es); + } + switch (nodeTag(plan)) { case T_SeqScan: @@ -1856,7 +1920,7 @@ ExplainNode(PlanState *planstate, List *ancestors, break; case T_Sort: show_sort_keys(castNode(SortState, planstate), ancestors, es); - show_sort_info(castNode(SortState, planstate), es); + show_sort_info(castNode(SortState, planstate), worker_strs, es); break; case T_MergeAppend: show_merge_append_keys(castNode(MergeAppendState, planstate), @@ -1885,74 +1949,30 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->buffers && planstate->instrument) show_buffer_usage(es, >instrument->bufusage); - /* Show worker detail */ - if (es->analyze && es->verbose && !es->hide_workers && - planstate->worker_instrument) + /* Prepare worker buffer usage */ + if (es->buffers && es->analyze && es->verbose && !es->hide_workers + && planstate->worker_instrument && es->format != EXPLAIN_FORMAT_TEXT) { WorkerInstrumentation *w = planstate->worker_instrument; - bool opened_group = false; int n; for (n = 0; n < w->num_workers; ++n) { Instrumentation *instrument = >instrument[n]; double nloops = instrument->nloops; - double startup_ms; - double total_ms; - double rows; if (nloops <= 0) continue; - startup_ms = 1000.0 * instrument->startup / nloops; - total_ms = 1000.0 * instrument->total / nloops; - rows = instrument->ntuples / nloops; - - if
Re: Duplicate Workers entries in some EXPLAIN plans
Sounds good, I'll try that format. Any idea how to test YAML? With the JSON format, I was able to rely on Postgres' own JSON-manipulating functions to strip or canonicalize fields that can vary across executions--I can't really do that with YAML. Or should I run EXPLAIN with COSTS OFF, TIMING OFF, SUMMARY OFF and assume that for simple queries the BUFFERS output (and other fields I can't turn off like Sort Space Used) *is* going to be stable?
Re: Duplicate Workers entries in some EXPLAIN plans
Thanks for the review! I looked at and rebased the patch on current master, ac5bdf6. I introduced a new test file because this bug is specifically about EXPLAIN output (as opposed to query execution or planning functionality), and it didn't seem like a test would fit in any of the other files. I focused on testing just the behavior around this specific bug (and fix). I think eventually we should probably test other more fundamental EXPLAIN features (and I'm happy to contribute to that) in that file, but that seems outside of the scope of this patch. Any thoughts on what we should do with text mode output (which is untouched right now)? The output Andres proposed above makes sense to me, but I'd like to get more input. diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index d189b8d573..96a8973884 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr, Oid sortOperator, Oid collation, bool nullsFirst); static void show_tablesample(TableSampleClause *tsc, PlanState *planstate, List *ancestors, ExplainState *es); -static void show_sort_info(SortState *sortstate, ExplainState *es); +static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es); static void show_hash_info(HashState *hashstate, ExplainState *es); static void show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es); @@ -289,6 +289,7 @@ NewExplainState(void) es->costs = true; /* Prepare output buffer. 
*/ es->str = makeStringInfo(); + es->root_str = es->str; return es; } @@ -1090,9 +1091,20 @@ ExplainNode(PlanState *planstate, List *ancestors, const char *partialmode = NULL; const char *operation = NULL; const char *custom_name = NULL; + StringInfo *worker_strs = NULL; int save_indent = es->indent; bool haschildren; + /* Prepare per-worker output */ + if (es->analyze && planstate->worker_instrument) { + int num_workers = planstate->worker_instrument->num_workers; + int n; + worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo)); + for (n = 0; n < num_workers; n++) { + worker_strs[n] = makeStringInfo(); + } + } + switch (nodeTag(plan)) { case T_Result: @@ -1357,6 +1369,64 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es); } + /* Prepare worker general execution details */ + if (es->analyze && es->verbose && planstate->worker_instrument) + { + WorkerInstrumentation *w = planstate->worker_instrument; + int n; + + for (n = 0; n < w->num_workers; ++n) + { + Instrumentation *instrument = >instrument[n]; + double nloops = instrument->nloops; + double startup_ms; + double total_ms; + double rows; + + if (nloops <= 0) +continue; + startup_ms = 1000.0 * instrument->startup / nloops; + total_ms = 1000.0 * instrument->total / nloops; + rows = instrument->ntuples / nloops; + + if (es->format == EXPLAIN_FORMAT_TEXT) + { +appendStringInfoSpaces(es->str, es->indent * 2); +appendStringInfo(es->str, "Worker %d: ", n); +if (es->timing) + appendStringInfo(es->str, + "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n", + startup_ms, total_ms, rows, nloops); +else + appendStringInfo(es->str, + "actual rows=%.0f loops=%.0f\n", + rows, nloops); +es->indent++; +if (es->buffers) + show_buffer_usage(es, >bufusage); +es->indent--; + } + else + { +ExplainOpenWorker(worker_strs[n], es); + +if (es->timing) +{ + ExplainPropertyFloat("Actual Startup Time", "ms", + startup_ms, 3, es); + ExplainPropertyFloat("Actual 
Total Time", "ms", + total_ms, 3, es); +} +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } + } + + if (es->format != EXPLAIN_FORMAT_TEXT) { + ExplainCloseWorker(es); + } + } + switch (nodeTag(plan)) { case T_SeqScan: @@ -1856,7 +1926,7 @@ ExplainNode(PlanState *planstate, List *ancestors, break; case T_Sort: show_sort_keys(castNode(SortState, planstate), ancestors, es); - show_sort_info(castNode(SortState, planstate), es); + show_sort_info(castNode(SortState, planstate), worker_strs, es); break; case T_MergeAppend: show_merge_append_keys(castNode(MergeAppendState, planstate), @@ -1885,74 +1955,30 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->buffers && planstate->instrument) show_buffer_usage(es, >instrument->bufusage); - /* Show worker detail */ - if (es->analyze && es->verbose && !es->hide_workers && - planstate->worker_instrument) + /* Prepare worker buffer usage */ + if (es->buffers && es->analyze && es->verbose && !es->hide_workers + && planstate->worker_instrument && es->format != EXPLAIN_FORMAT_TEXT) {
Re: Duplicate Workers entries in some EXPLAIN plans
Done! Thanks!
Re: Duplicate Workers entries in some EXPLAIN plans
I wanted to follow up on this patch since I received no feedback. What should my next steps be (besides rebasing, though I want to confirm there's interest before I do that)?
Re: Duplicate Workers entries in some EXPLAIN plans
On Thu, Oct 24, 2019 at 6:48 PM Andres Freund wrote:
> Unfortunately I think the fix isn't all that trivial, due to the way we
> output the per-worker information at the end of ExplainNode(), by just
> dumping things into a string. It seems to me that a step in the right
> direction would be for ExplainNode() to create
> planstate->worker_instrument StringInfos, which can be handed to
> routines like show_sort_info(), which would print the per-node
> information into that, rather than directly dumping into
> es->output. Most of the current "Show worker detail" would be done
> earlier in ExplainNode(), at the place where we currently display the
> "actual rows" bit.
>
> ISTM that should include removing the duplication of the contents of
> show_sort_info(), and probably also for the Gather, GatherMerge blocks
> (I've apparently skipped adding the JIT information to the latter, not
> sure if we ought to fix that in the stable branches).
>
> Any chance you want to take a stab at that?

It took me a while, but I did take a stab at it (thanks for your off-list help). Attached is my patch that changes the structured formats to merge sort worker output in with costs/timing/buffers worker output. I have not touched any other worker output yet, since it's not under a Workers group as far as I can tell (so it does not exhibit the problem I originally reported). I can try to make further changes here if the approach is deemed sound.
I also have not touched text output; above you had proposed

> Sort Method: external merge Disk: 4920kB
> Buffers: shared hit=682 read=10188, temp read=1415 written=2101
> Worker 0: actual time=130.058..130.324 rows=1324 loops=1
>   Sort Method: external merge Disk: 5880kB
>   Buffers: shared hit=337 read=3489, temp read=505 written=739
> Worker 1: actual time=130.273..130.512 rows=1297 loops=1
>   Buffers: shared hit=345 read=3507, temp read=505 written=744
>   Sort Method: external merge Disk: 5920kB

which makes sense to me, but I'd like to confirm this is the approach we want before I add it to the patch. This is my first C in close to a decade (and I was never much of a C programmer to begin with), so please be gentle. As Andres suggested off-list, I also changed the worker output to order fields that also occur in the parent node in the same way as the parent node.

I've also added a test for the patch, and because this is really an EXPLAIN issue rather than a query feature issue, I added src/test/regress/sql/explain.sql for the test. I added a couple of utility functions for munging json-formatted EXPLAIN plans into something we can repeatably verify in regression tests (the functions use json rather than jsonb to preserve field order). I have not added this for YAML or XML (even though they should behave the same way), since I'm not familiar with the functions to manipulate those data types in a similar way (if they exist). My hunch is that, given the similarity of the structured formats, just testing JSON is enough, but I can expand/adjust tests as necessary.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 62fb3434a3..8f898fc20c 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr, Oid sortOperator, Oid collation, bool nullsFirst); static void show_tablesample(TableSampleClause *tsc, PlanState *planstate, List *ancestors, ExplainState *es); -static void show_sort_info(SortState *sortstate, ExplainState *es); +static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es); static void show_hash_info(HashState *hashstate, ExplainState *es); static void show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es); @@ -290,6 +290,7 @@ NewExplainState(void) es->costs = true; /* Prepare output buffer. */ es->str = makeStringInfo(); + es->root_str = es->str; return es; } @@ -1073,9 +1074,20 @@ ExplainNode(PlanState *planstate, List *ancestors, const char *partialmode = NULL; const char *operation = NULL; const char *custom_name = NULL; + StringInfo *worker_strs = NULL; int save_indent = es->indent; bool haschildren; + /* Prepare per-worker output */ + if (es->analyze && planstate->worker_instrument) { + int num_workers = planstate->worker_instrument->num_workers; + int n; + worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo)); + for (n = 0; n < num_workers; n++) { + worker_strs[n] = makeStringInfo(); + } + } + switch (nodeTag(plan)) { case T_Result: @@ -1340,6 +1352,64 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es); } + /* Prepare worker general execution details */ + if (es->analyze && es->verbose && planstate->worker_instrument) + { + WorkerInstrumentation *w = planstate->worker_instrument; + int n; + + for (n = 0;
Re: JIT performance bug/regression & JIT EXPLAIN
On Fri, Nov 15, 2019 at 5:49 AM Robert Haas wrote:

> Personally, I don't care very much about backward-compatibility, or
> about how hard it is for tools to parse. I want it to be possible, but
> if it takes a little extra effort, so be it.

I think these are two separate issues. I agree on backward-compatibility (especially if we can embed a server version in structured EXPLAIN output to make it easier for tools to track format differences), but not caring how hard it is for tools to parse? What's the point of structured formats, then?

> My main concern is having
> the text output look good to human beings, because that is the primary
> format and they are the primary consumers.

Structured output is also for human beings, albeit indirectly. The fact that text is the primary format may be more a reflection of the difficulty of building and integrating EXPLAIN tools than of its inherent superiority (that said, I'll concede it's a concise and elegant format for what it does). What if psql supported an EXPLAINER, like it does EDITOR?

For what it's worth, after thinking about this a bit, I'd like to see structured EXPLAIN evolve into a more consistent format, even if it means breaking changes (and I do think a version specifier at the root of the plan would make this easier).
Re: JIT performance bug/regression & JIT EXPLAIN
On Mon, Oct 28, 2019 at 5:02 PM Andres Freund wrote:

> What I dislike about that is that it basically again is introducing

"again"? Am I missing some history here? I'd love to read up on this if there are mistakes to learn from.

> something that requires either pattern matching on key names (i.e. a key
> of '(.*) JIT' is one that has information about JIT, and the associated
> expresssion is in key $1), or knowing all the potential keys an
> expression could be in.

That still seems less awkward than having to handle a Filter field that's either a scalar or a group. Most current EXPLAIN options just add additional fields to the structured plan instead of modifying it, no? If that output is better enough, though, maybe we should just always make Filter a group and accept the breaking change? If tooling authors need to treat this case specially anyway, we might as well evolve the format.

> Another alternative would be to just remove the 'Output' line when a
> node doesn't project - it can't really carry meaning in those cases
> anyway?

¯\_(ツ)_/¯ For what it's worth, I certainly wouldn't miss it.
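To make the scalar-or-group objection concrete, here is a hedged sketch of what a consuming tool would have to do if "Filter" could be either a plain string or a group depending on options. The nested key names ("Expr", "JIT Expr") are hypothetical, not anything the server emits:

```python
import json

def filter_expression(node):
    """Return the filter expression whether "Filter" is a plain string
    or a group with the expression nested inside (hypothetical shape)."""
    f = node.get("Filter")
    if isinstance(f, dict):
        # Group shape: every consumer must know the nested key name.
        return f.get("Expr")
    # Scalar shape: the value is the expression itself.
    return f

scalar = json.loads('{"Filter": "(a > 1)"}')
grouped = json.loads('{"Filter": {"Expr": "(a > 1)", "JIT Expr": "evalexpr_0_2"}}')
print(filter_expression(scalar), filter_expression(grouped))
```

Every tool that reads plans would need a branch like this for every field whose shape can change, which is the awkwardness being objected to.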
Re: JIT performance bug/regression & JIT EXPLAIN
> But that's pretty crappy, because it means that the 'shape' of the output
> depends on the jit_details option.

Yeah, that would be hard to work with. What about adding it as a sibling group?

  "Filter": "(lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp without time zone)",
  "Filter JIT": {
    "Expr": "evalexpr_0_2",
    "Deform Scan": "deform_0_3",
    "Deform Outer": null,
    "Deform Inner": null
  }

Also not that pretty, but at least it's easier to work with (I also changed the dashes to spaces, since that's what the rest of EXPLAIN does as a matter of style).

> But the compat break due to that change is not small- perhaps we could instead
> mark that in another way?

We could add a "Projects" boolean key instead? Of course that's more awkward in text mode. Maybe the compat break is less of an issue in text mode, and we can treat this differently?

> I'm not sure that 'TRANS' is the best placeholder for the transition value
> here. Maybe $TRANS would be clearer?

+1, I think the `$` makes it clearer that this is not a literal expression.

> For HashJoin/Hash I've added 'Outer Hash Key' and 'Hash Key' for each key, but
> only in verbose mode.

That reads pretty well to me. What does the structured output look like?
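With the sibling-group shape, a consumer never has to branch on the type of "Filter": the expression stays a plain string, and the JIT details hang off a predictable sibling key. A quick sketch (assuming the key names proposed above):

```python
import json

node = json.loads("""
{
  "Filter": "(lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp without time zone)",
  "Filter JIT": {"Expr": "evalexpr_0_2", "Deform Scan": "deform_0_3",
                 "Deform Outer": null, "Deform Inner": null}
}
""")

# "Filter" is always a plain string, with or without JIT details...
expr = node["Filter"]
# ...and the JIT details, when present, live under a sibling key;
# tools that don't care about JIT can simply ignore it.
jit = node.get("Filter JIT", {})
print(expr)
print(jit.get("Expr"))
```

Existing consumers keep working unchanged, and JIT-aware consumers just look for the extra key.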
Duplicate Workers entries in some EXPLAIN plans
Hello,

I originally reported this in pgsql-bugs [0], but there wasn't much feedback there, so I'm moving the discussion here.

When using JSON, YAML, or XML-format EXPLAIN on a plan that uses a parallelized sort, the Sort nodes list two different entries for "Workers": one for the sort-related info and one for general worker info. This is what it looks like in JSON (some details elided):

{
  "Node Type": "Sort",
  ...
  "Workers": [
    {
      "Worker Number": 0,
      "Sort Method": "external merge",
      "Sort Space Used": 20128,
      "Sort Space Type": "Disk"
    },
    {
      "Worker Number": 1,
      "Sort Method": "external merge",
      "Sort Space Used": 20128,
      "Sort Space Type": "Disk"
    }
  ],
  ...
  "Workers": [
    {
      "Worker Number": 0,
      "Actual Startup Time": 309.726,
      "Actual Total Time": 310.179,
      "Actual Rows": 4128,
      "Actual Loops": 1,
      "Shared Hit Blocks": 2872,
      "Shared Read Blocks": 7584,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 490,
      "Temp Written Blocks": 2529
    },
    {
      "Worker Number": 1,
      "Actual Startup Time": 306.523,
      "Actual Total Time": 307.001,
      "Actual Rows": 4128,
      "Actual Loops": 1,
      "Shared Hit Blocks": 3356,
      "Shared Read Blocks": 7100,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 490,
      "Temp Written Blocks": 2529
    }
  ],
  "Plans": ...
}

This is technically valid JSON, but it's extremely difficult to work with: default JSON parsing in Ruby, Node, Python, Go, and even Postgres' own jsonb keeps only the last value for a duplicated key, so the sort information is discarded (other languages probably don't fare much better; these are just the ones I had on hand).

As Tom Lane pointed out in my pgsql-bugs thread, this has been reported before [1], and in that earlier thread, Andrew Dunstan suggested that perhaps the simplest solution is to just rename the sort-related Workers node.
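To illustrate the duplicate-key problem (Python shown here; Ruby, Node, and Go behave similarly): the standard json module keeps only the last value for a repeated object key, so the sort details vanish without any parse error:

```python
import json

# A pared-down version of the EXPLAIN output above, with the
# duplicated "Workers" key. This parses without complaint...
doc = """
{
  "Node Type": "Sort",
  "Workers": [{"Worker Number": 0, "Sort Method": "external merge"}],
  "Workers": [{"Worker Number": 0, "Actual Rows": 4128}]
}
"""
plan = json.loads(doc)

# ...but only the second "Workers" list survives; the sort info is gone.
print(plan["Workers"])
```

A tool author gets no error and no warning, just silently incomplete data, which is why so few consumers are likely handling this correctly.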
Tom expressed some concerns about a breaking change here, though I think the current behavior means vanishingly few users are parsing this data correctly today. Thoughts?

Thanks,
Maciek

[0]: https://www.postgresql.org/message-id/CADXhmgSr807j2Pc9aUjW2JOzOBe3FeYnQBe_f9U%2B-Mm4b1HRUw%40mail.gmail.com
[1]: https://www.postgresql.org/message-id/flat/41ee53a5-a36e-cc8f-1bee-63f6565bb...@dalibo.com