Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud wrote: > I originally suggested to make it clearer by having an enum GUC rather than a > boolean, say compute_queryid = [ none | core | external ], and if set to > external then a hook would be explicitely called. Right now, "none" and > "external" are binded with compute_queryid = off, and depends on whether an > extension is computing a queryid during post_parse_analyse_hook. I would just make it a Boolean and have a hook. The Boolean controls whether it gets computed at all, and the hook lets an external module override the way it gets computed. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Mar 17, 2021 at 12:24:44PM -0400, Bruce Momjian wrote: > On Wed, Mar 17, 2021 at 05:16:50PM +0100, Pavel Stehule wrote: > > > > > > st 17. 3. 2021 v 17:03 odesílatel Tom Lane napsal: > > > > Bruce Momjian writes: > > > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote: > > >> I still say that it's a serious mistake to sanctify a query ID > > calculation > > >> method that was designed only for pg_stat_statement's needs as the > > one > > >> true way to do it. But that's what exposing it in a core view would > > do. > > > > > OK, I am fine with creating a new method, and maybe having > > > pg_stat_statements use it. Is that the direction we should be going > > in? > > > > The point is that we've understood Query.queryId as something that > > different extensions might calculate differently for their own needs. > > In particular it's easy to imagine extensions that want an ID that is > > less fuzzy than what pg_stat_statements wants. We never had a plan for > > how two such extensions could co-exist, but at least it was possible > > to use one if you didn't use another. If this gets moved into core > > then there will basically be only one way that anyone can do it. > > > > Maybe what we need is a design for allowing more than one query ID. > > > > > > Theoretically there can be a hook for calculation of queryid, that can be by > > used extension. Default can be assigned with a method that is used by > > pg_stat_statements. > > Yes, that is what the code patch says it does. > > > I don't think it is possible to use more different query id for > > pg_stat_statements so this solution can be simple. > > Agreed. Actually, putting the query identifer computation in the core makes it way more tunable, even if it's conterintuitive. What it means is that you can now chose to use usual pgss' algorithm or a different one for log_line_prefix and pg_stat_activity.queryid, but also that you can now use pgss with a different query id algorithm. That's another thing that user were asking for a long time. I originally suggested to make it clearer by having an enum GUC rather than a boolean, say compute_queryid = [ none | core | external ], and if set to external then a hook would be explicitely called. Right now, "none" and "external" are binded with compute_queryid = off, and depends on whether an extension is computing a queryid during post_parse_analyse_hook. It could later be extended to suit other needs if we ever come to some agreement (for instance "legacy", "logical_replication_stable" or whatever better name we can find for something that doesn't depend on Oid).
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Mar 17, 2021 at 05:16:50PM +0100, Pavel Stehule wrote: > > > st 17. 3. 2021 v 17:03 odesílatel Tom Lane napsal: > > Bruce Momjian writes: > > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote: > >> I still say that it's a serious mistake to sanctify a query ID > calculation > >> method that was designed only for pg_stat_statement's needs as the one > >> true way to do it. But that's what exposing it in a core view would > do. > > > OK, I am fine with creating a new method, and maybe having > > pg_stat_statements use it. Is that the direction we should be going in? > > The point is that we've understood Query.queryId as something that > different extensions might calculate differently for their own needs. > In particular it's easy to imagine extensions that want an ID that is > less fuzzy than what pg_stat_statements wants. We never had a plan for > how two such extensions could co-exist, but at least it was possible > to use one if you didn't use another. If this gets moved into core > then there will basically be only one way that anyone can do it. > > Maybe what we need is a design for allowing more than one query ID. > > > Theoretically there can be a hook for calculation of queryid, that can be by > used extension. Default can be assigned with a method that is used by > pg_stat_statements. Yes, that is what the code patch says it does. > I don't think it is possible to use more different query id for > pg_stat_statements so this solution can be simple. Agreed. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
st 17. 3. 2021 v 17:03 odesílatel Tom Lane napsal: > Bruce Momjian writes: > > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote: > >> I still say that it's a serious mistake to sanctify a query ID > calculation > >> method that was designed only for pg_stat_statement's needs as the one > >> true way to do it. But that's what exposing it in a core view would do. > > > OK, I am fine with creating a new method, and maybe having > > pg_stat_statements use it. Is that the direction we should be going in? > > The point is that we've understood Query.queryId as something that > different extensions might calculate differently for their own needs. > In particular it's easy to imagine extensions that want an ID that is > less fuzzy than what pg_stat_statements wants. We never had a plan for > how two such extensions could co-exist, but at least it was possible > to use one if you didn't use another. If this gets moved into core > then there will basically be only one way that anyone can do it. > > Maybe what we need is a design for allowing more than one query ID. > Theoretically there can be a hook for calculation of queryid, that can be by used extension. Default can be assigned with a method that is used by pg_stat_statements. I don't think it is possible to use more different query id for pg_stat_statements so this solution can be simple. regards Pavel > > > I do think we need _some_ method in core if we are going to be exposing > > this value in pg_stat_activity and log_line_prefix. > > I'm basically objecting to the conclusion that we should do either > of those. There is no way around the fact that it will break every > user of Query.queryId other than pg_stat_statements, unless they > are okay with whatever definition pg_stat_statements is using (which > is a moving target BTW). > > regards, tom lane > > >
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Mar 17, 2021 at 12:01:38PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote: > >> I still say that it's a serious mistake to sanctify a query ID calculation > >> method that was designed only for pg_stat_statement's needs as the one > >> true way to do it. But that's what exposing it in a core view would do. > > > OK, I am fine with creating a new method, and maybe having > > pg_stat_statements use it. Is that the direction we should be going in? > > The point is that we've understood Query.queryId as something that > different extensions might calculate differently for their own needs. > In particular it's easy to imagine extensions that want an ID that is > less fuzzy than what pg_stat_statements wants. We never had a plan for > how two such extensions could co-exist, but at least it was possible > to use one if you didn't use another. If this gets moved into core > then there will basically be only one way that anyone can do it. Well, the patch docs say: Enables or disables in core query identifier computation.arameter. The extension requires a query --> identifier to be computed. Note that an external module can --> alternatively be used if the in core query identifier computation specification doesn't suit your need. In this case, in core computation must be disabled. The default is off. > Maybe what we need is a design for allowing more than one query ID. > > > I do think we need _some_ method in core if we are going to be exposing > > this value in pg_stat_activity and log_line_prefix. > > I'm basically objecting to the conclusion that we should do either > of those. There is no way around the fact that it will break every > user of Query.queryId other than pg_stat_statements, unless they > are okay with whatever definition pg_stat_statements is using (which > is a moving target BTW). I thought the above doc patch feature avoided this problem because an extension can override the build-in query id. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Bruce Momjian writes: > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote: >> I still say that it's a serious mistake to sanctify a query ID calculation >> method that was designed only for pg_stat_statement's needs as the one >> true way to do it. But that's what exposing it in a core view would do. > OK, I am fine with creating a new method, and maybe having > pg_stat_statements use it. Is that the direction we should be going in? The point is that we've understood Query.queryId as something that different extensions might calculate differently for their own needs. In particular it's easy to imagine extensions that want an ID that is less fuzzy than what pg_stat_statements wants. We never had a plan for how two such extensions could co-exist, but at least it was possible to use one if you didn't use another. If this gets moved into core then there will basically be only one way that anyone can do it. Maybe what we need is a design for allowing more than one query ID. > I do think we need _some_ method in core if we are going to be exposing > this value in pg_stat_activity and log_line_prefix. I'm basically objecting to the conclusion that we should do either of those. There is no way around the fact that it will break every user of Query.queryId other than pg_stat_statements, unless they are okay with whatever definition pg_stat_statements is using (which is a moving target BTW). regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote: > Bruce Momjian writes: > > We are reaching the two-year mark on this feature, that everyone seems > > to agree is needed. Is any committer going to work on this to get it > > into PG 14? Should I take it? > > I still say that it's a serious mistake to sanctify a query ID calculation > method that was designed only for pg_stat_statement's needs as the one > true way to do it. But that's what exposing it in a core view would do. OK, I am fine with creating a new method, and maybe having pg_stat_statements use it. Is that the direction we should be going in? I do think we need _some_ method in core if we are going to be exposing this value in pg_stat_activity and log_line_prefix. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Bruce Momjian writes: > We are reaching the two-year mark on this feature, that everyone seems > to agree is needed. Is any committer going to work on this to get it > into PG 14? Should I take it? I still say that it's a serious mistake to sanctify a query ID calculation method that was designed only for pg_stat_statement's needs as the one true way to do it. But that's what exposing it in a core view would do. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Sun, Mar 14, 2021 at 04:06:45PM +0800, Julien Rouhaud wrote: > Recent conflict, thanks to cfbot. v18 attached. We are reaching the two-year mark on this feature, that everyone seems to agree is needed. Is any committer going to work on this to get it into PG 14? Should I take it? I just read the thread and I didn't see any open issues. Are there any? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Recent conflict, thanks to cfbot. v18 attached. >From fa94eba58ee0ca098cfde0d17de72dc230ee471c Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Wed, 14 Oct 2020 02:11:37 +0800 Subject: [PATCH v18 1/3] Move pg_stat_statements query jumbling to core. A new compute_queryid GUC is also added, to control whether the queryid should be computed. It's now possible to disable core queryid computation and use pg_stat_statements with a different algorithm to compute the queryid by using third-party module. Author: Julien Rouhaud Reviewed-by: Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- .../pg_stat_statements/pg_stat_statements.c | 805 + .../pg_stat_statements.conf | 1 + doc/src/sgml/config.sgml | 18 + src/backend/parser/analyze.c | 14 +- src/backend/tcop/postgres.c | 6 +- src/backend/utils/misc/Makefile | 1 + src/backend/utils/misc/guc.c | 10 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/backend/utils/misc/queryjumble.c | 834 ++ src/include/parser/analyze.h | 4 +- src/include/utils/guc.h | 1 + src/include/utils/queryjumble.h | 58 ++ 12 files changed, 969 insertions(+), 784 deletions(-) create mode 100644 src/backend/utils/misc/queryjumble.c create mode 100644 src/include/utils/queryjumble.h diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 62cccbfa44..99bc7184cb 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -8,24 +8,9 @@ * a shared hashtable. (We track only as many distinct queries as will fit * in the designated amount of shared memory.) * - * As of Postgres 9.2, this module normalizes query entries. Normalization - * is a process whereby similar queries, typically differing only in their - * constants (though the exact rules are somewhat more subtle than that) are - * recognized as equivalent, and are tracked as a single entry. This is - * particularly useful for non-prepared queries. - * - * Normalization is implemented by fingerprinting queries, selectively - * serializing those fields of each query tree's nodes that are judged to be - * essential to the query. This is referred to as a query jumble. This is - * distinct from a regular serialization in that various extraneous - * information is ignored as irrelevant or not essential to the query, such - * as the collations of Vars and, most notably, the values of constants. - * - * This jumble is acquired at the end of parse analysis of each query, and - * a 64-bit hash of it is stored into the query's Query.queryId field. - * The server then copies this value around, making it available in plan - * tree(s) generated from the query. The executor can then use this value - * to blame query costs on the proper queryId. + * As of Postgres 9.2, this module normalizes query entries. As of Postgres + * 14, the normalization is done by the core, if compute_queryid is enabled, or + * by third-party modules if enabled. * * To facilitate presenting entries to users, we create "representative" query * strings in which constants are replaced with parameter symbols ($n), to @@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ #define IS_STICKY(c) ((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0) -#define JUMBLE_SIZE1024 /* query serialization buffer size */ - /* * Extension version number, for supporting older extension versions' objects */ @@ -235,40 +218,6 @@ typedef struct pgssSharedState pgssGlobalStats stats; /* global statistics for pgss */ } pgssSharedState; -/* - * Struct for tracking locations/lengths of constants during normalization - */ -typedef struct pgssLocationLen -{ - int location; /* start offset in query text */ - int length; /* length in bytes, or -1 to ignore */ -} pgssLocationLen; - -/* - * Working state for computing a query jumble and producing a normalized - * query string - */ -typedef struct pgssJumbleState -{ - /* Jumble of current query tree */ - unsigned char *jumble; - - /* Number of bytes used in jumble[] */ - Size jumble_len; - - /* Array of locations of constants that should be removed */ - pgssLocationLen *clocations; - - /* Allocated length of clocations array */ - int clocations_buf_size; - - /* Current number of valid entries in clocations array */ - int clocations_count; - - /* highest Param id we've seen, in order to start normalization correctly */ - int highest_extern_param_id; -} pgssJumbleState; - /* Local variables */ /* Current nesting depth of ExecutorRun+ProcessUtility calls */ @@ -342,7 +291,8 @@
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Jan 20, 2021 at 12:43 AM Julien Rouhaud wrote: > > On Fri, Jan 8, 2021 at 1:07 AM Julien Rouhaud wrote: > > > > v15 that fixes recent conflicts. > > Rebase only, thanks to the cfbot! V16 attached. Recent commit exposed that the explain_filter() doesn't filter negative sign. This can now be a problem with query identifiers in explain output as they use the whole bigint range. v17 attached fixes that, also rebased against current HEAD. From 8904b60ed3cc770f209ae5f97804b4d1ffe7d175 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Sun, 8 Mar 2020 14:34:44 +0100 Subject: [PATCH v17 3/3] Expose query identifier in verbose explain If a query identifier has been computed, either by enabling compute_queryid or using a third-party module, verbose explain will display it. Author: Julien Rouhaud Reviewed-by: Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- doc/src/sgml/config.sgml | 14 +++--- doc/src/sgml/ref/explain.sgml | 6 -- src/backend/commands/explain.c| 18 ++ src/test/regress/expected/explain.out | 11 ++- src/test/regress/sql/explain.sql | 5 - 5 files changed, 43 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 1763790473..2aeb146223 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7524,13 +7524,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; Enables or disables in core query identifier computation. A query identifier can be displayed in the pg_stat_activity -view, or emitted in the log if configured via the parameter. The extension also requires a query identifier -to be computed. Note that an external module can alternatively be used -if the in core query identifier computation specification doesn't suit -your need. In this case, in core computation must be disabled. The -default is off. +view, using EXPLAIN, or emitted in the log if +configured via the parameter. +The extension also requires a query +identifier to be computed. Note that an external module can +alternatively be used if the in core query identifier computation +specification doesn't suit your need. In this case, in core +computation must be disabled. The default is off. diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml index c4512332a0..105b069b41 100644 --- a/doc/src/sgml/ref/explain.sgml +++ b/doc/src/sgml/ref/explain.sgml @@ -136,8 +136,10 @@ ROLLBACK; the output column list for each node in the plan tree, schema-qualify table and function names, always label variables in expressions with their range table alias, and always print the name of each trigger for - which statistics are displayed. This parameter defaults to - FALSE. + which statistics are displayed. The query identifier will also be + displayed if one has been compute, see for more details. This parameter + defaults to FALSE. diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index afc45429ba..ac5879c1cf 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -24,6 +24,7 @@ #include "nodes/extensible.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "parser/analyze.h" #include "parser/parsetree.h" #include "rewrite/rewriteHandler.h" #include "storage/bufmgr.h" @@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, { ExplainState *es = NewExplainState(); TupOutputState *tstate; + JumbleState *jstate = NULL; + Query *query; List *rewritten; ListCell *lc; bool timing_set = false; @@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, /* if the summary was not set explicitly, set default value */ es->summary = (summary_set) ? es->summary : es->analyze; + query = castNode(Query, stmt->query); + if (compute_queryid) + jstate = JumbleQuery(query, pstate->p_sourcetext); + + if (post_parse_analyze_hook) + (*post_parse_analyze_hook) (pstate, query, jstate); + /* * Parse analysis was done already, but we still have to run the rule * rewriter. We do not do AcquireRewriteLocks: we assume the query either @@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, /* Create textual dump of plan tree */ ExplainPrintPlan(es, queryDesc); + if (es->verbose && plannedstmt->queryId != UINT64CONST(0)) + { + char buf[MAXINT8LEN+1]; + + pg_lltoa(plannedstmt->queryId, buf); + ExplainPropertyText("Query Identifier", buf, es); + } + /* Show buffer usage in planning */ if (bufusage) { diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out index dc7ab2ce8b..f45f069f30 100644 ---
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hello Yamada-san, On Wed, Jan 20, 2021 at 10:06 AM Tatsuro Yamada wrote: > > Hi Julien, > > > >> Rebase only, thanks to the cfbot! V16 attached. > > > > I tested the v16 patch on a0efda88a by using "make installcheck-parallel", > > and > > my result is the following. Attached file is regression.diffs. > > > Sorry, my environment was not suitable for the test when I sent my previous > email. > I fixed my environment and tested it again, and it was a success. See below: > > === > All 202 tests passed. > === No worries, thanks a lot for testing!
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi Julien, Rebase only, thanks to the cfbot! V16 attached. I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and my result is the following. Attached file is regression.diffs. Sorry, my environment was not suitable for the test when I sent my previous email. I fixed my environment and tested it again, and it was a success. See below: === All 202 tests passed. === Regards, Tatsuro Yamada
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi Julien, Rebase only, thanks to the cfbot! V16 attached. I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and my result is the following. Attached file is regression.diffs. 1 of 202 tests failed. The differences that caused some tests to fail can be viewed in the file "/home/postgres/PG140/src/test/regress/regression.diffs". A copy of the test summary that you see above is saved in the file "/home/postgres/PG140/src/test/regress/regression.out". src/test/regress/regression.diffs - diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out /home/postgres/PG140/src/test/regress/results/rules.out --- /home/postgres/PG140/src/test/regress/expected/rules.out2021-01-20 08:41:16.383175559 +0900 +++ /home/postgres/PG140/src/test/regress/results/rules.out 2021-01-20 08:43:46.589171774 +0900 @@ -1760,10 +1760,9 @@ s.state, s.backend_xid, s.backend_xmin, -s.queryid, s.query, s.backend_type - FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid) + FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid) ... Thanks, Tatsuro Yamada diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out /home/postgres/PG140/src/test/regress/results/rules.out --- /home/postgres/PG140/src/test/regress/expected/rules.out2021-01-20 08:41:16.383175559 +0900 +++ /home/postgres/PG140/src/test/regress/results/rules.out 2021-01-20 08:52:10.891159065 +0900 @@ -1760,10 +1760,9 @@ s.state, s.backend_xid, s.backend_xmin, -s.queryid, s.query, s.backend_type - FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid) + FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid) LEFT JOIN pg_database d ON ((s.datid = d.oid))) LEFT JOIN pg_authid u ON ((s.usesysid = u.oid))); pg_stat_all_indexes| SELECT c.oid AS relid, @@ -1875,7 +1874,7 @@ s.gss_auth AS gss_authenticated, s.gss_princ AS principal, s.gss_enc AS encrypted - FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid) + FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid) WHERE (s.client_port IS NOT NULL); pg_stat_progress_analyze| SELECT s.pid, s.datid, @@ -2032,7 +2031,7 @@ w.sync_priority, w.sync_state, w.reply_time - FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid) + FROM
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Jan 8, 2021 at 1:07 AM Julien Rouhaud wrote: > > v15 that fixes recent conflicts. Rebase only, thanks to the cfbot! V16 attached. From a0388c53d9755cfd706513f7f02a08b31a48aacb Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Mon, 18 Mar 2019 18:55:50 +0100 Subject: [PATCH v16 2/3] Expose queryid in pg_stat_activity and log_line_prefix Similarly to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Also add a %Q placeholder to include the queryid in the log_line_prefix, which will also only expose top level statements. Author: Julien Rouhaud Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- .../pg_stat_statements/pg_stat_statements.c | 112 +++--- doc/src/sgml/config.sgml | 29 +++-- doc/src/sgml/monitoring.sgml | 16 +++ src/backend/catalog/system_views.sql | 1 + src/backend/executor/execMain.c | 8 ++ src/backend/executor/execParallel.c | 14 ++- src/backend/executor/nodeGather.c | 3 +- src/backend/executor/nodeGatherMerge.c| 4 +- src/backend/parser/analyze.c | 5 + src/backend/postmaster/pgstat.c | 65 ++ src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/pgstatfuncs.c | 7 +- src/backend/utils/error/elog.c| 9 +- src/backend/utils/misc/postgresql.conf.sample | 1 + src/backend/utils/misc/queryjumble.c | 29 +++-- src/include/catalog/pg_proc.dat | 6 +- src/include/executor/execParallel.h | 3 +- src/include/pgstat.h | 5 + src/include/utils/queryjumble.h | 2 +- src/test/regress/expected/rules.out | 9 +- 20 files changed, 223 insertions(+), 110 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 3db4fa2f7a..ce166f417e 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -65,6 +65,7 @@ #include "tcop/utility.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/queryjumble.h" #include "utils/memutils.h" #include "utils/timestamp.h" @@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ #define IS_STICKY(c) ((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0) +/* + * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze + * ignores. + */ +#define PGSS_HANDLED_UTILITY(n) (!IsA(n, ExecuteStmt) && \ + !IsA(n, PrepareStmt) && \ + !IsA(n, DeallocateStmt)) + /* * Extension version number, for supporting older extension versions' objects */ @@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc); -static uint64 pgss_hash_string(const char *str, int len); static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, pgssStoreKind kind, @@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) return; /* - * Utility statements get queryId zero. We do this even in cases where - * the statement contains an optimizable statement for which a queryId - * could be derived (such as EXPLAIN or DECLARE CURSOR). For such cases, - * runtime control will first go through ProcessUtility and then the - * executor, and we don't want the executor hooks to do anything, since we - * are already measuring the statement's costs at the utility level. + * Clear queryId for prepared statements related utility, as those will + * inherit from the underlying statement's one (except DEALLOCATE which is + * entirely untracked). */ if (query->utilityStmt) { - query->queryId = UINT64CONST(0); + if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt)) + query->queryId = UINT64CONST(0); return; } @@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, DestReceiver *dest, QueryCompletion *qc) { Node *parsetree = pstmt->utilityStmt; + uint64 saved_queryId = pstmt->queryId; + + /* + * Force utility statements to get queryId zero. We do this even in cases + * where the statement contains an optimizable statement for which a + * queryId could be derived (such as EXPLAIN or DECLARE CURSOR). For such + * cases, runtime control will first go through ProcessUtility and then
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Sun, Oct 18, 2020 at 4:12 PM Julien Rouhaud wrote: > > On Sun, Oct 18, 2020 at 12:20 PM Tom Lane wrote: > > > > Alvaro Herrera writes: > > > Wait ... what? I've been thinking that this GUC is just to enable or > > > disable the computation of query ID, not to change the algorithm to do > > > so. Do we really need to allow different algorithms in different > > > sessions? > > > > We established that some time ago, no? > > I thought we established the need for allowing different algorithms, > but I assumed globally not per session. Anyway, allowing to enable or > disable compute_queryid per session would technically allow that, > assuming that you have another module loaded that computes a queryid > only if no-one was already computed. In that case pg_stat_statements > works as you would expect, you will get a new entry, with a duplicated > query text. > > With a bit more thinking, there's at least one use case where it's > interesting to disable pg_stat_statements: queries using temporary > tables. In that case you're guaranteed to generate an infinity of > different queryid. That doesn't really help since you're not > aggregating anything anymore, and it also makes pg_stat_statements > virtually unusable as once you have a workload that needs frequent > eviction, the overhead is so bad that you basically have to disable > pg_stat_statements. We could alternatively add a GUC to disable > queryid computation when one of the tables is a temporary table, but > that's yet one among many considerations that are probably best > answered with a custom implementation. > > I'm also attaching an updated patch with some attempt to improve the > documentation. I mention that in-core algorithm may not suits > everyone's needs, but we don't actually document what heuristics are. > Should we give more details on them and what are the most direct > consequences? v15 that fixes recent conflicts. From 952796fa1c65000948ed2a267f76676e354c989c Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Sun, 8 Mar 2020 14:34:44 +0100 Subject: [PATCH v15 3/3] Expose query identifier in verbose explain If a query identifier has been computed, either by enabling compute_queryid or using a third-party module, verbose explain will display it. Author: Julien Rouhaud Reviewed-by: Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- doc/src/sgml/config.sgml | 14 +++--- doc/src/sgml/ref/explain.sgml | 6 -- src/backend/commands/explain.c| 18 ++ src/test/regress/expected/explain.out | 9 + src/test/regress/sql/explain.sql | 3 +++ 5 files changed, 41 insertions(+), 9 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index d9c85a1f80..1d2f7ba393 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7487,13 +7487,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; Enables or disables in core query identifier computation. A query identifier can be displayed in the pg_stat_activity -view, or emitted in the log if configured via the parameter. The extension also requires a query identifier -to be computed. Note that an external module can alternatively be used -if the in core query identifier computation specification doesn't suit -your need. In this case, in core computation must be disabled. The -default is off. +view, using EXPLAIN, or emitted in the log if +configured via the parameter. +The extension also requires a query +identifier to be computed. Note that an external module can +alternatively be used if the in core query identifier computation +specification doesn't suit your need. In this case, in core +computation must be disabled. The default is off. diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml index c4512332a0..105b069b41 100644 --- a/doc/src/sgml/ref/explain.sgml +++ b/doc/src/sgml/ref/explain.sgml @@ -136,8 +136,10 @@ ROLLBACK; the output column list for each node in the plan tree, schema-qualify table and function names, always label variables in expressions with their range table alias, and always print the name of each trigger for - which statistics are displayed. This parameter defaults to - FALSE. + which statistics are displayed. The query identifier will also be + displayed if one has been compute, see for more details. This parameter + defaults to FALSE. diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 5d7eb3574c..2e1b4bf0bf 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -24,6 +24,7 @@ #include "nodes/extensible.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "parser/analyze.h"
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Sun, Oct 18, 2020 at 12:20 PM Tom Lane wrote: > > Alvaro Herrera writes: > > Wait ... what? I've been thinking that this GUC is just to enable or > > disable the computation of query ID, not to change the algorithm to do > > so. Do we really need to allow different algorithms in different > > sessions? > > We established that some time ago, no? I thought we established the need for allowing different algorithms, but I assumed globally not per session. Anyway, allowing to enable or disable compute_queryid per session would technically allow that, assuming that you have another module loaded that computes a queryid only if no-one was already computed. In that case pg_stat_statements works as you would expect, you will get a new entry, with a duplicated query text. With a bit more thinking, there's at least one use case where it's interesting to disable pg_stat_statements: queries using temporary tables. In that case you're guaranteed to generate an infinity of different queryid. That doesn't really help since you're not aggregating anything anymore, and it also makes pg_stat_statements virtually unusable as once you have a workload that needs frequent eviction, the overhead is so bad that you basically have to disable pg_stat_statements. We could alternatively add a GUC to disable queryid computation when one of the tables is a temporary table, but that's yet one among many considerations that are probably best answered with a custom implementation. I'm also attaching an updated patch with some attempt to improve the documentation. I mention that in-core algorithm may not suits everyone's needs, but we don't actually document what heuristics are. Should we give more details on them and what are the most direct consequences? From 4b1f4ed2bfc2917879a33cc1348157f2fffd0cb4 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Mon, 18 Mar 2019 18:55:50 +0100 Subject: [PATCH v14 2/3] Expose queryid in pg_stat_activity and log_line_prefix Similarly to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Also add a %Q placeholder to include the queryid in the log_line_prefix, which will also only expose top level statements. Author: Julien Rouhaud Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- .../pg_stat_statements/pg_stat_statements.c | 112 +++--- doc/src/sgml/config.sgml | 29 +++-- doc/src/sgml/monitoring.sgml | 16 +++ src/backend/catalog/system_views.sql | 1 + src/backend/executor/execMain.c | 8 ++ src/backend/executor/execParallel.c | 14 ++- src/backend/executor/nodeGather.c | 3 +- src/backend/executor/nodeGatherMerge.c| 4 +- src/backend/parser/analyze.c | 5 + src/backend/postmaster/pgstat.c | 65 ++ src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/pgstatfuncs.c | 7 +- src/backend/utils/error/elog.c| 10 +- src/backend/utils/misc/postgresql.conf.sample | 1 + src/backend/utils/misc/queryjumble.c | 29 +++-- src/include/catalog/pg_proc.dat | 6 +- src/include/executor/execParallel.h | 3 +- src/include/pgstat.h | 5 + src/include/utils/queryjumble.h | 2 +- src/test/regress/expected/rules.out | 9 +- 20 files changed, 224 insertions(+), 110 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index f352d0b615..2a69dbb88e 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -65,6 +65,7 @@ #include "tcop/utility.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/queryjumble.h" #include "utils/memutils.h" PG_MODULE_MAGIC; @@ -98,6 +99,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ #define IS_STICKY(c) ((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0) +/* + * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze + * ignores. + */ +#define PGSS_HANDLED_UTILITY(n) (!IsA(n, ExecuteStmt) && \ + !IsA(n, PrepareStmt) && \ + !IsA(n, DeallocateStmt)) + /* * Extension version number, for supporting older extension versions' objects */ @@ -295,7 +304,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc); -static uint64 pgss_hash_string(const char
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Alvaro Herrera writes: > Wait ... what? I've been thinking that this GUC is just to enable or > disable the computation of query ID, not to change the algorithm to do > so. Do we really need to allow different algorithms in different > sessions? We established that some time ago, no? regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On 2020-Oct-17, Tom Lane wrote: > Fair point, but if we allow several different values to be set in > different sessions, what ends up happening in pg_stat_statements? > > On the other hand, maybe that's just a matter for documentation. > "If the 'same' query is processed with two different queryID settings, > that will generally result in two separate table entries, because > the same ID hash is unlikely to be produced in both cases". Wait ... what? I've been thinking that this GUC is just to enable or disable the computation of query ID, not to change the algorithm to do so. Do we really need to allow different algorithms in different sessions?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Alvaro Herrera writes: > On 2020-Oct-17, Julien Rouhaud wrote: >> On Sat, Oct 17, 2020 at 12:23 AM Tom Lane wrote: >>> then there's a potential security issue if the GUC is USERSET level: >>> a user could hide her queries from pg_stat_statement by turning the >>> GUC off. So this line of thought suggests the GUC needs to be at >>> least SUSET, and maybe higher ... doesn't pg_stat_statement need it >>> to have the same value cluster-wide? > I don't think we should consider pg_stat_statement a bulletproof defense > for security problems. It is already lossy by design. Fair point, but if we allow several different values to be set in different sessions, what ends up happening in pg_stat_statements? On the other hand, maybe that's just a matter for documentation. "If the 'same' query is processed with two different queryID settings, that will generally result in two separate table entries, because the same ID hash is unlikely to be produced in both cases". There is certainly a use-case for wanting to be able to do this, if for example you'd like different query aggregation behavior for different applications. > I do think it'd be preferrable if we allowed it to be disabled at the > config file level only, not with SET (prevent users from hiding stuff); > but I think it is useful to allow users to enable it for specific > queries or for specific sessions only, while globally disabled. Indeed. I'm kind of talking myself into the idea that USERSET, or at most SUSET, is fine, so long as we document what happens when it has different values in different sessions. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On 2020-Oct-17, Julien Rouhaud wrote: > On Sat, Oct 17, 2020 at 12:23 AM Tom Lane wrote: > > then there's a potential security issue if the GUC is USERSET level: > > a user could hide her queries from pg_stat_statement by turning the > > GUC off. So this line of thought suggests the GUC needs to be at > > least SUSET, and maybe higher ... doesn't pg_stat_statement need it > > to have the same value cluster-wide? > > Well, I don't think that there's any guarantee that pg_stat_statemens > will display all activity that has been run, since there's a limited > amount of (userid, dbid, queryid) that can be stored, but I agree that > allowing random user to hide their activity isn't nice. Note that I > defined the GUC as SUSET, but maybe it should be SIGHUP? I don't think we should consider pg_stat_statement a bulletproof defense for security problems. It is already lossy by design. I do think it'd be preferrable if we allowed it to be disabled at the config file level only, not with SET (prevent users from hiding stuff); but I think it is useful to allow users to enable it for specific queries or for specific sessions only, while globally disabled. This might mean we need to mark it PGC_SIGHUP and then have the check hook disallow it from being changed under such-and-such conditions.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Oct 16, 2020 at 11:04 PM Bruce Momjian wrote: > > On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote: > > On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian wrote: > > > There is that, and log_line_prefix, which I can imaging being useful. > > > My point is that if the queryid is visible, there should be a reason it > > > defaults to show empty. > > > > I did some naive benchmarking. Using a custom pgbench script with this > > query: > > > > SELECT * > > FROM pg_class c > > JOIN pg_attribute a ON a.attrelid = c.oid > > ORDER BY 1 DESC > > LIMIT 1; > > > > I can see around 2% overhead (this query is reported with ~ 3ms > > latency average). Adding a few joins, overhead goes down to 1%. > > That number is too high to enable this by default. I suggest we either > improve the performance of this, or clearly document that you have to > enable the hash computation to see the pg_stat_activity and > log_line_prefix fields. I realize that I didn't update the documentation part to reflect the new GUC. I'll fix that and add more warnings about the requirements to have values displayed in pg_stat_acitivity and log_line_prefix.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Sat, Oct 17, 2020 at 12:23 AM Tom Lane wrote: > > Alvaro Herrera writes: > > In this case, I suppose using pg_stat_statement would require to have it > > enabled, and it'd just not collect anything if disabled. Yes, my idea was to be able to have pg_stat_statements enabled even if no queryid is computed without that being a problem, and the patch I sent should handle that properly, as pgss_store (and a few other places) check for a non-zero queryid before doing any work. Also, we can't have pg_stat_statements have any specific behavior based on the new GUC, as there could alternatively be another module that handles the queryid generation. > Alternatively, pg_stat_statement might be able to force it on > (applying a non-overridable PGC_INTERNAL-level setting) on load? > Not sure if that'd be desirable or not. > > If the behavior of pg_stat_statement is to do nothing when it > sees a query without the ID calculated (which I guess it'd have to) Yes that's what it does. > then there's a potential security issue if the GUC is USERSET level: > a user could hide her queries from pg_stat_statement by turning the > GUC off. So this line of thought suggests the GUC needs to be at > least SUSET, and maybe higher ... doesn't pg_stat_statement need it > to have the same value cluster-wide? Well, I don't think that there's any guarantee that pg_stat_statemens will display all activity that has been run, since there's a limited amount of (userid, dbid, queryid) that can be stored, but I agree that allowing random user to hide their activity isn't nice. Note that I defined the GUC as SUSET, but maybe it should be SIGHUP?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Oct 16, 2020 at 01:03:55PM -0300, Álvaro Herrera wrote: > On 2020-Oct-16, Bruce Momjian wrote: > > > On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote: > > > > I did some naive benchmarking. Using a custom pgbench script with this > > > query: > > > > I can see around 2% overhead (this query is reported with ~ 3ms > > > latency average). Adding a few joins, overhead goes down to 1%. > > > > That number is too high to enable this by default. I suggest we either > > improve the performance of this, or clearly document that you have to > > enable the hash computation to see the pg_stat_activity and > > log_line_prefix fields. > > Agreed. This is similar to how we used to deal with query strings: an > optional feature, disabled by default (cf. commit b13c9686d084). > > In this case, I suppose using pg_stat_statement would require to have it > enabled, and it'd just not collect anything if disabled. Similarly, the > field would show NULL in pg_stat_activity or an empty string in > log_line_prefix/CSV logs. Yes, and at each use point, e.g., pg_stat_activity, log_line_prefix, we have to remind people how to turn hash compuation on. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Alvaro Herrera writes: > In this case, I suppose using pg_stat_statement would require to have it > enabled, and it'd just not collect anything if disabled. Alternatively, pg_stat_statement might be able to force it on (applying a non-overridable PGC_INTERNAL-level setting) on load? Not sure if that'd be desirable or not. If the behavior of pg_stat_statement is to do nothing when it sees a query without the ID calculated (which I guess it'd have to) then there's a potential security issue if the GUC is USERSET level: a user could hide her queries from pg_stat_statement by turning the GUC off. So this line of thought suggests the GUC needs to be at least SUSET, and maybe higher ... doesn't pg_stat_statement need it to have the same value cluster-wide? regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On 2020-Oct-16, Bruce Momjian wrote: > On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote: > > I did some naive benchmarking. Using a custom pgbench script with this > > query: > > I can see around 2% overhead (this query is reported with ~ 3ms > > latency average). Adding a few joins, overhead goes down to 1%. > > That number is too high to enable this by default. I suggest we either > improve the performance of this, or clearly document that you have to > enable the hash computation to see the pg_stat_activity and > log_line_prefix fields. Agreed. This is similar to how we used to deal with query strings: an optional feature, disabled by default (cf. commit b13c9686d084). In this case, I suppose using pg_stat_statement would require to have it enabled, and it'd just not collect anything if disabled. Similarly, the field would show NULL in pg_stat_activity or an empty string in log_line_prefix/CSV logs. So users that want it can easily have it, and users that don't are not paying the price. For maximum user-friendliness, pg_stat_statement could be loaded and shmem-initialized even when query ID computation is turned off, and you'd be able to enable query ID computation with just SIGHUP; so you don't have to restart the server in order to enable statement tracking. (I suppose we would forbid users from disabling query ID with SET, though.)
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote: > On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian wrote: > > There is that, and log_line_prefix, which I can imaging being useful. > > My point is that if the queryid is visible, there should be a reason it > > defaults to show empty. > > I did some naive benchmarking. Using a custom pgbench script with this query: > > SELECT * > FROM pg_class c > JOIN pg_attribute a ON a.attrelid = c.oid > ORDER BY 1 DESC > LIMIT 1; > > I can see around 2% overhead (this query is reported with ~ 3ms > latency average). Adding a few joins, overhead goes down to 1%. That number is too high to enable this by default. I suggest we either improve the performance of this, or clearly document that you have to enable the hash computation to see the pg_stat_activity and log_line_prefix fields. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian wrote: > > On Wed, Oct 14, 2020 at 10:34:31PM +0800, Julien Rouhaud wrote: > > On Wed, Oct 14, 2020 at 10:31 PM Tom Lane wrote: > > > > > > Bruce Momjian writes: > > > > Is there a measureable overhead when this is turned on, since it is off > > > > by default and maybe should default to on. > > > > > > I don't believe that "default to on" can even be in the discussion. > > > There is no in-core feature that would use this by default. > > > > If the 2nd patch is applied there would be pg_stat_activity.queryid > > column, but I doubt that's a strong enough argument. > > There is that, and log_line_prefix, which I can imaging being useful. > My point is that if the queryid is visible, there should be a reason it > defaults to show empty. I did some naive benchmarking. Using a custom pgbench script with this query: SELECT * FROM pg_class c JOIN pg_attribute a ON a.attrelid = c.oid ORDER BY 1 DESC LIMIT 1; I can see around 2% overhead (this query is reported with ~ 3ms latency average). Adding a few joins, overhead goes down to 1%. Adding on top of the join some WHERE and GROUP BY conditions, overhead goes down to 0.2% (at that point average latency is around 9ms on my laptop). So having this enabled by default is probably only going to hit people with OLTP-style workload with a majority of queries running in a couple of milliseconds or less, which isn't that uncommon.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Oct 14, 2020 at 10:34:31PM +0800, Julien Rouhaud wrote: > On Wed, Oct 14, 2020 at 10:31 PM Tom Lane wrote: > > > > Bruce Momjian writes: > > > Is there a measureable overhead when this is turned on, since it is off > > > by default and maybe should default to on. > > > > I don't believe that "default to on" can even be in the discussion. > > There is no in-core feature that would use this by default. > > If the 2nd patch is applied there would be pg_stat_activity.queryid > column, but I doubt that's a strong enough argument. There is that, and log_line_prefix, which I can imaging being useful. My point is that if the queryid is visible, there should be a reason it defaults to show empty. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Oct 14, 2020 at 10:31 PM Tom Lane wrote: > > Bruce Momjian writes: > > Is there a measureable overhead when this is turned on, since it is off > > by default and maybe should default to on. > > I don't believe that "default to on" can even be in the discussion. > There is no in-core feature that would use this by default. If the 2nd patch is applied there would be pg_stat_activity.queryid column, but I doubt that's a strong enough argument.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Bruce Momjian writes: > Is there a measureable overhead when this is turned on, since it is off > by default and maybe should default to on. I don't believe that "default to on" can even be in the discussion. There is no in-core feature that would use this by default. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Oct 14, 2020 at 10:21:24PM +0800, Julien Rouhaud wrote: > On Wed, Oct 14, 2020 at 10:09 PM Bruce Momjian wrote: > > > > OK, I came up with the hash idea only to address one of your concerns > > > > about mismatched hashes for algorithm improvements/changes. Seems we > > > > might as well just document that cross-version hashes are different. > > > > > > Ok, so I tried to implement what seems to be the consensus. First > > > attached patch moves the current pgss queryid computation in core, > > > with a new compute_queryid GUC (on/off). One thing I don't really > > > > Why would someone turn compute_queryid off? Overhead? > > Yes, or possibly to use a different algorithm. Is there a measureable overhead when this is turned on, since it is off by default and maybe should default to on. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Oct 14, 2020 at 10:09 PM Bruce Momjian wrote: > > On Wed, Oct 14, 2020 at 05:43:33PM +0800, Julien Rouhaud wrote: > > On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian wrote: > > > > > > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > > > > Bruce Momjian writes: > > > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > > > > >> Yeah, I agree --- a version number is the wrong way to think about > > > > >> this. > > > > > > > > > The version number was to invalidate _all_ query hashes if the > > > > > algorithm is slightly modified, rather than invalidating just some of > > > > > them, which could lead to confusion. > > > > > > > > Color me skeptical as to the use-case for that. From users' > > > > standpoints, > > > > the hash is mainly going to change when we change the set of parse node > > > > fields that get hashed. Which is going to happen at every major release > > > > and no (or at least epsilon) minor releases. So I do not see a point in > > > > tracking an algorithm version number as such. Seems like make-work. > > > > > > OK, I came up with the hash idea only to address one of your concerns > > > about mismatched hashes for algorithm improvements/changes. Seems we > > > might as well just document that cross-version hashes are different. > > > > Ok, so I tried to implement what seems to be the consensus. First > > attached patch moves the current pgss queryid computation in core, > > with a new compute_queryid GUC (on/off). One thing I don't really > > Why would someone turn compute_queryid off? Overhead? Yes, or possibly to use a different algorithm.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Oct 14, 2020 at 05:43:33PM +0800, Julien Rouhaud wrote: > On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian wrote: > > > > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > > > Bruce Momjian writes: > > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > > > >> Yeah, I agree --- a version number is the wrong way to think about > > > >> this. > > > > > > > The version number was to invalidate _all_ query hashes if the > > > > algorithm is slightly modified, rather than invalidating just some of > > > > them, which could lead to confusion. > > > > > > Color me skeptical as to the use-case for that. From users' standpoints, > > > the hash is mainly going to change when we change the set of parse node > > > fields that get hashed. Which is going to happen at every major release > > > and no (or at least epsilon) minor releases. So I do not see a point in > > > tracking an algorithm version number as such. Seems like make-work. > > > > OK, I came up with the hash idea only to address one of your concerns > > about mismatched hashes for algorithm improvements/changes. Seems we > > might as well just document that cross-version hashes are different. > > Ok, so I tried to implement what seems to be the consensus. First > attached patch moves the current pgss queryid computation in core, > with a new compute_queryid GUC (on/off). One thing I don't really Why would someone turn compute_queryid off? Overhead? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Oct 14, 2020 at 5:43 PM Julien Rouhaud wrote: > > On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian wrote: > > > > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > > > Bruce Momjian writes: > > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > > > >> Yeah, I agree --- a version number is the wrong way to think about > > > >> this. > > > > > > > The version number was to invalidate _all_ query hashes if the > > > > algorithm is slightly modified, rather than invalidating just some of > > > > them, which could lead to confusion. > > > > > > Color me skeptical as to the use-case for that. From users' standpoints, > > > the hash is mainly going to change when we change the set of parse node > > > fields that get hashed. Which is going to happen at every major release > > > and no (or at least epsilon) minor releases. So I do not see a point in > > > tracking an algorithm version number as such. Seems like make-work. > > > > OK, I came up with the hash idea only to address one of your concerns > > about mismatched hashes for algorithm improvements/changes. Seems we > > might as well just document that cross-version hashes are different. > > Ok, so I tried to implement what seems to be the consensus. First > attached patch moves the current pgss queryid computation in core, > with a new compute_queryid GUC (on/off). One thing I don't really > like about this patch is that the JumbleState that pgss needs in order > to normalize the query string (the constants location and such) has to > be done by the core while computing the queryid and provided to pgss > in post_parse_analyse hook. That isn't ideal as it looks very > specific to pgss needs. On the other hand it means that you can now > use pgss with custom queryid heuristics by disabling compute_queryid > and having your module doing only that in post_parse_analyse_hook. > You'll however need to be careful to configure > shared_preload_libraries such that your custom module's > post_parse_analyse_hook is called first, so pgss' one can be called > with the needed JumbleState. Note that if no JumbleState is provided > pgss will store non normalized queries, but will otherwise behave as > intended. > > The 2nd patch is the rebased original queryid exposure patch. No big > changes, except that it now handles utility statements queryid > generated during post_parse_analysis, same as regular queries. This > should simplify the work needed for custom queryid third party > modules. > > The 3rd patch changes explain (verbose) to display the queryid if one > has been generated, whether by core or a third-party module. For > instance: > > rjuju=# set compute_queryid = on; > SET > rjuju=# explain (verbose) select relname from pg_class; > QUERY PLAN > --- > Seq Scan on pg_catalog.pg_class (cost=0.00..16.90 rows=390 width=64) >Output: relname > Query Identifier: -5494854185674379299 > (3 rows) There was a possibly uninitialized var issue in the previous patches (thanks cfbot), v13 fixes that. From ee578a9128898d69ff50bf5db59bebf55ed13250 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Mon, 18 Mar 2019 18:55:50 +0100 Subject: [PATCH v13 2/3] Expose queryid in pg_stat_activity and log_line_prefix Similarly to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Also add a %Q placeholder to include the queryid in the log_line_prefix, which will also only expose top level statements. Author: Julien Rouhaud Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- .../pg_stat_statements/pg_stat_statements.c | 112 +++--- doc/src/sgml/config.sgml | 9 +- doc/src/sgml/monitoring.sgml | 15 +++ src/backend/catalog/system_views.sql | 1 + src/backend/executor/execMain.c | 8 ++ src/backend/executor/execParallel.c | 14 ++- src/backend/executor/nodeGather.c | 3 +- src/backend/executor/nodeGatherMerge.c| 4 +- src/backend/parser/analyze.c | 5 + src/backend/postmaster/pgstat.c | 65 ++ src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/pgstatfuncs.c | 7 +- src/backend/utils/error/elog.c| 10 +- src/backend/utils/misc/postgresql.conf.sample | 1 + src/backend/utils/misc/queryjumble.c | 29 +++-- src/include/catalog/pg_proc.dat | 6 +- src/include/executor/execParallel.h | 3 +- src/include/pgstat.h | 5 + src/include/utils/queryjumble.h | 2 +-
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian wrote: > > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > > >> Yeah, I agree --- a version number is the wrong way to think about this. > > > > > The version number was to invalidate _all_ query hashes if the > > > algorithm is slightly modified, rather than invalidating just some of > > > them, which could lead to confusion. > > > > Color me skeptical as to the use-case for that. From users' standpoints, > > the hash is mainly going to change when we change the set of parse node > > fields that get hashed. Which is going to happen at every major release > > and no (or at least epsilon) minor releases. So I do not see a point in > > tracking an algorithm version number as such. Seems like make-work. > > OK, I came up with the hash idea only to address one of your concerns > about mismatched hashes for algorithm improvements/changes. Seems we > might as well just document that cross-version hashes are different. Ok, so I tried to implement what seems to be the consensus. First attached patch moves the current pgss queryid computation in core, with a new compute_queryid GUC (on/off). One thing I don't really like about this patch is that the JumbleState that pgss needs in order to normalize the query string (the constants location and such) has to be done by the core while computing the queryid and provided to pgss in post_parse_analyse hook. That isn't ideal as it looks very specific to pgss needs. On the other hand it means that you can now use pgss with custom queryid heuristics by disabling compute_queryid and having your module doing only that in post_parse_analyse_hook. You'll however need to be careful to configure shared_preload_libraries such that your custom module's post_parse_analyse_hook is called first, so pgss' one can be called with the needed JumbleState. Note that if no JumbleState is provided pgss will store non normalized queries, but will otherwise behave as intended. The 2nd patch is the rebased original queryid exposure patch. No big changes, except that it now handles utility statements queryid generated during post_parse_analysis, same as regular queries. This should simplify the work needed for custom queryid third party modules. The 3rd patch changes explain (verbose) to display the queryid if one has been generated, whether by core or a third-party module. For instance: rjuju=# set compute_queryid = on; SET rjuju=# explain (verbose) select relname from pg_class; QUERY PLAN --- Seq Scan on pg_catalog.pg_class (cost=0.00..16.90 rows=390 width=64) Output: relname Query Identifier: -5494854185674379299 (3 rows) From 4a81289f02e9bfb796317b32d492eb949c9ed4a1 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Sun, 8 Mar 2020 14:34:44 +0100 Subject: [PATCH v12 3/3] Expose query identifier in verbose explain If a query identifier has been computed, either by enabling compute_queryid or using a third-party module, verbose explain will display it. Author: Julien Rouhaud Reviewed-by: Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- src/backend/commands/explain.c| 18 ++ src/test/regress/expected/explain.out | 9 + src/test/regress/sql/explain.sql | 3 +++ 3 files changed, 30 insertions(+) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index c8e292adfa..bb08c18a3a 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -24,6 +24,7 @@ #include "nodes/extensible.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "parser/analyze.h" #include "parser/parsetree.h" #include "rewrite/rewriteHandler.h" #include "storage/bufmgr.h" @@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, { ExplainState *es = NewExplainState(); TupOutputState *tstate; + JumbleState *jstate; + Query *query; List *rewritten; ListCell *lc; bool timing_set = false; @@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, /* if the summary was not set explicitly, set default value */ es->summary = (summary_set) ? es->summary : es->analyze; + query = castNode(Query, stmt->query); + if (compute_queryid) + jstate = JumbleQuery(query, pstate->p_sourcetext); + + if (post_parse_analyze_hook) + (*post_parse_analyze_hook) (pstate, query, jstate); + /* * Parse analysis was done already, but we still have to run the rule * rewriter. We do not do AcquireRewriteLocks: we assume the query either @@ -582,6 +592,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, /* Create textual dump of plan tree */ ExplainPrintPlan(es, queryDesc); + if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > >> Yeah, I agree --- a version number is the wrong way to think about this. > > > The version number was to invalidate _all_ query hashes if the > > algorithm is slightly modified, rather than invalidating just some of > > them, which could lead to confusion. > > Color me skeptical as to the use-case for that. From users' standpoints, > the hash is mainly going to change when we change the set of parse node > fields that get hashed. Which is going to happen at every major release > and no (or at least epsilon) minor releases. So I do not see a point in > tracking an algorithm version number as such. Seems like make-work. OK, I came up with the hash idea only to address one of your concerns about mismatched hashes for algorithm improvements/changes. Seems we might as well just document that cross-version hashes are different. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Bruce Momjian writes: > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: >> Yeah, I agree --- a version number is the wrong way to think about this. > The version number was to invalidate _all_ query hashes if the > algorithm is slightly modified, rather than invalidating just some of > them, which could lead to confusion. Color me skeptical as to the use-case for that. From users' standpoints, the hash is mainly going to change when we change the set of parse node fields that get hashed. Which is going to happen at every major release and no (or at least epsilon) minor releases. So I do not see a point in tracking an algorithm version number as such. Seems like make-work. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > Robert Haas writes: > > I don't really understand how a version number helps. It's not like > > there is going to be a v2 that is in all ways better than v1. If there > > are different algorithms here, they are going to be customized for > > different needs. > > Yeah, I agree --- a version number is the wrong way to think about this. > It's gonna be more like algorithm foo versus algorithm bar versus > algorithm baz, where each one is better for a specific set of use-cases. > Julien already noted the point about hashing object OIDs versus object > names; one can easily imagine disagreeing with pg_stat_statement's > choices about ignoring values of constants; other properties of statements > might be irrelevant for some use-cases; and so on. The version number was to invalidate _all_ query hashes if the algorithm is slightly modified, rather than invalidating just some of them, which could lead to confusion. The idea of selectable hash algorithms is nice if people feel there is sufficient need for that. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Robert Haas writes: > I don't really understand how a version number helps. It's not like > there is going to be a v2 that is in all ways better than v1. If there > are different algorithms here, they are going to be customized for > different needs. Yeah, I agree --- a version number is the wrong way to think about this. It's gonna be more like algorithm foo versus algorithm bar versus algorithm baz, where each one is better for a specific set of use-cases. Julien already noted the point about hashing object OIDs versus object names; one can easily imagine disagreeing with pg_stat_statement's choices about ignoring values of constants; other properties of statements might be irrelevant for some use-cases; and so on. I'm okay with moving pg_stat_statement's existing algorithm into core as long as there's a way for extensions to override it. With proper design, that would allow extensions that do override it to coexist with pg_stat_statements (thereby redefining the latter's idea of which statements are "the same"), which is something that doesn't really work nicely today. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Oct 12, 2020 at 10:14 AM Bruce Momjian wrote: > On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote: > > But there are many people that aren't happy with the current hashing > > approach. If we're going to move the computation in core, shouldn't > > we listen to their complaints and let them pay some probably quite > > high overhead to base the hash on name and/or fully qualified name > > rather than OID? > > For instance people using logical replication to upgrade to a newer > > version may want to easily compare query performance on the new > > version, or people with multi-tenant databases may want to ignore the > > schema name to keep a low number of different queryid. > > Well, we have to consider how complex the user interface has to be to > allow more flexibility. We don't need to allow every option a user will > want. > > With a version number, we have the ability to improve the algorithm or > add customization, but for the first use, we are probably better off > keeping it simple. I thought your earlier idea of allowing this to be controlled by a GUC was good. There could be a default method built into core, matching what pg_stat_statements does, so you could select no hashing or that method no matter what. Then extensions could provide other methods which could be selected via the GUC. I don't really understand how a version number helps. It's not like there is going to be a v2 that is in all ways better than v1. If there are different algorithms here, they are going to be customized for different needs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote: > But there are many people that aren't happy with the current hashing > approach. If we're going to move the computation in core, shouldn't > we listen to their complaints and let them pay some probably quite > high overhead to base the hash on name and/or fully qualified name > rather than OID? > For instance people using logical replication to upgrade to a newer > version may want to easily compare query performance on the new > version, or people with multi-tenant databases may want to ignore the > schema name to keep a low number of different queryid. Well, we have to consider how complex the user interface has to be to allow more flexibility. We don't need to allow every option a user will want. With a version number, we have the ability to improve the algorithm or add customization, but for the first use, we are probably better off keeping it simple. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Oct 7, 2020 at 9:53 AM Bruce Momjian wrote: > > On Wed, Oct 7, 2020 at 10:42:49AM +0900, Michael Paquier wrote: > > On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote: > > > I propose moving the pg_stat_statements queryid hash code into the > > > server (with a version number), and also adding a postgresql.conf > > > variable that lets you control how detailed the queryid hash is > > > computed. This addresses the problem of people wanting different hash > > > methods. > > > > In terms of making this part expendable in the future, there could be > > a point in having an enum here, but are we sure that we will have a > > need for that in the future? What I get from this discussion is that > > we want a unique source of truth that users can consume, and that the > > only source of truth proposed is the PGSS hashing. We may change the > > way we compute the query ID in the future, for example if it gets > > expanded to some utility statements, etc. But that would be > > controlled by the version number in the hash, not the GUC itself. > > Oh, if that is true, then I agree let's just go with the version number. But there are many people that aren't happy with the current hashing approach. If we're going to move the computation in core, shouldn't we listen to their complaints and let them pay some probably quite high overhead to base the hash on name and/or fully qualified name rather than OID? For instance people using logical replication to upgrade to a newer version may want to easily compare query performance on the new version, or people with multi-tenant databases may want to ignore the schema name to keep a low number of different queryid. It would probably still be possible to have a custom queryid hashing by disabling the core one and computing a new one in a custom extension, but that seems a bit hackish. Jumping back on Tom's point that there are judgment calls on what is examined or not, after a quick look I see at least two possible problems of ignored clauses: - WITH TIES clause - OVERRIDING clause I personally think that they shouldn't be ignored, but I don't know if they were only forgotten or ignored on purpose.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Oct 7, 2020 at 10:42:49AM +0900, Michael Paquier wrote: > On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote: > > I propose moving the pg_stat_statements queryid hash code into the > > server (with a version number), and also adding a postgresql.conf > > variable that lets you control how detailed the queryid hash is > > computed. This addresses the problem of people wanting different hash > > methods. > > In terms of making this part expendable in the future, there could be > a point in having an enum here, but are we sure that we will have a > need for that in the future? What I get from this discussion is that > we want a unique source of truth that users can consume, and that the > only source of truth proposed is the PGSS hashing. We may change the > way we compute the query ID in the future, for example if it gets > expanded to some utility statements, etc. But that would be > controlled by the version number in the hash, not the GUC itself. Oh, if that is true, then I agree let's just go with the version number. > > When computing a hash, the queryid detail level and version number will > > be mixed into the hash, so only a hash that used a similar query and > > identical queryid detail level would match. > > Yes, having a version number directly dependent on the hashing sounds > like a good compromise to me. Good, much simpler. I think there is enough demand for a queryid that I would like to get this moving forward. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote: > I propose moving the pg_stat_statements queryid hash code into the > server (with a version number), and also adding a postgresql.conf > variable that lets you control how detailed the queryid hash is > computed. This addresses the problem of people wanting different hash > methods. In terms of making this part expendable in the future, there could be a point in having an enum here, but are we sure that we will have a need for that in the future? What I get from this discussion is that we want a unique source of truth that users can consume, and that the only source of truth proposed is the PGSS hashing. We may change the way we compute the query ID in the future, for example if it gets expanded to some utility statements, etc. But that would be controlled by the version number in the hash, not the GUC itself. > When computing a hash, the queryid detail level and version number will > be mixed into the hash, so only a hash that used a similar query and > identical queryid detail level would match. Yes, having a version number directly dependent on the hashing sounds like a good compromise to me. -- Michael signature.asc Description: PGP signature
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Oct 6, 2020 at 02:34:58PM +0900, Michael Paquier wrote: > On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote: > > On Tue, Oct 6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote: > >> Maybe we could add a new hook for only queryid computation, and add a > >> GUC to let people choose between no queryid computed, core computation > >> (current pg_stat_statement) and 3rd party plugin? > > > > That all seems very complicated. If we go in that direction, I suggest > > we just give up getting any of this into core. > > A GUC would have at least the advantage to make the computation > consistent for any system willing to consume it, with the option to > not pay any potential performance impact, though I have to admit that > just moving the query ID computation of PGSS into core may not be the > best option as a query ID of 0 means the same thing for a utility, for > an initialization, and for a backend running a query with an unknown > value, but that could be worked out. > > FWIW, I think that adding the system ID in the hash is too > restrictive, as it could be interesting for users to do stat > comparisons across multiple systems running the same major version. > It would be better to not give any strong guarantee that the query ID > computed will remain consistent across major versions so as it is > possible to keep improving it. Also, if nothing has been done that > changes the hashing computation, I see little benefit in forcing a > breakage by adding something like PG_MAJORVERSION_NUM or such in the > hash computation. I thought some more about this. First, I think having the queryid hash code in the server, without requiring pg_stat_statements, is a requirement --- I think too many people will want to use this feature independent of pg_stat_statements. Second, I understand the desire to have different hash computation methods, depending on what level of detail/matching you want. I propose moving the pg_stat_statements queryid hash code into the server (with a version number), and also adding a postgressql.conf variable that lets you control how detailed the queryid hash is computed. This addresses the problem of people wanting different hash methods. When computing a hash, the queryid detail level and version number will be mixed into the hash, so only a hash that used a similar query and identical queryid detail level would match. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote: > On Tue, Oct 6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote: >> Maybe we could add a new hook for only queryid computation, and add a >> GUC to let people choose between no queryid computed, core computation >> (current pg_stat_statement) and 3rd party plugin? > > That all seems very complicated. If we go in that direction, I suggest > we just give up getting any of this into core. A GUC would have at least the advantage to make the computation consistent for any system willing to consume it, with the option to not pay any potential performance impact, though I have to admit that just moving the query ID computation of PGSS into core may not be the best option as a query ID of 0 means the same thing for a utility, for an initialization, and for a backend running a query with an unknown value, but that could be worked out. FWIW, I think that adding the system ID in the hash is too restrictive, as it could be interesting for users to do stat comparisons across multiple systems running the same major version. It would be better to not give any strong guarantee that the query ID computed will remain consistent across major versions so as it is possible to keep improving it. Also, if nothing has been done that changes the hashing computation, I see little benefit in forcing a breakage by adding something like PG_MAJORVERSION_NUM or such in the hash computation. -- Michael signature.asc Description: PGP signature
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Oct 05, 2020 at 05:24:06PM -0400, Bruce Momjian wrote: > (Also, did we decide _not_ to make the pg_stat_statements queryid > always a positive value?) This specific point has been discussed a couple of years ago, please see cff440d and its related thread: https://www.postgresql.org/message-id/ca+tgmobg_kp4cbkfmsznuaam1gww6hhrnizc0kjrmooeynz...@mail.gmail.com -- Michael signature.asc Description: PGP signature
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Oct 6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote: > > I do think the queryid has to display independent of pg_stat_statements, > > because I can see people using queryid for log file and pg_stat_activity > > comparisons. I also think the ability to have queryid accessible is an > > important feature outside of pg_stat_statements, so I do think we need a > > way to move this idea forward. > > For the record, for now any extension can compute a queryid and there > are at least 2 other published extensions that already do that, one of > them having different semantics on how to compute the queryid. I'm > not sure that we'll ever get a consensus on those semantics due to > performance tradeoff, so removing the ability to let people put their > own code for that doesn't seem like the best way forward. > > Maybe we could add a new hook for only queryid computation, and add a > GUC to let people choose between no queryid computed, core computation > (current pg_stat_statement) and 3rd party plugin? That all seems very complicated. If we go in that direction, I suggest we just give up getting any of this into core. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Oct 6, 2020 at 10:18 AM Bruce Momjian wrote: > > On Mon, Oct 5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote: > > On 2020-Oct-05, Tom Lane wrote: > > > > > FWIW, I think this proposal is a mess. I was willing to hold my nose > > > and have a queryId field in the internal Query struct without any solid > > > consensus about what its semantics are and which extensions get to use it. > > > Exposing it to end users seems like a bridge too far, though. In > > > particular, I'm afraid that that will cause people to expect it to have > > > consistent values across PG versions, or even just across architectures > > > within one version. > > > > I wonder if it would help to purposefully change the computation so that > > it is not -- for instance, hash the system_identifier as initial value. > > Then users would be forced to accept that it'll change as soon as it > > migrates to another server or is upgraded to a new major version. > > That seems like a good idea, but it would prevent cross-cluster > same-major-version comparisons, which seems like a negative. Perhaps we > should add the major version into the hash to handle this. Ideally, > let's just put a queryid-hash-version into to the hash, so if we change > the computation, we just update the hash version and nothing matches > anymore. > > I do think the queryid has to display independent of pg_stat_statements, > because I can see people using queryid for log file and pg_stat_activity > comparisons. I also think the ability to have queryid accessible is an > important feature outside of pg_stat_statements, so I do think we need a > way to move this idea forward. For the record, for now any extension can compute a queryid and there are at least 2 other published extensions that already do that, one of them having different semantics on how to compute the queryid. I'm not sure that we'll ever get a consensus on those semantics due to performance tradeoff, so removing the ability to let people put their own code for that doesn't seem like the best way forward. Maybe we could add a new hook for only queryid computation, and add a GUC to let people choose between no queryid computed, core computation (current pg_stat_statement) and 3rd party plugin?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Oct 5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote: > On 2020-Oct-05, Tom Lane wrote: > > > FWIW, I think this proposal is a mess. I was willing to hold my nose > > and have a queryId field in the internal Query struct without any solid > > consensus about what its semantics are and which extensions get to use it. > > Exposing it to end users seems like a bridge too far, though. In > > particular, I'm afraid that that will cause people to expect it to have > > consistent values across PG versions, or even just across architectures > > within one version. > > I wonder if it would help to purposefully change the computation so that > it is not -- for instance, hash the system_identifier as initial value. > Then users would be forced to accept that it'll change as soon as it > migrates to another server or is upgraded to a new major version. That seems like a good idea, but it would prevent cross-cluster same-major-version comparisons, which seems like a negative. Perhaps we should add the major version into the hash to handle this. Ideally, let's just put a queryid-hash-version into to the hash, so if we change the computation, we just update the hash version and nothing matches anymore. I do think the queryid has to display independent of pg_stat_statements, because I can see people using queryid for log file and pg_stat_activity comparisons. I also think the ability to have queryid accessible is an important feature outside of pg_stat_statements, so I do think we need a way to move this idea forward. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On 2020-Oct-05, Tom Lane wrote: > FWIW, I think this proposal is a mess. I was willing to hold my nose > and have a queryId field in the internal Query struct without any solid > consensus about what its semantics are and which extensions get to use it. > Exposing it to end users seems like a bridge too far, though. In > particular, I'm afraid that that will cause people to expect it to have > consistent values across PG versions, or even just across architectures > within one version. I wonder if it would help to purposefully change the computation so that it is not -- for instance, hash the system_identifier as initial value. Then users would be forced to accept that it'll change as soon as it migrates to another server or is upgraded to a new major version.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Bruce Momjian writes: > I would like to apply this patch (I know it has been in the commitfest > since July 2019), but I have some questions about the user API. Does it > make sense to have a column in pg_stat_actvity and an option in > log_line_prefix that will be empty unless pg_stat_statements is > installed? Is there no clean way to move the query hash computation out > of pg_stat_statements and into the main code so the query id is always > visible? (Also, did we decide _not_ to make the pg_stat_statements > queryid always a positive value?) FWIW, I think this proposal is a mess. I was willing to hold my nose and have a queryId field in the internal Query struct without any solid consensus about what its semantics are and which extensions get to use it. Exposing it to end users seems like a bridge too far, though. In particular, I'm afraid that that will cause people to expect it to have consistent values across PG versions, or even just across architectures within one version. The larger picture here is that there's lots of room to doubt whether pg_stat_statements' decisions about what to ignore or include in the ID will be satisfactory to everybody. If that were not so, we'd just move the computation into core. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Aug 19, 2020 at 04:19:30PM +0200, Julien Rouhaud wrote: > Similarly to other fields in pg_stat_activity, only the queryid from the top > level statements are exposed, and if the backends status isn't active then the > queryid from the last executed statements is displayed. > > Also add a %Q placeholder to include the queryid in the log_line_prefix, which > will also only expose top level statements. I would like to apply this patch (I know it has been in the commitfest since July 2019), but I have some questions about the user API. Does it make sense to have a column in pg_stat_actvity and an option in log_line_prefix that will be empty unless pg_stat_statements is installed? Is there no clean way to move the query hash computation out of pg_stat_statements and into the main code so the query id is always visible? (Also, did we decide _not_ to make the pg_stat_statements queryid always a positive value?) Also, in the doc patch: By default, query identifiers are not computed, so this field will always be null, unless an additional module that compute query identifiers, such as , is configured. why are you saying "such as"? Isn't pg_stat_statements the only way to see the queryid? This command allowed the queryid to be displayed in pg_stat_activity: ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements'; -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Jul 28, 2020 at 10:55:04AM +0200, Julien Rouhaud wrote: > On Tue, Jul 28, 2020 at 10:07 AM torikoshia > wrote: > > > > Thanks for updating! > > I tested the patch setting log_statement = 'all', but %Q in > > log_line_prefix > > was always 0 even when pg_stat_statements.queryid and > > pg_stat_activity.queryid are not 0. > > > > Is this an intentional behavior? > > > >[...] > > Thanks for the tests! That's indeed an expected behavior (although I > wasn't aware of it), which isn't documented in this patch (I'll fix > it). The reason for that is that log_statements is done right after > parsing the query: > > /* > * Do basic parsing of the query or queries (this should be safe even if > * we are in aborted transaction state!) > */ > parsetree_list = pg_parse_query(query_string); > > /* Log immediately if dictated by log_statement */ > if (check_log_statement(parsetree_list)) > { > ereport(LOG, > (errmsg("statement: %s", query_string), > errhidestmt(true), > errdetail_execute(parsetree_list))); > was_logged = true; > } > > As parse analysis is not yet done, no queryid can be computed at that > point, so we always print 0. That's a limitation that can't be > removed without changing the semantics of log_statements, so we'll > probably have to live with it. > > > And here is a minor typo. > > optionnally -> optionally > > > > > > > 753 + /* query identifier, optionnally computed using > > > post_parse_analyze_hook */ > > Thanks, I fixed it locally! Recent conflict, rebased v11 attached. >From 473d038a1b447d4569709c3a499fc7356af76452 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Mon, 18 Mar 2019 18:55:50 +0100 Subject: [PATCH v11] Expose queryid in pg_stat_activity and log_line_prefix Similarly to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Also add a %Q placeholder to include the queryid in the log_line_prefix, which will also only expose top level statements. Author: Julien Rouhaud Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- .../pg_stat_statements/pg_stat_statements.c | 179 -- doc/src/sgml/config.sgml | 9 +- doc/src/sgml/monitoring.sgml | 15 ++ src/backend/catalog/system_views.sql | 1 + src/backend/executor/execMain.c | 8 + src/backend/executor/execParallel.c | 14 +- src/backend/executor/nodeGather.c | 3 +- src/backend/executor/nodeGatherMerge.c| 4 +- src/backend/parser/analyze.c | 5 + src/backend/postmaster/pgstat.c | 65 +++ src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/pgstatfuncs.c | 7 +- src/backend/utils/error/elog.c| 10 +- src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 6 +- src/include/executor/execParallel.h | 3 +- src/include/pgstat.h | 5 + src/test/regress/expected/rules.out | 9 +- 18 files changed, 270 insertions(+), 79 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 6b91c62c31..486d07f9de 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -115,6 +115,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define JUMBLE_SIZE1024 /* query serialization buffer size */ +/* + * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze + * ignores. + */ +#define PGSS_HANDLED_UTILITY(n) (!IsA(n, ExecuteStmt) && \ + !IsA(n, PrepareStmt) && \ + !IsA(n, DeallocateStmt)) + /* * Extension version number, for supporting older extension versions' objects */ @@ -345,7 +353,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc); -static uint64 pgss_hash_string(const char *str, int len); +static const char *pgss_clean_querytext(const char *query, int *location, int *len); +static uint64 pgss_compute_utility_queryid(const char *query, int query_len); static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, pgssStoreKind kind, @@ -845,16 +854,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query) return; /* - * Utility statements get queryId zero. We do this even in cases where - * the statement
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Jul 28, 2020 at 10:07 AM torikoshia wrote: > > Thanks for updating! > I tested the patch setting log_statement = 'all', but %Q in > log_line_prefix > was always 0 even when pg_stat_statements.queryid and > pg_stat_activity.queryid are not 0. > > Is this an intentional behavior? > >[...] Thanks for the tests! That's indeed an expected behavior (although I wasn't aware of it), which isn't documented in this patch (I'll fix it). The reason for that is that log_statements is done right after parsing the query: /* * Do basic parsing of the query or queries (this should be safe even if * we are in aborted transaction state!) */ parsetree_list = pg_parse_query(query_string); /* Log immediately if dictated by log_statement */ if (check_log_statement(parsetree_list)) { ereport(LOG, (errmsg("statement: %s", query_string), errhidestmt(true), errdetail_execute(parsetree_list))); was_logged = true; } As parse analysis is not yet done, no queryid can be computed at that point, so we always print 0. That's a limitation that can't be removed without changing the semantics of log_statements, so we'll probably have to live with it. > And here is a minor typo. > optionnally -> optionally > > > > 753 + /* query identifier, optionnally computed using > > post_parse_analyze_hook */ Thanks, I fixed it locally!
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On 2020-07-14 20:24, Julien Rouhaud wrote: On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote: Hi, v9 patch fails to apply to HEAD, could you check and rebase it? Thanks for the notice, v10 attached! And here are minor typos. 79 +* utility statements. Note that we don't compute a queryId for prepared 80 +* statemets related utility, as those will inherit from the underlying 81 +* statements's one (except DEALLOCATE which is entirely untracked). statemets -> statements statements's -> statements' or statement's? Thanks! I went with "statement's". Thanks for updating! I tested the patch setting log_statement = 'all', but %Q in log_line_prefix was always 0 even when pg_stat_statements.queryid and pg_stat_activity.queryid are not 0. Is this an intentional behavior? ``` $ initdb --no-locale -D data $ edit postgresql.conf shared_preload_libraries = 'pg_stat_statements' logging_collector = on log_line_prefix = '%m [%p] queryid:%Q ' log_statement = 'all' $ pg_ctl start -D data $ psql =# CREATE EXTENSION pg_stat_statements; =# CREATE TABLE t1 (i int); =# INSERT INTO t1 VALUES (0),(1); =# SELECT queryid, query FROM pg_stat_activity; -- query ids are all 0 on the log $ view log 2020-07-28 15:57:58.475 EDT [4480] queryid:0 LOG: statement: CREATE TABLE t1 (i int); 2020-07-28 15:58:13.730 EDT [4480] queryid:0 LOG: statement: INSERT INTO t1 VALUES (0),(1); 2020-07-28 15:59:28.389 EDT [4480] queryid:0 LOG: statement: SELECT * FROM t1; -- on pg_stat_activity and pgss, query ids are not 0 $ psql =# SELECT queryid, query FROM pg_stat_activity WHERE query LIKE '%t1%'; queryid|query --+-- 1109063694563750779 | SELECT * FROM t1; -2582225123719476948 | SELECT queryid, query FROM pg_stat_activity WHERE query LIKE '%t1%'; (2 rows) =# SELECT queryid, query FROM pg_stat_statements WHERE query LIKE '%t1%'; queryid| query --+- -5028988130796701553 | CREATE TABLE t1 (i int) 1109063694563750779 | SELECT * FROM t1 2726469050076420724 | INSERT INTO t1 VALUES ($1),($2) ``` And here is a minor typo. optionnally -> optionally 753 + /* query identifier, optionnally computed using post_parse_analyze_hook */ Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote: > Hi, > > v9 patch fails to apply to HEAD, could you check and rebase it? Thanks for the notice, v10 attached! > And here are minor typos. > > 79 +* utility statements. Note that we don't compute a queryId > for prepared > 80 +* statemets related utility, as those will inherit from the > underlying > 81 +* statements's one (except DEALLOCATE which is entirely > untracked). > > statemets -> statements > statements's -> statements' or statement's? Thanks! I went with "statement's". >From 8c651ee05c8a5e55966ad1646f48e83941d3776a Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Mon, 18 Mar 2019 18:55:50 +0100 Subject: [PATCH v10] Expose queryid in pg_stat_activity and log_line_prefix Similarly to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Also add a %Q placeholder to include the queryid in the log_line_prefix, which will also only expose top level statements. Author: Julien Rouhaud Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- .../pg_stat_statements/pg_stat_statements.c | 179 -- doc/src/sgml/config.sgml | 9 +- doc/src/sgml/monitoring.sgml | 15 ++ src/backend/catalog/system_views.sql | 1 + src/backend/executor/execMain.c | 8 + src/backend/executor/execParallel.c | 14 +- src/backend/executor/nodeGather.c | 3 +- src/backend/executor/nodeGatherMerge.c| 4 +- src/backend/parser/analyze.c | 5 + src/backend/postmaster/pgstat.c | 65 +++ src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/pgstatfuncs.c | 7 +- src/backend/utils/error/elog.c| 10 +- src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 6 +- src/include/executor/execParallel.h | 3 +- src/include/pgstat.h | 5 + src/test/regress/expected/rules.out | 9 +- 18 files changed, 270 insertions(+), 79 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 14cad19afb..a51c207b49 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -115,6 +115,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define JUMBLE_SIZE1024 /* query serialization buffer size */ +/* + * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze + * ignores. + */ +#define PGSS_HANDLED_UTILITY(n) (!IsA(n, ExecuteStmt) && \ + !IsA(n, PrepareStmt) && \ + !IsA(n, DeallocateStmt)) + /* * Extension version number, for supporting older extension versions' objects */ @@ -345,7 +353,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc); -static uint64 pgss_hash_string(const char *str, int len); +static const char *pgss_clean_querytext(const char *query, int *location, int *len); +static uint64 pgss_compute_utility_queryid(const char *query, int query_len); static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, pgssStoreKind kind, @@ -845,16 +854,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query) return; /* - * Utility statements get queryId zero. We do this even in cases where - * the statement contains an optimizable statement for which a queryId - * could be derived (such as EXPLAIN or DECLARE CURSOR). For such cases, - * runtime control will first go through ProcessUtility and then the - * executor, and we don't want the executor hooks to do anything, since we - * are already measuring the statement's costs at the utility level. + * We compute a queryId now so that it can get exported in out + * PgBackendStatus. pgss_ProcessUtility will later discard it to prevents + * double counting of optimizable statements that are directly contained in + * utility statements. Note that we don't compute a queryId for prepared + * statements related utility, as those will inherit from the underlying + * statement's one (except DEALLOCATE which is entirely untracked). */ if (query->utilityStmt) { - query->queryId = UINT64CONST(0); + if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt) + && pstate->p_sourcetext) + { + const char *querytext = pstate->p_sourcetext; + int query_location =
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi, v9 patch fails to apply to HEAD, could you check and rebase it? And here are minor typos. 79 +* utility statements. Note that we don't compute a queryId for prepared 80 +* statemets related utility, as those will inherit from the underlying 81 +* statements's one (except DEALLOCATE which is entirely untracked). statemets -> statements statements's -> statements' or statement's? Regards, -- Atsushi Torikoshi On Wed, Apr 8, 2020 at 11:38 PM Julien Rouhaud wrote: > On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada > wrote: > > > > Hi Julien, > > > > On 2020/04/02 22:25, Julien Rouhaud wrote: > > > New conflict, rebased v9 attached. > > > > I tested the patch on the head (c7654f6a3) and > > the result was fine. See below: > > > > $ make installcheck-world > > = > > All 1 tests passed. > > = > > Thanks Yamada-san! Unfortunately this patch still didn't attract any > committer, so I moved it to the next commitfest. > > >
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada wrote: > > Hi Julien, > > On 2020/04/02 22:25, Julien Rouhaud wrote: > > New conflict, rebased v9 attached. > > I tested the patch on the head (c7654f6a3) and > the result was fine. See below: > > $ make installcheck-world > = > All 1 tests passed. > = Thanks Yamada-san! Unfortunately this patch still didn't attract any committer, so I moved it to the next commitfest.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi Julien, On 2020/04/02 22:25, Julien Rouhaud wrote: New conflict, rebased v9 attached. I tested the patch on the head (c7654f6a3) and the result was fine. See below: $ make installcheck-world = All 1 tests passed. = Regards, Tatsuro Yamada
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
New conflict, rebased v9 attached. >From 26b98194d8add282158c65f6ac46c721ba80f498 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Mon, 18 Mar 2019 18:55:50 +0100 Subject: [PATCH v9 1/2] Expose queryid in pg_stat_activity and log_line_prefix Similarly to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Also add a %Q placeholder to include the queryid in the log_line_prefix, which will also only expose top level statements. Author: Julien Rouhaud Reviewed-by: Evgeny Efimkin, Michael Paquier Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- .../pg_stat_statements/pg_stat_statements.c | 179 -- doc/src/sgml/config.sgml | 9 +- doc/src/sgml/monitoring.sgml | 12 ++ src/backend/catalog/system_views.sql | 1 + src/backend/executor/execMain.c | 8 + src/backend/executor/execParallel.c | 14 +- src/backend/executor/nodeGather.c | 3 +- src/backend/executor/nodeGatherMerge.c| 4 +- src/backend/parser/analyze.c | 5 + src/backend/postmaster/pgstat.c | 65 +++ src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/pgstatfuncs.c | 7 +- src/backend/utils/error/elog.c| 10 +- src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 6 +- src/include/executor/execParallel.h | 3 +- src/include/pgstat.h | 5 + src/test/regress/expected/rules.out | 9 +- 18 files changed, 267 insertions(+), 79 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 942922b01f..4073361f4c 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -115,6 +115,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define JUMBLE_SIZE1024 /* query serialization buffer size */ +/* + * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze + * ignores. + */ +#define PGSS_HANDLED_UTILITY(n) (!IsA(n, ExecuteStmt) && \ + !IsA(n, PrepareStmt) && \ + !IsA(n, DeallocateStmt)) + /* * Extension version number, for supporting older extension versions' objects */ @@ -342,7 +350,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc); -static uint64 pgss_hash_string(const char *str, int len); +static const char *pgss_clean_querytext(const char *query, int *location, int *len); +static uint64 pgss_compute_utility_queryid(const char *query, int query_len); static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, pgssStoreKind kind, @@ -841,16 +850,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query) return; /* - * Utility statements get queryId zero. We do this even in cases where - * the statement contains an optimizable statement for which a queryId - * could be derived (such as EXPLAIN or DECLARE CURSOR). For such cases, - * runtime control will first go through ProcessUtility and then the - * executor, and we don't want the executor hooks to do anything, since we - * are already measuring the statement's costs at the utility level. + * We compute a queryId now so that it can get exported in out + * PgBackendStatus. pgss_ProcessUtility will later discard it to prevents + * double counting of optimizable statements that are directly contained in + * utility statements. Note that we don't compute a queryId for prepared + * statemets related utility, as those will inherit from the underlying + * statements's one (except DEALLOCATE which is entirely untracked). */ if (query->utilityStmt) { - query->queryId = UINT64CONST(0); + if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt) + && pstate->p_sourcetext) + { + const char *querytext = pstate->p_sourcetext; + int query_location = query->stmt_location; + int query_len = query->stmt_len; + + /* + * Confine our attention to the relevant part of the string, if the + * query is a portion of a multi-statement source string. + */ + querytext = pgss_clean_querytext(pstate->p_sourcetext, + _location, + _len); + + query->queryId = pgss_compute_utility_queryid(querytext, query_len); + } + else + query->queryId = UINT64CONST(0); return; } @@ -1098,6 +1125,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, DestReceiver *dest, QueryCompletion *qc) {
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Sat, Mar 14, 2020 at 06:53:51PM +0100, Julien Rouhaud wrote: > On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote: > > > > cfbot reports a failure since 2f9661311b (command completion tag > > change), so here's a rebased v6, no change otherwise. > > > Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks > again to cfbot, rebased v7 attached. Bit repetita. >From 87be2c545e32c0c08a410949d5c5d383a4162af3 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Mon, 18 Mar 2019 18:55:50 +0100 Subject: [PATCH v8 1/2] Expose queryid in pg_stat_activity and log_line_prefix Similarly to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Also add a %Q placeholder to include the queryid in the log_line_prefix, which will also only expose top level statements. Author: Julien Rouhaud Reviewed-by: Evgeny Efimkin, Michael Paquier Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- .../pg_stat_statements/pg_stat_statements.c | 179 -- doc/src/sgml/config.sgml | 9 +- doc/src/sgml/monitoring.sgml | 12 ++ src/backend/catalog/system_views.sql | 1 + src/backend/executor/execMain.c | 8 + src/backend/executor/execParallel.c | 14 +- src/backend/executor/nodeGather.c | 3 +- src/backend/executor/nodeGatherMerge.c| 4 +- src/backend/parser/analyze.c | 5 + src/backend/postmaster/pgstat.c | 65 +++ src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/pgstatfuncs.c | 7 +- src/backend/utils/error/elog.c| 10 +- src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 6 +- src/include/executor/execParallel.h | 3 +- src/include/pgstat.h | 5 + src/test/regress/expected/rules.out | 9 +- 18 files changed, 267 insertions(+), 79 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 20dc8c605b..2b3aa79cb6 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define JUMBLE_SIZE1024 /* query serialization buffer size */ +/* + * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze + * ignores. + */ +#define PGSS_HANDLED_UTILITY(n) (!IsA(n, ExecuteStmt) && \ + !IsA(n, PrepareStmt) && \ + !IsA(n, DeallocateStmt)) + /* * Extension version number, for supporting older extension versions' objects */ @@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc); -static uint64 pgss_hash_string(const char *str, int len); +static const char *pgss_clean_querytext(const char *query, int *location, int *len); +static uint64 pgss_compute_utility_queryid(const char *query, int query_len); static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, double total_time, uint64 rows, @@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query) return; /* - * Utility statements get queryId zero. We do this even in cases where - * the statement contains an optimizable statement for which a queryId - * could be derived (such as EXPLAIN or DECLARE CURSOR). For such cases, - * runtime control will first go through ProcessUtility and then the - * executor, and we don't want the executor hooks to do anything, since we - * are already measuring the statement's costs at the utility level. + * We compute a queryId now so that it can get exported in out + * PgBackendStatus. pgss_ProcessUtility will later discard it to prevents + * double counting of optimizable statements that are directly contained in + * utility statements. Note that we don't compute a queryId for prepared + * statemets related utility, as those will inherit from the underlying + * statements's one (except DEALLOCATE which is entirely untracked). */ if (query->utilityStmt) { - query->queryId = UINT64CONST(0); + if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt) + && pstate->p_sourcetext) + { + const char *querytext = pstate->p_sourcetext; + int query_location = query->stmt_location; + int query_len = query->stmt_len; + + /* + * Confine our attention to the relevant part of the string, if the + * query is a portion of a multi-statement source string. + */ +
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote: > > cfbot reports a failure since 2f9661311b (command completion tag > change), so here's a rebased v6, no change otherwise. Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks again to cfbot, rebased v7 attached. >From dda1ab659a44c9a6375ee05d249baa2ec552 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Mon, 18 Mar 2019 18:55:50 +0100 Subject: [PATCH v7 1/2] Expose queryid in pg_stat_activity and log_line_prefix Similarly to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Also add a %Q placeholder to include the queryid in the log_line_prefix, which will also only expose top level statements. Author: Julien Rouhaud Reviewed-by: Evgeny Efimkin, Michael Paquier Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com --- .../pg_stat_statements/pg_stat_statements.c | 179 -- doc/src/sgml/config.sgml | 9 +- doc/src/sgml/monitoring.sgml | 12 ++ src/backend/catalog/system_views.sql | 1 + src/backend/executor/execMain.c | 8 + src/backend/executor/execParallel.c | 14 +- src/backend/executor/nodeGather.c | 3 +- src/backend/executor/nodeGatherMerge.c| 4 +- src/backend/parser/analyze.c | 5 + src/backend/postmaster/pgstat.c | 65 +++ src/backend/tcop/postgres.c | 5 + src/backend/utils/adt/pgstatfuncs.c | 7 +- src/backend/utils/error/elog.c| 10 +- src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 6 +- src/include/executor/execParallel.h | 3 +- src/include/pgstat.h | 5 + src/test/regress/expected/rules.out | 9 +- 18 files changed, 267 insertions(+), 79 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 20dc8c605b..2b3aa79cb6 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define JUMBLE_SIZE1024 /* query serialization buffer size */ +/* + * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze + * ignores. + */ +#define PGSS_HANDLED_UTILITY(n) (!IsA(n, ExecuteStmt) && \ + !IsA(n, PrepareStmt) && \ + !IsA(n, DeallocateStmt)) + /* * Extension version number, for supporting older extension versions' objects */ @@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc); -static uint64 pgss_hash_string(const char *str, int len); +static const char *pgss_clean_querytext(const char *query, int *location, int *len); +static uint64 pgss_compute_utility_queryid(const char *query, int query_len); static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, double total_time, uint64 rows, @@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query) return; /* - * Utility statements get queryId zero. We do this even in cases where - * the statement contains an optimizable statement for which a queryId - * could be derived (such as EXPLAIN or DECLARE CURSOR). For such cases, - * runtime control will first go through ProcessUtility and then the - * executor, and we don't want the executor hooks to do anything, since we - * are already measuring the statement's costs at the utility level. + * We compute a queryId now so that it can get exported in out + * PgBackendStatus. pgss_ProcessUtility will later discard it to prevents + * double counting of optimizable statements that are directly contained in + * utility statements. Note that we don't compute a queryId for prepared + * statemets related utility, as those will inherit from the underlying + * statements's one (except DEALLOCATE which is entirely untracked). */ if (query->utilityStmt) { - query->queryId = UINT64CONST(0); + if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt) + && pstate->p_sourcetext) + { + const char *querytext = pstate->p_sourcetext; + int query_location = query->stmt_location; + int query_len = query->stmt_len; + + /* + * Confine our attention to the relevant part of the string, if the + * query is a portion of a multi-statement source string. + */ + querytext = pgss_clean_querytext(pstate->p_sourcetext, + _location, + _len);
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Jun 28, 2019 at 11:49:53AM -0700, Peter Geoghegan wrote: > On Tue, Mar 19, 2019 at 12:38 PM legrand legrand > wrote: > > Would it make sense to add it in auto explain ? > > I don't know for explain itself, but maybe ... > > I think that it should appear in EXPLAIN. pg_stat_statements already > cannot have a query hash of zero, so it might be okay to display it > only when its value is non-zero. I had forgotten about this. After looking at it, I can see a few issues. For now post_parse_analyze_hook isn't called for the underlying statement, so we don't have the queryid. And we can't compute the queryid for the underlying query in the initial post_parse_analyze_hook call as we don't want the executor to have a queryid set in that case to avoid cumulating counters for both the explain and the query. We could add an extra call in ExplainQuery, but this will be ignored by pg_stat_statements unless you set pg_stat_statements.track to all. Also, pgss_post_parse_analyze will try to record an entry with the normalized query text if no one exists yet and if any constant where removed. The problem is that, as I already mentioned in [1], the underlying query doesn't have query_location or query_len valued, so the recorded query text will at least contain the explain part of the input query. [1] https://www.postgresql.org/message-id/CAOBaU_Y-y%2BVOhTZgDOuDk6-9V72-ZXdWccXo_kx0P4DDBEEh9A%40mail.gmail.com
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Feb 7, 2020 at 11:12 AM Julien Rouhaud wrote: > > On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote: > > On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud > > wrote: > > > There's also the possibility to reserve 1 bit of the hash to know if > > > this is a utility command or not, although I don't recall right now > > > all the possible issues with utility commands and some special > > > handling of them. I'll work on it before the next commitfest. > > > > FWIW, I don't really see why it would be bad to have 0 mean that > > "there's no query ID for some reason" without caring whether that's > > because the current statement is a utility statement or because > > there's no statement in progress at all or whatever else. The user > > probably doesn't need our help to distinguish between "no statement" > > and "utility statement", right? > > Sure, but if we don't fix that it means that we also won't expose any queryid > for utility statement, even if pg_stat_statements is configured to track those > (with a very poor queryid handling, but still). > > While looking at this again, I realized that pg_stat_statements doesn't > compute > a queryid during the post parse analysis hook just to make sure that no query > identifier will be set during executorStart and the rest of executor > functions. > > AFAICT, that can't happen anyway since pg_plan_queries() will discard any > computed queryid for utility statements. This seems to be an oversight due to > original pg_stat_statements implementation, so I fixed this. > > Then, as processUtility is called between parse analysis and executor, I think > that we can simply work around this by computing utility statements query > identifier during parse analysis, removing it in pgss_ProcessUtility and > keeping a copy of it for the pgss_store calls in that function, as done in the > attached v5. > > This fixes everything except EXECUTE statements, which has to get the > underlying query's queryid. The problem is that EXECUTE won't get through > parse analysis, so while it's correctly handled for execution and pgss_store, > it's not being exposed in pg_stat_activity and log_line_prefix. To fix it, I > added an extra call to pgstat_report_queryid in executorStart. As this > function is a no-op if a queryid is already exposed, this shouldn't cause any > harm and fix any other cases of query execution that don't go through parse > analysis. > > Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those > statements will always be reported with a NULL/0 queryid, but this is > consistent as it's also not present in pg_stat_statements() SRF. cfbot reports a failure since 2f9661311b (command completion tag change), so here's a rebased v6, no change otherwise. diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 20dc8c605b..2b3aa79cb6 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define JUMBLE_SIZE1024 /* query serialization buffer size */ +/* + * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze + * ignores. + */ +#define PGSS_HANDLED_UTILITY(n) (!IsA(n, ExecuteStmt) && \ + !IsA(n, PrepareStmt) && \ + !IsA(n, DeallocateStmt)) + /* * Extension version number, for supporting older extension versions' objects */ @@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc); -static uint64 pgss_hash_string(const char *str, int len); +static const char *pgss_clean_querytext(const char *query, int *location, int *len); +static uint64 pgss_compute_utility_queryid(const char *query, int query_len); static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, double total_time, uint64 rows, @@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query) return; /* - * Utility statements get queryId zero. We do this even in cases where - * the statement contains an optimizable statement for which a queryId - * could be derived (such as EXPLAIN or DECLARE CURSOR). For such cases, - * runtime control will first go through ProcessUtility and then the - * executor, and we don't want the executor hooks to do anything, since we - * are already measuring the statement's costs at the utility level. + * We compute a queryId now so that it can get exported in out + * PgBackendStatus. pgss_ProcessUtility will later discard it to prevents + * double counting of optimizable statements that are directly contained in + * utility statements. Note that we don't compute a queryId for prepared + * statemets related utility, as those
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote: > On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud wrote: > > There's also the possibility to reserve 1 bit of the hash to know if > > this is a utility command or not, although I don't recall right now > > all the possible issues with utility commands and some special > > handling of them. I'll work on it before the next commitfest. > > FWIW, I don't really see why it would be bad to have 0 mean that > "there's no query ID for some reason" without caring whether that's > because the current statement is a utility statement or because > there's no statement in progress at all or whatever else. The user > probably doesn't need our help to distinguish between "no statement" > and "utility statement", right? Sure, but if we don't fix that it means that we also won't expose any queryid for utility statement, even if pg_stat_statements is configured to track those (with a very poor queryid handling, but still). While looking at this again, I realized that pg_stat_statements doesn't compute a queryid during the post parse analysis hook just to make sure that no query identifier will be set during executorStart and the rest of executor functions. AFAICT, that can't happen anyway since pg_plan_queries() will discard any computed queryid for utility statements. This seems to be an oversight due to original pg_stat_statements implementation, so I fixed this. Then, as processUtility is called between parse analysis and executor, I think that we can simply work around this by computing utility statements query identifier during parse analysis, removing it in pgss_ProcessUtility and keeping a copy of it for the pgss_store calls in that function, as done in the attached v5. This fixes everything except EXECUTE statements, which has to get the underlying query's queryid. The problem is that EXECUTE won't get through parse analysis, so while it's correctly handled for execution and pgss_store, it's not being exposed in pg_stat_activity and log_line_prefix. To fix it, I added an extra call to pgstat_report_queryid in executorStart. As this function is a no-op if a queryid is already exposed, this shouldn't cause any harm and fix any other cases of query execution that don't go through parse analysis. Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those statements will always be reported with a NULL/0 queryid, but this is consistent as it's also not present in pg_stat_statements() SRF. diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 6f82a671ee..2da810ade6 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define JUMBLE_SIZE1024/* query serialization buffer size */ +/* + * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze + * ignores. + */ +#define PGSS_HANDLED_UTILITY(n)(!IsA(n, ExecuteStmt) && \ + !IsA(n, PrepareStmt) && \ + !IsA(n, DeallocateStmt)) + /* * Extension version number, for supporting older extension versions' objects */ @@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, char *completionTag); -static uint64 pgss_hash_string(const char *str, int len); +static const char *pgss_clean_querytext(const char *query, int *location, int *len); +static uint64 pgss_compute_utility_queryid(const char *query, int query_len); static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, double total_time, uint64 rows, @@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query) return; /* -* Utility statements get queryId zero. We do this even in cases where -* the statement contains an optimizable statement for which a queryId -* could be derived (such as EXPLAIN or DECLARE CURSOR). For such cases, -* runtime control will first go through ProcessUtility and then the -* executor, and we don't want the executor hooks to do anything, since we -* are already measuring the statement's costs at the utility level. +* We compute a queryId now so that it can get exported in out +* PgBackendStatus. pgss_ProcessUtility will later discard it to
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud wrote: > There's also the possibility to reserve 1 bit of the hash to know if > this is a utility command or not, although I don't recall right now > all the possible issues with utility commands and some special > handling of them. I'll work on it before the next commitfest. FWIW, I don't really see why it would be bad to have 0 mean that "there's no query ID for some reason" without caring whether that's because the current statement is a utility statement or because there's no statement in progress at all or whatever else. The user probably doesn't need our help to distinguish between "no statement" and "utility statement", right? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Sat, Feb 1, 2020 at 12:30 PM Tomas Vondra wrote: > > On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote: > >On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier wrote: > >> > >> On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote: > >> > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: > >> >> I'd really like to have the queryid function available through SQL, > >> >> but I think that this specific case wouldn't work very well for > >> >> pg_stat_statements' approach as it's working with oid. The query > >> >> string in pg_stat_activity is the user provided one rather than a > >> >> fully-qualified version, so in order to get that query's queryid, you > >> >> need to know the exact search_path in use in that backend, and that's > >> >> not something available. > >> > > >> > Yeah.. So, we have a patch marked as ready for committer here, and it > >> > seems to me that we have a couple of issues to discuss more about > >> > first particularly this query ID of 0. Again, do others have more > >> > any input to offer? > > > >I just realized that with current infrastructure it's not possible to > >display a utility queryid. We need to recognize utility to not > >process the counters twice (once in processUtility, once in the > >underlying executor), so we don't provide a queryid for utility > >statements in parse analysis. Current magic value 0 has the side > >effect of showing an invalid queryid for all utilty statements, and > >using a magic value different from 0 will just always display that > >magic value. We could instead add another field in the Query and > >PlannedStmt structs, say "int queryid_flags", that extensions could > >use for their needs? > > > >> And while on it, the latest patch does not apply, so a rebase is > >> needed here. > > > >Yep, I noticed that this morning. I already rebased the patch > >locally, I'll send a new version with new modifications when we reach > >an agreement on the utility issue. > > > > Well, this patch was in WoA since November, but now that I look at it > that might have been wrong - we're clearly waiting for agreement on how > to handle queryid for utility commands. I suspect the WoA status might > have been driving people away from this thread :-( Oh, indeed. > I've switched the patch to "needs review" and moved it to the next CF. Thanks > What I think needs to happen is we get a patch implementing one of the > proposed solutions, and discuss that. There's also the possibility to reserve 1 bit of the hash to know if this is a utility command or not, although I don't recall right now all the possible issues with utility commands and some special handling of them. I'll work on it before the next commitfest.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote: On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier wrote: On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote: > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: >> I'd really like to have the queryid function available through SQL, >> but I think that this specific case wouldn't work very well for >> pg_stat_statements' approach as it's working with oid. The query >> string in pg_stat_activity is the user provided one rather than a >> fully-qualified version, so in order to get that query's queryid, you >> need to know the exact search_path in use in that backend, and that's >> not something available. > > Yeah.. So, we have a patch marked as ready for committer here, and it > seems to me that we have a couple of issues to discuss more about > first particularly this query ID of 0. Again, do others have more > any input to offer? I just realized that with current infrastructure it's not possible to display a utility queryid. We need to recognize utility to not process the counters twice (once in processUtility, once in the underlying executor), so we don't provide a queryid for utility statements in parse analysis. Current magic value 0 has the side effect of showing an invalid queryid for all utilty statements, and using a magic value different from 0 will just always display that magic value. We could instead add another field in the Query and PlannedStmt structs, say "int queryid_flags", that extensions could use for their needs? And while on it, the latest patch does not apply, so a rebase is needed here. Yep, I noticed that this morning. I already rebased the patch locally, I'll send a new version with new modifications when we reach an agreement on the utility issue. Well, this patch was in WoA since November, but now that I look at it that might have been wrong - we're clearly waiting for agreement on how to handle queryid for utility commands. I suspect the WoA status might have been driving people away from this thread :-( I've switched the patch to "needs review" and moved it to the next CF. What I think needs to happen is we get a patch implementing one of the proposed solutions, and discuss that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier wrote: > > On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote: > > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: > >> I'd really like to have the queryid function available through SQL, > >> but I think that this specific case wouldn't work very well for > >> pg_stat_statements' approach as it's working with oid. The query > >> string in pg_stat_activity is the user provided one rather than a > >> fully-qualified version, so in order to get that query's queryid, you > >> need to know the exact search_path in use in that backend, and that's > >> not something available. > > > > Yeah.. So, we have a patch marked as ready for committer here, and it > > seems to me that we have a couple of issues to discuss more about > > first particularly this query ID of 0. Again, do others have more > > any input to offer? I just realized that with current infrastructure it's not possible to display a utility queryid. We need to recognize utility to not process the counters twice (once in processUtility, once in the underlying executor), so we don't provide a queryid for utility statements in parse analysis. Current magic value 0 has the side effect of showing an invalid queryid for all utilty statements, and using a magic value different from 0 will just always display that magic value. We could instead add another field in the Query and PlannedStmt structs, say "int queryid_flags", that extensions could use for their needs? > And while on it, the latest patch does not apply, so a rebase is > needed here. Yep, I noticed that this morning. I already rebased the patch locally, I'll send a new version with new modifications when we reach an agreement on the utility issue.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote: > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: >> I'd really like to have the queryid function available through SQL, >> but I think that this specific case wouldn't work very well for >> pg_stat_statements' approach as it's working with oid. The query >> string in pg_stat_activity is the user provided one rather than a >> fully-qualified version, so in order to get that query's queryid, you >> need to know the exact search_path in use in that backend, and that's >> not something available. > > Yeah.. So, we have a patch marked as ready for committer here, and it > seems to me that we have a couple of issues to discuss more about > first particularly this query ID of 0. Again, do others have more > any input to offer? And while on it, the latest patch does not apply, so a rebase is needed here. -- Michael signature.asc Description: PGP signature
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: > I'd really like to have the queryid function available through SQL, > but I think that this specific case wouldn't work very well for > pg_stat_statements' approach as it's working with oid. The query > string in pg_stat_activity is the user provided one rather than a > fully-qualified version, so in order to get that query's queryid, you > need to know the exact search_path in use in that backend, and that's > not something available. Yeah.. So, we have a patch marked as ready for committer here, and it seems to me that we have a couple of issues to discuss more about first particularly this query ID of 0. Again, do others have more any input to offer? -- Michael signature.asc Description: PGP signature
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Nov 13, 2019 at 4:15 AM Bruce Momjian wrote: > > On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote: > > On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote: > > > The thing is that pg_stat_statements assigns a 0 queryid in the > > > post_parse_analyze_hook to recognize utility statements and avoid > > > tracking instrumentation twice in case of utility statements, and then > > > compute a queryid base on a hash of the query text. Maybe we could > > > instead fully reserve queryid "2" for utility statements (so forcing > > > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead > > > of only 0), and use "2" as the identifier for utility statement > > > instead of "0"? > > > > Hmm. Not sure. At this stage it would be nice to gather more input > > on the matter, and FWIW, I don't like much the assumption that a query > > ID of 0 is perhaps a utility statement, or perhaps nothing depending > > on the state of a backend entry, or even perhaps something else > > depending how on how modules make use and define such query IDs. > > I thought each extension would export a function to compute the query > id, and you would all that function with the pg_stat_activity.query > string. I'd really like to have the queryid function available through SQL, but I think that this specific case wouldn't work very well for pg_stat_statements' approach as it's working with oid. The query string in pg_stat_activity is the user provided one rather than a fully-qualified version, so in order to get that query's queryid, you need to know the exact search_path in use in that backend, and that's not something available.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote: > On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote: > > The thing is that pg_stat_statements assigns a 0 queryid in the > > post_parse_analyze_hook to recognize utility statements and avoid > > tracking instrumentation twice in case of utility statements, and then > > compute a queryid base on a hash of the query text. Maybe we could > > instead fully reserve queryid "2" for utility statements (so forcing > > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead > > of only 0), and use "2" as the identifier for utility statement > > instead of "0"? > > Hmm. Not sure. At this stage it would be nice to gather more input > on the matter, and FWIW, I don't like much the assumption that a query > ID of 0 is perhaps a utility statement, or perhaps nothing depending > on the state of a backend entry, or even perhaps something else > depending how on how modules make use and define such query IDs. I thought each extension would export a function to compute the query id, and you would all that function with the pg_stat_activity.query string. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote: > The thing is that pg_stat_statements assigns a 0 queryid in the > post_parse_analyze_hook to recognize utility statements and avoid > tracking instrumentation twice in case of utility statements, and then > compute a queryid base on a hash of the query text. Maybe we could > instead fully reserve queryid "2" for utility statements (so forcing > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead > of only 0), and use "2" as the identifier for utility statement > instead of "0"? Hmm. Not sure. At this stage it would be nice to gather more input on the matter, and FWIW, I don't like much the assumption that a query ID of 0 is perhaps a utility statement, or perhaps nothing depending on the state of a backend entry, or even perhaps something else depending how on how modules make use and define such query IDs. -- Michael signature.asc Description: PGP signature
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Thanks for looking at it! On Wed, Sep 11, 2019 at 6:45 AM Michael Paquier wrote: > > An invalid query ID is assumed to be 0 in the patch, per the way it is > defined in pg_stat_statements. However this also maps with the case > where we have a utility statement. Oh indeed. Which means that if a utility statements later calls parse_analyze or friends, this patch would report an unexpected queryid. That's at least possible for something like COPY (SELECT * FROM tbl) TO ... The thing is that pg_stat_statements assigns a 0 queryid in the post_parse_analyze_hook to recognize utility statements and avoid tracking instrumentation twice in case of utility statements, and then compute a queryid base on a hash of the query text. Maybe we could instead fully reserve queryid "2" for utility statements (so forcing queryid "1" for standard queries if jumbling returns 0 *or* 2 instead of only 0), and use "2" as the identifier for utility statement instead of "0"? > + /* > +* We only report the top-level query identifiers. The stored queryid is > +* reset when a backend call pgstat_report_activity(STATE_RUNNING), or > with > +* an explicit call to this function. If the saved query identifier is > not > +* zero it means that it's not a top-level command, so ignore the one > +* provided unless it's an explicit call to reset the identifier. > +*/ > + if (queryId != 0 && beentry->st_queryid != 0) > + return; > Hmm. I am wondering if we shouldn't have an API dedicated to the > reset of the query ID. That logic looks rather brittle.. How about adding a "bool force" parameter to allow resetting the queryid to 0? > Wouldn't it be better (and more consistent) to update the query ID in > parse_analyze_varparams() and parse_analyze() as well after going > through the post_parse_analyze hook instead of pg_analyze_and_rewrite? I thought about it without knowing what would be best. I'll change to report the queryid right after calling post_parse_analyze_hook then. > + /* > +* If a new query is started, we reset the query identifier as it'll only > +* be known after parse analysis, to avoid reporting last query's > +* identifier. > +*/ > + if (state == STATE_RUNNING) > + beentry->st_queryid = 0 > I don't quite get why you don't reset the counter in other cases as > well. If the backend entry is idle in transaction or in an idle > state, it seems to me that we should not report the query ID of the > last query run in the transaction. And that would make the reset in > exec_simple_query() unnecessary, no? I'm reproducing the same behavior as for the query text, ie. showing the information about the last executed query text if state is idle: + queryid + bigint + Identifier of this backend's most recent query. If + state is active this field + shows the identifier of the currently executing query. In all other + states, it shows the identifier of last query that was executed. I think that showing the last executed query's queryid is as useful as the query text. Also, while avoiding a reset in exec_simple_query() it'd be required to do such reset in case of error during query execution, so that wouldn't make things quite simpler..
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Aug 07, 2019 at 09:03:21AM +, Evgeny Efimkin wrote: > The new status of this patch is: Ready for Committer I may be wrong of course, but it looks that this is wanted and the current shape of the patch looks sensible: - Register the query ID using a backend entry. - Only consider the top-level query. An invalid query ID is assumed to be 0 in the patch, per the way it is defined in pg_stat_statements. However this also maps with the case where we have a utility statement. +* We only report the top-level query identifiers. The stored queryid is +* reset when a backend call pgstat_report_activity(STATE_RUNNING), or with s/call/calls/ + /* +* We only report the top-level query identifiers. The stored queryid is +* reset when a backend call pgstat_report_activity(STATE_RUNNING), or with +* an explicit call to this function. If the saved query identifier is not +* zero it means that it's not a top-level command, so ignore the one +* provided unless it's an explicit call to reset the identifier. +*/ + if (queryId != 0 && beentry->st_queryid != 0) + return; Hmm. I am wondering if we shouldn't have an API dedicated to the reset of the query ID. That logic looks rather brittle.. Wouldn't it be better (and more consistent) to update the query ID in parse_analyze_varparams() and parse_analyze() as well after going through the post_parse_analyze hook instead of pg_analyze_and_rewrite? + /* +* If a new query is started, we reset the query identifier as it'll only +* be known after parse analysis, to avoid reporting last query's +* identifier. +*/ + if (state == STATE_RUNNING) + beentry->st_queryid = 0 I don't quite get why you don't reset the counter in other cases as well. If the backend entry is idle in transaction or in an idle state, it seems to me that we should not report the query ID of the last query run in the transaction. And that would make the reset in exec_simple_query() unnecessary, no? -- Michael signature.asc Description: PGP signature
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed HI! patch is look good for me. The new status of this patch is: Ready for Committer
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Kyotaro Horiguchi-4 wrote > At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand > legrand_legrand@ > wrote in < > 1564902241482-0.post@.nabble >> >> > However having the nested queryid in >> > pg_stat_activity would be convenient to track >> > what is a long stored functions currently doing. >> >> +1 >> >> And this could permit to get wait event sampling per queryid when >> pg_stat_statements.track = all > > I'm strongly on this side emotionally, but also I'm on Tom and > Andres's side that exposing querid that way is not the right > thing. > > Doing that means we don't need exact correspondence between > top-level query and queryId (in nested or multistatement queries) > in this patch. pg_stat_statements will allow us to do the same > thing by having additional uint64[MaxBackends] array in > pgssSharedState, instead of expanding PgBackendStatus array in > core by the same size. > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center Hi Kyotaro, Thank you for this answer. What you propose here is already available Inside pg_stat_sql_plans extension (a derivative from Pg_stat_statements and pg_store_plans) And I’m used to this queryid behavior with top Level Queries... My emotion was high but I will accept it ! Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Aug 5, 2019 at 9:28 AM Kyotaro Horiguchi wrote: > > At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand > wrote in <1564902241482-0.p...@n3.nabble.com> > > > However having the nested queryid in > > > pg_stat_activity would be convenient to track > > > what is a long stored functions currently doing. > > > > +1 > > > > And this could permit to get wait event sampling per queryid when > > pg_stat_statements.track = all > > I'm strongly on this side emotionally, but also I'm on Tom and > Andres's side that exposing querid that way is not the right > thing. > > Doing that means we don't need exact correspondence between > top-level query and queryId (in nested or multistatement queries) > in this patch. pg_stat_statements will allow us to do the same > thing by having additional uint64[MaxBackends] array in > pgssSharedState, instead of expanding PgBackendStatus array in > core by the same size. Sure, but the problem with this approach is that all extensions that compute their own queryid would have to do the same. I hope that we can come up with an approach friendlier for those extensions.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand wrote in <1564902241482-0.p...@n3.nabble.com> > > However having the nested queryid in > > pg_stat_activity would be convenient to track > > what is a long stored functions currently doing. > > +1 > > And this could permit to get wait event sampling per queryid when > pg_stat_statements.track = all I'm strongly on this side emotionally, but also I'm on Tom and Andres's side that exposing querid that way is not the right thing. Doing that means we don't need exact correspondence between top-level query and queryId (in nested or multistatement queries) in this patch. pg_stat_statements will allow us to do the same thing by having additional uint64[MaxBackends] array in pgssSharedState, instead of expanding PgBackendStatus array in core by the same size. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
> However having the nested queryid in > pg_stat_activity would be convenient to track > what is a long stored functions currently doing. +1 And this could permit to get wait event sampling per queryid when pg_stat_statements.track = all Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi, On Sat, Aug 3, 2019 at 1:21 AM Andres Freund wrote: > > On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote: > > However having the nested queryid in > > pg_stat_activity would be convenient to track what is a long stored > > functions currently doing. Maybe we could expose something like > > top_level_queryid and current_queryid instead? > > Given that the query string is the toplevel one, I think that'd just be > confusing. And given the fact that it adds *substantial* additional > complexity, I'd just rip the subcommand bits out. Ok, so here's a version that only exposes the top-level queryid only. There can still be discrepancies with the query field, if a multi-command string is provided. The queryid will be updated each time a new top level statement is executed. As the queryid cannot be immediately known, and may never exist at all if a query fails to parse, here are the heuristic I used to update the stored queryid: - it's reset to 0 each time pgstat_report_activity(STATE_RUNNING) is called. This way, we're sure that we don't display last query's queryid in the logs if the next query fails to parse - it's also reset to 0 at the beginning of exec_simple_query() loop on the parsetree_list (for multi-command string case) - pg_analyze_and_rewrite() and pg_analyze_and_rewrite_params() will report the new queryid after parse analysis. - a non-zero queryid will only be updated if the stored one is zero This should also work as intended for background worker using SPI, provided that they correctly call pgstat_report_activity. I also modified ExecInitParallelPlan() to publish the queryId in the serialized plannedStmt, so ParallelQueryMain() can report it to make the queryid available in the parallel workers too. Note that this patch makes it clear that a zero queryid means no queryid computed (and NULL will be displayed in such case in pg_stat_activity). pg_stat_statements already makes sure that it cannot compute a zero queryid. It also assume that any extension computing a queryid will do that in the post_parse_analysis hook, which seems like a sane requirement. We may want to have a dedicated hook for that instead, if more people get interested in having the queryid only, possibly different implementations, if it becomes available outside pgss. diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index c91e3e1550..c7ca1bf9a8 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -6353,6 +6353,11 @@ local0.*/var/log/postgresql session processes no + + %Q + queryid: identifier of session's current query, if any + yes + %% Literal % @@ -6736,8 +6741,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; Enables the collection of information on the currently -executing command of each session, along with the time when -that command began execution. This parameter is on by +executing command of each session, along with its identifier and the +time when that command began execution. This parameter is on by default. Note that even when enabled, this information is not visible to all users, only to superusers and the user owning the session being reported on, so it should not represent a diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index bf72d0c303..7f287c7a7e 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -824,6 +824,18 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser xid The current backend's xmin horizon. + + queryid + bigint + Identifier of this backend's most recent query. If + state is active this field + shows the identifier of the currently executing query. In all other + states, it shows the identifier of last query that was executed. By + default, query identifiers are not computed, so this field will always + be null, unless an additional module that compute query identifiers, such + as , is configured. + + query text diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index ea4c85e395..f30098c2cd 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -749,6 +749,7 @@ CREATE VIEW pg_stat_activity AS S.state, S.backend_xid, s.backend_xmin, +S.queryid, S.query, S.backend_type FROM pg_stat_get_activity(NULL) AS S diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c index 53cd2fc666..9ba6d3f2e6 100644 --- a/src/backend/executor/execParallel.c +++ b/src/backend/executor/execParallel.c @@ -121,7 +121,7 @@ typedef struct
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi, On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote: > On Thu, Aug 1, 2019 at 11:05 PM Andres Freund wrote: > > > > I'm actually quite unconvinced that it's sensible to update the global > > value for nested queries. That'll mean e.g. the log_line_prefix and > > pg_stat_activity values are most of the time going to be bogus while > > nested, because the querystring that's associated with those will *not* > > be the value that the queryid corresponds to. elog.c uses > > debug_query_string to log the statement, which is only updated for > > top-level queries (outside of some exceptions like parallel workers for > > parallel queries in a function or stuff like that). And pg_stat_activity > > is also only updated for top level queries. > > Having the nested queryid seems indeed quite broken for > log_line_prefix. However having the nested queryid in > pg_stat_activity would be convenient to track what is a long stored > functions currently doing. Maybe we could expose something like > top_level_queryid and current_queryid instead? Given that the query string is the toplevel one, I think that'd just be confusing. And given the fact that it adds *substantial* additional complexity, I'd just rip the subcommand bits out. Greetings, Andres Freund
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Thu, Aug 1, 2019 at 11:05 PM Andres Freund wrote: > > I'm actually quite unconvinced that it's sensible to update the global > value for nested queries. That'll mean e.g. the log_line_prefix and > pg_stat_activity values are most of the time going to be bogus while > nested, because the querystring that's associated with those will *not* > be the value that the queryid corresponds to. elog.c uses > debug_query_string to log the statement, which is only updated for > top-level queries (outside of some exceptions like parallel workers for > parallel queries in a function or stuff like that). And pg_stat_activity > is also only updated for top level queries. Having the nested queryid seems indeed quite broken for log_line_prefix. However having the nested queryid in pg_stat_activity would be convenient to track what is a long stored functions currently doing. Maybe we could expose something like top_level_queryid and current_queryid instead?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Thu, Aug 1, 2019 at 10:52 PM Andres Freund wrote: > > On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote: > > Sure, but it requires extra wrapper functions, and the st_changecount > > dance when writing the new value. > > So? You need a wrapper function anyway, there's no way we're going to > add all those separate pg_atomic_write* calls directly. Ok > I also think this proposed column should probably respect > the track_activities GUC. Oh indeed, I'll fix that when I'll be sure of the semantics to implement.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi, On 2019-08-01 22:49:48 +0200, Julien Rouhaud wrote: > On Thu, Aug 1, 2019 at 8:36 PM Andres Freund wrote: > > > > On 2019-08-01 14:20:46 -0400, Robert Haas wrote: > > > However, I think that the fact that this patch adds 15 new calls to > > > pg_atomic_write_u64(>queryId, ...) is probably not a good > > > sign. It seems like we ought to be able to centralize it better than > > > that. > > > > +1 > > Unfortunately I didn't find a better way to do that. Since you can > have nested execution, I don't see how to avoid adding extra code in > every parts of query execution. At least my +1 is not primarily about the number of sites that need to handle queryid changes, but that they all need to know about the way the queryid is stored. Including how atomicity etc is handled. That knowledge should be in one or two places, not more. In a file where that knowledge makes sense. I'm *also* concerned about the number of places, as that makes it likely that some have been missed/new ones will be introduced without the queryid handling. But that wasn't what I was referring to above. I'm actually quite unconvinced that it's sensible to update the global value for nested queries. That'll mean e.g. the log_line_prefix and pg_stat_activity values are most of the time going to be bogus while nested, because the querystring that's associated with those will *not* be the value that the queryid corresponds to. elog.c uses debug_query_string to log the statement, which is only updated for top-level queries (outside of some exceptions like parallel workers for parallel queries in a function or stuff like that). And pg_stat_activity is also only updated for top level queries. Greetings, Andres Freund
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi, On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote: > On Thu, Aug 1, 2019 at 8:36 PM Andres Freund wrote: > > > > On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote: > > > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund wrote: > > > > And if it were necessary, why wouldn't any of the other fields in > > > > PgBackendStatus need it? There's plenty of other fields written to > > > > without a lock, and several of those are also 8 bytes (so it's not a > > > > case of assuming that 8 byte reads might not be atomic, but for byte > > > > reads are). > > > > > > This patch is actually storing the queryid in PGPROC, not in > > > PgBackendStatus, thus the need for an atomic. I used PGPROC because > > > the value needs to be available in log_line_prefix() and spi.c, so > > > pgstat.c / PgBackendStatus didn't seem like the best interface in that > > > case. > > > > Hm. I'm not convinced that really is the case? You can just access > > MyBEentry, and read and update it? > > Sure, but it requires extra wrapper functions, and the st_changecount > dance when writing the new value. So? You need a wrapper function anyway, there's no way we're going to add all those separate pg_atomic_write* calls directly. > > I mean, we do so at a frequency > > roughtly as high as high as the new queryid updates for things like > > pgstat_report_activity(). > > pgstat_report_activity() is only called for top-level statement. For > the queryid we need to track it down to all nested statements, which > could be way higher. Compared to the overhead of executing a separate query the cost of single function call containing a MyBEentry update of an 8byte value seems almost guaranteed to be immeasurable. The executor startup alone is several orders of magnitude more expensive. I also think this proposed column should probably respect the track_activities GUC. Greetings, Andres Freund
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Thu, Aug 1, 2019 at 8:36 PM Andres Freund wrote: > > On 2019-08-01 14:20:46 -0400, Robert Haas wrote: > > However, I think that the fact that this patch adds 15 new calls to > > pg_atomic_write_u64(>queryId, ...) is probably not a good > > sign. It seems like we ought to be able to centralize it better than > > that. > > +1 Unfortunately I didn't find a better way to do that. Since you can have nested execution, I don't see how to avoid adding extra code in every parts of query execution.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Thu, Aug 1, 2019 at 8:36 PM Andres Freund wrote: > > On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote: > > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund wrote: > > > And if it were necessary, why wouldn't any of the other fields in > > > PgBackendStatus need it? There's plenty of other fields written to > > > without a lock, and several of those are also 8 bytes (so it's not a > > > case of assuming that 8 byte reads might not be atomic, but for byte > > > reads are). > > > > This patch is actually storing the queryid in PGPROC, not in > > PgBackendStatus, thus the need for an atomic. I used PGPROC because > > the value needs to be available in log_line_prefix() and spi.c, so > > pgstat.c / PgBackendStatus didn't seem like the best interface in that > > case. > > Hm. I'm not convinced that really is the case? You can just access > MyBEentry, and read and update it? Sure, but it requires extra wrapper functions, and the st_changecount dance when writing the new value. > I mean, we do so at a frequency > roughtly as high as high as the new queryid updates for things like > pgstat_report_activity(). pgstat_report_activity() is only called for top-level statement. For the queryid we need to track it down to all nested statements, which could be way higher. But pgstat_progress_update_param() is called way more than that. > Reading the value of your own backend you'd > not need to follow the changecount algorithm, I think, because it's only > updated from the current backend. If reading were a problem, you > trivially just could have a cache in a local variable, to avoid > accessing shared memory. Yes definitely, except for pgstat_get_activity(), all reads are backend local and should be totally safe to read as is.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi, On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote: > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund wrote: > > And if it were necessary, why wouldn't any of the other fields in > > PgBackendStatus need it? There's plenty of other fields written to > > without a lock, and several of those are also 8 bytes (so it's not a > > case of assuming that 8 byte reads might not be atomic, but for byte > > reads are). > > This patch is actually storing the queryid in PGPROC, not in > PgBackendStatus, thus the need for an atomic. I used PGPROC because > the value needs to be available in log_line_prefix() and spi.c, so > pgstat.c / PgBackendStatus didn't seem like the best interface in that > case. Hm. I'm not convinced that really is the case? You can just access MyBEentry, and read and update it? I mean, we do so at a frequency roughtly as high as high as the new queryid updates for things like pgstat_report_activity(). Reading the value of your own backend you'd not need to follow the changecount algorithm, I think, because it's only updated from the current backend. If reading were a problem, you trivially just could have a cache in a local variable, to avoid accessing shared memory. > Is widening PGPROC is too expensive for this purpose? Well, I'm mostly not a fan of putting even more in there, because it's pretty hard to understand already. To me it architecturally status information doesn't belong there (In fact, I'm somewhat unhappy that wait_event_info etc in there, but that's at least commonly updated at the same time as other fields in PGPROC). Greetings, Andres Freund
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On 2019-08-01 14:20:46 -0400, Robert Haas wrote: > However, I think that the fact that this patch adds 15 new calls to > pg_atomic_write_u64(>queryId, ...) is probably not a good > sign. It seems like we ought to be able to centralize it better than > that. +1
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Thu, Aug 1, 2019 at 2:46 AM Julien Rouhaud wrote: > This patch is actually storing the queryid in PGPROC, not in > PgBackendStatus, thus the need for an atomic. I used PGPROC because > the value needs to be available in log_line_prefix() and spi.c, so > pgstat.c / PgBackendStatus didn't seem like the best interface in that > case. Is widening PGPROC is too expensive for this purpose? I doubt it. However, I think that the fact that this patch adds 15 new calls to pg_atomic_write_u64(>queryId, ...) is probably not a good sign. It seems like we ought to be able to centralize it better than that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Wed, Jul 31, 2019 at 11:59 PM Andres Freund wrote: > > On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote: > > On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin > > wrote: > > > What reason to use pg_atomic_uint64? > > > > The queryid is read and written without holding any lock on the PGPROC > > entry, so the pg_atomic_uint64 will guarantee that we get a consistent > > value in pg_stat_get_activity(). Other reads shouldn't be a problem > > as far as I remember. > > Hm, I don't think that's necessary in this case. That's what the > st_changecount protocol is trying to ensure, no? > > /* > * To avoid locking overhead, we use the following protocol: a backend > * increments st_changecount before modifying its entry, and again > after > * finishing a modification. A would-be reader should note the value > of > * st_changecount, copy the entry into private memory, then check > * st_changecount again. If the value hasn't changed, and if it's > even, > * the copy is valid; otherwise start over. This makes updates cheap > * while reads are potentially expensive, but that's the tradeoff we > want. > * > * The above protocol needs memory barriers to ensure that the > apparent > * order of execution is as it desires. Otherwise, for example, the > CPU > * might rearrange the code so that st_changecount is incremented > twice > * before the modification on a machine with weak memory ordering. > Hence, > * use the macros defined below for manipulating st_changecount, > rather > * than touching it directly. > */ > int st_changecount; > > > And if it were necessary, why wouldn't any of the other fields in > PgBackendStatus need it? There's plenty of other fields written to > without a lock, and several of those are also 8 bytes (so it's not a > case of assuming that 8 byte reads might not be atomic, but for byte > reads are). This patch is actually storing the queryid in PGPROC, not in PgBackendStatus, thus the need for an atomic. I used PGPROC because the value needs to be available in log_line_prefix() and spi.c, so pgstat.c / PgBackendStatus didn't seem like the best interface in that case. Is widening PGPROC is too expensive for this purpose?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hi, On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote: > On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin > wrote: > > What reason to use pg_atomic_uint64? > > The queryid is read and written without holding any lock on the PGPROC > entry, so the pg_atomic_uint64 will guarantee that we get a consistent > value in pg_stat_get_activity(). Other reads shouldn't be a problem > as far as I remember. Hm, I don't think that's necessary in this case. That's what the st_changecount protocol is trying to ensure, no? /* * To avoid locking overhead, we use the following protocol: a backend * increments st_changecount before modifying its entry, and again after * finishing a modification. A would-be reader should note the value of * st_changecount, copy the entry into private memory, then check * st_changecount again. If the value hasn't changed, and if it's even, * the copy is valid; otherwise start over. This makes updates cheap * while reads are potentially expensive, but that's the tradeoff we want. * * The above protocol needs memory barriers to ensure that the apparent * order of execution is as it desires. Otherwise, for example, the CPU * might rearrange the code so that st_changecount is incremented twice * before the modification on a machine with weak memory ordering. Hence, * use the macros defined below for manipulating st_changecount, rather * than touching it directly. */ int st_changecount; And if it were necessary, why wouldn't any of the other fields in PgBackendStatus need it? There's plenty of other fields written to without a lock, and several of those are also 8 bytes (so it's not a case of assuming that 8 byte reads might not be atomic, but for byte reads are). Greetings, Andres Freund
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Hello, On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin wrote: > > What reason to use pg_atomic_uint64? The queryid is read and written without holding any lock on the PGPROC entry, so the pg_atomic_uint64 will guarantee that we get a consistent value in pg_stat_get_activity(). Other reads shouldn't be a problem as far as I remember. > In docs: > occured - > occurred Thanks! I fixed it on my local branch.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
What reason to use pg_atomic_uint64? In docs: occured - > occurred
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Mar 19, 2019 at 12:38 PM legrand legrand wrote: > Would it make sense to add it in auto explain ? > I don't know for explain itself, but maybe ... I think that it should appear in EXPLAIN. pg_stat_statements already cannot have a query hash of zero, so it might be okay to display it only when its value is non-zero. -- Peter Geoghegan
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Mar 19, 2019 at 12:00 PM Robert Haas wrote: > On the other hand, it also appears that a lot of people would be very, > very happy to just be able to see the query ID field that already > exists, both in pg_stat_statements in pg_stat_activity, and we > shouldn't throw up unnecessary impediments in the way of making that > happen, at least IMHO. +1. pg_stat_statements will already lose all the statistics that it aggregated in the event of a hard crash. The trade-off that the query jumbling logic makes is not a bad one, all things considered. -- Peter Geoghegan