Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Robert Haas
On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud  wrote:
> I originally suggested to make it clearer by having an enum GUC rather than a
> boolean, say compute_queryid = [ none | core | external ], and if set to
> external then a hook would be explicitely called.  Right now, "none" and
> "external" are binded with compute_queryid = off, and depends on whether an
> extension is computing a queryid during post_parse_analyse_hook.

I would just make it a Boolean and have a hook. The Boolean controls
whether it gets computed at all, and the hook lets an external module
override the way it gets computed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Julien Rouhaud
On Wed, Mar 17, 2021 at 12:24:44PM -0400, Bruce Momjian wrote:
> On Wed, Mar 17, 2021 at 05:16:50PM +0100, Pavel Stehule wrote:
> > 
> > 
> > st 17. 3. 2021 v 17:03 odesílatel Tom Lane  napsal:
> > 
> > Bruce Momjian  writes:
> > > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:
> > >> I still say that it's a serious mistake to sanctify a query ID
> > calculation
> > >> method that was designed only for pg_stat_statement's needs as the 
> > one
> > >> true way to do it.  But that's what exposing it in a core view would 
> > do.
> > 
> > > OK, I am fine with creating a new method, and maybe having
> > > pg_stat_statements use it.  Is that the direction we should be going 
> > in?
> > 
> > The point is that we've understood Query.queryId as something that
> > different extensions might calculate differently for their own needs.
> > In particular it's easy to imagine extensions that want an ID that is
> > less fuzzy than what pg_stat_statements wants.  We never had a plan for
> > how two such extensions could co-exist, but at least it was possible
> > to use one if you didn't use another.  If this gets moved into core
> > then there will basically be only one way that anyone can do it.
> > 
> > Maybe what we need is a design for allowing more than one query ID.
> > 
> > 
> > Theoretically there can be a hook for calculation of queryid, that can be by
> > used extension. Default can be assigned with a method that is used by
> > pg_stat_statements.
> 
> Yes, that is what the code patch says it does.
> 
> > I don't think it is possible to use more different query id for
> > pg_stat_statements so this solution can be simple.
> 
> Agreed.

Actually, putting the query identifer computation in the core makes it way more
tunable, even if it's conterintuitive.  What it means is that you can now chose
to use usual pgss' algorithm or a different one for log_line_prefix and
pg_stat_activity.queryid, but also that you can now use pgss with a different
query id algorithm.  That's another thing that user were asking for a long
time.

I originally suggested to make it clearer by having an enum GUC rather than a
boolean, say compute_queryid = [ none | core | external ], and if set to
external then a hook would be explicitely called.  Right now, "none" and
"external" are binded with compute_queryid = off, and depends on whether an
extension is computing a queryid during post_parse_analyse_hook.

It could later be extended to suit other needs if we ever come to some
agreement (for instance "legacy", "logical_replication_stable" or whatever
better name we can find for something that doesn't depend on Oid).




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Bruce Momjian
On Wed, Mar 17, 2021 at 05:16:50PM +0100, Pavel Stehule wrote:
> 
> 
> st 17. 3. 2021 v 17:03 odesílatel Tom Lane  napsal:
> 
> Bruce Momjian  writes:
> > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:
> >> I still say that it's a serious mistake to sanctify a query ID
> calculation
> >> method that was designed only for pg_stat_statement's needs as the one
> >> true way to do it.  But that's what exposing it in a core view would 
> do.
> 
> > OK, I am fine with creating a new method, and maybe having
> > pg_stat_statements use it.  Is that the direction we should be going in?
> 
> The point is that we've understood Query.queryId as something that
> different extensions might calculate differently for their own needs.
> In particular it's easy to imagine extensions that want an ID that is
> less fuzzy than what pg_stat_statements wants.  We never had a plan for
> how two such extensions could co-exist, but at least it was possible
> to use one if you didn't use another.  If this gets moved into core
> then there will basically be only one way that anyone can do it.
> 
> Maybe what we need is a design for allowing more than one query ID.
> 
> 
> Theoretically there can be a hook for calculation of queryid, that can be by
> used extension. Default can be assigned with a method that is used by
> pg_stat_statements.

Yes, that is what the code patch says it does.

> I don't think it is possible to use more different query id for
> pg_stat_statements so this solution can be simple.

Agreed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Pavel Stehule
st 17. 3. 2021 v 17:03 odesílatel Tom Lane  napsal:

> Bruce Momjian  writes:
> > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:
> >> I still say that it's a serious mistake to sanctify a query ID
> calculation
> >> method that was designed only for pg_stat_statement's needs as the one
> >> true way to do it.  But that's what exposing it in a core view would do.
>
> > OK, I am fine with creating a new method, and maybe having
> > pg_stat_statements use it.  Is that the direction we should be going in?
>
> The point is that we've understood Query.queryId as something that
> different extensions might calculate differently for their own needs.
> In particular it's easy to imagine extensions that want an ID that is
> less fuzzy than what pg_stat_statements wants.  We never had a plan for
> how two such extensions could co-exist, but at least it was possible
> to use one if you didn't use another.  If this gets moved into core
> then there will basically be only one way that anyone can do it.
>
> Maybe what we need is a design for allowing more than one query ID.
>

Theoretically there can be a hook for calculation of queryid, that can be
by used extension. Default can be assigned with a method that is used by
pg_stat_statements.

I don't think it is possible to use more different query id for
pg_stat_statements so this solution can be simple.

regards

Pavel





>
> > I do think we need _some_ method in core if we are going to be exposing
> > this value in pg_stat_activity and log_line_prefix.
>
> I'm basically objecting to the conclusion that we should do either
> of those.  There is no way around the fact that it will break every
> user of Query.queryId other than pg_stat_statements, unless they
> are okay with whatever definition pg_stat_statements is using (which
> is a moving target BTW).
>
> regards, tom lane
>
>
>


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Bruce Momjian
On Wed, Mar 17, 2021 at 12:01:38PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:
> >> I still say that it's a serious mistake to sanctify a query ID calculation
> >> method that was designed only for pg_stat_statement's needs as the one
> >> true way to do it.  But that's what exposing it in a core view would do.
> 
> > OK, I am fine with creating a new method, and maybe having
> > pg_stat_statements use it.  Is that the direction we should be going in?
> 
> The point is that we've understood Query.queryId as something that
> different extensions might calculate differently for their own needs.
> In particular it's easy to imagine extensions that want an ID that is
> less fuzzy than what pg_stat_statements wants.  We never had a plan for
> how two such extensions could co-exist, but at least it was possible
> to use one if you didn't use another.  If this gets moved into core
> then there will basically be only one way that anyone can do it.

Well, the patch docs say:

Enables or disables in core query identifier computation.arameter.  The
 extension requires a query
--> identifier to be computed.  Note that an external module can
--> alternatively be used if the in core query identifier computation
specification doesn't suit your need.  In this case, in core
computation must be disabled.  The default is off.

> Maybe what we need is a design for allowing more than one query ID.
> 
> > I do think we need _some_ method in core if we are going to be exposing
> > this value in pg_stat_activity and log_line_prefix.
> 
> I'm basically objecting to the conclusion that we should do either
> of those.  There is no way around the fact that it will break every
> user of Query.queryId other than pg_stat_statements, unless they
> are okay with whatever definition pg_stat_statements is using (which
> is a moving target BTW).

I thought the above doc patch feature avoided this problem because an
extension can override the build-in query id.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:
>> I still say that it's a serious mistake to sanctify a query ID calculation
>> method that was designed only for pg_stat_statement's needs as the one
>> true way to do it.  But that's what exposing it in a core view would do.

> OK, I am fine with creating a new method, and maybe having
> pg_stat_statements use it.  Is that the direction we should be going in?

The point is that we've understood Query.queryId as something that
different extensions might calculate differently for their own needs.
In particular it's easy to imagine extensions that want an ID that is
less fuzzy than what pg_stat_statements wants.  We never had a plan for
how two such extensions could co-exist, but at least it was possible
to use one if you didn't use another.  If this gets moved into core
then there will basically be only one way that anyone can do it.

Maybe what we need is a design for allowing more than one query ID.

> I do think we need _some_ method in core if we are going to be exposing
> this value in pg_stat_activity and log_line_prefix.

I'm basically objecting to the conclusion that we should do either
of those.  There is no way around the fact that it will break every
user of Query.queryId other than pg_stat_statements, unless they
are okay with whatever definition pg_stat_statements is using (which
is a moving target BTW).

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Bruce Momjian
On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > We are reaching the two-year mark on this feature, that everyone seems
> > to agree is needed.  Is any committer going to work on this to get it
> > into PG 14?  Should I take it?
> 
> I still say that it's a serious mistake to sanctify a query ID calculation
> method that was designed only for pg_stat_statement's needs as the one
> true way to do it.  But that's what exposing it in a core view would do.

OK, I am fine with creating a new method, and maybe having
pg_stat_statements use it.  Is that the direction we should be going in?
I do think we need _some_ method in core if we are going to be exposing
this value in pg_stat_activity and log_line_prefix.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Tom Lane
Bruce Momjian  writes:
> We are reaching the two-year mark on this feature, that everyone seems
> to agree is needed.  Is any committer going to work on this to get it
> into PG 14?  Should I take it?

I still say that it's a serious mistake to sanctify a query ID calculation
method that was designed only for pg_stat_statement's needs as the one
true way to do it.  But that's what exposing it in a core view would do.

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Bruce Momjian
On Sun, Mar 14, 2021 at 04:06:45PM +0800, Julien Rouhaud wrote:
> Recent conflict, thanks to cfbot.  v18 attached.

We are reaching the two-year mark on this feature, that everyone seems
to agree is needed.  Is any committer going to work on this to get it
into PG 14?  Should I take it?

I just read the thread and I didn't see any open issues.  Are there any?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-14 Thread Julien Rouhaud
Recent conflict, thanks to cfbot.  v18 attached.
>From fa94eba58ee0ca098cfde0d17de72dc230ee471c Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v18 1/3] Move pg_stat_statements query jumbling to core.

A new compute_queryid GUC is also added, to control whether the queryid should
be computed.  It's now possible to disable core queryid computation and use
pg_stat_statements with a different algorithm to compute the queryid by using
third-party module.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  18 +
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  | 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h   |  58 ++
 12 files changed, 969 insertions(+), 784 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..99bc7184cb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by the core, if compute_queryid is enabled, or
+ * by third-party modules if enabled.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;
-
-	/* Allocated length of clocations array */
-	int			clocations_buf_size;
-
-	/* Current number of valid entries in clocations array */
-	int			clocations_count;
-
-	/* highest Param id we've seen, in order to start normalization correctly */
-	int			highest_extern_param_id;
-} pgssJumbleState;
-
 /* Local variables */
 
 /* Current nesting depth of ExecutorRun+ProcessUtility calls */
@@ -342,7 +291,8 @@ 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-01 Thread Julien Rouhaud
On Wed, Jan 20, 2021 at 12:43 AM Julien Rouhaud  wrote:
>
> On Fri, Jan 8, 2021 at 1:07 AM Julien Rouhaud  wrote:
> >
> > v15 that fixes recent conflicts.
>
> Rebase only, thanks to the cfbot!  V16 attached.

Recent commit exposed that the explain_filter() doesn't filter
negative sign.  This can now be a problem with query identifiers in
explain output as they use the whole bigint range.  v17 attached fixes
that, also rebased against current HEAD.
From 8904b60ed3cc770f209ae5f97804b4d1ffe7d175 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v17 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 14 +++---
 doc/src/sgml/ref/explain.sgml |  6 --
 src/backend/commands/explain.c| 18 ++
 src/test/regress/expected/explain.out | 11 ++-
 src/test/regress/sql/explain.sql  |  5 -
 5 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1763790473..2aeb146223 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7524,13 +7524,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 Enables or disables in core query identifier computation.  A query
 identifier can be displayed in the pg_stat_activity
-view, or emitted in the log if configured via the  parameter.  The  extension also requires a query identifier
-to be computed.  Note that an external module can alternatively be used
-if the in core query identifier computation specification doesn't suit
-your need.  In this case, in core computation must be disabled.  The
-default is off.
+view, using EXPLAIN, or emitted in the log if
+configured via the  parameter.
+The  extension also requires a query
+identifier to be computed.  Note that an external module can
+alternatively be used if the in core query identifier computation
+specification doesn't suit your need.  In this case, in core
+computation must be disabled.  The default is off.

   
  
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..105b069b41 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
   the output column list for each node in the plan tree, schema-qualify
   table and function names, always label variables in expressions with
   their range table alias, and always print the name of each trigger for
-  which statistics are displayed.  This parameter defaults to
-  FALSE.
+  which statistics are displayed.  The query identifier will also be
+  displayed if one has been compute, see  for more details.  This parameter
+  defaults to FALSE.
  
 

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..ac5879c1cf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate = NULL;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_queryid)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -598,6 +608,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	{
+		char	buf[MAXINT8LEN+1];
+
+		pg_lltoa(plannedstmt->queryId, buf);
+		ExplainPropertyText("Query Identifier", buf, es);
+	}
+
 	/* Show buffer usage in planning */
 	if (bufusage)
 	{
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index dc7ab2ce8b..f45f069f30 100644
--- 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-19 Thread Julien Rouhaud
Hello Yamada-san,

On Wed, Jan 20, 2021 at 10:06 AM Tatsuro Yamada
 wrote:
>
> Hi Julien,
>
>
> >> Rebase only, thanks to the cfbot!  V16 attached.
> >
> > I tested the v16 patch on a0efda88a by using "make installcheck-parallel", 
> > and
> > my result is the following. Attached file is regression.diffs.
>
>
> Sorry, my environment was not suitable for the test when I sent my previous 
> email.
> I fixed my environment and tested it again, and it was a success. See below:
>
> ===
>   All 202 tests passed.
> ===

No worries, thanks a lot for testing!




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-19 Thread Tatsuro Yamada

Hi Julien,



Rebase only, thanks to the cfbot!  V16 attached.


I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
my result is the following. Attached file is regression.diffs.



Sorry, my environment was not suitable for the test when I sent my previous 
email.
I fixed my environment and tested it again, and it was a success. See below:

===
 All 202 tests passed.
===

Regards,
Tatsuro Yamada






Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-19 Thread Tatsuro Yamada

Hi Julien,


Rebase only, thanks to the cfbot!  V16 attached.


I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
my result is the following. Attached file is regression.diffs.


 1 of 202 tests failed.


The differences that caused some tests to fail can be viewed in the
file "/home/postgres/PG140/src/test/regress/regression.diffs".  A copy of the 
test summary that you see
above is saved in the file 
"/home/postgres/PG140/src/test/regress/regression.out".


src/test/regress/regression.diffs
-
diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out 
/home/postgres/PG140/src/test/regress/results/rules.out
--- /home/postgres/PG140/src/test/regress/expected/rules.out2021-01-20 
08:41:16.383175559 +0900
+++ /home/postgres/PG140/src/test/regress/results/rules.out 2021-01-20 
08:43:46.589171774 +0900
@@ -1760,10 +1760,9 @@
 s.state,
 s.backend_xid,
 s.backend_xmin,
-s.queryid,
 s.query,
 s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
...

Thanks,
Tatsuro Yamada
diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out 
/home/postgres/PG140/src/test/regress/results/rules.out
--- /home/postgres/PG140/src/test/regress/expected/rules.out2021-01-20 
08:41:16.383175559 +0900
+++ /home/postgres/PG140/src/test/regress/results/rules.out 2021-01-20 
08:52:10.891159065 +0900
@@ -1760,10 +1760,9 @@
 s.state,
 s.backend_xid,
 s.backend_xmin,
-s.queryid,
 s.query,
 s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
  LEFT JOIN pg_database d ON ((s.datid = d.oid)))
  LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1875,7 +1874,7 @@
 s.gss_auth AS gss_authenticated,
 s.gss_princ AS principal,
 s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
 s.datid,
@@ -2032,7 +2031,7 @@
 w.sync_priority,
 w.sync_state,
 w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-19 Thread Julien Rouhaud
On Fri, Jan 8, 2021 at 1:07 AM Julien Rouhaud  wrote:
>
> v15 that fixes recent conflicts.

Rebase only, thanks to the cfbot!  V16 attached.
From a0388c53d9755cfd706513f7f02a08b31a48aacb Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v16 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++---
 doc/src/sgml/config.sgml  |  29 +++--
 doc/src/sgml/monitoring.sgml  |  16 +++
 src/backend/catalog/system_views.sql  |   1 +
 src/backend/executor/execMain.c   |   8 ++
 src/backend/executor/execParallel.c   |  14 ++-
 src/backend/executor/nodeGather.c |   3 +-
 src/backend/executor/nodeGatherMerge.c|   4 +-
 src/backend/parser/analyze.c  |   5 +
 src/backend/postmaster/pgstat.c   |  65 ++
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/adt/pgstatfuncs.c   |   7 +-
 src/backend/utils/error/elog.c|   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  |  29 +++--
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/executor/execParallel.h   |   3 +-
 src/include/pgstat.h  |   5 +
 src/include/utils/queryjumble.h   |   2 +-
 src/test/regress/expected/rules.out   |   9 +-
 20 files changed, 223 insertions(+), 110 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 3db4fa2f7a..ce166f417e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+	!IsA(n, PrepareStmt) && \
+	!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 	   int query_location, int query_len,
 	   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for prepared statements related utility, as those will
+	 * inherit from the underlying statement's one (except DEALLOCATE which is
+	 * entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-07 Thread Julien Rouhaud
On Sun, Oct 18, 2020 at 4:12 PM Julien Rouhaud  wrote:
>
> On Sun, Oct 18, 2020 at 12:20 PM Tom Lane  wrote:
> >
> > Alvaro Herrera  writes:
> > > Wait ... what?  I've been thinking that this GUC is just to enable or
> > > disable the computation of query ID, not to change the algorithm to do
> > > so.  Do we really need to allow different algorithms in different
> > > sessions?
> >
> > We established that some time ago, no?
>
> I thought we established the need for allowing different algorithms,
> but I assumed globally not per session.  Anyway, allowing to enable or
> disable compute_queryid per session would technically allow that,
> assuming that you have another module loaded that computes a queryid
> only if no-one was already computed.  In that case pg_stat_statements
> works as you would expect, you will get a new entry, with a duplicated
> query text.
>
> With a bit more thinking, there's at least one use case where it's
> interesting to disable pg_stat_statements: queries using temporary
> tables.  In that case you're guaranteed to generate an infinity of
> different queryid.  That doesn't really help since you're not
> aggregating anything anymore, and it also makes pg_stat_statements
> virtually unusable as once you have a workload that needs frequent
> eviction, the overhead is so bad that you basically have to disable
> pg_stat_statements.  We could alternatively add a GUC to disable
> queryid computation when one of the tables is a temporary table, but
> that's yet one among many considerations that are probably best
> answered with a custom implementation.
>
> I'm also attaching an updated patch with some attempt to improve the
> documentation.  I mention that in-core algorithm may not suits
> everyone's needs, but we don't actually document what heuristics are.
> Should we give more details on them and what are the most direct
> consequences?

v15 that fixes recent conflicts.
From 952796fa1c65000948ed2a267f76676e354c989c Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v15 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 14 +++---
 doc/src/sgml/ref/explain.sgml |  6 --
 src/backend/commands/explain.c| 18 ++
 src/test/regress/expected/explain.out |  9 +
 src/test/regress/sql/explain.sql  |  3 +++
 5 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d9c85a1f80..1d2f7ba393 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7487,13 +7487,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 Enables or disables in core query identifier computation.  A query
 identifier can be displayed in the pg_stat_activity
-view, or emitted in the log if configured via the  parameter.  The  extension also requires a query identifier
-to be computed.  Note that an external module can alternatively be used
-if the in core query identifier computation specification doesn't suit
-your need.  In this case, in core computation must be disabled.  The
-default is off.
+view, using EXPLAIN, or emitted in the log if
+configured via the  parameter.
+The  extension also requires a query
+identifier to be computed.  Note that an external module can
+alternatively be used if the in core query identifier computation
+specification doesn't suit your need.  In this case, in core
+computation must be disabled.  The default is off.

   
  
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..105b069b41 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
   the output column list for each node in the plan tree, schema-qualify
   table and function names, always label variables in expressions with
   their range table alias, and always print the name of each trigger for
-  which statistics are displayed.  This parameter defaults to
-  FALSE.
+  which statistics are displayed.  The query identifier will also be
+  displayed if one has been compute, see  for more details.  This parameter
+  defaults to FALSE.
  
 

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..2e1b4bf0bf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-18 Thread Julien Rouhaud
On Sun, Oct 18, 2020 at 12:20 PM Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > Wait ... what?  I've been thinking that this GUC is just to enable or
> > disable the computation of query ID, not to change the algorithm to do
> > so.  Do we really need to allow different algorithms in different
> > sessions?
>
> We established that some time ago, no?

I thought we established the need for allowing different algorithms,
but I assumed globally not per session.  Anyway, allowing to enable or
disable compute_queryid per session would technically allow that,
assuming that you have another module loaded that computes a queryid
only if no-one was already computed.  In that case pg_stat_statements
works as you would expect, you will get a new entry, with a duplicated
query text.

With a bit more thinking, there's at least one use case where it's
interesting to disable pg_stat_statements: queries using temporary
tables.  In that case you're guaranteed to generate an infinity of
different queryid.  That doesn't really help since you're not
aggregating anything anymore, and it also makes pg_stat_statements
virtually unusable as once you have a workload that needs frequent
eviction, the overhead is so bad that you basically have to disable
pg_stat_statements.  We could alternatively add a GUC to disable
queryid computation when one of the tables is a temporary table, but
that's yet one among many considerations that are probably best
answered with a custom implementation.

I'm also attaching an updated patch with some attempt to improve the
documentation.  I mention that in-core algorithm may not suits
everyone's needs, but we don't actually document what heuristics are.
Should we give more details on them and what are the most direct
consequences?
From 4b1f4ed2bfc2917879a33cc1348157f2fffd0cb4 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v14 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++---
 doc/src/sgml/config.sgml  |  29 +++--
 doc/src/sgml/monitoring.sgml  |  16 +++
 src/backend/catalog/system_views.sql  |   1 +
 src/backend/executor/execMain.c   |   8 ++
 src/backend/executor/execParallel.c   |  14 ++-
 src/backend/executor/nodeGather.c |   3 +-
 src/backend/executor/nodeGatherMerge.c|   4 +-
 src/backend/parser/analyze.c  |   5 +
 src/backend/postmaster/pgstat.c   |  65 ++
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/adt/pgstatfuncs.c   |   7 +-
 src/backend/utils/error/elog.c|  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  |  29 +++--
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/executor/execParallel.h   |   3 +-
 src/include/pgstat.h  |   5 +
 src/include/utils/queryjumble.h   |   2 +-
 src/test/regress/expected/rules.out   |   9 +-
 20 files changed, 224 insertions(+), 110 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f352d0b615..2a69dbb88e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 
 PG_MODULE_MAGIC;
@@ -98,6 +99,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+	!IsA(n, PrepareStmt) && \
+	!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -295,7 +304,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-17 Thread Tom Lane
Alvaro Herrera  writes:
> Wait ... what?  I've been thinking that this GUC is just to enable or
> disable the computation of query ID, not to change the algorithm to do
> so.  Do we really need to allow different algorithms in different
> sessions?

We established that some time ago, no?

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-17 Thread Alvaro Herrera
On 2020-Oct-17, Tom Lane wrote:

> Fair point, but if we allow several different values to be set in
> different sessions, what ends up happening in pg_stat_statements?
> 
> On the other hand, maybe that's just a matter for documentation.
> "If the 'same' query is processed with two different queryID settings,
> that will generally result in two separate table entries, because
> the same ID hash is unlikely to be produced in both cases".

Wait ... what?  I've been thinking that this GUC is just to enable or
disable the computation of query ID, not to change the algorithm to do
so.  Do we really need to allow different algorithms in different
sessions?




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-17 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Oct-17, Julien Rouhaud wrote:
>> On Sat, Oct 17, 2020 at 12:23 AM Tom Lane  wrote:
>>> then there's a potential security issue if the GUC is USERSET level:
>>> a user could hide her queries from pg_stat_statement by turning the
>>> GUC off.  So this line of thought suggests the GUC needs to be at
>>> least SUSET, and maybe higher ... doesn't pg_stat_statement need it
>>> to have the same value cluster-wide?

> I don't think we should consider pg_stat_statement a bulletproof defense
> for security problems.  It is already lossy by design.

Fair point, but if we allow several different values to be set in
different sessions, what ends up happening in pg_stat_statements?

On the other hand, maybe that's just a matter for documentation.
"If the 'same' query is processed with two different queryID settings,
that will generally result in two separate table entries, because
the same ID hash is unlikely to be produced in both cases".  There
is certainly a use-case for wanting to be able to do this, if for
example you'd like different query aggregation behavior for different
applications.

> I do think it'd be preferrable if we allowed it to be disabled at the
> config file level only, not with SET (prevent users from hiding stuff);
> but I think it is useful to allow users to enable it for specific
> queries or for specific sessions only, while globally disabled.

Indeed.  I'm kind of talking myself into the idea that USERSET, or
at most SUSET, is fine, so long as we document what happens when it
has different values in different sessions.

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-17 Thread Alvaro Herrera
On 2020-Oct-17, Julien Rouhaud wrote:

> On Sat, Oct 17, 2020 at 12:23 AM Tom Lane  wrote:

> > then there's a potential security issue if the GUC is USERSET level:
> > a user could hide her queries from pg_stat_statement by turning the
> > GUC off.  So this line of thought suggests the GUC needs to be at
> > least SUSET, and maybe higher ... doesn't pg_stat_statement need it
> > to have the same value cluster-wide?
> 
> Well, I don't think that there's any guarantee that pg_stat_statemens
> will display all activity that has been run, since there's a limited
> amount of (userid, dbid, queryid) that can be stored, but I agree that
> allowing random user to hide their activity isn't nice.  Note that I
> defined the GUC as SUSET, but maybe it should be SIGHUP?

I don't think we should consider pg_stat_statement a bulletproof defense
for security problems.  It is already lossy by design.

I do think it'd be preferrable if we allowed it to be disabled at the
config file level only, not with SET (prevent users from hiding stuff);
but I think it is useful to allow users to enable it for specific
queries or for specific sessions only, while globally disabled.  This
might mean we need to mark it PGC_SIGHUP and then have the check hook
disallow it from being changed under such-and-such conditions.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-16 Thread Julien Rouhaud
On Fri, Oct 16, 2020 at 11:04 PM Bruce Momjian  wrote:
>
> On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote:
> > On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian  wrote:
> > > There is that, and log_line_prefix, which I can imaging being useful.
> > > My point is that if the queryid is visible, there should be a reason it
> > > defaults to show empty.
> >
> > I did some naive benchmarking.  Using a custom pgbench script with this 
> > query:
> >
> > SELECT *
> > FROM pg_class c
> > JOIN pg_attribute a ON a.attrelid = c.oid
> > ORDER BY 1 DESC
> > LIMIT 1;
> >
> > I can see around 2% overhead (this query is reported with ~ 3ms
> > latency average).  Adding a few joins, overhead goes down to 1%.
>
> That number is too high to enable this by default.  I suggest we either
> improve the performance of this, or clearly document that you have to
> enable the hash computation to see the pg_stat_activity and
> log_line_prefix fields.

I realize that I didn't update the documentation part to reflect the
new GUC.  I'll fix that and add more warnings about the requirements
to have values displayed in pg_stat_acitivity and log_line_prefix.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-16 Thread Julien Rouhaud
On Sat, Oct 17, 2020 at 12:23 AM Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > In this case, I suppose using pg_stat_statement would require to have it
> > enabled, and it'd just not collect anything if disabled.

Yes, my idea was to be able to have pg_stat_statements enabled even if
no queryid is computed without that being a problem, and the patch I
sent should handle that properly, as pgss_store (and a few other
places) check for a non-zero queryid before doing any work.

Also, we can't have pg_stat_statements have any specific behavior
based on the new GUC, as there could alternatively be another module
that handles the queryid generation.

> Alternatively, pg_stat_statement might be able to force it on
> (applying a non-overridable PGC_INTERNAL-level setting) on load?
> Not sure if that'd be desirable or not.
>
> If the behavior of pg_stat_statement is to do nothing when it
> sees a query without the ID calculated (which I guess it'd have to)

Yes that's what it does.

> then there's a potential security issue if the GUC is USERSET level:
> a user could hide her queries from pg_stat_statement by turning the
> GUC off.  So this line of thought suggests the GUC needs to be at
> least SUSET, and maybe higher ... doesn't pg_stat_statement need it
> to have the same value cluster-wide?

Well, I don't think that there's any guarantee that pg_stat_statemens
will display all activity that has been run, since there's a limited
amount of (userid, dbid, queryid) that can be stored, but I agree that
allowing random user to hide their activity isn't nice.  Note that I
defined the GUC as SUSET, but maybe it should be SIGHUP?




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-16 Thread Bruce Momjian
On Fri, Oct 16, 2020 at 01:03:55PM -0300, Álvaro Herrera wrote:
> On 2020-Oct-16, Bruce Momjian wrote:
> 
> > On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote:
> 
> > > I did some naive benchmarking.  Using a custom pgbench script with this 
> > > query:
> 
> > > I can see around 2% overhead (this query is reported with ~ 3ms
> > > latency average).  Adding a few joins, overhead goes down to 1%.
> > 
> > That number is too high to enable this by default.  I suggest we either
> > improve the performance of this, or clearly document that you have to
> > enable the hash computation to see the pg_stat_activity and
> > log_line_prefix fields.
> 
> Agreed.  This is similar to how we used to deal with query strings: an
> optional feature, disabled by default (cf.  commit b13c9686d084).
> 
> In this case, I suppose using pg_stat_statement would require to have it
> enabled, and it'd just not collect anything if disabled.  Similarly, the
> field would show NULL in pg_stat_activity or an empty string in
> log_line_prefix/CSV logs.

Yes, and at each use point, e.g., pg_stat_activity, log_line_prefix, we
have to remind people how to turn hash compuation on.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-16 Thread Tom Lane
Alvaro Herrera  writes:
> In this case, I suppose using pg_stat_statement would require to have it
> enabled, and it'd just not collect anything if disabled.

Alternatively, pg_stat_statement might be able to force it on
(applying a non-overridable PGC_INTERNAL-level setting) on load?
Not sure if that'd be desirable or not.

If the behavior of pg_stat_statement is to do nothing when it
sees a query without the ID calculated (which I guess it'd have to)
then there's a potential security issue if the GUC is USERSET level:
a user could hide her queries from pg_stat_statement by turning the
GUC off.  So this line of thought suggests the GUC needs to be at
least SUSET, and maybe higher ... doesn't pg_stat_statement need it
to have the same value cluster-wide?

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-16 Thread Alvaro Herrera
On 2020-Oct-16, Bruce Momjian wrote:

> On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote:

> > I did some naive benchmarking.  Using a custom pgbench script with this 
> > query:

> > I can see around 2% overhead (this query is reported with ~ 3ms
> > latency average).  Adding a few joins, overhead goes down to 1%.
> 
> That number is too high to enable this by default.  I suggest we either
> improve the performance of this, or clearly document that you have to
> enable the hash computation to see the pg_stat_activity and
> log_line_prefix fields.

Agreed.  This is similar to how we used to deal with query strings: an
optional feature, disabled by default (cf.  commit b13c9686d084).

In this case, I suppose using pg_stat_statement would require to have it
enabled, and it'd just not collect anything if disabled.  Similarly, the
field would show NULL in pg_stat_activity or an empty string in
log_line_prefix/CSV logs.

So users that want it can easily have it, and users that don't are not
paying the price.

For maximum user-friendliness, pg_stat_statement could be loaded and
shmem-initialized even when query ID computation is turned off, and
you'd be able to enable query ID computation with just SIGHUP; so you
don't have to restart the server in order to enable statement tracking.
(I suppose we would forbid users from disabling query ID with SET,
though.)




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-16 Thread Bruce Momjian
On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote:
> On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian  wrote:
> > There is that, and log_line_prefix, which I can imaging being useful.
> > My point is that if the queryid is visible, there should be a reason it
> > defaults to show empty.
> 
> I did some naive benchmarking.  Using a custom pgbench script with this query:
> 
> SELECT *
> FROM pg_class c
> JOIN pg_attribute a ON a.attrelid = c.oid
> ORDER BY 1 DESC
> LIMIT 1;
> 
> I can see around 2% overhead (this query is reported with ~ 3ms
> latency average).  Adding a few joins, overhead goes down to 1%.

That number is too high to enable this by default.  I suggest we either
improve the performance of this, or clearly document that you have to
enable the hash computation to see the pg_stat_activity and
log_line_prefix fields.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-14 Thread Julien Rouhaud
On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian  wrote:
>
> On Wed, Oct 14, 2020 at 10:34:31PM +0800, Julien Rouhaud wrote:
> > On Wed, Oct 14, 2020 at 10:31 PM Tom Lane  wrote:
> > >
> > > Bruce Momjian  writes:
> > > > Is there a measureable overhead when this is turned on, since it is off
> > > > by default and maybe should default to on.
> > >
> > > I don't believe that "default to on" can even be in the discussion.
> > > There is no in-core feature that would use this by default.
> >
> > If the 2nd patch is applied there would be pg_stat_activity.queryid
> > column, but I doubt that's a strong enough argument.
>
> There is that, and log_line_prefix, which I can imaging being useful.
> My point is that if the queryid is visible, there should be a reason it
> defaults to show empty.

I did some naive benchmarking.  Using a custom pgbench script with this query:

SELECT *
FROM pg_class c
JOIN pg_attribute a ON a.attrelid = c.oid
ORDER BY 1 DESC
LIMIT 1;

I can see around 2% overhead (this query is reported with ~ 3ms
latency average).  Adding a few joins, overhead goes down to 1%.
Adding on top of the join some WHERE and GROUP BY conditions, overhead
goes down to 0.2% (at that point average latency is around 9ms on my
laptop).  So having this enabled by default is probably only going to
hit people with OLTP-style workload with a majority of queries running
in a couple of milliseconds or less, which isn't that uncommon.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-14 Thread Bruce Momjian
On Wed, Oct 14, 2020 at 10:34:31PM +0800, Julien Rouhaud wrote:
> On Wed, Oct 14, 2020 at 10:31 PM Tom Lane  wrote:
> >
> > Bruce Momjian  writes:
> > > Is there a measureable overhead when this is turned on, since it is off
> > > by default and maybe should default to on.
> >
> > I don't believe that "default to on" can even be in the discussion.
> > There is no in-core feature that would use this by default.
> 
> If the 2nd patch is applied there would be pg_stat_activity.queryid
> column, but I doubt that's a strong enough argument.

There is that, and log_line_prefix, which I can imaging being useful. 
My point is that if the queryid is visible, there should be a reason it
defaults to show empty.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-14 Thread Julien Rouhaud
On Wed, Oct 14, 2020 at 10:31 PM Tom Lane  wrote:
>
> Bruce Momjian  writes:
> > Is there a measureable overhead when this is turned on, since it is off
> > by default and maybe should default to on.
>
> I don't believe that "default to on" can even be in the discussion.
> There is no in-core feature that would use this by default.

If the 2nd patch is applied there would be pg_stat_activity.queryid
column, but I doubt that's a strong enough argument.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-14 Thread Tom Lane
Bruce Momjian  writes:
> Is there a measureable overhead when this is turned on, since it is off
> by default and maybe should default to on.

I don't believe that "default to on" can even be in the discussion.
There is no in-core feature that would use this by default.

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-14 Thread Bruce Momjian
On Wed, Oct 14, 2020 at 10:21:24PM +0800, Julien Rouhaud wrote:
> On Wed, Oct 14, 2020 at 10:09 PM Bruce Momjian  wrote:
> > > > OK, I came up with the hash idea only to address one of your concerns
> > > > about mismatched hashes for algorithm improvements/changes.  Seems we
> > > > might as well just document that cross-version hashes are different.
> > >
> > > Ok, so I tried to implement what seems to be the consensus.  First
> > > attached patch moves the current pgss queryid computation in core,
> > > with a new compute_queryid GUC (on/off).  One thing I don't really
> >
> > Why would someone turn compute_queryid off?   Overhead?
> 
> Yes, or possibly to use a different algorithm.

Is there a measureable overhead when this is turned on, since it is off
by default and maybe should default to on.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-14 Thread Julien Rouhaud
On Wed, Oct 14, 2020 at 10:09 PM Bruce Momjian  wrote:
>
> On Wed, Oct 14, 2020 at 05:43:33PM +0800, Julien Rouhaud wrote:
> > On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian  wrote:
> > >
> > > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:
> > > > Bruce Momjian  writes:
> > > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> > > > >> Yeah, I agree --- a version number is the wrong way to think about 
> > > > >> this.
> > > >
> > > > > The version number was to invalidate _all_ query hashes if the
> > > > > algorithm is slightly modified, rather than invalidating just some of
> > > > > them, which could lead to confusion.
> > > >
> > > > Color me skeptical as to the use-case for that.  From users' 
> > > > standpoints,
> > > > the hash is mainly going to change when we change the set of parse node
> > > > fields that get hashed.  Which is going to happen at every major release
> > > > and no (or at least epsilon) minor releases.  So I do not see a point in
> > > > tracking an algorithm version number as such.  Seems like make-work.
> > >
> > > OK, I came up with the hash idea only to address one of your concerns
> > > about mismatched hashes for algorithm improvements/changes.  Seems we
> > > might as well just document that cross-version hashes are different.
> >
> > Ok, so I tried to implement what seems to be the consensus.  First
> > attached patch moves the current pgss queryid computation in core,
> > with a new compute_queryid GUC (on/off).  One thing I don't really
>
> Why would someone turn compute_queryid off?   Overhead?

Yes, or possibly to use a different algorithm.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-14 Thread Bruce Momjian
On Wed, Oct 14, 2020 at 05:43:33PM +0800, Julien Rouhaud wrote:
> On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian  wrote:
> >
> > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> > > >> Yeah, I agree --- a version number is the wrong way to think about 
> > > >> this.
> > >
> > > > The version number was to invalidate _all_ query hashes if the
> > > > algorithm is slightly modified, rather than invalidating just some of
> > > > them, which could lead to confusion.
> > >
> > > Color me skeptical as to the use-case for that.  From users' standpoints,
> > > the hash is mainly going to change when we change the set of parse node
> > > fields that get hashed.  Which is going to happen at every major release
> > > and no (or at least epsilon) minor releases.  So I do not see a point in
> > > tracking an algorithm version number as such.  Seems like make-work.
> >
> > OK, I came up with the hash idea only to address one of your concerns
> > about mismatched hashes for algorithm improvements/changes.  Seems we
> > might as well just document that cross-version hashes are different.
> 
> Ok, so I tried to implement what seems to be the consensus.  First
> attached patch moves the current pgss queryid computation in core,
> with a new compute_queryid GUC (on/off).  One thing I don't really

Why would someone turn compute_queryid off?   Overhead?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-14 Thread Julien Rouhaud
On Wed, Oct 14, 2020 at 5:43 PM Julien Rouhaud  wrote:
>
> On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian  wrote:
> >
> > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> > > >> Yeah, I agree --- a version number is the wrong way to think about 
> > > >> this.
> > >
> > > > The version number was to invalidate _all_ query hashes if the
> > > > algorithm is slightly modified, rather than invalidating just some of
> > > > them, which could lead to confusion.
> > >
> > > Color me skeptical as to the use-case for that.  From users' standpoints,
> > > the hash is mainly going to change when we change the set of parse node
> > > fields that get hashed.  Which is going to happen at every major release
> > > and no (or at least epsilon) minor releases.  So I do not see a point in
> > > tracking an algorithm version number as such.  Seems like make-work.
> >
> > OK, I came up with the hash idea only to address one of your concerns
> > about mismatched hashes for algorithm improvements/changes.  Seems we
> > might as well just document that cross-version hashes are different.
>
> Ok, so I tried to implement what seems to be the consensus.  First
> attached patch moves the current pgss queryid computation in core,
> with a new compute_queryid GUC (on/off).  One thing I don't really
> like about this patch is that the JumbleState that pgss needs in order
> to normalize the query string (the constants location and such) has to
> be done by the core while computing the queryid and provided to pgss
> in post_parse_analyse hook.  That isn't ideal as it looks very
> specific to pgss needs.  On the other hand it means that you can now
> use pgss with custom queryid heuristics by disabling compute_queryid
> and having your module doing only that in post_parse_analyse_hook.
> You'll however need to be careful to configure
> shared_preload_libraries such that your custom module's
> post_parse_analyse_hook is called first, so pgss' one can be called
> with the needed JumbleState.  Note that if no JumbleState is provided
> pgss will store non normalized queries, but will otherwise behave as
> intended.
>
> The 2nd patch is the rebased original queryid exposure patch.  No big
> changes, except that it now handles utility statements queryid
> generated during post_parse_analysis, same as regular queries.  This
> should simplify the work needed for custom queryid third party
> modules.
>
> The 3rd patch changes explain (verbose) to display the queryid if one
> has been generated, whether by core or a third-party module.  For
> instance:
>
> rjuju=# set compute_queryid = on;
> SET
> rjuju=# explain (verbose) select relname from pg_class;
>   QUERY PLAN
> ---
>  Seq Scan on pg_catalog.pg_class  (cost=0.00..16.90 rows=390 width=64)
>Output: relname
>  Query Identifier: -5494854185674379299
> (3 rows)

There was a possibly uninitialized var issue in the previous patches
(thanks cfbot), v13 fixes that.
From ee578a9128898d69ff50bf5db59bebf55ed13250 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v13 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++---
 doc/src/sgml/config.sgml  |   9 +-
 doc/src/sgml/monitoring.sgml  |  15 +++
 src/backend/catalog/system_views.sql  |   1 +
 src/backend/executor/execMain.c   |   8 ++
 src/backend/executor/execParallel.c   |  14 ++-
 src/backend/executor/nodeGather.c |   3 +-
 src/backend/executor/nodeGatherMerge.c|   4 +-
 src/backend/parser/analyze.c  |   5 +
 src/backend/postmaster/pgstat.c   |  65 ++
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/adt/pgstatfuncs.c   |   7 +-
 src/backend/utils/error/elog.c|  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  |  29 +++--
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/executor/execParallel.h   |   3 +-
 src/include/pgstat.h  |   5 +
 src/include/utils/queryjumble.h   |   2 +-
 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-14 Thread Julien Rouhaud
On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian  wrote:
>
> On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> > >> Yeah, I agree --- a version number is the wrong way to think about this.
> >
> > > The version number was to invalidate _all_ query hashes if the
> > > algorithm is slightly modified, rather than invalidating just some of
> > > them, which could lead to confusion.
> >
> > Color me skeptical as to the use-case for that.  From users' standpoints,
> > the hash is mainly going to change when we change the set of parse node
> > fields that get hashed.  Which is going to happen at every major release
> > and no (or at least epsilon) minor releases.  So I do not see a point in
> > tracking an algorithm version number as such.  Seems like make-work.
>
> OK, I came up with the hash idea only to address one of your concerns
> about mismatched hashes for algorithm improvements/changes.  Seems we
> might as well just document that cross-version hashes are different.

Ok, so I tried to implement what seems to be the consensus.  First
attached patch moves the current pgss queryid computation in core,
with a new compute_queryid GUC (on/off).  One thing I don't really
like about this patch is that the JumbleState that pgss needs in order
to normalize the query string (the constants location and such) has to
be done by the core while computing the queryid and provided to pgss
in post_parse_analyse hook.  That isn't ideal as it looks very
specific to pgss needs.  On the other hand it means that you can now
use pgss with custom queryid heuristics by disabling compute_queryid
and having your module doing only that in post_parse_analyse_hook.
You'll however need to be careful to configure
shared_preload_libraries such that your custom module's
post_parse_analyse_hook is called first, so pgss' one can be called
with the needed JumbleState.  Note that if no JumbleState is provided
pgss will store non normalized queries, but will otherwise behave as
intended.

The 2nd patch is the rebased original queryid exposure patch.  No big
changes, except that it now handles utility statements queryid
generated during post_parse_analysis, same as regular queries.  This
should simplify the work needed for custom queryid third party
modules.

The 3rd patch changes explain (verbose) to display the queryid if one
has been generated, whether by core or a third-party module.  For
instance:

rjuju=# set compute_queryid = on;
SET
rjuju=# explain (verbose) select relname from pg_class;
  QUERY PLAN
---
 Seq Scan on pg_catalog.pg_class  (cost=0.00..16.90 rows=390 width=64)
   Output: relname
 Query Identifier: -5494854185674379299
(3 rows)
From 4a81289f02e9bfb796317b32d492eb949c9ed4a1 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v12 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 src/backend/commands/explain.c| 18 ++
 src/test/regress/expected/explain.out |  9 +
 src/test/regress/sql/explain.sql  |  3 +++
 3 files changed, 30 insertions(+)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c8e292adfa..bb08c18a3a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
+	JumbleState *jstate;
+	Query		*query;
 	List	   *rewritten;
 	ListCell   *lc;
 	bool		timing_set = false;
@@ -239,6 +242,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	/* if the summary was not set explicitly, set default value */
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
+	query = castNode(Query, stmt->query);
+	if (compute_queryid)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
+	if (post_parse_analyze_hook)
+		(*post_parse_analyze_hook) (pstate, query, jstate);
+
 	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
@@ -582,6 +592,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-12 Thread Bruce Momjian
On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> >> Yeah, I agree --- a version number is the wrong way to think about this.
> 
> > The version number was to invalidate _all_ query hashes if the
> > algorithm is slightly modified, rather than invalidating just some of
> > them, which could lead to confusion.
> 
> Color me skeptical as to the use-case for that.  From users' standpoints,
> the hash is mainly going to change when we change the set of parse node
> fields that get hashed.  Which is going to happen at every major release
> and no (or at least epsilon) minor releases.  So I do not see a point in
> tracking an algorithm version number as such.  Seems like make-work.

OK, I came up with the hash idea only to address one of your concerns
about mismatched hashes for algorithm improvements/changes.  Seems we
might as well just document that cross-version hashes are different.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-12 Thread Tom Lane
Bruce Momjian  writes:
> On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
>> Yeah, I agree --- a version number is the wrong way to think about this.

> The version number was to invalidate _all_ query hashes if the
> algorithm is slightly modified, rather than invalidating just some of
> them, which could lead to confusion.

Color me skeptical as to the use-case for that.  From users' standpoints,
the hash is mainly going to change when we change the set of parse node
fields that get hashed.  Which is going to happen at every major release
and no (or at least epsilon) minor releases.  So I do not see a point in
tracking an algorithm version number as such.  Seems like make-work.

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-12 Thread Bruce Momjian
On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > I don't really understand how a version number helps. It's not like
> > there is going to be a v2 that is in all ways better than v1. If there
> > are different algorithms here, they are going to be customized for
> > different needs.
> 
> Yeah, I agree --- a version number is the wrong way to think about this.
> It's gonna be more like algorithm foo versus algorithm bar versus
> algorithm baz, where each one is better for a specific set of use-cases.
> Julien already noted the point about hashing object OIDs versus object
> names; one can easily imagine disagreeing with pg_stat_statement's
> choices about ignoring values of constants; other properties of statements
> might be irrelevant for some use-cases; and so on.

The version number was to invalidate _all_ query hashes if the
algorithm is slightly modified, rather than invalidating just some of
them, which could lead to confusion.  The idea of selectable hash
algorithms is nice if people feel there is sufficient need for that.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-12 Thread Tom Lane
Robert Haas  writes:
> I don't really understand how a version number helps. It's not like
> there is going to be a v2 that is in all ways better than v1. If there
> are different algorithms here, they are going to be customized for
> different needs.

Yeah, I agree --- a version number is the wrong way to think about this.
It's gonna be more like algorithm foo versus algorithm bar versus
algorithm baz, where each one is better for a specific set of use-cases.
Julien already noted the point about hashing object OIDs versus object
names; one can easily imagine disagreeing with pg_stat_statement's
choices about ignoring values of constants; other properties of statements
might be irrelevant for some use-cases; and so on.

I'm okay with moving pg_stat_statement's existing algorithm into core as
long as there's a way for extensions to override it.  With proper design,
that would allow extensions that do override it to coexist with
pg_stat_statements (thereby redefining the latter's idea of which
statements are "the same"), which is something that doesn't really work
nicely today.

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-12 Thread Robert Haas
On Mon, Oct 12, 2020 at 10:14 AM Bruce Momjian  wrote:
> On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote:
> > But there are many people that aren't happy with the current hashing
> > approach.  If we're going to move the computation in core, shouldn't
> > we listen to their complaints and let them pay some probably quite
> > high overhead to base the hash on name and/or fully qualified name
> > rather than OID?
> > For instance people using logical replication to upgrade to a newer
> > version may want to easily compare query performance on the new
> > version, or people with multi-tenant databases may want to ignore the
> > schema name to keep a low number of different queryid.
>
> Well, we have to consider how complex the user interface has to be to
> allow more flexibility.  We don't need to allow every option a user will
> want.
>
> With a version number, we have the ability to improve the algorithm or
> add customization, but for the first use, we are probably better off
> keeping it simple.

I thought your earlier idea of allowing this to be controlled by a GUC
was good. There could be a default method built into core, matching
what pg_stat_statements does, so you could select no hashing or that
method no matter what. Then extensions could provide other methods
which could be selected via the GUC.

I don't really understand how a version number helps. It's not like
there is going to be a v2 that is in all ways better than v1. If there
are different algorithms here, they are going to be customized for
different needs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-12 Thread Bruce Momjian
On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote:
> But there are many people that aren't happy with the current hashing
> approach.  If we're going to move the computation in core, shouldn't
> we listen to their complaints and let them pay some probably quite
> high overhead to base the hash on name and/or fully qualified name
> rather than OID?
> For instance people using logical replication to upgrade to a newer
> version may want to easily compare query performance on the new
> version, or people with multi-tenant databases may want to ignore the
> schema name to keep a low number of different queryid.

Well, we have to consider how complex the user interface has to be to
allow more flexibility.  We don't need to allow every option a user will
want.

With a version number, we have the ability to improve the algorithm or
add customization, but for the first use, we are probably better off
keeping it simple.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-12 Thread Julien Rouhaud
On Wed, Oct 7, 2020 at 9:53 AM Bruce Momjian  wrote:
>
> On Wed, Oct  7, 2020 at 10:42:49AM +0900, Michael Paquier wrote:
> > On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:
> > > I propose moving the pg_stat_statements queryid hash code into the
> > > server (with a version number), and also adding a postgresql.conf
> > > variable that lets you control how detailed the queryid hash is
> > > computed.  This addresses the problem of people wanting different hash
> > > methods.
> >
> > In terms of making this part expendable in the future, there could be
> > a point in having an enum here, but are we sure that we will have a
> > need for that in the future?  What I get from this discussion is that
> > we want a unique source of truth that users can consume, and that the
> > only source of truth proposed is the PGSS hashing.  We may change the
> > way we compute the query ID in the future, for example if it gets
> > expanded to some utility statements, etc.  But that would be
> > controlled by the version number in the hash, not the GUC itself.
>
> Oh, if that is true, then I agree let's just go with the version number.

But there are many people that aren't happy with the current hashing
approach.  If we're going to move the computation in core, shouldn't
we listen to their complaints and let them pay some probably quite
high overhead to base the hash on name and/or fully qualified name
rather than OID?
For instance people using logical replication to upgrade to a newer
version may want to easily compare query performance on the new
version, or people with multi-tenant databases may want to ignore the
schema name to keep a low number of different queryid.

It would probably still be possible to have a custom queryid hashing
by disabling the core one and computing a new one in a custom
extension, but that seems a bit hackish.

Jumping back on Tom's point that there are judgment calls on what is
examined or not, after a quick look I see at least two possible
problems of ignored clauses:
- WITH TIES clause
- OVERRIDING clause

I personally think that they shouldn't be ignored, but I don't know if
they were only forgotten or ignored on purpose.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-06 Thread Bruce Momjian
On Wed, Oct  7, 2020 at 10:42:49AM +0900, Michael Paquier wrote:
> On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:
> > I propose moving the pg_stat_statements queryid hash code into the
> > server (with a version number), and also adding a postgresql.conf
> > variable that lets you control how detailed the queryid hash is
> > computed.  This addresses the problem of people wanting different hash
> > methods.
> 
> In terms of making this part expendable in the future, there could be
> a point in having an enum here, but are we sure that we will have a
> need for that in the future?  What I get from this discussion is that
> we want a unique source of truth that users can consume, and that the
> only source of truth proposed is the PGSS hashing.  We may change the
> way we compute the query ID in the future, for example if it gets
> expanded to some utility statements, etc.  But that would be
> controlled by the version number in the hash, not the GUC itself.

Oh, if that is true, then I agree let's just go with the version number.

> > When computing a hash, the queryid detail level and version number will
> > be mixed into the hash, so only a hash that used a similar query and
> > identical queryid detail level would match.
> 
> Yes, having a version number directly dependent on the hashing sounds
> like a good compromise to me.

Good, much simpler.  I think there is enough demand for a queryid that I
would like to get this moving forward.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-06 Thread Michael Paquier
On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:
> I propose moving the pg_stat_statements queryid hash code into the
> server (with a version number), and also adding a postgresql.conf
> variable that lets you control how detailed the queryid hash is
> computed.  This addresses the problem of people wanting different hash
> methods.

In terms of making this part expendable in the future, there could be
a point in having an enum here, but are we sure that we will have a
need for that in the future?  What I get from this discussion is that
we want a unique source of truth that users can consume, and that the
only source of truth proposed is the PGSS hashing.  We may change the
way we compute the query ID in the future, for example if it gets
expanded to some utility statements, etc.  But that would be
controlled by the version number in the hash, not the GUC itself.

> When computing a hash, the queryid detail level and version number will
> be mixed into the hash, so only a hash that used a similar query and
> identical queryid detail level would match.

Yes, having a version number directly dependent on the hashing sounds
like a good compromise to me.
--
Michael


signature.asc
Description: PGP signature


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-06 Thread Bruce Momjian
On Tue, Oct  6, 2020 at 02:34:58PM +0900, Michael Paquier wrote:
> On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote:
> > On Tue, Oct  6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:
> >> Maybe we could add a new hook for only queryid computation, and add a
> >> GUC to let people choose between no queryid computed, core computation
> >> (current pg_stat_statement) and 3rd party plugin?
> > 
> > That all seems very complicated.  If we go in that direction, I suggest
> > we just give up getting any of this into core.
> 
> A GUC would have at least the advantage to make the computation
> consistent for any system willing to consume it, with the option to
> not pay any potential performance impact, though I have to admit that
> just moving the query ID computation of PGSS into core may not be the
> best option as a query ID of 0 means the same thing for a utility, for
> an initialization, and for a backend running a query with an unknown
> value, but that could be worked out.
> 
> FWIW, I think that adding the system ID in the hash is too
> restrictive, as it could be interesting for users to do stat
> comparisons across multiple systems running the same major version.
> It would be better to not give any strong guarantee that the query ID
> computed will remain consistent across major versions so as it is
> possible to keep improving it.  Also, if nothing has been done that
> changes the hashing computation, I see little benefit in forcing a
> breakage by adding something like PG_MAJORVERSION_NUM or such in the
> hash computation.

I thought some more about this.  First, I think having the queryid hash
code in the server, without requiring pg_stat_statements, is a
requirement --- I think too many people will want to use this feature
independent of pg_stat_statements.  Second, I understand the desire to
have different hash computation methods, depending on what level of
detail/matching you want.

I propose moving the pg_stat_statements queryid hash code into the
server (with a version number), and also adding a postgressql.conf
variable that lets you control how detailed the queryid hash is
computed.  This addresses the problem of people wanting different hash
methods.

When computing a hash, the queryid detail level and version number will
be mixed into the hash, so only a hash that used a similar query and
identical queryid detail level would match.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-05 Thread Michael Paquier
On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote:
> On Tue, Oct  6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:
>> Maybe we could add a new hook for only queryid computation, and add a
>> GUC to let people choose between no queryid computed, core computation
>> (current pg_stat_statement) and 3rd party plugin?
> 
> That all seems very complicated.  If we go in that direction, I suggest
> we just give up getting any of this into core.

A GUC would have at least the advantage to make the computation
consistent for any system willing to consume it, with the option to
not pay any potential performance impact, though I have to admit that
just moving the query ID computation of PGSS into core may not be the
best option as a query ID of 0 means the same thing for a utility, for
an initialization, and for a backend running a query with an unknown
value, but that could be worked out.

FWIW, I think that adding the system ID in the hash is too
restrictive, as it could be interesting for users to do stat
comparisons across multiple systems running the same major version.
It would be better to not give any strong guarantee that the query ID
computed will remain consistent across major versions so as it is
possible to keep improving it.  Also, if nothing has been done that
changes the hashing computation, I see little benefit in forcing a
breakage by adding something like PG_MAJORVERSION_NUM or such in the
hash computation.
--
Michael


signature.asc
Description: PGP signature


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-05 Thread Michael Paquier
On Mon, Oct 05, 2020 at 05:24:06PM -0400, Bruce Momjian wrote:
> (Also, did we decide _not_ to make the pg_stat_statements queryid
> always a positive value?)

This specific point has been discussed a couple of years ago, please
see cff440d and its related thread:
https://www.postgresql.org/message-id/ca+tgmobg_kp4cbkfmsznuaam1gww6hhrnizc0kjrmooeynz...@mail.gmail.com
--
Michael


signature.asc
Description: PGP signature


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-05 Thread Bruce Momjian
On Tue, Oct  6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:
> > I do think the queryid has to display independent of pg_stat_statements,
> > because I can see people using queryid for log file and pg_stat_activity
> > comparisons.  I also think the ability to have queryid accessible is an
> > important feature outside of pg_stat_statements, so I do think we need a
> > way to move this idea forward.
> 
> For the record, for now any extension can compute a queryid and there
> are at least 2 other published extensions that already do that, one of
> them having different semantics on how to compute the queryid.  I'm
> not sure that we'll ever get a consensus on those semantics due to
> performance tradeoff, so removing the ability to let people put their
> own code for that doesn't seem like the best way forward.
> 
> Maybe we could add a new hook for only queryid computation, and add a
> GUC to let people choose between no queryid computed, core computation
> (current pg_stat_statement) and 3rd party plugin?

That all seems very complicated.  If we go in that direction, I suggest
we just give up getting any of this into core.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-05 Thread Julien Rouhaud
On Tue, Oct 6, 2020 at 10:18 AM Bruce Momjian  wrote:
>
> On Mon, Oct  5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote:
> > On 2020-Oct-05, Tom Lane wrote:
> >
> > > FWIW, I think this proposal is a mess.  I was willing to hold my nose
> > > and have a queryId field in the internal Query struct without any solid
> > > consensus about what its semantics are and which extensions get to use it.
> > > Exposing it to end users seems like a bridge too far, though.  In
> > > particular, I'm afraid that that will cause people to expect it to have
> > > consistent values across PG versions, or even just across architectures
> > > within one version.
> >
> > I wonder if it would help to purposefully change the computation so that
> > it is not -- for instance, hash the system_identifier as initial value.
> > Then users would be forced to accept that it'll change as soon as it
> > migrates to another server or is upgraded to a new major version.
>
> That seems like a good idea, but it would prevent cross-cluster
> same-major-version comparisons, which seems like a negative.  Perhaps we
> should add the major version into the hash to handle this.  Ideally,
> let's just put a queryid-hash-version into to the hash, so if we change
> the computation, we just update the hash version and nothing matches
> anymore.
>
> I do think the queryid has to display independent of pg_stat_statements,
> because I can see people using queryid for log file and pg_stat_activity
> comparisons.  I also think the ability to have queryid accessible is an
> important feature outside of pg_stat_statements, so I do think we need a
> way to move this idea forward.

For the record, for now any extension can compute a queryid and there
are at least 2 other published extensions that already do that, one of
them having different semantics on how to compute the queryid.  I'm
not sure that we'll ever get a consensus on those semantics due to
performance tradeoff, so removing the ability to let people put their
own code for that doesn't seem like the best way forward.

Maybe we could add a new hook for only queryid computation, and add a
GUC to let people choose between no queryid computed, core computation
(current pg_stat_statement) and 3rd party plugin?




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-05 Thread Bruce Momjian
On Mon, Oct  5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote:
> On 2020-Oct-05, Tom Lane wrote:
> 
> > FWIW, I think this proposal is a mess.  I was willing to hold my nose
> > and have a queryId field in the internal Query struct without any solid
> > consensus about what its semantics are and which extensions get to use it.
> > Exposing it to end users seems like a bridge too far, though.  In
> > particular, I'm afraid that that will cause people to expect it to have
> > consistent values across PG versions, or even just across architectures
> > within one version.
> 
> I wonder if it would help to purposefully change the computation so that
> it is not -- for instance, hash the system_identifier as initial value.
> Then users would be forced to accept that it'll change as soon as it
> migrates to another server or is upgraded to a new major version.

That seems like a good idea, but it would prevent cross-cluster
same-major-version comparisons, which seems like a negative.  Perhaps we
should add the major version into the hash to handle this.  Ideally,
let's just put a queryid-hash-version into to the hash, so if we change
the computation, we just update the hash version and nothing matches
anymore.

I do think the queryid has to display independent of pg_stat_statements,
because I can see people using queryid for log file and pg_stat_activity
comparisons.  I also think the ability to have queryid accessible is an
important feature outside of pg_stat_statements, so I do think we need a
way to move this idea forward.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-05 Thread Alvaro Herrera
On 2020-Oct-05, Tom Lane wrote:

> FWIW, I think this proposal is a mess.  I was willing to hold my nose
> and have a queryId field in the internal Query struct without any solid
> consensus about what its semantics are and which extensions get to use it.
> Exposing it to end users seems like a bridge too far, though.  In
> particular, I'm afraid that that will cause people to expect it to have
> consistent values across PG versions, or even just across architectures
> within one version.

I wonder if it would help to purposefully change the computation so that
it is not -- for instance, hash the system_identifier as initial value.
Then users would be forced to accept that it'll change as soon as it
migrates to another server or is upgraded to a new major version.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-05 Thread Tom Lane
Bruce Momjian  writes:
> I would like to apply this patch (I know it has been in the commitfest
> since July 2019), but I have some questions about the user API.  Does it
> make sense to have a column in pg_stat_actvity and an option in
> log_line_prefix that will be empty unless pg_stat_statements is
> installed?  Is there no clean way to move the query hash computation out
> of pg_stat_statements and into the main code so the query id is always
> visible?  (Also, did we decide _not_ to make the pg_stat_statements
> queryid always a positive value?)

FWIW, I think this proposal is a mess.  I was willing to hold my nose
and have a queryId field in the internal Query struct without any solid
consensus about what its semantics are and which extensions get to use it.
Exposing it to end users seems like a bridge too far, though.  In
particular, I'm afraid that that will cause people to expect it to have
consistent values across PG versions, or even just across architectures
within one version.

The larger picture here is that there's lots of room to doubt whether
pg_stat_statements' decisions about what to ignore or include in the ID
will be satisfactory to everybody.  If that were not so, we'd just move
the computation into core.

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-05 Thread Bruce Momjian
On Wed, Aug 19, 2020 at 04:19:30PM +0200, Julien Rouhaud wrote:
> Similarly to other fields in pg_stat_activity, only the queryid from the top
> level statements are exposed, and if the backends status isn't active then the
> queryid from the last executed statements is displayed.
> 
> Also add a %Q placeholder to include the queryid in the log_line_prefix, which
> will also only expose top level statements.

I would like to apply this patch (I know it has been in the commitfest
since July 2019), but I have some questions about the user API.  Does it
make sense to have a column in pg_stat_actvity and an option in
log_line_prefix that will be empty unless pg_stat_statements is
installed?  Is there no clean way to move the query hash computation out
of pg_stat_statements and into the main code so the query id is always
visible?  (Also, did we decide _not_ to make the pg_stat_statements
queryid always a positive value?)

Also, in the doc patch:

By default, query identifiers are not computed, so this field will 
always
be null, unless an additional module that compute query identifiers, 
such
as , is configured.

why are you saying "such as"?  Isn't pg_stat_statements the only way to
see the queryid?  This command allowed the queryid to be displayed in
pg_stat_activity:

ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-08-19 Thread Julien Rouhaud
On Tue, Jul 28, 2020 at 10:55:04AM +0200, Julien Rouhaud wrote:
> On Tue, Jul 28, 2020 at 10:07 AM torikoshia  
> wrote:
> >
> > Thanks for updating!
> > I tested the patch setting log_statement = 'all', but %Q in
> > log_line_prefix
> > was always 0 even when pg_stat_statements.queryid and
> > pg_stat_activity.queryid are not 0.
> >
> > Is this an intentional behavior?
> >
> >[...]
> 
> Thanks for the tests!  That's indeed an expected behavior (although I
> wasn't aware of it), which isn't documented in this patch (I'll fix
> it).  The reason for that is that log_statements is done right after
> parsing the query:
> 
> /*
>  * Do basic parsing of the query or queries (this should be safe even if
>  * we are in aborted transaction state!)
>  */
> parsetree_list = pg_parse_query(query_string);
> 
> /* Log immediately if dictated by log_statement */
> if (check_log_statement(parsetree_list))
> {
> ereport(LOG,
> (errmsg("statement: %s", query_string),
>  errhidestmt(true),
>  errdetail_execute(parsetree_list)));
> was_logged = true;
> }
> 
> As parse analysis is not yet done, no queryid can be computed at that
> point, so we always print 0.  That's a limitation that can't be
> removed without changing the semantics of log_statements, so we'll
> probably have to live with it.
> 
> > And here is a minor typo.
> >   optionnally -> optionally
> >
> >
> > > 753 +   /* query identifier, optionnally computed using
> > > post_parse_analyze_hook */
> 
> Thanks, I fixed it locally!


Recent conflict, rebased v11 attached.
>From 473d038a1b447d4569709c3a499fc7356af76452 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v11] Expose queryid in pg_stat_activity and log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 --
 doc/src/sgml/config.sgml  |   9 +-
 doc/src/sgml/monitoring.sgml  |  15 ++
 src/backend/catalog/system_views.sql  |   1 +
 src/backend/executor/execMain.c   |   8 +
 src/backend/executor/execParallel.c   |  14 +-
 src/backend/executor/nodeGather.c |   3 +-
 src/backend/executor/nodeGatherMerge.c|   4 +-
 src/backend/parser/analyze.c  |   5 +
 src/backend/postmaster/pgstat.c   |  65 +++
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/adt/pgstatfuncs.c   |   7 +-
 src/backend/utils/error/elog.c|  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/executor/execParallel.h   |   3 +-
 src/include/pgstat.h  |   5 +
 src/test/regress/expected/rules.out   |   9 +-
 18 files changed, 270 insertions(+), 79 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6b91c62c31..486d07f9de 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -115,6 +115,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+	!IsA(n, PrepareStmt) && \
+	!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -345,7 +353,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 	   int query_location, int query_len,
 	   pgssStoreKind kind,
@@ -845,16 +854,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-07-28 Thread Julien Rouhaud
On Tue, Jul 28, 2020 at 10:07 AM torikoshia  wrote:
>
> Thanks for updating!
> I tested the patch setting log_statement = 'all', but %Q in
> log_line_prefix
> was always 0 even when pg_stat_statements.queryid and
> pg_stat_activity.queryid are not 0.
>
> Is this an intentional behavior?
>
>[...]

Thanks for the tests!  That's indeed an expected behavior (although I
wasn't aware of it), which isn't documented in this patch (I'll fix
it).  The reason for that is that log_statements is done right after
parsing the query:

/*
 * Do basic parsing of the query or queries (this should be safe even if
 * we are in aborted transaction state!)
 */
parsetree_list = pg_parse_query(query_string);

/* Log immediately if dictated by log_statement */
if (check_log_statement(parsetree_list))
{
ereport(LOG,
(errmsg("statement: %s", query_string),
 errhidestmt(true),
 errdetail_execute(parsetree_list)));
was_logged = true;
}

As parse analysis is not yet done, no queryid can be computed at that
point, so we always print 0.  That's a limitation that can't be
removed without changing the semantics of log_statements, so we'll
probably have to live with it.

> And here is a minor typo.
>   optionnally -> optionally
>
>
> > 753 +   /* query identifier, optionnally computed using
> > post_parse_analyze_hook */

Thanks, I fixed it locally!




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-07-28 Thread torikoshia

On 2020-07-14 20:24, Julien Rouhaud wrote:

On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote:

Hi,

v9 patch fails to apply to HEAD, could you check and rebase it?


Thanks for the notice, v10 attached!


And here are minor typos.

 79 +* utility statements.  Note that we don't compute a 
queryId

for prepared
 80 +* statemets related utility, as those will inherit from 
the

underlying
 81 +* statements's one (except DEALLOCATE which is entirely
untracked).

statemets -> statements
statements's -> statements' or statement's?


Thanks!  I went with "statement's".


Thanks for updating!
I tested the patch setting log_statement = 'all', but %Q in 
log_line_prefix

was always 0 even when pg_stat_statements.queryid and
pg_stat_activity.queryid are not 0.

Is this an intentional behavior?


```
  $ initdb --no-locale -D data


  $ edit postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
logging_collector = on
log_line_prefix = '%m [%p] queryid:%Q '
log_statement = 'all'

  $ pg_ctl start -D data

  $ psql
  =# CREATE EXTENSION pg_stat_statements;

  =# CREATE TABLE t1 (i int);
  =# INSERT INTO t1 VALUES (0),(1);
  =# SELECT queryid, query FROM pg_stat_activity;

  -- query ids are all 0 on the log
  $ view log
  2020-07-28 15:57:58.475 EDT [4480] queryid:0 LOG:  statement: CREATE 
TABLE t1 (i int);
  2020-07-28 15:58:13.730 EDT [4480] queryid:0 LOG:  statement: INSERT 
INTO t1 VALUES (0),(1);
  2020-07-28 15:59:28.389 EDT [4480] queryid:0 LOG:  statement: SELECT * 
FROM t1;


  -- on pg_stat_activity and pgss, query ids are not 0
  $ psql
  =# SELECT queryid, query FROM pg_stat_activity WHERE query LIKE 
'%t1%';

 queryid|query
  
--+--

1109063694563750779 | SELECT * FROM t1;
   -2582225123719476948 | SELECT queryid, query FROM pg_stat_activity 
WHERE query LIKE '%t1%';

  (2 rows)

  =# SELECT queryid, query FROM pg_stat_statements WHERE query LIKE 
'%t1%';

 queryid|  query
  --+-
   -5028988130796701553 | CREATE TABLE t1 (i int)
1109063694563750779 | SELECT * FROM t1
2726469050076420724 | INSERT INTO t1 VALUES ($1),($2)

```


And here is a minor typo.
 optionnally -> optionally


753 +   /* query identifier, optionnally computed using 
post_parse_analyze_hook */



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-07-14 Thread Julien Rouhaud
On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote:
> Hi,
> 
> v9 patch fails to apply to HEAD, could you check and rebase it?

Thanks for the notice, v10 attached!

> And here are minor typos.
> 
>  79 +* utility statements.  Note that we don't compute a queryId
> for prepared
>  80 +* statemets related utility, as those will inherit from the
> underlying
>  81 +* statements's one (except DEALLOCATE which is entirely
> untracked).
> 
> statemets -> statements
> statements's -> statements' or statement's?

Thanks!  I went with "statement's".
>From 8c651ee05c8a5e55966ad1646f48e83941d3776a Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v10] Expose queryid in pg_stat_activity and log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 --
 doc/src/sgml/config.sgml  |   9 +-
 doc/src/sgml/monitoring.sgml  |  15 ++
 src/backend/catalog/system_views.sql  |   1 +
 src/backend/executor/execMain.c   |   8 +
 src/backend/executor/execParallel.c   |  14 +-
 src/backend/executor/nodeGather.c |   3 +-
 src/backend/executor/nodeGatherMerge.c|   4 +-
 src/backend/parser/analyze.c  |   5 +
 src/backend/postmaster/pgstat.c   |  65 +++
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/adt/pgstatfuncs.c   |   7 +-
 src/backend/utils/error/elog.c|  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/executor/execParallel.h   |   3 +-
 src/include/pgstat.h  |   5 +
 src/test/regress/expected/rules.out   |   9 +-
 18 files changed, 270 insertions(+), 79 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 14cad19afb..a51c207b49 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -115,6 +115,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+	!IsA(n, PrepareStmt) && \
+	!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -345,7 +353,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 	   int query_location, int query_len,
 	   pgssStoreKind kind,
@@ -845,16 +854,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in out
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevents
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statements related utility, as those will inherit from the underlying
+	 * statement's one (except DEALLOCATE which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-07-14 Thread Atsushi Torikoshi
Hi,

v9 patch fails to apply to HEAD, could you check and rebase it?

And here are minor typos.

 79 +* utility statements.  Note that we don't compute a queryId
for prepared
 80 +* statemets related utility, as those will inherit from the
underlying
 81 +* statements's one (except DEALLOCATE which is entirely
untracked).

statemets -> statements
statements's -> statements' or statement's?

Regards,

--
Atsushi Torikoshi

On Wed, Apr 8, 2020 at 11:38 PM Julien Rouhaud  wrote:

> On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada
>  wrote:
> >
> > Hi Julien,
> >
> > On 2020/04/02 22:25, Julien Rouhaud wrote:
> > > New conflict, rebased v9 attached.
> >
> > I tested the patch on the head (c7654f6a3) and
> > the result was fine. See below:
> >
> > $ make installcheck-world
> > =
> >   All 1 tests passed.
> > =
>
> Thanks Yamada-san!  Unfortunately this patch still didn't attract any
> committer, so I moved it to the next commitfest.
>
>
>


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-04-08 Thread Julien Rouhaud
On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada
 wrote:
>
> Hi Julien,
>
> On 2020/04/02 22:25, Julien Rouhaud wrote:
> > New conflict, rebased v9 attached.
>
> I tested the patch on the head (c7654f6a3) and
> the result was fine. See below:
>
> $ make installcheck-world
> =
>   All 1 tests passed.
> =

Thanks Yamada-san!  Unfortunately this patch still didn't attract any
committer, so I moved it to the next commitfest.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-04-07 Thread Tatsuro Yamada

Hi Julien,

On 2020/04/02 22:25, Julien Rouhaud wrote:

New conflict, rebased v9 attached.


I tested the patch on the head (c7654f6a3) and
the result was fine. See below:

$ make installcheck-world
=
 All 1 tests passed.
=


Regards,
Tatsuro Yamada






Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-04-02 Thread Julien Rouhaud
New conflict, rebased v9 attached.
>From 26b98194d8add282158c65f6ac46c721ba80f498 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v9 1/2] Expose queryid in pg_stat_activity and log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 --
 doc/src/sgml/config.sgml  |   9 +-
 doc/src/sgml/monitoring.sgml  |  12 ++
 src/backend/catalog/system_views.sql  |   1 +
 src/backend/executor/execMain.c   |   8 +
 src/backend/executor/execParallel.c   |  14 +-
 src/backend/executor/nodeGather.c |   3 +-
 src/backend/executor/nodeGatherMerge.c|   4 +-
 src/backend/parser/analyze.c  |   5 +
 src/backend/postmaster/pgstat.c   |  65 +++
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/adt/pgstatfuncs.c   |   7 +-
 src/backend/utils/error/elog.c|  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/executor/execParallel.h   |   3 +-
 src/include/pgstat.h  |   5 +
 src/test/regress/expected/rules.out   |   9 +-
 18 files changed, 267 insertions(+), 79 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 942922b01f..4073361f4c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -115,6 +115,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+	!IsA(n, PrepareStmt) && \
+	!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -342,7 +350,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 	   int query_location, int query_len,
 	   pgssStoreKind kind,
@@ -841,16 +850,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in out
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevents
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statemets related utility, as those will inherit from the underlying
+	 * statements's one (except DEALLOCATE which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			querytext = pgss_clean_querytext(pstate->p_sourcetext,
+			 _location,
+			 _len);
+
+			query->queryId = pgss_compute_utility_queryid(querytext, query_len);
+		}
+		else
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1098,6 +1125,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	DestReceiver *dest, QueryCompletion *qc)
 {
 	

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-03-16 Thread Julien Rouhaud
On Sat, Mar 14, 2020 at 06:53:51PM +0100, Julien Rouhaud wrote:
> On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote:
> >
> > cfbot reports a failure since 2f9661311b (command completion tag
> > change), so here's a rebased v6, no change otherwise.
>
>
> Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks
> again to cfbot, rebased v7 attached.


Bit repetita.
>From 87be2c545e32c0c08a410949d5c5d383a4162af3 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v8 1/2] Expose queryid in pg_stat_activity and log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 --
 doc/src/sgml/config.sgml  |   9 +-
 doc/src/sgml/monitoring.sgml  |  12 ++
 src/backend/catalog/system_views.sql  |   1 +
 src/backend/executor/execMain.c   |   8 +
 src/backend/executor/execParallel.c   |  14 +-
 src/backend/executor/nodeGather.c |   3 +-
 src/backend/executor/nodeGatherMerge.c|   4 +-
 src/backend/parser/analyze.c  |   5 +
 src/backend/postmaster/pgstat.c   |  65 +++
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/adt/pgstatfuncs.c   |   7 +-
 src/backend/utils/error/elog.c|  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/executor/execParallel.h   |   3 +-
 src/include/pgstat.h  |   5 +
 src/test/regress/expected/rules.out   |   9 +-
 18 files changed, 267 insertions(+), 79 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..2b3aa79cb6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+	!IsA(n, PrepareStmt) && \
+	!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 	   int query_location, int query_len,
 	   double total_time, uint64 rows,
@@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in out
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevents
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statemets related utility, as those will inherit from the underlying
+	 * statements's one (except DEALLOCATE which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-03-14 Thread Julien Rouhaud
On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote:
>
> cfbot reports a failure since 2f9661311b (command completion tag
> change), so here's a rebased v6, no change otherwise.


Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks
again to cfbot, rebased v7 attached.
>From dda1ab659a44c9a6375ee05d249baa2ec552 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v7 1/2] Expose queryid in pg_stat_activity and log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 179 --
 doc/src/sgml/config.sgml  |   9 +-
 doc/src/sgml/monitoring.sgml  |  12 ++
 src/backend/catalog/system_views.sql  |   1 +
 src/backend/executor/execMain.c   |   8 +
 src/backend/executor/execParallel.c   |  14 +-
 src/backend/executor/nodeGather.c |   3 +-
 src/backend/executor/nodeGatherMerge.c|   4 +-
 src/backend/parser/analyze.c  |   5 +
 src/backend/postmaster/pgstat.c   |  65 +++
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/adt/pgstatfuncs.c   |   7 +-
 src/backend/utils/error/elog.c|  10 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/executor/execParallel.h   |   3 +-
 src/include/pgstat.h  |   5 +
 src/test/regress/expected/rules.out   |   9 +-
 18 files changed, 267 insertions(+), 79 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..2b3aa79cb6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+	!IsA(n, PrepareStmt) && \
+	!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 	   int query_location, int query_len,
 	   double total_time, uint64 rows,
@@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in out
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevents
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statemets related utility, as those will inherit from the underlying
+	 * statements's one (except DEALLOCATE which is entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && PGSS_HANDLED_UTILITY(query->utilityStmt)
+			&& pstate->p_sourcetext)
+		{
+			const char *querytext = pstate->p_sourcetext;
+			int query_location = query->stmt_location;
+			int query_len = query->stmt_len;
+
+			/*
+			 * Confine our attention to the relevant part of the string, if the
+			 * query is a portion of a multi-statement source string.
+			 */
+			querytext = pgss_clean_querytext(pstate->p_sourcetext,
+			 _location,
+			 _len);

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-03-08 Thread Julien Rouhaud
On Fri, Jun 28, 2019 at 11:49:53AM -0700, Peter Geoghegan wrote:
> On Tue, Mar 19, 2019 at 12:38 PM legrand legrand
>  wrote:
> > Would it make sense to add it in auto explain ?
> > I don't know for explain itself, but maybe ...
>
> I think that it should appear in EXPLAIN. pg_stat_statements already
> cannot have a query hash of zero, so it might be okay to display it
> only when its value is non-zero.

I had forgotten about this.  After looking at it, I can see a few issues.

For now post_parse_analyze_hook isn't called for the underlying statement, so
we don't have the queryid.  And we can't compute the queryid for the underlying
query in the initial post_parse_analyze_hook call as we don't want the executor
to have a queryid set in that case to avoid cumulating counters for both the
explain and the query.

We could add an extra call in ExplainQuery, but this will be ignored by
pg_stat_statements unless you set pg_stat_statements.track to all. Also,
pgss_post_parse_analyze will try to record an entry with the normalized query
text if no one exists yet and if any constant where removed.  The problem is
that, as I already mentioned in [1], the underlying query doesn't have
query_location or query_len valued, so the recorded query text will at least
contain the explain part of the input query.

[1] 
https://www.postgresql.org/message-id/CAOBaU_Y-y%2BVOhTZgDOuDk6-9V72-ZXdWccXo_kx0P4DDBEEh9A%40mail.gmail.com




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-03-03 Thread Julien Rouhaud
On Fri, Feb 7, 2020 at 11:12 AM Julien Rouhaud  wrote:
>
> On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote:
> > On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud  
> > wrote:
> > > There's also the possibility to reserve 1 bit of the hash to know if
> > > this is a utility command or not, although I don't recall right now
> > > all the possible issues with utility commands and some special
> > > handling of them.  I'll work on it before the next commitfest.
> >
> > FWIW, I don't really see why it would be bad to have 0 mean that
> > "there's no query ID for some reason" without caring whether that's
> > because the current statement is a utility statement or because
> > there's no statement in progress at all or whatever else. The user
> > probably doesn't need our help to distinguish between "no statement"
> > and "utility statement", right?
>
> Sure, but if we don't fix that it means that we also won't expose any queryid
> for utility statement, even if pg_stat_statements is configured to track those
> (with a very poor queryid handling, but still).
>
> While looking at this again, I realized that pg_stat_statements doesn't 
> compute
> a queryid during the post parse analysis hook just to make sure that no query
> identifier will be set during executorStart and the rest of executor 
> functions.
>
> AFAICT, that can't happen anyway since pg_plan_queries() will discard any
> computed queryid for utility statements.  This seems to be an oversight due to
> original pg_stat_statements implementation, so I fixed this.
>
> Then, as processUtility is called between parse analysis and executor, I think
> that we can simply work around this by computing utility statements query
> identifier during parse analysis, removing it in pgss_ProcessUtility and
> keeping a copy of it for the pgss_store calls in that function, as done in the
> attached v5.
>
> This fixes everything except EXECUTE statements, which has to get the
> underlying query's queryid.  The problem is that EXECUTE won't get through
> parse analysis, so while it's correctly handled for execution and pgss_store,
> it's not being exposed in pg_stat_activity and log_line_prefix.  To fix it, I
> added an extra call to pgstat_report_queryid in executorStart.  As this
> function is a no-op if a queryid is already exposed, this shouldn't cause any
> harm and fix any other cases of query execution that don't go through parse
> analysis.
>
> Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those
> statements will always be reported with a NULL/0 queryid, but this is
> consistent as it's also not present in pg_stat_statements() SRF.

cfbot reports a failure since 2f9661311b (command completion tag
change), so here's a rebased v6, no change otherwise.
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..2b3aa79cb6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE1024	/* query serialization buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+	!IsA(n, PrepareStmt) && \
+	!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int *len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
 	   int query_location, int query_len,
 	   double total_time, uint64 rows,
@@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * We compute a queryId now so that it can get exported in out
+	 * PgBackendStatus.  pgss_ProcessUtility will later discard it to prevents
+	 * double counting of optimizable statements that are directly contained in
+	 * utility statements.  Note that we don't compute a queryId for prepared
+	 * statemets related utility, as those 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-02-07 Thread Julien Rouhaud
On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote:
> On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud  wrote:
> > There's also the possibility to reserve 1 bit of the hash to know if
> > this is a utility command or not, although I don't recall right now
> > all the possible issues with utility commands and some special
> > handling of them.  I'll work on it before the next commitfest.
>
> FWIW, I don't really see why it would be bad to have 0 mean that
> "there's no query ID for some reason" without caring whether that's
> because the current statement is a utility statement or because
> there's no statement in progress at all or whatever else. The user
> probably doesn't need our help to distinguish between "no statement"
> and "utility statement", right?

Sure, but if we don't fix that it means that we also won't expose any queryid
for utility statement, even if pg_stat_statements is configured to track those
(with a very poor queryid handling, but still).

While looking at this again, I realized that pg_stat_statements doesn't compute
a queryid during the post parse analysis hook just to make sure that no query
identifier will be set during executorStart and the rest of executor functions.

AFAICT, that can't happen anyway since pg_plan_queries() will discard any
computed queryid for utility statements.  This seems to be an oversight due to
original pg_stat_statements implementation, so I fixed this.

Then, as processUtility is called between parse analysis and executor, I think
that we can simply work around this by computing utility statements query
identifier during parse analysis, removing it in pgss_ProcessUtility and
keeping a copy of it for the pgss_store calls in that function, as done in the
attached v5.

This fixes everything except EXECUTE statements, which has to get the
underlying query's queryid.  The problem is that EXECUTE won't get through
parse analysis, so while it's correctly handled for execution and pgss_store,
it's not being exposed in pg_stat_activity and log_line_prefix.  To fix it, I
added an extra call to pgstat_report_queryid in executorStart.  As this
function is a no-op if a queryid is already exposed, this shouldn't cause any
harm and fix any other cases of query execution that don't go through parse
analysis.

Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those
statements will always be reported with a NULL/0 queryid, but this is
consistent as it's also not present in pg_stat_statements() SRF.
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f82a671ee..2da810ade6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -112,6 +112,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM 
/ 100;
 
 #define JUMBLE_SIZE1024/* query serialization 
buffer size */
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)(!IsA(n, ExecuteStmt) && \
+   !IsA(n, 
PrepareStmt) && \
+   !IsA(n, 
DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -308,7 +316,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const 
char *queryString,

ProcessUtilityContext context, ParamListInfo params,

QueryEnvironment *queryEnv,
DestReceiver 
*dest, char *completionTag);
-static uint64 pgss_hash_string(const char *str, int len);
+static const char *pgss_clean_querytext(const char *query, int *location, int 
*len);
+static uint64 pgss_compute_utility_queryid(const char *query, int query_len);
 static void pgss_store(const char *query, uint64 queryId,
   int query_location, int query_len,
   double total_time, uint64 rows,
@@ -792,16 +801,34 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
return;
 
/*
-* Utility statements get queryId zero.  We do this even in cases where
-* the statement contains an optimizable statement for which a queryId
-* could be derived (such as EXPLAIN or DECLARE CURSOR).  For such 
cases,
-* runtime control will first go through ProcessUtility and then the
-* executor, and we don't want the executor hooks to do anything, since 
we
-* are already measuring the statement's costs at the utility level.
+* We compute a queryId now so that it can get exported in out
+* PgBackendStatus.  pgss_ProcessUtility will later discard it to 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-02-06 Thread Robert Haas
On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud  wrote:
> There's also the possibility to reserve 1 bit of the hash to know if
> this is a utility command or not, although I don't recall right now
> all the possible issues with utility commands and some special
> handling of them.  I'll work on it before the next commitfest.

FWIW, I don't really see why it would be bad to have 0 mean that
"there's no query ID for some reason" without caring whether that's
because the current statement is a utility statement or because
there's no statement in progress at all or whatever else. The user
probably doesn't need our help to distinguish between "no statement"
and "utility statement", right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-02-05 Thread Julien Rouhaud
On Sat, Feb 1, 2020 at 12:30 PM Tomas Vondra
 wrote:
>
> On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote:
> >On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier  wrote:
> >>
> >> On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
> >> > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
> >> >> I'd really like to have the queryid function available through SQL,
> >> >> but I think that this specific case wouldn't work very well for
> >> >> pg_stat_statements' approach as it's working with oid.  The query
> >> >> string in pg_stat_activity is the user provided one rather than a
> >> >> fully-qualified version, so in order to get that query's queryid, you
> >> >> need to know the exact search_path in use in that backend, and that's
> >> >> not something available.
> >> >
> >> > Yeah..  So, we have a patch marked as ready for committer here, and it
> >> > seems to me that we have a couple of issues to discuss more about
> >> > first particularly this query ID of 0.  Again, do others have more
> >> > any input to offer?
> >
> >I just realized that with current infrastructure it's not possible to
> >display a utility queryid.  We need to recognize utility to not
> >process the counters twice (once in processUtility, once in the
> >underlying executor),  so we don't provide a queryid for utility
> >statements in parse analysis.  Current magic value 0 has the side
> >effect of showing an invalid queryid for all utilty statements, and
> >using a magic value different from 0 will just always display that
> >magic value.  We could instead add another field in the Query and
> >PlannedStmt structs, say "int queryid_flags", that extensions could
> >use for their needs?
> >
> >> And while on it, the latest patch does not apply, so a rebase is
> >> needed here.
> >
> >Yep, I noticed that this morning.  I already rebased the patch
> >locally, I'll send a new version with new modifications when we reach
> >an agreement on the utility issue.
> >
>
> Well, this patch was in WoA since November, but now that I look at it
> that might have been wrong - we're clearly waiting for agreement on how
> to handle queryid for utility commands. I suspect the WoA status might
> have been driving people away from this thread :-(

Oh, indeed.

> I've switched the patch to "needs review" and moved it to the next CF.

Thanks

> What I think needs to happen is we get a patch implementing one of the
> proposed solutions, and discuss that.

There's also the possibility to reserve 1 bit of the hash to know if
this is a utility command or not, although I don't recall right now
all the possible issues with utility commands and some special
handling of them.  I'll work on it before the next commitfest.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-02-01 Thread Tomas Vondra

On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote:

On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier  wrote:


On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
> On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
>> I'd really like to have the queryid function available through SQL,
>> but I think that this specific case wouldn't work very well for
>> pg_stat_statements' approach as it's working with oid.  The query
>> string in pg_stat_activity is the user provided one rather than a
>> fully-qualified version, so in order to get that query's queryid, you
>> need to know the exact search_path in use in that backend, and that's
>> not something available.
>
> Yeah..  So, we have a patch marked as ready for committer here, and it
> seems to me that we have a couple of issues to discuss more about
> first particularly this query ID of 0.  Again, do others have more
> any input to offer?


I just realized that with current infrastructure it's not possible to
display a utility queryid.  We need to recognize utility to not
process the counters twice (once in processUtility, once in the
underlying executor),  so we don't provide a queryid for utility
statements in parse analysis.  Current magic value 0 has the side
effect of showing an invalid queryid for all utilty statements, and
using a magic value different from 0 will just always display that
magic value.  We could instead add another field in the Query and
PlannedStmt structs, say "int queryid_flags", that extensions could
use for their needs?


And while on it, the latest patch does not apply, so a rebase is
needed here.


Yep, I noticed that this morning.  I already rebased the patch
locally, I'll send a new version with new modifications when we reach
an agreement on the utility issue.



Well, this patch was in WoA since November, but now that I look at it
that might have been wrong - we're clearly waiting for agreement on how
to handle queryid for utility commands. I suspect the WoA status might
have been driving people away from this thread :-(

I've switched the patch to "needs review" and moved it to the next CF.
What I think needs to happen is we get a patch implementing one of the
proposed solutions, and discuss that.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-11-29 Thread Julien Rouhaud
On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier  wrote:
>
> On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
> > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
> >> I'd really like to have the queryid function available through SQL,
> >> but I think that this specific case wouldn't work very well for
> >> pg_stat_statements' approach as it's working with oid.  The query
> >> string in pg_stat_activity is the user provided one rather than a
> >> fully-qualified version, so in order to get that query's queryid, you
> >> need to know the exact search_path in use in that backend, and that's
> >> not something available.
> >
> > Yeah..  So, we have a patch marked as ready for committer here, and it
> > seems to me that we have a couple of issues to discuss more about
> > first particularly this query ID of 0.  Again, do others have more
> > any input to offer?

I just realized that with current infrastructure it's not possible to
display a utility queryid.  We need to recognize utility to not
process the counters twice (once in processUtility, once in the
underlying executor),  so we don't provide a queryid for utility
statements in parse analysis.  Current magic value 0 has the side
effect of showing an invalid queryid for all utilty statements, and
using a magic value different from 0 will just always display that
magic value.  We could instead add another field in the Query and
PlannedStmt structs, say "int queryid_flags", that extensions could
use for their needs?

> And while on it, the latest patch does not apply, so a rebase is
> needed here.

Yep, I noticed that this morning.  I already rebased the patch
locally, I'll send a new version with new modifications when we reach
an agreement on the utility issue.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-11-28 Thread Michael Paquier
On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
> On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
>> I'd really like to have the queryid function available through SQL,
>> but I think that this specific case wouldn't work very well for
>> pg_stat_statements' approach as it's working with oid.  The query
>> string in pg_stat_activity is the user provided one rather than a
>> fully-qualified version, so in order to get that query's queryid, you
>> need to know the exact search_path in use in that backend, and that's
>> not something available.
> 
> Yeah..  So, we have a patch marked as ready for committer here, and it
> seems to me that we have a couple of issues to discuss more about
> first particularly this query ID of 0.  Again, do others have more
> any input to offer?

And while on it, the latest patch does not apply, so a rebase is
needed here. 
--
Michael


signature.asc
Description: PGP signature


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-11-28 Thread Michael Paquier
On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
> I'd really like to have the queryid function available through SQL,
> but I think that this specific case wouldn't work very well for
> pg_stat_statements' approach as it's working with oid.  The query
> string in pg_stat_activity is the user provided one rather than a
> fully-qualified version, so in order to get that query's queryid, you
> need to know the exact search_path in use in that backend, and that's
> not something available.

Yeah..  So, we have a patch marked as ready for committer here, and it
seems to me that we have a couple of issues to discuss more about
first particularly this query ID of 0.  Again, do others have more
any input to offer?
--
Michael


signature.asc
Description: PGP signature


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-11-13 Thread Julien Rouhaud
On Wed, Nov 13, 2019 at 4:15 AM Bruce Momjian  wrote:
>
> On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote:
> > On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:
> > > The thing is that pg_stat_statements assigns a 0 queryid in the
> > > post_parse_analyze_hook to recognize utility statements and avoid
> > > tracking instrumentation twice in case of utility statements, and then
> > > compute a queryid base on a hash of the query text.  Maybe we could
> > > instead fully reserve queryid "2" for utility statements (so forcing
> > > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
> > > of only 0), and use "2" as the identifier for utility statement
> > > instead of "0"?
> >
> > Hmm.  Not sure.  At this stage it would be nice to gather more input
> > on the matter, and FWIW, I don't like much the assumption that a query
> > ID of 0 is perhaps a utility statement, or perhaps nothing depending
> > on the state of a backend entry, or even perhaps something else
> > depending how on how modules make use and define such query IDs.
>
> I thought each extension would export a function to compute the query
> id, and you would all that function with the pg_stat_activity.query
> string.

I'd really like to have the queryid function available through SQL,
but I think that this specific case wouldn't work very well for
pg_stat_statements' approach as it's working with oid.  The query
string in pg_stat_activity is the user provided one rather than a
fully-qualified version, so in order to get that query's queryid, you
need to know the exact search_path in use in that backend, and that's
not something available.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-11-12 Thread Bruce Momjian
On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote:
> On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:
> > The thing is that pg_stat_statements assigns a 0 queryid in the
> > post_parse_analyze_hook to recognize utility statements and avoid
> > tracking instrumentation twice in case of utility statements, and then
> > compute a queryid base on a hash of the query text.  Maybe we could
> > instead fully reserve queryid "2" for utility statements (so forcing
> > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
> > of only 0), and use "2" as the identifier for utility statement
> > instead of "0"?
> 
> Hmm.  Not sure.  At this stage it would be nice to gather more input
> on the matter, and FWIW, I don't like much the assumption that a query
> ID of 0 is perhaps a utility statement, or perhaps nothing depending
> on the state of a backend entry, or even perhaps something else
> depending how on how modules make use and define such query IDs.

I thought each extension would export a function to compute the query
id, and you would all that function with the pg_stat_activity.query
string.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-11-11 Thread Michael Paquier
On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:
> The thing is that pg_stat_statements assigns a 0 queryid in the
> post_parse_analyze_hook to recognize utility statements and avoid
> tracking instrumentation twice in case of utility statements, and then
> compute a queryid base on a hash of the query text.  Maybe we could
> instead fully reserve queryid "2" for utility statements (so forcing
> queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
> of only 0), and use "2" as the identifier for utility statement
> instead of "0"?

Hmm.  Not sure.  At this stage it would be nice to gather more input
on the matter, and FWIW, I don't like much the assumption that a query
ID of 0 is perhaps a utility statement, or perhaps nothing depending
on the state of a backend entry, or even perhaps something else
depending how on how modules make use and define such query IDs.
--
Michael


signature.asc
Description: PGP signature


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-09-11 Thread Julien Rouhaud
Thanks for looking at it!

On Wed, Sep 11, 2019 at 6:45 AM Michael Paquier  wrote:
>
> An invalid query ID is assumed to be 0 in the patch, per the way it is
> defined in pg_stat_statements.  However this also maps with the case
> where we have a utility statement.

Oh indeed.  Which means that if a utility statements later calls
parse_analyze or friends, this patch would report an unexpected
queryid.  That's at least possible for something like

COPY (SELECT * FROM tbl) TO ...

The thing is that pg_stat_statements assigns a 0 queryid in the
post_parse_analyze_hook to recognize utility statements and avoid
tracking instrumentation twice in case of utility statements, and then
compute a queryid base on a hash of the query text.  Maybe we could
instead fully reserve queryid "2" for utility statements (so forcing
queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
of only 0), and use "2" as the identifier for utility statement
instead of "0"?

> +   /*
> +* We only report the top-level query identifiers.  The stored queryid is
> +* reset when a backend call pgstat_report_activity(STATE_RUNNING), or 
> with
> +* an explicit call to this function.  If the saved query identifier is 
> not
> +* zero it means that it's not a top-level command, so ignore the one
> +* provided unless it's an explicit call to reset the identifier.
> +*/
> +   if (queryId != 0 && beentry->st_queryid != 0)
> +   return;
> Hmm.  I am wondering if we shouldn't have an API dedicated to the
> reset of the query ID.  That logic looks rather brittle..

How about adding a "bool force" parameter to allow resetting the queryid to 0?

> Wouldn't it be better (and more consistent) to update the query ID in
> parse_analyze_varparams() and parse_analyze() as well after going
> through the post_parse_analyze hook instead of pg_analyze_and_rewrite?

I thought about it without knowing what would be best.  I'll change to
report the queryid right after calling post_parse_analyze_hook then.

> +   /*
> +* If a new query is started, we reset the query identifier as it'll only
> +* be known after parse analysis, to avoid reporting last query's
> +* identifier.
> +*/
> +   if (state == STATE_RUNNING)
> +   beentry->st_queryid = 0
> I don't quite get why you don't reset the counter in other cases as
> well.  If the backend entry is idle in transaction or in an idle
> state, it seems to me that we should not report the query ID of the
> last query run in the transaction.  And that would make the reset in
> exec_simple_query() unnecessary, no?

I'm reproducing the same behavior as for the query text, ie. showing
the information about the last executed query text if state is idle:

+ queryid
+ bigint
+ Identifier of this backend's most recent query. If
+  state is active this field
+  shows the identifier of the currently executing query. In all other
+  states, it shows the identifier of last query that was executed.

I think that showing the last executed query's queryid is as useful as
the query text.  Also, while avoiding a reset in exec_simple_query()
it'd be required to do such reset in case of error during query
execution, so that wouldn't make things quite simpler..




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-09-10 Thread Michael Paquier
On Wed, Aug 07, 2019 at 09:03:21AM +, Evgeny Efimkin wrote:
> The new status of this patch is: Ready for Committer

I may be wrong of course, but it looks that this is wanted and the
current shape of the patch looks sensible:
- Register the query ID using a backend entry.
- Only consider the top-level query.

An invalid query ID is assumed to be 0 in the patch, per the way it is
defined in pg_stat_statements.  However this also maps with the case
where we have a utility statement.

+* We only report the top-level query identifiers.  The stored queryid is
+* reset when a backend call pgstat_report_activity(STATE_RUNNING), or with
s/call/calls/

+   /*
+* We only report the top-level query identifiers.  The stored queryid is
+* reset when a backend call pgstat_report_activity(STATE_RUNNING), or with
+* an explicit call to this function.  If the saved query identifier is not
+* zero it means that it's not a top-level command, so ignore the one
+* provided unless it's an explicit call to reset the identifier.
+*/
+   if (queryId != 0 && beentry->st_queryid != 0)
+   return;
Hmm.  I am wondering if we shouldn't have an API dedicated to the
reset of the query ID.  That logic looks rather brittle..

Wouldn't it be better (and more consistent) to update the query ID in
parse_analyze_varparams() and parse_analyze() as well after going
through the post_parse_analyze hook instead of pg_analyze_and_rewrite?

+   /*
+* If a new query is started, we reset the query identifier as it'll only
+* be known after parse analysis, to avoid reporting last query's
+* identifier.
+*/
+   if (state == STATE_RUNNING)
+   beentry->st_queryid = 0
I don't quite get why you don't reset the counter in other cases as
well.  If the backend entry is idle in transaction or in an idle
state, it seems to me that we should not report the query ID of the
last query run in the transaction.  And that would make the reset in
exec_simple_query() unnecessary, no?
--
Michael


signature.asc
Description: PGP signature


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-07 Thread Evgeny Efimkin
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

HI!
patch is look good for me.

The new status of this patch is: Ready for Committer


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-05 Thread legrand legrand
Kyotaro Horiguchi-4 wrote
> At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand 

> legrand_legrand@

>  wrote in <

> 1564902241482-0.post@.nabble

>>
>> >  However having the nested queryid in 
>> > pg_stat_activity would be convenient to track
>> > what is a long stored functions currently doing.
>> 
>> +1
>> 
>> And  this could permit to get wait event sampling per queryid when
>> pg_stat_statements.track = all
> 
> I'm strongly on this side emotionally, but also I'm on Tom and
> Andres's side that exposing querid that way is not the right
> thing.
> 
> Doing that means we don't need exact correspondence between
> top-level query and queryId (in nested or multistatement queries)
> in this patch.  pg_stat_statements will allow us to do the same
> thing by having additional uint64[MaxBackends] array in
> pgssSharedState, instead of expanding PgBackendStatus array in
> core by the same size.
> 
> regards.
> 
> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center

Hi Kyotaro,
Thank you for this answer.
What you propose here is already available 
Inside pg_stat_sql_plans extension (a derivative from 
Pg_stat_statements and pg_store_plans)
And I’m used to this queryid behavior with top Level
Queries...
My emotion was high but I will accept it !
Regards
PAscal




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-05 Thread Julien Rouhaud
On Mon, Aug 5, 2019 at 9:28 AM Kyotaro Horiguchi
 wrote:
>
> At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand 
>  wrote in <1564902241482-0.p...@n3.nabble.com>
> > >  However having the nested queryid in
> > > pg_stat_activity would be convenient to track
> > > what is a long stored functions currently doing.
> >
> > +1
> >
> > And  this could permit to get wait event sampling per queryid when
> > pg_stat_statements.track = all
>
> I'm strongly on this side emotionally, but also I'm on Tom and
> Andres's side that exposing querid that way is not the right
> thing.
>
> Doing that means we don't need exact correspondence between
> top-level query and queryId (in nested or multistatement queries)
> in this patch.  pg_stat_statements will allow us to do the same
> thing by having additional uint64[MaxBackends] array in
> pgssSharedState, instead of expanding PgBackendStatus array in
> core by the same size.

Sure, but the problem with this approach is that all extensions that
compute their own queryid would have to do the same.  I hope that we
can come up with an approach friendlier for those extensions.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-05 Thread Kyotaro Horiguchi
At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand 
 wrote in <1564902241482-0.p...@n3.nabble.com>
> >  However having the nested queryid in 
> > pg_stat_activity would be convenient to track
> > what is a long stored functions currently doing.
> 
> +1
> 
> And  this could permit to get wait event sampling per queryid when
> pg_stat_statements.track = all

I'm strongly on this side emotionally, but also I'm on Tom and
Andres's side that exposing querid that way is not the right
thing.

Doing that means we don't need exact correspondence between
top-level query and queryId (in nested or multistatement queries)
in this patch.  pg_stat_statements will allow us to do the same
thing by having additional uint64[MaxBackends] array in
pgssSharedState, instead of expanding PgBackendStatus array in
core by the same size.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-04 Thread legrand legrand
>  However having the nested queryid in 
> pg_stat_activity would be convenient to track
> what is a long stored functions currently doing.

+1

And  this could permit to get wait event sampling per queryid when
pg_stat_statements.track = all

Regards 
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-03 Thread Julien Rouhaud
Hi,

On Sat, Aug 3, 2019 at 1:21 AM Andres Freund  wrote:
>
> On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote:
> > However having the nested queryid in
> > pg_stat_activity would be convenient to track what is a long stored
> > functions currently doing.  Maybe we could expose something like
> > top_level_queryid and current_queryid instead?
>
> Given that the query string is the toplevel one, I think that'd just be
> confusing. And given the fact that it adds *substantial* additional
> complexity, I'd just rip the subcommand bits out.

Ok, so here's a version that only exposes the top-level queryid only.
There can still be discrepancies with the query field, if a
multi-command string is provided.  The queryid will be updated each
time a new top level statement is executed.

As the queryid cannot be immediately known, and may never exist at all
if a query fails to parse, here are the heuristic I used to update the
stored queryid:

- it's reset to 0 each time pgstat_report_activity(STATE_RUNNING) is
called.  This way, we're sure that we don't display last query's
queryid in the logs if the next query fails to parse
- it's also reset to 0 at the beginning of exec_simple_query() loop on
the parsetree_list (for multi-command string case)
- pg_analyze_and_rewrite() and pg_analyze_and_rewrite_params() will
report the new queryid after parse analysis.
- a non-zero queryid will only be updated if the stored one is zero

This should also work as intended for background worker using SPI,
provided that they correctly call pgstat_report_activity.  I also
modified ExecInitParallelPlan() to publish the queryId in the
serialized plannedStmt, so ParallelQueryMain() can report it to make
the queryid available in the parallel workers too.

Note that this patch makes it clear that a zero queryid means no
queryid computed (and NULL will be displayed in such case in
pg_stat_activity).  pg_stat_statements already makes sure that it
cannot compute a zero queryid.

It also assume that any extension computing a queryid will do that in
the post_parse_analysis hook, which seems like a sane requirement.  We
may want to have a dedicated hook for that instead, if more people get
interested in having the queryid only, possibly different
implementations, if it becomes available outside pgss.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c91e3e1550..c7ca1bf9a8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6353,6 +6353,11 @@ local0.*/var/log/postgresql
  session processes
  no
 
+
+ %Q
+ queryid: identifier of session's current query, if any
+ yes
+
 
  %%
  Literal %
@@ -6736,8 +6741,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   

 Enables the collection of information on the currently
-executing command of each session, along with the time when
-that command began execution. This parameter is on by
+executing command of each session, along with its identifier and the
+time when that command began execution. This parameter is on by
 default. Note that even when enabled, this information is not
 visible to all users, only to superusers and the user owning
 the session being reported on, so it should not represent a
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..7f287c7a7e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -824,6 +824,18 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  xid
  The current backend's xmin horizon.
 
+
+ queryid
+ bigint
+ Identifier of this backend's most recent query. If
+  state is active this field
+  shows the identifier of the currently executing query. In all other
+  states, it shows the identifier of last query that was executed.  By
+  default, query identifiers are not computed, so this field will always
+  be null, unless an additional module that compute query identifiers, such
+  as , is configured.
+ 
+
 
  query
  text
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..f30098c2cd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -749,6 +749,7 @@ CREATE VIEW pg_stat_activity AS
 S.state,
 S.backend_xid,
 s.backend_xmin,
+S.queryid,
 S.query,
 S.backend_type
 FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 53cd2fc666..9ba6d3f2e6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -121,7 +121,7 @@ typedef struct 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-02 Thread Andres Freund
Hi,

On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote:
> On Thu, Aug 1, 2019 at 11:05 PM Andres Freund  wrote:
> >
> > I'm actually quite unconvinced that it's sensible to update the global
> > value for nested queries. That'll mean e.g. the log_line_prefix and
> > pg_stat_activity values are most of the time going to be bogus while
> > nested, because the querystring that's associated with those will *not*
> > be the value that the queryid corresponds to. elog.c uses
> > debug_query_string to log the statement, which is only updated for
> > top-level queries (outside of some exceptions like parallel workers for
> > parallel queries in a function or stuff like that). And pg_stat_activity
> > is also only updated for top level queries.
> 
> Having the nested queryid seems indeed quite broken for
> log_line_prefix.  However having the nested queryid in
> pg_stat_activity would be convenient to track what is a long stored
> functions currently doing.  Maybe we could expose something like
> top_level_queryid and current_queryid instead?

Given that the query string is the toplevel one, I think that'd just be
confusing. And given the fact that it adds *substantial* additional
complexity, I'd just rip the subcommand bits out.

Greetings,

Andres Freund




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-02 Thread Julien Rouhaud
On Thu, Aug 1, 2019 at 11:05 PM Andres Freund  wrote:
>
> I'm actually quite unconvinced that it's sensible to update the global
> value for nested queries. That'll mean e.g. the log_line_prefix and
> pg_stat_activity values are most of the time going to be bogus while
> nested, because the querystring that's associated with those will *not*
> be the value that the queryid corresponds to. elog.c uses
> debug_query_string to log the statement, which is only updated for
> top-level queries (outside of some exceptions like parallel workers for
> parallel queries in a function or stuff like that). And pg_stat_activity
> is also only updated for top level queries.

Having the nested queryid seems indeed quite broken for
log_line_prefix.  However having the nested queryid in
pg_stat_activity would be convenient to track what is a long stored
functions currently doing.  Maybe we could expose something like
top_level_queryid and current_queryid instead?




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-01 Thread Julien Rouhaud
On Thu, Aug 1, 2019 at 10:52 PM Andres Freund  wrote:
>
> On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote:
> > Sure, but it requires extra wrapper functions, and the st_changecount
> > dance when writing the new value.
>
> So? You need a wrapper function anyway, there's no way we're going to
> add all those separate pg_atomic_write* calls directly.

Ok

> I also think this proposed column should probably respect
> the track_activities GUC.

Oh indeed, I'll fix that when I'll be sure of the semantics to implement.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-01 Thread Andres Freund
Hi,

On 2019-08-01 22:49:48 +0200, Julien Rouhaud wrote:
> On Thu, Aug 1, 2019 at 8:36 PM Andres Freund  wrote:
> >
> > On 2019-08-01 14:20:46 -0400, Robert Haas wrote:
> > > However, I think that the fact that this patch adds 15 new calls to
> > > pg_atomic_write_u64(>queryId, ...) is probably not a good
> > > sign.  It seems like we ought to be able to centralize it better than
> > > that.
> >
> > +1
> 
> Unfortunately I didn't find a better way to do that.  Since you can
> have nested execution, I don't see how to avoid adding extra code in
> every parts of query execution.

At least my +1 is not primarily about the number of sites that need to
handle queryid changes, but that they all need to know about the way the
queryid is stored. Including how atomicity etc is handled. That
knowledge should be in one or two places, not more. In a file where that
knowledge makes sense.

I'm *also* concerned about the number of places, as that makes it likely
that some have been missed/new ones will be introduced without the
queryid handling. But that wasn't what I was referring to above.


I'm actually quite unconvinced that it's sensible to update the global
value for nested queries. That'll mean e.g. the log_line_prefix and
pg_stat_activity values are most of the time going to be bogus while
nested, because the querystring that's associated with those will *not*
be the value that the queryid corresponds to. elog.c uses
debug_query_string to log the statement, which is only updated for
top-level queries (outside of some exceptions like parallel workers for
parallel queries in a function or stuff like that). And pg_stat_activity
is also only updated for top level queries.

Greetings,

Andres Freund




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-01 Thread Andres Freund
Hi,

On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote:
> On Thu, Aug 1, 2019 at 8:36 PM Andres Freund  wrote:
> >
> > On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote:
> > > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund  wrote:
> > > > And if it were necessary, why wouldn't any of the other fields in
> > > > PgBackendStatus need it? There's plenty of other fields written to
> > > > without a lock, and several of those are also 8 bytes (so it's not a
> > > > case of assuming that 8 byte reads might not be atomic, but for byte
> > > > reads are).
> > >
> > > This patch is actually storing the queryid in PGPROC, not in
> > > PgBackendStatus, thus the need for an atomic.  I used PGPROC because
> > > the value needs to be available in log_line_prefix() and spi.c, so
> > > pgstat.c / PgBackendStatus didn't seem like the best interface in that
> > > case.
> >
> > Hm. I'm not convinced that really is the case? You can just access
> > MyBEentry, and read and update it?
> 
> Sure, but it requires extra wrapper functions, and the st_changecount
> dance when writing the new value.

So? You need a wrapper function anyway, there's no way we're going to
add all those separate pg_atomic_write* calls directly.


> >  I mean, we do so at a frequency
> > roughtly as high as high as the new queryid updates for things like
> > pgstat_report_activity().
> 
> pgstat_report_activity() is only called for top-level statement.  For
> the queryid we need to track it down to all nested statements, which
> could be way higher.

Compared to the overhead of executing a separate query the cost of
single function call containing a MyBEentry update of an 8byte value
seems almost guaranteed to be immeasurable. The executor startup alone
is several orders of magnitude more expensive.

I also think this proposed column should probably respect
the track_activities GUC.

Greetings,

Andres Freund




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-01 Thread Julien Rouhaud
On Thu, Aug 1, 2019 at 8:36 PM Andres Freund  wrote:
>
> On 2019-08-01 14:20:46 -0400, Robert Haas wrote:
> > However, I think that the fact that this patch adds 15 new calls to
> > pg_atomic_write_u64(>queryId, ...) is probably not a good
> > sign.  It seems like we ought to be able to centralize it better than
> > that.
>
> +1

Unfortunately I didn't find a better way to do that.  Since you can
have nested execution, I don't see how to avoid adding extra code in
every parts of query execution.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-01 Thread Julien Rouhaud
On Thu, Aug 1, 2019 at 8:36 PM Andres Freund  wrote:
>
> On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote:
> > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund  wrote:
> > > And if it were necessary, why wouldn't any of the other fields in
> > > PgBackendStatus need it? There's plenty of other fields written to
> > > without a lock, and several of those are also 8 bytes (so it's not a
> > > case of assuming that 8 byte reads might not be atomic, but for byte
> > > reads are).
> >
> > This patch is actually storing the queryid in PGPROC, not in
> > PgBackendStatus, thus the need for an atomic.  I used PGPROC because
> > the value needs to be available in log_line_prefix() and spi.c, so
> > pgstat.c / PgBackendStatus didn't seem like the best interface in that
> > case.
>
> Hm. I'm not convinced that really is the case? You can just access
> MyBEentry, and read and update it?

Sure, but it requires extra wrapper functions, and the st_changecount
dance when writing the new value.

>  I mean, we do so at a frequency
> roughtly as high as high as the new queryid updates for things like
> pgstat_report_activity().

pgstat_report_activity() is only called for top-level statement.  For
the queryid we need to track it down to all nested statements, which
could be way higher.  But pgstat_progress_update_param() is called way
more than that.

>  Reading the value of your own backend you'd
> not need to follow the changecount algorithm, I think, because it's only
> updated from the current backend. If reading were a problem, you
> trivially just could have a cache in a local variable, to avoid
> accessing shared memory.

Yes definitely, except for pgstat_get_activity(), all reads are
backend local and should be totally safe to read as is.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-01 Thread Andres Freund
Hi,

On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote:
> On Wed, Jul 31, 2019 at 11:59 PM Andres Freund  wrote:
> > And if it were necessary, why wouldn't any of the other fields in
> > PgBackendStatus need it? There's plenty of other fields written to
> > without a lock, and several of those are also 8 bytes (so it's not a
> > case of assuming that 8 byte reads might not be atomic, but for byte
> > reads are).
> 
> This patch is actually storing the queryid in PGPROC, not in
> PgBackendStatus, thus the need for an atomic.  I used PGPROC because
> the value needs to be available in log_line_prefix() and spi.c, so
> pgstat.c / PgBackendStatus didn't seem like the best interface in that
> case.

Hm. I'm not convinced that really is the case? You can just access
MyBEentry, and read and update it?  I mean, we do so at a frequency
roughtly as high as high as the new queryid updates for things like
pgstat_report_activity().  Reading the value of your own backend you'd
not need to follow the changecount algorithm, I think, because it's only
updated from the current backend. If reading were a problem, you
trivially just could have a cache in a local variable, to avoid
accessing shared memory.


> Is widening PGPROC is too expensive for this purpose?

Well, I'm mostly not a fan of putting even more in there, because it's
pretty hard to understand already. To me it architecturally status
information doesn't belong there (In fact, I'm somewhat unhappy that
wait_event_info etc in there, but that's at least commonly updated at
the same time as other fields in PGPROC).

Greetings,

Andres Freund




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-01 Thread Andres Freund
On 2019-08-01 14:20:46 -0400, Robert Haas wrote:
> However, I think that the fact that this patch adds 15 new calls to
> pg_atomic_write_u64(>queryId, ...) is probably not a good
> sign.  It seems like we ought to be able to centralize it better than
> that.

+1




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-01 Thread Robert Haas
On Thu, Aug 1, 2019 at 2:46 AM Julien Rouhaud  wrote:
> This patch is actually storing the queryid in PGPROC, not in
> PgBackendStatus, thus the need for an atomic.  I used PGPROC because
> the value needs to be available in log_line_prefix() and spi.c, so
> pgstat.c / PgBackendStatus didn't seem like the best interface in that
> case.  Is widening PGPROC is too expensive for this purpose?

I doubt it.

However, I think that the fact that this patch adds 15 new calls to
pg_atomic_write_u64(>queryId, ...) is probably not a good
sign.  It seems like we ought to be able to centralize it better than
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-08-01 Thread Julien Rouhaud
On Wed, Jul 31, 2019 at 11:59 PM Andres Freund  wrote:
>
> On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote:
> > On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin  
> > wrote:
> > > What reason to use pg_atomic_uint64?
> >
> > The queryid is read and written without holding any lock on the PGPROC
> > entry, so the pg_atomic_uint64 will guarantee that we get a consistent
> > value in pg_stat_get_activity().  Other reads shouldn't be a problem
> > as far as I remember.
>
> Hm, I don't think that's necessary in this case. That's what the
> st_changecount protocol is trying to ensure, no?
>
> /*
>  * To avoid locking overhead, we use the following protocol: a backend
>  * increments st_changecount before modifying its entry, and again 
> after
>  * finishing a modification.  A would-be reader should note the value 
> of
>  * st_changecount, copy the entry into private memory, then check
>  * st_changecount again.  If the value hasn't changed, and if it's 
> even,
>  * the copy is valid; otherwise start over.  This makes updates cheap
>  * while reads are potentially expensive, but that's the tradeoff we 
> want.
>  *
>  * The above protocol needs memory barriers to ensure that the 
> apparent
>  * order of execution is as it desires.  Otherwise, for example, the 
> CPU
>  * might rearrange the code so that st_changecount is incremented 
> twice
>  * before the modification on a machine with weak memory ordering.  
> Hence,
>  * use the macros defined below for manipulating st_changecount, 
> rather
>  * than touching it directly.
>  */
> int st_changecount;
>
>
> And if it were necessary, why wouldn't any of the other fields in
> PgBackendStatus need it? There's plenty of other fields written to
> without a lock, and several of those are also 8 bytes (so it's not a
> case of assuming that 8 byte reads might not be atomic, but for byte
> reads are).

This patch is actually storing the queryid in PGPROC, not in
PgBackendStatus, thus the need for an atomic.  I used PGPROC because
the value needs to be available in log_line_prefix() and spi.c, so
pgstat.c / PgBackendStatus didn't seem like the best interface in that
case.  Is widening PGPROC is too expensive for this purpose?




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-07-31 Thread Andres Freund
Hi,

On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote:
> On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin  
> wrote:
> > What reason to use pg_atomic_uint64?
> 
> The queryid is read and written without holding any lock on the PGPROC
> entry, so the pg_atomic_uint64 will guarantee that we get a consistent
> value in pg_stat_get_activity().  Other reads shouldn't be a problem
> as far as I remember.

Hm, I don't think that's necessary in this case. That's what the
st_changecount protocol is trying to ensure, no?

/*
 * To avoid locking overhead, we use the following protocol: a backend
 * increments st_changecount before modifying its entry, and again after
 * finishing a modification.  A would-be reader should note the value of
 * st_changecount, copy the entry into private memory, then check
 * st_changecount again.  If the value hasn't changed, and if it's even,
 * the copy is valid; otherwise start over.  This makes updates cheap
 * while reads are potentially expensive, but that's the tradeoff we 
want.
 *
 * The above protocol needs memory barriers to ensure that the apparent
 * order of execution is as it desires.  Otherwise, for example, the CPU
 * might rearrange the code so that st_changecount is incremented twice
 * before the modification on a machine with weak memory ordering.  
Hence,
 * use the macros defined below for manipulating st_changecount, rather
 * than touching it directly.
 */
int st_changecount;


And if it were necessary, why wouldn't any of the other fields in
PgBackendStatus need it? There's plenty of other fields written to
without a lock, and several of those are also 8 bytes (so it's not a
case of assuming that 8 byte reads might not be atomic, but for byte
reads are).

Greetings,

Andres Freund




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-07-31 Thread Julien Rouhaud
Hello,

On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin  wrote:
>
> What reason to use pg_atomic_uint64?

The queryid is read and written without holding any lock on the PGPROC
entry, so the pg_atomic_uint64 will guarantee that we get a consistent
value in pg_stat_get_activity().  Other reads shouldn't be a problem
as far as I remember.

> In docs:
> occured - > occurred

Thanks!  I fixed it on my local branch.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-07-31 Thread Evgeny Efimkin
What reason to use pg_atomic_uint64?
In docs:
occured - > occurred

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-06-28 Thread Peter Geoghegan
On Tue, Mar 19, 2019 at 12:38 PM legrand legrand
 wrote:
> Would it make sense to add it in auto explain ?
> I don't know for explain itself, but maybe ...

I think that it should appear in EXPLAIN. pg_stat_statements already
cannot have a query hash of zero, so it might be okay to display it
only when its value is non-zero.

-- 
Peter Geoghegan




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-06-28 Thread Peter Geoghegan
On Tue, Mar 19, 2019 at 12:00 PM Robert Haas  wrote:
> On the other hand, it also appears that a lot of people would be very,
> very happy to just be able to see the query ID field that already
> exists, both in pg_stat_statements in pg_stat_activity, and we
> shouldn't throw up unnecessary impediments in the way of making that
> happen, at least IMHO.

+1.

pg_stat_statements will already lose all the statistics that it
aggregated in the event of a hard crash. The trade-off that the query
jumbling logic makes is not a bad one, all things considered.

-- 
Peter Geoghegan




<    1   2   3   >