Re: query_id, pg_stat_activity, extended query protocol

2024-05-16 Thread Imseih (AWS), Sami
> I'm unsure if it needs to call ExecutorStart in the bind code. But if we
> don't change the current logic, would it make more sense to move
> pgstat_report_query_id to the ExecutorRun routine?

I initially thought about that, but for utility statements (CTAS, etc.) being 
executed with extended query protocol, we will still not advertise the queryId 
as  we should. This is why I chose to set the queryId in PortalRunSelect and
PortalRunMulti in v2 of the patch [1].

We can advertise the queryId inside ExecutorRun instead of
PortalRunSelect as the patch does, but we will still need to advertise 
the queryId inside PortalRunMulti.

[1] 
https://www.postgresql.org/message-id/FAB6AEA1-AB5E-4DFF-9A2E-BB320E6C3DF1%40amazon.com


Regards,

Sami







Re: query_id, pg_stat_activity, extended query protocol

2024-05-15 Thread Andrei Lepikhov

On 15.05.2024 10:24, Imseih (AWS), Sami wrote:

I discovered the current state of queryId reporting and found that it
may be unlogical: Postgres resets queryId right before query execution
in simple protocol and doesn't reset it at all in extended protocol and
other ways to execute queries.


In exec_parse_message, exec_bind_message  and exec_execute_message,
the queryId is reset via pgstat_report_activity


I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.


When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

I discovered this case a bit.
As I can see, the origin of the problem is that the exec_execute_message 
report STATE_RUNNING, although  ExecutorStart was called in the 
exec_bind_message routine beforehand.
I'm unsure if it needs to call ExecutorStart in the bind code. But if we 
don't change the current logic, would it make more sense to move 
pgstat_report_query_id to the ExecutorRun routine?


--
regards, Andrei Lepikhov





Re: query_id, pg_stat_activity, extended query protocol

2024-05-15 Thread Michael Paquier
On Wed, May 15, 2024 at 06:36:23PM +, Imseih (AWS), Sami wrote:
>> Okay, that's what I precisely wanted to understand: queryId doesn't have
>> semantics to show the job that consumes resources right now—it is mostly
>> about convenience to know that the backend processes nothing except
>> (probably) this query.
> 
> It may be a good idea to expose in pg_stat_activity or a
> supplemental activity view information about the current state of the
> query processing. i.e. Is it parsing, planning or executing a query or
> is it processing a nested query. 

pg_stat_activity is already quite bloated with attributes, and I'd
suspect that there are more properties in a query that would be
interesting to track down at a thinner level as long as it mirrors a
dynamic activity of the query.  Perhaps a separate catalog like a
pg_stat_query would make sense, moving query_start there as well?
Catalog breakages are never fun, still always happen because the
reasons behind a backward-incompatible change make the picture better
in the long-term for users.
--
Michael


signature.asc
Description: PGP signature


Re: query_id, pg_stat_activity, extended query protocol

2024-05-15 Thread Imseih (AWS), Sami
> Okay, that's what I precisely wanted to understand: queryId doesn't have
> semantics to show the job that consumes resources right now—it is mostly
> about convenience to know that the backend processes nothing except
> (probably) this query.

It may be a good idea to expose in pg_stat_activity or a
supplemental activity view information about the current state of the
query processing. i.e. Is it parsing, planning or executing a query or
is it processing a nested query. 

I can see this being useful and perhaps could be taken up in a 
separate thread.

Regards,

Sami






Re: query_id, pg_stat_activity, extended query protocol

2024-05-15 Thread Andrei Lepikhov

On 15/5/2024 12:09, Michael Paquier wrote:

On Wed, May 15, 2024 at 03:24:05AM +, Imseih (AWS), Sami wrote:

I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.


When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

If we reset queryId at the end of execution, this behavior breaks. Right?


Idle sessions keep track of the last query string run, hence being
consistent in pg_stat_activity and report its query ID is user
friendly.  Resetting it while keeping the string is less consistent.
It's been this way for years, so I'd rather let it be this way.
Okay, that's what I precisely wanted to understand: queryId doesn't have 
semantics to show the job that consumes resources right now—it is mostly 
about convenience to know that the backend processes nothing except 
(probably) this query.


--
regards, Andrei Lepikhov





Re: query_id, pg_stat_activity, extended query protocol

2024-05-14 Thread Michael Paquier
On Wed, May 15, 2024 at 03:24:05AM +, Imseih (AWS), Sami wrote:
>> I think we should generally report it when the backend executes a job
>> related to the query with that queryId. This means it would reset the
>> queryId at the end of the query execution.
> 
> When the query completes execution and the session goes into a state 
> other than "active", both the query text and the queryId should be of the 
> last executed statement. This is the documented behavior, and I believe
> it's the correct behavior.
> 
> If we reset queryId at the end of execution, this behavior breaks. Right?

Idle sessions keep track of the last query string run, hence being
consistent in pg_stat_activity and report its query ID is user
friendly.  Resetting it while keeping the string is less consistent.
It's been this way for years, so I'd rather let it be this way.

>> This seems logical, but
>> what about the planning process? If an extension plans a query without
>> the intention to execute it for speculative reasons, should we still
>> show the queryId? Perhaps we should reset the state right after planning
>> to accurately reflect the current queryId.
>
> I think you are suggesting that during planning, the queryId
> of the current statement being planned should not be reported.
>  
> If my understanding is correct, I don't think that is a good idea. Tools that 
> snasphot pg_stat_activity will not be able to account for the queryId during
> planning. This could mean that certain load on the database cannot be tied
> back to a specific queryId.

I'm -1 with the point of resetting the query ID based on what the
patch does, even if it remains available in the hooks.
pg_stat_activity is one thing, but you would also reduce the coverage
of log_line_prefix with %Q.  And that can provide really useful
debugging information in the code paths where the query ID would be
reset as an effect of the proposed patch.

The patch to report the query ID of a planned query when running a
query through a PortalRunSelect() feels more intuitive in the
information it reports.
--
Michael


signature.asc
Description: PGP signature


Re: query_id, pg_stat_activity, extended query protocol

2024-05-14 Thread Imseih (AWS), Sami
> I discovered the current state of queryId reporting and found that it
> may be unlogical: Postgres resets queryId right before query execution
> in simple protocol and doesn't reset it at all in extended protocol and
> other ways to execute queries.

In exec_parse_message, exec_bind_message  and exec_execute_message,
the queryId is reset via pgstat_report_activity

> I think we should generally report it when the backend executes a job
> related to the query with that queryId. This means it would reset the
> queryId at the end of the query execution.

When the query completes execution and the session goes into a state 
other than "active", both the query text and the queryId should be of the 
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

If we reset queryId at the end of execution, this behavior breaks. Right?

> This seems logical, but
> what about the planning process? If an extension plans a query without
> the intention to execute it for speculative reasons, should we still
> show the queryId? Perhaps we should reset the state right after planning
> to accurately reflect the current queryId.

I think you are suggesting that during planning, the queryId
of the current statement being planned should not be reported.
 
If my understanding is correct, I don't think that is a good idea. Tools that 
snasphot pg_stat_activity will not be able to account for the queryId during
planning. This could mean that certain load on the database cannot be tied
back to a specific queryId.

Regards,

Sami





Re: query_id, pg_stat_activity, extended query protocol

2024-05-08 Thread Andrei Lepikhov

On 5/1/24 10:07, Imseih (AWS), Sami wrote:

Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1] in which the queryId may change
due to a cached query invalidation.


[1] 
https://www.postgresql.org/message-id/724348C9-8023-41BC-895E-80634E79A538%40amazon.com
I discovered the current state of queryId reporting and found that it 
may be unlogical: Postgres resets queryId right before query execution 
in simple protocol and doesn't reset it at all in extended protocol and 
other ways to execute queries.
I think we should generally report it when the backend executes a job 
related to the query with that queryId. This means it would reset the 
queryId at the end of the query execution.
However, the process of setting up the queryId is more complex. Should 
we set it at the beginning of query execution? This seems logical, but 
what about the planning process? If an extension plans a query without 
the intention to execute it for speculative reasons, should we still 
show the queryId? Perhaps we should reset the state right after planning 
to accurately reflect the current queryId.
See in the attachment some sketch for that - it needs to add queryId 
reset on abortion.


--
regards, Andrei Lepikhov
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d63c..a4d38a157f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -470,6 +470,12 @@ ExecutorEnd(QueryDesc *queryDesc)
 		(*ExecutorEnd_hook) (queryDesc);
 	else
 		standard_ExecutorEnd(queryDesc);
+
+	/*
+	 * Report at the end of query execution. Don't change it for nested
+	 * queries.
+	 */
+	pgstat_report_query_id(0, false);
 }
 
 void
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 032818423f..ba29dc5bc7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -58,6 +58,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
+#include "pgstat.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/selfuncs.h"
@@ -280,6 +281,13 @@ planner(Query *parse, const char *query_string, int cursorOptions,
 		result = (*planner_hook) (parse, query_string, cursorOptions, boundParams);
 	else
 		result = standard_planner(parse, query_string, cursorOptions, boundParams);
+
+	/*
+	 * Reset queryId at the end of planning job. Executor code will set it up
+	 * again on the ExecutorStart call.
+	 */
+	pgstat_report_query_id(0, false);
+
 	return result;
 }
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2dff28afce..10e2529cf6 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1108,8 +1108,6 @@ exec_simple_query(const char *query_string)
 		const char *cmdtagname;
 		size_t		cmdtaglen;
 
-		pgstat_report_query_id(0, true);
-
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and


Re: query_id, pg_stat_activity, extended query protocol

2024-04-30 Thread Imseih (AWS), Sami
Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1] in which the queryId may change
due to a cached query invalidation.


[1] 
https://www.postgresql.org/message-id/724348C9-8023-41BC-895E-80634E79A538%40amazon.com

Regards,

Sami



0001-v2-Fix-Extended-Query-Protocol-handling-of-queryId.patch
Description: 0001-v2-Fix-Extended-Query-Protocol-handling-of-queryId.patch


Re: query_id, pg_stat_activity, extended query protocol

2024-04-27 Thread Andrei Lepikhov

On 27/4/2024 20:54, Imseih (AWS), Sami wrote:

But simplistic case with a prepared statement shows how the value of
queryId can be changed if you don't acquire all the objects needed for
the execution:




CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;


Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?
I don't think so. It may be any object, even stored procedure, that can 
be changed. IMO, the right option here is to report zero (like the 
undefined value of queryId) until the end of the parsing stage.


--
regards, Andrei Lepikhov





Re: query_id, pg_stat_activity, extended query protocol

2024-04-27 Thread Imseih (AWS), Sami
>> But simplistic case with a prepared statement shows how the value of
>> queryId can be changed if you don't acquire all the objects needed for
>> the execution:




>> CREATE TABLE test();
>> PREPARE name AS SELECT * FROM test;
>> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
>> DROP TABLE test;
>> CREATE TABLE test();
>> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;


> Hmm, you raise a good point. Isn't this a fundamental problem
> with prepared statements? If there is DDL on the
> relations of the prepared statement query, shouldn't the prepared
> statement be considered invalid at that point and raise an error
> to the user?

I tested v1 thoroughly.

Using the attached JDBC script for testing, I added some logging of the queryId 
being reported by the patch and added a breakpoint after sync [1] which at that 
point the locks are released on the table. I then proceeded to drop and 
recreate the table
and observed that the first bind after recreating the table still reports the
old queryId but the execute reports the correct queryId. This is because
the bind still has not had a chance to re-parse and re-plan after the
cache invalidation.


2024-04-27 13:51:15.757 CDT [43483] LOG:  duration: 21322.475 ms  execute S_1: 
select pg_sleep(10)
2024-04-27 13:51:21.591 CDT [43483] LOG:  duration: 0.834 ms  parse S_2: select 
from tab1 where id = $1
2024-04-27 13:51:21.591 CDT [43483] LOG:  query_id = -192969736922694368
2024-04-27 13:51:21.592 CDT [43483] LOG:  duration: 0.729 ms  bind S_2: select 
from tab1 where id = $1
2024-04-27 13:51:21.592 CDT [43483] LOG:  query_id = -192969736922694368
2024-04-27 13:51:21.592 CDT [43483] LOG:  duration: 0.032 ms  execute S_2: 
select from tab1 where id = $1
2024-04-27 13:51:32.501 CDT [43483] LOG:  query_id = -192969736922694368
2024-04-27 13:51:32.502 CDT [43483] LOG:  duration: 0.342 ms  bind S_2: select 
from tab1 where id = $1
2024-04-27 13:51:32.502 CDT [43483] LOG:  query_id = -192969736922694368
2024-04-27 13:51:32.502 CDT [43483] LOG:  duration: 0.067 ms  execute S_2: 
select from tab1 where id = $1
2024-04-27 13:51:42.613 CDT [43526] LOG:  query_id = -4766379021163149612
-- recreate the tables
2024-04-27 13:51:42.621 CDT [43526] LOG:  duration: 8.488 ms  statement: drop 
table if exists tab1;
2024-04-27 13:51:42.621 CDT [43526] LOG:  query_id = 7875284141628316369
2024-04-27 13:51:42.625 CDT [43526] LOG:  duration: 3.364 ms  statement: create 
table tab1 ( id int );
2024-04-27 13:51:42.625 CDT [43526] LOG:  query_id = 2967282624086800441
2024-04-27 13:51:42.626 CDT [43526] LOG:  duration: 0.936 ms  statement: insert 
into tab1 values (1);

-- this reports the old query_id
2024-04-27 13:51:45.058 CDT [43483] LOG:  query_id = -192969736922694368 

2024-04-27 13:51:45.059 CDT [43483] LOG:  duration: 0.913 ms  bind S_2: select 
from tab1 where id = $1
2024-04-27 13:51:45.059 CDT [43483] LOG:  query_id = 3010297048333693297
2024-04-27 13:51:45.059 CDT [43483] LOG:  duration: 0.096 ms  execute S_2: 
select from tab1 where id = $1
2024-04-27 13:51:46.777 CDT [43483] LOG:  query_id = 3010297048333693297
2024-04-27 13:51:46.777 CDT [43483] LOG:  duration: 0.108 ms  bind S_2: select 
from tab1 where id = $1
2024-04-27 13:51:46.777 CDT [43483] LOG:  query_id = 3010297048333693297
2024-04-27 13:51:46.777 CDT [43483] LOG:  duration: 0.024 ms  execute S_2: 
select from tab1 where id = $1

The easy answer is to not report queryId during the bind message, but I will 
look
at what else can be done here as it's good to have a queryId reported in this 
scenario
for cases there are long planning times and we rather not have those missed in 
pg_stat_activity sampling.


[1] 
https://github.com/postgres/postgres/blob/master/src/backend/tcop/postgres.c#L4877


Regards,

Sami



Test.java
Description: Test.java


Re: query_id, pg_stat_activity, extended query protocol

2024-04-27 Thread Imseih (AWS), Sami
> We choose a arguably more user-friendly option:

> https://www.postgresql.org/docs/current/sql-prepare.html

Thanks for pointing this out!

Regards,

Sami




Re: query_id, pg_stat_activity, extended query protocol

2024-04-27 Thread David G. Johnston
On Sat, Apr 27, 2024 at 6:55 AM Imseih (AWS), Sami 
wrote:

>
> Hmm, you raise a good point. Isn't this a fundamental problem
> with prepared statements? If there is DDL on the
> relations of the prepared statement query, shouldn't the prepared
> statement be considered invalid at that point and raise an error
> to the user?
>
>
We choose a arguably more user-friendly option:

https://www.postgresql.org/docs/current/sql-prepare.html

"""
Although the main point of a prepared statement is to avoid repeated parse
analysis and planning of the statement, PostgreSQL will force re-analysis
and re-planning of the statement before using it whenever database objects
used in the statement have undergone definitional (DDL) changes or their
planner statistics have been updated since the previous use of the prepared
statement.
"""

David J.


Re: query_id, pg_stat_activity, extended query protocol

2024-04-27 Thread Imseih (AWS), Sami
> But simplistic case with a prepared statement shows how the value of
> queryId can be changed if you don't acquire all the objects needed for
> the execution:


> CREATE TABLE test();
> PREPARE name AS SELECT * FROM test;
> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
> DROP TABLE test;
> CREATE TABLE test();
> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;

Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?

Regards,

Sami 



Re: query_id, pg_stat_activity, extended query protocol

2024-04-23 Thread Imseih (AWS), Sami
> I am also a bit surprised with the choice of using the first Query
> available in the list for the ID, FWIW.


IIUC, the query trees returned from QueryRewrite
will all have the same queryId, so it appears valid to 
use the queryId from the first tree in the list. Right?

Here is an example I was working with that includes user-defined rules
that has a list with more than 1 tree.


postgres=# explain (verbose, generic_plan) insert into mytab values ($1) 
RETURNING pg_sleep($1), id ;
QUERY PLAN 
---
Insert on public.mytab (cost=0.00..0.01 rows=1 width=4)
Output: pg_sleep(($1)::double precision), mytab.id
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425


Insert on public.mytab2 (cost=0.00..0.01 rows=0 width=0)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425


Insert on public.mytab3 (cost=0.00..0.01 rows=0 width=0)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425


Insert on public.mytab4 (cost=0.00..0.01 rows=0 width=0)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425
(20 rows)



> Did you consider using \bind to show how this behaves in a regression
> test?


Yes, this is precisely how I tested. Without the patch, I could not
see a queryId after 9 seconds of a pg_sleep, but with the patch it 
appears. See the test below.


## test query
select pg_sleep($1) \bind 30


## unpatched
postgres=# select 
query_id, 
query, 
now()-query_start query_duration, 
state 
from pg_stat_activity where pid <> pg_backend_pid()
and state = 'active';
query_id |query | query_duration  | state  
--+--+-+
  | select pg_sleep($1) +| 00:00:08.604845 | active
  | ;| | 
(1 row)

## patched

postgres=# truncate table large;^C
postgres=# select 
query_id, 
query, 
now()-query_start query_duration, 
state 
from pg_stat_activity where pid <> pg_backend_pid()
and state = 'active';
  query_id   |query | query_duration | state  
-+--++
 2433215470630378210 | select pg_sleep($1) +| 00:00:09.6881  | active
 | ;|| 
(1 row)


For exec_execute_message, I realized that to report queryId for
Utility and non-utility statements, we need to report the queryId 
inside the portal routines where PlannedStmt contains the queryId.

Attached is the first real attempt at the fix. 

Regards,


Sami







0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patch
Description: 0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patch


Re: query_id, pg_stat_activity, extended query protocol

2024-04-23 Thread Andrei Lepikhov

On 4/23/24 12:49, Michael Paquier wrote:

On Tue, Apr 23, 2024 at 11:42:41AM +0700, Andrei Lepikhov wrote:

On 23/4/2024 11:16, Imseih (AWS), Sami wrote:

+   pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, 
true);
  set_ps_display("BIND");
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long 
max_rows)
  debug_query_string = sourceText;
  pgstat_report_activity(STATE_RUNNING, sourceText);
+   pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);
  cmdtagname = GetCommandTagNameAndLen(portal->commandTag, );


In exec_bind_message, how can you be sure that queryId exists in query_list
before the call of GetCachedPlan(), which will validate and lock the plan?
What if some OIDs were altered in the middle?


I am also a bit surprised with the choice of using the first Query
available in the list for the ID, FWIW.

Did you consider using \bind to show how this behaves in a regression
test?
I'm not sure how to invent a test based on the \bind command - we need 
some pause in the middle.
But simplistic case with a prepared statement shows how the value of 
queryId can be changed if you don't acquire all the objects needed for 
the execution:


CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;

/*
QUERY PLAN
---
 Seq Scan on public.test (actual time=0.002..0.004 rows=0 loops=1)
 Query Identifier: 6750745711909650694

QUERY PLAN
---
 Seq Scan on public.test (actual time=0.004..0.004 rows=0 loops=1)
 Query Identifier: -2597546769858730762
*/

We have different objects which can be changed - I just have invented 
the most trivial example to discuss.


--
regards, Andrei Lepikhov





Re: query_id, pg_stat_activity, extended query protocol

2024-04-22 Thread Michael Paquier
On Tue, Apr 23, 2024 at 11:42:41AM +0700, Andrei Lepikhov wrote:
> On 23/4/2024 11:16, Imseih (AWS), Sami wrote:
>> +   pgstat_report_query_id(linitial_node(Query, 
>> psrc->query_list)->queryId, true);
>>  set_ps_display("BIND");
>> @@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long 
>> max_rows)
>>  debug_query_string = sourceText;
>>  pgstat_report_activity(STATE_RUNNING, sourceText);
>> +   pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, 
>> true);
>>  cmdtagname = GetCommandTagNameAndLen(portal->commandTag, 
>> );
>
> In exec_bind_message, how can you be sure that queryId exists in query_list
> before the call of GetCachedPlan(), which will validate and lock the plan?
> What if some OIDs were altered in the middle?

I am also a bit surprised with the choice of using the first Query
available in the list for the ID, FWIW.

Did you consider using \bind to show how this behaves in a regression
test?
--
Michael


signature.asc
Description: PGP signature


Re: query_id, pg_stat_activity, extended query protocol

2024-04-22 Thread Andrei Lepikhov

On 23/4/2024 11:16, Imseih (AWS), Sami wrote:

FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.


I was dealing with this today and found this thread. I spent some time
looking at possible solutions.

In the flow of extended query protocol, the exec_parse_message
reports the queryId, but subsequent calls to exec_bind_message
and exec_execute_message reset the queryId when calling
pgstat_report_activity(STATE_RUNNING,..) as you can see below.
   
  /*

   * If a new query is started, we reset the query identifier as it'll only
   * be known after parse analysis, to avoid reporting last query's
   * identifier.
   */
  if (state == STATE_RUNNING)
  beentry->st_query_id = UINT64CONST(0);


So, I think the simple answer is something like the below.
Inside exec_bind_message and exec_execute_message,
the query_id should be reported after pg_report_activity.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..7ec2df91d5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1678,6 +1678,7 @@ exec_bind_message(StringInfo input_message)
 debug_query_string = psrc->query_string;
  
 pgstat_report_activity(STATE_RUNNING, psrc->query_string);

+   pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, 
true);
  
 set_ps_display("BIND");
  
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows)

 debug_query_string = sourceText;
  
 pgstat_report_activity(STATE_RUNNING, sourceText);

+   pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);
  
 cmdtagname = GetCommandTagNameAndLen(portal->commandTag, );



thoughts?
In exec_bind_message, how can you be sure that queryId exists in 
query_list before the call of GetCachedPlan(), which will validate and 
lock the plan? What if some OIDs were altered in the middle?


--
regards, Andrei Lepikhov





Re: query_id, pg_stat_activity, extended query protocol

2024-04-22 Thread Imseih (AWS), Sami
> FWIW, I'd like to think that we could improve the situation, requiring
> a mix of calling pgstat_report_query_id() while feeding on some query
> IDs retrieved from CachedPlanSource->query_list. I have not in
> details looked at how much could be achieved, TBH.

I was dealing with this today and found this thread. I spent some time
looking at possible solutions.

In the flow of extended query protocol, the exec_parse_message 
reports the queryId, but subsequent calls to exec_bind_message
and exec_execute_message reset the queryId when calling
pgstat_report_activity(STATE_RUNNING,..) as you can see below.
  
 /*
  * If a new query is started, we reset the query identifier as it'll only
  * be known after parse analysis, to avoid reporting last query's
  * identifier.
  */
 if (state == STATE_RUNNING)
 beentry->st_query_id = UINT64CONST(0);


So, I think the simple answer is something like the below. 
Inside exec_bind_message and exec_execute_message,
the query_id should be reported after pg_report_activity. 

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..7ec2df91d5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1678,6 +1678,7 @@ exec_bind_message(StringInfo input_message)
debug_query_string = psrc->query_string;
 
pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+   pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, 
true);
 
set_ps_display("BIND");
 
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long 
max_rows)
debug_query_string = sourceText;
 
pgstat_report_activity(STATE_RUNNING, sourceText);
+   pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);
 
cmdtagname = GetCommandTagNameAndLen(portal->commandTag, );


thoughts?


Regards,

Sami Imseih
Amazon Web Services (AWS)



Re: query_id, pg_stat_activity, extended query protocol

2024-03-20 Thread Dave Cramer
>
>
>>
>> FWIW, I'd like to think that we could improve the situation, requiring
>> a mix of calling pgstat_report_query_id() while feeding on some query
>> IDs retrieved from CachedPlanSource->query_list.  I have not in
>> details looked at how much could be achieved, TBH.
>>
>
This just cropped up as a pgjdbc github issue. Seems like something that
should be addressed.

Dave


Re: query_id, pg_stat_activity, extended query protocol

2023-06-13 Thread kaido vaikla
Thnx.
br
Kaido

On Tue, 13 Jun 2023 at 03:16, Michael Paquier  wrote:

> On Mon, Jun 12, 2023 at 09:03:24PM +0300, kaido vaikla wrote:
> > I have noticed, if query comes from PostgreSQL JDBC Driver, then query_id
> > is not present in pg_stat_activity.  Erik Wienhold figured out that
> reason
> > can be in extended query protocol (
> >
> https://www.postgresql.org/message-id/1391613709.939460.1684777418...@office.mailbox.org
> > )
> > My question is, is it expected or is it a bug: if extended query protocol
> > then no query_id  in pg_stat_activity for running query.
>
> Well, you could say a bit of both, I guess.  The query ID is compiled
> and stored in backend entries only after parse analysis, which is not
> something that would happen when using the execution phase of the
> extended query protocol, though it should be possible to access to the
> Query nodes in the cached plans and their assigned query IDs.
>
> FWIW, I'd like to think that we could improve the situation, requiring
> a mix of calling pgstat_report_query_id() while feeding on some query
> IDs retrieved from CachedPlanSource->query_list.  I have not in
> details looked at how much could be achieved, TBH.
> --
> Michael
>


Re: query_id, pg_stat_activity, extended query protocol

2023-06-12 Thread Michael Paquier
On Mon, Jun 12, 2023 at 09:03:24PM +0300, kaido vaikla wrote:
> I have noticed, if query comes from PostgreSQL JDBC Driver, then query_id
> is not present in pg_stat_activity.  Erik Wienhold figured out that reason
> can be in extended query protocol (
> https://www.postgresql.org/message-id/1391613709.939460.1684777418...@office.mailbox.org
> )
> My question is, is it expected or is it a bug: if extended query protocol
> then no query_id  in pg_stat_activity for running query.

Well, you could say a bit of both, I guess.  The query ID is compiled
and stored in backend entries only after parse analysis, which is not
something that would happen when using the execution phase of the
extended query protocol, though it should be possible to access to the
Query nodes in the cached plans and their assigned query IDs.

FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list.  I have not in
details looked at how much could be achieved, TBH.
--
Michael


signature.asc
Description: PGP signature


query_id, pg_stat_activity, extended query protocol

2023-06-12 Thread kaido vaikla
I have noticed, if query comes from PostgreSQL JDBC Driver, then query_id
is not present in pg_stat_activity.  Erik Wienhold figured out that reason
can be in extended query protocol (
https://www.postgresql.org/message-id/1391613709.939460.1684777418...@office.mailbox.org
)
My question is, is it expected or is it a bug: if extended query protocol
then no query_id  in pg_stat_activity for running query.

br
Kaido