Re: RFC: Logging plan of the running query
Hi,

On 2024-03-13 15:33:02 -0400, Robert Haas wrote:
> But also ... having to wrap the entire plan tree like this seems
> pretty awful. I don't really like the idea of a large-scale plan
> modification like this in the middle of the query.

It's not great. But I also don't really see an alternative with this
approach. I guess we could invent a new CFI version that gets the current
PlanState and use it in all of src/backend/executor/node* - but then we
could just as well process the interrupt directly there.

> I also wonder whether it interacts properly with JIT.

I don't think there's a problem unless somebody invests a lot of time in
JITing much more of the query. That would require a lot more work,
basically redesigning the executor...

> Andres, did you have some clever idea for this feature that would
> avoid the need to do this?

No. I think it's acceptable though. However, it might be worth inventing
an executor tree walker in a preliminary step. We already have quite a
few switches over all plan nodes, which we could largely replace with
such a helper.

Greetings,

Andres Freund
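The tree walker Andres suggests would centralize the per-node recursion in one helper. A rough sketch over toy structs, purely for illustration (ToyPlanState, walk_planstate, and count_cb are invented names, not PostgreSQL APIs; the real PlanState also has node-specific children beyond lefttree/righttree):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy stand-in for PlanState: only left/right children here; the real
 * executor also has "special child plans" per node type.
 */
typedef struct ToyPlanState
{
	struct ToyPlanState *lefttree;
	struct ToyPlanState *righttree;
} ToyPlanState;

/*
 * Generic walker: invokes cb on every node, replacing per-caller
 * switches over all node types with a single helper.
 */
static void
walk_planstate(ToyPlanState *ps, void (*cb) (ToyPlanState *, void *), void *ctx)
{
	if (ps == NULL)
		return;
	cb(ps, ctx);
	walk_planstate(ps->lefttree, cb, ctx);
	walk_planstate(ps->righttree, cb, ctx);
}

/* Example callback: count visited nodes. */
static void
count_cb(ToyPlanState *ps, void *ctx)
{
	(void) ps;
	(*(int *) ctx)++;
}
```

A caller wanting to wrap every node's ExecProcNode would then pass a wrapping callback instead of open-coding the recursion.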
Re: RFC: Logging plan of the running query
On 2024-03-14 04:33, Robert Haas wrote:

Thanks for your review!

> On Wed, Mar 13, 2024 at 1:28 AM torikoshia wrote:
>> - I saw no way to find the next node to be executed from the planstate
>> tree, so the patch wraps all the ExecProcNode of the planstate tree at
>> CHECK_FOR_INTERRUPTS().
>
> I don't think it does this correctly, because some node types have
> children other than the left and right node. See /* special child
> plans */ in ExplainNode().

Agreed.

> But also ... having to wrap the entire plan tree like this seems
> pretty awful. I don't really like the idea of a large-scale plan
> modification like this in the middle of the query.

Yeah, but I haven't come up with other ideas for selecting the
appropriate node to wrap.

> Andres, did you have some clever idea for this feature that would
> avoid the need to do this?

On 2024-03-14 10:02, jian he wrote:
> On Wed, Mar 13, 2024 at 1:28 PM torikoshia wrote:
>> On Fri, Feb 16, 2024 at 11:42 PM torikoshia wrote:
>>> I'm not so sure about the implementation now, i.e. finding the next
>>> node to be executed from the planstate tree, but I'm going to try
>>> this approach.
>>
>> Attached a patch which takes this approach.
>
> one minor issue. I understand that the regression test compares the
> expected outcome with the test-run outcome, but can you enlighten me
> about how you check whether the change you made in
> contrib/auto_explain/t/001_auto_explain.pl is correct? (I am not
> familiar with Perl.)

$pg_log_plan_query_output saves the output plan of pg_log_query_plan()
and $auto_explain_output saves the output plan of auto_explain. The test
checks that both are the same using cmp_ok().

Did I answer your question?
> diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
> index cf195f1359..2d06bf297e 100644
> --- a/src/include/commands/explain.h
> +++ b/src/include/commands/explain.h
> @@ -17,6 +17,8 @@
>  #include "lib/stringinfo.h"
>  #include "parser/parse_node.h"
>
> +extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive;
>
> diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
> index 054dd2bf62..3c6bd1ea7c 100644
> --- a/src/include/utils/elog.h
> +++ b/src/include/utils/elog.h
> @@ -167,6 +167,7 @@ struct Node;
>  extern bool message_level_is_interesting(int elevel);
>
> +extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive;
>
> utils/elog.h is already included in src/include/postgres.h. You don't
> need to declare ProcessLogQueryPlanInterruptActive in
> include/commands/explain.h? Generally a variable should only be
> declared once?

Yeah, this declaration is not necessary; we should instead add an
include of commands/explain.h to src/backend/access/transam/xact.c.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
On Wed, Mar 13, 2024 at 1:28 PM torikoshia wrote:
>
> On Fri, Feb 16, 2024 at 11:42 PM torikoshia wrote:
> > I'm not so sure about the implementation now, i.e. finding the next
> > node to be executed from the planstate tree, but I'm going to try
> > this approach.
>
> Attached a patch which takes this approach.

one minor issue. I understand that the regression test compares the
expected outcome with the test-run outcome, but can you enlighten me
about how you check whether the change you made in
contrib/auto_explain/t/001_auto_explain.pl is correct? (I am not
familiar with Perl.)

diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index cf195f1359..2d06bf297e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 #include "parser/parse_node.h"

+extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive;

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62..3c6bd1ea7c 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -167,6 +167,7 @@ struct Node;
 extern bool message_level_is_interesting(int elevel);

+extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive;

utils/elog.h is already included in src/include/postgres.h. You don't
need to declare ProcessLogQueryPlanInterruptActive in
include/commands/explain.h? Generally a variable should only be
declared once?
Re: RFC: Logging plan of the running query
On Wed, Mar 13, 2024 at 1:28 AM torikoshia wrote:
> - I saw no way to find the next node to be executed from the planstate
> tree, so the patch wraps all the ExecProcNode of the planstate tree at
> CHECK_FOR_INTERRUPTS().

I don't think it does this correctly, because some node types have
children other than the left and right node. See /* special child
plans */ in ExplainNode().

But also ... having to wrap the entire plan tree like this seems pretty
awful. I don't really like the idea of a large-scale plan modification
like this in the middle of the query. I also wonder whether it interacts
properly with JIT. But at the same time, I wonder how you're supposed to
avoid it. Andres, did you have some clever idea for this feature that
would avoid the need to do this?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: RFC: Logging plan of the running query
On Fri, Feb 16, 2024 at 11:42 PM torikoshia wrote:
> I'm not so sure about the implementation now, i.e. finding the next
> node to be executed from the planstate tree, but I'm going to try this
> approach.

Attached a patch which takes this approach.

- I saw no way to find the next node to be executed from the planstate
  tree, so the patch wraps all the ExecProcNode of the planstate tree at
  CHECK_FOR_INTERRUPTS().
- To prevent the overhead of this wrapped function call, it is unwrapped
  at the end of the EXPLAIN code execution.
- I first tried to use ExecSetExecProcNode() for wrapping, but it
  'changes' the ExecProcNodeMtd of nodes rather than 'adding' some
  processing to it. I'm not sure this is the right approach, but the
  attached patch adds a new member ExecProcNodeOriginal to PlanState to
  preserve the original ExecProcNodeMtd.

Any comments are welcome.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From a374bf485e6e7237314179313ac7cb61a0ad784b Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Wed, 13 Mar 2024 13:47:18 +0900
Subject: [PATCH v38] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish to check
its plan. This is not so convenient when investigating long-running
queries on production environments where we cannot use debuggers. To
improve this situation, this patch adds a pg_log_query_plan() function
that requests logging the plan of the specified backend process.

By default, only superusers are allowed to request logging the plans,
because allowing any user to issue this request at an unbounded rate
would cause lots of log messages, which could lead to denial of service.

On receipt of the request, EXPLAIN code for logging is wrapped around
every ExecProcNode, and when the executor runs one of the ExecProcNode
functions, the plan is logged. The EXPLAIN code is unwrapped once it has
actually run.
In this way, we can avoid the overhead we would otherwise face when
adding CHECK_FOR_INTERRUPTS()-like mechanisms somewhere in the executor
code.
---
 contrib/auto_explain/Makefile                 |   2 +
 contrib/auto_explain/auto_explain.c           |  23 +-
 contrib/auto_explain/t/001_auto_explain.pl    |  35 +++
 doc/src/sgml/func.sgml                        |  50
 src/backend/access/transam/xact.c             |  17 ++
 src/backend/catalog/system_functions.sql      |   2 +
 src/backend/commands/explain.c                | 228 +-
 src/backend/executor/execMain.c               |  19 ++
 src/backend/storage/ipc/procsignal.c          |   4 +
 src/backend/tcop/postgres.c                   |   4 +
 src/backend/utils/init/globals.c              |   2 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/include/commands/explain.h                |   9 +
 src/include/miscadmin.h                       |   1 +
 src/include/nodes/execnodes.h                 |   3 +
 src/include/storage/procsignal.h              |   1 +
 src/include/tcop/pquery.h                     |   2 +-
 src/include/utils/elog.h                      |   1 +
 .../injection_points/injection_points.c       |  11 +
 src/test/regress/expected/misc_functions.out  |  54 -
 src/test/regress/sql/misc_functions.sql       |  41 +++-
 21 files changed, 473 insertions(+), 42 deletions(-)

diff --git a/contrib/auto_explain/Makefile b/contrib/auto_explain/Makefile
index efd127d3ca..64fe0e0573 100644
--- a/contrib/auto_explain/Makefile
+++ b/contrib/auto_explain/Makefile
@@ -8,6 +8,8 @@ PGFILEDESC = "auto_explain - logging facility for execution plans"

 TAP_TESTS = 1

+EXTRA_INSTALL = src/test/modules/injection_points
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..e041b10b0e 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -401,26 +401,9 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 		es->format = auto_explain_log_format;
 		es->settings = auto_explain_log_settings;

-		ExplainBeginOutput(es);
-		ExplainQueryText(es, queryDesc);
-		ExplainQueryParameters(es, queryDesc->params, auto_explain_log_parameter_max_length);
-		ExplainPrintPlan(es, queryDesc);
-		if (es->analyze && auto_explain_log_triggers)
-			ExplainPrintTriggers(es, queryDesc);
-		if (es->costs)
-			ExplainPrintJITSummary(es, queryDesc);
-		ExplainEndOutput(es);
-
-		/* Remove last line break */
-		if (es->str->len > 0 && es->str->data[es->str->len - 1] == '\n')
-			es->str->data[--es->str->len] = '\0';
-
-		/* Fix JSON to output an object */
-		if (auto_explain_log_format == EXPLAIN_FORMAT_JSON)
-		{
-			es->str->data[0] = '{';
-			es->str->data[es->str->len - 1] = '}';
-		}
+		ExplainAssembleLogOutput(es, queryDesc, auto_explain_log_format,
+								 auto_explain_log_triggers,
+								 auto_explain_log_parameter_max_length);

 		/*
 		 * Note: we rely on the existing logging of context or
diff --git
Re: RFC: Logging plan of the running query
On Sat, Mar 2, 2024 at 10:46 AM James Coleman wrote:
> If I can rephrase this idea: it's basically "delay this interrupt
> until inline to the next ExecProcNode execution".

Yes, but it's not just that. It also means that the code which would
handle the interrupt doesn't need to be called at every ExecProcNode.
Only when the interrupt actually arrives do we enable the code that
handles it.

--
Robert Haas
EDB: http://www.enterprisedb.com
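The delayed-interrupt wrapper being discussed can be sketched with plain function pointers. This is only an illustrative toy, not the patch's actual code; ToyNode, wrap_node, exec_wrapper, and real_exec are invented names:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a PlanState carrying its ExecProcNode method. */
typedef struct ToyNode
{
	int			(*ExecProcNode) (struct ToyNode *);
	int			(*ExecProcNodeOriginal) (struct ToyNode *);	/* saved on wrap */
} ToyNode;

static int	logged_plans = 0;

/*
 * The wrapper: log once, unwrap, then resume normal execution.  The
 * normal path never pays for this; the pointer is only redirected when
 * a log request actually arrives.
 */
static int
exec_wrapper(ToyNode *node)
{
	logged_plans++;				/* stand-in for logging the plan */
	node->ExecProcNode = node->ExecProcNodeOriginal;	/* unwrap */
	return node->ExecProcNode(node);
}

/* Called from interrupt handling when a log request arrives. */
static void
wrap_node(ToyNode *node)
{
	node->ExecProcNodeOriginal = node->ExecProcNode;
	node->ExecProcNode = exec_wrapper;
}

/* The node's normal execution function. */
static int
real_exec(ToyNode *node)
{
	(void) node;
	return 42;					/* pretend to return a tuple */
}
```

After the first post-wrap call, the pointer is restored, so subsequent calls go straight to the original function with zero extra overhead.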
Re: RFC: Logging plan of the running query
On Wed, Feb 28, 2024 at 1:18 AM Robert Haas wrote:
>
> On Mon, Feb 26, 2024 at 5:31 PM torikoshia wrote:
> > It would be nice if there was a place accessed once every few seconds
> > or so..
>
> I think this comment earlier from Andres deserves close attention:
>
> # If we went with something like that approach, I think we'd have to do
> # something like redirecting node->ExecProcNode to a wrapper, presumably
> # from within a CFI. That wrapper could then implement the explain
> # support, without slowing down the normal execution path.
>
> If this is correctly implemented, the overhead in the case where the
> feature isn't used should be essentially zero, I believe.

If I can rephrase this idea: it's basically "delay this interrupt until
inline to the next ExecProcNode execution".

That seems pretty promising to me as well.

Regards,
James Coleman
Re: RFC: Logging plan of the running query
On Mon, Feb 26, 2024 at 5:31 PM torikoshia wrote:
> It would be nice if there was a place accessed once every few seconds
> or so..

I think this comment earlier from Andres deserves close attention:

# If we went with something like that approach, I think we'd have to do
# something like redirecting node->ExecProcNode to a wrapper, presumably
# from within a CFI. That wrapper could then implement the explain
# support, without slowing down the normal execution path.

If this is correctly implemented, the overhead in the case where the
feature isn't used should be essentially zero, I believe.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: RFC: Logging plan of the running query
On Mon, Feb 26, 2024 at 01:56:44PM +0530, Robert Haas wrote:
> On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud wrote:
> > Yeah, trying to find a generalized solution seems like worth
> > investing some time to avoid having a bunch of CHECK_FOR_XXX() calls
> > scattered in the code a few years down the road.
>
> I just don't really see how to do it. I suspect that every task that
> wants to run at some CFIs but not others is going to have slightly
> different requirements, and we probably can't foresee what all of
> those requirements are.
>
> Said another way, if in the future we want to call
> DoSomethingOrOther() from the CFI handler, well then we need to know
> that we're not already in the middle of using any subsystem that
> DoSomethingOrOther() might also try to use ... and we also need to
> know that we're not in the middle of doing anything that's more
> critical than DoSomethingOrOther(). But both of these are likely to
> vary in each case.
>
> EXPLAIN might be one member of a general class of things that require
> catalog access (and thus might take locks or lwlocks, access the
> catalogs, trigger invalidations, etc.) but it's not clear how general
> that class really is. Also, I think if we try to do too many different
> kinds of things at CFIs, the whole thing is going to fall apart.
> You'll end up failing to foresee some interactions, or the stack will
> get too deep, or ... something.

I still fail to understand your point. So you say we might want to check
for a safe condition to run explain or DoSomethingOrOther or even DoMeh,
all with different requirements. IIUC what you're saying is that we
should have CHECK_FOR_EXPLAIN(), CHECK_FOR_DOSOMETHINGOROTHER() and
CHECK_FOR_MEH()?

And so in some code path A we could have

    CHECK_FOR_INTERRUPTS();
    CHECK_FOR_EXPLAIN();

in another

    CHECK_FOR_INTERRUPTS();
    CHECK_FOR_DOSOMETHINGOROTHER();

and in one happy path

    CHECK_FOR_INTERRUPTS();
    CHECK_FOR_EXPLAIN();
    CHECK_FOR_DOSOMETHINGOROTHER();
    CHECK_FOR_MEH();

Or that we should still have all of those but they shouldn't be anywhere
close to a CHECK_FOR_INTERRUPTS(), or something totally different?

In the first case, I'm not sure why having
CHECK_FOR_INTERRUPTS(EXPLAIN | ...), with a combination of flags and the
flag handling being done after ProcessInterrupts(), would be any
different apart from having less boilerplate lines. Similarly for the
second case, but relying on a single more general
CHECK_FOR_CONDITION(EXPLAIN | ...) rather than N CHECK_FOR_XXX().

If you just want something totally different then sure.
Re: RFC: Logging plan of the running query
On 2024-02-24 00:23, Robert Haas wrote:
> On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud wrote:
> > On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote:
> > > On Thu, Feb 22, 2024 at 6:25 AM James Coleman wrote:
> > > > This is potentially a bit of a wild idea, but I wonder if having
> > > > some kind of argument to CHECK_FOR_INTERRUPTS() signifying we're
> > > > in "normal" as opposed to "critical" (using that word differently
> > > > than the existing critical sections) would be worth it.
> > >
> > > It's worth considering, but the definition of "normal" vs.
> > > "critical" might be hard to pin down. Or, we might end up with a
> > > definition that is specific to this particular case and not
> > > generalizable to others.
> >
> > But it doesn't have to be all or nothing right? I mean each call
> > could say what the situation is like in their context, like
> > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK |
> > GUARANTEE_WHATEVER), and slowly tag calls as needed, similarly to
> > how we already add CFI based on user reports.
>
> Absolutely. My gut feeling is that it's going to be simpler to pick a
> small number of places that are safe and sufficient for this
> particular feature and add an extra call there

Hmm, whether extending CHECK_FOR_INTERRUPTS() or adding extra calls
directly, I'm currently not sure where the good 'places' are which don't
cause a performance impact.

As in the attached PoC patch, I experimentally added extra calls in
ExecScanFetch(), which would be called less often than
ExecProcNode()[1]. When running a sequential scan on pgbench_accounts,
which is in memory, there seems to be a performance degradation.
- Executed "select * from pgbench_accounts" 20 times
- Compared the elapsed time between the patch applied and not applied
  on 874d817baa160ca7e68
- There were no heap_blks_read during the query
- pgbench_accounts has 300 rows

patch NOT applied:
- average: 335.88 ms
- max: 367.313 ms
- min: 309.609 ms

patch applied:
- average: 342.57 ms
- max: 380.099 ms
- min: 324.270 ms

It would be nice if there was a place accessed once every few seconds or
so..

[1] https://www.postgresql.org/message-id/20240215185911.v4o6fo444md6a3w7%40awork3.anarazel.de

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From dee1ce857c89bf646ff98f94c846746b96869ebb Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Mon, 26 Feb 2024 19:19:35 +0900
Subject: [PATCH v37] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish to check
its plan. This is not so convenient when investigating long-running
queries on production environments where we cannot use debuggers. To
improve this situation, this patch adds a pg_log_query_plan() function
that requests logging the plan of the specified backend process.

By default, only superusers are allowed to request logging the plans,
because allowing any user to issue this request at an unbounded rate
would cause lots of log messages, which could lead to denial of service.

On receipt of the request, at the next CHECK_LOG_QUERY_PLAN_PENDING()
the target backend logs its plan at LOG_SERVER_ONLY level, so that these
plans will appear in the server log but not be sent to the client.
CHECK_LOG_QUERY_PLAN_PENDING() is called from ExecScanFetch() and
ProcessClientReadInterrupt(). The former is intended to log plans during
any kind of scan, and the latter is intended to log that no query is
executing during a client read.
---
 contrib/auto_explain/Makefile                 |   2 +
 contrib/auto_explain/auto_explain.c           |  23 +-
 contrib/auto_explain/t/001_auto_explain.pl    |  35
 doc/src/sgml/func.sgml                        |  50 +
 src/backend/access/transam/xact.c             |  19 ++
 src/backend/catalog/system_functions.sql      |   2 +
 src/backend/commands/explain.c                | 176 +-
 src/backend/executor/execMain.c               |  19 ++
 src/backend/executor/execScan.c               |   2 +
 src/backend/storage/ipc/procsignal.c          |   4 +
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/init/globals.c              |   2 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/include/commands/explain.h                |  16 ++
 src/include/miscadmin.h                       |   1 +
 src/include/storage/procsignal.h              |   1 +
 src/include/tcop/pquery.h                     |   2 +-
 src/include/utils/elog.h                      |   1 +
 .../injection_points/injection_points.c       |  11 ++
 src/test/regress/expected/misc_functions.out  |  54 +-
 src/test/regress/sql/misc_functions.sql       |  41 +++-
 21 files changed, 430 insertions(+), 42 deletions(-)

diff --git a/contrib/auto_explain/Makefile b/contrib/auto_explain/Makefile
index efd127d3ca..64fe0e0573 100644
--- a/contrib/auto_explain/Makefile
+++ b/contrib/auto_explain/Makefile
@@ -8,6 +8,8 @@ PGFILEDESC = "auto_explain - logging facility for execution plans"

 TAP_TESTS = 1
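The CHECK_LOG_QUERY_PLAN_PENDING() mechanism described in the v37 commit message amounts to a cheap flag test on the hot path. A minimal standalone sketch of that pattern, with invented stand-in bodies (LogQueryPlanPending, ProcessLogQueryPlanInterrupt as modeled here are for illustration, not the patch's exact code):

```c
#include <assert.h>
#include <stdbool.h>

/* Set from signal handling when another backend requests the plan. */
static volatile bool LogQueryPlanPending = false;

static int	plans_logged = 0;

/* Stand-in for logging the plan at LOG_SERVER_ONLY and clearing the flag. */
static void
ProcessLogQueryPlanInterrupt(void)
{
	LogQueryPlanPending = false;
	plans_logged++;
}

/*
 * The hot-path check: a single, normally-false branch when no request is
 * pending, so call sites such as scan nodes pay almost nothing.
 */
#define CHECK_LOG_QUERY_PLAN_PENDING() \
	do { \
		if (LogQueryPlanPending) \
			ProcessLogQueryPlanInterrupt(); \
	} while (0)
```

The benchmark upthread suggests even this branch is measurable when placed in something as hot as ExecScanFetch(), which is why the thread keeps looking for a less frequently executed location.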
Re: RFC: Logging plan of the running query
On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud wrote:
> Yeah, trying to find a generalized solution seems like worth investing
> some time to avoid having a bunch of CHECK_FOR_XXX() calls scattered
> in the code a few years down the road.

I just don't really see how to do it. I suspect that every task that
wants to run at some CFIs but not others is going to have slightly
different requirements, and we probably can't foresee what all of those
requirements are.

Said another way, if in the future we want to call DoSomethingOrOther()
from the CFI handler, well then we need to know that we're not already
in the middle of using any subsystem that DoSomethingOrOther() might
also try to use ... and we also need to know that we're not in the
middle of doing anything that's more critical than DoSomethingOrOther().
But both of these are likely to vary in each case.

EXPLAIN might be one member of a general class of things that require
catalog access (and thus might take locks or lwlocks, access the
catalogs, trigger invalidations, etc.) but it's not clear how general
that class really is. Also, I think if we try to do too many different
kinds of things at CFIs, the whole thing is going to fall apart. You'll
end up failing to foresee some interactions, or the stack will get too
deep, or ... something.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: RFC: Logging plan of the running query
On Mon, Feb 26, 2024 at 1:25 PM Julien Rouhaud wrote:
> > > #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL
> > >
> > > #define CHECK_FOR_INTERRUPTS(...) \
> > >     CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
> > >                            CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
> > >                            CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
> > >     )
> > >
> > > We would have to duplicate the current CFI body, but it should
> > > never really change, and we shouldn't end up with more than 2
> > > overloads anyway so I don't see it being much of a problem.
> >
> > Why do we need this complex macro?
>
> So that client code can use either CHECK_FOR_INTERRUPTS() or
> CHECK_FOR_INTERRUPTS(flag) rather than transforming every single
> CHECK_FOR_INTERRUPTS() to CHECK_FOR_INTERRUPTS(0), which was Robert's
> complaint about this approach.

It might be better to just create two macros (with names like
CHECK_FOR_INTERRUPTS() and CHECK_FOR_INTERRUPTS_SAFE()) that call
ProcessInterrupts() directly, and modify ProcessInterrupts() to accept
the flag (and, if required, call CFIFlagHandler() additionally). The
last macro is hard to understand.

--
Best Wishes,
Ashutosh Bapat
Re: RFC: Logging plan of the running query
On Mon, Feb 26, 2024 at 12:19:42PM +0530, Ashutosh Bapat wrote:
> On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud wrote:
> >
> > > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud wrote:
> > > > >
> > > > > But it doesn't have to be all or nothing right? I mean each
> > > > > call could say what the situation is like in their context,
> > > > > like CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK |
> > > > > GUARANTEE_WHATEVER), and slowly tag calls as needed, similarly
> > > > > to how we already add CFI based on user reports.
>
> That has some potential ...
>
> > I might be missing something, but since we already have a ton of
> > macro hacks, why not get another to allow CFI() overloading rather
> > than modifying every single call? Something like that should do the
> > trick (and CFIFlagHandler() is just a naive example with a function
> > call to avoid multiple evaluation, should be replaced with anything
> > that requires more than 10s thinking):
> >
> > #define CHECK_FOR_INTERRUPTS_0() \
> > do { \
> >     if (INTERRUPTS_PENDING_CONDITION()) \
> >         ProcessInterrupts(); \
> > } while(0)
> >
> > #define CHECK_FOR_INTERRUPTS_1(f) \
> > do { \
> >     if (INTERRUPTS_PENDING_CONDITION()) \
> >         ProcessInterrupts(); \
> >     \
> >     CFIFlagHandler(f); \
> > } while(0)
>
> From your earlier description I thought you are talking about flags
> that can be ORed. We need only the two macros above. Why are we
> calling CFIFlagHandler() after ProcessInterrupts()? Shouldn't we pass
> flags to ProcessInterrupts() itself?

Yes, I'm still talking about ORed flags passed to CFI(). That
CFIFlagHandler() call is just an example of a generalized function that
would act on those flags, rather than having that coded inside the
macro.

> > #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL
> >
> > #define CHECK_FOR_INTERRUPTS(...) \
> >     CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
> >                            CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
> >                            CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
> >     )
> >
> > We would have to duplicate the current CFI body, but it should never
> > really change, and we shouldn't end up with more than 2 overloads
> > anyway so I don't see it being much of a problem.
>
> Why do we need this complex macro?

So that client code can use either CHECK_FOR_INTERRUPTS() or
CHECK_FOR_INTERRUPTS(flag) rather than transforming every single
CHECK_FOR_INTERRUPTS() to CHECK_FOR_INTERRUPTS(0), which was Robert's
complaint about this approach.
Re: RFC: Logging plan of the running query
On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud wrote:
>
> On Sat, Feb 24, 2024 at 08:56:41AM -0500, James Coleman wrote:
> > On Fri, Feb 23, 2024 at 10:23 AM Robert Haas wrote:
> > >
> > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud wrote:
> > > >
> > > > But it doesn't have to be all or nothing right? I mean each call
> > > > could say what the situation is like in their context, like
> > > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK |
> > > > GUARANTEE_WHATEVER), and slowly tag calls as needed, similarly
> > > > to how we already add CFI based on user reports.

That has some potential ...

> Yeah, trying to find a generalized solution seems like worth investing
> some time to avoid having a bunch of CHECK_FOR_XXX() calls scattered
> in the code a few years down the road.
>
> I might be missing something, but since we already have a ton of macro
> hacks, why not get another to allow CFI() overloading rather than
> modifying every single call? Something like that should do the trick
> (and CFIFlagHandler() is just a naive example with a function call to
> avoid multiple evaluation, should be replaced with anything that
> requires more than 10s thinking):
>
> #define CHECK_FOR_INTERRUPTS_0() \
> do { \
>     if (INTERRUPTS_PENDING_CONDITION()) \
>         ProcessInterrupts(); \
> } while(0)
>
> #define CHECK_FOR_INTERRUPTS_1(f) \
> do { \
>     if (INTERRUPTS_PENDING_CONDITION()) \
>         ProcessInterrupts(); \
>     \
>     CFIFlagHandler(f); \
> } while(0)

From your earlier description I thought you are talking about flags that
can be ORed. We need only the two macros above. Why are we calling
CFIFlagHandler() after ProcessInterrupts()? Shouldn't we pass flags to
ProcessInterrupts() itself?

> #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL
>
> #define CHECK_FOR_INTERRUPTS(...) \
>     CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
>                            CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
>                            CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
>     )
>
> We would have to duplicate the current CFI body, but it should never
> really change, and we shouldn't end up with more than 2 overloads
> anyway so I don't see it being much of a problem.

Why do we need this complex macro?

--
Best Wishes,
Ashutosh Bapat
Re: RFC: Logging plan of the running query
On Sat, Feb 24, 2024 at 08:56:41AM -0500, James Coleman wrote:
> On Fri, Feb 23, 2024 at 10:23 AM Robert Haas wrote:
> >
> > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud wrote:
> > >
> > > But it doesn't have to be all or nothing right? I mean each call
> > > could say what the situation is like in their context, like
> > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK |
> > > GUARANTEE_WHATEVER), and slowly tag calls as needed, similarly to
> > > how we already add CFI based on user reports.
> >
> > Absolutely. My gut feeling is that it's going to be simpler to pick
> > a small number of places that are safe and sufficient for this
> > particular feature and add an extra call there, similar to how we do
> > vacuum_delay_point(). The reason I think that's likely to be better
> > is that it will likely require changing only a relatively small
> > number of places. If we instead start annotating CFIs, well, we've
> > got hundreds of those. That's a lot more to change, and it also
> > inconveniences third-party extension authors and people doing
> > back-patching. I'm not here to say it can't work; I just think it's
> > likely not the easiest path.
>
> Yes, I suspect it's not the easiest path. I have a small hunch it
> might end up paying more dividends in the future: there isn't just one
> of these things that is regularly a thorny discussion for the same
> reasons each time (basically "no way to trigger this safely from
> another backend interrupting another one at an arbitrary point"), and
> if we were able to generalize a solution we may have multiple wins (a
> very obvious one in my mind is the inability of auto_explain to run
> an explain at the precise time it's most useful: when
> statement_timeout fires).

Yeah, trying to find a generalized solution seems like worth investing
some time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in
the code a few years down the road.

I might be missing something, but since we already have a ton of macro
hacks, why not get another to allow CFI() overloading rather than
modifying every single call? Something like that should do the trick
(and CFIFlagHandler() is just a naive example with a function call to
avoid multiple evaluation, should be replaced with anything that
requires more than 10s thinking):

#define CHECK_FOR_INTERRUPTS_0() \
do { \
    if (INTERRUPTS_PENDING_CONDITION()) \
        ProcessInterrupts(); \
} while(0)

#define CHECK_FOR_INTERRUPTS_1(f) \
do { \
    if (INTERRUPTS_PENDING_CONDITION()) \
        ProcessInterrupts(); \
    \
    CFIFlagHandler(f); \
} while(0)

#define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL

#define CHECK_FOR_INTERRUPTS(...) \
    CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
                           CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
                           CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
    )

We would have to duplicate the current CFI body, but it should never
really change, and we shouldn't end up with more than 2 overloads anyway
so I don't see it being much of a problem.
Re: RFC: Logging plan of the running query
On Fri, Feb 23, 2024 at 10:23 AM Robert Haas wrote: > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud wrote: > > On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote: > > > On Thu, Feb 22, 2024 at 6:25 AM James Coleman wrote: > > > > This is potentially a bit of a wild idea, but I wonder if having some > > > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > > > > "normal" as opposed to "critical" (using that word differently than > > > > the existing critical sections) would be worth it. > > > > > > It's worth considering, but the definition of "normal" vs. "critical" > > > might be hard to pin down. Or, we might end up with a definition that > > > is specific to this particular case and not generalizable to others. > > > > But it doesn't have to be all or nothing right? I mean each call could say > > what the situation is like in their context, like > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), > > and > > slowly tag calls as needed, similarly to how we add already CFI based on > > users > > report. > > Absolutely. My gut feeling is that it's going to be simpler to pick a > small number of places that are safe and sufficient for this > particular feature and add an extra call there, similar to how we do > vacuum_delay_point(). The reason I think that's likely to be better is > that it will likely require changing only a relatively small number of > places. If we instead start annotating CFIs, well, we've got hundreds > of those. That's a lot more to change, and it also inconveniences > third-party extension authors and people doing back-patching. I'm not > here to say it can't work; I just think it's likely not the easiest > path. Yes, I suspect it's not the easiest path. 
I have a small hunch it might end up paying more dividends in the future: this isn't the only feature that regularly turns into a thorny discussion for the same reason each time (basically "there is no way to trigger this safely from another backend interrupting the target at an arbitrary point"), and if we were able to generalize a solution we might get multiple wins (a very obvious one in my mind is auto_explain's inability to run an explain at the precise time it's most useful: when statement_timeout fires). But it's also possible there are simpler ways that get us more than this scenario, so I might be wrong; it's merely a hunch. Regards, James Coleman
Re: RFC: Logging plan of the running query
On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud wrote: > On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote: > > On Thu, Feb 22, 2024 at 6:25 AM James Coleman wrote: > > > This is potentially a bit of a wild idea, but I wonder if having some > > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > > > "normal" as opposed to "critical" (using that word differently than > > > the existing critical sections) would be worth it. > > > > It's worth considering, but the definition of "normal" vs. "critical" > > might be hard to pin down. Or, we might end up with a definition that > > is specific to this particular case and not generalizable to others. > > But it doesn't have to be all or nothing right? I mean each call could say > what the situation is like in their context, like > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and > slowly tag calls as needed, similarly to how we add already CFI based on users > report. Absolutely. My gut feeling is that it's going to be simpler to pick a small number of places that are safe and sufficient for this particular feature and add an extra call there, similar to how we do vacuum_delay_point(). The reason I think that's likely to be better is that it will likely require changing only a relatively small number of places. If we instead start annotating CFIs, well, we've got hundreds of those. That's a lot more to change, and it also inconveniences third-party extension authors and people doing back-patching. I'm not here to say it can't work; I just think it's likely not the easiest path. -- Robert Haas EDB: http://www.enterprisedb.com
Re: RFC: Logging plan of the running query
Hi, On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote: > On Thu, Feb 22, 2024 at 6:25 AM James Coleman wrote: > > This is potentially a bit of a wild idea, but I wonder if having some > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > > "normal" as opposed to "critical" (using that word differently than > > the existing critical sections) would be worth it. > > It's worth considering, but the definition of "normal" vs. "critical" > might be hard to pin down. Or, we might end up with a definition that > is specific to this particular case and not generalizable to others. But it doesn't have to be all or nothing, right? I mean, each call could say what the situation is like in its context, like CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and slowly tag calls as needed, similarly to how we already add CFI calls based on user reports.
Re: RFC: Logging plan of the running query
On Thu, Feb 22, 2024 at 6:25 AM James Coleman wrote: > This is potentially a bit of a wild idea, but I wonder if having some > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > "normal" as opposed to "critical" (using that word differently than > the existing critical sections) would be worth it. It's worth considering, but the definition of "normal" vs. "critical" might be hard to pin down. Or, we might end up with a definition that is specific to this particular case and not generalizable to others. -- Robert Haas EDB: http://www.enterprisedb.com
Re: RFC: Logging plan of the running query
On Thu, Feb 22, 2024 at 6:25 AM James Coleman wrote: > > > > ...I think the current approach is just plain dead, because of this > > issue. We can't take an approach that creates an unbounded number of > > unclear reentrancy issues and then insert hacks one by one to cure > > them (or hack around them, more to the point) as they're discovered. > > > > The premise has to be that we only allow logging the query plan at > > points where we know it's safe, rather than, as at present, allowing > > it in places that are unsafe and then trying to compensate with code > > elsewhere. That's not likely to ever be as stable as we want > > PostgreSQL to be. > > This is potentially a bit of a wild idea, but I wonder if having some > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > "normal" as opposed to "critical" (using that word differently than > the existing critical sections) would be worth it. My hunch is this will end up being a maintenance burden, since every caller has to decide (carefully) whether the call is under a normal condition or not. Developers will tend to take the safe approach and flag calls as critical. More importantly, what's normal for one interrupt action may be critical for another, and vice versa. The approach would only be useful if the definition of "normal" is easy to comprehend. Alternatively: if a query executes for longer than a user-defined threshold (a session-level GUC? or the same value as the auto_explain parameter), the executor proactively prepares an EXPLAIN output and keeps it handy in case it is asked for. It can do so at a "known" safe place and time rather than at an arbitrary time and location. The extra time spent creating the EXPLAIN output may not be noticeable in a long-running query. The EXPLAIN output could be saved in pg_stat_activity or a similar place. This would avoid signaling the backend. -- Best Wishes, Ashutosh Bapat
Re: RFC: Logging plan of the running query
On Mon, Feb 19, 2024 at 11:53 PM Robert Haas wrote: > > On Fri, Feb 16, 2024 at 12:29 AM Andres Freund wrote: > > If we went with something like tht approach, I think we'd have to do > > something > > like redirecting node->ExecProcNode to a wrapper, presumably from within a > > CFI. That wrapper could then implement the explain support, without slowing > > down the normal execution path. > > That's an annoying complication; maybe there's some better way to > handle this. But I think we need to do something different than what > the patch does currently because... > > > > It's really hard for me to accept that the heavyweight lock problem > > > for which the patch contains a workaround is the only one that exists. > > > I can't see any reason why that should be true. > > > > I suspect you're right. > > ...I think the current approach is just plain dead, because of this > issue. We can't take an approach that creates an unbounded number of > unclear reentrancy issues and then insert hacks one by one to cure > them (or hack around them, more to the point) as they're discovered. > > The premise has to be that we only allow logging the query plan at > points where we know it's safe, rather than, as at present, allowing > it in places that are unsafe and then trying to compensate with code > elsewhere. That's not likely to ever be as stable as we want > PostgreSQL to be. This is potentially a bit of a wild idea, but I wonder if having some kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in "normal" as opposed to "critical" (using that word differently than the existing critical sections) would be worth it. Regards, James Coleman
Re: RFC: Logging plan of the running query
On Fri, Feb 16, 2024 at 12:29 AM Andres Freund wrote: > If we went with something like tht approach, I think we'd have to do something > like redirecting node->ExecProcNode to a wrapper, presumably from within a > CFI. That wrapper could then implement the explain support, without slowing > down the normal execution path. That's an annoying complication; maybe there's some better way to handle this. But I think we need to do something different than what the patch does currently because... > > It's really hard for me to accept that the heavyweight lock problem > > for which the patch contains a workaround is the only one that exists. > > I can't see any reason why that should be true. > > I suspect you're right. ...I think the current approach is just plain dead, because of this issue. We can't take an approach that creates an unbounded number of unclear reentrancy issues and then insert hacks one by one to cure them (or hack around them, more to the point) as they're discovered. The premise has to be that we only allow logging the query plan at points where we know it's safe, rather than, as at present, allowing it in places that are unsafe and then trying to compensate with code elsewhere. That's not likely to ever be as stable as we want PostgreSQL to be. -- Robert Haas EDB: http://www.enterprisedb.com
Re: RFC: Logging plan of the running query
On Thu, Feb 15, 2024 at 6:12 PM Robert Haas wrote: Hi, I've just been catching up on this thread. + if (MyProc->heldLocks) + { + ereport(LOG_SERVER_ONLY, + errmsg("ignored request for logging query plan due to lock conflicts"), + errdetail("You can try again in a moment.")); + return; + } I don't like this for several reasons. First, I think it's not nice to have a request just get ignored. A user will expect that if we cannot immediately respond to some request, we'll respond later at the first time that it's safe to do so, rather than just ignoring it and telling them to retry. Second, I don't think that the error message is very good. It talks about lock conflicts, but we don't know that there is any real problem. We know that, if we enter this block, the server is in the middle of trying to acquire some lock, and we also know that we could attempt to acquire a lock as part of generating the EXPLAIN output, and maybe that's an issue. But that's not a lock conflict. That's a re-entrancy problem. I don't know that we want to talk about re-entrancy problems in an error message, and I don't really think this error message should exist in the first place, but if we're going to error out in this case surely we shouldn't do so with an error message that describes a problem we don't have. Third, I think that the re-entrancy problems with this patch may extend well beyond lock acquisition. This is one example of how it can be unsafe to do something as complicated as EXPLAIN at any arbitrary CHECK_FOR_INTERRUPTS(). It is not correct to say, as http://postgr.es/m/caaaqye9euuzd8bkjxtvcd9e4n5c7kzhzcvucjxt-xds9x4c...@mail.gmail.com does, that the problems with running EXPLAIN at an arbitrary point are specific to running this code in an aborted transaction. 
The existence of this code shows that there is at least one hazard even if the current transaction is not aborted, and I see no analysis on this thread indicating whether there are any more such hazards, or how we could go about finding them all. I think the issue is very general. We have lots of subsystems that both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS(). If we process an interrupt while that code is in the middle of manipulating its global variables and then again re-enter that code, bad things might happen. We could try to rule that out by analyzing all such subsystems and all places where CHECK_FOR_INTERRUPTS() may appear, but that's very difficult. Suppose we took the alternative approach recommended by Andrey Lepikhov in http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b5...@app.fastmail.com and instead set a flag that gets handled in some suitable place in the executor code, like ExecProcNode(). If we did that, then we would know that we're not in the middle of a syscache lookup, a catcache lookup, a heavyweight lock acquisition, an ereport, or any other low-level subsystem call that might create problems for the EXPLAIN code. In that design, the hack shown above would go away, and we could be much more certain that we don't need any other similar hacks for other subsystems. The only downside is that we might experience a slightly longer delay before a requested EXPLAIN plan actually shows up, but that seems like a pretty small price to pay for being able to reason about the behavior of the system. I don't *think* there are any cases where we run in the executor for a particularly long time without a new call to ExecProcNode(). I think this case is a bit like vacuum_delay_point(). You might think that vacuum_delay_point() could be moved inside of CHECK_FOR_INTERRUPTS(), but we've made the opposite decision: every vacuum_delay_point() calls CHECK_FOR_INTERRUPTS() but not every CHECK_FOR_INTERRUPTS() calls vacuum_delay_point(). 
That means that we can allow vacuum_delay_point() only at cases where we know it's safe, rather than at every CHECK_FOR_INTERRUPTS(). I think that's a pretty smart decision, even for vacuum_delay_point(), and AFAICS the code we're proposing to run here does things that are substantially more complicated than what vacuum_delay_point() does. That code just does a bit of reading of shared memory, reports a wait event, and sleeps. That's really simple compared to code that could try to do relcache builds, which can trigger syscache lookups, which can both trigger heavyweight lock acquisition, lightweight lock acquisition, bringing pages into shared_buffers possibly through physical I/O, processing of invalidation messages, and a bunch of other stuff. It's really hard for me to accept that the heavyweight lock problem for which the patch contains a workaround is the only one that exists. I can't see any reason why that should be true. Thanks for the review and the very detailed explanation! I'm convinced that it's unsafe to execute EXPLAIN codes during CHECK_FOR_INTERRUPTS() and we need to execute it in other safe place, as well as the first and second point. On 2024-02-16 03:59, Andres
Re: RFC: Logging plan of the running query
Hi, On 2024-02-15 14:42:11 +0530, Robert Haas wrote: > I think the issue is very general. We have lots of subsystems that > both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS(). > If we process an interrupt while that code is in the middle of > manipulating its global variables and then again re-enter that code, > bad things might happen. We could try to rule that out by analyzing > all such subsystems and all places where CHECK_FOR_INTERRUPTS() may > appear, but that's very difficult. Suppose we took the alternative > approach recommended by Andrey Lepikhov in > http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b5...@app.fastmail.com > and instead set a flag that gets handled in some suitable place in the > executor code, like ExecProcNode(). If we did that, then we would know > that we're not in the middle of a syscache lookup, a catcache lookup, > a heavyweight lock acquisition, an ereport, or any other low-level > subsystem call that might create problems for the EXPLAIN code. > > In that design, the hack shown above would go away, and we could be > much more certain that we don't need any other similar hacks for other > subsystems. The only downside is that we might experience a slightly > longer delay before a requested EXPLAIN plan actually shows up, but > that seems like a pretty small price to pay for being able to reason > about the behavior of the system. I am very wary of adding overhead to ExecProcNode() - I'm quite sure that adding code there would trigger visible overhead for query times. If we went with something like that approach, I think we'd have to do something like redirecting node->ExecProcNode to a wrapper, presumably from within a CFI. That wrapper could then implement the explain support, without slowing down the normal execution path. > I don't *think* there are any cases where we run in the executor for a > particularly long time without a new call to ExecProcNode(). I guess it depends on what you call a long time. 
A large sort, for example, could spend a fair amount of time inside tuplesort; similarly, a Gather node might need to wait for a worker for a while, etc. > It's really hard for me to accept that the heavyweight lock problem > for which the patch contains a workaround is the only one that exists. > I can't see any reason why that should be true. I suspect you're right. Greetings, Andres Freund
Re: RFC: Logging plan of the running query
Hi, I've just been catching up on this thread. + if (MyProc->heldLocks) + { + ereport(LOG_SERVER_ONLY, + errmsg("ignored request for logging query plan due to lock conflicts"), + errdetail("You can try again in a moment.")); + return; + } I don't like this for several reasons. First, I think it's not nice to have a request just get ignored. A user will expect that if we cannot immediately respond to some request, we'll respond later at the first time that it's safe to do so, rather than just ignoring it and telling them to retry. Second, I don't think that the error message is very good. It talks about lock conflicts, but we don't know that there is any real problem. We know that, if we enter this block, the server is in the middle of trying to acquire some lock, and we also know that we could attempt to acquire a lock as part of generating the EXPLAIN output, and maybe that's an issue. But that's not a lock conflict. That's a re-entrancy problem. I don't know that we want to talk about re-entrancy problems in an error message, and I don't really think this error message should exist in the first place, but if we're going to error out in this case surely we shouldn't do so with an error message that describes a problem we don't have. Third, I think that the re-entrancy problems with this patch may extend well beyond lock acquisition. This is one example of how it can be unsafe to do something as complicated as EXPLAIN at any arbitrary CHECK_FOR_INTERRUPTS(). It is not correct to say, as http://postgr.es/m/caaaqye9euuzd8bkjxtvcd9e4n5c7kzhzcvucjxt-xds9x4c...@mail.gmail.com does, that the problems with running EXPLAIN at an arbitrary point are specific to running this code in an aborted transaction. The existence of this code shows that there is at least one hazard even if the current transaction is not aborted, and I see no analysis on this thread indicating whether there are any more such hazards, or how we could go about finding them all. 
I think the issue is very general. We have lots of subsystems that both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS(). If we process an interrupt while that code is in the middle of manipulating its global variables and then again re-enter that code, bad things might happen. We could try to rule that out by analyzing all such subsystems and all places where CHECK_FOR_INTERRUPTS() may appear, but that's very difficult. Suppose we took the alternative approach recommended by Andrey Lepikhov in http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b5...@app.fastmail.com and instead set a flag that gets handled in some suitable place in the executor code, like ExecProcNode(). If we did that, then we would know that we're not in the middle of a syscache lookup, a catcache lookup, a heavyweight lock acquisition, an ereport, or any other low-level subsystem call that might create problems for the EXPLAIN code. In that design, the hack shown above would go away, and we could be much more certain that we don't need any other similar hacks for other subsystems. The only downside is that we might experience a slightly longer delay before a requested EXPLAIN plan actually shows up, but that seems like a pretty small price to pay for being able to reason about the behavior of the system. I don't *think* there are any cases where we run in the executor for a particularly long time without a new call to ExecProcNode(). I think this case is a bit like vacuum_delay_point(). You might think that vacuum_delay_point() could be moved inside of CHECK_FOR_INTERRUPTS(), but we've made the opposite decision: every vacuum_delay_point() calls CHECK_FOR_INTERRUPTS() but not every CHECK_FOR_INTERRUPTS() calls vacuum_delay_point(). That means that we can allow vacuum_delay_point() only at cases where we know it's safe, rather than at every CHECK_FOR_INTERRUPTS(). 
I think that's a pretty smart decision, even for vacuum_delay_point(), and AFAICS the code we're proposing to run here does things that are substantially more complicated than what vacuum_delay_point() does. That code just does a bit of reading of shared memory, reports a wait event, and sleeps. That's really simple compared to code that could try to do relcache builds, which can trigger syscache lookups, which can both trigger heavyweight lock acquisition, lightweight lock acquisition, bringing pages into shared_buffers possibly through physical I/O, processing of invalidation messages, and a bunch of other stuff. It's really hard for me to accept that the heavyweight lock problem for which the patch contains a workaround is the only one that exists. I can't see any reason why that should be true. ...Robert
Re: RFC: Logging plan of the running query
On 2024-02-13 11:30, torikoshia wrote: I'll update the patch including other points such as removing the ensure-no-page-lock code. Updated the patch. Using injection point support we should be able to add tests for testing pg_log_query_plan behaviour when there are page locks held or when auto_explain (with instrumentation) and pg_log_query_plan() work on the same query plan. Use an injection point to make the backend running the query wait at a suitable point to delay its execution, and fire pg_log_query_plan() from another backend. Maybe the same test could examine the server log file to see if the plan is indeed output to the server log file. The attached patch uses injection points as below:

- There may be more points to inject, but added an injection point at ExecutorRun(), which seems to be the first interruption point where plans can be reliably displayed.
- At the injection point, it'd be possible to wait for some duration and fire pg_log_query_plan() as you suggested. However, I'm not sure what duration is appropriate considering the variety of testing environments. Instead, the attached patch calls HandleLogQueryPlanInterrupt() directly and sets InterruptPending.
- Tests both pg_log_query_plan() and auto_explain logs for their output, and checks that the logged plans are the same.

-- Regards,
-- Atsushi Torikoshi
NTT DATA Group Corporation

From 1dcac4fb4291e3b92733494624cbb090dff7aded Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Wed, 14 Feb 2024 14:41:04 +0900
Subject: [PATCH v36] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds a pg_log_query_plan() function that requests to log the plan of the specified backend process.
By default, only superusers are allowed to request to log the plans, because allowing any users to issue this request at an unbounded rate would cause lots of log messages, which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat, Alena Rybakina, Ashutosh Bapat, Jian He
Co-authored-by: James Coleman
Discussion: https://www.postgresql.org/message-id/cf8501bcd95ba4d727cbba886ba9eea8%40oss.nttdata.com
Discussion: https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru
---
 contrib/auto_explain/Makefile                 |   2 +
 contrib/auto_explain/auto_explain.c           |  23 +--
 contrib/auto_explain/t/001_auto_explain.pl    |  35
 doc/src/sgml/func.sgml                        |  50 +
 src/backend/access/transam/xact.c             |  17 ++
 src/backend/catalog/system_functions.sql      |   2 +
 src/backend/commands/explain.c                | 180 +-
 src/backend/executor/execMain.c               |  19 ++
 src/backend/storage/ipc/procsignal.c          |   4 +
 src/backend/storage/lmgr/lock.c               |   9 +-
 src/backend/tcop/postgres.c                   |   4 +
 src/backend/utils/init/globals.c              |   2 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/include/commands/explain.h                |   9 +
 src/include/miscadmin.h                       |   1 +
 src/include/storage/lock.h                    |   2 -
 src/include/storage/procsignal.h              |   1 +
 src/include/tcop/pquery.h                     |   2 +-
 src/include/utils/elog.h                      |   1 +
 .../injection_points/injection_points.c       |  11 ++
 src/test/regress/expected/misc_functions.out  |  54 +-
 src/test/regress/sql/misc_functions.sql       |  41 +++-
 22 files changed, 427 insertions(+), 48 deletions(-)

diff --git a/contrib/auto_explain/Makefile b/contrib/auto_explain/Makefile
index efd127d3ca..64fe0e0573 100644
--- a/contrib/auto_explain/Makefile
+++ b/contrib/auto_explain/Makefile
@@ -8,6 +8,8 @@
 PGFILEDESC = "auto_explain - logging facility for execution plans"
 
 TAP_TESTS = 1
+EXTRA_INSTALL = src/test/modules/injection_points
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..e041b10b0e 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -401,26 +401,9 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 			es->format = auto_explain_log_format;
 			es->settings = auto_explain_log_settings;
-			ExplainBeginOutput(es);
-			ExplainQueryText(es, queryDesc);
-			ExplainQueryParameters(es, queryDesc->params, auto_explain_log_parameter_max_length);
-			ExplainPrintPlan(es,
Re: RFC: Logging plan of the running query
On 2024-02-12 09:00, jian he wrote: Thanks for your comments. On Mon, Jan 29, 2024 at 9:02 PM torikoshia wrote: Hi, Updated the patch to fix typos and move ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction.

+
+
+
+pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+
+
+Requests to log the plan of the query currently running on the
+backend with specified process ID.
+It will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See <xref linkend="runtime-config-logging"/>
+for more information), but will not be sent to the client
+regardless of .
+
+

It would be better to explain the meaning of the return value TRUE/FALSE?

Yeah, but I've noticed that this should be located in 'Table Server Signaling Functions', not 'Table Control Data Functions'. Since 'Table Server Signaling Functions' describes the return code as below, just relocating it seems fine.

Each of these functions returns true if the signal was successfully sent and false if sending the signal failed.

+# logging plan of the running query on the specified backend
+{ oid => '8000', descr => 'log plan of the running query on the specified backend',
+  proname => 'pg_log_query_plan',
+  provolatile => 'v', prorettype => 'bool',

you can add `proargnames => '{pid}'`

Hmm, pg_log_query_plan() takes only one argument, so I'm not sure how much sense it makes. Other functions which take one argument, such as pg_cancel_backend(), do not have proargnames.

+	if (proc == NULL)
+	{
+		/*
+		 * This is just a warning so a loop-through-resultset will not abort
+		 * if one backend terminated on its own during the run.
+		 */
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL backend process", pid)));
+		PG_RETURN_BOOL(false);
+	}
+
+	be_status = pgstat_get_beentry_by_backend_id(proc->backendId);
+	if (be_status->st_backendType != B_BACKEND)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL client backend process", pid)));
+		PG_RETURN_BOOL(false);
+	}
+
+	if (SendProcSignal(pid, PROCSIG_LOG_QUERY_PLAN, proc->backendId) < 0)
+	{
+		/* Again, just a warning to allow loops */
+		ereport(WARNING,
+				(errmsg("could not send signal to process %d: %m", pid)));
+		PG_RETURN_BOOL(false);
+	}

I found out that pg_log_query_plan's comments look like pg_log_backend_memory_contexts. pg_log_backend_memory_contexts will iterate through many memory contexts, but pg_log_query_plan for one specific pid will only output one plan? So I am a little bit confused by the comments.

These "loop" comments mean the backend can run pg_log_query_plan() repeatedly even when sending signals fails. pg_signal_backend() also has such comments.

+	/*
+	 * Ensure no page lock is held on this process.
+	 *
+	 * If page lock is held at the time of the interrupt, we can't acquire any
+	 * other heavyweight lock, which might be necessary for explaining the plan
+	 * when retrieving column names.
+	 *
+	 * This may be overkill, but since page locks are held for a short duration
+	 * we check all the LocalLock entries and when finding even one, give up
+	 * logging the plan.
+	 */
+	hash_seq_init(&status, GetLockMethodLocalHash());
+	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
+	{
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)

Maybe not that self-evident: the above comments still don't explain why we need to ensure that only a PAGE lock is held on this process?

This is for preventing an assertion error, and it seems unnecessary anymore as described in [1]. I'm going to remove it.

In the commit message, can you add all the discussion links? My gmail account doesn't have a previous discussion history.

Sure.
I am not sure this (https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru) is the only discussion link? This is the original one: https://www.postgresql.org/message-id/cf8501bcd95ba4d727cbba886ba9eea8%40oss.nttdata.com

I found a bug:

src8=# select *, pg_sleep(10) from tenk1 for update;
2024-02-11 15:54:17.944 CST [48602] LOG:  query plan running on backend with PID 48602 is:
        Query Text: select *, pg_sleep(10) from tenk1 for update;
        LockRows  (cost=0.00..570.00 rows=1 width=254)
          Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4, (pg_sleep('10'::double precision)), ctid
          ->  Seq Scan on public.tenk1  (cost=0.00..470.00 rows=1 width=254)
                Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid

another session (PID) executes `SELECT pg_log_query_plan(48602);` in the meantime. pg_log_query_plan returns true successfully, but PID 48602 was
Re: RFC: Logging plan of the running query
On 2024-02-07 19:14, torikoshia wrote: On 2024-02-07 13:58, Ashutosh Bapat wrote: The prologue refers to a very populated lock hash table. I think that will happen if thousands of tables are queried in a single query OR a query runs on a partitioned table with thousands of partitions. May be we want to try that scenario. OK, I'll try such cases. I measured this using partitioned pgbench_accounts with some modification to v36[1]. The results[2] show that CPU time increases in proportion to the number of partitions, and the increase is not that large. However I've noticed that these ensuring no page lock logic would not be necessary anymore since cc32ec24fdf3b98 removed the assertion which caused an error[1]. $ git show cc32ec24fdf3b98 .. diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 0a692ee0a6..f595bce31b 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -186,18 +186,6 @@ static int FastPathLocalUseCount = 0; */ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false; - /* -* We don't acquire any other heavyweight lock while holding the page lock -* except for relation extension. -*/ - Assert(!IsPageLockHeld || - (locktag->locktag_type == LOCKTAG_RELATION_EXTEND)); I'm going to remove ensuring no page lock logic after some testings. [1] $ git diff _submission/log_running_query-v36 +#include "time.h" + bool ProcessLogQueryPlanInterruptActive = false; /* Hook for plugins to get control in ExplainOneQuery() */ @@ -5258,6 +5260,10 @@ ProcessLogQueryPlanInterrupt(void) MemoryContext old_cxt; LogQueryPlanPending = false; + clock_t start, end; + double cpu_time_used; + int num_hash_entry = 0; + /* Cannot re-enter. */ if (ProcessLogQueryPlanInterruptActive) return; @@ -5287,9 +5293,11 @@ ProcessLogQueryPlanInterrupt(void) * we check all the LocalLock entries and when finding even one, give up * logging the plan. 
 	 */
+	start = clock();
 	hash_seq_init(&status, GetLockMethodLocalHash());
 	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
 	{
+		num_hash_entry++;
 		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
 		{
 			ereport(LOG_SERVER_ONLY,
@@ -5301,6 +5309,12 @@ ProcessLogQueryPlanInterrupt(void)
 			return;
 		}
 	}
+	end = clock();
+	cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
+
+	ereport(LOG,
+			errmsg("locallock entry search took: %f for %d entries", cpu_time_used, num_hash_entry));

[2]
# partition number: 512
locallock entry search took: 0.29 for 1026 entries
locallock entry search took: 0.30 for 1026 entries
locallock entry search took: 0.36 for 1026 entries

# partition number: 1024
locallock entry search took: 0.70 for 2050 entries
locallock entry search took: 0.59 for 2050 entries
locallock entry search took: 0.49 for 2050 entries

# partition number: 2048
locallock entry search took: 0.000100 for 4098 entries
locallock entry search took: 0.000103 for 4098 entries
locallock entry search took: 0.000101 for 4098 entries

# partition number: 4096
locallock entry search took: 0.000197 for 8194 entries
locallock entry search took: 0.000193 for 8194 entries
locallock entry search took: 0.000192 for 8194 entries

[3] https://www.postgresql.org/message-id/0642712f-1298-960a-a3ba-e256d56040ac%40oss.nttdata.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
On Mon, Feb 12, 2024 at 12:42 PM Ashutosh Bapat wrote: > > On Mon, Feb 12, 2024 at 5:31 AM jian he wrote: > > > > On Wed, Feb 7, 2024 at 12:58 PM Ashutosh Bapat > > wrote: > > > > > > > > > > > > */ > > > > > How bad this performance could be. Let's assume that a query is taking > > > > > time and pg_log_query_plan() is invoked to examine the plan of this > > > > > query. Is it possible that the looping over all the locks itself takes > > > > > a lot of time delaying the query execution further? > > > > > > corner case test: > > pgbench --initialize --partition-method=range --partitions=2 > > Somehow my setup, the pg_bench didn't populate the data but there are > > 2 partitions there. > > (all my other settings are default) > > > > some interesting things happened when a query touch so many partitions like: > > select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1; > > > > in another session, if you immediate call SELECT pg_log_query_plan(9482); > > then output be > > ` > > LOG: backend with PID 9482 is not running a query or a subtransaction > > is aborted > > ` > > however if you delay a little bit of time (like 1 second), then > > LOG will emit the plan with lots of text (not sure the plan is right). > > > > I think the reason is that the `InitPlan` within > > standard_ExecutorStart takes more time to finish > > when your query touches a lot of partitions. > > That's probably expected unless we make the ActiveQueryDesc available > before ExecutorRun. so the error message should be something like: errmsg("backend with PID %d is not running a query or a subtransaction is aborted or the plan is not generated", > > How much time did it took between issuing SELECT > pg_log_query_plan(9482); and plan getting output to the server error > logs? it either says errmsg("backend with PID %d is not running a query,.) or outputs the plan immediately, if i wait one or two seconds to call pg_log_query_plan. 
Because, as previously mentioned, with lots of partitions InitPlan takes longer to finish.

Setup: 2 sessions; one runs the query (select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1;), the other one calls pg_log_query_plan.
Re: RFC: Logging plan of the running query
On Mon, Feb 12, 2024 at 5:31 AM jian he wrote: > > On Wed, Feb 7, 2024 at 12:58 PM Ashutosh Bapat > wrote: > > > > > > > > > */ > > > > How bad this performance could be. Let's assume that a query is taking > > > > time and pg_log_query_plan() is invoked to examine the plan of this > > > > query. Is it possible that the looping over all the locks itself takes > > > > a lot of time delaying the query execution further? > > > > corner case test: > pgbench --initialize --partition-method=range --partitions=2 > Somehow my setup, the pg_bench didn't populate the data but there are > 2 partitions there. > (all my other settings are default) > > some interesting things happened when a query touch so many partitions like: > select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1; > > in another session, if you immediate call SELECT pg_log_query_plan(9482); > then output be > ` > LOG: backend with PID 9482 is not running a query or a subtransaction > is aborted > ` > however if you delay a little bit of time (like 1 second), then > LOG will emit the plan with lots of text (not sure the plan is right). > > I think the reason is that the `InitPlan` within > standard_ExecutorStart takes more time to finish > when your query touches a lot of partitions. That's probably expected unless we make the ActiveQueryDesc available before ExecutorRun. How much time did it took between issuing SELECT pg_log_query_plan(9482); and plan getting output to the server error logs? How does this time compare with say the same time difference for a simple query and how much of that time can be attributed to Lock table hash scan, if the difference between time difference is huge. -- Best Wishes, Ashutosh Bapat
Re: RFC: Logging plan of the running query
On Mon, Feb 12, 2024 at 5:31 AM jian he wrote: > > I found a bug: > src8=# select *, pg_sleep(10) from tenk1 for update; > 2024-02-11 15:54:17.944 CST [48602] LOG: query plan running on > backend with PID 48602 is: > Query Text: select *, pg_sleep(10) from tenk1 for update; > LockRows (cost=0.00..570.00 rows=1 width=254) > Output: unique1, unique2, two, four, ten, twenty, hundred, > thousand, twothousand, fivethous, tenthous, odd, even, stringu1, > stringu2, string4, (pg_sleep('10'::double precision)), ctid > -> Seq Scan on public.tenk1 (cost=0.00..470.00 rows=1 > width=254) > Output: unique1, unique2, two, four, ten, twenty, > hundred, thousand, twothousand, fivethous, tenthous, odd, even, > stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid > > another session (PID) executes `SELECT pg_log_query_plan(48602);` in > the meantime. > pg_log_query_plan returns true successfully, but PID 48602 was stuck. What do you mean by PID 48602 was stuck? -- Best Wishes, Ashutosh Bapat
Re: RFC: Logging plan of the running query
On Wed, Feb 7, 2024 at 12:58 PM Ashutosh Bapat wrote:
> > > */
> > > How bad this performance could be. Let's assume that a query is taking
> > > time and pg_log_query_plan() is invoked to examine the plan of this
> > > query. Is it possible that the looping over all the locks itself takes
> > > a lot of time delaying the query execution further?

corner case test:
pgbench --initialize --partition-method=range --partitions=2
Somehow in my setup, pgbench didn't populate the data, but there are 2 partitions there. (All my other settings are default.)

some interesting things happened when a query touches so many partitions, like:
select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1;

in another session, if you immediately call SELECT pg_log_query_plan(9482); then the output is
`
LOG: backend with PID 9482 is not running a query or a subtransaction is aborted
`
however if you delay a little bit of time (like 1 second), then LOG will emit the plan with lots of text (not sure the plan is right).

I think the reason is that the `InitPlan` within standard_ExecutorStart takes more time to finish when your query touches a lot of partitions.
Re: RFC: Logging plan of the running query
On Mon, Jan 29, 2024 at 9:02 PM torikoshia wrote:
>
> Hi,
>
> Updated the patch to fix typos and move
> ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction.

+
+
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+
+
+Requests to log the plan of the query currently running on the
+backend with specified process ID.
+It will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See
+for more information), but will not be sent to the client
+regardless of .
+
+

it would be better to explain the meaning of return value TRUE/FALSE?

+# logging plan of the running query on the specified backend
+{ oid => '8000', descr => 'log plan of the running query on the specified backend',
+  proname => 'pg_log_query_plan',
+  provolatile => 'v', prorettype => 'bool',

you can add `proargnames => '{pid}'`

+	if (proc == NULL)
+	{
+		/*
+		 * This is just a warning so a loop-through-resultset will not abort
+		 * if one backend terminated on its own during the run.
+		 */
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL backend process", pid)));
+		PG_RETURN_BOOL(false);
+	}
+
+	be_status = pgstat_get_beentry_by_backend_id(proc->backendId);
+	if (be_status->st_backendType != B_BACKEND)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL client backend process", pid)));
+		PG_RETURN_BOOL(false);
+	}
+
+	if (SendProcSignal(pid, PROCSIG_LOG_QUERY_PLAN, proc->backendId) < 0)
+	{
+		/* Again, just a warning to allow loops */
+		ereport(WARNING,
+				(errmsg("could not send signal to process %d: %m", pid)));
+		PG_RETURN_BOOL(false);
+	}

I found out that pg_log_query_plan's comments look like pg_log_backend_memory_contexts. pg_log_backend_memory_contexts will iterate through many memory contexts, but pg_log_query_plan for one specific pid will only output one plan? So I am a little bit confused by the comments.

+	/*
+	 * Ensure no page lock is held on this process.
+	 *
+	 * If page lock is held at the time of the interrupt, we can't acquire any
+	 * other heavyweight lock, which might be necessary for explaining the plan
+	 * when retrieving column names.
+	 *
+	 * This may be overkill, but since page locks are held for a short duration
+	 * we check all the LocalLock entries and when finding even one, give up
+	 * logging the plan.
+	 */
+	hash_seq_init(&status, GetLockMethodLocalHash());
+	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
+	{
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)

Maybe it's not that self-evident; the above comments still don't explain why we need to ensure that no PAGE lock is held on this process?

In the commit message, can you add all the discussion links? My gmail account doesn't have the previous discussion history. I am not sure this (https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru) is the only discussion link?

I found a bug:

src8=# select *, pg_sleep(10) from tenk1 for update;
2024-02-11 15:54:17.944 CST [48602] LOG:  query plan running on backend with PID 48602 is:
	Query Text: select *, pg_sleep(10) from tenk1 for update;
	LockRows  (cost=0.00..570.00 rows=1 width=254)
	  Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4, (pg_sleep('10'::double precision)), ctid
	  ->  Seq Scan on public.tenk1  (cost=0.00..470.00 rows=1 width=254)
	        Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid

another session (PID) executes `SELECT pg_log_query_plan(48602);` in the meantime. pg_log_query_plan returns true successfully, but PID 48602 was stuck.
I have a problem using git apply:

error: patch failed: src/include/commands/explain.h:94
error: src/include/commands/explain.h: patch does not apply

`patch -p1 < /v35-0001-Add-function-to-log-the-plan-of-the-query.patch` works
Re: RFC: Logging plan of the running query
On 2024-02-07 13:58, Ashutosh Bapat wrote:

On Wed, Feb 7, 2024 at 9:38 AM torikoshia wrote:

Hi Ashutosh,

On 2024-02-06 19:51, Ashutosh Bapat wrote:
> Thanks for the summary. It is helpful. I think patch is also getting
> better.
>
> I have a few questions and suggestions

Thanks for your comments.

> 1. Prologue of GetLockMethodLocalHash() mentions
> * NOTE: When there are many entries in LockMethodLocalHash, calling this
> * function and looking into all of them can lead to performance problems.
> */
> How bad this performance could be. Let's assume that a query is taking
> time and pg_log_query_plan() is invoked to examine the plan of this
> query. Is it possible that the looping over all the locks itself takes
> a lot of time delaying the query execution further?

I think it depends on the number of local locks, but I've measured cpu time for this page lock check by adding below codes and v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which calls ProcessLogQueryPlanInterrupt() in every CFI, on my laptop just for your information:

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5f7d77d567..65b7cb4925 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -44,6 +44,8 @@

+#include "time.h"
...
@@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
 	 * we check all the LocalLock entries and when finding even one, give up
 	 * logging the plan.
 	 */
+	start = clock();
 	hash_seq_init(&status, GetLockMethodLocalHash());
 	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
 	{
 		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
 		{
 			ereport(LOG_SERVER_ONLY,
 					errmsg("ignored request for logging query plan due to page lock conflicts"),
 					errdetail("You can try again in a moment."));
 			hash_seq_term(&status);

 			ProcessLogQueryPlanInterruptActive = false;
 			return;
 		}
 	}
+	end = clock();
+	cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
+
+	ereport(LOG,
+			errmsg("all locallock entry search took: %f", cpu_time_used));
+

There were about 3 million log lines which recorded the cpu time, and the duration was quite short:

=# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
=# select max(d), min(d), avg(d) from cpu_time ;
    max    | min |          avg
-----------+-----+-----------------------
 0.000116  |   0 | 4.706274625332238e-07

I'm not certain that this is valid for actual use cases, but these results seem to suggest that it will not take that long.

What load did you run? I don't think any query in make check would take say thousands of locks.

Sorry, I forgot to write it but ran make check as you imagined.

The prologue refers to a very populated lock hash table. I think that will happen if thousands of tables are queried in a single query OR a query runs on a partitioned table with thousands of partitions. May be we want to try that scenario.

OK, I'll try such cases.

> 2. What happens if auto_explain is enabled in the backend and
> pg_log_query_plan() is called on the same backend? Will they conflict?
> I think we should add a test for the same.

Hmm, I think they don't conflict since they just refer QueryDesc and don't modify it and don't use same objects for locking. (I imagine 'conflict' here is something like 'hard conflict' in replication[2].)

By conflict, I mean the two features behave weird when used together e.g give wrong results or crash etc.
Actually using both auto_explain and pg_log_query_plan() output each logs separately: (pid:62835)=# select pg_sleep(10); (pid:7)=# select pg_log_query_plan(62835); (pid:7)=# \! cat data/log/postgres.log ... 2024-02-06 21:44:17.837 JST [62835:4:0] LOG: 0: query plan running on backend with PID 62835 is: Query Text: select pg_sleep(10); Result (cost=0.00..0.01 rows=1 width=4) Output: pg_sleep('10'::double precision) Query Identifier: 3506829283127886044 2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION: ProcessLogQueryPlanInterrupt, explain.c:5336 2024-02-06 21:44:26.974 JST [62835:6:0] LOG: 0: duration: 1.868 ms plan: Query Text: select pg_sleep(10); Result (cost=0.00..0.01 rows=1 width=4) (actual time=1.802..1.804 rows=1 loops=1) > Using injection point support we should be able to add tests for > testing pg_log_query_plan behaviour when there are page locks held or > when auto_explain (with instrumentation) and pg_log_query_plan() work > on the same query plan. Use injection point to make the backend > running query wait at a suitable point to delay its execution and fire > pg_log_query_plan() from other
Re: RFC: Logging plan of the running query
On Wed, Feb 7, 2024 at 9:38 AM torikoshia wrote: > > Hi Ashutosh, > > On 2024-02-06 19:51, Ashutosh Bapat wrote: > > > Thanks for the summary. It is helpful. I think patch is also getting > > better. > > > > I have a few questions and suggestions > > Thanks for your comments. > > > 1. Prologue of GetLockMethodLocalHash() mentions > > * NOTE: When there are many entries in LockMethodLocalHash, calling > > this > > * function and looking into all of them can lead to performance > > problems. > > */ > > How bad this performance could be. Let's assume that a query is taking > > time and pg_log_query_plan() is invoked to examine the plan of this > > query. Is it possible that the looping over all the locks itself takes > > a lot of time delaying the query execution further? > > I think it depends on the number of local locks, but I've measured cpu > time for this page lock check by adding below codes and > v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which > calls ProcessLogQueryPlanInterrupt() in every CFI on my laptop just for > your information: > >diff --git a/src/backend/commands/explain.c > b/src/backend/commands/explain.c >index 5f7d77d567..65b7cb4925 100644 >--- a/src/backend/commands/explain.c >+++ b/src/backend/commands/explain.c >@@ -44,6 +44,8 @@ > >+#include "time.h" >... >@@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void) > * we check all the LocalLock entries and when finding even > one, give up > * logging the plan. 
>  	 */
> +	start = clock();
>  	hash_seq_init(&status, GetLockMethodLocalHash());
>  	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
>  	{
>  		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
>  		{
>  			ereport(LOG_SERVER_ONLY,
>  					errmsg("ignored request for logging query plan due to page lock conflicts"),
>  					errdetail("You can try again in a moment."));
>  			hash_seq_term(&status);
>
>  			ProcessLogQueryPlanInterruptActive = false;
>  			return;
>  		}
>  	}
> +	end = clock();
> +	cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
> +
> +	ereport(LOG,
> +			errmsg("all locallock entry search took: %f", cpu_time_used));
> +
>
> There were about 3 million log lines which recorded the cpu time, and
> the duration was quite short:
>
> =# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
> =# select max(d), min(d), avg(d) from cpu_time ;
>     max    | min |          avg
> -----------+-----+-----------------------
>  0.000116  |   0 | 4.706274625332238e-07
>
> I'm not certain that this is valid for actual use cases, but these
> results seem to suggest that it will not take that long.

What load did you run? I don't think any query in make check would take say thousands of locks.

The prologue refers to a very populated lock hash table. I think that will happen if thousands of tables are queried in a single query OR a query runs on a partitioned table with thousands of partitions. May be we want to try that scenario.

> > 2. What happens if auto_explain is enabled in the backend and
> > pg_log_query_plan() is called on the same backend? Will they conflict?
> > I think we should add a test for the same.
>
> Hmm, I think they don't conflict since they just refer QueryDesc and
> don't modify it and don't use same objects for locking.
> (I imagine 'conflict' here is something like 'hard conflict' in
> replication[2].)

By conflict, I mean the two features behave weird when used together e.g give wrong results or crash etc.
> > Actually using both auto_explain and pg_log_query_plan() output each > logs separately: > >(pid:62835)=# select pg_sleep(10); >(pid:7)=# select pg_log_query_plan(62835); > >(pid:7)=# \! cat data/log/postgres.log >... >2024-02-06 21:44:17.837 JST [62835:4:0] LOG: 0: query plan > running on backend with PID 62835 is: > Query Text: select pg_sleep(10); > Result (cost=0.00..0.01 rows=1 width=4) >Output: pg_sleep('10'::double precision) > Query Identifier: 3506829283127886044 >2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION: > ProcessLogQueryPlanInterrupt, explain.c:5336 >2024-02-06 21:44:26.974 JST [62835:6:0] LOG: 0: duration: > 1.868 ms plan: > Query Text: select pg_sleep(10); > Result (cost=0.00..0.01 rows=1 width=4) (actual > time=1.802..1.804 rows=1 loops=1) > > > Using injection point support we should be able to add tests for > > testing pg_log_query_plan behaviour when there are page locks held or > > when auto_explain (with instrumentation) and pg_log_query_plan() work > > on the same query plan. Use injection point to make the backend > > running query wait at a suitable
Re: RFC: Logging plan of the running query
Hi Ashutosh,

On 2024-02-06 19:51, Ashutosh Bapat wrote:

Thanks for the summary. It is helpful. I think patch is also getting better.

I have a few questions and suggestions

Thanks for your comments.

1. Prologue of GetLockMethodLocalHash() mentions
 * NOTE: When there are many entries in LockMethodLocalHash, calling this
 * function and looking into all of them can lead to performance problems.
 */
How bad this performance could be. Let's assume that a query is taking time and pg_log_query_plan() is invoked to examine the plan of this query. Is it possible that the looping over all the locks itself takes a lot of time delaying the query execution further?

I think it depends on the number of local locks, but I've measured cpu time for this page lock check by adding below codes and v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which calls ProcessLogQueryPlanInterrupt() in every CFI, on my laptop just for your information:

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5f7d77d567..65b7cb4925 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -44,6 +44,8 @@

+#include "time.h"
...
@@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
 	 * we check all the LocalLock entries and when finding even one, give up
 	 * logging the plan.
 	 */
+	start = clock();
 	hash_seq_init(&status, GetLockMethodLocalHash());
 	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
 	{
 		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
 		{
 			ereport(LOG_SERVER_ONLY,
 					errmsg("ignored request for logging query plan due to page lock conflicts"),
 					errdetail("You can try again in a moment."));
 			hash_seq_term(&status);

 			ProcessLogQueryPlanInterruptActive = false;
 			return;
 		}
 	}
+	end = clock();
+	cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
+
+	ereport(LOG,
+			errmsg("all locallock entry search took: %f", cpu_time_used));
+

There were about 3 million log lines which recorded the cpu time, and the duration was quite short:

=# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
=# select max(d), min(d), avg(d) from cpu_time ;
    max    | min |          avg
-----------+-----+-----------------------
 0.000116  |   0 | 4.706274625332238e-07

I'm not certain that this is valid for actual use cases, but these results seem to suggest that it will not take that long.

2. What happens if auto_explain is enabled in the backend and pg_log_query_plan() is called on the same backend? Will they conflict? I think we should add a test for the same.

Hmm, I think they don't conflict since they just refer QueryDesc and don't modify it and don't use same objects for locking. (I imagine 'conflict' here is something like 'hard conflict' in replication[2].)

Actually using both auto_explain and pg_log_query_plan() output each logs separately:

(pid:62835)=# select pg_sleep(10);
(pid:7)=# select pg_log_query_plan(62835);

(pid:7)=# \! cat data/log/postgres.log
...
2024-02-06 21:44:17.837 JST [62835:4:0] LOG: 0: query plan running on backend with PID 62835 is: Query Text: select pg_sleep(10); Result (cost=0.00..0.01 rows=1 width=4) Output: pg_sleep('10'::double precision) Query Identifier: 3506829283127886044 2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION: ProcessLogQueryPlanInterrupt, explain.c:5336 2024-02-06 21:44:26.974 JST [62835:6:0] LOG: 0: duration: 1.868 ms plan: Query Text: select pg_sleep(10); Result (cost=0.00..0.01 rows=1 width=4) (actual time=1.802..1.804 rows=1 loops=1) Using injection point support we should be able to add tests for testing pg_log_query_plan behaviour when there are page locks held or when auto_explain (with instrumentation) and pg_log_query_plan() work on the same query plan. Use injection point to make the backend running query wait at a suitable point to delay its execution and fire pg_log_query_plan() from other backend. May be the same test could examine the server log file to see if the plan is indeed output to the server log file. Given that the feature will be used when the things have already gone wrong, it should not make things more serious. So more testing and especially automated would help. Thanks for the advice, it seems a good idea. I'm going to try to add tests using injection point. [1] https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com [2] https://www.postgresql.org/docs/devel/hot-standby.html#HOT-STANDBY-CONFLICT -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
Hi Atsushi, On Mon, Jan 29, 2024 at 6:32 PM torikoshia wrote: > > Hi, > > Updated the patch to fix typos and move > ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction. > > > BTW since the thread is getting long, I list the some points of the > discussion so far: > > # Safety concern > ## Catalog access inside CFI > - it seems safe if the CFI call is inside an existing valid > transaction/query state[1] > > - We did some tests, for example calling ProcessLogQueryPlanInterrupt() > in every single CHECK_FOR_INTERRUPTS()[2]. This test passed on my env > but was stucked on James's env, so I modified to exit > ProcessLogQueryPlanInterrupt() when target process is inside of lock > acquisition code[3] > > ## Risk of calling EXPLAIN code in CFI > - EXPLAIN is not a simple logic code, and there exists risk calling it > from CFI. For example, if there is a bug, we may find ourselves in a > situation where we can't cancel the query > > - it's a trade-off that's worth making for the introspection benefits > this patch would provide?[4] > > # Design > - Although some suggested it should be in auto_explain, current patch > introduces this feature to the core[5] > > - When the target query is nested, only the most inner query's plan is > explained. In future, all the nested queries' plans are expected to > explained optionally like auto_explain.log_nested_statements[6] > > - When the target process is a parallel worker, the plan is not shown[6] > > - When the target query is nested and its subtransaction is aborted, > pg_log_query_plan cannot log the parental query plan after the abort > even parental query is running[7] > > - The output corresponds to EXPLAIN with VERBOSE, COST, SETTINGS and > FORMAT text. It doesn't do ANALYZE or show the progress of the query > execution. 
> Future work proposed by Rafael Thofehrn Castro may realize this[8]
>
> - To prevent assertion error, this patch ensures no page lock is held by
>   checking all the LocalLock entries before running explain code, but
>   there is a discussion that ginInsertCleanup() should be modified[9]
>
> It may be not so difficult to improve some of restrictions in "Design",
> but I'd like to limit the scope of the 1st patch to make it simpler.

Thanks for the summary. It is helpful. I think the patch is also getting better.

I have a few questions and suggestions

1. Prologue of GetLockMethodLocalHash() mentions
 * NOTE: When there are many entries in LockMethodLocalHash, calling this
 * function and looking into all of them can lead to performance problems.
 */
How bad this performance could be. Let's assume that a query is taking time and pg_log_query_plan() is invoked to examine the plan of this query. Is it possible that the looping over all the locks itself takes a lot of time delaying the query execution further?

2. What happens if auto_explain is enabled in the backend and pg_log_query_plan() is called on the same backend? Will they conflict? I think we should add a test for the same.

Using injection point support we should be able to add tests for testing pg_log_query_plan behaviour when there are page locks held or when auto_explain (with instrumentation) and pg_log_query_plan() work on the same query plan. Use an injection point to make the backend running the query wait at a suitable point to delay its execution, and fire pg_log_query_plan() from another backend. May be the same test could examine the server log file to see if the plan is indeed output to the server log file.

Given that the feature will be used when things have already gone wrong, it should not make things more serious. So more testing, and especially automated testing, would help.

--
Best Wishes,
Ashutosh Bapat
Re: RFC: Logging plan of the running query
Hi,

Updated the patch to fix typos and move ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction.

BTW, since the thread is getting long, I list some points of the discussion so far:

# Safety concern

## Catalog access inside CFI
- it seems safe if the CFI call is inside an existing valid transaction/query state[1]
- We did some tests, for example calling ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS()[2]. This test passed on my env but got stuck on James's env, so I modified it to exit ProcessLogQueryPlanInterrupt() when the target process is inside lock acquisition code[3]

## Risk of calling EXPLAIN code in CFI
- EXPLAIN is not simple logic code, and there is a risk in calling it from CFI. For example, if there is a bug, we may find ourselves in a situation where we can't cancel the query
- it's a trade-off that's worth making for the introspection benefits this patch would provide?[4]

# Design
- Although some suggested it should be in auto_explain, the current patch introduces this feature into the core[5]
- When the target query is nested, only the innermost query's plan is explained. In the future, all the nested queries' plans are expected to be explained optionally, like auto_explain.log_nested_statements[6]
- When the target process is a parallel worker, the plan is not shown[6]
- When the target query is nested and its subtransaction is aborted, pg_log_query_plan cannot log the parental query plan after the abort even though the parental query is running[7]
- The output corresponds to EXPLAIN with VERBOSE, COSTS, SETTINGS and FORMAT text. It doesn't do ANALYZE or show the progress of the query execution.
Future work proposed by Rafael Thofehrn Castro may realize this[8]
- To prevent an assertion error, this patch ensures no page lock is held by checking all the LocalLock entries before running explain code, but there is a discussion that ginInsertCleanup() should be modified[9]

It may not be so difficult to improve some of the restrictions in "Design", but I'd like to limit the scope of the 1st patch to make it simpler.

[1] https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/0e0e7ca08dff077a625c27a5e0c2ef0a%40oss.nttdata.com
[4] https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[5] https://www.postgresql.org/message-id/CAAaqYe_1EuoTudAz8mr8-qtN5SoNtYRm4JM2J9CqeverpE3B2A%40mail.gmail.com
[6] https://www.postgresql.org/message-id/CAExHW5sh4ahrJgmMAGfptWVmESt1JLKCNm148XVxTunRr%2B-6gA%40mail.gmail.com
[7] https://www.postgresql.org/message-id/3d121ed5f81cef588bac836b43f5d1f9%40oss.nttdata.com
[8] https://www.postgresql.org/message-id/c161b5e7e1888eb9c9eb182a7d9dcf89%40oss.nttdata.com
[9] https://www.postgresql.org/message-id/20220201.172757.1480996662235658750.horikyota.ntt%40gmail.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 65786ad6c2a9b656c3fd36a45118a39a66da0236 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Mon, 29 Jan 2024 21:40:04 +0900
Subject: [PATCH v35] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds a pg_log_query_plan() function that requests to log the plan of the specified backend process.
By default, only superusers are allowed to request logging the plans, because allowing any user to issue this request at an unbounded rate could produce a large volume of log messages, which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat, Alena Rybakina, Ashutosh Bapat
Co-authored-by: James Coleman
---
 contrib/auto_explain/auto_explain.c      |  23 +-
 doc/src/sgml/func.sgml                   |  50 +
 src/backend/access/transam/xact.c        |  17 ++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c           | 208 ++-
 src/backend/executor/execMain.c          |  14 ++
 src/backend/storage/ipc/procsignal.c     |   4 +
 src/backend/storage/lmgr/lock.c          |   9 +-
 src/backend/tcop/postgres.c              |   4 +
 src/backend/utils/init/globals.c         |   2 +
 src/include/catalog/pg_proc.dat          |   6 +
 src/include/commands/explain.h           |   9 +
Re: RFC: Logging plan of the running query
On 2023-12-07 08:33, Rafael Thofehrn Castro wrote:

Hello hackers,

Last Saturday I submitted a patch to the pgsql-hackers list with the title "Proposal: In-flight explain logging" with a patch proposing a feature very similar to the one being worked on in this thread. I should have done a better search in the commitfest before implementing something from scratch. So, as recommended by Ashutosh, I am sending an incremental patch containing an additional feature I personally think we should include: logging the plan with instrumentation details if enabled.

Thanks for the proposal and for making the patch!

When targeting a query with instrumentation PG should log the complete EXPLAIN ANALYZE plan with current row count and, if enabled, timing for each node. This gives the user not only the ability to see what the plan is but also what was executed so far, which is super useful when troubleshooting queries that never finish.

I think showing the progress of the query execution would be useful. OTOH it seems to need at least some modifications around Instrumentation, as in your patch. As a first step, I think it would be better to minimize the scope and focus on the fundamental function. For the same reason, getting queries for parallel workers is also prohibited in the current patch, as discussed here[1].

[1] https://www.postgresql.org/message-id/c25ae6015be96a1964eddd964657660b%40oss.nttdata.com

So I think the steps below would be better than pushing all the functionality into the 1st commit:

- First, develop the function to log the plan of the query (v34-0001-Add-function-to-log-the-plan-of-the-query.patch).
- Then enhance the function, e.g. showing the progress of the query execution (v34-0002-Log-plan-along-with-instrumentation-details.patch), etc.

https://www.postgresql.org/message-id/CAG0ozMp3g3drnkDa6RZxXO_OmnisL2sD9vBrmpu5fOBoYpC-3w%40mail.gmail.com

- ExplainState customization

An ExplainState is allocated and customized for the in-flight logging.
Instrumentation-related settings are enabled based on how the target query started, which is usually via EXPLAIN ANALYZE or with auto_explain.

Does this mean the progress can be obtained only when the target query was run with EXPLAIN ANALYZE or auto_explain.log_analyze?

If so, there might be limited situations in which we can get the progress, since I imagine EXPLAIN ANALYZE is used when a user wants to see the plan from the beginning, and auto_explain.log_analyze can have a negative impact on performance, as described in the manual, so there may not be many environments which enable it.

-- Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
Hello hackers,

Last Saturday I submitted a patch to the pgsql-hackers list with the title "Proposal: In-flight explain logging" with a patch proposing a feature very similar to the one being worked on in this thread. I should have done a better search in the commitfest before implementing something from scratch. So, as recommended by Ashutosh, I am sending an incremental patch containing an additional feature I personally think we should include: logging the plan with instrumentation details if enabled.

When targeting a query with instrumentation PG should log the complete EXPLAIN ANALYZE plan with the current row count and, if enabled, timing for each node. This gives the user not only the ability to see what the plan is but also what was executed so far, which is super useful when troubleshooting queries that never finish.

Considering that the query is in progress, the output will include the label (never executed) for nodes that weren't touched yet (or may never be). This feature is already present in the current ExplainNode implementation. I added a new label (in progress) for nodes currently being executed, i.e., InstrStartNode was called and the clock is still ticking there.

Back-reading this thread, I saw the discussion about whether to extend pg_log_query_plan to parallel workers. I added it in my incremental patch, as the new capability of logging instrumentation makes it useful to be able to see what parallel workers are currently doing.
# DEMONSTRATION:

postgres=# select pid, backend_type, pg_log_query_plan(pid)
postgres=# from pg_stat_activity
postgres=# where (backend_type = 'client backend' and pid != pg_backend_pid())
postgres=#    or backend_type = 'parallel worker';
  pid  |  backend_type   | pg_log_query_plan
-------+-----------------+-------------------
 33833 | client backend  | t
 47202 | parallel worker | t
 47203 | parallel worker | t
(3 rows)

2023-12-06 23:14:41.756 UTC [33833] LOG:  query plan running on backend with PID 33833 is:
    Query Text: explain (analyze, buffers) select * from t2 a inner join t1 b on a.c1=b.c1 inner join t1 c on a.c1=c.c1 inner join t1 d on a.c1=d.c1 inner join t1 e on a.c1=e.c1;
    Gather  (cost=70894.63..202643.27 rows=100 width=20) (never executed)
      Output: a.c1, b.c1, c.c1, d.c1, e.c1
      Workers Planned: 2
      Workers Launched: 2
      ->  Parallel Hash Join  (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
            Output: a.c1, b.c1, c.c1, d.c1, e.c1
            Hash Cond: (a.c1 = e.c1)
            ->  Parallel Hash Join  (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
                  Output: a.c1, b.c1, c.c1, d.c1
                  Hash Cond: (a.c1 = c.c1)
                  ->  Parallel Hash Join  (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2500.914..2625.922 rows=250412 loops=1) (in progress)
                        Output: a.c1, b.c1
                        Inner Unique: true
                        Hash Cond: (b.c1 = a.c1)
                        Buffers: shared hit=755 read=2175, temp read=1473 written=1860
                        ->  Parallel Seq Scan on public.t1 b  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..20.904 rows=331492 loops=1)
                              Output: b.c1
                              Buffers: shared hit=602 read=865
                        ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1745.107..1745.107 rows=330638 loops=1)
                              Output: a.c1
                              Buffers: shared hit=153 read=1310, temp written=868
                              ->  Parallel Seq Scan on public.t2 a  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.042..27.695 rows=330638 loops=1)
                                    Output: a.c1
                                    Buffers: shared hit=153 read=1310
                  ->  Parallel Hash  (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2450.489..2450.489 rows=407941 loops=1)
                        Output: c.c1, d.c1
                        Buffers: shared hit=1141 read=1833, temp read=1938 written=2836
                        ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1323.422..1575.245 rows=407941 loops=1)
                              Output: c.c1, d.c1
                              Hash Cond: (c.c1 = d.c1)
                              Buffers: shared hit=1141 read=1833, temp read=1938 written=1864
                              ->  Parallel Seq Scan on public.t1 c  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.026..22.223 rows=336238 loops=1)
                                    Output: c.c1
                                    Buffers: shared hit=590 read=898
                              ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=653.306..653.306 rows=335836 loops=1)
                                    Output: d.c1
                                    Buffers: shared hit=551 read=935, temp written=872
                                    ->  Parallel Seq Scan on public.t1 d  (cost=0.00..8591.67 rows=416667 width=4) (actual
Re: RFC: Logging plan of the running query
On Thu, Nov 9, 2023 at 4:56 PM torikoshia wrote: > > On 2023-11-09 16:11, Ashutosh Bapat wrote: > > On Thu, Nov 9, 2023 at 12:03 PM torikoshia > > wrote: > >> >> > >> >> 1. When a backend is running nested queries, we will see the plan of > >> >> the innermost query. That query may not be the actual culprit if the > >> >> user query is running slowly. E.g a query being run as part of inner > >> >> side nested loop when the nested loop itself is the bottleneck. I > >> >> think it will be useful to print plans of all the whole query stack. > >> > >> This was discussed in previous threads[1] and we thought it'd be > >> useful > >> but since it needed some extra works, we stopped widening the scope. > >> > >> > > >> > I think we can start with what auto_explain is doing. Always print the > >> > plan of the outermost query; the query found in pg_stat_activity. In a > >> > later version we might find a way to print plans of all the queries in > >> > the stack and do so in a readable manner. > >> > >> Agreed there are cases printing plan of the outermost query is more > >> useful. > >> > > > > I am fine printing the plan of the outermost query. This will help > > many cases. Printing plans of the whole query stack can be added as an > > add on later. > > > >> > > >> > This makes tracking activeQueryDesc a bit tricky. My guess is that the > >> > outermost query's descriptor will be available through ActivePortal > >> > most of the time. But there are cases when ExecutorRun is called by > >> > passing a queryDesc directly. So may be there are some cases where > >> > that's not true. > >> > >> Yeah, actually the original version of the patch got the plan from > >> ActivePortal, but it failed logging plan when the query was something > >> like this[2]: > >> > >> DO $$ > >> BEGIN > >> PERFORM pg_sleep(100); > >> END$$; > > > > References [1] and [2] are not listed in your email. > > Oops, sorry. 
Here are links: > > [1] > https://www.postgresql.org/message-id/64f716c44629e303b66e6c24502147cc%40oss.nttdata.com > [2] > https://www.postgresql.org/message-id/632e99eb-8090-53e6-1b1a-101601904cbd%40oss.nttdata.com > > > Is that because there was no ActivePortal created or the ActivePortal > > pointed to DO block instead of PERFORM pg_sleep? > > ActivePortal is created but ActivePortal->queryDesc is null. Thanks. > > >> > 2. When a query is running in parallel worker do we want to print that > >> > query? It may or may not be interesting given the situation. If the > >> > overall plan itself is faulty, output of the parallel worker query is > >> > not useful. If the plan is fine but a given worker's leg is running > >> > slowly it may be interesting. > >> > >> I think it can be useful. > >> I'm wondering if we can add this after the first patch for this > >> feature > >> is committed. > > > > With the current patches, it will print the query from a parallel > > backend. If that's not desirable we should prohibit that case at > > least. > > Current patch prohibits printing plan if backend type is parallel worker > as below: > >=# select pg_log_query_plan(pid), backend_type from pg_stat_activity > where backend_type = 'parallel worker'; > > pg_log_query_plan | backend_type >---+- > f | parallel worker > f | parallel worker >(2 rows) > >WARNING: PID 4618 is not a PostgreSQL client backend process >WARNING: PID 4617 is not a PostgreSQL client backend process > > Is this the behavior you expect? > I misread then. Thanks for correcting me. We could consider plans from parallel workers in v2 of this feature. -- Best Wishes, Ashutosh Bapat
Re: RFC: Logging plan of the running query
On 2023-11-09 16:11, Ashutosh Bapat wrote:
On Thu, Nov 9, 2023 at 12:03 PM torikoshia wrote:

>>>> 1. When a backend is running nested queries, we will see the plan of
>>>> the innermost query. That query may not be the actual culprit if the
>>>> user query is running slowly. E.g a query being run as part of the inner
>>>> side of a nested loop when the nested loop itself is the bottleneck. I
>>>> think it will be useful to print plans of the whole query stack.
>>
>> This was discussed in previous threads[1] and we thought it'd be useful,
>> but since it needed some extra work, we stopped widening the scope.
>>
>>> I think we can start with what auto_explain is doing. Always print the
>>> plan of the outermost query; the query found in pg_stat_activity. In a
>>> later version we might find a way to print plans of all the queries in
>>> the stack and do so in a readable manner.
>>
>> Agreed that there are cases where printing the plan of the outermost query
>> is more useful.
>
> I am fine printing the plan of the outermost query. This will help
> many cases. Printing plans of the whole query stack can be added as an
> add-on later.
>
>>> This makes tracking activeQueryDesc a bit tricky. My guess is that the
>>> outermost query's descriptor will be available through ActivePortal
>>> most of the time. But there are cases when ExecutorRun is called by
>>> passing a queryDesc directly. So maybe there are some cases where
>>> that's not true.
>>
>> Yeah, actually the original version of the patch got the plan from
>> ActivePortal, but it failed to log the plan when the query was something
>> like this[2]:
>>
>> DO $$
>> BEGIN
>>   PERFORM pg_sleep(100);
>> END$$;
>
> References [1] and [2] are not listed in your email.

Oops, sorry.
ActivePortal is created but ActivePortal->queryDesc is null.

> 2. When a query is running in parallel worker do we want to print that
> query? It may or may not be interesting given the situation. If the
> overall plan itself is faulty, output of the parallel worker query is
> not useful. If the plan is fine but a given worker's leg is running
> slowly it may be interesting.

I think it can be useful. I'm wondering if we can add this after the first patch for this feature is committed.

> With the current patches, it will print the query from a parallel
> backend. If that's not desirable we should prohibit that case at
> least.

The current patch prohibits printing the plan if the backend type is parallel worker, as below:

  =# select pg_log_query_plan(pid), backend_type from pg_stat_activity
     where backend_type = 'parallel worker';

   pg_log_query_plan |  backend_type
  -------------------+-----------------
   f                 | parallel worker
   f                 | parallel worker
  (2 rows)

  WARNING:  PID 4618 is not a PostgreSQL client backend process
  WARNING:  PID 4617 is not a PostgreSQL client backend process

Is this the behavior you expect?

-- Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
On Thu, Nov 9, 2023 at 12:03 PM torikoshia wrote: > >> > >> 1. When a backend is running nested queries, we will see the plan of > >> the innermost query. That query may not be the actual culprit if the > >> user query is running slowly. E.g a query being run as part of inner > >> side nested loop when the nested loop itself is the bottleneck. I > >> think it will be useful to print plans of all the whole query stack. > > This was discussed in previous threads[1] and we thought it'd be useful > but since it needed some extra works, we stopped widening the scope. > > > > > I think we can start with what auto_explain is doing. Always print the > > plan of the outermost query; the query found in pg_stat_activity. In a > > later version we might find a way to print plans of all the queries in > > the stack and do so in a readable manner. > > Agreed there are cases printing plan of the outermost query is more > useful. > I am fine printing the plan of the outermost query. This will help many cases. Printing plans of the whole query stack can be added as an add on later. > > > > This makes tracking activeQueryDesc a bit tricky. My guess is that the > > outermost query's descriptor will be available through ActivePortal > > most of the time. But there are cases when ExecutorRun is called by > > passing a queryDesc directly. So may be there are some cases where > > that's not true. > > Yeah, actually the original version of the patch got the plan from > ActivePortal, but it failed logging plan when the query was something > like this[2]: > > DO $$ > BEGIN > PERFORM pg_sleep(100); > END$$; References [1] and [2] are not listed in your email. Is that because there was no ActivePortal created or the ActivePortal pointed to DO block instead of PERFORM pg_sleep? > > > 2. When a query is running in parallel worker do we want to print that > > query? It may or may not be interesting given the situation. 
If the > > overall plan itself is faulty, output of the parallel worker query is > > not useful. If the plan is fine but a given worker's leg is running > > slowly it may be interesting. > > I think it can be useful. > I'm wondering if we can add this after the first patch for this feature > is committed. With the current patches, it will print the query from a parallel backend. If that's not desirable we should prohibit that case at least. -- Best Wishes, Ashutosh Bapat
Re: RFC: Logging plan of the running query
On 2023-11-06 15:32, Ashutosh Bapat wrote:

Thanks for the suggestion and example.

On Fri, Nov 3, 2023 at 7:31 PM Ashutosh Bapat wrote:

I have the following questions related to the functionality. (Please point me to the relevant discussion if this has already been discussed.)

1. When a backend is running nested queries, we will see the plan of the innermost query. That query may not be the actual culprit if the user query is running slowly. E.g a query being run as part of the inner side of a nested loop when the nested loop itself is the bottleneck. I think it will be useful to print plans of the whole query stack.

This was discussed in previous threads[1] and we thought it'd be useful, but since it needed some extra work, we stopped widening the scope.

To further explain this point, consider the following scenario:

-- A completely useless function which executes two SQL statements that each take a small amount of time individually
#create function multstmt() returns int language sql as $$
select count(*) from pg_class where reltype = 12345;
select count(*) from pg_type limit 10;
$$;

-- force a suboptimal plan
#set enable_hashjoin to false;
#set enable_mergejoin to false;

-- A completely useless but long-running query
#select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid) = multstmt(t.oid);

This takes a few minutes on my laptop. In a separate session, query pg_stat_activity.
We will see the original query:

#select pid, query, backend_type from pg_stat_activity where pid = 349330;
  pid   |                                          query                                          |  backend_type
--------+-----------------------------------------------------------------------------------------+----------------
 349330 | select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid) = multstmt(t.oid); | client backend
(1 row)

Run the plan output function a few times:

#select pg_log_query_plan(349330);

You will observe different plans based on which innermost query is running:

LOG:  query plan running on backend with PID 349330 is:
    Query Text: select count(*) from pg_class where reltype = typeid; select count(*) from pg_type where oid = typeid;
    Query Parameters: $1 = '600'
    Aggregate  (cost=18.18..18.19 rows=1 width=8)
      Output: count(*)
      ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.18 rows=2 width=0)
            Output: oid, relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, reltoastrelid, relhasindex, relisshared, relpersistence, relkind, relnatts, relchecks, relhasrules, relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity, relispopulated, relreplident, relispartition, relrewrite, relfrozenxid, relminmxid, relacl, reloptions, relpartbound
            Filter: (pg_class.reltype = $1)
    Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'

2023-11-06 11:52:25.610 IST [349330] LOG:  query plan running on backend with PID 349330 is:
    Query Text: select count(*) from pg_class where reltype = typeid; select count(*) from pg_type where oid = typeid;
    Query Parameters: $1 = '2203'
    Aggregate  (cost=4.30..4.31 rows=1 width=4)
      Output: count(*)
      ->  Index Only Scan using pg_type_oid_index on pg_catalog.pg_type  (cost=0.28..4.29 rows=1 width=0)
            Output: oid
            Index Cond: (pg_type.oid = $1)
    Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'

Individual pieces are confusing here since the query run by the backend is not the one being EXPLAINed. A user may not know that the queries being EXPLAINed arise from the function call multstmt().
So they won't be able to stitch the pieces together unless they see the plan of the outermost query with loops and costs. What might help is explaining each query in the hierarchy.

I think we can start with what auto_explain is doing. Always print the plan of the outermost query; the query found in pg_stat_activity. In a later version we might find a way to print plans of all the queries in the stack and do so in a readable manner.

Agreed that there are cases where printing the plan of the outermost query is more useful.

This makes tracking activeQueryDesc a bit tricky. My guess is that the outermost query's descriptor will be available through ActivePortal most of the time. But there are cases when ExecutorRun is called by passing a queryDesc directly. So maybe there are some cases where that's not true.

Yeah, actually the original version of the patch got the plan from ActivePortal, but it failed to log the plan when the query was something like this[2]:

DO $$
BEGIN
  PERFORM pg_sleep(100);
END$$;

2. When a query is running in a parallel worker, do we want to print that query? It may or may not be interesting given the situation. If the overall plan itself is faulty, output of the parallel worker
Re: RFC: Logging plan of the running query
On Fri, Nov 3, 2023 at 7:31 PM Ashutosh Bapat wrote:
>
> I have the following questions related to the functionality. (Please point
> me to the relevant discussion if this has already been discussed.)
>
> 1. When a backend is running nested queries, we will see the plan of
> the innermost query. That query may not be the actual culprit if the
> user query is running slowly. E.g a query being run as part of the inner
> side of a nested loop when the nested loop itself is the bottleneck. I
> think it will be useful to print plans of the whole query stack.

To further explain this point, consider the following scenario:

-- A completely useless function which executes two SQL statements that each take a small amount of time individually
#create function multstmt() returns int language sql as $$
select count(*) from pg_class where reltype = 12345;
select count(*) from pg_type limit 10;
$$;

-- force a suboptimal plan
#set enable_hashjoin to false;
#set enable_mergejoin to false;

-- A completely useless but long-running query
#select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid) = multstmt(t.oid);

This takes a few minutes on my laptop. In a separate session, query pg_stat_activity.
We will see the original query:

#select pid, query, backend_type from pg_stat_activity where pid = 349330;
  pid   |                                          query                                          |  backend_type
--------+-----------------------------------------------------------------------------------------+----------------
 349330 | select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid) = multstmt(t.oid); | client backend
(1 row)

Run the plan output function a few times:

#select pg_log_query_plan(349330);

You will observe different plans based on which innermost query is running:

LOG:  query plan running on backend with PID 349330 is:
    Query Text: select count(*) from pg_class where reltype = typeid; select count(*) from pg_type where oid = typeid;
    Query Parameters: $1 = '600'
    Aggregate  (cost=18.18..18.19 rows=1 width=8)
      Output: count(*)
      ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.18 rows=2 width=0)
            Output: oid, relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, reltoastrelid, relhasindex, relisshared, relpersistence, relkind, relnatts, relchecks, relhasrules, relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity, relispopulated, relreplident, relispartition, relrewrite, relfrozenxid, relminmxid, relacl, reloptions, relpartbound
            Filter: (pg_class.reltype = $1)
    Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'

2023-11-06 11:52:25.610 IST [349330] LOG:  query plan running on backend with PID 349330 is:
    Query Text: select count(*) from pg_class where reltype = typeid; select count(*) from pg_type where oid = typeid;
    Query Parameters: $1 = '2203'
    Aggregate  (cost=4.30..4.31 rows=1 width=4)
      Output: count(*)
      ->  Index Only Scan using pg_type_oid_index on pg_catalog.pg_type  (cost=0.28..4.29 rows=1 width=0)
            Output: oid
            Index Cond: (pg_type.oid = $1)
    Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'

Individual pieces are confusing here since the query run by the backend is not the one being EXPLAINed. A user may not know that the queries being EXPLAINed arise from the function call multstmt().
So they won't be able to stitch the pieces together unless they see the plan of the outermost query with loops and costs. What might help is explaining each query in the hierarchy.

I think we can start with what auto_explain is doing. Always print the plan of the outermost query; the query found in pg_stat_activity. In a later version we might find a way to print plans of all the queries in the stack and do so in a readable manner.

This makes tracking activeQueryDesc a bit tricky. My guess is that the outermost query's descriptor will be available through ActivePortal most of the time. But there are cases when ExecutorRun is called by passing a queryDesc directly. So maybe there are some cases where that's not true.

--
Best Wishes,
Ashutosh Bapat
Re: RFC: Logging plan of the running query
I have the following questions related to the functionality. (Please point me to the relevant discussion if this has already been discussed.)

1. When a backend is running nested queries, we will see the plan of the innermost query. That query may not be the actual culprit if the user query is running slowly. E.g a query being run as part of the inner side of a nested loop when the nested loop itself is the bottleneck. I think it will be useful to print plans of the whole query stack.

2. When a query is running in a parallel worker, do we want to print that query? It may or may not be interesting given the situation. If the overall plan itself is faulty, output of the parallel worker query is not useful. If the plan is fine but a given worker's leg is running slowly it may be interesting.

As a side note, you may want to fix the indentation in ExplainAssembleLogOutput().

On Fri, Oct 27, 2023 at 6:24 PM Étienne BERSAC wrote:
>
> > Hi,
> >
> > If we use client log message, that message is shown on the target
> > process whose pid is specified by the parameter of
> > pg_log_query_plan():
> >
> >    (pid:1000)=# select pg_sleep(60);
> >    (pid:1001)=# select pg_log_query_plan(1000);
> >    (pid:1000)=# LOG: query plan running on backend with PID 1000 is:
> >        Query Text: select pg_sleep(1000);
> >        Result (cost=0.00..0.01 rows=1 width=4)
> >          Output: pg_sleep('1000'::double precision)
> >
> > I think this is not an expected behavior and we set elevel to
> > LOG_SERVER_ONLY.
>
> Makes sens. Thanks.

--
Best Wishes,
Ashutosh Bapat
Re: RFC: Logging plan of the running query
> Hi,
>
> If we use client log message, that message is shown on the target
> process whose pid is specified by the parameter of
> pg_log_query_plan():
>
>    (pid:1000)=# select pg_sleep(60);
>    (pid:1001)=# select pg_log_query_plan(1000);
>    (pid:1000)=# LOG: query plan running on backend with PID 1000 is:
>        Query Text: select pg_sleep(1000);
>        Result (cost=0.00..0.01 rows=1 width=4)
>          Output: pg_sleep('1000'::double precision)
>
> I think this is not an expected behavior and we set elevel to
> LOG_SERVER_ONLY.

Makes sense. Thanks.
Re: RFC: Logging plan of the running query
On 2023-10-27 16:06, Étienne BERSAC wrote:

Hi Torikoshia,

If so, we once tried to implement such a function for getting memory contexts. However, this attempt didn't succeed because it could lead to a deadlock situation[1].

Thanks for the pointer.

Why not use a client log message to allow the client to get the plan without locking the backend memory context and without access to the server log? I missed the rationale for not sending the plan to the client.

If we use a client log message, that message is shown on the target process whose pid is specified by the parameter of pg_log_query_plan():

  (pid:1000)=# select pg_sleep(60);
  (pid:1001)=# select pg_log_query_plan(1000);
  (pid:1000)=# LOG: query plan running on backend with PID 1000 is:
      Query Text: select pg_sleep(1000);
      Result (cost=0.00..0.01 rows=1 width=4)
        Output: pg_sleep('1000'::double precision)

I think this is not expected behavior, so we set elevel to LOG_SERVER_ONLY.

-- Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
Hi Torikoshia,

> If so, we once tried to implement such function for getting memory
> contexts.
> However, this attempt didn't succeed because it could lead dead lock
> situation[1].

Thanks for the pointer.

Why not use a client log message to allow the client to get the plan without locking the backend memory context and without access to the server log? I missed the rationale for not sending the plan to the client.

Regards,
Étienne
Re: RFC: Logging plan of the running query
On 2023-10-25 12:40, Ashutosh Bapat wrote:
> On Wed, Oct 18, 2023 at 10:04 PM James Coleman wrote:
>> While that decision as regards auto_explain has long since been made
>> (and there would be work to undo it), I don't believe that we should
>> repeat that choice here. I'm -10 on moving this into auto_explain.
>
> Right.
>
>> However perhaps there is still an opportunity for moving some of the
>> auto_explain code into core so as to enable deduplicating the code.
>
> Right. That's what I also think.

Thanks for your comments.

The attached patch adds a new function which assembles es->str for logging according to the specified contents and format. It is called from both auto_explain and pg_log_query_plan().

On 2023-10-11 16:22, Ashutosh Bapat wrote:
> I am also wondering whether it's better to report the WARNING as
> status column in the output.

I left the attached patch as it was, given the inconsistency with pg_terminate_backend() and pg_log_backend_memory_contexts() that you pointed out.

-- Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 336ba3943f631dcbc0d1226ebd0a7675cf78c1f8 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Thu, 26 Oct 2023 15:42:36 +0900
Subject: [PATCH v32] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries in production environments where we cannot use debuggers. To improve this situation, this patch adds a pg_log_query_plan() function that requests logging the plan of the specified backend process.

By default, only superusers are allowed to request logging the plans, because allowing any user to issue this request at an unbounded rate could produce a large volume of log messages, which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client.
Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat, Alena Rybakina, Ashutosh Bapat
Co-authored-by: James Coleman
---
 contrib/auto_explain/auto_explain.c          |  23 +-
 doc/src/sgml/func.sgml                       |  49 +
 src/backend/access/transam/xact.c            |  13 ++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/commands/explain.c               | 209 ++-
 src/backend/executor/execMain.c              |  14 ++
 src/backend/storage/ipc/procsignal.c         |   4 +
 src/backend/storage/lmgr/lock.c              |   9 +-
 src/backend/tcop/postgres.c                  |   4 +
 src/backend/utils/error/elog.c               |   2 +
 src/backend/utils/init/globals.c             |   2 +
 src/include/catalog/pg_proc.dat              |   6 +
 src/include/commands/explain.h               |   8 +
 src/include/miscadmin.h                      |   1 +
 src/include/storage/lock.h                   |   2 -
 src/include/storage/procsignal.h             |   1 +
 src/include/tcop/pquery.h                    |   2 +-
 src/include/utils/elog.h                     |   1 +
 src/test/regress/expected/misc_functions.out |  54 -
 src/test/regress/sql/misc_functions.sql      |  41 +++-
 20 files changed, 399 insertions(+), 48 deletions(-)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..20a73df8c4 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -399,26 +399,9 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 	es->format = auto_explain_log_format;
 	es->settings = auto_explain_log_settings;

-	ExplainBeginOutput(es);
-	ExplainQueryText(es, queryDesc);
-	ExplainQueryParameters(es, queryDesc->params, auto_explain_log_parameter_max_length);
-	ExplainPrintPlan(es, queryDesc);
-	if (es->analyze && auto_explain_log_triggers)
-		ExplainPrintTriggers(es, queryDesc);
-	if (es->costs)
-		ExplainPrintJITSummary(es, queryDesc);
-	ExplainEndOutput(es);
-
-	/* Remove last line break */
-	if (es->str->len > 0 && es->str->data[es->str->len - 1] == '\n')
-		es->str->data[--es->str->len] = '\0';
-
-	/* Fix JSON to output an object */
-	if (auto_explain_log_format == EXPLAIN_FORMAT_JSON)
-	{
-		es->str->data[0] = '{';
-		es->str->data[es->str->len - 1] = '}';
-	}
+	ExplainAssembleLogOutput(es, queryDesc, auto_explain_log_format,
+							 auto_explain_log_triggers,
+							 auto_explain_log_parameter_max_length);

 	/*
 	 * Note: we rely on the existing logging of context or
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..8d77fe1a5b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26406,6 +26406,25 @@ SELECT collation for ('foo' COLLATE "de_DE");
+
+
+
+
+        pg_log_query_plan
+
+pg_log_query_plan
Re: RFC: Logging plan of the running query
On Wed, Oct 18, 2023 at 10:04 PM James Coleman wrote: > > While that decision as regards auto_explain has long since been made > (and there would be work to undo it), I don't believe that we should > repeat that choice here. I'm -10 on moving this into auto_explain. > Right. > However perhaps there is still an opportunity for moving some of the > auto_explain code into core so as to enable deduplicating the code. > Right. That's what I also think. -- Best Wishes, Ashutosh Bapat
Re: RFC: Logging plan of the running query
On 2023-10-24 16:12, Étienne BERSAC wrote: Hi, Yeah, and when we have a situation where we want to run pg_log_query_plan(), we can run it in any environment as long as it is bundled with the core. Is it possible to get the plan of a backend programmatically without access to the logs? Something like pg_get_query_plan(pid, format) which output would be the same as EXPLAIN. Do you imagine a function like below? =# select pg_get_query_plan(100, 'plain'); QUERY PLAN --- Limit (cost=0.00..0.04 rows=1 width=273) -> Seq Scan on pg_class (cost=0.00..17.14 rows=414 width=273) If so, we once tried to implement such function for getting memory contexts. However, this attempt didn't succeed because it could lead dead lock situation[1]. I think the same problem can occur when implementing pg_get_query_plan(). [1] https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com Regards, Étienne BERSAC Dalibo -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
> Hi, > > Yeah, and when we have a situation where we want to run > pg_log_query_plan(), we can run it in any environment as long as it > is bundled with the core. Is it possible to get the plan of a backend programmatically without access to the logs? Something like pg_get_query_plan(pid, format) which output would be the same as EXPLAIN. Regards, Étienne BERSAC Dalibo
Re: RFC: Logging plan of the running query
On Wed, Oct 18, 2023 at 2:09 AM torikoshia wrote: > > On 2023-10-16 18:46, Ashutosh Bapat wrote: > > On Thu, Oct 12, 2023 at 6:51 PM torikoshia > > wrote: > >> > >> On 2023-10-11 16:22, Ashutosh Bapat wrote: > >> > > >> > Considering the similarity with auto_explain I wondered whether this > >> > function should be part of auto_explain contrib module itself? If we > >> > do that users will need to load auto_explain extension and thus > >> > install executor hooks when this function doesn't need those. So may > >> > not be such a good idea. I didn't see any discussion on this. > >> > >> I once thought about adding this to auto_explain, but I left it asis > >> for > >> below reasons: > >> > >> - One of the typical use case of pg_log_query_plan() would be > >> analyzing > >> slow query on customer environments. On such environments, We cannot > >> always control what extensions to install. > > > > The same argument applies to auto_explain functionality as well. But > > it's not part of the core. > > Yeah, and when we have a situation where we want to run > pg_log_query_plan(), we can run it in any environment as long as it is > bundled with the core. > On the other hand, if it is built into auto_explain, we need to start by > installing auto_explain if we do not have auto_explain, which is often > difficult to do in production environments. > > >>Of course auto_explain is a major extension and it is quite > >> possible > >> that they installed auto_explain, but but it is also possible they do > >> not. > >> - It seems a bit counter-intuitive that pg_log_query_plan() is in an > >> extension called auto_explain, since it `manually`` logs plans > >> > > > > pg_log_query_plan() may not fit auto_explain but > > pg_explain_backend_query() does. What we are logging is more than just > > plan of the query, it might expand to be closer to explain output. 
> > While auto in auto_explain would refer to its automatically logging > > explain outputs, it can provide an additional function which provides > > similar functionality by manually triggering it. > > > > But we can defer this to a committer, if you want. > > > > I am more interested in avoiding the duplication of code, esp. the > > first comment in my reply > > If there are no objections, I will try porting it to auto_explain and > see its feasibility. > > >>> There is a lot of similarity between what this feature does and what > >>> auto explain does. I see the code is also duplicated. There is some > >>> merit in avoiding this duplication > >>> 1. we will get all the features of auto_explain automatically like > >>> choosing a format (this was expressed somebody earlier in this > >>> thread), setings etc. > >>> 2. avoid bugs. E.g your code switches context after ExplainState has > >>> been allocated. These states may leak depending upon when this > >>> function gets called. > >>> 3. Building features on top as James envisions will be easier. In my view the fact that auto_explain is itself not part of core is a mistake, and I know there are more prominent hackers than myself who hold that view. While that decision as regards auto_explain has long since been made (and there would be work to undo it), I don't believe that we should repeat that choice here. I'm -10 on moving this into auto_explain. However perhaps there is still an opportunity for moving some of the auto_explain code into core so as to enable deduplicating the code. Regards, James Coleman
Re: RFC: Logging plan of the running query
On Wed, Oct 18, 2023 at 11:39 AM torikoshia wrote: > > > > I am more interested in avoiding the duplication of code, esp. the > > first comment in my reply > > If there are no objections, I will try porting it to auto_explain and > see its feasibility. If you want it in core, you will need to port relevant parts of auto_explain code to core. -- Best Wishes, Ashutosh Bapat
Re: RFC: Logging plan of the running query
On 2023-10-16 18:46, Ashutosh Bapat wrote: On Thu, Oct 12, 2023 at 6:51 PM torikoshia wrote: On 2023-10-11 16:22, Ashutosh Bapat wrote: > > Considering the similarity with auto_explain I wondered whether this > function should be part of auto_explain contrib module itself? If we > do that users will need to load auto_explain extension and thus > install executor hooks when this function doesn't need those. So may > not be such a good idea. I didn't see any discussion on this. I once thought about adding this to auto_explain, but I left it asis for below reasons: - One of the typical use case of pg_log_query_plan() would be analyzing slow query on customer environments. On such environments, We cannot always control what extensions to install. The same argument applies to auto_explain functionality as well. But it's not part of the core. Yeah, and when we have a situation where we want to run pg_log_query_plan(), we can run it in any environment as long as it is bundled with the core. On the other hand, if it is built into auto_explain, we need to start by installing auto_explain if we do not have auto_explain, which is often difficult to do in production environments. Of course auto_explain is a major extension and it is quite possible that they installed auto_explain, but but it is also possible they do not. - It seems a bit counter-intuitive that pg_log_query_plan() is in an extension called auto_explain, since it `manually`` logs plans pg_log_query_plan() may not fit auto_explain but pg_explain_backend_query() does. What we are logging is more than just plan of the query, it might expand to be closer to explain output. While auto in auto_explain would refer to its automatically logging explain outputs, it can provide an additional function which provides similar functionality by manually triggering it. But we can defer this to a committer, if you want. I am more interested in avoiding the duplication of code, esp. 
the first comment in my reply If there are no objections, I will try porting it to auto_explain and see its feasibility. There is a lot of similarity between what this feature does and what auto explain does. I see the code is also duplicated. There is some merit in avoiding this duplication 1. we will get all the features of auto_explain automatically like choosing a format (this was expressed somebody earlier in this thread), setings etc. 2. avoid bugs. E.g your code switches context after ExplainState has been allocated. These states may leak depending upon when this function gets called. 3. Building features on top as James envisions will be easier. =# select pg_log_query_plan(pid), application_name, backend_type from pg_stat_activity where backend_type = 'autovacuum launcher'; WARNING: PID 63323 is not a PostgreSQL client backend process pg_log_query_plan | application_name |backend_type ---+--+- f | | autovacuum launcher > I am also wondering whether it's better to report the WARNING as > status column in the output. E.g. instead of > #select pg_log_query_plan(100); > WARNING: PID 100 is not a PostgreSQL backend process > pg_log_query_plan > --- > f > (1 row) > we output > #select pg_log_query_plan(100); > pg_log_query_plan | status > ---+- > f | PID 100 is not a PostgreSQL backend process > (1 row) > > That looks neater and can easily be handled by scripts, applications > and such. But it will be inconsistent with other functions like > pg_terminate_backend() and pg_log_backend_memory_contexts(). It seems neater, but it might be inconvenient because we can no longer use it in select list like the following query as you wrote: #select pg_log_query_plan(pid), application_name, backend_type from pg_stat_activity where backend_type = 'autovacuum launcher'; Why is that? 
Sorry, I misunderstood; I confirmed we can run queries like the one below:

```
=# CREATE OR REPLACE FUNCTION pg_log_query_plan_stab(i int)
   RETURNS TABLE(
     pg_log_query_plan bool,
     status text
   ) AS $$
   DECLARE
   BEGIN
     RETURN QUERY SELECT false::bool,
       'PID 100 is not a PostgreSQL backend process'::text;
   END;
   $$ LANGUAGE plpgsql;
```
Re: RFC: Logging plan of the running query
On Thu, Oct 12, 2023 at 6:51 PM torikoshia wrote: > > On 2023-10-11 16:22, Ashutosh Bapat wrote: > > > > Considering the similarity with auto_explain I wondered whether this > > function should be part of auto_explain contrib module itself? If we > > do that users will need to load auto_explain extension and thus > > install executor hooks when this function doesn't need those. So may > > not be such a good idea. I didn't see any discussion on this. > > I once thought about adding this to auto_explain, but I left it asis for > below reasons: > > - One of the typical use case of pg_log_query_plan() would be analyzing > slow query on customer environments. On such environments, We cannot > always control what extensions to install. The same argument applies to auto_explain functionality as well. But it's not part of the core. >Of course auto_explain is a major extension and it is quite possible > that they installed auto_explain, but but it is also possible they do > not. > - It seems a bit counter-intuitive that pg_log_query_plan() is in an > extension called auto_explain, since it `manually`` logs plans > pg_log_query_plan() may not fit auto_explain but pg_explain_backend_query() does. What we are logging is more than just plan of the query, it might expand to be closer to explain output. While auto in auto_explain would refer to its automatically logging explain outputs, it can provide an additional function which provides similar functionality by manually triggering it. But we can defer this to a committer, if you want. I am more interested in avoiding the duplication of code, esp. the first comment in my reply >> There is a lot of similarity between what this feature does and what >> auto explain does. I see the code is also duplicated. There is some >> merit in avoiding this duplication >> 1. we will get all the features of auto_explain automatically like >> choosing a format (this was expressed somebody earlier in this >> thread), setings etc. >> 2. 
avoid bugs. E.g your code switches context after ExplainState has >> been allocated. These states may leak depending upon when this >> function gets called. >> 3. Building features on top as James envisions will be easier. > >=# select pg_log_query_plan(pid), application_name, backend_type from > pg_stat_activity where backend_type = 'autovacuum launcher'; >WARNING: PID 63323 is not a PostgreSQL client backend process > pg_log_query_plan | application_name |backend_type >---+--+- > f | | autovacuum launcher > > > > I am also wondering whether it's better to report the WARNING as > > status column in the output. E.g. instead of > > #select pg_log_query_plan(100); > > WARNING: PID 100 is not a PostgreSQL backend process > > pg_log_query_plan > > --- > > f > > (1 row) > > we output > > #select pg_log_query_plan(100); > > pg_log_query_plan | status > > ---+- > > f | PID 100 is not a PostgreSQL backend process > > (1 row) > > > > That looks neater and can easily be handled by scripts, applications > > and such. But it will be inconsistent with other functions like > > pg_terminate_backend() and pg_log_backend_memory_contexts(). > > It seems neater, but it might be inconvenient because we can no longer > use it in select list like the following query as you wrote: > >#select pg_log_query_plan(pid), application_name, backend_type from >pg_stat_activity where backend_type = 'autovacuum launcher'; Why is that? -- Best Wishes, Ashutosh Bapat
Re: RFC: Logging plan of the running query
On 2023-10-11 16:22, Ashutosh Bapat wrote:

> Like many others I think this feature is useful to debug a long running
> query. Sorry for jumping late into this. I have a few high-level
> comments.

Thanks for your comments!

> There is a lot of similarity between what this feature does and what
> auto_explain does. I see the code is also duplicated. There is some
> merit in avoiding this duplication:
> 1. we will get all the features of auto_explain automatically, like
> choosing a format (this was expressed by somebody earlier in this
> thread), settings etc.
> 2. avoid bugs. E.g. your code switches context after ExplainState has
> been allocated. These states may leak depending upon when this
> function gets called.
> 3. Building features on top as James envisions will be easier.
>
> Considering the similarity with auto_explain I wondered whether this
> function should be part of the auto_explain contrib module itself? If
> we do that, users will need to load the auto_explain extension and thus
> install executor hooks when this function doesn't need those. So it may
> not be such a good idea. I didn't see any discussion on this.

I once thought about adding this to auto_explain, but I left it as is
for the reasons below:

- One of the typical use cases of pg_log_query_plan() would be analyzing
  a slow query on customer environments. In such environments, we cannot
  always control what extensions to install. Of course auto_explain is a
  major extension and it is quite possible that they installed
  auto_explain, but it is also possible they did not.
- It seems a bit counter-intuitive that pg_log_query_plan() is in an
  extension called auto_explain, since it "manually" logs plans.

> I tried the following query to pass the PID of a non-client backend to
> this function.
>
> #select pg_log_query_plan(pid), application_name, backend_type
>    from pg_stat_activity where backend_type = 'autovacuum launcher';
>  pg_log_query_plan | application_name |    backend_type
> -------------------+------------------+---------------------
>  t                 |                  | autovacuum launcher
> (1 row)
>
> I see "LOG: backend with PID 2733631 is not running a query or a
> subtransaction is aborted" in the server logs. That's ok. But maybe we
> should not send a signal to these kinds of backends at all, thus
> avoiding some system calls.

Agreed, that seems better.
The attached patch checks if the backendType of the target process is
'client backend'.

=# select pg_log_query_plan(pid), application_name, backend_type
     from pg_stat_activity where backend_type = 'autovacuum launcher';
WARNING: PID 63323 is not a PostgreSQL client backend process
 pg_log_query_plan | application_name |    backend_type
-------------------+------------------+---------------------
 f                 |                  | autovacuum launcher

> I am also wondering whether it's better to report the WARNING as a
> status column in the output. E.g. instead of
>
> #select pg_log_query_plan(100);
> WARNING: PID 100 is not a PostgreSQL backend process
>  pg_log_query_plan
> -------------------
>  f
> (1 row)
>
> we output
>
> #select pg_log_query_plan(100);
>  pg_log_query_plan | status
> -------------------+---------------------------------------------
>  f                 | PID 100 is not a PostgreSQL backend process
> (1 row)
>
> That looks neater and can easily be handled by scripts, applications
> and such. But it will be inconsistent with other functions like
> pg_terminate_backend() and pg_log_backend_memory_contexts().

It seems neater, but it might be inconvenient because we can no longer
use it in a select list like the following query, as you wrote:

#select pg_log_query_plan(pid), application_name, backend_type
   from pg_stat_activity where backend_type = 'autovacuum launcher';

> I do share a concern that was discussed earlier. If a query is running
> long, there's something problematic with it. A diagnostic intervention
> breaking it further would be unwelcome. James has run experiments to
> shake this code for any loose breakages. He has not found any. So maybe
> we are good.
And we wouldn't know about very rare corner cases so easily without using it in the field. So fine with it. If we could add some safety net that will be great but may not be necessary for the first cut. If there are candidates for the safety net, I'm willing to add them. -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom b7902cf43254450cc7831c235982438ea1e5e8b7 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Thu, 12 Oct 2023 22:03:48 +0900 Subject: [PATCH v31] Add function to log the plan of the query Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds pg_log_query_plan() function that requests to log the plan of the specified backend process. By default, only superusers are allowed to request to log the plans because allowing any users to issue
Re: RFC: Logging plan of the running query
Hi,

On Tue, Oct 10, 2023 at 7:00 PM torikoshia wrote:
> Thanks for your comments!
>
> I tested the v30 patch with
> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, which makes
> CFI() call ProcessLogQueryPlanInterrupt() internally, and confirmed that
> very few logging queries failed with the v30 patch.
>
> This is a result in line with your prediction.
>
> ```
> $ rg -c'ignored request for logging query plan due to lock confilcts'
> postmaster.log
> 8

Like many others I think this feature is useful to debug a long running
query. Sorry for jumping late into this. I have a few high-level comments.

There is a lot of similarity between what this feature does and what
auto_explain does. I see the code is also duplicated. There is some merit
in avoiding this duplication:
1. we will get all the features of auto_explain automatically, like
   choosing a format (this was expressed by somebody earlier in this
   thread), settings etc.
2. avoid bugs. E.g. your code switches context after ExplainState has
   been allocated. These states may leak depending upon when this
   function gets called.
3. Building features on top as James envisions will be easier.

Considering the similarity with auto_explain I wondered whether this
function should be part of the auto_explain contrib module itself? If we
do that, users will need to load the auto_explain extension and thus
install executor hooks when this function doesn't need those. So it may
not be such a good idea. I didn't see any discussion on this.

I tried the following query to pass the PID of a non-client backend to
this function.

#select pg_log_query_plan(pid), application_name, backend_type
   from pg_stat_activity where backend_type = 'autovacuum launcher';
 pg_log_query_plan | application_name |    backend_type
-------------------+------------------+---------------------
 t                 |                  | autovacuum launcher
(1 row)

I see "LOG: backend with PID 2733631 is not running a query or a
subtransaction is aborted" in the server logs. That's ok.
But maybe we should not send a signal to these kinds of backends at all,
thus avoiding some system calls.

I am also wondering whether it's better to report the WARNING as a status
column in the output. E.g. instead of

#select pg_log_query_plan(100);
WARNING: PID 100 is not a PostgreSQL backend process
 pg_log_query_plan
-------------------
 f
(1 row)

we output

#select pg_log_query_plan(100);
 pg_log_query_plan | status
-------------------+---------------------------------------------
 f                 | PID 100 is not a PostgreSQL backend process
(1 row)

That looks neater and can easily be handled by scripts, applications and
such. But it will be inconsistent with other functions like
pg_terminate_backend() and pg_log_backend_memory_contexts().

I do share a concern that was discussed earlier: if a query is running
long, there's something problematic with it, and a diagnostic
intervention breaking it further would be unwelcome. James has run
experiments to shake this code for any loose breakages and has not found
any, so maybe we are good. And we wouldn't know about very rare corner
cases so easily without using it in the field, so I'm fine with it. If we
could add some safety net that would be great, but it may not be
necessary for the first cut.

--
Best Wishes,
Ashutosh Bapat
Re: RFC: Logging plan of the running query
On 2023-10-04 03:00, James Coleman wrote: and I think what we need to do is explicitly disallow running this code any time we are inside of lock acquisition code. Updated patch to check if any locks have already been acquired by examining MyProc->heldLocks. I'm not sure this change can "disallow running this code `any time` we are inside of lock acquisition code", but as far as select1.trace, which you shared, I believe it can prevent running explain codes since it must have set MyProc->heldLocks in LockAcquireExtended() before WaitOnLock(): ``` /* * Set bitmask of locks this process already holds on this object. */ MyProc->heldLocks = proclock->holdMask; ..(snip).. WaitOnLock(locallock, owner); ``` On 2023-10-07 00:58, Andres Freund wrote: How so? We shouldn't commonly acquire relevant locks while executing a query? With a few exceptions, they should instead be acquired t the start of query processing. We do acquire a lot of lwlocks, obviously, but we don't process interrupts during the acquisition / holding of lwlocks. And presumably the interrupt would just be processed the next time interrupt processing is happening? Thanks for your comments! I tested v30 patch with v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch which makes CFI() call ProcessLogQueryPlanInterrupt() internally, and confirmed that very few logging queries failed with v30 patch. This is a result in line with your prediction. ``` $ rg -c'ignored request for logging query plan due to lock confilcts' postmaster.log 8 ``` -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom aaaca1523ed5342c9f77d79963e0d256146381d2 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Tue, 10 Oct 2023 22:07:28 +0900 Subject: [PATCH v30] Add function to log the plan of the query Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. 
To improve this situation, this patch adds pg_log_query_plan() function that requests to log the plan of the specified backend process. By default, only superusers are allowed to request to log the plans because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client. Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat, Alena Rybakina Co-authored-by: James Coleman --- doc/src/sgml/func.sgml | 49 + src/backend/access/transam/xact.c| 13 ++ src/backend/catalog/system_functions.sql | 2 + src/backend/commands/explain.c | 177 ++- src/backend/executor/execMain.c | 14 ++ src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/lmgr/lock.c | 9 +- src/backend/tcop/postgres.c | 4 + src/backend/utils/error/elog.c | 2 + src/backend/utils/init/globals.c | 2 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/explain.h | 4 + src/include/miscadmin.h | 1 + src/include/storage/lock.h | 2 - src/include/storage/procsignal.h | 1 + src/include/tcop/pquery.h| 2 +- src/include/utils/elog.h | 1 + src/test/regress/expected/misc_functions.out | 54 -- src/test/regress/sql/misc_functions.sql | 41 +++-- 19 files changed, 360 insertions(+), 28 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index f1ad64c3d6..b7c7fa9169 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -26308,6 +26308,25 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_query_plan + +pg_log_query_plan ( pid integer ) +boolean + + +Requests to log the plan of the query currently running on the +backend with specified process ID. 
+It will be logged at LOG message level and +will appear in the server log based on the log +configuration set (See +for more information), but will not be sent to the client +regardless of . + + + @@ -26528,6 +26547,36 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 because it may generate a large number of log messages. + +pg_log_query_plan can be used +to log the plan of a backend process. For example:
Re: RFC: Logging plan of the running query
Hi,

On 2023-10-06 21:58:46 +0900, torikoshia wrote:
> Yeah, I think it's a straightforward workaround.
> And I'm also concerned that the condition of being in the process
> of acquiring some kind of lock is too strict and will make it almost
> impossible to output a running plan.

How so? We shouldn't commonly acquire relevant locks while executing a
query. With a few exceptions, they should instead be acquired at the
start of query processing. We do acquire a lot of lwlocks, obviously, but
we don't process interrupts during the acquisition / holding of lwlocks.
And presumably the interrupt would just be processed the next time
interrupt processing happens?

Greetings,

Andres Freund
Re: RFC: Logging plan of the running query
On Fri, Oct 6, 2023 at 8:58 AM torikoshia wrote: > > On 2023-10-04 03:00, James Coleman wrote: > > On Thu, Sep 7, 2023 at 2:09 AM torikoshia > > wrote: > >> > >> On 2023-09-06 11:17, James Coleman wrote: > >> > >> >> > I've never been able to reproduce it (haven't tested the new version, > >> >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian > >> >> > (first buster and now bullseye). > >> >> > > >> >> > I'm attaching several stacktraces in the hope that they provide some > >> >> > clues. These all match the ps output I sent earlier, though note in > >> >> > that output there is both the regress instance and my test instance > >> >> > (pid 3213249) running (different ports, of course, and they are from > >> >> > the exact same compilation run). I've attached ps output for the > >> >> > postgres processes under the make check process to simplify cross > >> >> > referencing. > >> >> > > >> >> > A few interesting things: > >> >> > - There's definitely a lock on a relation that seems to be what's > >> >> > blocking the processes. > >> >> > - When I try to connect with psql the process forks but then hangs > >> >> > (see the ps output with task names stuck in "authentication"). I've > >> >> > also included a trace from one of these. > >> >> > >> >> Thanks for sharing them! > >> >> > >> >> Many processes are waiting to acquire the LW lock, including the > >> >> process > >> >> trying to output the plan(select1.trace). > >> >> > >> >> I suspect that this is due to a lock that was acquired prior to being > >> >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able > >> >> to > >> >> reproduce the same situation.. 
> >> >> > >> > > >> > I don't have time immediately to dig into this, but perhaps loading up > >> > the core dumps would allow us to see what query is running in each > >> > backend process (if it hasn't already been discarded by that point) > >> > and thereby determine what point in each test process led to the error > >> > condition? > >> > >> Thanks for the suggestion. > >> I am concerned that core dumps may not be readable on different > >> operating systems or other environments. (Unfortunately, I do not have > >> Debian on hand) > >> > >> It seems that we can know what queries were running from the stack > >> traces you shared. > >> As described above, I suspect a lock which was acquired prior to > >> ProcessLogQueryPlanInterrupt() caused the issue. > >> I will try a little more to see if I can devise a way to create the > >> same > >> situation. > >> > >> > Alternatively we might be able to apply the same trick to the test > >> > client instead... > >> > > >> > BTW, for my own easy reference in this thread: relid 1259 is pg_class > >> > if I'm not mistaken. > >> > >> Yeah, and I think it's strange that the lock to 1259 appears twice and > >> must be avoided. > >> > >>#10 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) > >> at > >> lmgr.c:117 > >>.. > >>#49 0x559d61b4989d in ProcessLogQueryPlanInterrupt () at > >> explain.c:5158 > >>.. > >>#53 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) > >> at > >> lmgr.c:117 > > > > I chatted with Andres and David about this at PGConf.NYC, > Thanks again for the discussion with hackers! > > > and I think > > what we need to do is explicitly disallow running this code any time > > we are inside of lock acquisition code. > > Yeah, I think it's a straightforward workaround. > And I'm also concerned that the condition of being in the process > of acquiring some kind of lock is too strict and will make it almost > impossible to output a running plan. 
I was concerned about this too, but I was wondering if we might be able to cheat a bit by making such a case not clear the flag but instead just skip it for now. Regards, James
Re: RFC: Logging plan of the running query
On 2023-10-04 03:00, James Coleman wrote: On Thu, Sep 7, 2023 at 2:09 AM torikoshia wrote: On 2023-09-06 11:17, James Coleman wrote: >> > I've never been able to reproduce it (haven't tested the new version, >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian >> > (first buster and now bullseye). >> > >> > I'm attaching several stacktraces in the hope that they provide some >> > clues. These all match the ps output I sent earlier, though note in >> > that output there is both the regress instance and my test instance >> > (pid 3213249) running (different ports, of course, and they are from >> > the exact same compilation run). I've attached ps output for the >> > postgres processes under the make check process to simplify cross >> > referencing. >> > >> > A few interesting things: >> > - There's definitely a lock on a relation that seems to be what's >> > blocking the processes. >> > - When I try to connect with psql the process forks but then hangs >> > (see the ps output with task names stuck in "authentication"). I've >> > also included a trace from one of these. >> >> Thanks for sharing them! >> >> Many processes are waiting to acquire the LW lock, including the >> process >> trying to output the plan(select1.trace). >> >> I suspect that this is due to a lock that was acquired prior to being >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able >> to >> reproduce the same situation.. >> > > I don't have time immediately to dig into this, but perhaps loading up > the core dumps would allow us to see what query is running in each > backend process (if it hasn't already been discarded by that point) > and thereby determine what point in each test process led to the error > condition? Thanks for the suggestion. I am concerned that core dumps may not be readable on different operating systems or other environments. 
(Unfortunately, I do not have Debian on hand)

It seems that we can know what queries were running from the stack traces you shared.
As described above, I suspect a lock which was acquired prior to ProcessLogQueryPlanInterrupt() caused the issue.
I will try a little more to see if I can devise a way to create the same situation.

> Alternatively we might be able to apply the same trick to the test
> client instead...
>
> BTW, for my own easy reference in this thread: relid 1259 is pg_class
> if I'm not mistaken.

Yeah, and I think it's strange that the lock on 1259 appears twice; that must be avoided.

#10 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117
..
#49 0x559d61b4989d in ProcessLogQueryPlanInterrupt () at explain.c:5158
..
#53 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117

> I chatted with Andres and David about this at PGConf.NYC, and I think
> what we need to do is explicitly disallow running this code any time
> we are inside of lock acquisition code.

Thanks again for the discussion with hackers!

Yeah, I think it's a straightforward workaround. And I'm also concerned that the condition of being in the process of acquiring some kind of lock is too strict and will make it almost impossible to output a running plan.

Anyway I'm going to implement it and run pg_log_query_plan() while the regression test is running to see how successful the plan output is.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
On Thu, Sep 7, 2023 at 2:09 AM torikoshia wrote: > > On 2023-09-06 11:17, James Coleman wrote: > > >> > I've never been able to reproduce it (haven't tested the new version, > >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian > >> > (first buster and now bullseye). > >> > > >> > I'm attaching several stacktraces in the hope that they provide some > >> > clues. These all match the ps output I sent earlier, though note in > >> > that output there is both the regress instance and my test instance > >> > (pid 3213249) running (different ports, of course, and they are from > >> > the exact same compilation run). I've attached ps output for the > >> > postgres processes under the make check process to simplify cross > >> > referencing. > >> > > >> > A few interesting things: > >> > - There's definitely a lock on a relation that seems to be what's > >> > blocking the processes. > >> > - When I try to connect with psql the process forks but then hangs > >> > (see the ps output with task names stuck in "authentication"). I've > >> > also included a trace from one of these. > >> > >> Thanks for sharing them! > >> > >> Many processes are waiting to acquire the LW lock, including the > >> process > >> trying to output the plan(select1.trace). > >> > >> I suspect that this is due to a lock that was acquired prior to being > >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able > >> to > >> reproduce the same situation.. > >> > > > > I don't have time immediately to dig into this, but perhaps loading up > > the core dumps would allow us to see what query is running in each > > backend process (if it hasn't already been discarded by that point) > > and thereby determine what point in each test process led to the error > > condition? > > Thanks for the suggestion. > I am concerned that core dumps may not be readable on different > operating systems or other environments. 
(Unfortunately, I do not have > Debian on hand) > > It seems that we can know what queries were running from the stack > traces you shared. > As described above, I suspect a lock which was acquired prior to > ProcessLogQueryPlanInterrupt() caused the issue. > I will try a little more to see if I can devise a way to create the same > situation. > > > Alternatively we might be able to apply the same trick to the test > > client instead... > > > > BTW, for my own easy reference in this thread: relid 1259 is pg_class > > if I'm not mistaken. > > Yeah, and I think it's strange that the lock to 1259 appears twice and > must be avoided. > >#10 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at > lmgr.c:117 >.. >#49 0x559d61b4989d in ProcessLogQueryPlanInterrupt () at > explain.c:5158 >.. >#53 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at > lmgr.c:117 I chatted with Andres and David about this at PGConf.NYC, and I think what we need to do is explicitly disallow running this code any time we are inside of lock acquisition code. Regards, James Coleman
Re: RFC: Logging plan of the running query
On 28/9/2023 09:04, torikoshia wrote:
On 2023-09-25 18:49, Andrey Lepikhov wrote:
On 25/9/2023 14:21, torikoshia wrote:
On 2023-09-20 14:39, Lepikhov Andrei wrote:

Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on all CFI using v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then ran the following query but did not cause any problems.

```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
  EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
  PERFORM pg_sleep(5);
END;
$$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?

I didn't find a problem either. I just feel uncomfortable if, at the moment of interruption, we have a descriptor of another query than the query have been executing and holding resources.

I think that "descriptor" here refers to ActiveQueryDesc, in which case it is updated at the beginning of ExecutorRun(), so I am wondering if the situation you're worried about would not occur.

As you can see, in my example we have the only DDL and no queries with plans. In this case postgres doesn't call ExecutorRun() just because it doesn't have a plan. But locks will be obtained.

--
regards, Andrey Lepikhov
Postgres Professional
Re: RFC: Logging plan of the running query
On 2023-09-25 18:49, Andrey Lepikhov wrote:
On 25/9/2023 14:21, torikoshia wrote:
On 2023-09-20 14:39, Lepikhov Andrei wrote:

Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on all CFI using v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then ran the following query but did not cause any problems.

```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
  EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
  PERFORM pg_sleep(5);
END;
$$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?

I didn't find a problem either. I just feel uncomfortable if, at the moment of interruption, we have a descriptor of another query than the query have been executing and holding resources.

I think that "descriptor" here refers to ActiveQueryDesc, in which case it is updated at the beginning of ExecutorRun(), so I am wondering if the situation you're worried about would not occur.

@@ -303,10 +306,21 @@ ExecutorRun(QueryDesc *queryDesc, ScanDirection direction,
 			uint64 count, bool execute_once)
 {
+	/*
+	 * Update ActiveQueryDesc here to enable retrieval of the currently
+	 * running queryDesc for nested queries.
+	 */
+	QueryDesc *save_ActiveQueryDesc;
+
+	save_ActiveQueryDesc = ActiveQueryDesc;
+	ActiveQueryDesc = queryDesc;

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
On 25/9/2023 14:21, torikoshia wrote:
On 2023-09-20 14:39, Lepikhov Andrei wrote:

Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on all CFI using v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then ran the following query but did not cause any problems.

```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
  EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
  PERFORM pg_sleep(5);
END;
$$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?

I didn't find a problem either. I just feel uncomfortable if, at the moment of interruption, we have a descriptor of another query than the query have been executing and holding resources.

--
regards, Andrey Lepikhov
Postgres Professional
Re: RFC: Logging plan of the running query
On 2023-09-20 14:39, Lepikhov Andrei wrote: Thanks for your reply. On Tue, Sep 19, 2023, at 8:39 PM, torikoshia wrote: On 2023-09-15 15:21, Lepikhov Andrei wrote: On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote: I have explored this patch and, by and large, agree with the code. But some questions I want to discuss: 1. Maybe add a hook to give a chance for extensions, like pg_query_state, to do their job? Do you imagine adding a hook which enables adding custom interrupt codes like below? https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch No, I think around the hook, which would allow us to rewrite the pg_query_state extension without additional patches by using the functionality provided by your patch. I mean, an extension could provide console UI, not only server logging. And obtain from target backend so much information in the explain as the instrumentation level of the current query can give. It may make pg_query_state shorter and more stable. 2. In this implementation, ProcessInterrupts does a lot of work during the explain creation: memory allocations, pass through the plan, systables locks, syscache access, etc. I guess it can change the semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be called - I can imagine a SELECT query, which would call a stored procedure, which executes some DDL and acquires row exclusive lock at the time of interruption. So, in my mind, it is too unpredictable to make the explain in the place of interruption processing. Maybe it is better to think about some hook at the end of ExecProcNode, where a pending explain could be created? 
Yeah, I withdrew this patch once for that reason, but we are resuming development in response to the results of a discussion by James and others at this year's pgcon about the safety of this process in CFI:

https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com

I can't track the logic path of the decision provided at this conference. But my anxiety related to specific place, where ActiveQueryDesc points top-level query and interruption comes during DDL, wrapped up in stored procedure. For example:

CREATE TABLE test();
CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
  EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
  ...
END;
$$ LANGUAGE plpgsql VOLATILE;
SELECT ddl(), ... FROM ...;

Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on all CFI using v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then ran the following query but did not cause any problems.

```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
  EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
  PERFORM pg_sleep(5);
END;
$$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
On Tue, Sep 19, 2023, at 8:39 PM, torikoshia wrote: > On 2023-09-15 15:21, Lepikhov Andrei wrote: >> On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote: >> I have explored this patch and, by and large, agree with the code. But >> some questions I want to discuss: >> 1. Maybe add a hook to give a chance for extensions, like >> pg_query_state, to do their job? > > Do you imagine adding a hook which enables adding custom interrupt codes > like below? > > https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch No, I think around the hook, which would allow us to rewrite the pg_query_state extension without additional patches by using the functionality provided by your patch. I mean, an extension could provide console UI, not only server logging. And obtain from target backend so much information in the explain as the instrumentation level of the current query can give. It may make pg_query_state shorter and more stable. >> 2. In this implementation, ProcessInterrupts does a lot of work during >> the explain creation: memory allocations, pass through the plan, >> systables locks, syscache access, etc. I guess it can change the >> semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be >> called - I can imagine a SELECT query, which would call a stored >> procedure, which executes some DDL and acquires row exclusive lock at >> the time of interruption. So, in my mind, it is too unpredictable to >> make the explain in the place of interruption processing. Maybe it is >> better to think about some hook at the end of ExecProcNode, where a >> pending explain could be created? 
> > Yeah, I withdrew this patch once for that reason, but we are resuming > development in response to the results of a discussion by James and > others at this year's pgcon about the safety of this process in CFI: > > https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com I can't track the logic path of the decision provided at this conference. But my anxiety related to specific place, where ActiveQueryDesc points top-level query and interruption comes during DDL, wrapped up in stored procedure. For example: CREATE TABLE test(); CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$ BEGIN EXECUTE format('ALTER TABLE test ADD COLUMN x integer;'); ... END; $$ LANGUAGE plpgsql VOLATILE; SELECT ddl(), ... FROM ...; > BTW it seems pg_query_state also enables users to get running query > plans using CFI. > Does pg_query_state do something for this safety concern? No, and I'm looking for the solution, which could help to rewrite pg_query_state as a clean extension, without patches. >> Also, I suggest minor code change with the diff in attachment. > > Thanks! > > This might be biased opinion and objections are welcomed, but I feel the > original patch is easier to read since PG_RETURN_BOOL(true/false) is > located in near place to each cases. > Also the existing function pg_log_backend_memory_contexts(), which does > the same thing, has the same form as the original patch. I got it, thank you. -- Regards, Andrei Lepikhov
Re: RFC: Logging plan of the running query
On 2023-09-15 15:21, Lepikhov Andrei wrote: On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote: On 2023-09-06 11:17, James Coleman wrote: It seems that we can know what queries were running from the stack traces you shared. As described above, I suspect a lock which was acquired prior to ProcessLogQueryPlanInterrupt() caused the issue. I will try a little more to see if I can devise a way to create the same situation. Hi, I have explored this patch and, by and large, agree with the code. But some questions I want to discuss: 1. Maybe add a hook to give a chance for extensions, like pg_query_state, to do their job? Do you imagine adding a hook which enables adding custom interrupt codes like below? https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch If so, that would be possible, but this patch doesn't require the functionality and I feel it'd be better doing in independent patch. 2. In this implementation, ProcessInterrupts does a lot of work during the explain creation: memory allocations, pass through the plan, systables locks, syscache access, etc. I guess it can change the semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be called - I can imagine a SELECT query, which would call a stored procedure, which executes some DDL and acquires row exclusive lock at the time of interruption. So, in my mind, it is too unpredictable to make the explain in the place of interruption processing. Maybe it is better to think about some hook at the end of ExecProcNode, where a pending explain could be created? Yeah, I withdrew this patch once for that reason, but we are resuming development in response to the results of a discussion by James and others at this year's pgcon about the safety of this process in CFI: https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com BTW it seems pg_query_state also enables users to get running query plans using CFI. 
Does pg_query_state do something for this safety concern? Also, I suggest minor code change with the diff in attachment. Thanks! This might be biased opinion and objections are welcomed, but I feel the original patch is easier to read since PG_RETURN_BOOL(true/false) is located in near place to each cases. Also the existing function pg_log_backend_memory_contexts(), which does the same thing, has the same form as the original patch. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote: > On 2023-09-06 11:17, James Coleman wrote: > It seems that we can know what queries were running from the stack > traces you shared. > As described above, I suspect a lock which was acquired prior to > ProcessLogQueryPlanInterrupt() caused the issue. > I will try a little more to see if I can devise a way to create the same > situation. Hi, I have explored this patch and, by and large, agree with the code. But some questions I want to discuss: 1. Maybe add a hook to give a chance for extensions, like pg_query_state, to do their job? 2. In this implementation, ProcessInterrupts does a lot of work during the explain creation: memory allocations, pass through the plan, systables locks, syscache access, etc. I guess it can change the semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be called - I can imagine a SELECT query, which would call a stored procedure, which executes some DDL and acquires row exclusive lock at the time of interruption. So, in my mind, it is too unpredictable to make the explain in the place of interruption processing. Maybe it is better to think about some hook at the end of ExecProcNode, where a pending explain could be created? Also, I suggest minor code change with the diff in attachment. -- Regards, Andrei Lepikhov improve.diff Description: Binary data
Re: RFC: Logging plan of the running query
On 2023-09-06 11:17, James Coleman wrote: > I've never been able to reproduce it (haven't tested the new version, > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian > (first buster and now bullseye). > > I'm attaching several stacktraces in the hope that they provide some > clues. These all match the ps output I sent earlier, though note in > that output there is both the regress instance and my test instance > (pid 3213249) running (different ports, of course, and they are from > the exact same compilation run). I've attached ps output for the > postgres processes under the make check process to simplify cross > referencing. > > A few interesting things: > - There's definitely a lock on a relation that seems to be what's > blocking the processes. > - When I try to connect with psql the process forks but then hangs > (see the ps output with task names stuck in "authentication"). I've > also included a trace from one of these. Thanks for sharing them! Many processes are waiting to acquire the LW lock, including the process trying to output the plan(select1.trace). I suspect that this is due to a lock that was acquired prior to being interrupted by ProcessLogQueryPlanInterrupt(), but have not been able to reproduce the same situation.. I don't have time immediately to dig into this, but perhaps loading up the core dumps would allow us to see what query is running in each backend process (if it hasn't already been discarded by that point) and thereby determine what point in each test process led to the error condition? Thanks for the suggestion. I am concerned that core dumps may not be readable on different operating systems or other environments. (Unfortunately, I do not have Debian on hand) It seems that we can know what queries were running from the stack traces you shared. As described above, I suspect a lock which was acquired prior to ProcessLogQueryPlanInterrupt() caused the issue. 
I will try a little more to see if I can devise a way to create the same situation.

Alternatively we might be able to apply the same trick to the test client instead...

BTW, for my own easy reference in this thread: relid 1259 is pg_class if I'm not mistaken.

Yeah, and I think it's strange that the lock to 1259 appears twice and must be avoided.

#10 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117
..
#49 0x559d61b4989d in ProcessLogQueryPlanInterrupt () at explain.c:5158
..
#53 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: RFC: Logging plan of the running query
On Tue, Sep 5, 2023 at 9:59 AM torikoshia wrote: > > On 2023-08-28 22:47, James Coleman wrote: > > On Mon, Aug 28, 2023 at 3:01 AM torikoshia > > wrote: > >> > >> On 2023-08-26 21:03, James Coleman wrote: > >> > On Fri, Aug 25, 2023 at 7:43 AM James Coleman wrote: > >> >> > >> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia > >> >> wrote: > >> >> > > >> >> > On 2023-06-16 01:34, James Coleman wrote: > >> >> > > Attached is v28 > >> >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > >> >> > > when necessary. Once built with those two patches I'm simply running > >> >> > > `make check`. > >> >> > > >> >> > With v28-0001 and v28-0002 patch, I confirmed backend processes > >> >> > consume > >> >> > huge > >> >> > amount of memory and under some environments they were terminated by > >> >> > OOM > >> >> > killer. > >> >> > > >> >> > This was because memory was allocated from existing memory contexts > >> >> > and > >> >> > they > >> >> > were not freed after ProcessLogQueryPlanInterrupt(). > >> >> > Updated the patch to use dedicated memory context for > >> >> > ProcessLogQueryPlanInterrupt(). > >> >> > > >> >> > Applying attached patch and v28-0002 patch, `make check` successfully > >> >> > completed after 20min and 50GB of logs on my environment. > >> >> > > >> >> > >>> On 2023-06-15 01:48, James Coleman wrote: > >> >> > >>> > The tests have been running since last night, but have been > >> >> > >>> > apparently > >> >> > >>> > hung now for many hours. > >> >> > > >> >> > I don't know if this has anything to do with the hung you faced, but I > >> >> > thought > >> >> > it might be possible that the large amount of memory usage resulted in > >> >> > swapping, which caused a significant delay in processing. > >> >> > >> >> Ah, yes, I think that could be a possible explanation. 
I was delaying > >> >> on this thread because I wasn't comfortable with having caused an > >> >> issue once (even if I couldn't easily reproduce) without at least some > >> >> theory as to the cause (and a fix). > >> >> > >> >> > If possible, I would be very grateful if you could try to reproduce > >> >> > this > >> >> > with > >> >> > the v29 patch. > >> >> > >> >> I'll kick off some testing. > >> >> > >> > > >> > I don't have time to investigate what's happening here, but 24 hours > >> > later the first "make check" is still running, and at first glance it > >> > seems to have the same behavior I'd seen that first time. The test > >> > output is to this point: > >> > > >> > # parallel group (5 tests): index_including create_view > >> > index_including_gist create_index create_index_spgist > >> > ok 66+ create_index26365 ms > >> > ok 67+ create_index_spgist 27675 ms > >> > ok 68+ create_view 1235 ms > >> > ok 69+ index_including 1102 ms > >> > ok 70+ index_including_gist 1633 ms > >> > # parallel group (16 tests): create_aggregate create_cast errors > >> > roleattributes drop_if_exists hash_func typed_table create_am > >> > infinite_recurse > >> > > >> > and it hasn't progressed past that point since at least ~16 hours ago > >> > (the first several hours of the run I wasn't monitoring it). 
> >> > > >> > I haven't connected up gdb yet, and won't be able to until maybe > >> > tomorrow, but here's the ps output for postgres processes that are > >> > running: > >> > > >> > admin3213249 0.0 0.0 196824 20552 ?Ss Aug25 0:00 > >> > /home/admin/postgresql-test/bin/postgres -D > >> > /home/admin/postgresql-test-data > >> > admin3213250 0.0 0.0 196964 7284 ?Ss Aug25 0:00 > >> > postgres: checkpointer > >> > admin3213251 0.0 0.0 196956 4276 ?Ss Aug25 0:00 > >> > postgres: background writer > >> > admin3213253 0.0 0.0 196956 8600 ?Ss Aug25 0:00 > >> > postgres: walwriter > >> > admin3213254 0.0 0.0 198424 7316 ?Ss Aug25 0:00 > >> > postgres: autovacuum launcher > >> > admin3213255 0.0 0.0 198412 5488 ?Ss Aug25 0:00 > >> > postgres: logical replication launcher > >> > admin3237967 0.0 0.0 2484 516 pts/4S+ Aug25 0:00 > >> > /bin/sh -c echo "# +++ regress check in src/test/regress +++" && > >> > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" > >> > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" > >> > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template > >> > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check > >> > --inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20 > >> > --schedule=./parallel_schedule > >> > admin3237973 0.0 0.0 197880 20688 pts/4S+ Aug25 0:00 > >> > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c > >> > listen_addresses= -k
Re: RFC: Logging plan of the running query
On 2023-08-28 22:47, James Coleman wrote: On Mon, Aug 28, 2023 at 3:01 AM torikoshia wrote: On 2023-08-26 21:03, James Coleman wrote: > On Fri, Aug 25, 2023 at 7:43 AM James Coleman wrote: >> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia >> wrote: >> > >> > On 2023-06-16 01:34, James Coleman wrote: >> > > Attached is v28 >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish >> > > when necessary. Once built with those two patches I'm simply running >> > > `make check`. >> > >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume >> > huge >> > amount of memory and under some environments they were terminated by OOM >> > killer. >> > >> > This was because memory was allocated from existing memory contexts and >> > they >> > were not freed after ProcessLogQueryPlanInterrupt(). >> > Updated the patch to use dedicated memory context for >> > ProcessLogQueryPlanInterrupt(). >> > >> > Applying attached patch and v28-0002 patch, `make check` successfully >> > completed after 20min and 50GB of logs on my environment. >> > >> > >>> On 2023-06-15 01:48, James Coleman wrote: >> > >>> > The tests have been running since last night, but have been apparently >> > >>> > hung now for many hours. >> > >> > I don't know if this has anything to do with the hung you faced, but I >> > thought >> > it might be possible that the large amount of memory usage resulted in >> > swapping, which caused a significant delay in processing. >> >> Ah, yes, I think that could be a possible explanation. I was delaying >> on this thread because I wasn't comfortable with having caused an >> issue once (even if I couldn't easily reproduce) without at least some >> theory as to the cause (and a fix). >> >> > If possible, I would be very grateful if you could try to reproduce this >> > with >> > the v29 patch. >> >> I'll kick off some testing. 
>> > > I don't have time to investigate what's happening here, but 24 hours > later the first "make check" is still running, and at first glance it > seems to have the same behavior I'd seen that first time. The test > output is to this point: > > # parallel group (5 tests): index_including create_view > index_including_gist create_index create_index_spgist > ok 66+ create_index26365 ms > ok 67+ create_index_spgist 27675 ms > ok 68+ create_view 1235 ms > ok 69+ index_including 1102 ms > ok 70+ index_including_gist 1633 ms > # parallel group (16 tests): create_aggregate create_cast errors > roleattributes drop_if_exists hash_func typed_table create_am > infinite_recurse > > and it hasn't progressed past that point since at least ~16 hours ago > (the first several hours of the run I wasn't monitoring it). > > I haven't connected up gdb yet, and won't be able to until maybe > tomorrow, but here's the ps output for postgres processes that are > running: > > admin3213249 0.0 0.0 196824 20552 ?Ss Aug25 0:00 > /home/admin/postgresql-test/bin/postgres -D > /home/admin/postgresql-test-data > admin3213250 0.0 0.0 196964 7284 ?Ss Aug25 0:00 > postgres: checkpointer > admin3213251 0.0 0.0 196956 4276 ?Ss Aug25 0:00 > postgres: background writer > admin3213253 0.0 0.0 196956 8600 ?Ss Aug25 0:00 > postgres: walwriter > admin3213254 0.0 0.0 198424 7316 ?Ss Aug25 0:00 > postgres: autovacuum launcher > admin3213255 0.0 0.0 198412 5488 ?Ss Aug25 0:00 > postgres: logical replication launcher > admin3237967 0.0 0.0 2484 516 pts/4S+ Aug25 0:00 > /bin/sh -c echo "# +++ regress check in src/test/regress +++" && > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check > --inputdir=. --bindir= --dlpath=. 
--max-concurrent-tests=20 > --schedule=./parallel_schedule > admin3237973 0.0 0.0 197880 20688 pts/4S+ Aug25 0:00 > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c > listen_addresses= -k /tmp/pg_regress-7mmGUa > admin3237976 0.0 0.1 198332 44608 ?Ss Aug25 0:00 > postgres: checkpointer > admin3237977 0.0 0.0 198032 4640 ?Ss Aug25 0:00 > postgres: background writer > admin3237979 0.0 0.0 197880 8580 ?Ss Aug25 0:00 > postgres: walwriter > admin3237980 0.0 0.0 199484 7608 ?Ss Aug25 0:00 > postgres: autovacuum launcher > admin3237981 0.0 0.0 199460 5488 ?Ss Aug25 0:00 > postgres: logical replication launcher > admin3243644 0.0 0.2 252400 74656 ?Ss Aug25 0:01 > postgres:
Re: RFC: Logging plan of the running query
On Mon, Aug 28, 2023 at 3:01 AM torikoshia wrote: > > On 2023-08-26 21:03, James Coleman wrote: > > On Fri, Aug 25, 2023 at 7:43 AM James Coleman wrote: > >> > >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia > >> wrote: > >> > > >> > On 2023-06-16 01:34, James Coleman wrote: > >> > > Attached is v28 > >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > >> > > when necessary. Once built with those two patches I'm simply running > >> > > `make check`. > >> > > >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume > >> > huge > >> > amount of memory and under some environments they were terminated by OOM > >> > killer. > >> > > >> > This was because memory was allocated from existing memory contexts and > >> > they > >> > were not freed after ProcessLogQueryPlanInterrupt(). > >> > Updated the patch to use dedicated memory context for > >> > ProcessLogQueryPlanInterrupt(). > >> > > >> > Applying attached patch and v28-0002 patch, `make check` successfully > >> > completed after 20min and 50GB of logs on my environment. > >> > > >> > >>> On 2023-06-15 01:48, James Coleman wrote: > >> > >>> > The tests have been running since last night, but have been > >> > >>> > apparently > >> > >>> > hung now for many hours. > >> > > >> > I don't know if this has anything to do with the hung you faced, but I > >> > thought > >> > it might be possible that the large amount of memory usage resulted in > >> > swapping, which caused a significant delay in processing. > >> > >> Ah, yes, I think that could be a possible explanation. I was delaying > >> on this thread because I wasn't comfortable with having caused an > >> issue once (even if I couldn't easily reproduce) without at least some > >> theory as to the cause (and a fix). > >> > >> > If possible, I would be very grateful if you could try to reproduce this > >> > with > >> > the v29 patch. > >> > >> I'll kick off some testing. 
> >> > > > I don't have time to investigate what's happening here, but 24 hours > > later the first "make check" is still running, and at first glance it > > seems to have the same behavior I'd seen that first time. The test > > output is to this point: > > > > # parallel group (5 tests): index_including create_view > > index_including_gist create_index create_index_spgist > > ok 66 + create_index 26365 ms > > ok 67 + create_index_spgist 27675 ms > > ok 68 + create_view 1235 ms > > ok 69 + index_including 1102 ms > > ok 70 + index_including_gist 1633 ms > > # parallel group (16 tests): create_aggregate create_cast errors > > roleattributes drop_if_exists hash_func typed_table create_am > > infinite_recurse > > > > and it hasn't progressed past that point since at least ~16 hours ago > > (the first several hours of the run I wasn't monitoring it). > > > > I haven't connected up gdb yet, and won't be able to until maybe > > tomorrow, but here's the ps output for postgres processes that are > > running: > > > > admin 3213249 0.0 0.0 196824 20552 ? Ss Aug25 0:00 > > /home/admin/postgresql-test/bin/postgres -D > > /home/admin/postgresql-test-data > > admin 3213250 0.0 0.0 196964 7284 ? Ss Aug25 0:00 > > postgres: checkpointer > > admin 3213251 0.0 0.0 196956 4276 ? Ss Aug25 0:00 > > postgres: background writer > > admin 3213253 0.0 0.0 196956 8600 ? Ss Aug25 0:00 > > postgres: walwriter > > admin 3213254 0.0 0.0 198424 7316 ? Ss Aug25 0:00 > > postgres: autovacuum launcher > > admin 3213255 0.0 0.0 198412 5488 ? Ss Aug25 0:00 > > postgres: logical replication launcher > > admin 3237967 0.0 0.0 2484 516 pts/4 S+ Aug25 0:00 > > /bin/sh -c echo "# +++ regress check in src/test/regress +++" && > > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" > > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" > > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template > > 
../../../src/test/regress/pg_regress --temp-instance=./tmp_check > > --inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20 > > --schedule=./parallel_schedule > > admin 3237973 0.0 0.0 197880 20688 pts/4 S+ Aug25 0:00 > > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c > > listen_addresses= -k /tmp/pg_regress-7mmGUa > > admin 3237976 0.0 0.1 198332 44608 ? Ss Aug25 0:00 > > postgres: checkpointer > > admin 3237977 0.0 0.0 198032 4640 ? Ss Aug25 0:00 > > postgres: background writer > > admin 3237979 0.0 0.0 197880 8580 ? Ss Aug25 0:00 > > postgres: walwriter > > admin 3237980 0.0 0.0 199484 7608 ? Ss Aug25 0:00 > > postgres: autovacuum launcher > > admin
Re: RFC: Logging plan of the running query
On 2023-08-26 21:03, James Coleman wrote: On Fri, Aug 25, 2023 at 7:43 AM James Coleman wrote: On Thu, Aug 17, 2023 at 10:02 AM torikoshia wrote: > > On 2023-06-16 01:34, James Coleman wrote: > > Attached is v28 > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > > when necessary. Once built with those two patches I'm simply running > > `make check`. > > With v28-0001 and v28-0002 patch, I confirmed backend processes consume > huge > amount of memory and under some environments they were terminated by OOM > killer. > > This was because memory was allocated from existing memory contexts and > they > were not freed after ProcessLogQueryPlanInterrupt(). > Updated the patch to use dedicated memory context for > ProcessLogQueryPlanInterrupt(). > > Applying attached patch and v28-0002 patch, `make check` successfully > completed after 20min and 50GB of logs on my environment. > > >>> On 2023-06-15 01:48, James Coleman wrote: > >>> > The tests have been running since last night, but have been apparently > >>> > hung now for many hours. > > I don't know if this has anything to do with the hung you faced, but I > thought > it might be possible that the large amount of memory usage resulted in > swapping, which caused a significant delay in processing. Ah, yes, I think that could be a possible explanation. I was delaying on this thread because I wasn't comfortable with having caused an issue once (even if I couldn't easily reproduce) without at least some theory as to the cause (and a fix). > If possible, I would be very grateful if you could try to reproduce this > with > the v29 patch. I'll kick off some testing. I don't have time to investigate what's happening here, but 24 hours later the first "make check" is still running, and at first glance it seems to have the same behavior I'd seen that first time. 
The test output is to this point:

# parallel group (5 tests): index_including create_view index_including_gist create_index create_index_spgist
ok 66 + create_index 26365 ms
ok 67 + create_index_spgist 27675 ms
ok 68 + create_view 1235 ms
ok 69 + index_including 1102 ms
ok 70 + index_including_gist 1633 ms
# parallel group (16 tests): create_aggregate create_cast errors roleattributes drop_if_exists hash_func typed_table create_am infinite_recurse

and it hasn't progressed past that point since at least ~16 hours ago (the first several hours of the run I wasn't monitoring it). I haven't connected up gdb yet, and won't be able to until maybe tomorrow, but here's the ps output for postgres processes that are running:

admin 3213249 0.0 0.0 196824 20552 ? Ss Aug25 0:00 /home/admin/postgresql-test/bin/postgres -D /home/admin/postgresql-test-data
admin 3213250 0.0 0.0 196964 7284 ? Ss Aug25 0:00 postgres: checkpointer
admin 3213251 0.0 0.0 196956 4276 ? Ss Aug25 0:00 postgres: background writer
admin 3213253 0.0 0.0 196956 8600 ? Ss Aug25 0:00 postgres: walwriter
admin 3213254 0.0 0.0 198424 7316 ? Ss Aug25 0:00 postgres: autovacuum launcher
admin 3213255 0.0 0.0 198412 5488 ? Ss Aug25 0:00 postgres: logical replication launcher
admin 3237967 0.0 0.0 2484 516 pts/4 S+ Aug25 0:00 /bin/sh -c echo "# +++ regress check in src/test/regress +++" && PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template ../../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. --bindir= --dlpath=. 
--max-concurrent-tests=20 --schedule=./parallel_schedule
admin 3237973 0.0 0.0 197880 20688 pts/4 S+ Aug25 0:00 postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-7mmGUa
admin 3237976 0.0 0.1 198332 44608 ? Ss Aug25 0:00 postgres: checkpointer
admin 3237977 0.0 0.0 198032 4640 ? Ss Aug25 0:00 postgres: background writer
admin 3237979 0.0 0.0 197880 8580 ? Ss Aug25 0:00 postgres: walwriter
admin 3237980 0.0 0.0 199484 7608 ? Ss Aug25 0:00 postgres: autovacuum launcher
admin 3237981 0.0 0.0 199460 5488 ? Ss Aug25 0:00 postgres: logical replication launcher
admin 3243644 0.0 0.2 252400 74656 ? Ss Aug25 0:01 postgres: admin regression [local] SELECT waiting
admin 3243645 0.0 0.1 205480 33992 ? Ss Aug25 0:00 postgres: admin regression [local] SELECT waiting
admin 3243654 99.9 0.0 203156 31504 ? Rs Aug25 1437:49 postgres: admin regression [local] VACUUM
admin 3243655 0.0 0.1 212036 38504 ? Ss Aug25 0:00
Re: RFC: Logging plan of the running query
On Fri, Aug 25, 2023 at 7:43 AM James Coleman wrote: > > On Thu, Aug 17, 2023 at 10:02 AM torikoshia > wrote: > > > > On 2023-06-16 01:34, James Coleman wrote: > > > Attached is v28 > > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > > > when necessary. Once built with those two patches I'm simply running > > > `make check`. > > > > With v28-0001 and v28-0002 patch, I confirmed backend processes consume > > huge > > amount of memory and under some environments they were terminated by OOM > > killer. > > > > This was because memory was allocated from existing memory contexts and > > they > > were not freed after ProcessLogQueryPlanInterrupt(). > > Updated the patch to use dedicated memory context for > > ProcessLogQueryPlanInterrupt(). > > > > Applying attached patch and v28-0002 patch, `make check` successfully > > completed after 20min and 50GB of logs on my environment. > > > > >>> On 2023-06-15 01:48, James Coleman wrote: > > >>> > The tests have been running since last night, but have been apparently > > >>> > hung now for many hours. > > > > I don't know if this has anything to do with the hung you faced, but I > > thought > > it might be possible that the large amount of memory usage resulted in > > swapping, which caused a significant delay in processing. > > Ah, yes, I think that could be a possible explanation. I was delaying > on this thread because I wasn't comfortable with having caused an > issue once (even if I couldn't easily reproduce) without at least some > theory as to the cause (and a fix). > > > If possible, I would be very grateful if you could try to reproduce this > > with > > the v29 patch. > > I'll kick off some testing. > I don't have time to investigate what's happening here, but 24 hours later the first "make check" is still running, and at first glance it seems to have the same behavior I'd seen that first time. 
The test output is to this point:

# parallel group (5 tests): index_including create_view index_including_gist create_index create_index_spgist
ok 66 + create_index 26365 ms
ok 67 + create_index_spgist 27675 ms
ok 68 + create_view 1235 ms
ok 69 + index_including 1102 ms
ok 70 + index_including_gist 1633 ms
# parallel group (16 tests): create_aggregate create_cast errors roleattributes drop_if_exists hash_func typed_table create_am infinite_recurse

and it hasn't progressed past that point since at least ~16 hours ago (the first several hours of the run I wasn't monitoring it). I haven't connected up gdb yet, and won't be able to until maybe tomorrow, but here's the ps output for postgres processes that are running:

admin 3213249 0.0 0.0 196824 20552 ? Ss Aug25 0:00 /home/admin/postgresql-test/bin/postgres -D /home/admin/postgresql-test-data
admin 3213250 0.0 0.0 196964 7284 ? Ss Aug25 0:00 postgres: checkpointer
admin 3213251 0.0 0.0 196956 4276 ? Ss Aug25 0:00 postgres: background writer
admin 3213253 0.0 0.0 196956 8600 ? Ss Aug25 0:00 postgres: walwriter
admin 3213254 0.0 0.0 198424 7316 ? Ss Aug25 0:00 postgres: autovacuum launcher
admin 3213255 0.0 0.0 198412 5488 ? Ss Aug25 0:00 postgres: logical replication launcher
admin 3237967 0.0 0.0 2484 516 pts/4 S+ Aug25 0:00 /bin/sh -c echo "# +++ regress check in src/test/regress +++" && PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template ../../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. --bindir= --dlpath=. 
--max-concurrent-tests=20 --schedule=./parallel_schedule
admin 3237973 0.0 0.0 197880 20688 pts/4 S+ Aug25 0:00 postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-7mmGUa
admin 3237976 0.0 0.1 198332 44608 ? Ss Aug25 0:00 postgres: checkpointer
admin 3237977 0.0 0.0 198032 4640 ? Ss Aug25 0:00 postgres: background writer
admin 3237979 0.0 0.0 197880 8580 ? Ss Aug25 0:00 postgres: walwriter
admin 3237980 0.0 0.0 199484 7608 ? Ss Aug25 0:00 postgres: autovacuum launcher
admin 3237981 0.0 0.0 199460 5488 ? Ss Aug25 0:00 postgres: logical replication launcher
admin 3243644 0.0 0.2 252400 74656 ? Ss Aug25 0:01 postgres: admin regression [local] SELECT waiting
admin 3243645 0.0 0.1 205480 33992 ? Ss Aug25 0:00 postgres: admin regression [local] SELECT waiting
admin 3243654 99.9 0.0 203156 31504 ? Rs Aug25 1437:49 postgres: admin regression [local] VACUUM
admin 3243655 0.0 0.1 212036
Re: RFC: Logging plan of the running query
On Thu, Aug 17, 2023 at 10:02 AM torikoshia wrote: > > On 2023-06-16 01:34, James Coleman wrote: > > Attached is v28 > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > > when necessary. Once built with those two patches I'm simply running > > `make check`. > > With v28-0001 and v28-0002 patch, I confirmed backend processes consume > huge > amount of memory and under some environments they were terminated by OOM > killer. > > This was because memory was allocated from existing memory contexts and > they > were not freed after ProcessLogQueryPlanInterrupt(). > Updated the patch to use dedicated memory context for > ProcessLogQueryPlanInterrupt(). > > Applying attached patch and v28-0002 patch, `make check` successfully > completed after 20min and 50GB of logs on my environment. > > >>> On 2023-06-15 01:48, James Coleman wrote: > >>> > The tests have been running since last night, but have been apparently > >>> > hung now for many hours. > > I don't know if this has anything to do with the hung you faced, but I > thought > it might be possible that the large amount of memory usage resulted in > swapping, which caused a significant delay in processing. Ah, yes, I think that could be a possible explanation. I was delaying on this thread because I wasn't comfortable with having caused an issue once (even if I couldn't easily reproduce) without at least some theory as to the cause (and a fix). > If possible, I would be very grateful if you could try to reproduce this > with > the v29 patch. I'll kick off some testing. Thanks, James Coleman
Re: RFC: Logging plan of the running query
On 2023-06-16 01:34, James Coleman wrote: Attached is v28 which sets ProcessLogQueryPlanInterruptActive to false in errfinish when necessary. Once built with those two patches I'm simply running `make check`. With the v28-0001 and v28-0002 patches, I confirmed backend processes consume a huge amount of memory, and under some environments they were terminated by the OOM killer. This was because memory was allocated from existing memory contexts and was not freed after ProcessLogQueryPlanInterrupt(). Updated the patch to use a dedicated memory context for ProcessLogQueryPlanInterrupt(). Applying the attached patch and the v28-0002 patch, `make check` successfully completed after 20 minutes and 50 GB of logs on my environment. On 2023-06-15 01:48, James Coleman wrote: > The tests have been running since last night, but have been apparently > hung now for many hours. I don't know if this has anything to do with the hang you faced, but I thought it might be possible that the large amount of memory usage resulted in swapping, which caused a significant delay in processing. If possible, I would be very grateful if you could try to reproduce this with the v29 patch. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION

From edadc6a9690a246209ccfbfec28af82a05f49a35 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Thu, 17 Aug 2023 22:11:31 +0900
Subject: [PATCH v29] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds a pg_log_query_plan() function that requests logging of the plan of the specified backend process. By default, only superusers are allowed to request that plans be logged, because allowing any user to issue this request at an unbounded rate would cause lots of log messages, which can lead to denial of service. 
On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client. Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat, Alena Rybakina Co-authored-by: James Coleman --- doc/src/sgml/func.sgml | 49 ++ src/backend/access/transam/xact.c| 13 ++ src/backend/catalog/system_functions.sql | 2 + src/backend/commands/explain.c | 164 ++- src/backend/executor/execMain.c | 14 ++ src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/lmgr/lock.c | 9 +- src/backend/tcop/postgres.c | 4 + src/backend/utils/error/elog.c | 2 + src/backend/utils/init/globals.c | 2 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/explain.h | 4 + src/include/miscadmin.h | 1 + src/include/storage/lock.h | 2 - src/include/storage/procsignal.h | 1 + src/include/tcop/pquery.h| 2 +- src/include/utils/elog.h | 1 + src/test/regress/expected/misc_functions.out | 54 -- src/test/regress/sql/misc_functions.sql | 41 +++-- 19 files changed, 347 insertions(+), 28 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index be2f54c914..d2ead014d2 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -26243,6 +26243,25 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_query_plan + +pg_log_query_plan ( pid integer ) +boolean + + +Requests to log the plan of the query currently running on the +backend with specified process ID. +It will be logged at LOG message level and +will appear in the server log based on the log +configuration set (See +for more information), but will not be sent to the client +regardless of . + + + @@ -26463,6 +26482,36 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 because it may generate a large number of log messages. 
+ +pg_log_query_plan can be used +to log the plan of a backend process. For example: + +postgres=# SELECT pg_log_query_plan(201116); + pg_log_query_plan +--- + t +(1 row) + +The format of the query plan is the same as when VERBOSE, +COSTS, SETTINGS and +FORMAT TEXT are used in the EXPLAIN +command. For example: + +LOG: query plan running on backend with PID 201116 is: +Query Text: SELECT * FROM pgbench_accounts; +Seq Scan on public.pgbench_accounts (cost=0.00..52787.00 rows=200
Re: RFC: Logging plan of the running query
On 2023-06-16 01:34, James Coleman wrote: On Thu, Jun 15, 2023 at 9:00 AM torikoshia wrote: On 2023-06-15 01:48, James Coleman wrote: > On Tue, Jun 13, 2023 at 11:53 AM James Coleman > wrote: >> >> ... >> I'm going to re-run tests with my patch version + resetting the flag >> on SIGINT (and any other error condition) to be certain that the issue >> you uncovered (where backends get stuck after a SIGINT not responding >> to the requested plan logging) wasn't masking any other issues. >> >> As long as that run is clean also then I believe the patch is safe >> as-is even without the re-entrancy guard. >> >> I'll report back with the results of that testing. > > The tests have been running since last night, but have been apparently > hung now for many hours. I haven't been able to fully look into it, > but so far I know the hung (100% CPU) backend last logged this: > > 2023-06-14 02:00:30.045 UTC client backend[84461] > pg_regress/updatable_views LOG: query plan running on backend with > PID 84461 is: > Query Text: SELECT table_name, column_name, is_updatable > FROM information_schema.columns > WHERE table_name LIKE E'r_\\_view%' > ORDER BY table_name, ordinal_position; > > The last output from the regression test harness was: > > # parallel group (5 tests): index_including create_view > index_including_gist create_index create_index_spgist > ok 66+ create_index36508 ms > ok 67+ create_index_spgist 38588 ms > ok 68+ create_view 1394 ms > ok 69+ index_including 654 ms > ok 70+ index_including_gist 1701 ms > # parallel group (16 tests): errors create_cast drop_if_exists > create_aggregate roleattributes constraints hash_func typed_table > infinite_recurse > > Attaching gdb to the hung backend shows this: > > #0 0x5601ab1f9529 in ProcLockWakeup > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > , lock=lock@entry=0x7f5325c913f0) at proc.c:1655 > #1 0x5601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0, > proclock=proclock@entry=0x7f5325d40d60, > 
lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > , > hashcode=hashcode@entry=573498161, wakeupNeeded=) > at lock.c:1673 > #2 0x5601ab1e9e21 in LockRefindAndRelease > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > , proc=, > locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1, > > decrement_strong_lock_count=decrement_strong_lock_count@entry=false) > at lock.c:3150 > #3 0x5601ab1edb27 in LockReleaseAll > (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295 > #4 0x5601ab1f8599 in ProcReleaseLocks > (isCommit=isCommit@entry=true) at proc.c:781 > #5 0x5601ab37f1f4 in ResourceOwnerReleaseInternal > (owner=, phase=phase@entry=RESOURCE_RELEASE_LOCKS, > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at > resowner.c:618 > #6 0x5601ab37f7b7 in ResourceOwnerRelease (owner=, > phase=phase@entry=RESOURCE_RELEASE_LOCKS, > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at > resowner.c:494 > #7 0x5601aaec1d84 in CommitTransaction () at xact.c:2334 > #8 0x5601aaec2b22 in CommitTransactionCommand () at xact.c:3067 > #9 0x5601ab200a66 in finish_xact_command () at postgres.c:2783 > #10 0x5601ab20338f in exec_simple_query ( > query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name, > column_name, is_updatable\n FROM information_schema.columns\n WHERE > table_name LIKE E'r__view%'\n ORDER BY table_name, > ordinal_position;") at postgres.c:1300 > > I am unable to connect to the regression test Postgres instance -- > psql just hangs, so the lock seems to have affected the postmaster > also. > > I'm wondering if this might represent a bug in the current patch. Thanks for running and analyzing the test! Sure thing! Could you share me how you are running the test? I imagined something like below, but currently couldn't reproduce it. 
- apply both v26-0001 and v27-0002 and build
- run PostgreSQL with default GUCs
- make installcheck-world
- run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1' during make installcheck-world

Apologies, I should have attached my updated patch (with the fix for the bug you'd reported with the re-entrancy guard). Attached is v28 which sets ProcessLogQueryPlanInterruptActive to false in errfinish when necessary. Once built with those two patches I'm simply running `make check`. Thanks! However, I haven't succeeded in reproducing the problem as below: The tests have been running since last night, but have been apparently hung now for many hours. I haven't been able to fully look into it, but so far I know the hung (100% CPU) backend last logged this: Did you do something like this? $ ./configure --prefix=/home/ubuntu/pgsql/master --enable-cassert $ git apply
Re: RFC: Logging plan of the running query
On Thu, Jun 15, 2023 at 9:00 AM torikoshia wrote: > > On 2023-06-15 01:48, James Coleman wrote: > > On Tue, Jun 13, 2023 at 11:53 AM James Coleman > > wrote: > >> > >> ... > >> I'm going to re-run tests with my patch version + resetting the flag > >> on SIGINT (and any other error condition) to be certain that the issue > >> you uncovered (where backends get stuck after a SIGINT not responding > >> to the requested plan logging) wasn't masking any other issues. > >> > >> As long as that run is clean also then I believe the patch is safe > >> as-is even without the re-entrancy guard. > >> > >> I'll report back with the results of that testing. > > > > The tests have been running since last night, but have been apparently > > hung now for many hours. I haven't been able to fully look into it, > > but so far I know the hung (100% CPU) backend last logged this: > > > > 2023-06-14 02:00:30.045 UTC client backend[84461] > > pg_regress/updatable_views LOG: query plan running on backend with > > PID 84461 is: > > Query Text: SELECT table_name, column_name, is_updatable > > FROM information_schema.columns > > WHERE table_name LIKE E'r_\\_view%' > > ORDER BY table_name, ordinal_position; > > > > The last output from the regression test harness was: > > > > # parallel group (5 tests): index_including create_view > > index_including_gist create_index create_index_spgist > > ok 66+ create_index36508 ms > > ok 67+ create_index_spgist 38588 ms > > ok 68+ create_view 1394 ms > > ok 69+ index_including 654 ms > > ok 70+ index_including_gist 1701 ms > > # parallel group (16 tests): errors create_cast drop_if_exists > > create_aggregate roleattributes constraints hash_func typed_table > > infinite_recurse > > > > Attaching gdb to the hung backend shows this: > > > > #0 0x5601ab1f9529 in ProcLockWakeup > > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > > , lock=lock@entry=0x7f5325c913f0) at proc.c:1655 > > #1 0x5601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0, > > 
proclock=proclock@entry=0x7f5325d40d60, > > lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > > , > > hashcode=hashcode@entry=573498161, wakeupNeeded=) > > at lock.c:1673 > > #2 0x5601ab1e9e21 in LockRefindAndRelease > > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > > , proc=, > > locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1, > > > > decrement_strong_lock_count=decrement_strong_lock_count@entry=false) > > at lock.c:3150 > > #3 0x5601ab1edb27 in LockReleaseAll > > (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295 > > #4 0x5601ab1f8599 in ProcReleaseLocks > > (isCommit=isCommit@entry=true) at proc.c:781 > > #5 0x5601ab37f1f4 in ResourceOwnerReleaseInternal > > (owner=, phase=phase@entry=RESOURCE_RELEASE_LOCKS, > > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at > > resowner.c:618 > > #6 0x5601ab37f7b7 in ResourceOwnerRelease (owner=, > > phase=phase@entry=RESOURCE_RELEASE_LOCKS, > > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at > > resowner.c:494 > > #7 0x5601aaec1d84 in CommitTransaction () at xact.c:2334 > > #8 0x5601aaec2b22 in CommitTransactionCommand () at xact.c:3067 > > #9 0x5601ab200a66 in finish_xact_command () at postgres.c:2783 > > #10 0x5601ab20338f in exec_simple_query ( > > query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name, > > column_name, is_updatable\n FROM information_schema.columns\n WHERE > > table_name LIKE E'r__view%'\n ORDER BY table_name, > > ordinal_position;") at postgres.c:1300 > > > > I am unable to connect to the regression test Postgres instance -- > > psql just hangs, so the lock seems to have affected the postmaster > > also. > > > > I'm wondering if this might represent a bug in the current patch. > > Thanks for running and analyzing the test! Sure thing! > Could you share me how you are running the test? > > I imagined something like below, but currently couldn't reproduce it. 
> - apply both v26-0001 and v27-0002 and build > - run PostgreSQL with default GUCs > - make installcheck-world > - run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1' > during make installcheck-world Apologies, I should have attached my updated patch (with the fix for the bug you'd reported with the re-entrancy guard). Attached is v28 which sets ProcessLogQueryPlanInterruptActive to false in errfinish when necessary. Once built with those two patches I'm simply running `make check`. Regards, James Coleman v28-0001-Add-function-to-log-the-plan-of-the-query.patch Description: Binary data v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch Description: Binary data
Re: RFC: Logging plan of the running query
On 2023-06-15 01:48, James Coleman wrote: On Tue, Jun 13, 2023 at 11:53 AM James Coleman wrote: ... I'm going to re-run tests with my patch version + resetting the flag on SIGINT (and any other error condition) to be certain that the issue you uncovered (where backends get stuck after a SIGINT not responding to the requested plan logging) wasn't masking any other issues. As long as that run is clean also then I believe the patch is safe as-is even without the re-entrancy guard. I'll report back with the results of that testing. The tests have been running since last night, but have been apparently hung now for many hours. I haven't been able to fully look into it, but so far I know the hung (100% CPU) backend last logged this: 2023-06-14 02:00:30.045 UTC client backend[84461] pg_regress/updatable_views LOG: query plan running on backend with PID 84461 is: Query Text: SELECT table_name, column_name, is_updatable FROM information_schema.columns WHERE table_name LIKE E'r_\\_view%' ORDER BY table_name, ordinal_position; The last output from the regression test harness was: # parallel group (5 tests): index_including create_view index_including_gist create_index create_index_spgist ok 66+ create_index36508 ms ok 67+ create_index_spgist 38588 ms ok 68+ create_view 1394 ms ok 69+ index_including 654 ms ok 70+ index_including_gist 1701 ms # parallel group (16 tests): errors create_cast drop_if_exists create_aggregate roleattributes constraints hash_func typed_table infinite_recurse Attaching gdb to the hung backend shows this: #0 0x5601ab1f9529 in ProcLockWakeup (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 , lock=lock@entry=0x7f5325c913f0) at proc.c:1655 #1 0x5601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0, proclock=proclock@entry=0x7f5325d40d60, lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 , hashcode=hashcode@entry=573498161, wakeupNeeded=) at lock.c:1673 #2 0x5601ab1e9e21 in LockRefindAndRelease 
(lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 , proc=, locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1, decrement_strong_lock_count=decrement_strong_lock_count@entry=false) at lock.c:3150 #3 0x5601ab1edb27 in LockReleaseAll (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295 #4 0x5601ab1f8599 in ProcReleaseLocks (isCommit=isCommit@entry=true) at proc.c:781 #5 0x5601ab37f1f4 in ResourceOwnerReleaseInternal (owner=, phase=phase@entry=RESOURCE_RELEASE_LOCKS, isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at resowner.c:618 #6 0x5601ab37f7b7 in ResourceOwnerRelease (owner=, phase=phase@entry=RESOURCE_RELEASE_LOCKS, isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at resowner.c:494 #7 0x5601aaec1d84 in CommitTransaction () at xact.c:2334 #8 0x5601aaec2b22 in CommitTransactionCommand () at xact.c:3067 #9 0x5601ab200a66 in finish_xact_command () at postgres.c:2783 #10 0x5601ab20338f in exec_simple_query ( query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name, column_name, is_updatable\n FROM information_schema.columns\n WHERE table_name LIKE E'r__view%'\n ORDER BY table_name, ordinal_position;") at postgres.c:1300 I am unable to connect to the regression test Postgres instance -- psql just hangs, so the lock seems to have affected the postmaster also. I'm wondering if this might represent a bug in the current patch. Thanks for running and analyzing the test! Could you share how you are running the test? I imagined something like below, but currently couldn't reproduce it.

- apply both v26-0001 and v27-0002 and build
- run PostgreSQL with default GUCs
- make installcheck-world
- run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1' during make installcheck-world

-- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: RFC: Logging plan of the running query
On Tue, Jun 13, 2023 at 11:53 AM James Coleman wrote:
>
> ...
> I'm going to re-run tests with my patch version + resetting the flag
> on SIGINT (and any other error condition) to be certain that the issue
> you uncovered (where backends get stuck after a SIGINT not responding
> to the requested plan logging) wasn't masking any other issues.
>
> As long as that run is clean also then I believe the patch is safe
> as-is even without the re-entrancy guard.
>
> I'll report back with the results of that testing.

The tests have been running since last night, but have apparently been hung now for many hours. I haven't been able to fully look into it, but so far I know the hung (100% CPU) backend last logged this:

```
2023-06-14 02:00:30.045 UTC client backend[84461] pg_regress/updatable_views LOG:  query plan running on backend with PID 84461 is:
        Query Text: SELECT table_name, column_name, is_updatable
          FROM information_schema.columns
          WHERE table_name LIKE E'r_\\_view%'
          ORDER BY table_name, ordinal_position;
```

The last output from the regression test harness was:

```
# parallel group (5 tests):  index_including create_view index_including_gist create_index create_index_spgist
ok 66        + create_index                36508 ms
ok 67        + create_index_spgist         38588 ms
ok 68        + create_view                  1394 ms
ok 69        + index_including               654 ms
ok 70        + index_including_gist         1701 ms
# parallel group (16 tests):  errors create_cast drop_if_exists create_aggregate roleattributes constraints hash_func typed_table infinite_recurse
```

Attaching gdb to the hung backend shows this:

```
#0  0x5601ab1f9529 in ProcLockWakeup (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0, lock=lock@entry=0x7f5325c913f0) at proc.c:1655
#1  0x5601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0, proclock=proclock@entry=0x7f5325d40d60, lockMethodTable=lockMethodTable@entry=0x5601ab6484e0, hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>) at lock.c:1673
#2  0x5601ab1e9e21 in LockRefindAndRelease (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0, proc=<optimized out>, locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1, decrement_strong_lock_count=decrement_strong_lock_count@entry=false) at lock.c:3150
#3  0x5601ab1edb27 in LockReleaseAll (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295
#4  0x5601ab1f8599 in ProcReleaseLocks (isCommit=isCommit@entry=true) at proc.c:781
#5  0x5601ab37f1f4 in ResourceOwnerReleaseInternal (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS, isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at resowner.c:618
#6  0x5601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS, isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at resowner.c:494
#7  0x5601aaec1d84 in CommitTransaction () at xact.c:2334
#8  0x5601aaec2b22 in CommitTransactionCommand () at xact.c:3067
#9  0x5601ab200a66 in finish_xact_command () at postgres.c:2783
#10 0x5601ab20338f in exec_simple_query (query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name, column_name, is_updatable\n  FROM information_schema.columns\n  WHERE table_name LIKE E'r__view%'\n  ORDER BY table_name, ordinal_position;") at postgres.c:1300
```

I am unable to connect to the regression test Postgres instance -- psql just hangs, so the lock seems to have affected the postmaster also. I'm wondering if this might represent a bug in the current patch.

Regards,
James Coleman
Re: RFC: Logging plan of the running query
On Tue, Jun 13, 2023 at 11:22 AM torikoshia wrote:
>
> On 2023-06-13 00:52, James Coleman wrote:
> >>
> >> > I've attached v27. The important change here in 0001 is that it
> >> > guarantees the interrupt handler is re-entrant, since that was a bug
> >> > exposed by my testing. I've also included 0002 which is only meant for
> >> > testing (it attempts to log the plan in every
> >> > CHECK_FOR_INTERRUPTS() call).
> >>
> >> When SIGINT is sent during ProcessLogQueryPlanInterrupt(),
> >> ProcessLogQueryPlanInterruptActive can remain true.
> >> After that, pg_log_query_plan() does nothing but just returns.
> >>
> >> AFAIU, v26 logs the plan for each pg_log_query_plan() even when another
> >> pg_log_query_plan() is running, but it doesn't cause errors or critical
> >> problems.
> >>
> >> Considering the problems solved and introduced by v27, I think v26
> >> might be fine.
> >> What do you think?
> >
> > The testing I did with calling this during every CFI is what uncovered
> > the re-entrancy problem. IIRC (without running that test again) the
> > problem was a stack overflow. Now, to be sure, this is a particularly
> > degenerate case, because in real-world usage it'd be impossible in
> > practice, I think, to trigger that many calls to this function (and by
> > extension the interrupt handler).
>
> Yeah. In addition, currently only superusers are allowed to execute
> pg_log_query_plan(), so I think we don't need to think about cases
> where users are malicious.
>
> > If SIGINT is the only concern we could reset
> > ProcessLogQueryPlanInterruptActive in error handling code. I admit
> > that part of my thought process here is thinking ahead to an
> > additional patch I'd like to see on top of this, which is logging a
> > query plan before cleaning up when a statement timeout occurs.
>
> I remember this is what you wanted to do. [1]
>
> > The re-entrancy issue becomes more interesting then, I think, since we
> > would then have automated calling of the logging code.
> > BTW: I'd thought that would make a nice follow-up patch for this, but
> > if you'd prefer I could add it as another patch in the series here.
> >
> > What do you think about resetting the flag versus just not having it?
>
> If I understand you correctly, adding the flag is not necessary for this
> proposal.
> To keep the patch simple, I prefer not having it.

I'm going to re-run tests with my patch version + resetting the flag on SIGINT (and any other error condition) to be certain that the issue you uncovered (where backends get stuck after a SIGINT, not responding to the requested plan logging) wasn't masking any other issues.

As long as that run is clean also, then I believe the patch is safe as-is even without the re-entrancy guard.

I'll report back with the results of that testing.

Regards,
James Coleman
Re: RFC: Logging plan of the running query
On 2023-06-13 00:52, James Coleman wrote:
> I've attached v27. The important change here in 0001 is that it
> guarantees the interrupt handler is re-entrant, since that was a bug
> exposed by my testing. I've also included 0002 which is only meant for
> testing (it attempts to log the plan in every
> CHECK_FOR_INTERRUPTS() call).

When SIGINT is sent during ProcessLogQueryPlanInterrupt(), ProcessLogQueryPlanInterruptActive can remain true. After that, pg_log_query_plan() does nothing but just returns.

AFAIU, v26 logs the plan for each pg_log_query_plan() even when another pg_log_query_plan() is running, but it doesn't cause errors or critical problems.

Considering the problems solved and introduced by v27, I think v26 might be fine. What do you think?

The testing I did with calling this during every CFI is what uncovered the re-entrancy problem. IIRC (without running that test again) the problem was a stack overflow. Now, to be sure, this is a particularly degenerate case, because in real-world usage it'd be impossible in practice, I think, to trigger that many calls to this function (and by extension the interrupt handler).

Yeah. In addition, currently only superusers are allowed to execute pg_log_query_plan(), so I think we don't need to think about cases where users are malicious.

If SIGINT is the only concern we could reset ProcessLogQueryPlanInterruptActive in error handling code. I admit that part of my thought process here is thinking ahead to an additional patch I'd like to see on top of this, which is logging a query plan before cleaning up when a statement timeout occurs.

I remember this is what you wanted to do. [1]

The re-entrancy issue becomes more interesting then, I think, since we would then have automated calling of the logging code. BTW: I'd thought that would make a nice follow-up patch for this, but if you'd prefer I could add it as another patch in the series here.

What do you think about resetting the flag versus just not having it?

If I understand you correctly, adding the flag is not necessary for this proposal. To keep the patch simple, I prefer not having it.

[1] https://www.postgresql.org/message-id/flat/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com#b57432077f8045be8588049269f7a8dd

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Re: RFC: Logging plan of the running query
On Sun, Jun 11, 2023 at 11:07 PM torikoshia wrote:
>
> On 2023-06-06 03:26, James Coleman wrote:
> > On Mon, Jun 5, 2023 at 4:30 AM torikoshia
> > wrote:
> >>
> >> On 2023-06-03 02:51, James Coleman wrote:
> >> > Hello,
> >> >
> >> > Thanks for working on this patch!
> >
> > Sure thing! I'm *very interested* in seeing this available, and I
> > think it paves the way for some additional features later on...
> >
> >> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia
> >> ...
> >> > To put it positively: we believe that, for example, catalog accesses
> >> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
> >> > an existing valid transaction/query state, as it would be for this
> >> > patch -- are safe. If there were problems, then those problems are
> >> > likely bugs we already have in other CFI cases.
> >>
> >> Thanks a lot for the discussion and sharing it!
> >> I really appreciate it.
> >>
> >> BTW I'm not sure whether all the CFI are called in a valid transaction;
> >> do you think we should check each of them?
> >
> > I kicked off the regression tests with a call to
> > ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS()
> > call. Several hours and 52 GB of logs later I have confirmed that
> > (with the attached revision) at the very least the regression test
> > suite can't trigger any kind of failures regardless of when we trigger
> > this. The existing code in the patch handles only running the explain
> > when there's an active query.
>
> Thanks for the testing!
>
> > I've attached v27. The important change here in 0001 is that it
> > guarantees the interrupt handler is re-entrant, since that was a bug
> > exposed by my testing. I've also included 0002 which is only meant for
> > testing (it attempts to log the plan in every
> > CHECK_FOR_INTERRUPTS() call).
>
> When SIGINT is sent during ProcessLogQueryPlanInterrupt(),
> ProcessLogQueryPlanInterruptActive can remain true.
> After that, pg_log_query_plan() does nothing but just returns.
>
> AFAIU, v26 logs the plan for each pg_log_query_plan() even when another
> pg_log_query_plan() is running, but it doesn't cause errors or critical
> problems.
>
> Considering the problems solved and introduced by v27, I think v26 might
> be fine.
> What do you think?

The testing I did with calling this during every CFI is what uncovered the re-entrancy problem. IIRC (without running that test again) the problem was a stack overflow. Now, to be sure, this is a particularly degenerate case, because in real-world usage it'd be impossible in practice, I think, to trigger that many calls to this function (and by extension the interrupt handler).

If SIGINT is the only concern we could reset ProcessLogQueryPlanInterruptActive in error handling code. I admit that part of my thought process here is thinking ahead to an additional patch I'd like to see on top of this, which is logging a query plan before cleaning up when a statement timeout occurs. The re-entrancy issue becomes more interesting then, I think, since we would then have automated calling of the logging code. BTW: I'd thought that would make a nice follow-up patch for this, but if you'd prefer I could add it as another patch in the series here.

What do you think about resetting the flag versus just not having it?

Regards,
James Coleman
Re: RFC: Logging plan of the running query
On 2023-06-06 03:26, James Coleman wrote:

On Mon, Jun 5, 2023 at 4:30 AM torikoshia wrote:

On 2023-06-03 02:51, James Coleman wrote:
> Hello,
>
> Thanks for working on this patch!

Sure thing! I'm *very interested* in seeing this available, and I think it paves the way for some additional features later on...

> On Thu, Dec 8, 2022 at 12:10 AM torikoshia
...
> To put it positively: we believe that, for example, catalog accesses
> inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
> an existing valid transaction/query state, as it would be for this
> patch -- are safe. If there were problems, then those problems are
> likely bugs we already have in other CFI cases.

Thanks a lot for the discussion and sharing it! I really appreciate it.

BTW I'm not sure whether all the CFI are called in a valid transaction; do you think we should check each of them?

I kicked off the regression tests with a call to ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS() call. Several hours and 52 GB of logs later I have confirmed that (with the attached revision) at the very least the regression test suite can't trigger any kind of failures regardless of when we trigger this. The existing code in the patch handles only running the explain when there's an active query.

Thanks for the testing!

I've attached v27. The important change here in 0001 is that it guarantees the interrupt handler is re-entrant, since that was a bug exposed by my testing. I've also included 0002 which is only meant for testing (it attempts to log the plan in every CHECK_FOR_INTERRUPTS() call).

When SIGINT is sent during ProcessLogQueryPlanInterrupt(), ProcessLogQueryPlanInterruptActive can remain true. After that, pg_log_query_plan() does nothing but just returns.

AFAIU, v26 logs the plan for each pg_log_query_plan() even when another pg_log_query_plan() is running, but it doesn't cause errors or critical problems.

Considering the problems solved and introduced by v27, I think v26 might be fine. What do you think?

Regards,
James

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Re: RFC: Logging plan of the running query
On Mon, Jun 5, 2023 at 4:30 AM torikoshia wrote:
>
> On 2023-06-03 02:51, James Coleman wrote:
> > Hello,
> >
> > Thanks for working on this patch!

Sure thing! I'm *very interested* in seeing this available, and I think it paves the way for some additional features later on...

> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia
> ...
> > To put it positively: we believe that, for example, catalog accesses
> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
> > an existing valid transaction/query state, as it would be for this
> > patch -- are safe. If there were problems, then those problems are
> > likely bugs we already have in other CFI cases.
>
> Thanks a lot for the discussion and sharing it!
> I really appreciate it.
>
> BTW I'm not sure whether all the CFI are called in a valid transaction;
> do you think we should check each of them?

I kicked off the regression tests with a call to ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS() call. Several hours and 52 GB of logs later I have confirmed that (with the attached revision) at the very least the regression test suite can't trigger any kind of failures regardless of when we trigger this. The existing code in the patch handles only running the explain when there's an active query.

> > Another concern Robert raised may apply here: what if a person tries
> > to cancel a query when we're explaining? I believe that's a reasonable
> > question to ask, but I believe it's a trade-off that's worth making
> > for the (significant) introspection benefits this patch would provide.
>
> Is the concern here limited to the case where the explain code goes
> crazy, as Robert pointed out?
> If so, this may be a trade-off worth making.
> I am a little concerned about whether there will be cases where the
> explain code is not problematic but just takes a long time.

The explain code could take a long time in some rare cases (e.g., we discovered a bug a few years back with the planning code that actually descends an index to find the max value), but I think the trade-off is worth it.

> > On to the patch itself: overall I think it looks like it's in pretty
> > good shape. I also noticed we don't have any tests (I assume it'd have
> > to be TAP tests) of the actual output happening, and I think it would
> > be worth adding that.
>
> Checking the log output may be better, but I didn't add it since there
> is only a little content that can be checked, and the similar function
> pg_log_backend_memory_contexts() does not do that type of test.

Fair enough. I still think that would be an improvement here, but others could also weigh in.

> > Are you interested in re-opening this patch? I'd be happy to provide
> > further review and help to try to push this along.
>
> Sure, I'm going to re-open this.

I've attached v27. The important change here in 0001 is that it guarantees the interrupt handler is re-entrant, since that was a bug exposed by my testing. I've also included 0002 which is only meant for testing (it attempts to log the plan in every CHECK_FOR_INTERRUPTS() call).

Regards,
James

v27-0001-Add-function-to-log-the-plan-of-the-query.patch
Description: Binary data

v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch
Description: Binary data
Re: RFC: Logging plan of the running query
On 2023-06-03 02:51, James Coleman wrote:

Hello,

Thanks for working on this patch!

On Thu, Dec 8, 2022 at 12:10 AM torikoshia wrote:

BTW, since this patch depends on ProcessInterrupts() and the EXPLAIN code which is used in auto_explain, I'm feeling that the following discussion also applies to this patch.

> --
> https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com
>
> explaining a query is a pretty
> complicated operation that involves catalog lookups and lots of
> complicated stuff, and I don't think that it would be safe to do all
> of that at any arbitrary point in the code where ProcessInterrupts()
> happened to fire.

If I can't come up with some workaround during the next Commitfest, I'm going to withdraw this proposal.

While at PGCon this week I'd brought up this idea with a few people without realizing you'd already worked on it previously, and then after I discovered this thread several of us (Greg, Ronan, David, Heikki, and myself -- all cc'd) discussed the safety concerns over dinner last night.

Our conclusion was that all of the concerns we could come up with (for example, walking through the code for ExplainTargetRel() and discussing the relevant catalog and syscache accesses) all applied specifically to Robert's concerns about running explain inside an aborted transaction. After all, I'd originally started that thread to ask about running auto-explain after a query timeout.

To put it positively: we believe that, for example, catalog accesses inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside an existing valid transaction/query state, as it would be for this patch -- are safe. If there were problems, then those problems are likely bugs we already have in other CFI cases.

Thanks a lot for the discussion and sharing it! I really appreciate it.

BTW I'm not sure whether all the CFI are called in a valid transaction; do you think we should check each of them?

Another concern Robert raised may apply here: what if a person tries to cancel a query when we're explaining? I believe that's a reasonable question to ask, but I believe it's a trade-off that's worth making for the (significant) introspection benefits this patch would provide.

Is the concern here limited to the case where the explain code goes crazy, as Robert pointed out? If so, this may be a trade-off worth making. I am a little concerned about whether there will be cases where the explain code is not problematic but just takes a long time.

On to the patch itself: overall I think it looks like it's in pretty good shape. I also noticed we don't have any tests (I assume it'd have to be TAP tests) of the actual output happening, and I think it would be worth adding that.

Checking the log output may be better, but I didn't add it since there is only a little content that can be checked, and the similar function pg_log_backend_memory_contexts() does not do that type of test.

Are you interested in re-opening this patch? I'd be happy to provide further review and help to try to push this along.

Sure, I'm going to re-open this. I've rebased the patch and attached it as v26.

Thanks again for your work!

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Re: RFC: Logging plan of the running query
Hello,

Thanks for working on this patch!

On Thu, Dec 8, 2022 at 12:10 AM torikoshia wrote:
>
> BTW, since this patch depends on ProcessInterrupts() and the EXPLAIN code
> which is used in auto_explain, I'm feeling that the following discussion
> also applies to this patch.
>
> > --
> > https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com
> >
> > explaining a query is a pretty
> > complicated operation that involves catalog lookups and lots of
> > complicated stuff, and I don't think that it would be safe to do all
> > of that at any arbitrary point in the code where ProcessInterrupts()
> > happened to fire.
>
> If I can't come up with some workaround during the next Commitfest, I'm
> going to withdraw this proposal.

While at PGCon this week I'd brought up this idea with a few people without realizing you'd already worked on it previously, and then after I discovered this thread several of us (Greg, Ronan, David, Heikki, and myself -- all cc'd) discussed the safety concerns over dinner last night.

Our conclusion was that all of the concerns we could come up with (for example, walking through the code for ExplainTargetRel() and discussing the relevant catalog and syscache accesses) all applied specifically to Robert's concerns about running explain inside an aborted transaction. After all, I'd originally started that thread to ask about running auto-explain after a query timeout.

To put it positively: we believe that, for example, catalog accesses inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside an existing valid transaction/query state, as it would be for this patch -- are safe. If there were problems, then those problems are likely bugs we already have in other CFI cases.

Another concern Robert raised may apply here: what if a person tries to cancel a query when we're explaining? I believe that's a reasonable question to ask, but I believe it's a trade-off that's worth making for the (significant) introspection benefits this patch would provide.

On to the patch itself: overall I think it looks like it's in pretty good shape. I also noticed we don't have any tests (I assume it'd have to be TAP tests) of the actual output happening, and I think it would be worth adding that.

Are you interested in re-opening this patch? I'd be happy to provide further review and help to try to push this along.

I've rebased the patch and attached it as v26.

Thanks,
James Coleman

v26-0001-Add-function-to-log-the-plan-of-the-query.patch
Description: Binary data
Re: RFC: Logging plan of the running query
On 2022-12-07 03:41, Andres Freund wrote:

Hi,

This patch does not currently build, due to a conflicting oid:

I suggest you choose a random oid out of the "development purposes" range:

Thanks for your advice! Attached updated patch.

BTW, since this patch depends on ProcessInterrupts() and the EXPLAIN code which is used in auto_explain, I'm feeling that the following discussion also applies to this patch.

--
https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com

explaining a query is a pretty complicated operation that involves catalog lookups and lots of complicated stuff, and I don't think that it would be safe to do all of that at any arbitrary point in the code where ProcessInterrupts() happened to fire.

If I can't come up with some workaround during the next Commitfest, I'm going to withdraw this proposal.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From a0d2179826a0fa224eaf37ca00d14954b76fde6b Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Thu, 8 Dec 2022 12:42:22 +0900
Subject: [PATCH v25] Add function to log the plan of the query currently
 running on the backend.

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds a pg_log_query_plan() function that requests to log the plan of the specified backend process.

By default, only superusers are allowed to request to log the plans because allowing any users to issue this request at an unbounded rate would cause lots of log messages, which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client.
Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat, Alena Rybakina
---
 doc/src/sgml/func.sgml                       |  49 +++
 src/backend/access/transam/xact.c            |  13 ++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/commands/explain.c               | 139 ++-
 src/backend/executor/execMain.c              |  14 ++
 src/backend/storage/ipc/procsignal.c         |   4 +
 src/backend/storage/lmgr/lock.c              |   9 +-
 src/backend/tcop/postgres.c                  |   4 +
 src/backend/utils/init/globals.c             |   2 +
 src/include/catalog/pg_proc.dat              |   6 +
 src/include/commands/explain.h               |   3 +
 src/include/miscadmin.h                      |   1 +
 src/include/storage/lock.h                   |   2 -
 src/include/storage/procsignal.h             |   1 +
 src/include/tcop/pquery.h                    |   2 +-
 src/test/regress/expected/misc_functions.out |  54 +--
 src/test/regress/sql/misc_functions.sql      |  41 --
 17 files changed, 318 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e57ffce971..d39a4e50a0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25536,6 +25536,25 @@ SELECT collation for ('foo' COLLATE "de_DE");

pg_log_query_plan

pg_log_query_plan ( pid integer ) boolean

Requests to log the plan of the query currently running on the backend with specified process ID. It will be logged at LOG message level and will appear in the server log based on the log configuration set (See for more information), but will not be sent to the client regardless of .

@@ -25756,6 +25775,36 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560

because it may generate a large number of log messages.

pg_log_query_plan can be used to log the plan of a backend process. For example:

```
postgres=# SELECT pg_log_query_plan(201116);
 pg_log_query_plan
-------------------
 t
(1 row)
```

The format of the query plan is the same as when VERBOSE, COSTS, SETTINGS and FORMAT TEXT are used in the EXPLAIN command. For example:

```
LOG:  query plan running on backend with PID 201116 is:
        Query Text: SELECT * FROM pgbench_accounts;
        Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 width=97)
          Output: aid, bid, abalance, filler
        Settings: work_mem = '1MB'
```

Note that when statements are executed inside a function, only the plan of the most deeply nested query is logged.

When a subtransaction is aborted, pg_log_query_plan cannot log the query plan after the abort.

diff --git
Re: RFC: Logging plan of the running query
Hi,

This patch does not currently build, due to a conflicting oid:
https://cirrus-ci.com/task/4638460594618368?logs=build#L108

```
[17:26:44.602] /usr/bin/perl ../src/include/catalog/../../backend/catalog/genbki.pl --include-path=../src/include --set-version=16 --output=src/include/catalog ../src/include/catalog/pg_proc.h ../src/include/catalog/pg_type.h ../src/include/catalog/pg_attribute.h ../src/include/catalog/pg_class.h ../src/include/catalog/pg_attrdef.h ../src/include/catalog/pg_constraint.h ../src/include/catalog/pg_inherits.h ../src/include/catalog/pg_index.h ../src/include/catalog/pg_operator.h ../src/include/catalog/pg_opfamily.h ../src/include/catalog/pg_opclass.h ../src/include/catalog/pg_am.h ../src/include/catalog/pg_amop.h ../src/include/catalog/pg_amproc.h ../src/include/catalog/pg_language.h ../src/include/catalog/pg_largeobject_metadata.h ../src/include/catalog/pg_largeobject.h ../src/include/catalog/pg_aggregate.h ../src/include/catalog/pg_statistic.h ../src/include/catalog/pg_statistic_ext.h ../src/include/catalog/pg_statistic_ext_data.h ../src/include/catalog/pg_rewrite.h ../src/include/catalog/pg_trigger.h ../src/include/catalog/pg_event_trigger.h ../src/include/catalog/pg_description.h ../src/include/catalog/pg_cast.h ../src/include/catalog/pg_enum.h ../src/include/catalog/pg_namespace.h ../src/include/catalog/pg_conversion.h ../src/include/catalog/pg_depend.h ../src/include/catalog/pg_database.h ../src/include/catalog/pg_db_role_setting.h ../src/include/catalog/pg_tablespace.h ../src/include/catalog/pg_authid.h ../src/include/catalog/pg_auth_members.h ../src/include/catalog/pg_shdepend.h ../src/include/catalog/pg_shdescription.h ../src/include/catalog/pg_ts_config.h ../src/include/catalog/pg_ts_config_map.h ../src/include/catalog/pg_ts_dict.h ../src/include/catalog/pg_ts_parser.h ../src/include/catalog/pg_ts_template.h ../src/include/catalog/pg_extension.h ../src/include/catalog/pg_foreign_data_wrapper.h ../src/include/catalog/pg_foreign_server.h ../src/include/catalog/pg_user_mapping.h ../src/include/catalog/pg_foreign_table.h ../src/include/catalog/pg_policy.h ../src/include/catalog/pg_replication_origin.h ../src/include/catalog/pg_default_acl.h ../src/include/catalog/pg_init_privs.h ../src/include/catalog/pg_seclabel.h ../src/include/catalog/pg_shseclabel.h ../src/include/catalog/pg_collation.h ../src/include/catalog/pg_parameter_acl.h ../src/include/catalog/pg_partitioned_table.h ../src/include/catalog/pg_range.h ../src/include/catalog/pg_transform.h ../src/include/catalog/pg_sequence.h ../src/include/catalog/pg_publication.h ../src/include/catalog/pg_publication_namespace.h ../src/include/catalog/pg_publication_rel.h ../src/include/catalog/pg_subscription.h ../src/include/catalog/pg_subscription_rel.h
[17:26:44.602] Duplicate OIDs detected:
[17:26:44.602] 4550
[17:26:44.602] found 1 duplicate OID(s) in catalog data
```

I suggest you choose a random oid out of the "development purposes" range:

```
 * OIDs 1-9999 are reserved for manual assignment (see .dat files in
 * src/include/catalog/).  Of these, 8000-9999 are reserved for
 * development purposes (such as in-progress patches and forks);
 * they should not appear in released versions.
```

Greetings,

Andres Freund
Re: RFC: Logging plan of the running query
Sorry, I wrote confusingly at that time. No, I suggested adding a comment with the explanation of HandleLogQueryPlanInterrupt() only in explain.h, not removing it from explain.c. It seemed to me necessary to separate the declaration of the 'explaining feature' of an executed query from our interrupt function for logging the plan of the running query. But now, I doubt it.

I'm not sure I understand your comment correctly; do you mean HandleLogQueryPlanInterrupt() should not be placed in explain.c?

Thank you for reminding me about this function; I looked at the ProcessLogMemoryContextInterrupt() declaration. I noticed the comments in memutils.h are missing too. The description of this function is written only in mcxt.c.

However, given that ProcessLogMemoryContextInterrupt(), which similarly handles interrupts for pg_log_backend_memory_contexts(), is located in mcxt.c, I also think the current location might be acceptable.

So I think you are right, and the comment with the explanation of HandleLogQueryPlanInterrupt() written in explain.h is redundant.

I feel this comment is unnecessary since the explanation of HandleLogQueryPlanInterrupt() is written in explain.c and no functions in explain.h have comments in it.

Regards,

--
Alena Rybakina
Postgres Professional
Re: RFC: Logging plan of the running query
On 2022-09-21 17:30, Alena Rybakina wrote:

Thanks for your reply!

I also noticed it. However, I also discovered that the above function declarations are applied to the EXPLAIN command and used to print details of the executed query. We have a similar task: to print the plan of an interrupted process on a request for a specific pid. In short, I think this task is different, and for separating these parts I added this comment.

I'm not sure I understand your comment correctly; do you mean HandleLogQueryPlanInterrupt() should not be placed in explain.c?

It may be so. However, given that ProcessLogMemoryContextInterrupt(), which similarly handles interrupts for pg_log_backend_memory_contexts(), is located in mcxt.c, I also think the current location might be acceptable.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Re: RFC: Logging plan of the running query
Ok, I get it.

Since GetLockMethodLocalHash() is only used for assertions, it is only defined when USE_ASSERT_CHECKING is enabled. This patch uses GetLockMethodLocalHash() not only for assertion purposes, so I removed the "ifdef USE_ASSERT_CHECKING" for this function. I believe it does not lead to skipping errors.

Agree.

Since these processes are needed for nested queries, not only for AbortSubTransaction [1], I added comments on it.

I also noticed it. However, I also discovered that the above function declarations are applied to the EXPLAIN command and used to print details of the executed query. We have a similar task: to print the plan of an interrupted process on a request for a specific pid. In short, I think this task is different, and for separating these parts I added this comment.

I feel this comment is unnecessary since the explanation of HandleLogQueryPlanInterrupt() is written in explain.c and no functions in explain.h have comments in it.

Yes, I was worried about it. I understood it, thanks for explaining.

AFAIU both of them are processed since SendProcSignal() sets pss_signalFlags for each signal.

```c
SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
{
    volatile ProcSignalSlot *slot;
    ...(snip)...
        if (slot->pss_pid == pid)
        {
            /* Atomically set the proper flag */
            slot->pss_signalFlags[reason] = true;
            /* Send signal */
            return kill(pid, SIGUSR1);
```

The comments on ProcSignalReason also say 'We can cope with concurrent signals for different reasons'.

```c
/*
 * Reasons for signaling a Postgres child process (a backend or an auxiliary
 * process, like checkpointer).  We can cope with concurrent signals for
 * different reasons.  However, if the same reason is signaled multiple times
 * in quick succession, the process is likely to observe only one notification
 * of it.  This is okay for the present uses.
...
typedef enum
{
    PROCSIG_CATCHUP_INTERRUPT,      /* sinval catchup interrupt */
    PROCSIG_NOTIFY_INTERRUPT,       /* listen/notify interrupt */
    PROCSIG_PARALLEL_MESSAGE,       /* message from cooperating parallel backend */
    PROCSIG_WALSND_INIT_STOPPING,   /* ask walsenders to prepare for shutdown */
    PROCSIG_BARRIER,                /* global barrier interrupt */
    PROCSIG_LOG_MEMORY_CONTEXT,     /* ask backend to log the memory contexts */
    PROCSIG_LOG_QUERY_PLAN,         /* ask backend to log plan of the current query */
    ...
} ProcSignalReason;
```

[1] https://www.postgresql.org/message-id/8b53b32f-26cc-0531-4ac0-27310e0bef4b%40oss.nttdata.com
Re: RFC: Logging plan of the running query
On 2022-09-19 17:47, Алена Рыбакина wrote:

Thanks for your review and comments!

Hi, I'm sorry if this message duplicates the previous one, but I'm not sure that the previous message was sent correctly. I sent it from the email address a.rybak...@postgrespro.ru, and I couldn't send this email from that address.

I've successfully received your mail from both a.rybak...@postgrespro.ru and lena.riback...@yandex.ru.

I like the idea of creating a patch for logging the query plan. After reviewing this code I noticed some points, and I'd like to ask you some questions.

Firstly, I suggest some edits to the commit message. I think this version is more laconic and just as clear. I wrote it below since I can't think of any other way to add it.

```
Currently, we have to wait for the query execution to finish to check its plan.
This is not so convenient when investigating long-running queries on production
environments where we cannot use debuggers.

To improve this situation, this patch proposes the pg_log_query_plan() function,
which requests to log the plan of the specified backend process.

By default, only superusers are allowed to request a log of the plan; otherwise,
allowing any user to issue this request could cause lots of log messages and
lead to denial of service.

At the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at
LOG_SERVER_ONLY level, so this plan will appear in the server log only and
not be sent to the client.
```

Thanks, I have incorporated your comments. Since the latter part of the original message comes from the commit message of pg_log_backend_memory_contexts() (43620e328617c), I left it as it was for consistency.

Secondly, I have a question about deleting USE_ASSERT_CHECKING in lock.h. It is supposed to have been checked in other places of the code by matching values.
I am worried that errors could be skipped, because the places where GetLockMethodLocalHash() participates would go untested without the assert option, and we won't be able to get a core file in segfault cases. I might not understand something; can you please explain it to me?

Since GetLockMethodLocalHash() is only used for assertions, it is only defined when USE_ASSERT_CHECKING is enabled. This patch uses GetLockMethodLocalHash() not only for assertion purposes, so I removed the "#ifdef USE_ASSERT_CHECKING" around this function. I believe it does not lead to skipped errors.

Thirdly, I don't understand why save_ActiveQueryDesc is declared in pquery.h. It seems to me that save_ActiveQueryDesc is used only once, in the ExecutorRun function, so its declaration there is superfluous. I added it in the attached patch.

Exactly.

Fourthly, it seems to me there are not enough explanatory comments in the code. I also added them in the attached patch.

Thanks!

| +	/*
| +	 * Save the value of ActiveQueryDesc before calling the
| +	 * ExecutorRun_hook function, because it may be reset by
| +	 * AbortSubTransaction.
| +	 */
| +
| 	save_ActiveQueryDesc = ActiveQueryDesc;
| 	ActiveQueryDesc = queryDesc;
|
| @@ -314,6 +320,7 @@ ExecutorRun(QueryDesc *queryDesc,
| 	else
| 		standard_ExecutorRun(queryDesc, direction, count, execute_once);
|
| +	/* Restore the previous value of ActiveQueryDesc */
| 	ActiveQueryDesc = save_ActiveQueryDesc;

Since these processes are needed for nested queries, not only for AbortSubTransaction[1], I added comments on it.

| +/* Function to handle the signal to output the query plan. */
| extern void HandleLogQueryPlanInterrupt(void);

I feel this comment is unnecessary, since the explanation of HandleLogQueryPlanInterrupt() is written in explain.c and no functions in explain.h have comments.

Lastly, I don't understand the handling of signals, since I haven't worked with them before. Could another signal prevent this signal from logging the query plan?
I noticed this signal is checked last in the procsignal_sigusr1_handler function.

Are you concerned that one signal will not be processed when multiple signals are sent in succession?

AFAIU both of them are processed, since SendProcSignal sets pss_signalFlags for each signal.

```
SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
{
	volatile ProcSignalSlot *slot;
...(snip)...
		if (slot->pss_pid == pid)
		{
			/* Atomically set the proper flag */
			slot->pss_signalFlags[reason] = true;
			/* Send signal */
			return kill(pid, SIGUSR1);
```

The comments on ProcSignalReason also say 'We can cope with concurrent signals for different reasons':

```C
/*
 * Reasons for signaling a Postgres child process (a backend or an auxiliary
 * process, like checkpointer).  We can cope with concurrent signals for
 * different reasons.  However, if the same reason is signaled multiple times
 * in quick succession, the process is likely to observe only one notification
```
Re: RFC: Logging plan of the running query
Hi, I'm sorry if this message duplicates the previous one, but the previous message was sent incorrectly. I sent it from the email address lena.riback...@yandex.ru.

I liked this idea, and after reviewing the code I noticed some points and I'd like to ask you some questions.

Firstly, I suggest some edits to the commit message. I think this version is more laconic and just as clear. I wrote it below since I can't think of any other way to add it.

```
Currently, we have to wait for the query execution to finish to check its plan.
This is not so convenient when investigating long-running queries on production
environments where we cannot use debuggers.

To improve this situation, this patch proposes the pg_log_query_plan() function,
which requests to log the plan of the specified backend process.

By default, only superusers are allowed to request a log of the plan; otherwise,
allowing any user to issue this request could cause lots of log messages and
lead to denial of service.

At the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at
LOG_SERVER_ONLY level, so this plan will appear in the server log only and
not be sent to the client.
```

Secondly, I have a question about deleting USE_ASSERT_CHECKING in lock.h. It is supposed to have been checked in other places of the code by matching values. I am worried that errors could be skipped, because the places where GetLockMethodLocalHash() participates would go untested without the assert option, and we won't be able to get a core file in segfault cases. I might not understand something; can you please explain it to me?

Thirdly, I don't understand why save_ActiveQueryDesc is declared in pquery.h. It seems to me that save_ActiveQueryDesc is used only once, in the ExecutorRun function, so its declaration there is superfluous. I added it in the attached patch.

Fourthly, it seems to me there are not enough explanatory comments in the code.
I also added them in the attached patch.

Lastly, I don't understand the handling of signals, since I haven't worked with them before. Could another signal prevent this signal from logging the query plan? I noticed this signal is checked last in the procsignal_sigusr1_handler function.

Regards,
-- Alena Rybakina
Postgres Professional

19.09.2022, 11:01, "torikoshia":

On 2022-05-16 17:02, torikoshia wrote:
On 2022-03-09 19:04, torikoshia wrote:
On 2022-02-08 01:13, Fujii Masao wrote:

AbortSubTransaction() should reset ActiveQueryDesc to the save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? Otherwise, ActiveQueryDesc of the top-level statement will be unavailable after a subtransaction is aborted in the nested statements.

I once agreed with the above suggestion and made a v20 patch making save_ActiveQueryDesc a global variable, but it caused a segfault when calling pg_log_query_plan() after FreeQueryDesc(). OTOH, some kind of reset of ActiveQueryDesc seems necessary, since it also caused a segfault when running pg_log_query_plan() during installcheck. There may be a better way, but resetting ActiveQueryDesc to NULL seems safe and simple. Of course, it makes pg_log_query_plan() useless after a subtransaction is aborted. However, if people do not often want to know the plan of a running query whose subtransaction has been aborted, resetting ActiveQueryDesc to NULL would be acceptable.

Attached is a patch that sets ActiveQueryDesc to NULL when a subtransaction is aborted. What do you think?

Attached a new patch to fix patch apply failures again.
-- Regards,

-- Atsushi Torikoshi NTT DATA CORPORATION

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a82ac87457e..65b692b0ddf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -306,6 +306,12 @@ ExecutorRun(QueryDesc *queryDesc,
 {
 	QueryDesc *save_ActiveQueryDesc;

+	/*
+	 * Save the value of ActiveQueryDesc before calling the
+	 * ExecutorRun_hook function, because it may be reset by
+	 * AbortSubTransaction.
+	 */
+
 	save_ActiveQueryDesc = ActiveQueryDesc;
 	ActiveQueryDesc = queryDesc;

@@ -314,6 +320,7 @@ ExecutorRun(QueryDesc *queryDesc,
 	else
 		standard_ExecutorRun(queryDesc, direction, count, execute_once);

+	/* Restore the previous value of ActiveQueryDesc */
 	ActiveQueryDesc = save_ActiveQueryDesc;
 }

diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index fc9f9f8e3f0..8e7ce3c976f 100644
--- a/src/include/commands/explain.h
+++
Re: RFC: Logging plan of the running query
On 2022-05-16 17:02, torikoshia wrote:
On 2022-03-09 19:04, torikoshia wrote:
On 2022-02-08 01:13, Fujii Masao wrote:

AbortSubTransaction() should reset ActiveQueryDesc to the save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? Otherwise, ActiveQueryDesc of the top-level statement will be unavailable after a subtransaction is aborted in the nested statements.

I once agreed with the above suggestion and made a v20 patch making save_ActiveQueryDesc a global variable, but it caused a segfault when calling pg_log_query_plan() after FreeQueryDesc(). OTOH, some kind of reset of ActiveQueryDesc seems necessary, since it also caused a segfault when running pg_log_query_plan() during installcheck. There may be a better way, but resetting ActiveQueryDesc to NULL seems safe and simple. Of course, it makes pg_log_query_plan() useless after a subtransaction is aborted. However, if people do not often want to know the plan of a running query whose subtransaction has been aborted, resetting ActiveQueryDesc to NULL would be acceptable.

Attached is a patch that sets ActiveQueryDesc to NULL when a subtransaction is aborted. What do you think?

Attached a new patch to fix patch apply failures again.

-- Regards,
-- Atsushi Torikoshi
NTT DATA CORPORATION

From 5be784278e8e7aeeeadf60a772afccda7b59e6e4 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Mon, 5 Sep 2022 12:02:16 +0900
Subject: [PATCH v23] Add function to log the plan of the query currently running on the backend.

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds the pg_log_query_plan() function, which requests to log the plan of the specified backend process.
By default, only superusers are allowed to request to log the plans, because allowing any user to issue this request at an unbounded rate would cause lots of log messages, which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat
---
 doc/src/sgml/func.sgml                       |  49 +++
 src/backend/access/transam/xact.c            |  13 ++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/commands/explain.c               | 140 ++-
 src/backend/executor/execMain.c              |  10 ++
 src/backend/storage/ipc/procsignal.c         |   4 +
 src/backend/storage/lmgr/lock.c              |   9 +-
 src/backend/tcop/postgres.c                  |   4 +
 src/backend/utils/init/globals.c             |   2 +
 src/include/catalog/pg_proc.dat              |   6 +
 src/include/commands/explain.h               |   3 +
 src/include/miscadmin.h                      |   1 +
 src/include/storage/lock.h                   |   2 -
 src/include/storage/procsignal.h             |   1 +
 src/include/tcop/pquery.h                    |   2 +
 src/test/regress/expected/misc_functions.out |  54 +--
 src/test/regress/sql/misc_functions.sql      |  41 --
 17 files changed, 315 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 67eb380632..0e12f73b9c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25535,6 +25535,25 @@ SELECT collation for ('foo' COLLATE "de_DE");
+ pg_log_query_plan
+
+ pg_log_query_plan ( pid integer ) boolean
+
+ Requests to log the plan of the query currently running on the
+ backend with specified process ID.
+ It will be logged at LOG message level and
+ will appear in the server log based on the log
+ configuration set (See
+ for more information), but will not be sent to the client
+ regardless of .
+
+
@@ -25649,6 +25668,36 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.
+
+ pg_log_query_plan can be used to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+ The format of the query plan is the same as when VERBOSE,
+ COSTS, SETTINGS and
+ FORMAT TEXT are used in the EXPLAIN
+ command. For example:
+
+LOG:  query plan running on backend with PID 201116 is:
+	Query Text: SELECT * FROM pgbench_accounts;
+	Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 width=97)
+	  Output: aid, bid, abalance, filler
+	Settings: work_mem = '1MB'
+
Re: RFC: Logging plan of the running query
On 2022-03-09 19:04, torikoshia wrote:
On 2022-02-08 01:13, Fujii Masao wrote:

AbortSubTransaction() should reset ActiveQueryDesc to the save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? Otherwise, ActiveQueryDesc of the top-level statement will be unavailable after a subtransaction is aborted in the nested statements.

I once agreed with the above suggestion and made a v20 patch making save_ActiveQueryDesc a global variable, but it caused a segfault when calling pg_log_query_plan() after FreeQueryDesc(). OTOH, some kind of reset of ActiveQueryDesc seems necessary, since it also caused a segfault when running pg_log_query_plan() during installcheck. There may be a better way, but resetting ActiveQueryDesc to NULL seems safe and simple. Of course, it makes pg_log_query_plan() useless after a subtransaction is aborted. However, if people do not often want to know the plan of a running query whose subtransaction has been aborted, resetting ActiveQueryDesc to NULL would be acceptable.

Attached is a patch that sets ActiveQueryDesc to NULL when a subtransaction is aborted. What do you think?

Attached a new patch to fix patch apply failures.

-- Regards,
-- Atsushi Torikoshi
NTT DATA CORPORATION

From 5be784278e8e7aeeeadf60a772afccda7b59e6e4 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Mon, 16 May 2022 12:02:16 +0900
Subject: [PATCH v22] Add function to log the plan of the query currently running on the backend.

Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds the pg_log_query_plan() function, which requests to log the plan of the specified backend process.

By default, only superusers are allowed to request to log the plans, because allowing any user to issue this request at an unbounded rate would cause lots of log messages, which can lead to denial of service.
On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro Horiguchi, Robert Treat
---
 doc/src/sgml/func.sgml                       |  49 +++
 src/backend/access/transam/xact.c            |  13 ++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/commands/explain.c               | 140 ++-
 src/backend/executor/execMain.c              |  10 ++
 src/backend/storage/ipc/procsignal.c         |   4 +
 src/backend/storage/lmgr/lock.c              |   9 +-
 src/backend/tcop/postgres.c                  |   4 +
 src/backend/utils/init/globals.c             |   2 +
 src/include/catalog/pg_proc.dat              |   6 +
 src/include/commands/explain.h               |   3 +
 src/include/miscadmin.h                      |   1 +
 src/include/storage/lock.h                   |   2 -
 src/include/storage/procsignal.h             |   1 +
 src/include/tcop/pquery.h                    |   2 +
 src/test/regress/expected/misc_functions.out |  54 +--
 src/test/regress/sql/misc_functions.sql      |  41 --
 17 files changed, 315 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 85ecc639fd..47f54b978b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28027,6 +28027,25 @@ SELECT collation for ('foo' COLLATE "de_DE");
+ pg_log_query_plan
+
+ pg_log_query_plan ( pid integer ) boolean
+
+ Requests to log the plan of the query currently running on the
+ backend with specified process ID.
+ It will be logged at LOG message level and
+ will appear in the server log based on the log
+ configuration set (See
+ for more information), but will not be sent to the client
+ regardless of .
+
+
@@ -28141,6 +28160,36 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.
+
+ pg_log_query_plan can be used to log the plan of a backend process.
+ For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+ The format of the query plan is the same as when VERBOSE,
+ COSTS, SETTINGS and
+ FORMAT TEXT are used in the EXPLAIN
+ command. For example:
+
+LOG:  query plan running on backend with PID 201116 is:
+	Query Text: SELECT * FROM pgbench_accounts;
+	Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 width=97)
+	  Output: aid, bid, abalance, filler
+	Settings: work_mem = '1MB'
+
+ Note that when statements are executed