On Fri, Sep 18, 2015 at 7:04 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Frankly, I think you guys are making this out to be way more
> complicated than it really is.  Basically, I think the process being
> queried should publish a DSM via a slot it owns.  The recipient is
> responsible for reading it and then notifying the sender.  If a second
> process requests data before the first process reads its data, the
> second process can either (1) become an additional reader of the
> already-published data or (2) wait for the first process to finish,
> and then send its own inquiry.
>
> There are some problems to solve here, but they hardly seem impossible.

Thank you, Robert, for your invaluable input on this patch!

I now believe that the use of ProcessInterrupts() in the recently proposed design, as well as the manipulation of the process latch required by the shared memory queue, are the major blockers.

To simplify things, I've taken a step back and looked again at the auto_explain contrib module.  I now propose the simplest thing possible, in my opinion: a new GUC option for auto_explain.  It makes every backend in which the module is loaded -- via the *_preload_libraries mechanism or the CREATE EXTENSION/LOAD commands -- actively publish the plans of its queries in dynamic shared memory, as long as auto_explain.publish_plans is turned on.

The best part, for me, is that this approach requires no signal handling and is isolated in an external module, so it can be readily used with current server versions; no need to wait for >= 9.6.

Some implementation details:

For every backend that might be running (MaxBackends) we reserve a dsm_handle slot in the addins shared memory.  When the new option is turned on, the ExecutorRun hook produces a plan in whatever format is specified by auto_explain.log_format, allocates a DSM segment, copies the plan into the segment, and finally stores the DSM handle in the backend's own slot.
No locking is required around this, because every backend writes to its slot exclusively; no other backend can be writing into it concurrently.  In the ExecutorFinish hook we invalidate the backend's slot by setting the handle to 0 and deallocate the DSM segment.

Reading the plan is performed by the newly added function pg_explain_backend(PID).  Since it can determine the target process' backendId, it simply reads the DSM handle from that backend's slot and tries to attach to it.  (There is not much point in checking the handle for being non-0, because the other backend could release the segment the moment after we've checked it, so we rely on dsm_attach() returning non-NULL.)  If attached successfully, we parse the contents and detach.  At that point the last backend to detach actually releases the segment, thanks to reference counting.

Handling of nested statements' plans is an open question.  It can be really useful when the top-level plan simply displays "SELECT my_stored_procedure()" and all the interesting stuff happens behind the scenes, but I haven't started to think about how this could be implemented yet.

Pavel was really interested in retrieving complete query texts/plans, which can be over a megabyte in his case (and pg_stat_activity.query is capped at 10240 bytes, I believe).  This is now possible with the patch, but some might still want to put a threshold on the allocation, especially given that this is shared memory.  I can envision another GUC, but in our experience extremely long queries (and plans) are most of the time due to VALUES() or IN() clauses with a huge list of literals.  I think we could fold the VALUES/IN lists into "?" when the query/plan text exceeds the specified threshold, or unconditionally (yes, pg_stat_statements, I'm looking at you).  This should help in the cases where the most interesting part is in the plan nodes near the end, behind such a huge list of literals.
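As a quick illustration of the publish/read protocol above (not part of the patch): a toy, single-process Python model of the one-writer-per-slot scheme.  All names here are invented for the illustration; a dict stands in for the DSM segment table, and a reader that snapshots a stale or zeroed handle simply gets nothing back, just as dsm_attach() may return NULL.

```python
segments = {}          # stands in for the DSM segment table
slots = [0] * 8        # stands in for the per-backend dsm_handle slots
next_handle = 1

def publish(backend_id, plan_text):
    """dsm_create + memcpy, then store the handle into our own slot."""
    global next_handle
    handle = next_handle
    next_handle += 1
    segments[handle] = plan_text
    slots[backend_id] = handle
    return handle

def unpublish(backend_id):
    """Invalidate the slot first, then release the segment."""
    handle = slots[backend_id]
    slots[backend_id] = 0
    segments.pop(handle, None)

def read_plan(backend_id):
    """Snapshot the handle, then 'attach'; may yield None if it raced away."""
    handle = slots[backend_id]
    return segments.get(handle)

publish(3, "Seq Scan on t")
print(read_plan(3))    # Seq Scan on t
unpublish(3)
print(read_plan(3))    # None
```

The invariant the model demonstrates is the one the patch relies on: only the owning backend ever writes slots[backend_id], so no lock is needed around the slot itself; readers only have to tolerate the attach failing.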
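To sketch the literal-folding idea (purely hypothetical -- nothing in the patch does this yet): in Python, with a deliberately naive regex that handles only one level of nested parentheses and ignores string literals containing parentheses.

```python
import re

# Collapse the literal lists inside IN (...) / VALUES (...) clauses into
# a single "?" once the text exceeds a size threshold.  Both the regex
# and the threshold are made up for this illustration.
LITERAL_LIST = re.compile(r"\b(IN|VALUES)\s*\(([^()]|\([^()]*\))*\)",
                          re.IGNORECASE)

def fold_literals(query_text: str, threshold: int = 1024) -> str:
    """Return query_text unchanged if it fits, otherwise fold the lists."""
    if len(query_text) <= threshold:
        return query_text
    return LITERAL_LIST.sub(lambda m: m.group(1) + " (?)", query_text)

q = ("SELECT * FROM t WHERE id IN ("
     + ", ".join(str(i) for i in range(10000)) + ")")
print(fold_literals(q))    # SELECT * FROM t WHERE id IN (?)
```

A real implementation would of course fold at the parse-tree or plan level rather than with a regex, but the space savings are the same: a multi-megabyte IN list shrinks to a few bytes while the interesting plan nodes survive intact.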
Future plans:

I believe this approach can be extended to enable instrumentation once again.  The backend executing the query could update the published plan every once in a while (for example, every N ms or every 1% of rows processed in a node), and any other process interested in this data could simply read it, without the need for signals or complex and fragile communication.  This obviously requires a core patch.

Some problems:

There is a potential problem with the limited total number of DSM segments: it is reserved in a way that allows only 2 per backend (on average) plus 64 additional per server, so if you run with the new option enabled at all times, you're down to only 1 additional DSM per backend (again, on average).  Not sure how critical this can be, but no one is forced to run with this option enabled for all backends.

If you don't want to run with it enabled at all times, then enabling the GUC per backend can be problematic.  It's still possible to update the conf file and send SIGHUP to a single backend, but that's harder to accomplish from psql, for example.  I think here we might still have some luck with the signals: use another array of per-backend slots with flags, set the target backend's flag and send it SIGUSR1.  The backend wakes on the signal, examines its slot, and toggles the GUC if needed.  Sounds pretty safe, eh?

No documentation changes yet, waiting for your comments. :-)

Happy hacking!
--
Alex
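P.S. To put numbers on the segment-budget concern, a trivial back-of-the-envelope in Python.  The two constants are, to my knowledge, PG_DYNSHMEM_FIXED_SLOTS and PG_DYNSHMEM_SLOTS_PER_BACKEND from dsm.c, but double-check them against your tree before relying on this.

```python
PG_DYNSHMEM_FIXED_SLOTS = 64        # per-server extra slots (assumed)
PG_DYNSHMEM_SLOTS_PER_BACKEND = 2   # slots per backend (assumed)

def dsm_segment_budget(max_backends: int) -> int:
    """Total DSM control slots the server reserves."""
    return PG_DYNSHMEM_FIXED_SLOTS + PG_DYNSHMEM_SLOTS_PER_BACKEND * max_backends

def spare_per_backend(max_backends: int) -> float:
    """Average segments left per backend once each pins one plan segment."""
    return (dsm_segment_budget(max_backends) - max_backends) / max_backends

print(dsm_segment_budget(100))           # 264
print(round(spare_per_backend(100), 2))  # 1.64
```

So with max_connections = 100 there are 264 slots in total; with one plan segment pinned per backend, a bit over 1.5 segments per backend remain on average -- tight, but not immediately fatal.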
From 9f192d17e4674b2a042ca435ad5219f59722cd7a Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Date: Fri, 25 Sep 2015 17:29:34 +0200
Subject: [PATCH] Add auto_explain.publish_plans

This adds an optional feature to publish the plans of every query being
executed into per-backend dynamic shared memory segment.  The plans can
then be examined by other backends by use of newly added
pg_explain_backend() function.

The process ID that is passed to the function must be this server's
backend PID and the caller must be either superuser or the same role
that the other backend is running under.

In order to support addition of the SQL-callable function, EXTENSION
packaging is provided for auto_explain.
---
 contrib/auto_explain/Makefile              |   2 +
 contrib/auto_explain/auto_explain--1.0.sql |  10 +
 contrib/auto_explain/auto_explain.c        | 326 ++++++++++++++++++++++++++---
 contrib/auto_explain/auto_explain.control  |   5 +
 4 files changed, 315 insertions(+), 28 deletions(-)
 create mode 100644 contrib/auto_explain/auto_explain--1.0.sql
 create mode 100644 contrib/auto_explain/auto_explain.control

diff --git a/contrib/auto_explain/Makefile b/contrib/auto_explain/Makefile
index fcf36c5..b6057a3 100644
--- a/contrib/auto_explain/Makefile
+++ b/contrib/auto_explain/Makefile
@@ -3,6 +3,8 @@ MODULE_big = auto_explain
 OBJS = auto_explain.o $(WIN32RES)
 PGFILEDESC = "auto_explain - logging facility for execution plans"
 
+EXTENSION = auto_explain
+DATA = auto_explain--1.0.sql
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/auto_explain/auto_explain--1.0.sql b/contrib/auto_explain/auto_explain--1.0.sql
new file mode 100644
index 0000000..6059949
--- /dev/null
+++ b/contrib/auto_explain/auto_explain--1.0.sql
@@ -0,0 +1,10 @@
+/* contrib/auto_explain/auto_explain--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION auto_explain" to load this file. \quit
+
+-- Register functions.
+CREATE FUNCTION pg_explain_backend(INT)
+RETURNS SETOF TEXT
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 2a184ed..0fa123f 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -14,9 +14,18 @@
 
 #include <limits.h>
 
+#include "access/htup_details.h"
 #include "commands/explain.h"
 #include "executor/instrument.h"
+#include "miscadmin.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/shmem.h"
 #include "utils/guc.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
 
 PG_MODULE_MAGIC;
 
@@ -29,6 +38,7 @@ static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
+static bool auto_explain_publish_plans = false;
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -38,10 +48,23 @@ static const struct config_enum_entry format_options[] = {
 	{NULL, 0, false}
 };
 
+
+/* Shared memory structs */
+static dsm_handle *published_plan_handles = NULL;
+
+#define MaxPublishedPlans MaxBackends
+#define explain_shmem_size() (sizeof(dsm_handle) * MaxPublishedPlans)
+
+static dsm_segment *MyPublishedPlanSegment = NULL;
+static dsm_handle *MyPublishedPlanHandle = NULL;
+
+
 /* Current nesting depth of ExecutorRun calls */
 static int	nesting_level = 0;
 
 /* Saved hook values in case of unload */
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+
 static ExecutorStart_hook_type prev_ExecutorStart = NULL;
 static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
@@ -51,15 +74,28 @@ static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
 	(auto_explain_log_min_duration >= 0 && \
 	 (nesting_level == 0 || auto_explain_log_nested_statements))
 
+#define publish_explain_enabled() \
+	(auto_explain_publish_plans && \
+	 nesting_level == 0)	/* XXX what do we do with the nested statements? */
+
 void		_PG_init(void);
 void		_PG_fini(void);
 
+PG_FUNCTION_INFO_V1(pg_explain_backend);
+
+static void explain_shmem_startup(void);
+static void attach_published_plan_handles(void);
+
 static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
 static void explain_ExecutorRun(QueryDesc *queryDesc,
 					ScanDirection direction,
 					long count);
 static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
+static StringInfo explain_query(QueryDesc *queryDesc);
+
+static void explain_publish_plan(QueryDesc *queryDesc);
+static void explain_unpublish_my_plan(void);
 
 /*
@@ -159,9 +195,26 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	DefineCustomBoolVariable("auto_explain.publish_plans",
+							 "Publish plan of every executed statement in shared memory.",
+							 NULL,
+							 &auto_explain_publish_plans,
+							 false,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");
 
+	if (auto_explain_publish_plans)
+		RequestAddinShmemSpace(explain_shmem_size());
+
 	/* Install hooks. */
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = explain_shmem_startup;
+
 	prev_ExecutorStart = ExecutorStart_hook;
 	ExecutorStart_hook = explain_ExecutorStart;
 	prev_ExecutorRun = ExecutorRun_hook;
@@ -183,8 +236,42 @@ _PG_fini(void)
 	ExecutorRun_hook = prev_ExecutorRun;
 	ExecutorFinish_hook = prev_ExecutorFinish;
 	ExecutorEnd_hook = prev_ExecutorEnd;
+
+	shmem_startup_hook = prev_shmem_startup_hook;
 }
 
+
+static void
+explain_shmem_startup(void)
+{
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	attach_published_plan_handles();
+}
+
+static void
+attach_published_plan_handles(void)
+{
+	bool		found;
+
+	if (auto_explain_publish_plans && published_plan_handles == NULL)
+	{
+		LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+
+		published_plan_handles =
+			(dsm_handle *) ShmemInitStruct("auto_explain.published_plan_handles",
+										   explain_shmem_size(), &found);
+		if (!found)
+			MemSet(published_plan_handles, 0, explain_shmem_size());
+
+		MyPublishedPlanHandle = &published_plan_handles[MyBackendId];
+
+		LWLockRelease(AddinShmemInitLock);
+	}
+}
+
+
 /*
  * ExecutorStart hook: start up logging if needed
  */
@@ -234,6 +321,11 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 static void
 explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
 {
+	if (publish_explain_enabled())
+	{
+		explain_publish_plan(queryDesc);
+	}
+
 	nesting_level++;
 	PG_TRY();
 	{
@@ -257,6 +349,11 @@ explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
 static void
 explain_ExecutorFinish(QueryDesc *queryDesc)
 {
+	if (MyPublishedPlanSegment)
+	{
+		explain_unpublish_my_plan();
+	}
+
 	nesting_level++;
 	PG_TRY();
 	{
@@ -294,32 +391,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 			msec = queryDesc->totaltime->total * 1000.0;
 			if (msec >= auto_explain_log_min_duration)
 			{
-				ExplainState *es = NewExplainState();
-
-				es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
-				es->verbose = auto_explain_log_verbose;
-				es->buffers = (es->analyze && auto_explain_log_buffers);
-				es->timing = (es->analyze && auto_explain_log_timing);
-				es->summary = es->analyze;
-				es->format = auto_explain_log_format;
-
-				ExplainBeginOutput(es);
-				ExplainQueryText(es, queryDesc);
-				ExplainPrintPlan(es, queryDesc);
-				if (es->analyze && auto_explain_log_triggers)
-					ExplainPrintTriggers(es, queryDesc);
-				ExplainEndOutput(es);
-
-				/* Remove last line break */
-				if (es->str->len > 0 && es->str->data[es->str->len - 1] == '\n')
-					es->str->data[--es->str->len] = '\0';
-
-				/* Fix JSON to output an object */
-				if (auto_explain_log_format == EXPLAIN_FORMAT_JSON)
-				{
-					es->str->data[0] = '{';
-					es->str->data[es->str->len - 1] = '}';
-				}
+				StringInfo	str = explain_query(queryDesc);
 
 				/*
 				 * Note: we rely on the existing logging of context or
@@ -329,10 +401,11 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 				 */
 				ereport(LOG,
 						(errmsg("duration: %.3f ms  plan:\n%s",
-								msec, es->str->data),
+								msec, str->data),
 						 errhidestmt(true)));
 
-				pfree(es->str->data);
+				pfree(str->data);
+				pfree(str);
 			}
 		}
 
@@ -341,3 +414,200 @@
 	else
 		standard_ExecutorEnd(queryDesc);
 }
+
+static StringInfo
+explain_query(QueryDesc *queryDesc)
+{
+	StringInfo	result;
+	ExplainState *es = NewExplainState();
+
+	es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
+	es->verbose = auto_explain_log_verbose;
+	es->buffers = (es->analyze && auto_explain_log_buffers);
+	es->timing = (es->analyze && auto_explain_log_timing);
+	es->summary = es->analyze;
+	es->format = auto_explain_log_format;
+
+	ExplainBeginOutput(es);
+	ExplainQueryText(es, queryDesc);
+	ExplainPrintPlan(es, queryDesc);
+	if (es->analyze && auto_explain_log_triggers)
+		ExplainPrintTriggers(es, queryDesc);
+	ExplainEndOutput(es);
+
+	/* Remove last line break */
+	if (es->str->len > 0 && es->str->data[es->str->len - 1] == '\n')
+		es->str->data[--es->str->len] = '\0';
+
+	/* Fix JSON to output an object */
+	if (auto_explain_log_format == EXPLAIN_FORMAT_JSON)
+	{
+		es->str->data[0] = '{';
+		es->str->data[es->str->len - 1] = '}';
+	}
+
+	result = es->str;
+	pfree(es);
+
+	return result;
+}
+
+static void
+explain_publish_plan(QueryDesc *queryDesc)
+{
+	StringInfo	str = explain_query(queryDesc);
+	dsm_segment *dsm;
+
+	attach_published_plan_handles();
+
+	dsm = dsm_create(str->len + 1, DSM_CREATE_NULL_IF_MAXSEGMENTS);
+	if (dsm)
+	{
+		char	   *buffer = (char *) dsm_segment_address(dsm);
+		dsm_handle	handle;
+
+		memcpy(buffer, str->data, str->len + 1);
+
+		MyPublishedPlanSegment = dsm;
+
+		handle = dsm_segment_handle(dsm);
+		*MyPublishedPlanHandle = handle;
+
+		elog(DEBUG5, "published explain plan in a DSM segment: %d", handle);
+	}
+	else
+		elog(LOG, "explain_publish_plan: out of dynamic shared memory, needed: %d",
+			 str->len + 1);
+
+	pfree(str->data);
+	pfree(str);
+}
+
+static void
+explain_unpublish_my_plan(void)
+{
+	Assert(MyPublishedPlanSegment);
+	Assert(MyPublishedPlanHandle);
+
+	elog(DEBUG5, "unpublishing explain plan in a DSM segment: %d", *MyPublishedPlanHandle);
+
+	*MyPublishedPlanHandle = 0;
+
+	dsm_detach(MyPublishedPlanSegment);
+	MyPublishedPlanSegment = NULL;
+}
+
+
+/*
+ * Retrieve the plan of a statement currently running in a server backend
+ * specified by the process ID.
+ *
+ * FUNCTION pg_explain_backend(pid)
+ * RETURNS SETOF text
+ */
+Datum
+pg_explain_backend(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	Tuplestorestate *tupstore;
+	TupleDesc	tupdesc;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	pid_t		target_pid = (pid_t) PG_GETARG_INT32(0);
+	PGPROC	   *target_proc;
+	dsm_handle	handle = 0;
+	dsm_segment *seg = NULL;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* verify access to target_pid */
+	target_proc = BackendPidGetProc(target_pid);
+
+	if (target_proc == NULL)
+		ereport(ERROR,
+				(errmsg("PID %d is not a PostgreSQL server process", target_pid)));
+
+	if (!(superuser() || target_proc->roleId == GetUserId()))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser or have the same role to explain queries running in other server processes"))));
+
+	attach_published_plan_handles();
+	if (published_plan_handles == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot attach to published plans shared memory"),
+				 errhint("is auto_explain.publish_plans turned on?")));
+
+	/* need to build tuplestore in query context */
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+	tupstore = tuplestore_begin_heap(false, false, work_mem);
+	MemoryContextSwitchTo(oldcontext);
+
+	if (target_pid == MyProcPid)
+	{
+		/* cannot attach to our own segment, but we know where to look */
+		seg = MyPublishedPlanSegment;
+	}
+	else
+	{
+		handle = published_plan_handles[target_proc->backendId];
+		seg = dsm_attach(handle);
+	}
+
+	if (seg != NULL)
+	{
+		PG_TRY();
+		{
+			const char *p = (const char *) dsm_segment_address(seg);
+
+			/* break into tuples on newline */
+			while (*p)
+			{
+				const char *q = strchr(p, '\n');
+				Size		len = q ? q - p : strlen(p);
+				Datum		value;
+				HeapTuple	tuple;
+				bool		isnull = false;
+
+				value = PointerGetDatum(cstring_to_text_with_len(p, len));
+
+				tuple = heap_form_tuple(tupdesc, &value, &isnull);
+				tuplestore_puttuple(tupstore, tuple);
+
+				if (!q)
+					break;
+
+				p += len + 1;
+			}
+
+			/* detach DSM, but only if it's not our own one */
+			if (handle)
+				dsm_detach(seg);
+		}
+		PG_CATCH();
+		{
+			if (handle)
+				dsm_detach(seg);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+	}
+
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	return (Datum) 0;
+}
diff --git a/contrib/auto_explain/auto_explain.control b/contrib/auto_explain/auto_explain.control
new file mode 100644
index 0000000..bd92c71
--- /dev/null
+++ b/contrib/auto_explain/auto_explain.control
@@ -0,0 +1,5 @@
+# auto_explain extension
+comment = 'logging facility for execution plans'
+default_version = '1.0'
+module_pathname = '$libdir/auto_explain'
+relocatable = true
-- 
2.1.4
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers