Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-16 Thread Robert Haas
On Tue, Aug 16, 2016 at 1:05 AM, Rushabh Lathia
 wrote:
> I agree, this make sense.
>
> Here is the patch to allocate worker instrumentation into same context
> as the regular instrumentation which is per-query context.

Looks good, committed.  I am not sure it was a very good idea for
af33039317ddc4a0e38a02e2255c2bf453115fd2 by Tom Lane to change the
current memory context for the entire execution of gather_readnext();
this might not be the only or the last bug that results from that
decision.  However, I don't really want to get an argument about that
right now, and this at least fixes the problem we know about.  Thanks
for the report and patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-15 Thread Rushabh Lathia
On Mon, Aug 15, 2016 at 6:02 PM, Robert Haas  wrote:

> On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila 
> wrote:
> > AFAICS, your patch seems to be the right fix for this issue, unless we
> > need the instrumentation information during execution (other than for
> > explain) for some purpose.
>
> Hmm, I disagree.  It should be the job of
> ExecParallelRetrieveInstrumentation to allocate its data in the
> correct context, not the responsibility of nodeGather.c to work around
> the fact that it doesn't.  The worker instrumentation should be
> allocated in the same context as the regular instrumentation
> information, which I assume is probably the per-query context.
>

I agree, this make sense.

Here is the patch to allocate worker instrumentation into same context
as the regular instrumentation which is per-query context.

PFA patch.


--
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 380d743..5aa6f02 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -500,6 +500,7 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	int			n;
 	int			ibytes;
 	int			plan_node_id = planstate->plan->plan_node_id;
+	MemoryContext oldcontext;
 
 	/* Find the instumentation for this node. */
 	for (i = 0; i < instrumentation->num_plan_nodes; ++i)
@@ -514,10 +515,19 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	for (n = 0; n < instrumentation->num_workers; ++n)
 		InstrAggNode(planstate->instrument, [n]);
 
-	/* Also store the per-worker detail. */
+	/*
+	 * Also store the per-worker detail.
+	 *
+	 * Worker instrumentation should be allocated in the same context as
+	 * the regular instrumentation information, which is the per-query
+	 * context. Switch into per-query memory context.
+	 */
+	oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
 	ibytes = mul_size(instrumentation->num_workers, sizeof(Instrumentation));
 	planstate->worker_instrument =
 		palloc(ibytes + offsetof(WorkerInstrumentation, instrument));
+	MemoryContextSwitchTo(oldcontext);
+
 	planstate->worker_instrument->num_workers = instrumentation->num_workers;
 	memcpy(>worker_instrument->instrument, instrument, ibytes);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-15 Thread Robert Haas
On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila  wrote:
> AFAICS, your patch seems to be the right fix for this issue, unless we
> need the instrumentation information during execution (other than for
> explain) for some purpose.

Hmm, I disagree.  It should be the job of
ExecParallelRetrieveInstrumentation to allocate its data in the
correct context, not the responsibility of nodeGather.c to work around
the fact that it doesn't.  The worker instrumentation should be
allocated in the same context as the regular instrumentation
information, which I assume is probably the per-query context.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-13 Thread Amit Kapila
On Sat, Aug 13, 2016 at 11:10 AM, Rushabh Lathia
 wrote:
> Hi All,
>
> Recently while running tpc-h queries on postgresql master branch, I am
> noticed
> random server crash. Most of the time server crash coming while turn tpch
> query
> number 3 - (but its very random).
>
>
> Here its clear that work_instrument is either corrupted or Un-inililized
> that is the
> reason its ending up with server crash.
>
> With bit more debugging and looked at git history I found that issue started
> coming
> with commit af33039317ddc4a0e38a02e2255c2bf453115fd2. gather_readnext()
> calls
> ExecShutdownGatherWorkers() when nreaders == 0. ExecShutdownGatherWorkers()
> calls ExecParallelFinish() which collects the instrumentation before marking
> ParallelExecutorInfo to finish. ExecParallelRetrieveInstrumentation() do the
> allocation
> of planstate->worker_instrument.
>
> With commit af33039317 now we calling the gather_readnext() with per-tuple
> context,
> but with nreader == 0 with ExecShutdownGatherWorkers() we end up with
> allocation
> of planstate->worker_instrument into per-tuple context - which is wrong.
>
> Now fix can be:
>
> 1) Avoid calling ExecShutdownGatherWorkers() from the gather_readnext() and
> let
> ExecEndGather() do that things.
>

I don't think we can wait till ExecEndGather() to collect statistics,
as we need it before that for explain path.  However, we do call
ExecShutdownNode() from ExecutePlan() when there are no more tuples
which can take care of ensuring the shutdown of Gather node.   I think
the advantage of calling it in
gather_readnext() is that it will resources to be released early and
populating the instrumentation/statistics as early as possible.

> But with this change, gather_readread() and
> gather_getnext() depend on planstate->reader structure to continue reading
> tuple.
> Now either we can change those condition to be depend on planstate->nreaders
> or
> just pfree(planstate->reader) into gather_readnext() instead of calling
> ExecShutdownGatherWorkers().
>
>
> Attaching patch, which fix the issue with approach 1).
>

AFAICS, your patch seems to be the right fix for this issue, unless we
need the instrumentation information during execution (other than for
explain) for some purpose.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers