On Wed, May 6, 2015 at 7:10 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Wed, May 6, 2015 at 7:55 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >>> - I believe the separation of concerns between ExecFunnel() and
> >>> ExecEndFunnel() is not quite right.  If the scan is shut down before
> >>> it runs to completion (e.g. because of LIMIT), then I think we'll call
> >>> ExecEndFunnel() before ExecFunnel() hits the TupIsNull(slot) path.  I
> >>> think you probably need to create a static subroutine that is called
> >>> both as soon as TupIsNull(slot) and also from ExecEndFunnel(), in each
> >>> case cleaning up whatever resources remain.
> >
> >> Right, will fix as per suggestion.
> >
> > I observed one issue while working on this review comment.  When we
> > try to destroy the parallel setup via ExecEndNode (as due to Limit
> > Node, it could not destroy after consuming all tuples), it waits for
> > parallel workers to finish (WaitForParallelWorkersToFinish()) and
> > parallel workers are waiting for master backend to signal them as
> > their queue is full.  I think in such a case master backend needs to
> > inform workers either when the scan is discontinued due to limit node
> > or while waiting for parallel workers to finish.
>
> Isn't this why TupleQueueFunnelShutdown() calls shm_mq_detach()?
> That's supposed to unstick the workers; any impending or future writes
> will just return SHM_MQ_DETACHED without waiting.
>
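To make the ordering concrete, here is a rough sketch of the single
cleanup routine being discussed, reached both when ExecFunnel() sees
TupIsNull(slot) and from ExecEndFunnel().  The helper name and the
FunnelState field names are invented for illustration and the code is
not taken from the patch; only TupleQueueFunnelShutdown(),
WaitForParallelWorkersToFinish() and DestroyParallelContext() are the
actual functions we have been talking about:

    /* Illustrative sketch only, not patch code. */
    static void
    FunnelShutdownWorkers(FunnelState *node)    /* hypothetical helper */
    {
        if (node->pcxt == NULL)
            return;             /* already shut down */

        /*
         * Detach from the tuple queues first; a worker blocked on a full
         * queue then gets SHM_MQ_DETACHED on its next write instead of
         * waiting for the master backend to read.
         */
        TupleQueueFunnelShutdown(node->funnel);

        /* Only now is it safe to wait for the workers and tear down. */
        WaitForParallelWorkersToFinish(node->pcxt);
        DestroyParallelContext(node->pcxt);
        node->pcxt = NULL;
    }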
Okay, that can work if we call it in ExecEndNode() before
WaitForParallelWorkersToFinish().  However, what if we want to do
something like TupleQueueFunnelShutdown() as soon as the Limit node
decides to stop processing the outer node?  We could traverse the whole
plan tree and find the nodes where parallel workers need to be stopped,
but I don't think that's a good way to handle it.

If we don't stop the workers until ExecutorEnd()->ExecEndNode(), they
will keep running until that point, and it won't be easy to get the
instrumentation/buffer usage information from them (workers fill in that
information for the master backend only after their execution is
complete), because that information is collected before ExecutorEnd().
For Explain Analyze .., we can ensure that workers are stopped before
fetching that information from the Funnel node, but the same is not easy
for the buffer usage stats required by plugins, as those operate at the
ExecutorRun() and ExecutorFinish() level, where we don't have direct
access to node-level information.  You can refer to pgss_ExecutorEnd(),
which completes the storage of stats information before calling
standard_ExecutorEnd().

Offhand, I could not think of a good way to do this, but one crude way
could be to introduce a new API (ParallelExecutorEnd()) for such plugins,
which needs to be called before completing the stats accumulation.  This
API will call ExecEndPlan() if the parallelModeNeeded flag is set and
allow accumulation of stats (InstrStartNode()/InstrStopNode()); see the
rough sketch at the end of this mail.

>
> Well, all the workers restore that state in parallel, so adding it up
> across all workers doesn't really make sense.  But anyway, no, I don't
> think that's a big cost.  I think the big cost is going to the
> operating system overhead of process creation.  The new process will
> incur lots of page faults as it populates its address space and
> dirties pages marked copy-on-write.  That's where I expect most of the
> expense to be.
>

Okay, will remove parallel_startup_cost from the patch in the next version.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
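P.S.  To make the ParallelExecutorEnd() idea a little more concrete,
below is a rough sketch of how a pgss_ExecutorEnd()-style hook might use
it.  ParallelExecutorEnd() does not exist yet, and the surrounding code
is only an approximation of what pg_stat_statements does today, so
please treat this purely as an illustration of the calling convention:

    static void
    pgss_ExecutorEnd(QueryDesc *queryDesc)
    {
        /*
         * Proposed (not existing) call: stop the parallel workers and
         * pull their instrumentation/buffer usage into the master backend
         * before the totals below are read.
         */
        if (queryDesc->plannedstmt->parallelModeNeeded)
            ParallelExecutorEnd(queryDesc);

        if (queryDesc->totaltime)
        {
            InstrEndLoop(queryDesc->totaltime);
            /* ... store the per-query stats, as pg_stat_statements does ... */
        }

        standard_ExecutorEnd(queryDesc);
    }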