On Wed, May 6, 2015 at 7:10 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Wed, May 6, 2015 at 7:55 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >>> - I believe the separation of concerns between ExecFunnel() and
> >>> ExecEndFunnel() is not quite right.  If the scan is shut down before
> >>> it runs to completion (e.g. because of LIMIT), then I think we'll call
> >>> ExecEndFunnel() before ExecFunnel() hits the TupIsNull(slot) path.  I
> >>> think you probably need to create a static subroutine that is called
> >>> both as soon as TupIsNull(slot) and also from ExecEndFunnel(), in each
> >>> case cleaning up whatever resources remain.
> >
> >> Right, will fix as per suggestion.
> >
> > I observed one issue while working on this review comment.  When we
> > try to destroy the parallel setup via ExecEndNode (as due to Limit
> > Node, it could not destroy after consuming all tuples), it waits for
> > parallel workers to finish (WaitForParallelWorkersToFinish()) and
> > parallel workers are waiting for master backend to signal them as
> > their queue is full.  I think in such a case master backend needs to
> > inform workers either when the scan is discontinued due to limit node
> > or while waiting for parallel workers to finish.
>
> Isn't this why TupleQueueFunnelShutdown() calls shm_mq_detach()?
> That's supposed to unstick the workers; any impending or future writes
> will just return SHM_MQ_DETACHED without waiting.
>
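To make the ordering concrete, here is a rough sketch of the single
cleanup routine being discussed, reached both when ExecFunnel() sees
TupIsNull(slot) and from ExecEndFunnel().  The helper name and the
FunnelState field names are invented for illustration and the code is
not taken from the patch; only TupleQueueFunnelShutdown(),
WaitForParallelWorkersToFinish() and DestroyParallelContext() are the
actual functions we have been talking about:

    /* Illustrative sketch only, not patch code. */
    static void
    FunnelShutdownWorkers(FunnelState *node)    /* hypothetical helper */
    {
        if (node->pcxt == NULL)
            return;             /* already shut down */

        /*
         * Detach from the tuple queues first; a worker blocked on a full
         * queue then gets SHM_MQ_DETACHED on its next write instead of
         * waiting for the master backend to read.
         */
        TupleQueueFunnelShutdown(node->funnel);

        /* Only now is it safe to wait for the workers and tear down. */
        WaitForParallelWorkersToFinish(node->pcxt);
        DestroyParallelContext(node->pcxt);
        node->pcxt = NULL;
    }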
Okay, that can work if we call it in ExecEndNode() before
WaitForParallelWorkersToFinish().  However, what if we want to do
something like TupleQueueFunnelShutdown() as soon as the Limit node
decides to stop processing the outer node?  We could traverse the whole
plan tree and find the nodes where parallel workers need to be stopped,
but I don't think that's a good way to handle it.

If we don't stop the workers until ExecutorEnd()->ExecEndNode(), they
will keep running until that point, and it won't be easy to get the
instrumentation/buffer usage information from them (workers fill in that
information for the master backend only after their execution is
complete), because that information is collected before ExecutorEnd().
For Explain Analyze .., we can ensure that workers are stopped before
fetching that information from the Funnel node, but the same is not easy
for the buffer usage stats required by plugins, as those operate at the
ExecutorRun() and ExecutorFinish() level, where we don't have direct
access to node-level information.  You can refer to pgss_ExecutorEnd(),
which completes the storage of stats information before calling
standard_ExecutorEnd().

Offhand, I could not think of a good way to do this, but one crude way
could be to introduce a new API (ParallelExecutorEnd()) for such plugins,
which needs to be called before completing the stats accumulation.  This
API will call ExecEndPlan() if the parallelModeNeeded flag is set and
allow accumulation of stats (InstrStartNode()/InstrStopNode()); see the
rough sketch at the end of this mail.

>
> Well, all the workers restore that state in parallel, so adding it up
> across all workers doesn't really make sense.  But anyway, no, I don't
> think that's a big cost.  I think the big cost is going to the
> operating system overhead of process creation.  The new process will
> incur lots of page faults as it populates its address space and
> dirties pages marked copy-on-write.  That's where I expect most of the
> expense to be.
>

Okay, will remove parallel_startup_cost from the patch in the next version.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
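P.S.  To make the ParallelExecutorEnd() idea a little more concrete,
below is a rough sketch of how a pgss_ExecutorEnd()-style hook might use
it.  ParallelExecutorEnd() does not exist yet, and the surrounding code
is only an approximation of what pg_stat_statements does today, so
please treat this purely as an illustration of the calling convention:

    static void
    pgss_ExecutorEnd(QueryDesc *queryDesc)
    {
        /*
         * Proposed (not existing) call: stop the parallel workers and
         * pull their instrumentation/buffer usage into the master backend
         * before the totals below are read.
         */
        if (queryDesc->plannedstmt->parallelModeNeeded)
            ParallelExecutorEnd(queryDesc);

        if (queryDesc->totaltime)
        {
            InstrEndLoop(queryDesc->totaltime);
            /* ... store the per-query stats, as pg_stat_statements does ... */
        }

        standard_ExecutorEnd(queryDesc);
    }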