On Thu, May 7, 2015 at 3:23 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> > I observed one issue while working on this review comment.  When we
>> > try to destroy the parallel setup via ExecEndNode (as due to Limit
>> > Node, it could not destroy after consuming all tuples), it waits for
>> > parallel workers to finish (WaitForParallelWorkersToFinish()) and
>> > parallel workers are waiting for the master backend to signal them, as
>> > their queue is full.  I think in such a case the master backend needs
>> > to inform workers either when the scan is discontinued due to the
>> > limit node or while waiting for parallel workers to finish.
>>
>> Isn't this why TupleQueueFunnelShutdown() calls shm_mq_detach()?
>> That's supposed to unstick the workers; any impending or future writes
>> will just return SHM_MQ_DETACHED without waiting.
>
> Okay, that can work if we call it in ExecEndNode() before
> WaitForParallelWorkersToFinish(), however what if we want to do something
> like TupleQueueFunnelShutdown() when the Limit node decides to stop
> processing the outer node.  We can traverse the whole plan tree and find
> the nodes where parallel workers need to be stopped, but I don't think
> that's a good way to handle it.  If we don't want to stop workers from
> processing until ExecutorEnd()--->ExecEndNode(), then it will lead to
> workers continuing till that time, and it won't be easy to get
> instrumentation/buffer usage information from workers (workers fill in
> such information for the master backend after execution is complete) as
> that is done before ExecutorEnd().  For Explain Analyze .., we can ensure
> that workers are stopped before fetching that information from the Funnel
> node, but the same is not easy for the buffer usage stats required by
> plugins, as that operates at the ExecutorRun() and ExecutorFinish() level
> where we don't have direct access to node-level information.  You can
> refer to pgss_ExecutorEnd(), where it completes the storage of stats
> information before calling ExecutorEnd().  Offhand, I could not think of
> a good way to do this, but one crude way could be to introduce a new API
> (ParallelExecutorEnd()) for such plugins which needs to be called before
> completing the stats accumulation.  This API will call ExecEndPlan() if
> the parallelmodeNeeded flag is set and allow accumulation of stats
> (InstrStartNode()/InstrStopNode()).
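
Just to spell out the unsticking behavior I was relying on there: once the
master detaches its end of a queue, any worker that is blocked trying to
write to that queue (or that tries to write to it later) gets back
SHM_MQ_DETACHED from shm_mq_send() instead of waiting, so it can simply
stop producing tuples.  Very roughly, I'd expect the worker-side send path
to look something like this -- purely an illustrative sketch, not the
actual tqueue code, and the function name is made up:

#include "postgres.h"
#include "access/htup.h"
#include "storage/shm_mq.h"

/*
 * Illustrative only: send one tuple to the master backend's tuple queue.
 * If the master has already detached (e.g. because the Funnel node was
 * shut down), shm_mq_send() returns SHM_MQ_DETACHED instead of blocking,
 * so the worker can give up quietly.
 */
static bool
send_tuple_to_master(shm_mq_handle *mqh, HeapTuple tuple)
{
    shm_mq_result result;

    /* nowait = false: block until there is room in the queue */
    result = shm_mq_send(mqh, tuple->t_len, tuple->t_data, false);

    if (result == SHM_MQ_DETACHED)
        return false;           /* receiver is gone; caller should stop */

    /* a blocking send can't return SHM_MQ_WOULD_BLOCK */
    Assert(result == SHM_MQ_SUCCESS);
    return true;
}
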
OK, so if I understand you here, the problem is what to do about an
"orphaned" worker.  The Limit node just stops fetching from the lower
nodes, and those nodes don't get any clue that this has happened, so their
workers just sit there until the end of the query.  Of course, that happens
already, but it doesn't usually hurt very much, because the Limit node
usually appears at or near the top of the plan.

It could matter, though.  Suppose the Limit is for a subquery that has a
Sort somewhere (not immediately) beneath it.  My guess is the Sort's
tuplestore will stick around, after the subquery finishes executing, for as
long as the top-level query is executing, which in theory could be a huge
waste of resources.  In practice, I guess people don't really write queries
that way; if they did, I think we'd already have developed some general
method for dealing with this sort of problem.

I think it might be better to try to solve this problem in a more localized
way.  Can we arrange for planstate->instrumentation to point directly into
the DSM, instead of copying the data over later?  That seems like it might
help, or perhaps there's another approach.
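
To make that a bit more concrete, here's the kind of arrangement I have in
mind -- just a sketch, not real code.  SharedInstrumentation,
attach_node_instrumentation, worker_number, and plan_node_id are all
made-up names, and the PlanState field is actually called instrument:

#include "postgres.h"
#include "executor/instrument.h"
#include "nodes/execnodes.h"

/*
 * Made-up structure: one Instrumentation slot per (worker, plan node)
 * pair, allocated in the DSM segment, so nothing needs to be copied back
 * to the master and each slot has only a single writer.
 */
typedef struct SharedInstrumentation
{
    int             nworkers;
    int             nnodes;
    Instrumentation instrument[FLEXIBLE_ARRAY_MEMBER];
} SharedInstrumentation;

/*
 * In the worker, during executor startup, point the node's instrumentation
 * at its slot in shared memory instead of allocating it locally.  This
 * assumes some stable per-node numbering (plan_node_id here is invented).
 */
static void
attach_node_instrumentation(PlanState *planstate, SharedInstrumentation *si,
                            int worker_number, int plan_node_id)
{
    Assert(worker_number < si->nworkers && plan_node_id < si->nnodes);
    planstate->instrument =
        &si->instrument[worker_number * si->nnodes + plan_node_id];
}

The master could then read, or total up, those slots whenever it likes --
in particular before ExecutorEnd() -- which would sidestep the ordering
problem with pgss_ExecutorEnd() that you describe above.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company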