On Thu, Oct 1, 2015 at 7:52 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: > On Wed, Sep 30, 2015 at 7:05 AM, Robert Haas <robertmh...@gmail.com> wrote: >> On Thu, Sep 24, 2015 at 2:31 PM, Amit Kapila <amit.kapil...@gmail.com> >> wrote: >> > [ parallel_seqscan_partialseqscan_v18.patch ] >> >> I spent a bit of time reviewing the heapam.c changes in this patch >> this evening, and I think that your attempt to add support for >> synchronized scans has some problems. > > Thanks for the review and I agree with all the suggestions provided > by you. Fixed all of them in attached patch > (parallel_seqscan_heapscan_v1.patch).
Thanks. Does heap_parallelscan_nextpage really need a pscan_finished output parameter, or can it just return InvalidBlockNumber to indicate end of scan? I think the latter can be done and would be cleaner. There doesn't seem to be anything that ensures that everybody who's scanning the relation agrees on whether we're doing a synchronized scan. I think that heap_parallelscan_initialize() should taken an additional Boolean argument, allow_sync. It should decide whether to actually perform a syncscan using the logic from initscan(), and then it should store a phs_syncscan flag into the ParallelHeapScanDesc. heap_beginscan_internal should set rs_syncscan based on phs_syncscan, regardless of anything else. I think heap_parallel_rescan() is an unworkable API. When rescanning a synchronized scan, the existing logic keeps the original start-block so that the scan's results are reproducible, but no longer reports the scan position since we're presumably out of step with other backends. This isn't going to work at all with this API. I don't think you can swap out the ParallelHeapScanDesc for another one once you've installed it; the point of a rescan is that you are letting the HeapScanDesc (or ParallelHeapScanDesc) hold onto some state from the first time, and that doesn't work at all here. So, I think this function should just go away, and callers should be able to just use heap_rescan(). Now this presents a bit of a problem for PartialSeqScan, because, on a rescan, nodeGather.c completely destroys the DSM and creates a new one. I think we're going to need to change that. I think we can adapt the parallel context machinery so that after WaitForParallelWorkersToFinish(), you can again call LaunchParallelWorkers(). (That might already work, but I wouldn't be surprised if it doesn't quite work.) This would make rescans somewhat more efficient because we wouldn't have to destroy and re-create the DSM each time. It means that the DSM would have to stick around until we're totally done with the query, rather than going away when ExecGather() returns the last tuple, but that doesn't sound too bad. We can still clean up the workers when we've returned all the tuples, which I think is the most important thing. This is obviously going to present some design complications for the as-yet-uncommitted code to push down PARAM_EXEC parameters, because if the new value takes more bytes to store than the old value, there won't be room to update the existing DSM in place. There are several possible solutions to that problem, but the one that appeals to me most right now is just don't generate plans that would require that feature. It doesn't seem particularly appealing to me to put a Gather node on the inner side of a NestLoop -- at least not until we can execute that without restarting workers, which we're certainly some distance from today. And maybe not even then. For initPlans, the existing code might be adequate, because I think we never re-evaluate those. And for subPlans, we can potentially handle cases where each worker can evaluate the subPlan separately below the Gather; we just can't handle cases where the subPlan attaches above the Gather and is used below it. Or, we can get around these limitations by redesigning the PARAM_EXEC pushdown mechanism in some way. But even if we don't, it's not crippling. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers