On Thu, Oct 1, 2015 at 7:41 PM, Robert Haas <robertmh...@gmail.com> wrote: > > > Does heap_parallelscan_nextpage really need a pscan_finished output > parameter, or can it just return InvalidBlockNumber to indicate end of > scan? I think the latter can be done and would be cleaner. >
I think we can do that way as well, however we have to keep the check of page == InvalidBlockNumber at all the callers to indicate finish of scan which made me write the function as it exists in patch. However, I don't mind changing it the way you have suggested if you feel that would be cleaner. > There doesn't seem to be anything that ensures that everybody who's > scanning the relation agrees on whether we're doing a synchronized > scan. > I think that is implicitly ensured. We perform sync scan's based on GUC synchronize_seqscans and few other conditions in initscan which I think will ensure that all workers will perform sync scans on relation. Do you see any problem with exiting logic which would break the guarantee of sync scans on a relation among parallel workers? > I think that heap_parallelscan_initialize() should taken an > additional Boolean argument, allow_sync. It should decide whether to > actually perform a syncscan using the logic from initscan(), and then > it should store a phs_syncscan flag into the ParallelHeapScanDesc. > heap_beginscan_internal should set rs_syncscan based on phs_syncscan, > regardless of anything else. > I think this will ensure that any future change in this area won't break the guarantee for sync scans for parallel workers, is that the reason you prefer to implement this function in the way suggested by you? > I think heap_parallel_rescan() is an unworkable API. When rescanning > a synchronized scan, the existing logic keeps the original start-block > so that the scan's results are reproducible, It seems from the code comments in initscan, the reason for keeping previous startblock is to allow rewinding the cursor which doesn't hold for parallel scan. We might or might not want to support such cases with parallel query, but even if we want to there is a way we can do with current logic (as mentioned in one of my replies below). > but no longer reports the > scan position since we're presumably out of step with other backends. Is it true for all form of rescans or are you talking about some case (like SampleScan) in particular? As per my reading of code (and verified by debugging that code path), it doesn't seem to be true for rescan in case of seqscan. > This isn't going to work at all with this API. I don't think you can > swap out the ParallelHeapScanDesc for another one once you've > installed it; > Sure, but if we really need some such parameters like startblock position, then we can preserve those in ScanDesc. > > This is obviously going to present some design complications for the > as-yet-uncommitted code to push down PARAM_EXEC parameters, because if > the new value takes more bytes to store than the old value, there > won't be room to update the existing DSM in place. > PARAM_EXEC parameters will be used to support initPlan in parallel query (as it is done in the initial patch), so it seems to me this is the main downside of this approach. I think rather than trying to come up with various possible solutions for this problem, lets prohibit sync scans with parallel query if you see some problem with the suggestions made by me above. Do you see any main use case getting hit due to non support of sync scans with parallel seq. scan? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com