On Tue, Aug 8, 2023 at 11:16 PM Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Aug 8, 2023 at 2:58 AM Amit Langote <amitlangot...@gmail.com> wrote:
> > Or we could consider something like the patch I mentioned in my first
> > email.  The idea there was to pass the pruning result via a separate
> > channel, not the DSM chunk linked into the PlanState tree.  To wit, on
> > the leader side, ExecInitParallelPlan() puts the serialized
> > List-of-Bitmapset into the shm_toc with a dedicated PARALLEL_KEY,
> > alongside PlannedStmt, ParamListInfo, etc.  The List-of-Bitmapset is
> > initialized during the leader's ExecInitNode().  On the worker side,
> > ExecParallelGetQueryDesc() reads the List-of-Bitmapset string and puts
> > the resulting node into the QueryDesc, which ParallelQueryMain() then
> > uses to do ExecutorStart(), which copies the pointer to
> > EState.es_part_prune_results.  ExecInitAppend() consults
> > EState.es_part_prune_results and uses the Bitmapset from there, if
> > present, instead of performing initial pruning.
>
> This also seems reasonable.
>
> > I'm assuming it's not
> > too ugly if ExecInitAppend() uses IsParallelWorker() to decide whether
> > it should be writing to EState.es_part_prune_results or reading from
> > it -- the former if in the leader and the latter in a worker.
>
> I don't think that's too ugly.  I mean you have to have an if statement
> someplace.
Yes, that makes sense.  It's just that I worried I hadn't considered the
options hard enough before adding a new IsParallelWorker() call, given
that there are few existing uses of IsParallelWorker() in the generic
executor code.  I think that's because most parallel-worker-specific
logic lives in execParallel.c or in Exec*Worker() functions outside that
file, so the generic code stays as parallel-query-agnostic as possible.

> > If we
> > are to go with this approach we will need to un-revert ec386948948c,
> > which moved PartitionPruneInfo nodes out of Append/MergeAppend nodes
> > to a List in PlannedStmt (copied into EState.es_part_prune_infos),
> > such that es_part_prune_results mirrors es_part_prune_infos.
>
> The comment for the revert (which was
> 5472743d9e8583638a897b47558066167cc14583) points to
> https://www.postgresql.org/message-id/4191508.1674157...@sss.pgh.pa.us
> as the reason, but it's not very clear to me why that email led to
> this being reverted.  In any event, I agree that if we go with your
> idea to pass this via a separate PARALLEL_KEY, unreverting that patch
> seems to make sense, because otherwise I think we don't have a fast
> way to find the nodes that contain the state that we care about.

OK, I've attached the un-reverted patch that adds
EState.es_part_prune_infos as 0001.

0002 adds EState.es_part_prune_results.  The parallel query leader, after
performing the initial pruning steps contained in a given
PartitionPruneInfo, stores the bitmapset of initially valid subplans into
that list, at the same index as the PartitionPruneInfo's index in
es_part_prune_infos.  ExecInitParallelPlan() serializes
es_part_prune_results and stores it in the DSM.  A worker initializes
es_part_prune_results in its own EState by reading the leader's value
from the DSM, and then, for each PartitionPruneInfo in its own copy of
EState.es_part_prune_infos, obtains the set of initially valid subplans
by consulting es_part_prune_results instead of performing the initial
pruning again.  Rough sketches of the DSM handoff and of the
ExecInitAppend() logic follow below.

Should workers, as Tom suggests, instead redo the pruning and cross-check
the result, raising an error if it doesn't match the leader's?  The error
message can't point out which function is at fault, though the user would
at least know that some function in their database has an incorrect
volatility marking.  A sketch of that variant is at the end.
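To make the 0002 handoff concrete, here is roughly the shape I have in
mind (an untested sketch, not the patch itself; PARALLEL_KEY_PRUNE_RESULTS
and its value, and the QueryDesc field name, are placeholders):

    /*
     * Leader, in ExecInitParallelPlan().  The estimate calls go before
     * InitializeParallelDSM(), the allocate/insert calls after it.
     */
    #define PARALLEL_KEY_PRUNE_RESULTS  UINT64CONST(0xE00000000000000C)

        char       *prune_data;
        Size        prune_len;
        char       *prune_space;

        /* es_part_prune_results was filled by the leader's ExecInitNode() */
        prune_data = nodeToString(estate->es_part_prune_results);
        prune_len = strlen(prune_data) + 1;

        shm_toc_estimate_chunk(&pcxt->estimator, prune_len);
        shm_toc_estimate_keys(&pcxt->estimator, 1);
        ...
        prune_space = shm_toc_allocate(pcxt->toc, prune_len);
        memcpy(prune_space, prune_data, prune_len);
        shm_toc_insert(pcxt->toc, PARALLEL_KEY_PRUNE_RESULTS, prune_space);

    /*
     * Worker, in ExecParallelGetQueryDesc().  The QueryDesc field is
     * likewise hypothetical; ExecutorStart() would copy it into
     * EState.es_part_prune_results.
     */
        char       *prune_str;

        prune_str = shm_toc_lookup(toc, PARALLEL_KEY_PRUNE_RESULTS, false);
        queryDesc->part_prune_results = (List *) stringToNode(prune_str);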
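And the if-statement in ExecInitAppend() would look something like this
(again only a sketch; part_prune_index comes from un-reverted 0001, and
it assumes the leader initializes Append nodes in es_part_prune_infos
order so that the list indexes line up):

        Bitmapset  *validsubplans;

        if (IsParallelWorker() && estate->es_part_prune_results != NIL)
        {
            /*
             * Worker: reuse the leader's result.  es_part_prune_results
             * parallels es_part_prune_infos, so the node's
             * part_prune_index identifies the right element.
             */
            validsubplans = (Bitmapset *)
                list_nth(estate->es_part_prune_results,
                         node->part_prune_index);
        }
        else
        {
            /* Leader (or non-parallel query): prune as we do today ... */
            prunestate = ExecInitPartitionPruning(&appendstate->ps,
                                                  list_length(node->appendplans),
                                                  pruneinfo, &validsubplans);

            /* ... and publish the result for ExecInitParallelPlan(). */
            estate->es_part_prune_results =
                lappend(estate->es_part_prune_results, validsubplans);
        }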
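If we went with the cross-check idea instead, the worker branch might
become something like this (the message wording is only a first stab):

            Bitmapset  *leader_result;

            leader_result = (Bitmapset *)
                list_nth(estate->es_part_prune_results,
                         node->part_prune_index);

            /* Redo the initial pruning in the worker ... */
            prunestate = ExecInitPartitionPruning(&appendstate->ps,
                                                  list_length(node->appendplans),
                                                  pruneinfo, &validsubplans);

            /* ... and insist that it match what the leader computed. */
            if (!bms_equal(validsubplans, leader_result))
                ereport(ERROR,
                        (errmsg("initial pruning result in parallel worker does not match the leader's"),
                         errhint("This can happen when functions used by the query are labeled with the wrong volatility.")));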
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

v1-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pla.patch
Description: Binary data
v1-0002-Share-initial-pruning-result-between-parallel-que.patch
Description: Binary data