avamingli commented on code in PR #1571:
URL: https://github.com/apache/cloudberry/pull/1571#discussion_r2793268119


##########
src/backend/optimizer/util/pathnode.c:
##########
@@ -3019,6 +3018,16 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, 
Path *subpath,
        pathnode->path.parallel_workers = 0;
        pathnode->path.pathkeys = NIL;  /* Gather has unordered result */
 
+       /* Inherit locus from subpath — Gather collects within the same segment,
+        * data distribution across segments doesn't change. */
+       pathnode->path.locus = subpath->locus;
+       pathnode->path.locus.parallel_workers = 0;      /* Gather output is 
single-stream */
+
+       pathnode->path.motionHazard = subpath->motionHazard;
+       pathnode->path.barrierHazard = subpath->barrierHazard;
+       pathnode->path.rescannable = false;
+       pathnode->path.sameslice_relids = subpath->sameslice_relids;
+

Review Comment:
   create_gather_path was guarded by Assert(false) for a reason — the locus 
semantics of Gather in an MPP context were never fully defined or implemented.
   
   The comment is wrong, and the conclusion is wrong. 
   Gather does change the data distribution. 
   Simplest example: a Hash-distributed table's parallel partial scan has locus 
HashWorkers (data is hash-distributed across segments, and within each segment 
it's split across workers). 
   Gather collects all workers' data back to the leader process, so the locus 
should become Hash. You can't just copy the subpath locus and zero out 
parallel_workers — that's incorrect for HashWorkers , SegmentGeneral, and might 
be other locus types as well.
   Also, this change is global — create_gather_path isn't FDW-specific. Once 
the Assert is gone, every code path that calls this function is affected. The 
hardest case is JOINs — mixing Gather with CBDB-style parallelism 
(Motion/slice) introduces a ton of problems. This PR doesn't seem to have 
considered any of that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to