avamingli commented on code in PR #1571:
URL: https://github.com/apache/cloudberry/pull/1571#discussion_r2793268119
##########
src/backend/optimizer/util/pathnode.c:
##########
@@ -3019,6 +3018,16 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath,
pathnode->path.parallel_workers = 0;
pathnode->path.pathkeys = NIL; /* Gather has unordered result */
+ /* Inherit locus from subpath — Gather collects within the same segment,
+ * data distribution across segments doesn't change. */
+ pathnode->path.locus = subpath->locus;
+ pathnode->path.locus.parallel_workers = 0; /* Gather output is
single-stream */
+
+ pathnode->path.motionHazard = subpath->motionHazard;
+ pathnode->path.barrierHazard = subpath->barrierHazard;
+ pathnode->path.rescannable = false;
+ pathnode->path.sameslice_relids = subpath->sameslice_relids;
+
Review Comment:
create_gather_path was guarded by Assert(false) for a reason — the locus
semantics of Gather in an MPP context were never fully defined or implemented.
The comment is wrong, and the conclusion is wrong.
Gather does change the data distribution.
Simplest example: a Hash-distributed table's parallel partial scan has locus
HashWorkers (data is hash-distributed across segments, and within each segment
it's split across workers).
Gather collects all workers' data back to the leader process, so the locus
should become Hash. You can't just copy the subpath locus and zero out
parallel_workers — that's incorrect for Strewn, Replicated, SegmentGeneral, and
other locus types as well.
Also, this change is global — create_gather_path isn't FDW-specific. Once
the Assert is gone, every code path that calls this function is affected. The
hardest case is JOINs — mixing Gather with CBDB-style parallelism
(Motion/slice) introduces a ton of problems. This PR doesn't seem to have
considered any of that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]