MisterRaindrop commented on PR #1571: URL: https://github.com/apache/cloudberry/pull/1571#issuecomment-3950109069
> > Are the variables ParallelWorkerNumberOfSlice and TotalParallelWorkerNumberOfSlice stable and reliable under the current CBDB parallel framework?
>
> Yes, they are stable and reliable under the current CBDB parallel framework. But I'm not sure how you plan to use them.
>
> > During execution, the FDW calculates the virtual segment ID based on these two values, modifies the HTTP header sent to PXF, and allows PXF's round-robin sharding mechanism to automatically distribute data evenly among all gang workers.
>
> I'm not entirely sure I follow: isn't this essentially how MPP PXF works today? except `the virtual segment ID based on these two values` -- not sure, off the hand I think it's not enough, different Slice on same Segment could have same parallel workers.

Yes, essentially. It reuses PXF's existing MPP round-robin sharding mechanism: by modifying the segment ID and segment count in the HTTP header, PXF can distribute data across N×W gang workers (N physical segments, W parallel workers per segment) instead of across N physical segments. No changes are required on the PXF server side.

Regarding ParallelWorkerNumberOfSlice: judging from the assignment logic in parallel.c, workers on the same segment are numbered incrementally via the DSM entry (0, 1, 2, ...), so the value should be unique. However, I want to confirm: in the CBDB parallel framework, is this value guaranteed to be unique within the same slice on the same segment?
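
To make the uniqueness question concrete, here is a minimal sketch of one way the mapping could work. It is illustrative only: the formula `seg_id * W + worker_no`, the function names, and the simplified modulo round-robin are assumptions for discussion, not code from this PR or from PXF.

```python
def virtual_segment(seg_id, seg_count, worker_no, workers_total):
    """Map one gang worker to a (virtual_id, virtual_count) pair.

    Assumed formula (not from the PR): each physical segment gets a
    contiguous block of `workers_total` virtual IDs.
    """
    if not 0 <= seg_id < seg_count:
        raise ValueError("seg_id out of range")
    if not 0 <= worker_no < workers_total:
        raise ValueError("worker_no out of range")
    return seg_id * workers_total + worker_no, seg_count * workers_total


def owns_fragment(fragment_index, virtual_id, virtual_count):
    """Simplified round-robin: fragment i goes to virtual segment i % count."""
    return fragment_index % virtual_count == virtual_id


# Example: N = 3 physical segments, W = 2 parallel workers per segment.
N, W = 3, 2
owners = {}
for seg in range(N):
    for worker in range(W):
        vid, vcount = virtual_segment(seg, N, worker, W)
        owners[(seg, worker)] = [
            f for f in range(12) if owns_fragment(f, vid, vcount)
        ]
```

Under these assumptions every fragment is read by exactly one worker only if each `(seg_id, worker_no)` pair is unique. If two workers in the same slice on the same segment ever reported the same worker number, they would compute the same virtual ID and read the same fragments twice, which is why the guarantee matters.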
