MisterRaindrop commented on PR #1571:
URL: https://github.com/apache/cloudberry/pull/1571#issuecomment-3897151327

   > Overall, FDW parallel scan is a direction worth exploring, but this 
approach is too rough. The core problems are:
   > 
   > 1. locus transition semantics for Gather in an MPP context haven't been 
thought through, and the changes are too broad.
   > 2. FDW is a black box from the database's perspective.
   >    For heap tables we have parallel scan (divide work by pages), for 
AO/AOCS we have parallel scan (divide work by files) — the work partitioning is 
well-defined.
   >    But for FDWs, the parallel behavior depends entirely on the FDW's own 
implementation. If an FDW (say file_fdw) sets parallel_safe = true following 
planner's parallel logic but doesn't actually implement the DSM parallel 
callbacks (EstimateDSMForeignScan, InitializeDSMForeignScan, 
InitializeWorkerForeignScan), then multiple workers will each scan the full 
dataset, producing duplicate rows.
   
   I'm not very familiar with Cloudberry. Still learning.
   
   An FDW is indeed a black box: its behavior largely depends on how the 
user implements it. My understanding is that users need to take 
responsibility for their own implementations. Additionally, I should enable 
Gather only for FDW scans and leave it false in all other cases. How would 
this affect the parallel processing advantages of PostgreSQL?
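
   To make the duplicate-row concern concrete, here is a minimal sketch in 
plain Python. It is only a simulation, not PostgreSQL or Cloudberry code: 
`SharedScanState`, `scan_uncoordinated`, and `scan_coordinated` are 
hypothetical names, and the shared cursor merely stands in for whatever state 
an FDW would place in DSM through the callbacks mentioned in the review.

```python
from dataclasses import dataclass

ROWS = list(range(10))  # stand-in for the foreign data set


@dataclass
class SharedScanState:
    """Hypothetical stand-in for state that parallel workers share via DSM."""
    next_row: int = 0

    def claim(self, limit):
        """Return the next unclaimed row index, or None when exhausted."""
        if self.next_row >= limit:
            return None
        idx = self.next_row
        self.next_row += 1
        return idx


def scan_uncoordinated(num_workers):
    """Each worker scans the whole data set on its own, as could happen when
    a scan is marked parallel safe but the workers share no scan state."""
    out = []
    for _ in range(num_workers):
        out.extend(ROWS)  # every worker returns every row
    return out


def scan_coordinated(num_workers):
    """Workers take turns claiming rows from one shared cursor."""
    shared = SharedScanState()
    out = []
    active = True
    while active:
        active = False
        for _ in range(num_workers):
            idx = shared.claim(len(ROWS))
            if idx is not None:
                out.append(ROWS[idx])
                active = True
    return out


if __name__ == "__main__":
    print(len(scan_uncoordinated(3)), "rows without shared state")
    print(len(scan_coordinated(3)), "rows with a shared cursor")
```

   With three workers, the uncoordinated scan returns every row three times, 
while the coordinated scan returns each row exactly once. Real workers run 
concurrently, so the shared cursor would need an atomic or locked update; the 
round-robin loop here sidesteps that for brevity.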
   
   
   Additionally, I've looked into other aspects of FDW parallelism. Currently, 
it seems there is no optimal solution.
   
   So, should we aim to implement parallelism that is transparent to users, 
or are there better approaches? Could you share some ideas?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

