mbutrovich commented on PR #3349:
URL: 
https://github.com/apache/datafusion-comet/pull/3349#issuecomment-3863033259

   > however some particular executor would use only few of transferred tasks? 
   
   Yep, exactly. Iceberg calculates all of the FileScanTasks for partitions, 
but Comet currently distributes all of them to every partition (like a 
broadcast).
   
   > This PR makes some FileScanTask distribution between executors, only 
needed are sent?
   
   Yep, exactly. An executor now only receives its partition's tasks.
   
   
   > if so what is the distribution algorithm and do you envision any shuffle 
increase if executors reads not collocated data?
   
   That's implementation defined. In this case Iceberg does its own bin-packing 
algorithm and has its own configs for target split size (amount of data per 
task) to read. We respect and that and accept the distribution that Iceberg 
generates both at planning time (initial tasks) and after subqueries and DPP 
(if applicable). Iceberg handles it all, and Comet just accepts them. If the 
Iceberg bin-packing/task allocation algorithm changes, Comet will be oblivious.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to