Re: [I] support dynamic filtering on partitioned data from file source [datafusion]

via GitHub Tue, 31 Mar 2026 11:13:23 -0700


alamb commented on issue #20195:
URL: https://github.com/apache/datafusion/issues/20195#issuecomment-4164493836


   > I guess the question depends on what the alternative is. As far as I can 
remember / think of, the two alternatives would be to try every partition's hash 
table (poor build side performance and questionable probe side performance) or 
to make a combined hash table with all values (poor memory use).
   
   
   I think in certain circumstances you only have to try a single partition's 
hash table.
   
   For example:
   1. The join is an equi-join (equality predicate)
   2. The inputs have exactly the same hash (or range) partitioning on the 
columns used as join keys
   
   In that case, there is exactly one partition's hash table where matching 
tuples could be (if they exist at all).
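   A minimal Rust sketch of the argument above. It assumes both join inputs are 
hash partitioned with the same function on the join key. `partition_for`, 
`build_partitioned`, `probe_all` and `probe_one` are hypothetical helpers for 
illustration, not DataFusion APIs. Because the probe side uses the same 
partitioning function as the build side, a probe key can only match rows in the 
partition it hashes to, so checking that one table gives the same answer as 
checking every table.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical hash partitioning function. Both inputs are ASSUMED to be
/// partitioned with exactly this function on the join key.
fn partition_for(key: i64, num_partitions: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % num_partitions as u64) as usize
}

/// Build one hash table per partition: join key -> build-side row indices.
fn build_partitioned(build_keys: &[i64], num_partitions: usize) -> Vec<HashMap<i64, Vec<usize>>> {
    let mut tables: Vec<HashMap<i64, Vec<usize>>> = vec![HashMap::new(); num_partitions];
    for (row, &key) in build_keys.iter().enumerate() {
        tables[partition_for(key, num_partitions)]
            .entry(key)
            .or_insert_with(Vec::new)
            .push(row);
    }
    tables
}

/// Without co-partitioning: every partition's hash table must be tried.
fn probe_all(tables: &[HashMap<i64, Vec<usize>>], key: i64) -> Vec<usize> {
    tables.iter().filter_map(|t| t.get(&key)).flatten().copied().collect()
}

/// With identical partitioning on the join keys: only one table can match.
fn probe_one(tables: &[HashMap<i64, Vec<usize>>], key: i64) -> Vec<usize> {
    let p = partition_for(key, tables.len());
    tables[p].get(&key).cloned().unwrap_or_default()
}

fn main() {
    let build_keys = [1_i64, 2, 3, 2, 10];
    let tables = build_partitioned(&build_keys, 4);
    for key in [2_i64, 5, 10] {
        let one = probe_one(&tables, key);
        // Probing the single routed partition finds exactly what a full scan finds.
        assert_eq!(one, probe_all(&tables, key));
        println!("key {key}: rows {one:?}");
    }
}
```

   This prints `key 2: rows [1, 3]`, `key 5: rows []` and `key 10: rows [4]`. 
The saving only holds when both sides really share the same partitioning on the 
join columns; if the probe side were partitioned differently, `probe_one` could 
miss matches and the full scan in `probe_all` (or a combined table) would be 
needed.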


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

