Re: [I] RFC: What table provider features would be helpful in an example? [datafusion]

via GitHub Sun, 20 Jul 2025 17:22:13 -0700


corasaurus-hex commented on issue #16821:
URL: https://github.com/apache/datafusion/issues/16821#issuecomment-3094934112


   I have a problem I'd love to solve, but I'm not sure how to go about it. 
I need to do a self-join across a time axis: each event in the past has a 
corresponding event between two dates in the future, where field A is 
identical between the two events, field B is within one set of values for the 
past event, and field B is within another set of values for the future event. 
I believe that if I partitioned on field A and ordered by date, I could do the 
self-join manually far more efficiently than a generic self-join would.
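
   To make the idea concrete, here is a minimal sketch of the join I have in 
mind, using only the Rust standard library. It is not DataFusion code: the 
`Event` struct, the function names, and the `min_gap`/`max_gap` parameters 
(standing in for the "between two dates in the future" bounds) are all 
hypothetical, and it assumes `min_gap <= max_gap`. Once the events are sorted 
by (A, date), each partition can be joined with a sliding window whose bounds 
only ever move forward.

```rust
use std::collections::HashSet;

// Hypothetical event shape; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    a: u32,          // join key ("field A")
    ts: i64,         // event time, e.g. epoch seconds
    b: &'static str, // category ("field B")
}

/// Joins one partition. Assumes every event has the same `a`, the slice is
/// sorted by `ts`, and `min_gap <= max_gap`.
fn join_partition<'e>(
    events: &'e [Event],
    past_b: &HashSet<&str>,
    future_b: &HashSet<&str>,
    min_gap: i64,
    max_gap: i64,
) -> Vec<(&'e Event, &'e Event)> {
    let mut out = Vec::new();
    // Window [lo, hi) of candidate future events; both bounds only advance.
    let (mut lo, mut hi) = (0usize, 0usize);
    for past in events.iter().filter(|e| past_b.contains(e.b)) {
        // Skip events that are too early to be a match for `past`.
        while lo < events.len() && events[lo].ts < past.ts + min_gap {
            lo += 1;
        }
        if hi < lo {
            hi = lo;
        }
        // Extend the window up to the last event that is not too late.
        while hi < events.len() && events[hi].ts <= past.ts + max_gap {
            hi += 1;
        }
        for fut in &events[lo..hi] {
            if future_b.contains(fut.b) {
                out.push((past, fut));
            }
        }
    }
    out
}

/// Sorts by (a, ts), then joins each run of equal `a` independently.
fn partitioned_join(
    mut events: Vec<Event>,
    past_b: &HashSet<&str>,
    future_b: &HashSet<&str>,
    min_gap: i64,
    max_gap: i64,
) -> Vec<(Event, Event)> {
    events.sort_by_key(|e| (e.a, e.ts));
    let mut out = Vec::new();
    let mut start = 0;
    while start < events.len() {
        // Find the end of the current partition (same value of `a`).
        let mut end = start;
        while end < events.len() && events[end].a == events[start].a {
            end += 1;
        }
        let part = &events[start..end];
        for (p, f) in join_partition(part, past_b, future_b, min_gap, max_gap) {
            out.push((p.clone(), f.clone()));
        }
        start = end;
    }
    out
}
```

   Because the input is already partitioned and ordered, the window bounds 
never move backward within a partition, which is the property I'd hope a 
provider that declares its partitioning and ordering could exploit.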
   
   I was thinking I could put this join behind a streaming table provider of 
some sort, so that the consumer of the join can use the query engine to 
filter/group/aggregate its results while I keep control over how the join 
itself is performed.
   
   This problem seems to be in the same vein as what you're describing here? 
It's a query where knowledge of partitioning and ordering can make a big 
difference in performance. I'd love to see examples of something like this.
   
   Does that sound like what you're looking for?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

