corasaurus-hex commented on issue #16821: URL: https://github.com/apache/datafusion/issues/16821#issuecomment-3094934112
I have a problem I'd love to solve, but I'm not sure how to go about it. I need to do a join across a time axis: an event in the past has a corresponding event between two dates in the future, field A is identical between the two events, and field B falls within one set of values for the past event and within another set of values for the future event. I believe that if I partitioned on field A and ordered by date, I could do the self-join manually far more efficiently than a generic self-join would.

I was thinking of putting this join behind a streaming table provider of some sort, so that consumers of the join can use the query engine to filter, group, and aggregate its results while I perform the more optimal join myself.

This seems to be in the same vein as what you're describing here: a query where knowledge of partitioning and ordering can make a big difference in performance. I'd love to see examples of something like this. Does that sound like what you're looking for?
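
To make the shape of the join concrete, here is a minimal sketch in plain Python, not DataFusion code. It assumes events are `(a, b, when)` tuples and that "between two dates in the future" means an offset window `[min_gap, max_gap]` after the past event. The function name, the tuple layout, and the sample data are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date, timedelta


def windowed_self_join(events, past_b, future_b, min_gap, max_gap):
    """Pair past events with later events that share the same field A.

    A pair (past, future) is emitted when past.b is in past_b, future.b is in
    future_b, and min_gap <= future.when - past.when <= max_gap.
    Use a positive min_gap so an event is never paired with itself.
    """
    # Partition on field A so only events with identical A are compared.
    partitions = defaultdict(list)
    for event in events:
        partitions[event[0]].append(event)

    pairs = []
    for rows in partitions.values():
        # Order by date: each past event's window is then a contiguous slice.
        rows.sort(key=lambda e: e[2])
        futures = [e for e in rows if e[1] in future_b]
        future_dates = [e[2] for e in futures]
        for past in rows:
            if past[1] not in past_b:
                continue
            # Binary search for the window bounds instead of scanning all rows.
            lo = bisect_left(future_dates, past[2] + min_gap)
            hi = bisect_right(future_dates, past[2] + max_gap)
            pairs.extend((past, future) for future in futures[lo:hi])
    return pairs


# Hypothetical sample data: field A is a user id, field B is an event type.
events = [
    ("u1", "signup", date(2024, 1, 1)),
    ("u1", "purchase", date(2024, 1, 10)),
    ("u1", "purchase", date(2024, 3, 1)),
    ("u2", "purchase", date(2024, 1, 5)),
]
pairs = windowed_self_join(
    events, {"signup"}, {"purchase"}, timedelta(days=1), timedelta(days=30)
)
```

With this sample data, only the `u1` signup and its January 10 purchase fall inside the window. Because each partition is sorted once, the lookup for each past event is a binary search plus the size of its matches, which is the saving over a generic self-join that the partitioning and ordering are meant to buy.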