timsaucer commented on issue #16821: URL: https://github.com/apache/datafusion/issues/16821#issuecomment-3161406074
> I have a problem I'd love to solve but I'm not exactly sure how to go about it. My issue is I need to do a join across a time axis, where an event in the past has a corresponding event between two dates in the future, and where field A is identical between the two events and field B is within some set of values in the past and within another set of values in the future. I believe if I partitioned on field A and ordered by date then I could do the self-join manually with far more efficiency than a more generic self-join.

This sounds very interesting. Can we make it a concrete example? I think I'm missing part of what the output would look like. Suppose I had this data frame:

```
+------------+------+-------+---------+
| event      | time | price | acct_nr |
+------------+------+-------+---------+
| purchase-1 | 1    | 90.0  | 429     |
| sale-2     | 2    | 135.0 | 184     |
| sale-3     | 3    | 150.0 | 129     |
| purchase-1 | 4    | 100.0 | 584     |
| sale-2     | 5    | 125.0 | 231     |
+------------+------+-------+---------+
```

Suppose I then did the self join you're describing, where `event` is the common field A and I only want the cases where the price goes up from the early time to the late time. This would yield:

```
+------------+------------+-------------+---------------+-----------+------------+--------------+
| event      | early_time | early_price | early_acct_nr | late_time | late_price | late_acct_nr |
+------------+------------+-------------+---------------+-----------+------------+--------------+
| purchase-1 | 1          | 90.0        | 429           | 4         | 100.0      | 584          |
+------------+------------+-------------+---------------+-----------+------------+--------------+
```

I added in an extra piece of data because I didn't know everything the self join would entail. Do you want something that ends up emitting only a subset of the data? If you have a real-world use case that is more compelling, that would be helpful.
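For reference, the generic self-join on the sample data above can be sketched in plain SQL. The snippet below runs it with Python's built-in `sqlite3` module purely as a stand-in engine; it is not DataFusion code, and the table name `events` and the aliases `e` and `l` are assumptions made for illustration. It also shows only the generic form of the join, not the partition-and-sort optimization discussed in the quoted text.

```python
import sqlite3

# Sample rows from the data frame above: (event, time, price, acct_nr).
rows = [
    ("purchase-1", 1, 90.0, 429),
    ("sale-2", 2, 135.0, 184),
    ("sale-3", 3, 150.0, 129),
    ("purchase-1", 4, 100.0, 584),
    ("sale-2", 5, 125.0, 231),
]

# In-memory database; "events" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event TEXT, time INTEGER, price REAL, acct_nr INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

# Generic self-join: same event, earlier row paired with a later row
# whose price is strictly higher.
query = """
SELECT e.event,
       e.time    AS early_time,
       e.price   AS early_price,
       e.acct_nr AS early_acct_nr,
       l.time    AS late_time,
       l.price   AS late_price,
       l.acct_nr AS late_acct_nr
FROM events AS e
JOIN events AS l
  ON e.event = l.event
 AND e.time  < l.time
 AND e.price < l.price
ORDER BY e.event, e.time, l.time
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

On this data the only pair that satisfies all three conditions is the `purchase-1` pair; `sale-2` is excluded because its price falls from 135.0 to 125.0.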