Cool KIP, Walker! Thanks for sharing this proposal. A few clarifications:
1. Is the order in which records exit the buffer necessarily the same as the order in which they enter it? Based on the description in the KIP, it sounds like the answer is no, i.e., records will exit the buffer in increasing timestamp order, which means that they may be reordered (even for the same key) relative to the input order.

2. What happens if the join grace period is nonzero, and a stream-side record arrives with a timestamp that is older than the current stream time minus the grace period? Will this record trigger a join result, or will it be dropped? Based on the description of what happens when the join grace period is set to zero, it sounds like the late record will be dropped, even if the join grace period is nonzero. Is that true?

3. What could cause stream time to advance, for purposes of removing records from the join buffer? For example, will new records arriving on the table side of the join cause stream time to advance? From the KIP it sounds like only stream-side records will advance stream time -- does that mean that the join processor itself will have to track this stream time?

Also +1 to Lucas's question about what options will be available for configuring the join buffer. Will users have the option to choose whether they want the buffer to be in-memory vs persistent?

- Victoria

On Fri, Apr 28, 2023 at 11:54 AM Lucas Brutschy <lbruts...@confluent.io.invalid> wrote:

> Hi Walker,
>
> thanks for the KIP! We definitely need this. I have two questions:
>
> - Have you considered allowing the customization of the underlying
> buffer implementation? As I can see, `StreamJoined` lets you customize
> the underlying store via a `WindowStoreSupplier`. Would it make sense
> for `Joined` to have this as well? I can imagine one may want to limit
> the number of records in the buffer, for example. If we hit the
> maximum, the only option would be to drop semantic guarantees, but
> users may still want to do this.
> - With "second option on the table side" you are referring to
> versioned tables, right? Will the buffer on the stream side behave any
> different whether the table side is versioned or not?
>
> Finally, I think a simple example in the motivation section could help
> non-experts understand the KIP.
>
> Best,
> Lucas
>
> On Tue, Apr 25, 2023 at 9:13 PM Walker Carlson
> <wcarl...@confluent.io.invalid> wrote:
> >
> > Hello everybody,
> >
> > I have a stream proposal to improve the stream table join by adding a grace
> > period and buffer to the stream side of the join to allow processing in
> > timestamp order matching the recent improvements of the versioned tables.
> >
> > Please take a look here <https://cwiki.apache.org/confluence/x/lAs0Dw> and
> > share your thoughts.
> >
> > best,
> > Walker
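
For illustration only, the behavior asked about in questions 1 to 3 above can be modeled with a toy buffer. This is a sketch of one possible reading of the KIP, not its implementation and not Kafka Streams code: the class `JoinBufferSketch`, its `Rec` type, and the method `onStreamRecord` are hypothetical names. It assumes that records leave the buffer in timestamp order, that a record older than stream time minus the grace period is dropped, and that only stream-side records advance stream time. Each of these is exactly what the questions ask the KIP to confirm or correct.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: NOT Kafka Streams code; every name here is hypothetical.
class JoinBufferSketch {

    // A stream-side record, tagged with its arrival position for tie-breaking.
    static final class Rec {
        final String key;
        final long timestamp;
        final long arrival;

        Rec(String key, long timestamp, long arrival) {
            this.key = key;
            this.timestamp = timestamp;
            this.arrival = arrival;
        }
    }

    private final long graceMs;
    // Assumption (question 3): tracked by the join processor itself and
    // advanced only by stream-side records, never by table-side records.
    private long streamTime = Long.MIN_VALUE;
    private long arrivals = 0;
    // Ordered by timestamp; ties fall back to arrival order.
    private final PriorityQueue<Rec> buffer = new PriorityQueue<>(
            Comparator.comparingLong((Rec r) -> r.timestamp)
                      .thenComparingLong(r -> r.arrival));

    JoinBufferSketch(long graceMs) {
        this.graceMs = graceMs;
    }

    // Handles one stream-side record and returns the records that leave the
    // buffer (and would then be joined) as a result, in timestamp order.
    List<Rec> onStreamRecord(String key, long timestamp) {
        streamTime = Math.max(streamTime, timestamp);
        long cutoff = streamTime - graceMs;

        // Assumption (question 2): a record older than the cutoff is dropped
        // rather than joined. The KIP may specify something different.
        if (timestamp >= cutoff) {
            buffer.add(new Rec(key, timestamp, arrivals));
        }
        arrivals++;

        // Assumption (question 1): eviction is by timestamp, not arrival order.
        List<Rec> emitted = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp <= cutoff) {
            emitted.add(buffer.poll());
        }
        return emitted;
    }
}
```

With a grace period of 10, feeding this model the stream-side records (A, 100), (A, 95), (B, 120), (A, 80) in that order emits nothing for the first two, emits A@95 followed by A@100 when (B, 120) arrives, and drops (A, 80) as late. The two records for key A therefore leave the buffer in the opposite order from which they arrived, which is the reordering that question 1 asks about.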