Cool KIP, Walker! Thanks for sharing this proposal.

A few clarifications:

1. Is the order in which records exit the buffer necessarily the same as the
order in which they enter it? Based on the description in the KIP, it sounds
like the answer is no, i.e., records will exit the buffer in increasing
timestamp order, which means that they may be reordered (even for the same
key) relative to the input order.

2. What happens if the join grace period is nonzero, and a stream-side
record arrives with a timestamp that is older than the current stream time
minus the grace period? Will this record trigger a join result, or will it
be dropped? Based on the description of what happens when the join grace
period is set to zero, it sounds like the late record will be dropped, even
if the join grace period is nonzero. Is that true?

3. What could cause stream time to advance, for purposes of removing
records from the join buffer? For example, will new records arriving on the
table side of the join cause stream time to advance? From the KIP it sounds
like only stream-side records will advance stream time -- does that mean
that the join processor itself will have to track this stream time?
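To make questions 1-3 concrete, here is a minimal Java sketch of one possible
reading of the buffer semantics. It is not the KIP's design and not a Kafka
API: the class `StreamSideJoinBuffer` and its methods are hypothetical. It
assumes a nonzero grace period, that only stream-side records advance stream
time, that buffered records are released in timestamp order once
`streamTime - grace` reaches them, and that records already older than
`streamTime - grace` on arrival are dropped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of a stream-side join buffer with a grace period.
 * Assumptions (not taken from the KIP): only stream-side records advance
 * stream time, records are released in timestamp order once
 * streamTime - grace >= their timestamp, and records that are already older
 * than streamTime - grace on arrival are dropped.
 */
public class StreamSideJoinBuffer {
    /** Minimal stream-side record: key, value, event timestamp, arrival sequence. */
    public record Rec(String key, String value, long timestamp, long seq) {}

    private final long graceMs;
    private long streamTime = Long.MIN_VALUE; // no record seen yet
    private long nextSeq = 0;
    // Ordered by timestamp; ties broken by arrival order.
    private final PriorityQueue<Rec> buffer = new PriorityQueue<>(
        Comparator.comparingLong(Rec::timestamp).thenComparingLong(Rec::seq));
    private final List<Rec> dropped = new ArrayList<>();

    public StreamSideJoinBuffer(long graceMs) {
        this.graceMs = graceMs;
    }

    /** Buffers a stream-side record and returns the records now ready to join. */
    public List<Rec> put(String key, String value, long timestamp) {
        Rec rec = new Rec(key, value, timestamp, nextSeq++);
        // Late check against the stream time observed before this record.
        if (streamTime != Long.MIN_VALUE && timestamp < streamTime - graceMs) {
            dropped.add(rec);
            return List.of();
        }
        streamTime = Math.max(streamTime, timestamp);
        buffer.add(rec);
        // Release everything the grace period no longer protects, oldest first.
        List<Rec> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp() <= streamTime - graceMs) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    public List<Rec> dropped() {
        return dropped;
    }

    public long streamTime() {
        return streamTime;
    }
}
```

With a 10 ms grace period, feeding a@100, a@95 and b@112 releases a@95 before
a@100 (the same key reordered relative to input order), and a subsequent
a@101 is dropped because 101 < 112 - 10. Whether the actual implementation
behaves this way is exactly what the questions above are asking.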

Also +1 to Lucas's question about what options will be available for
configuring the join buffer. Will users have the option to choose whether
they want the buffer to be in-memory vs persistent?

- Victoria

On Fri, Apr 28, 2023 at 11:54 AM Lucas Brutschy
<lbruts...@confluent.io.invalid> wrote:

> Hi Walker,
>
> Thanks for the KIP! We definitely need this. I have two questions:
>
>  - Have you considered allowing the customization of the underlying
> buffer implementation? As far as I can see, `StreamJoined` lets you customize
> the underlying store via a `WindowStoreSupplier`. Would it make sense
> for `Joined` to have this as well? I can imagine one may want to limit
> the number of records in the buffer, for example. If we hit the
> maximum, the only option would be to drop semantic guarantees, but
> users may still want to do this.
>  - With "second option on the table side" you are referring to
> versioned tables, right? Will the buffer on the stream side behave any
> differently depending on whether the table side is versioned or not?
>
> Finally, I think a simple example in the motivation section could help
> non-experts understand the KIP.
>
> Best,
> Lucas
>
> On Tue, Apr 25, 2023 at 9:13 PM Walker Carlson
> <wcarl...@confluent.io.invalid> wrote:
> >
> > Hello everybody,
> >
> > I have a Streams proposal to improve the stream-table join by adding a
> > grace period and a buffer to the stream side of the join, allowing
> > records to be processed in timestamp order, in line with the recent
> > improvements to versioned tables.
> >
> > Please take a look here <https://cwiki.apache.org/confluence/x/lAs0Dw> and
> > share your thoughts.
> >
> > best,
> > Walker
>
