Hi Walker, Thanks for picking up the KIP!
I hope you don't mind if I pile on a question also. The motivation section depicts the current "non-optimal" case as a multi-way KTable equi-join, but the proposed API looks like a multi-way KStream windowed join+aggregate. My question is specifically whether the intent is to improve the experience with KTables, KStreams, or both. Thanks, -John On Wed, Oct 23, 2019 at 12:49 PM Walker Carlson <wcarl...@confluent.io> wrote: > > Hi Sophie, > > Thank you for your comments. As for the different methods signatures I have > not really considered any other options but while I do agree it is > confusing, I don't see any obvious solutions. The problem is that the > cogroup essentially pairs a group stream with an aggregator and when it is > first made the method is called on a groupedStream already. However each > subsequent stream-aggregator pair is added on to a cogroup stream so it > needs both arguments. > > For the second question you should not need a joiner. The idea is that you > can collect many grouped streams with overlapping key spaces and any kind > of value types. Once aggregated its value will be reduced into one type. > This is why you need only one initializer. Each aggregator will need to > integrate the new value with the new object made in the initializer. > Does that make sense? > > This is a good question and I will include this explanation in the kip as > well. > > Thanks, > Walker > > On Tue, Oct 22, 2019 at 8:59 PM Sophie Blee-Goldman <sop...@confluent.io> > wrote: > > > Hey Walker, > > > > Thanks for the KIP! I have just a couple of questions: > > > > 1) It seems a little awkward to me that with the current API, we have a > > nearly identical > > "add stream to cogroup" method, except for the first which has a different > > signature > > (ie the first stream is joined as stream.cogroup(Aggregator) while the > > subsequent ones > > are joined as .cogroup(Stream, Aggregator) ). I'm not sure what it would > > look like exactly, > > but I was just wondering if you'd considered and/or rejected any > > alternative APIs? > > > > 2) This might just be my lack of familiarity with "cogroup" as a concept, > > but with the > > current (non-optimal) API the user seems to have some control over how > > exactly > > the different streams are joined through the ValueJoiners. Would this new > > cogroup > > simply concatenate the values from the different cogroup streams, or could > > users > > potentially pass some kind of Joiner to the cogroup/aggregate methods? Or, > > is the > > whole point of cogroups that you no longer ever need to specify a Joiner? > > If so, you > > should add a short line to the KIP explaining that for those of us who > > aren't fluent > > in cogroup semantics :) > > > > Cheers, > > Sophie > > > > On Thu, Oct 17, 2019 at 3:06 PM Walker Carlson <wcarl...@confluent.io> > > wrote: > > > > > Good catch I updated that. > > > > > > I have made a PR for this KIP > > > > > > I then am splitting it into 3 parts, first cogroup for a key-value store > > ( > > > here <https://github.com/apache/kafka/pull/7538>), then for a > > > timeWindowedStore, and then a sessionWindowedStore + ensuring > > partitioning. > > > > > > Walker > > > > > > On Tue, Oct 15, 2019 at 12:47 PM Matthias J. Sax <matth...@confluent.io> > > > wrote: > > > > > > > Walker, > > > > > > > > thanks for picking up the KIP and reworking it for the changed API. > > > > > > > > Overall, the updated API suggestions make sense to me. The seem to > > align > > > > quite nicely with our current API design. > > > > > > > > One nit: In `CogroupedKStream#aggregate(...)` the type parameter of > > > > `Materialized` should be `V`, not `VR`? > > > > > > > > > > > > -Matthias > > > > > > > > > > > > > > > > On 10/14/19 2:57 PM, Walker Carlson wrote: > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-150+-+Kafka-Streams+Cogroup > > > > > here > > > > > is a link > > > > > > > > > > On Mon, Oct 14, 2019 at 2:52 PM Walker Carlson < > > wcarl...@confluent.io> > > > > > wrote: > > > > > > > > > >> Hello all, > > > > >> > > > > >> I have picked up and updated KIP-150. Due to changes to the project > > > > since > > > > >> KIP #150 was written there are a few items that need to be updated. > > > > >> > > > > >> First item that changed is the adoption of the Materialized > > parameter. > > > > >> > > > > >> The second item is the WindowedBy. How the old KIP handles windowing > > > is > > > > >> that it overloads the aggregate function to take in a Window object > > as > > > > well > > > > >> as the other parameters. The current practice to window > > > grouped-streams > > > > is > > > > >> to call windowedBy and receive a windowed stream object. The > > existing > > > > >> interface for a windowed stream made from a grouped stream will not > > > work > > > > >> for cogrouped streams. Hence, we have to make new interfaces for > > > > cogrouped > > > > >> windowed streams. > > > > >> > > > > >> Please take a look, I would like to hear your feedback, > > > > >> > > > > >> Walker > > > > >> > > > > > > > > > > > > > > > > > >