Hi Walker,

Thanks for picking up the KIP!

I hope you don't mind if I pile on a question also.

The motivation section depicts the current "non-optimal" case as a
multi-way KTable equi-join, but the proposed API looks like a
multi-way KStream windowed join+aggregate. My question is specifically
whether the intent is to improve the experience with KTables,
KStreams, or both.

Thanks,
-John

On Wed, Oct 23, 2019 at 12:49 PM Walker Carlson <wcarl...@confluent.io> wrote:
>
> Hi Sophie,
>
> Thank you for your comments. As for the different methods signatures I have
> not really considered any other options but  while I do agree it is
> confusing, I don't see any obvious solutions. The problem is that the
> cogroup essentially pairs a group stream with an aggregator and when it is
> first made the method is called on a groupedStream already. However each
> subsequent stream-aggregator pair is added on to a cogroup stream so it
> needs both arguments.
>
> For the second question you should not need a joiner. The idea is that you
> can collect many grouped streams with overlapping key spaces and any kind
> of value types. Once aggregated its value will be reduced into one type.
> This is why you need only one initializer. Each aggregator will need to
> integrate the new value with the new object made in the initializer.
> Does that make sense?
>
> This is a good question and I will include this explanation in the kip as
> well.
>
> Thanks,
> Walker
>
> On Tue, Oct 22, 2019 at 8:59 PM Sophie Blee-Goldman <sop...@confluent.io>
> wrote:
>
> > Hey Walker,
> >
> > Thanks for the KIP! I have just a couple of questions:
> >
> > 1) It seems a little awkward to me that with the current API, we have a
> > nearly identical
> > "add stream to cogroup" method, except for the first which has a different
> > signature
> > (ie the first stream is joined as stream.cogroup(Aggregator) while the
> > subsequent ones
> > are joined as .cogroup(Stream, Aggregator) ). I'm not sure what it would
> > look like exactly,
> > but I was just wondering if you'd considered and/or rejected any
> > alternative APIs?
> >
> > 2) This might just be my lack of familiarity with "cogroup" as a concept,
> > but with the
> > current (non-optimal) API the user seems to have some control over how
> > exactly
> > the different streams are joined through the ValueJoiners. Would this new
> > cogroup
> > simply concatenate the values from the different cogroup streams, or could
> > users
> > potentially pass some kind of Joiner to the cogroup/aggregate methods? Or,
> > is the
> > whole point of cogroups that you no longer ever need to specify a Joiner?
> > If so, you
> > should add a short line to the KIP explaining that for those of us who
> > aren't fluent
> > in cogroup semantics :)
> >
> > Cheers,
> > Sophie
> >
> > On Thu, Oct 17, 2019 at 3:06 PM Walker Carlson <wcarl...@confluent.io>
> > wrote:
> >
> > > Good catch I updated that.
> > >
> > > I have made a PR for this KIP
> > >
> > > I then am splitting it into 3 parts, first cogroup for a key-value store
> > (
> > > here <https://github.com/apache/kafka/pull/7538>), then for a
> > > timeWindowedStore, and then a sessionWindowedStore + ensuring
> > partitioning.
> > >
> > > Walker
> > >
> > > On Tue, Oct 15, 2019 at 12:47 PM Matthias J. Sax <matth...@confluent.io>
> > > wrote:
> > >
> > > > Walker,
> > > >
> > > > thanks for picking up the KIP and reworking it for the changed API.
> > > >
> > > > Overall, the updated API suggestions make sense to me. The seem to
> > align
> > > > quite nicely with our current API design.
> > > >
> > > > One nit: In `CogroupedKStream#aggregate(...)` the type parameter of
> > > > `Materialized` should be `V`, not `VR`?
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > > >
> > > > On 10/14/19 2:57 PM, Walker Carlson wrote:
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-150+-+Kafka-Streams+Cogroup
> > > > > here
> > > > > is a link
> > > > >
> > > > > On Mon, Oct 14, 2019 at 2:52 PM Walker Carlson <
> > wcarl...@confluent.io>
> > > > > wrote:
> > > > >
> > > > >> Hello all,
> > > > >>
> > > > >> I have picked up and updated KIP-150. Due to changes to the project
> > > > since
> > > > >> KIP #150 was written there are a few items that need to be updated.
> > > > >>
> > > > >> First item that changed is the adoption of the Materialized
> > parameter.
> > > > >>
> > > > >> The second item is the WindowedBy. How the old KIP handles windowing
> > > is
> > > > >> that it overloads the aggregate function to take in a Window object
> > as
> > > > well
> > > > >> as the other parameters. The current practice to window
> > > grouped-streams
> > > > is
> > > > >> to call windowedBy and receive a windowed stream object. The
> > existing
> > > > >> interface for a windowed stream made from a grouped stream will not
> > > work
> > > > >> for cogrouped streams. Hence, we have to make new interfaces for
> > > > cogrouped
> > > > >> windowed streams.
> > > > >>
> > > > >> Please take a look, I would like to hear your feedback,
> > > > >>
> > > > >> Walker
> > > > >>
> > > > >
> > > >
> > > >
> > >
> >

Reply via email to