Hi Bill,

For the record, the current proposal looks good to me also.

Thanks,
-John

On Tue, Sep 17, 2019 at 5:06 PM Matthias J. Sax <matth...@confluent.io> wrote:
>
> > Just to clarify I'll update `as(StoreSupplier, StoreSupplier)` to
> > `with(..., ...)` and change the class name from `StreamJoined` to
> > `StreamJoin`
>
>
> Thanks Bill. SGTM.
>
>
>
> -Matthias
>
>
> On 9/17/19 4:52 PM, aishwarya kumar wrote:
> > Hi Bill,
> >
> > Thanks for clarifying, and the KIP looks great!!
> >
> > Best regards,
> > Aishwarya
> >
> > On Tue, Sep 17, 2019, 6:16 PM Bill Bejeck <bbej...@gmail.com> wrote:
> >
> >> Hi Aishwarya,
> >>
> >> On Tue, Sep 17, 2019 at 1:41 PM aishwarya kumar <ash26...@gmail.com>
> >> wrote:
> >>
> >>> Will this be applicable to Kstream-Ktable joins as well? Or do users
> >> always
> >>> materialize these joins explicitly?
> >>>
> >>
> >> No, this change applies to KStream-KStream joins only.  With KStream-KTable
> >> joins KafkaStreams has materialized the KTable already, and we don't need
> >> to do anything with the KStream instance in this case.
> >>
> >>
> >>> I'm not sure if its even necessary (or makes sense), just trying to
> >>> understand why the change is applicable to Kstream joins only?
> >>>
> >>
> >> The full details are in the KIP, but in a nutshell, we needed to break the
> >> naming of the StateStore from `Joined.withName` and provide users a way to
> >> name the StateStore explicitly.  While making the changes, we realized it
> >> would be beneficial to give users the ability to use different WindowStore
> >> implementations.  The most likely reason to use different WindowStores
> >> would be to enable in-memory joins.
> >>
> >>
> >>> Best,
> >>> Aishwarya
> >>>
> >>
> >> Regards,
> >> Bill
> >>
> >>
> >>>
> >>> On Tue, Sep 17, 2019 at 4:05 PM Bill Bejeck <bbej...@gmail.com> wrote:
> >>>
> >>>> Guozhang,
> >>>>
> >>>> Thanks for the comments.
> >>>>
> >>>> Yes, you are correct in your assessment regarding names, *if* the users
> >>>> provide their own StoreSuppliers  When constructing as StoreSupplier,
> >> the
> >>>> name can't be null, and additionally, there is no way to update the
> >> name.
> >>>> Further downstream, the underlying StateStore instances use the
> >> provided
> >>>> name for registration so they must be unique.  If users don't provide
> >>>> distinct names for the StoreSuppliers, KafkaStreams will thow a
> >>>> StreamsException when building the topology.
> >>>>
> >>>> Thanks,
> >>>> Bill
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Sep 17, 2019 at 9:39 AM Guozhang Wang <wangg...@gmail.com>
> >>> wrote:
> >>>>
> >>>>> Hello Bill,
> >>>>>
> >>>>> Thanks for the updated KIP. I made a pass on the StreamJoined
> >> section.
> >>>> Just
> >>>>> a quick question from user's perspective: if a user wants to provide
> >> a
> >>>>> customized store-supplier, she is forced to also provide a name via
> >> the
> >>>>> store-supplier. So there's no way to say "I want to provide my own
> >>> store
> >>>>> engine but let the library decide its name", is that right?
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Tue, Sep 17, 2019 at 8:53 AM Bill Bejeck <bbej...@gmail.com>
> >> wrote:
> >>>>>
> >>>>>> Bumping this discussion as we need to re-vote before the KIP
> >>> deadline.
> >>>>>>
> >>>>>> On Fri, Sep 13, 2019 at 10:29 AM Bill Bejeck <bbej...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> While working on the implementation of KIP-479, some issues came
> >> to
> >>>>> light
> >>>>>>> that the KIP as written won't work.  I have updated the KIP with
> >> a
> >>>>>> solution
> >>>>>>> I believe will solve the original problem as well as address the
> >>>>>>> impediment to the initial approach.
> >>>>>>>
> >>>>>>> This update is a significant change, so please review the updated
> >>> KIP
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+StreamJoined+config+object+to+Join
> >>>>>> and
> >>>>>>> provide feedback.  After we conclude the discussion, there will
> >> be
> >>> a
> >>>>>>> re-vote.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> Bill
> >>>>>>>
> >>>>>>> On Wed, Jul 17, 2019 at 7:01 PM Guozhang Wang <
> >> wangg...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Bill, thanks for your explanations. I'm on board with your
> >>>> decision
> >>>>>>>> too.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>> On Wed, Jul 17, 2019 at 10:20 AM Bill Bejeck <bbej...@gmail.com
> >>>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks for the response, John.
> >>>>>>>>>
> >>>>>>>>>> If I can offer my thoughts, it seems better to just document
> >>> on
> >>>>> the
> >>>>>>>>>> Stream join javadoc for the Materialized parameter that it
> >>> will
> >>>>> not
> >>>>>>>>>> make the join result queriable. I'm not opposed to the
> >>> queriable
> >>>>>> flag
> >>>>>>>>>> in general, but introducing it is a much larger
> >> consideration
> >>>> that
> >>>>>> has
> >>>>>>>>>> previously derailed this KIP discussion. In the interest of
> >>> just
> >>>>>>>>>> closing the gap and keeping the API change small, it seems
> >>>> better
> >>>>> to
> >>>>>>>>>> just go with documentation for now.
> >>>>>>>>>
> >>>>>>>>> I agree with your statement here.  IMHO the most important
> >> goal
> >>> of
> >>>>>> this
> >>>>>>>> KIP
> >>>>>>>>> is to not breaking existing users and gain some consistency of
> >>> the
> >>>>>> API.
> >>>>>>>>>
> >>>>>>>>> I'll update the KIP accordingly.
> >>>>>>>>>
> >>>>>>>>> -Bill
> >>>>>>>>>
> >>>>>>>>> On Tue, Jul 16, 2019 at 11:55 AM John Roesler <
> >>> j...@confluent.io>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Bill,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for driving this KIP toward a conclusion. I'm on
> >> board
> >>>> with
> >>>>>>>>>> your decision.
> >>>>>>>>>>
> >>>>>>>>>> You didn't mention whether you're still proposing to add the
> >>>>>>>>>> "queriable" flag to the Materialized config object, or just
> >>>>> document
> >>>>>>>>>> that a Stream join is never queriable. Both options have
> >> come
> >>> up
> >>>>>>>>>> earlier in the discussion, so it would be good to pin this
> >>> down.
> >>>>>>>>>>
> >>>>>>>>>> If I can offer my thoughts, it seems better to just document
> >>> on
> >>>>> the
> >>>>>>>>>> Stream join javadoc for the Materialized parameter that it
> >>> will
> >>>>> not
> >>>>>>>>>> make the join result queriable. I'm not opposed to the
> >>> queriable
> >>>>>> flag
> >>>>>>>>>> in general, but introducing it is a much larger
> >> consideration
> >>>> that
> >>>>>> has
> >>>>>>>>>> previously derailed this KIP discussion. In the interest of
> >>> just
> >>>>>>>>>> closing the gap and keeping the API change small, it seems
> >>>> better
> >>>>> to
> >>>>>>>>>> just go with documentation for now.
> >>>>>>>>>>
> >>>>>>>>>> Thanks again,
> >>>>>>>>>> -John
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Bill Bejeck <
> >>> bbej...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks all for the great discussion so far.
> >>>>>>>>>>>
> >>>>>>>>>>> Everyone has made excellent points, and I appreciate the
> >>>> detail
> >>>>>>>>> everyone
> >>>>>>>>>>> has put into their arguments.
> >>>>>>>>>>>
> >>>>>>>>>>> However, after carefully evaluating all the points made so
> >>>> far,
> >>>>>>>>> creating
> >>>>>>>>>> an
> >>>>>>>>>>> overload with Materialized is still my #1 option.
> >>>>>>>>>>> My reasoning for saying so is two-fold:
> >>>>>>>>>>>
> >>>>>>>>>>>    1. It's a small change, and IMHO since it's consistent
> >>> with
> >>>>> our
> >>>>>>>>>> current
> >>>>>>>>>>>    API concerning state store usage, the cognitive load on
> >>>> users
> >>>>>>>> will
> >>>>>>>>> be
> >>>>>>>>>>>    minimal.
> >>>>>>>>>>>    2. It achieves the most important goal of this KIP,
> >>> namely
> >>>> to
> >>>>>>>> close
> >>>>>>>>>> the
> >>>>>>>>>>>    gap of naming state stores independently of the join
> >>>> operator
> >>>>>>>> name.
> >>>>>>>>>>>
> >>>>>>>>>>> Additionally, I agree with the points made by Matthias
> >>> earlier
> >>>>> (I
> >>>>>>>>> realize
> >>>>>>>>>>> there is some overlap here).
> >>>>>>>>>>>
> >>>>>>>>>>>>  - the main purpose of this KIP is to close the naming
> >> gap
> >>>>> what
> >>>>>> we
> >>>>>>>>>> achieve
> >>>>>>>>>>>>  - we can allow people to use the new in-memory store
> >>>>>>>>>>>>  - we allow people to enable/disable caching
> >>>>>>>>>>>>  - we unify the API
> >>>>>>>>>>>>  - we decouple querying from naming
> >>>>>>>>>>>>  - it's a small API change
> >>>>>>>>>>>
> >>>>>>>>>>> Although it's not a perfect solution,  IMHO the positives
> >> of
> >>>>> using
> >>>>>>>>>>> Materialize far outweigh the negatives, and from what
> >> we've
> >>>>>>>> discussed
> >>>>>>>>> so
> >>>>>>>>>>> far, anything we implement seems to involve an additional
> >>>> change
> >>>>>>>> down
> >>>>>>>>> the
> >>>>>>>>>>> road.
> >>>>>>>>>>>
> >>>>>>>>>>> If others are still strongly opposed to using
> >> Materialized,
> >>> my
> >>>>>> other
> >>>>>>>>>>> preferences would be
> >>>>>>>>>>>
> >>>>>>>>>>>    1. Add a "withStoreName" to Joined.  Although I agree
> >>> with
> >>>>>>>> Guozhang
> >>>>>>>>>> that
> >>>>>>>>>>>    having a parameter that only applies to one use-case
> >>> would
> >>>> be
> >>>>>>>>> clumsy.
> >>>>>>>>>>>    2. Add a String overload for naming the store, but this
> >>>> would
> >>>>>> be
> >>>>>>>> my
> >>>>>>>>>>>    least favorite option as IMHO this seems to be a step
> >>>>> backward
> >>>>>>>> from
> >>>>>>>>>> why we
> >>>>>>>>>>>    introduced configuration objects in the first place.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Bill
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Jun 27, 2019 at 4:45 PM Matthias J. Sax <
> >>>>>>>> matth...@confluent.io
> >>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks for the KIP Bill!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Great discussion to far.
> >>>>>>>>>>>>
> >>>>>>>>>>>> About John's idea about querying upstream stores and
> >> don't
> >>>>>>>>> materialize
> >>>>>>>>>> a
> >>>>>>>>>>>> store: I agree with Bill that this seems to be an
> >>> orthogonal
> >>>>>>>>> question,
> >>>>>>>>>>>> and it might be better to treat it as an independent
> >>>>>> optimization
> >>>>>>>> and
> >>>>>>>>>>>> exclude from this KIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> What should be the behavior if there is no store
> >>>>>>>>>>>>> configured (e.g., if Materialized with only serdes)
> >> and
> >>>>>>>> querying is
> >>>>>>>>>>>>> enabled?
> >>>>>>>>>>>>
> >>>>>>>>>>>> IMHO, this could be an error case. If one wants to
> >> query a
> >>>>>> store,
> >>>>>>>>> they
> >>>>>>>>>>>> need to provide a name -- if you don't know the name,
> >> how
> >>>>> would
> >>>>>>>> you
> >>>>>>>>>>>> actually query the store (even if it would be possible
> >> to
> >>>> get
> >>>>>> the
> >>>>>>>>> name
> >>>>>>>>>>>> from the `TopologyDescription`, it seems clumsy).
> >>>>>>>>>>>>
> >>>>>>>>>>>> If we don't want to throw an error, materializing seems
> >> to
> >>>> be
> >>>>>> the
> >>>>>>>>> right
> >>>>>>>>>>>> option, to exclude "query optimization" from this KIP. I
> >>>> would
> >>>>>> be
> >>>>>>>> ok
> >>>>>>>>>>>> with this option, even if it's clumsy to get the name
> >> from
> >>>>>>>>>>>> `TopologyDescription`; hence, I would prefer to treat it
> >>> as
> >>>> an
> >>>>>>>> error.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> To get back to the current behavior, users would have
> >> to
> >>>>>>>>>>>>> add a "bytes store supplier" to the Materialized to
> >>>> indicate
> >>>>>>>> that,
> >>>>>>>>>>>>> yes, they really want a state store there.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This sound like a quite subtle semantic difference on
> >> how
> >>> to
> >>>>> use
> >>>>>>>> the
> >>>>>>>>>>>> API. Might be hard to explain to users. I would prefer
> >> to
> >>>> not
> >>>>>>>>>> introduce it.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> About Guozhang's points:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1a) That is actually a good point. However, I believe we
> >>>>> cannot
> >>>>>>>> get
> >>>>>>>>>>>> around this issue easily, and it seems ok to me, to
> >> expose
> >>>> the
> >>>>>>>> actual
> >>>>>>>>>>>> store type we are using. (More thoughts later.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1b) I don't see an issue with allowing users to query
> >> all
> >>>>>> stores?
> >>>>>>>>> What
> >>>>>>>>>>>> is the rational behind it? What do we gain by not
> >> allowing
> >>>> it?
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2) While I understand what you are saying, we also
> >>> want/need
> >>>>> to
> >>>>>>>> have
> >>>>>>>>> a
> >>>>>>>>>>>> way in the PAPI to allow users adding "internal/private"
> >>>>>>>>> non-queryable
> >>>>>>>>>>>> stores to a topology. That's possible via
> >>>>>>>>>>>> `Materialized#withQueryingDisabled()`. We could also
> >>> update
> >>>>>>>>>>>> `Topology#addStateStore(StoreBuilder, boolean
> >> isQueryable,
> >>>>>>>>> String...)`
> >>>>>>>>>>>> to address this. Again, I agree with Bill that the
> >> current
> >>>> API
> >>>>>> is
> >>>>>>>>> built
> >>>>>>>>>>>> in a certain way, and if we want to change it, it should
> >>> be
> >>>> a
> >>>>>>>>> separate
> >>>>>>>>>>>> KIP, as it seems to be an orthogonal concern.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Instead, we just restrict KIP-307 to NOT
> >>>>>>>>>>>>> use the Joined.name for state store names and always
> >> use
> >>>>>>>> internal
> >>>>>>>>>> names
> >>>>>>>>>>>> as
> >>>>>>>>>>>>> well, which admittedly indeed leaves a hole of not
> >> being
> >>>>> able
> >>>>>> to
> >>>>>>>>>> cover
> >>>>>>>>>>>> all
> >>>>>>>>>>>>> internal names here
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think it's important to close this gap. Naming
> >> entities
> >>>>> seems
> >>>>>>>> to a
> >>>>>>>>>>>> binary feature: if there is a gap, the feature is more
> >> or
> >>>> less
> >>>>>>>>> useless,
> >>>>>>>>>>>> rendering KIP-307 void.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I like John's detailed list of required features and
> >> what
> >>>>>>>>>>>> Materialized/WindowByteStoreSuppliers offers. My take
> >> is,
> >>>> that
> >>>>>>>> adding
> >>>>>>>>>>>> Materialized including the required run-time checks is
> >> the
> >>>>> best
> >>>>>>>>> option
> >>>>>>>>>>>> we have, for the following reasons:
> >>>>>>>>>>>>
> >>>>>>>>>>>>  - the main purpose of this KIP is to close the naming
> >> gap
> >>>>> what
> >>>>>> we
> >>>>>>>>>> achieve
> >>>>>>>>>>>>  - we can allow people to use the new in-memory store
> >>>>>>>>>>>>  - we allow people to enable/disable caching
> >>>>>>>>>>>>  - we unify the API
> >>>>>>>>>>>>  - we decouple querying from naming
> >>>>>>>>>>>>  - it's a small API change
> >>>>>>>>>>>>
> >>>>>>>>>>>> Adding an overload and only passing in a name, would
> >>> address
> >>>>> the
> >>>>>>>> main
> >>>>>>>>>>>> purpose of the KIP. However, it falls short on all the
> >>> other
> >>>>>>>>> "goodies".
> >>>>>>>>>>>> As you mentioned, passing in `Materialized` might not be
> >>>>> perfect
> >>>>>>>> and
> >>>>>>>>>>>> maybe we need to deprecate is at some point; but this is
> >>>> also
> >>>>>> true
> >>>>>>>>> for
> >>>>>>>>>>>> passing in just a name.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am also not convinced, that a `StreamJoinStore` would
> >>>>> resolve
> >>>>>>>> all
> >>>>>>>>> the
> >>>>>>>>>>>> issues. In the end, as long as we are using a
> >>>> `WindowedStore`
> >>>>>>>>>>>> internally, we need to expose this "implemenation
> >> detail"
> >>> to
> >>>>>>>> users to
> >>>>>>>>>>>> allow them to plug in a custom store. Adding
> >>> `Materialized`
> >>>>> seem
> >>>>>>>> to
> >>>>>>>>> be
> >>>>>>>>>>>> the best short-term fix from my point of view.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 6/27/19 9:56 AM, Guozhang Wang wrote:
> >>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I actually feels better about a new interface but I'm
> >>> not
> >>>>> sure
> >>>>>>>> if
> >>>>>>>>> we
> >>>>>>>>>>>> would
> >>>>>>>>>>>>> need the full configuration of store / log / cache,
> >> now
> >>> or
> >>>>> in
> >>>>>>>> the
> >>>>>>>>>> future
> >>>>>>>>>>>>> ever for stream-stream join.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Right now I feel that 1) we want to improve our
> >>>>> implementation
> >>>>>>>> of
> >>>>>>>>>>>>> stream-stream join, and potentially also allow users
> >> to
> >>>>>>>> customize
> >>>>>>>>>> this
> >>>>>>>>>>>>> implementation but with a more suitable interface than
> >>> the
> >>>>>>>> current
> >>>>>>>>>>>>> WindowStore interface, how to do that is less clear
> >> and
> >>>>>>>>>> execution-wise
> >>>>>>>>>>>> it's
> >>>>>>>>>>>>> (arguably..) not urgent; 2) we want to close the last
> >>> gap
> >>>>>>>>>> (Stream-stream
> >>>>>>>>>>>>> join) of allowing users to specify all internal names
> >> to
> >>>>> help
> >>>>>> on
> >>>>>>>>>> backward
> >>>>>>>>>>>>> compatibility, which is urgent.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Therefore if we want to unblock 2) from 1) in the near
> >>>>> term, I
> >>>>>>>> feel
> >>>>>>>>>>>>> slightly inclined to just add overload functions that
> >>>> takes
> >>>>>> in a
> >>>>>>>>>> store
> >>>>>>>>>>>> name
> >>>>>>>>>>>>> for stream-stream joins only -- and admittedly, in the
> >>>>> future
> >>>>>>>> this
> >>>>>>>>>>>> function
> >>>>>>>>>>>>> maybe deprecated -- i.e. if we have to do something
> >> that
> >>>> we
> >>>>>> "may
> >>>>>>>>>> regret"
> >>>>>>>>>>>> in
> >>>>>>>>>>>>> the future, I'd like to pick the least intrusive
> >> option.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> About `Joined#withStoreName`: since the Joined class
> >>>> itself
> >>>>> is
> >>>>>>>> also
> >>>>>>>>>> used
> >>>>>>>>>>>> in
> >>>>>>>>>>>>> other join types, I feel less comfortable to have a
> >>>>>>>>>>>> `Joined#withStoreName`
> >>>>>>>>>>>>> which is only going to be used by stream-stream join.
> >> Or
> >>>>>> maybe I
> >>>>>>>>> miss
> >>>>>>>>>>>>> something here about the "latter" case that you are
> >>>>> referring
> >>>>>>>> to?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Jun 24, 2019 at 12:16 PM John Roesler <
> >>>>>>>> j...@confluent.io>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks Guozhang,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yep. Maybe we can consider just exactly what the join
> >>>>> needs:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> the WindowStore<Bytes, byte[]> itself is fine, if
> >>> overly
> >>>>>>>> broad,
> >>>>>>>>>>>>>>> since the only two methods we need are
> >>> `window.put(key,
> >>>>>> value,
> >>>>>>>>>>>>>>> context().timestamp())` and `WindowStoreIterator<V2>
> >>>> iter
> >>>>> =
> >>>>>>>>>>>>>>> window.fetch(key, timeFrom, timeTo)`.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> One "middle ground" would be to extract _this_ into a
> >>> new
> >>>>>> store
> >>>>>>>>>>>>>> interface, which only supports these API calls, like
> >>>>>>>>>>>>>> StreamJoinStore<K, V>. This would give us the
> >> latitude
> >>> we
> >>>>>> need
> >>>>>>>> to
> >>>>>>>>>>>>>> efficiently support the exact operation without
> >>>> concerning
> >>>>>>>>> ourselves
> >>>>>>>>>>>>>> with all the other things a WindowStore can do (which
> >>> are
> >>>>>>>>>> unreachable
> >>>>>>>>>>>>>> for the join use case). It would also let us drop
> >>> "store
> >>>>>>>>> duplicates"
> >>>>>>>>>>>>>> from the main WindowStore interface, since it only
> >>> exists
> >>>>> to
> >>>>>>>>> support
> >>>>>>>>>>>>>> the join use case.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If we were to add a new StreamJoinStore interface,
> >> then
> >>>>> it'd
> >>>>>> be
> >>>>>>>>>>>>>> straightforward how we could add also
> >>>>>>>>>>>>>> `Materialized.as(StreamJoinBytesStoreSupplier)` and
> >> use
> >>>>>>>>>> Materialized,
> >>>>>>>>>>>>>> or alternatively add the ability to set the bytes
> >> store
> >>>> on
> >>>>>>>> Joined.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Personally, I'm kind of leaning toward the latter
> >> (and
> >>>> also
> >>>>>>>> doing
> >>>>>>>>>>>>>> `Joined#withStoreName`), since adding the new
> >> interface
> >>>> to
> >>>>>>>>>>>>>> Materialized then also pollutes the interface for its
> >>>>>> _actual_
> >>>>>>>> use
> >>>>>>>>>>>>>> case of materializing a table view. Of course, to
> >> solve
> >>>> the
> >>>>>>>>>> immediate
> >>>>>>>>>>>>>> problem, all we need is the store name, but we might
> >>> feel
> >>>>>>>> better
> >>>>>>>>>> about
> >>>>>>>>>>>>>> adding the store name to Joined if we _also_ feel
> >> like
> >>> in
> >>>>> the
> >>>>>>>>>> future,
> >>>>>>>>>>>>>> we would add store/log/cache configuration to Joined
> >> as
> >>>>> well.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 12:56 PM Guozhang Wang <
> >>>>>>>>> wangg...@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hello John,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> My main concern is exactly the first point at the
> >>> bottom
> >>>>> of
> >>>>>>>> your
> >>>>>>>>>>>> analysis
> >>>>>>>>>>>>>>> here: "* configure the bytes store". I'm not sure if
> >>>>> using a
> >>>>>>>>> window
> >>>>>>>>>>>> bytes
> >>>>>>>>>>>>>>> store would be ideal for stream-stream windowed
> >> join;
> >>>> e.g.
> >>>>>> we
> >>>>>>>>> could
> >>>>>>>>>>>>>>> consider two dimensional list sorted by timestamps
> >> and
> >>>>> then
> >>>>>> by
> >>>>>>>>>> keys to
> >>>>>>>>>>>> do
> >>>>>>>>>>>>>>> the join, whereas a windowed bytes store is
> >> basically
> >>>>> sorted
> >>>>>>>> by
> >>>>>>>>> key
> >>>>>>>>>>>>>> first,
> >>>>>>>>>>>>>>> then by timestamp. If we expose the Materialized to
> >>> let
> >>>>> user
> >>>>>>>> pass
> >>>>>>>>>> in a
> >>>>>>>>>>>>>>> windowed bytes store, then we would need to change
> >>> that
> >>>> if
> >>>>>> we
> >>>>>>>>> want
> >>>>>>>>>> to
> >>>>>>>>>>>>>>> replace it with a different implementation
> >> interface.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 8:59 AM John Roesler <
> >>>>>>>> j...@confluent.io>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hey Guozhang and Bill,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For what it's worth, I agree with you both!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think it might help the discussion to look
> >>> concretely
> >>>>> at
> >>>>>>>> what
> >>>>>>>>>>>>>>>> Materialized does:
> >>>>>>>>>>>>>>>> * set a WindowBytesStoreSupplier
> >>>>>>>>>>>>>>>> * set a name
> >>>>>>>>>>>>>>>> * set the key/value serdes
> >>>>>>>>>>>>>>>> * disable/enable/configure change-logging
> >>>>>>>>>>>>>>>> * disable/enable caching
> >>>>>>>>>>>>>>>> * configure retention
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Further, looking into the WindowBytesStoreSupplier,
> >>> the
> >>>>>>>>> interface
> >>>>>>>>>> lets
> >>>>>>>>>>>>>> you:
> >>>>>>>>>>>>>>>> * get the segment interval
> >>>>>>>>>>>>>>>> * get the window size
> >>>>>>>>>>>>>>>> * get whether "duplicates" are enabled
> >>>>>>>>>>>>>>>> * get the retention period
> >>>>>>>>>>>>>>>> * (obviously) get a WindowStore<Bytes, byte[]>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> We know that Materialized isn't exactly what we
> >> need
> >>>> for
> >>>>>>>> stream
> >>>>>>>>>> joins,
> >>>>>>>>>>>>>>>> but we can see how close Materialized is to what we
> >>>> need.
> >>>>>> If
> >>>>>>>> it
> >>>>>>>>> is
> >>>>>>>>>>>>>>>> close, maybe we can use it and document the gaps,
> >> and
> >>>> if
> >>>>> it
> >>>>>>>> is
> >>>>>>>>> not
> >>>>>>>>>>>>>>>> close, then maybe we should just add what we need
> >> to
> >>>>>> Joined.
> >>>>>>>>>>>>>>>> Stream Join's requirements for its stores:
> >>>>>>>>>>>>>>>> * a multimap store (i.e., it keeps duplicates) for
> >>>>> storing
> >>>>>>>>> general
> >>>>>>>>>>>>>>>> (not windowed) keyed records associated with their
> >>>>>> insertion
> >>>>>>>>>> time, and
> >>>>>>>>>>>>>>>> allows efficient time-bounded lookups and also
> >>>> efficient
> >>>>>>>> purges
> >>>>>>>>>> of old
> >>>>>>>>>>>>>>>> data.
> >>>>>>>>>>>>>>>> ** Note, a properly configured
> >>> WindowBytesStoreSupplier
> >>>>>>>>> satisfies
> >>>>>>>>>> this
> >>>>>>>>>>>>>>>> requirement, and the interface supports the queries
> >>> we
> >>>>> need
> >>>>>>>> to
> >>>>>>>>>> verify
> >>>>>>>>>>>>>>>> the configuration at run-time
> >>>>>>>>>>>>>>>> * set a name for the store
> >>>>>>>>>>>>>>>> * do _not_ set the serdes (they are already set in
> >>>>> Joined)
> >>>>>>>>>>>>>>>> * logging could be configurable (set to enabled
> >> now)
> >>>>>>>>>>>>>>>> * caching could be configurable (set to enabled
> >> now)
> >>>>>>>>>>>>>>>> * do _not_ configure retention (determined by
> >>>>> JoinWindows)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> So, out of six capabilities for Materialized, there
> >>> are
> >>>>> two
> >>>>>>>> we
> >>>>>>>>>> don't
> >>>>>>>>>>>>>>>> want (serdes and retention). These would become
> >>>> run-time
> >>>>>>>> checks
> >>>>>>>>>> if we
> >>>>>>>>>>>>>>>> use it.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> A third questionable capability is to provide a
> >>>>>>>>>>>>>>>> WindowBytesStoreSupplier. Looking at whether the
> >>>>>>>>>>>>>>>> WindowBytesStoreSupplier is the right interface for
> >>>>> Stream
> >>>>>>>> Join:
> >>>>>>>>>>>>>>>> * configuring segment interval is fine
> >>>>>>>>>>>>>>>> * should _not_ configure window size (it's
> >> determined
> >>>> by
> >>>>>>>>>> JoinWindows)
> >>>>>>>>>>>>>>>> * duplicates _must_ be enabled
> >>>>>>>>>>>>>>>> * retention should be _at least_ windowSize +
> >>>>> gracePeriod,
> >>>>>>>> but
> >>>>>>>>>> note
> >>>>>>>>>>>>>>>> that (unlike for Table window stores) there is no
> >>>> utility
> >>>>>> in
> >>>>>>>>>> having a
> >>>>>>>>>>>>>>>> longer retention time.
> >>>>>>>>>>>>>>>> * the WindowStore<Bytes, byte[]> itself is fine, if
> >>>>> overly
> >>>>>>>>> broad,
> >>>>>>>>>>>>>>>> since the only two methods we need are
> >>> `window.put(key,
> >>>>>>>> value,
> >>>>>>>>>>>>>>>> context().timestamp())` and
> >> `WindowStoreIterator<V2>
> >>>>> iter =
> >>>>>>>>>>>>>>>> window.fetch(key, timeFrom, timeTo)`.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thus, flattening out the overlap for
> >>>>>> WindowBytesStoreSupplier
> >>>>>>>>>> onto the
> >>>>>>>>>>>>>>>> overlap for Materialized, we have 9 capabilities
> >>> total
> >>>>>> (note
> >>>>>>>>>> retention
> >>>>>>>>>>>>>>>> is duplicated), we have 4 that we don't want:
> >>>>>>>>>>>>>>>> * do _not_ set the serdes (they are already set in
> >>>>> Joined)
> >>>>>>>>>>>>>>>> * do _not_ configure retention (determined by
> >>>>> JoinWindows)
> >>>>>>>>>>>>>>>> * should _not_ configure window size (it's
> >> determined
> >>>> by
> >>>>>>>>>> JoinWindows)
> >>>>>>>>>>>>>>>> * duplicates _must_ be enabled
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> These gaps would have to be covered with run-time
> >>>> checks
> >>>>> if
> >>>>>>>> we
> >>>>>>>>>> re-use
> >>>>>>>>>>>>>>>> Materialized and WindowStoreBytesStoreSupplier
> >> both.
> >>>>> Maybe
> >>>>>>>> this
> >>>>>>>>>> sounds
> >>>>>>>>>>>>>>>> bad, but consider the other side, that we get 5 new
> >>>>>>>> capabilities
> >>>>>>>>>> we
> >>>>>>>>>>>>>>>> don't require, but are still pretty nice:
> >>>>>>>>>>>>>>>> * configure the bytes store
> >>>>>>>>>>>>>>>> * set a name for the store
> >>>>>>>>>>>>>>>> * configure caching
> >>>>>>>>>>>>>>>> * configure logging
> >>>>>>>>>>>>>>>> * configure segment interval
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Not sure where this nets us out, but it's food for
> >>>>> thought.
> >>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Sun, Jun 23, 2019 at 7:52 PM Guozhang Wang <
> >>>>>>>>> wangg...@gmail.com
> >>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Bill,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I think by giving a Materialized param into
> >>>>> stream-stream
> >>>>>>>> join,
> >>>>>>>>>> it's
> >>>>>>>>>>>>>> okay
> >>>>>>>>>>>>>>>>> (though still ideal) to say "we still would not
> >>> expose
> >>>>> the
> >>>>>>>>> store
> >>>>>>>>>> for
> >>>>>>>>>>>>>>>>> queries", but it would sound a bit awkward to say
> >>> "we
> >>>>>> would
> >>>>>>>>> also
> >>>>>>>>>>>>>> ignore
> >>>>>>>>>>>>>>>>> whatever the passed in store supplier but just use
> >>> our
> >>>>>>>> default
> >>>>>>>>>> ones"
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>> again the concern is that, if in the future we'd
> >>> want
> >>>> to
> >>>>>>>> change
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>> default
> >>>>>>>>>>>>>>>>> implementation of join algorithm which no longer
> >>> rely
> >>>>> on a
> >>>>>>>>> window
> >>>>>>>>>>>>>> store
> >>>>>>>>>>>>>>>>> with deduping enabled, then we need to change this
> >>> API
> >>>>>>>> again by
> >>>>>>>>>>>>>> changing
> >>>>>>>>>>>>>>>>> the store supplier type.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If we do want to fill this hole for stream-stream
> >>>> join,
> >>>>> I
> >>>>>>>> feel
> >>>>>>>>>> just
> >>>>>>>>>>>>>>>> adding
> >>>>>>>>>>>>>>>>> a String typed store-name would even be less
> >>>>>>>> future-intrusive
> >>>>>>>>> if
> >>>>>>>>>> we
> >>>>>>>>>>>>>>>> expect
> >>>>>>>>>>>>>>>>> this parameter to be modified later.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Does that makes sense?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Sat, Jun 22, 2019 at 12:51 PM Bill Bejeck <
> >>>>>>>>> bbej...@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks for the comments John and Guozhang, I'll
> >>>> address
> >>>>>>>> each
> >>>>>>>>>> one of
> >>>>>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>> comments in turn.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> John,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I'm wondering about a missing quadrant from the
> >>>> truth
> >>>>>>>> table
> >>>>>>>>>>>>>> involving
> >>>>>>>>>>>>>>>>>>> whether a Materialized is stored or not and
> >>> querying
> >>>>> is
> >>>>>>>>>>>>>>>>>>> enabled/disabled... What should be the behavior
> >> if
> >>>>> there
> >>>>>>>> is
> >>>>>>>>> no
> >>>>>>>>>>>>>> store
> >>>>>>>>>>>>>>>>>>> configured (e.g., if Materialized with only
> >>> serdes)
> >>>>> and
> >>>>>>>>>> querying
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> enabled?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> It seems we have two choices:
> >>>>>>>>>>>>>>>>>>> 1. we can force creation of a state store in
> >> this
> >>>>> case,
> >>>>>> so
> >>>>>>>>> the
> >>>>>>>>>>>>>> store
> >>>>>>>>>>>>>>>>>>> can be used to serve the queries
> >>>>>>>>>>>>>>>>>>> 2. we can provide just a queriable view,
> >> basically
> >>>>>>>> letting IQ
> >>>>>>>>>>>>>> query
> >>>>>>>>>>>>>>>>>>> into the "KTableValueGetter", which would
> >>>>> transparently
> >>>>>>>>>>>>>> construct the
> >>>>>>>>>>>>>>>>>>> query response by applying the operator logic to
> >>> the
> >>>>>>>> upstream
> >>>>>>>>>>>>>> state
> >>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>> the operator state isn't already stored.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I agree with your assertion about a missing
> >>> quadrant
> >>>>> from
> >>>>>>>> the
> >>>>>>>>>> truth
> >>>>>>>>>>>>>>>> table.
> >>>>>>>>>>>>>>>>>> Additionally, I too like the concept of a
> >> queriable
> >>>>> view.
> >>>>>>>>> But I
> >>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> goes a bit beyond the scope of this KIP and would
> >>>> like
> >>>>> to
> >>>>>>>>> pursue
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> feature as follow-on work.  Also thinking about
> >>> this
> >>>>> KIP
> >>>>>>>> some
> >>>>>>>>>>>>>> more, I'm
> >>>>>>>>>>>>>>>>>> thinking of the changes to Materialized might be
> >> a
> >>>>> reach
> >>>>>> as
> >>>>>>>>>> well.
> >>>>>>>>>>>>>>>>>> Separating the naming from a store and its
> >>> queryable
> >>>>>> state
> >>>>>>>>> seems
> >>>>>>>>>>>>>> like a
> >>>>>>>>>>>>>>>>>> complex issue in and of itself and should be
> >>> treated
> >>>>>>>>>> accordingly.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So here's what I'm thinking now.  We add
> >>> Materialzied
> >>>>> to
> >>>>>>>> Join,
> >>>>>>>>>> but
> >>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>> now,
> >>>>>>>>>>>>>>>>>> we internally disable querying.  I know this
> >> breaks
> >>>> our
> >>>>>>>>> current
> >>>>>>>>>>>>>>>> semantic
> >>>>>>>>>>>>>>>>>> approach, but I think it's crucial that we do two
> >>>>> things
> >>>>>> in
> >>>>>>>>> this
> >>>>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>    1. Break the naming of the state stores from
> >>>> Joined
> >>>>> to
> >>>>>>>>>>>>>>>> Materialized, so
> >>>>>>>>>>>>>>>>>>    the naming of state stores follows our current
> >>>>> pattern
> >>>>>>>> and
> >>>>>>>>>>>>>> enables
> >>>>>>>>>>>>>>>>>> upgrades
> >>>>>>>>>>>>>>>>>>    from 2.3 to 2.4
> >>>>>>>>>>>>>>>>>>    2. Offer the ability to configure the state
> >>> stores
> >>>>> of
> >>>>>>>> the
> >>>>>>>>>> join,
> >>>>>>>>>>>>>> even
> >>>>>>>>>>>>>>>>>>    providing a different implementation (i.e.
> >>>>> in-memory)
> >>>>>> if
> >>>>>>>>>>>>>> desired.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> With that in mind I'm considering changing the
> >> KIP
> >>> to
> >>>>>>>> remove
> >>>>>>>>> the
> >>>>>>>>>>>>>>>> changes to
> >>>>>>>>>>>>>>>>>> Materialized, and we document very clearly that
> >> by
> >>>>>>>> providing a
> >>>>>>>>>>>>>>>> Materialized
> >>>>>>>>>>>>>>>>>> object with a name is only for naming the state
> >>>> store,
> >>>>>>>> hence
> >>>>>>>>> the
> >>>>>>>>>>>>>>>> changelog
> >>>>>>>>>>>>>>>>>> topics and any possible configurations of the
> >>> store,
> >>>>> but
> >>>>>>>> this
> >>>>>>>>>> store
> >>>>>>>>>>>>>>>> *will
> >>>>>>>>>>>>>>>>>> not be available for IQ.*
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Guozhang,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1. About not breaking compatibility of
> >>> stream-stream
> >>>>>> join
> >>>>>>>>>>>>>>>> materialized
> >>>>>>>>>>>>>>>>>>> stores: I think this is a valid issue to tackle,
> >>> but
> >>>>>> after
> >>>>>>>>>>>>>> thinking
> >>>>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>> it once more I'm not sure if exposing
> >> Materialized
> >>>>> would
> >>>>>>>> be a
> >>>>>>>>>>>>>> good
> >>>>>>>>>>>>>>>>>> solution
> >>>>>>>>>>>>>>>>>>> here. My rationles:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1.a For stream-stream join, our current usage of
> >>>>>>>> window-store
> >>>>>>>>>> is
> >>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>> ideal,
> >>>>>>>>>>>>>>>>>>> and we want to modify it in the near future to
> >> be
> >>>> more
> >>>>>>>>>>>>>> efficient. Not
> >>>>>>>>>>>>>>>>>>> allowing users to override such state store
> >>> backend
> >>>>>> gives
> >>>>>>>> us
> >>>>>>>>>> such
> >>>>>>>>>>>>>>>> freedom
> >>>>>>>>>>>>>>>>>>> (which was also considered in the original DSL
> >>>>> design),
> >>>>>>>>> whereas
> >>>>>>>>>>>>>>>> getting a
> >>>>>>>>>>>>>>>>>>> Materialized<WindowStore> basically kicks out
> >> that
> >>>>>> freedom
> >>>>>>>>> out
> >>>>>>>>>>>>>> of the
> >>>>>>>>>>>>>>>>>>> window.
> >>>>>>>>>>>>>>>>>>> 1.b For strema-stream join, in our original
> >> design
> >>>> we
> >>>>>>>> intend
> >>>>>>>>> to
> >>>>>>>>>>>>>>>> "never"
> >>>>>>>>>>>>>>>>>>> want users to query the state, since it is just
> >>> for
> >>>>>>>> buffering
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> upcoming
> >>>>>>>>>>>>>>>>>>> records from the stream. Now I know that some
> >>> users
> >>>>> may
> >>>>>>>>> indeed
> >>>>>>>>>>>>>> want
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> query it from the debugging perspective, but
> >>> still I
> >>>>>>>>> concerned
> >>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>> whether leveraging IQ for debugging purposes
> >> would
> >>>> be
> >>>>>> the
> >>>>>>>>> right
> >>>>>>>>>>>>>>>> solution
> >>>>>>>>>>>>>>>>>>> here. And adding Materialized object opens the
> >>> door
> >>>> to
> >>>>>> let
> >>>>>>>>>> users
> >>>>>>>>>>>>>>>> query
> >>>>>>>>>>>>>>>>>>> about it (unless we did something intentionally
> >> to
> >>>>> still
> >>>>>>>>>> forbids
> >>>>>>>>>>>>>> it),
> >>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>> also restricts us in the future.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 2. About the coupling between
> >> Materialized.name()
> >>>> and
> >>>>>>>>>> queryable:
> >>>>>>>>>>>>>>>> again I
> >>>>>>>>>>>>>>>>>>> think this is a valid issue. But I'm not sure if
> >>> the
> >>>>>>>> current
> >>>>>>>>>>>>>>>>>>> "withQuerryingDisabled / Enabled" at
> >> Materialized
> >>> is
> >>>>> the
> >>>>>>>> best
> >>>>>>>>>>>>>>>> approach.
> >>>>>>>>>>>>>>>>>>> Here I think I agree with John, that generally
> >>>>> speaking
> >>>>>>>> it's
> >>>>>>>>>>>>>> better
> >>>>>>>>>>>>>>>> be a
> >>>>>>>>>>>>>>>>>>> control function on the `KTable` itself, rather
> >>> than
> >>>>> on
> >>>>>>>>>>>>>>>> `Materialized`,
> >>>>>>>>>>>>>>>>>> so
> >>>>>>>>>>>>>>>>>>> fixing it via adding functions through
> >>>> `Materialized`
> >>>>>>>> seems
> >>>>>>>>>> not a
> >>>>>>>>>>>>>>>> natural
> >>>>>>>>>>>>>>>>>> approach either.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I understand your thoughts here, and up to a
> >>> point, I
> >>>>>> agree
> >>>>>>>>> with
> >>>>>>>>>>>>>> you.
> >>>>>>>>>>>>>>>>>> But concerning not providing Materialized as it
> >> may
> >>>>>>>> restrict
> >>>>>>>>> us
> >>>>>>>>>> in
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> future for delivering different implementations,
> >>> I'm
> >>>>>>>> wondering
> >>>>>>>>>> if
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>> doing some premature optimization here.
> >>>>>>>>>>>>>>>>>> My rationale for saying so
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>    1. I think the cost of not allowing the naming
> >>> of
> >>>>>> state
> >>>>>>>>>> stores
> >>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>> joins
> >>>>>>>>>>>>>>>>>>    is too big of a gap to leave.   IMHO for joins
> >>> to
> >>>>>> follow
> >>>>>>>>> the
> >>>>>>>>>>>>>> current
> >>>>>>>>>>>>>>>>>>    pattern of using Materialized for naming state
> >>>>> stores
> >>>>>>>> would
> >>>>>>>>>> be
> >>>>>>>>>>>>>> what
> >>>>>>>>>>>>>>>> most
> >>>>>>>>>>>>>>>>>>    users would expect to use.  As I said in my
> >>>> comments
> >>>>>>>>> above, I
> >>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>    should *not include* the changes to
> >> Materialized
> >>>> and
> >>>>>>>>> enforce
> >>>>>>>>>>>>>> named
> >>>>>>>>>>>>>>>>>>    stores for joins as unavailable for IQ.
> >>>>>>>>>>>>>>>>>>    2. We'll still have the join methods available
> >>>>>> without a
> >>>>>>>>>>>>>>>> Materialized
> >>>>>>>>>>>>>>>>>>    allowing us to do something different
> >> internally
> >>>> if
> >>>>> a
> >>>>>>>>>>>>>> Materialized
> >>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>    provided.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Overall, I'm thinking maybe we should still use
> >>> two
> >>>>>> stones
> >>>>>>>>>> rather
> >>>>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>> to kill these two birds, and probably for this
> >> KIP
> >>>> we
> >>>>>> just
> >>>>>>>>>> focus
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>> 1)
> >>>>>>>>>>>>>>>>>>> above. And for that I'd like to not expose the
> >>>>>>>> Materialized
> >>>>>>>>>>>>>> either
> >>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> rationales that I've listed above. Instead, we
> >>> just
> >>>>>>>> restrict
> >>>>>>>>>>>>>> KIP-307
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> NOT
> >>>>>>>>>>>>>>>>>>> use the Joined.name for state store names and
> >>> always
> >>>>> use
> >>>>>>>>>> internal
> >>>>>>>>>>>>>>>> names
> >>>>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>> well, which admittedly indeed leaves a hole of
> >> not
> >>>>> being
> >>>>>>>> able
> >>>>>>>>>> to
> >>>>>>>>>>>>>>>> cover
> >>>>>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>> internal names here, but now I feel this `hole`
> >>> may
> >>>>>>>> better be
> >>>>>>>>>>>>>> filled
> >>>>>>>>>>>>>>>> by,
> >>>>>>>>>>>>>>>>>>> e.g. not creating changelog topics but just use
> >>> the
> >>>>>>>> upstream
> >>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> re-bootstrap the materialized store, more
> >>>> concretely:
> >>>>>> when
> >>>>>>>>>>>>>>>> materializing
> >>>>>>>>>>>>>>>>>>> the store, try to piggy-back the changelog topic
> >>> on
> >>>> an
> >>>>>>>>> existing
> >>>>>>>>>>>>>>>> topic,
> >>>>>>>>>>>>>>>>>> e.g.
> >>>>>>>>>>>>>>>>>>> a) if the stream is coming directly from some
> >>> source
> >>>>>> topic
> >>>>>>>>>>>>>> (including
> >>>>>>>>>>>>>>>>>>> repartition topic), make that as changelog topic
> >>> and
> >>>>> if
> >>>>>>>> it is
> >>>>>>>>>>>>>>>> repartition
> >>>>>>>>>>>>>>>>>>> topic change the retention / data purging policy
> >>>>>>>> necessarily
> >>>>>>>>> as
> >>>>>>>>>>>>>>>> well; b)
> >>>>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>> the stream is coming from some stateless
> >>> operators,
> >>>>>>>> delegate
> >>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> stateless
> >>>>>>>>>>>>>>>>>>> operator to the parent stream similar as a); if
> >>> the
> >>>>>>>> stream is
> >>>>>>>>>>>>>> coming
> >>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> stream-stream join which is the only stateful
> >>>> operator
> >>>>>>>> that
> >>>>>>>>> can
> >>>>>>>>>>>>>>>> result in
> >>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> stream, consider merging the join into multi-way
> >>>> joins
> >>>>>>>> (yes,
> >>>>>>>>>>>>>> this is
> >>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>> very
> >>>>>>>>>>>>>>>>>>> hand-wavy thought, but the point here is that we
> >>> do
> >>>>> not
> >>>>>>>> try
> >>>>>>>>> to
> >>>>>>>>>>>>>>>> tackle it
> >>>>>>>>>>>>>>>>>>> now but leave it for a better solution :).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I really like this idea!  I agree with you in
> >> that
> >>>> this
> >>>>>>>>> approach
> >>>>>>>>>>>>>> to too
> >>>>>>>>>>>>>>>>>> much for adding in this KIP, but we could pick it
> >>> up
> >>>>>> later
> >>>>>>>> and
> >>>>>>>>>>>>>>>> leverage the
> >>>>>>>>>>>>>>>>>> Optimization framework to accomplish this re-use.
> >>>>>>>>>>>>>>>>>> Again, while I agree we should break the naming
> >> of
> >>>> join
> >>>>>>>> state
> >>>>>>>>>>>>>> stores
> >>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>> KIP-307, IMHO it's something we should fix now as
> >>> it
> >>>>> will
> >>>>>>>> be
> >>>>>>>>> the
> >>>>>>>>>>>>>> last
> >>>>>>>>>>>>>>>> piece
> >>>>>>>>>>>>>>>>>> we can provide to give users the ability to
> >>>> completely
> >>>>>> make
> >>>>>>>>>> their
> >>>>>>>>>>>>>>>>>> topologies "upgrade proof" when adding additional
> >>>>>>>> operations.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks again to both of you for comments and I
> >> look
> >>>>>>>> forward to
> >>>>>>>>>>>>>> hearing
> >>>>>>>>>>>>>>>> back
> >>>>>>>>>>>>>>>>>> from you.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>> Bill
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:33 PM Guozhang Wang <
> >>>>>>>>>> wangg...@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hello Bill,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks for the KIP. Glad to see that we can
> >> likely
> >>>>>>>> shooting
> >>>>>>>>> two
> >>>>>>>>>>>>>> birds
> >>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> one stone. I have some concerns though about
> >> those
> >>>>> "two
> >>>>>>>>> birds"
> >>>>>>>>>>>>>>>>>> themselves:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1. About not breaking compatibility of
> >>> stream-stream
> >>>>>> join
> >>>>>>>>>>>>>>>> materialized
> >>>>>>>>>>>>>>>>>>> stores: I think this is a valid issue to tackle,
> >>> but
> >>>>>> after
> >>>>>>>>>>>>>> thinking
> >>>>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>> it once more I'm not sure if exposing
> >> Materialized
> >>>>> would
> >>>>>>>> be a
> >>>>>>>>>>>>>> good
> >>>>>>>>>>>>>>>>>> solution
> >>>>>>>>>>>>>>>>>>> here. My rationles:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1.a For stream-stream join, our current usage of
> >>>>>>>> window-store
> >>>>>>>>>> is
> >>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>> ideal,
> >>>>>>>>>>>>>>>>>>> and we want to modify it in the near future to
> >> be
> >>>> more
> >>>>>>>>>>>>>> efficient. Not
> >>>>>>>>>>>>>>>>>>> allowing users to override such state store
> >>> backend
> >>>>>> gives
> >>>>>>>> us
> >>>>>>>>>> such
> >>>>>>>>>>>>>>>> freedom
> >>>>>>>>>>>>>>>>>>> (which was also considered in the original DSL
> >>>>> design),
> >>>>>>>>> whereas
> >>>>>>>>>>>>>>>> getting a
> >>>>>>>>>>>>>>>>>>> Materialized<WindowStore> basically kicks out
> >> that
> >>>>>> freedom
> >>>>>>>>> out
> >>>>>>>>>>>>>> of the
> >>>>>>>>>>>>>>>>>>> window.
> >>>>>>>>>>>>>>>>>>> 1.b For strema-stream join, in our original
> >> design
> >>>> we
> >>>>>>>> intend
> >>>>>>>>> to
> >>>>>>>>>>>>>>>> "never"
> >>>>>>>>>>>>>>>>>>> want users to query the state, since it is just
> >>> for
> >>>>>>>> buffering
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> upcoming
> >>>>>>>>>>>>>>>>>>> records from the stream. Now I know that some
> >>> users
> >>>>> may
> >>>>>>>>> indeed
> >>>>>>>>>>>>>> want
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> query it from the debugging perspective, but
> >>> still I
> >>>>>>>>> concerned
> >>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>> whether leveraging IQ for debugging purposes
> >> would
> >>>> be
> >>>>>> the
> >>>>>>>>> right
> >>>>>>>>>>>>>>>> solution
> >>>>>>>>>>>>>>>>>>> here. And adding Materialized object opens the
> >>> door
> >>>> to
> >>>>>> let
> >>>>>>>>>> users
> >>>>>>>>>>>>>>>> query
> >>>>>>>>>>>>>>>>>>> about it (unless we did something intentionally
> >> to
> >>>>> still
> >>>>>>>>>> forbids
> >>>>>>>>>>>>>> it),
> >>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>> also restricts us in the future.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 2. About the coupling between
> >> Materialized.name()
> >>>> and
> >>>>>>>>>> queryable:
> >>>>>>>>>>>>>>>> again I
> >>>>>>>>>>>>>>>>>>> think this is a valid issue. But I'm not sure if
> >>> the
> >>>>>>>> current
> >>>>>>>>>>>>>>>>>>> "withQuerryingDisabled / Enabled" at
> >> Materialized
> >>> is
> >>>>> the
> >>>>>>>> best
> >>>>>>>>>>>>>>>> approach.
> >>>>>>>>>>>>>>>>>>> Here I think I agree with John, that generally
> >>>>> speaking
> >>>>>>>> it's
> >>>>>>>>>>>>>> better
> >>>>>>>>>>>>>>>> be a
> >>>>>>>>>>>>>>>>>>> control function on the `KTable` itself, rather
> >>> than
> >>>>> on
> >>>>>>>>>>>>>>>> `Materialized`,
> >>>>>>>>>>>>>>>>>> so
> >>>>>>>>>>>>>>>>>>> fixing it via adding functions through
> >>>> `Materialized`
> >>>>>>>> seems
> >>>>>>>>>> not a
> >>>>>>>>>>>>>>>> natural
> >>>>>>>>>>>>>>>>>>> approach either.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Overall, I'm thinking maybe we should still use
> >>> two
> >>>>>> stones
> >>>>>>>>>> rather
> >>>>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>> to kill these two birds, and probably for this
> >> KIP
> >>>> we
> >>>>>> just
> >>>>>>>>>> focus
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>> 1)
> >>>>>>>>>>>>>>>>>>> above. And for that I'd like to not expose the
> >>>>>>>> Materialized
> >>>>>>>>>>>>>> either
> >>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> rationales that I've listed above. Instead, we
> >>> just
> >>>>>>>> restrict
> >>>>>>>>>>>>>> KIP-307
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> NOT
> >>>>>>>>>>>>>>>>>>> use the Joined.name for state store names and
> >>> always
> >>>>> use
> >>>>>>>>>> internal
> >>>>>>>>>>>>>>>> names
> >>>>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>> well, which admittedly indeed leaves a hole of
> >> not
> >>>>> being
> >>>>>>>> able
> >>>>>>>>>> to
> >>>>>>>>>>>>>>>> cover
> >>>>>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>> internal names here, but now I feel this `hole`
> >>> may
> >>>>>>>> better be
> >>>>>>>>>>>>>> filled
> >>>>>>>>>>>>>>>> by,
> >>>>>>>>>>>>>>>>>>> e.g. not creating changelog topics but just use
> >>> the
> >>>>>>>> upstream
> >>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> re-bootstrap the materialized store, more
> >>>> concretely:
> >>>>>> when
> >>>>>>>>>>>>>>>> materializing
> >>>>>>>>>>>>>>>>>>> the store, try to piggy-back the changelog topic
> >>> on
> >>>> an
> >>>>>>>>> existing
> >>>>>>>>>>>>>>>> topic,
> >>>>>>>>>>>>>>>>>> e.g.
> >>>>>>>>>>>>>>>>>>> a) if the stream is coming directly from some
> >>> source
> >>>>>> topic
> >>>>>>>>>>>>>> (including
> >>>>>>>>>>>>>>>>>>> repartition topic), make that as changelog topic
> >>> and
> >>>>> if
> >>>>>>>> it is
> >>>>>>>>>>>>>>>> repartition
> >>>>>>>>>>>>>>>>>>> topic change the retention / data purging policy
> >>>>>>>> necessarily
> >>>>>>>>> as
> >>>>>>>>>>>>>>>> well; b)
> >>>>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>> the stream is coming from some stateless
> >>> operators,
> >>>>>>>> delegate
> >>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> stateless
> >>>>>>>>>>>>>>>>>>> operator to the parent stream similar as a); if
> >>> the
> >>>>>>>> stream is
> >>>>>>>>>>>>>> coming
> >>>>>>>>>>>>>>>>>> from a
> >>>>>>>>>>>>>>>>>>> stream-stream join which is the only stateful
> >>>> operator
> >>>>>>>> that
> >>>>>>>>> can
> >>>>>>>>>>>>>>>> result
> >>>>>>>>>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>>> stream, consider merging the join into multi-way
> >>>> joins
> >>>>>>>> (yes,
> >>>>>>>>>>>>>> this is
> >>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>> very
> >>>>>>>>>>>>>>>>>>> hand-wavy thought, but the point here is that we
> >>> do
> >>>>> not
> >>>>>>>> try
> >>>>>>>>> to
> >>>>>>>>>>>>>>>> tackle it
> >>>>>>>>>>>>>>>>>>> now but leave it for a better solution :).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Jun 19, 2019 at 11:41 AM John Roesler <
> >>>>>>>>>> j...@confluent.io
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Bill,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for the KIP! Awesome job catching this
> >>>>>> unexpected
> >>>>>>>>>>>>>>>> consequence
> >>>>>>>>>>>>>>>>>>>> of the prior KIPs before it was released.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> The proposal looks good to me. On top of just
> >>>> fixing
> >>>>>> the
> >>>>>>>>>>>>>> problem,
> >>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>> seems to address two other pain points:
> >>>>>>>>>>>>>>>>>>>> * that naming a s
> >
>

Reply via email to