Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

Adam Bellemare Tue, 21 Aug 2018 11:59:23 -0700

Hi John

That is an excellent idea. The header usage I propose would be limited
entirely to internal topics, and this could very well be the solution to
potential conflicts. If we do not officially reserve a prefix "__" then I
think this would be the safest idea, as it would entirely avoid any
accidents (perhaps if a company is using its own "__" prefix for other
reasons).


Thanks

Adam


On Tue, Aug 21, 2018 at 2:24 PM, John Roesler <[email protected]> wrote:

> Just a quick thought regarding headers:
> > I think there is no absolute-safe ways to avoid conflicts, but we can
> still
> > consider using some name patterns to reduce the likelihood as much as
> > possible.. e.g. consider sth. like the internal topics naming: e.g.
> > "__internal_[name]"?
>
> I think there is a safe way to avoid conflicts, since these headers are
> only needed in internal topics (I think):
> For internal and changelog topics, we can namespace all headers:
> * user-defined headers are namespaced as "external." + headerKey
> * internal headers are namespaced as "internal." + headerKey
>
> This is a lot of characters, so we could use a sigil instead (e.g., "_" for
> internal, "~" for external)
>
> We simply apply the namespacing when we read user headers from external
> topics into the topology and then de-namespace them before we emit them to
> an external topic (via "to" or "through").
> Now, it is not possible to collide with user-defined headers.
>
> That said, I'd also be fine with just reserving "__" as a header prefix and
> not worrying about collisions.
>
> Thanks for the KIP,
> -John
>
> On Tue, Aug 21, 2018 at 9:48 AM Jan Filipiak <[email protected]>
> wrote:
>
> > Still havent completly grabbed it.
> > sorry will read more
> >
> > On 17.08.2018 21:23, Jan Filipiak wrote:
> > > Cool stuff.
> > >
> > > I made some random remarks. Did not touch the core of the algorithm
> yet.
> > >
> > > Will do Monday 100%
> > >
> > > I don't see Interactive Queries :) like that!
> > >
> > >
> > >
> > >
> > > On 17.08.2018 20:28, Adam Bellemare wrote:
> > >> I have submitted a PR with my code against trunk:
> > >> https://github.com/apache/kafka/pull/5527
> > >>
> > >> Do I continue on this thread or do we begin a new one for discussion?
> > >>
> > >> On Thu, Aug 16, 2018 at 1:40 AM, Jan Filipiak <
> [email protected]
> > >
> > >> wrote:
> > >>
> > >>> even before message headers, the option for me always existed to
> > >>> just wrap
> > >>> the messages into my own custom envelop.
> > >>> So I of course thought this through. One sentence in your last email
> > >>> triggered all the thought process I put in the back then
> > >>> again to design it in the, what i think is the "kafka-way". It ended
> up
> > >>> ranting a little about what happened in the past.
> > >>>
> > >>> I see plenty of colleagues of mine falling into traps in the API,
> > >>> that I
> > >>> did warn about in the 1.0 DSL rewrite. I have the same
> > >>> feeling again. So I hope it gives you some insights into my though
> > >>> process. I am aware that since i never ported 213 to higher
> > >>> streams version, I don't really have a steak here and initially I
> > >>> didn't
> > >>> feel like actually sending it. But maybe you can pull
> > >>> something good from it.
> > >>>
> > >>>   Best jan
> > >>>
> > >>>
> > >>>
> > >>> On 15.08.2018 04:44, Adam Bellemare wrote:
> > >>>
> > >>>> @Jan
> > >>>> Thanks Jan. I take it you mean "key-widening" somehow includes
> > >>>> information
> > >>>> about which record is processed first? I understand about a
> > >>>> CombinedKey
> > >>>> with both the Foreign and Primary key, but I don't see how you track
> > >>>> ordering metadata in there unless you actually included a metadata
> > >>>> field
> > >>>> in
> > >>>> the key type as well.
> > >>>>
> > >>>> @Guozhang
> > >>>> As Jan mentioned earlier, is Record Headers mean to strictly be
> > >>>> used in
> > >>>> just the user-space? It seems that it is possible that a collision
> > >>>> on the
> > >>>> (key,value) tuple I wish to add to it could occur. For instance, if
> I
> > >>>> wanted to add a ("foreignKeyOffset",10) to the Headers but the user
> > >>>> already
> > >>>> specified their own header with the same key name, then it appears
> > >>>> there
> > >>>> would be a collision. (This is one of the issues I brought up in
> > >>>> the KIP).
> > >>>>
> > >>>> --------------------------------
> > >>>>
> > >>>> I will be posting a prototype PR against trunk within the next day
> > >>>> or two.
> > >>>> One thing I need to point out is that my design very strictly wraps
> > >>>> the
> > >>>> entire foreignKeyJoin process entirely within the DSL function.
> > >>>> There is
> > >>>> no
> > >>>> exposure of CombinedKeys or widened keys, nothing to resolve with
> > >>>> regards
> > >>>> to out-of-order processing and no need for the DSL user to even know
> > >>>> what's
> > >>>> going on inside of the function. The code simply returns the
> > >>>> results of
> > >>>> the
> > >>>> join, keyed by the original key. Currently my API mirrors
> > >>>> identically the
> > >>>> format of the data returned by the regular join function, and I
> > >>>> believe
> > >>>> that this is very useful to many users of the DSL. It is my
> > >>>> understanding
> > >>>> that one of the main design goals of the DSL is to provide higher
> > >>>> level
> > >>>> functionality without requiring the users to know exactly what's
> > >>>> going on
> > >>>> under the hood. With this in mind, I thought it best to solve
> > >>>> ordering and
> > >>>> partitioning problems within the function and eliminate the
> > >>>> requirement
> > >>>> for
> > >>>> users to do additional work after the fact to resolve the results
> > >>>> of their
> > >>>> join. Basically, I am assuming that most users of the DSL just
> > >>>> "want it to
> > >>>> work" and want it to be easy. I did this operating under the
> > >>>> assumption
> > >>>> that if a user truly wants to optimize their own workflow down to
> the
> > >>>> finest details then they will break from strictly using the DSL and
> > >>>> move
> > >>>> down to the processors API.
> > >>>>
> > >>> I think. The abstraction is not powerful enough
> > >>> to not have kafka specifics leak up The leak I currently think this
> > >>> has is
> > >>> that you can not reliable prevent the delete coming out first,
> > >>> before you emit the correct new record. As it is an abstraction
> > >>> entirely
> > >>> around kafka.
> > >>> I can only recommend to not to. Honesty and simplicity should always
> be
> > >>> first prio
> > >>> trying to hide this just makes it more complex, less understandable
> and
> > >>> will lead to mistakes
> > >>> in usage.
> > >>>
> > >>> Exactly why I am also in big disfavour of GraphNodes and later
> > >>> optimization stages.
> > >>> Can someone give me an example of an optimisation that really can't
> be
> > >>> handled by the user
> > >>> constructing his topology differently?
> > >>> Having reusable Processor API components accessible by the DSL and
> > >>> composable as
> > >>> one likes is exactly where DSL should max out and KSQL should do the
> > >>> next
> > >>> step.
> > >>> I find it very unprofessional from a software engineering approach
> > >>> to run
> > >>> software where
> > >>> you can not at least senseful reason about the inner workings of the
> > >>> libraries used.
> > >>> Gives this people have to read and understand in anyway, why try to
> > >>> hide
> > >>> it?
> > >>>
> > >>> It really miss the beauty of 0.10 version DSL.
> > >>> Apparently not a thing I can influence but just warn about.
> > >>>
> > >>> @gouzhang
> > >>> you can't imagine how many extra IQ-Statestores I constantly prune
> from
> > >>> stream app's
> > >>> because people just keep passing Materialized's into all the
> > >>> operations.
> > >>> :D :'-(
> > >>> I regret that I couldn't convince you guys back then. Plus this whole
> > >>> entire topology as a floating
> > >>> interface chain, never seen it anywhere :-/ :'(
> > >>>
> > >>> I don't know. I guess this is just me regretting to only have
> 24h/day.
> > >>>
> > >>>
> > >>>
> > >>> I updated the KIP today with some points worth talking about, should
> > >>> anyone
> > >>>> be so inclined to check it out. Currently we are running this code
> in
> > >>>> production to handle relational joins from our Kafka Connect
> > >>>> topics, as
> > >>>> per
> > >>>> the original motivation of the KIP.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> I believe the foreignKeyJoin should be responsible for. In my
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tue, Aug 14, 2018 at 5:22 PM, Guozhang Wang<[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>> Hello Adam,
> > >>>>> As for your question regarding GraphNodes, it is for extending
> > >>>>> Streams
> > >>>>> optimization framework. You can find more details on
> > >>>>> https://issues.apache.org/jira/browse/KAFKA-6761.
> > >>>>>
> > >>>>> The main idea is that instead of directly building up the "physical
> > >>>>> topology" (represented as Topology in the public package, and
> > >>>>> internally
> > >>>>> built as the ProcessorTopology class) while users are specifying
> the
> > >>>>> transformation operators, we first keep it as a "logical topology"
> > >>>>> (represented as GraphNode inside InternalStreamsBuilder). And then
> > >>>>> only
> > >>>>> execute the optimization and the construction of the "physical"
> > >>>>> Topology
> > >>>>> when StreamsBuilder.build() is called.
> > >>>>>
> > >>>>> Back to your question, I think it makes more sense to add a new
> > >>>>> type of
> > >>>>> StreamsGraphNode (maybe you can consider inheriting from the
> > >>>>> BaseJoinProcessorNode). Note that although in the Topology we will
> > >>>>> have
> > >>>>> multiple connected ProcessorNodes to represent a (foreign-key)
> > >>>>> join, we
> > >>>>> still want to keep it as a single StreamsGraphNode, or just a
> > >>>>> couple of
> > >>>>> them in the logical representation so that in the future we can
> > >>>>> construct
> > >>>>> the physical topology differently (e.g. having another way than the
> > >>>>> current
> > >>>>> distributed hash-join).
> > >>>>>
> > >>>>> -------------------------------------------------------
> > >>>>>
> > >>>>> Back to your questions to KIP-213, I think Jan has summarized it
> > >>>>> pretty-well. Note that back then we do not have headers support so
> we
> > >>>>> have
> > >>>>> to do such "key-widening" approach to ensure ordering.
> > >>>>>
> > >>>>>
> > >>>>> Guozhang
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Aug 13, 2018 at 11:39 PM, Jan
> > >>>>> Filipiak<[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>> Hi Adam,
> > >>>>>> I love how you are on to this already! I resolve this by
> > >>>>>> "key-widening"
> > >>>>>> I
> > >>>>>> treat the result of FKA,and FKB differently.
> > >>>>>> As you can see the output of my join has a Combined Key and
> > >>>>>> therefore I
> > >>>>>> can resolve the "race condition" in a group by
> > >>>>>> if I so desire.
> > >>>>>>
> > >>>>>> I think this reflects more what happens under the hood and makes
> > >>>>>> it more
> > >>>>>> clear to the user what is going on. The Idea
> > >>>>>> of hiding this behind metadata and handle it in the DSL is from
> > >>>>>> my POV
> > >>>>>> unideal.
> > >>>>>>
> > >>>>>> To write into your example:
> > >>>>>>
> > >>>>>> key + A, null)
> > >>>>>> (key +B, <joined On FK =B>)
> > >>>>>>
> > >>>>>> is what my output would look like.
> > >>>>>>
> > >>>>>>
> > >>>>>> Hope that makes sense :D
> > >>>>>>
> > >>>>>> Best Jan
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On 13.08.2018 18:16, Adam Bellemare wrote:
> > >>>>>>
> > >>>>>> Hi Jan
> > >>>>>>> If you do not use headers or other metadata, how do you ensure
> that
> > >>>>>>> changes
> > >>>>>>> to the foreign-key value are not resolved out-of-order?
> > >>>>>>> ie: If an event has FK = A, but you change it to FK = B, you
> > >>>>>>> need to
> > >>>>>>> propagate both a delete (FK=A -> null) and an addition (FK=B).
> > >>>>>>> In my
> > >>>>>>> solution, without maintaining any metadata, it is possible for
> the
> > >>>>>>> final
> > >>>>>>> output to be in either order - the correctly updated joined
> > >>>>>>> value, or
> > >>>>>>>
> > >>>>>> the
> > >>>>>> null for the delete.
> > >>>>>>> (key, null)
> > >>>>>>> (key, <joined On FK =B>)
> > >>>>>>>
> > >>>>>>> or
> > >>>>>>>
> > >>>>>>> (key, <joined On FK =B>)
> > >>>>>>> (key, null)
> > >>>>>>>
> > >>>>>>> I looked back through your code and through the discussion
> > >>>>>>> threads, and
> > >>>>>>> didn't see any information on how you resolved this. I have a
> > >>>>>>> version
> > >>>>>>> of
> > >>>>>>> my
> > >>>>>>> code working for 2.0, I am just adding more integration tests
> > >>>>>>> and will
> > >>>>>>> update the KIP accordingly. Any insight you could provide on
> > >>>>>>> resolving
> > >>>>>>> out-of-order semantics without metadata would be helpful.
> > >>>>>>>
> > >>>>>>> Thanks
> > >>>>>>> Adam
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Mon, Aug 13, 2018 at 3:34 AM, Jan Filipiak <
> > >>>>>>> [email protected]
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>>> Happy to see that you want to make an effort here.
> > >>>>>>>>
> > >>>>>>>> Regarding the ProcessSuppliers I couldn't find a way to not
> > >>>>>>>> rewrite
> > >>>>>>>> the
> > >>>>>>>> joiners + the merger.
> > >>>>>>>> The re-partitioners can be reused in theory. I don't know if
> > >>>>>>>>
> > >>>>>>> repartition
> > >>>>>> is optimized in 2.0 now.
> > >>>>>>>> I made this
> > >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-241+
> > >>>>>>>> KTable+repartition+with+compacted+Topics
> > >>>>>>>> back then and we are running KIP-213 with KIP-241 in
> combination.
> > >>>>>>>>
> > >>>>>>>> For us it is vital as it minimized the size we had in our
> > >>>>>>>> repartition
> > >>>>>>>> topics plus it removed the factor of 2 in events on every
> message.
> > >>>>>>>> I know about this new  "delete once consumer has read it".  I
> > >>>>>>>> don't
> > >>>>>>>>
> > >>>>>>> think
> > >>>>>> 241 is vital for all usecases, for ours it is. I wanted
> > >>>>>>>> to use 213 to sneak in the foundations for 241 aswell.
> > >>>>>>>>
> > >>>>>>>> I don't quite understand what a PropagationWrapper is, but I am
> > >>>>>>>> certain
> > >>>>>>>> that you do not need RecordHeaders
> > >>>>>>>> for 213 and I would try to leave them out. They either belong
> > >>>>>>>> to the
> > >>>>>>>>
> > >>>>>>> DSL
> > >>>>>> or to the user, having a mixed use is
> > >>>>>>>> to be avoided. We run the join with 0.8 logformat and I don't
> > >>>>>>>> think
> > >>>>>>>> one
> > >>>>>>>> needs more.
> > >>>>>>>>
> > >>>>>>>> This KIP will be very valuable for the streams project! I
> couldn't
> > >>>>>>>>
> > >>>>>>> never
> > >>>>>> convince myself to invest into the 1.0+ DSL
> > >>>>>>>> as I used almost all my energy to fight against it. Maybe this
> can
> > >>>>>>>> also
> > >>>>>>>> help me see the good sides a little bit more.
> > >>>>>>>>
> > >>>>>>>> If there is anything unclear with all the text that has been
> > >>>>>>>> written,
> > >>>>>>>> feel
> > >>>>>>>> free to just directly cc me so I don't miss it on
> > >>>>>>>> the mailing list.
> > >>>>>>>>
> > >>>>>>>> Best Jan
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 08.08.2018 15:26, Adam Bellemare wrote:
> > >>>>>>>>
> > >>>>>>>> More followup, and +dev as Guozhang replied to me directly
> > >>>>>>>> previously.
> > >>>>>>>>
> > >>>>>>>>> I am currently porting the code over to trunk. One of the major
> > >>>>>>>>>
> > >>>>>>>> changes
> > >>>>>> since 1.0 is the usage of GraphNodes. I have a question about
> this:
> > >>>>>>>>> For a foreignKey joiner, should it have its own dedicated node
> > >>>>>>>>> type?
> > >>>>>>>>>
> > >>>>>>>> Or
> > >>>>>> would it be advisable to construct it from existing GraphNode
> > >>>>>>>>> components?
> > >>>>>>>>> For instance, I believe I could construct it from several
> > >>>>>>>>> OptimizableRepartitionNode, some SinkNode, some SourceNode, and
> > >>>>>>>>>
> > >>>>>>>> several
> > >>>>>> StatefulProcessorNode. That being said, there is some underlying
> > >>>>>>>>> complexity
> > >>>>>>>>> to each approach.
> > >>>>>>>>>
> > >>>>>>>>> I will be switching the KIP-213 to use the RecordHeaders in
> Kafka
> > >>>>>>>>> Streams
> > >>>>>>>>> instead of the PropagationWrapper, but conceptually it should
> > >>>>>>>>> be the
> > >>>>>>>>> same.
> > >>>>>>>>>
> > >>>>>>>>> Again, any feedback is welcomed...
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Jul 30, 2018 at 9:38 AM, Adam Bellemare <
> > >>>>>>>>> [email protected]
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi Guozhang et al
> > >>>>>>>>>
> > >>>>>>>>> I was just reading the 2.0 release notes and noticed a section
> on
> > >>>>>>>>>> Record
> > >>>>>>>>>> Headers.
> > >>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>>>>>> 244%3A+Add+Record+Header+support+to+Kafka+Streams+
> Processor+API
> > >>>>>>>>>>
> > >>>>>>>>>> I am not yet sure if the contents of a RecordHeader is
> > >>>>>>>>>> propagated
> > >>>>>>>>>> all
> > >>>>>>>>>> the
> > >>>>>>>>>> way through the Sinks and Sources, but if it is, and if it
> > >>>>>>>>>> remains
> > >>>>>>>>>> attached
> > >>>>>>>>>> to the record (including null records) I may be able to ditch
> > >>>>>>>>>> the
> > >>>>>>>>>> propagationWrapper for an implementation using RecordHeader.
> > >>>>>>>>>> I am
> > >>>>>>>>>> not
> > >>>>>>>>>> yet
> > >>>>>>>>>> sure if this is doable, so if anyone understands RecordHeader
> > >>>>>>>>>> impl
> > >>>>>>>>>> better
> > >>>>>>>>>> than I, I would be happy to hear from you.
> > >>>>>>>>>>
> > >>>>>>>>>> In the meantime, let me know of any questions. I believe this
> > >>>>>>>>>> PR has
> > >>>>>>>>>>
> > >>>>>>>>> a
> > >>>>>> lot
> > >>>>>>>>>> of potential to solve problems for other people, as I have
> > >>>>>>>>>>
> > >>>>>>>>> encountered
> > >>>>>> a
> > >>>>>>>>>> number of other companies in the wild all home-brewing their
> own
> > >>>>>>>>>> solutions
> > >>>>>>>>>> to come up with a method of handling relational data in
> streams.
> > >>>>>>>>>>
> > >>>>>>>>>> Adam
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Fri, Jul 27, 2018 at 1:45 AM, Guozhang
> > >>>>>>>>>> Wang<[email protected]>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hello Adam,
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks for rebooting the discussion of this KIP ! Let me
> > >>>>>>>>>> finish my
> > >>>>>>>>>>> pass
> > >>>>>>>>>>> on the wiki and get back to you soon. Sorry for the delays..
> > >>>>>>>>>>>
> > >>>>>>>>>>> Guozhang
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, Jul 24, 2018 at 6:08 AM, Adam Bellemare <
> > >>>>>>>>>>> [email protected]
> > >>>>>>>>>>>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Let me kick this off with a few starting points that I
> > >>>>>>>>>>>> would like
> > >>>>>>>>>>>>
> > >>>>>>>>>>> to
> > >>>>>> generate some discussion on.
> > >>>>>>>>>>>> 1) It seems to me that I will need to repartition the data
> > >>>>>>>>>>>> twice -
> > >>>>>>>>>>>> once
> > >>>>>>>>>>>> on
> > >>>>>>>>>>>> the foreign key, and once back to the primary key. Is there
> > >>>>>>>>>>>>
> > >>>>>>>>>>> anything
> > >>>>>> I
> > >>>>>>>>>>>> am
> > >>>>>>>>>>>> missing here?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 2) I believe I will also need to materialize 3 state
> > >>>>>>>>>>>> stores: the
> > >>>>>>>>>>>> prefixScan
> > >>>>>>>>>>>> SS, the highwater mark SS (for out-of-order resolution) and
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>
> > >>>>>>>>>>> final
> > >>>>>> state
> > >>>>>>>>>>>> store, due to the workflow I have laid out. I have not
> > >>>>>>>>>>>> thought of
> > >>>>>>>>>>>> a
> > >>>>>>>>>>>> better
> > >>>>>>>>>>>> way yet, but would appreciate any input on this matter. I
> have
> > >>>>>>>>>>>> gone
> > >>>>>>>>>>>> back
> > >>>>>>>>>>>> through the mailing list for the previous discussions on
> > >>>>>>>>>>>> this KIP,
> > >>>>>>>>>>>> and
> > >>>>>>>>>>>> I
> > >>>>>>>>>>>> did not see anything relating to resolving out-of-order
> > >>>>>>>>>>>> compute. I
> > >>>>>>>>>>>> cannot
> > >>>>>>>>>>>> see a way around the current three-SS structure that I have.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 3) Caching is disabled on the prefixScan SS, as I do not
> > >>>>>>>>>>>> know how
> > >>>>>>>>>>>>
> > >>>>>>>>>>> to
> > >>>>>> resolve the iterator obtained from rocksDB with that of the cache.
> > >>>>>>>>>>> In
> > >>>>>> addition, I must ensure everything is flushed before scanning.
> > >>>>>>>>>>> Since
> > >>>>>> the
> > >>>>>>>>>>>> materialized prefixScan SS is under "control" of the
> > >>>>>>>>>>>> function, I
> > >>>>>>>>>>>> do
> > >>>>>>>>>>>> not
> > >>>>>>>>>>>> anticipate this to be a problem. Performance throughput
> > >>>>>>>>>>>> will need
> > >>>>>>>>>>>>
> > >>>>>>>>>>> to
> > >>>>>> be
> > >>>>>>>>>>>> tested, but as Jan observed in his initial overview of this
> > >>>>>>>>>>>> issue,
> > >>>>>>>>>>>>
> > >>>>>>>>>>> it
> > >>>>>> is
> > >>>>>>>>>>>> generally a surge of output events which affect performance
> > >>>>>>>>>>>> moreso
> > >>>>>>>>>>>> than
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>> flush or prefixScan itself.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thoughts on any of these are greatly appreciated, since
> these
> > >>>>>>>>>>>> elements
> > >>>>>>>>>>>> are
> > >>>>>>>>>>>> really the cornerstone of the whole design. I can put up
> > >>>>>>>>>>>> the code
> > >>>>>>>>>>>> I
> > >>>>>>>>>>>> have
> > >>>>>>>>>>>> written against 1.0.2 if we so desire, but first I was
> > >>>>>>>>>>>> hoping to
> > >>>>>>>>>>>>
> > >>>>>>>>>>> just
> > >>>>>> tackle some of the fundamental design proposals.
> > >>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>> Adam
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Mon, Jul 23, 2018 at 10:05 AM, Adam Bellemare <
> > >>>>>>>>>>>> [email protected]>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Here is the new discussion thread for KIP-213. I picked
> > >>>>>>>>>>>> back up on
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>> KIP
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> as this is something that we too at Flipp are now running in
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> production.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> Jan started this last year, and I know that Trivago is also
> > >>>>>>>>>>>> using
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> something
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> similar in production, at least in terms of APIs and
> > >>>>>>>>>>>> functionality.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>>>>>>>>> 213+Support+non-key+joining+in+KTable
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I do have an implementation of the code for Kafka 1.0.2
> (our
> > >>>>>>>>>>>>> local
> > >>>>>>>>>>>>> production version) but I won't post it yet as I would
> > >>>>>>>>>>>>> like to
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> focus
> > >>>>>> on the
> > >>>>>>>>>>>> workflow and design first. That being said, I also need to
> add
> > >>>>>>>>>>>> some
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> clearer
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> integration tests (I did a lot of testing using a non-Kafka
> > >>>>>>>>>>>> Streams
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> framework) and clean up the code a bit more before putting
> > >>>>>>>>>>>>> it in
> > >>>>>>>>>>>>> a
> > >>>>>>>>>>>>> PR
> > >>>>>>>>>>>>> against trunk (I can do so later this week likely).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Please take a look,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Adam Bellemare
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> --
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> -- Guozhang
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>> -- Guozhang
> > >>>>>
> > >>>>>
> > >
> >
> >
>

Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

Reply via email to