Hi Jiangjie,

Inlined.

On Thu, Jul 23, 2015 at 11:32 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:

> Hey Guozhang,
>
> I just took a quick look at the KIP, is it very similar to mirror maker
> with message handler?
>

I think the processor client would support a superset of the functionality
of MM with message handlers, in that:

1. API-wise, it would go beyond the per-message processing that the message
handler offers, adding local storage (with a committing mechanism),
time-triggered processing, etc.

2. Feature-wise, it would support user-customizable partition assignment
such as co-partitioning, sticky-partitioning (for local state maintenance,
for example), etc.

> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Jul 23, 2015 at 10:25 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > Just some notes on the KIP doc itself:
> >
> > * It'd be useful to clarify at what point the plain consumer + custom code
> > + producer breaks down. I think trivial filtering and aggregation on a
> > single stream usually work fine with this model. Anything where you need
> > more complex joins, windowing, etc. are where it breaks down. I think most
> > interesting applications require that functionality, but it's helpful to
> > make this really clear in the motivation -- right now, Kafka only provides
> > the lowest level plumbing for stream processing applications, so most
> > interesting apps require very heavyweight frameworks.
> > * I think the feature comparison of plain producer/consumer, stream
> > processing frameworks, and this new library is a good start, but we might
> > want something more thorough and structured, like a feature matrix. Right
> > now it's hard to figure out exactly how they relate to each other.
> > * I'd personally push the library vs. framework story very strongly -- the
> > total buy-in and weak integration story of stream processing frameworks is
> > a big downside and makes a library a really compelling (and currently
> > unavailable, as far as I am aware) alternative.
> > * Comment about in-memory storage of other frameworks is interesting -- it
> > is specific to the framework, but is supposed to also give performance
> > benefits. The high-level functional processing interface would allow for
> > combining multiple operations when there's no shuffle, but when there is a
> > shuffle, we'll always be writing to Kafka, right? Spark (and presumably
> > spark streaming) is supposed to get a big win by handling shuffles such
> > that the data just stays in cache and never actually hits disk, or at least
> > hits disk in the background. Will we take a hit because we always write to
> > Kafka?
> > * I really struggled with the structure of the KIP template with Copycat
> > because the flow doesn't work well for proposals like this. They aren't as
> > concrete changes as the KIP template was designed for. I'd completely
> > ignore that template in favor of optimizing for clarity if I were you.
> >
> > -Ewen
> >
> > On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I just posted KIP-28: Add a transform client for data processing
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing>.
> > >
> > > The wiki page does not yet have the full design / implementation details,
> > > and this email is to kick-off the conversation on whether we should add
> > > this new client with the described motivations, and if yes what features /
> > > functionalities should be included.
> > >
> > > Looking forward to your feedback!
> > >
> > > -- Guozhang
> > >
> >
> > --
> > Thanks,
> > Ewen
>

--
-- Guozhang