Re: [DISCUSS] KIP-28 - Add a transform client for data processing

Guozhang Wang Fri, 24 Jul 2015 09:57:12 -0700

Hi Jiangjie,

Inlined.


On Thu, Jul 23, 2015 at 11:32 PM, Jiangjie Qin <[email protected]>
wrote:

> Hey Guozhang,
>
> I just took a quick look at the KIP, is it very similar to mirror maker
> with message handler?
>
>
I think the processor client would support a superset of the functionality
of MM with message handlers, in that:

1. API-wise, it would provide more than the per-message processing of the
message handler, including local storage (with a commit mechanism),
time-triggered processing, etc.
2. Feature-wise, it would support user-customizable partition assignment
such as co-partitioning, sticky-partitioning (for local state maintenance,
for example), etc.
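
As a rough illustration of point 1, here is a minimal sketch of what
per-message processing, a local store with a commit step, and a
time-triggered callback could look like together. Every name in it
(Processor, LocalStore, punctuate, CountingProcessor) is hypothetical and
chosen only for this example; none of it is taken from the KIP or from any
existing Kafka API, and the store is a toy in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names come from KIP-28 or Kafka.
public class ProcessorSketch {

    // Per-message callback plus a time-triggered callback.
    interface Processor<K, V> {
        void process(K key, V value);
        void punctuate(long timestampMs);
    }

    // Toy local store: writes are staged until commit() publishes them.
    static class LocalStore<K, V> {
        private final Map<K, V> committed = new HashMap<>();
        private final Map<K, V> pending = new HashMap<>();

        // Reads see staged writes first, then committed state.
        V get(K key) {
            return pending.containsKey(key) ? pending.get(key) : committed.get(key);
        }

        void put(K key, V value) {
            pending.put(key, value);
        }

        void commit() {
            committed.putAll(pending);
            pending.clear();
        }

        // Defensive copy of the committed state only.
        Map<K, V> committedView() {
            return new HashMap<>(committed);
        }
    }

    // Counts messages per key and commits the counts on each time trigger.
    static class CountingProcessor implements Processor<String, String> {
        final LocalStore<String, Long> store = new LocalStore<>();

        @Override
        public void process(String key, String value) {
            Long current = store.get(key);
            store.put(key, current == null ? 1L : current + 1L);
        }

        @Override
        public void punctuate(long timestampMs) {
            store.commit();
        }
    }

    public static void main(String[] args) {
        CountingProcessor p = new CountingProcessor();
        p.process("a", "x");
        p.process("a", "y");
        p.process("b", "z");
        System.out.println(p.store.committedView()); // nothing committed yet
        p.punctuate(1000L);                          // time trigger commits
        System.out.println(p.store.committedView());
    }
}
```

A plain message handler only covers the process() half of this; the local
state and the time trigger are the parts it cannot express.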

> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Jul 23, 2015 at 10:25 PM, Ewen Cheslack-Postava <[email protected]>
> wrote:
>
> > Just some notes on the KIP doc itself:
> >
> > * It'd be useful to clarify at what point the plain consumer + custom code
> > + producer breaks down. I think trivial filtering and aggregation on a
> > single stream usually work fine with this model. Anything where you need
> > more complex joins, windowing, etc. is where it breaks down. I think most
> > interesting applications require that functionality, but it's helpful to
> > make this really clear in the motivation -- right now, Kafka only provides
> > the lowest level plumbing for stream processing applications, so most
> > interesting apps require very heavyweight frameworks.
> > * I think the feature comparison of plain producer/consumer, stream
> > processing frameworks, and this new library is a good start, but we might
> > want something more thorough and structured, like a feature matrix. Right
> > now it's hard to figure out exactly how they relate to each other.
> > * I'd personally push the library vs. framework story very strongly -- the
> > total buy-in and weak integration story of stream processing frameworks is
> > a big downside and makes a library a really compelling (and currently
> > unavailable, as far as I am aware) alternative.
> > * Comment about in-memory storage of other frameworks is interesting -- it
> > is specific to the framework, but is supposed to also give performance
> > benefits. The high-level functional processing interface would allow for
> > combining multiple operations when there's no shuffle, but when there is a
> > shuffle, we'll always be writing to Kafka, right? Spark (and presumably
> > Spark Streaming) is supposed to get a big win by handling shuffles such
> > that the data just stays in cache and never actually hits disk, or at least
> > hits disk in the background. Will we take a hit because we always write to
> > Kafka?
> > * I really struggled with the structure of the KIP template with Copycat
> > because the flow doesn't work well for proposals like this. They aren't as
> > concrete changes as the KIP template was designed for. I'd completely
> > ignore that template in favor of optimizing for clarity if I were you.
> >
> > -Ewen
> >
> > On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I just posted KIP-28: Add a transform client for data processing
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing>.
> > >
> > > The wiki page does not yet have the full design / implementation details,
> > > and this email is to kick-off the conversation on whether we should add
> > > this new client with the described motivations, and if yes what features /
> > > functionalities should be included.
> > >
> > > Looking forward to your feedback!
> > >
> > > -- Guozhang
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>



-- 
-- Guozhang
