Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

Jay Kreps Thu, 18 Jun 2015 15:11:33 -0700

I think the only problem we came up with was that Kafka KopyKat abbreviates
as KKK which is not ideal in the US. Copykat would still be googlable
without that issue. :-)


-Jay

On Thu, Jun 18, 2015 at 1:20 PM, Otis Gospodnetic <
[email protected]> wrote:

> Just a comment on the name. KopyKat? More unique, easy to write,
> pronounce, remember...
>
> Otis
>
>
>
> > On Jun 18, 2015, at 13:36, Jay Kreps <[email protected]> wrote:
> >
> > 1. We were calling the plugins connectors (which is kind of a generic way
> > to say either source or sink) and the framework copycat. The pro of copycat
> > is it is kind of fun. The con is that it doesn't really say what it does.
> > The Kafka Connector Framework would be a duller but more intuitive name,
> > but I suspect people would then just shorten it to KCF which again has no
> > intuitive meaning.
> >
> > 2. Potentially. One alternative we had thought of wrt the consumer was to
> > have the protocol just handle the group management part and have the
> > partition assignment be purely driven by the client. At the time copycat
> > wasn't even a twinkle in our eyes so we weren't really thinking about that.
> > There were pros and cons to this and we decided it was better to do
> > partition assignment on the broker side. We could revisit this, it might
> > not be a massive change in the consumer, but it would definitely add work
> > there. I do agree that if we have learned one thing it is to keep clients
> > away from zk. This zk usage is more limited though, in that there is no
> > intention of having copycat in different languages as the clients are.
> >
> > 4. I think the idea is to include the structural schema information you
> > have available so it can be taken advantage of. Obviously the easiest
> > approach would just be to have a static schema for the messages like
> > timestamp + string/byte[]. However this means that if the source has schema
> > information there is no real official way to propagate that. Having a real
> > built-in schema mechanism gives you a little more power to make the data
> > usable. So if you were publishing apache logs the low-touch generic way
> > would just be to have the schema be "string" since that is what apache log
> > entries are. However if you had the actual format string used for the log
> > you could use that to have a richer schema and parse out the individual
> > fields, which is significantly more usable. The advantage of this is that
> > systems like databases, Hadoop, and so on that have some notion of schemas
> > can take advantage of this information that is captured with the source
> > data. So, e.g. the JDBC plugin can map the individual fields to columns
> > automatically, and you can support features like projecting out particular
> > fields and renaming fields easily without having to write custom
> > source-specific code.
> >
> > -Jay
> >
> >> On Tue, Jun 16, 2015 at 5:00 PM, Joe Stein <[email protected]>
> wrote:
> >>
> >> Hey Ewen, very interesting!
> >>
> >> I like the idea of the connector and making one side always be Kafka for
> >> all the reasons you mentioned. It makes having to build consumers (over and
> >> over and over (and over)) again for these types of tasks much more
> >> consistent for everyone.
> >>
> >> Some initial comments (will read a few more times and think more through
> >> it).
> >>
> >> 1) Copycat, it might be weird/hard to talk about producers, consumers,
> >> brokers and copycat for what and how "kafka" runs. I think the other naming
> >> makes sense but maybe we can call it something else? "Sinks" or whatever
> >> (don't really care just bringing up it might be something to consider). We
> >> could also just call it "connectors"...dunno.... producers, consumers,
> >> brokers and connectors...
> >>
> >> 2) Can we do copycat-workers without having to rely on Zookeeper? So much
> >> work has been done to remove this dependency; if we can do something without
> >> ZK let's try (or at least abstract it so it is easier later to make it
> >> pluggable).
> >>
> >> 3) Even though connectors being managed in project has already been
> >> rejected... maybe we want to have a few (or one) that are in the project
> >> and maintained. This makes out of the box really out of the box (if only
> >> file or hdfs or something).
> >>
> >> 4) "all records include schemas which describe the format of their data" I
> >> don't totally get this... a lot of data doesn't have the schema with it, we
> >> have to plug that in... so would the serializer plugin you are talking about
> >> inject the schema to use with the record when it sees the data?
> >>
> >>
> >> ~ Joe Stein
> >> - - - - - - - - - - - - - - - - -
> >>
> >>  http://www.stealth.ly
> >> - - - - - - - - - - - - - - - - -
> >>
> >> On Tue, Jun 16, 2015 at 4:33 PM, Ewen Cheslack-Postava <
> [email protected]>
> >> wrote:
> >>
> >>> Oops, linked the wrong thing. Here's the correct one:
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
> >>>
> >>> -Ewen
> >>>
> >>> On Tue, Jun 16, 2015 at 4:32 PM, Ewen Cheslack-Postava <
> >> [email protected]>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I just posted KIP-26 - Add Copycat, a connector framework for data
> >>>> import/export here:
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >>>>
> >>>> This is a large KIP compared to what we've had so far, and is a bit
> >>>> different from most. We're proposing the addition of a fairly big new
> >>>> component to Kafka because we think including it as part of Kafka rather
> >>>> than as an external project is in the best interest of both Copycat and
> >>>> Kafka itself.
> >>>>
> >>>> The goal with this KIP is to decide whether such a tool would make sense
> >>>> in Kafka, give a high level sense of what it would entail, and scope what
> >>>> would be included vs what would be left to third-parties. I'm hoping to
> >>>> leave discussion of specific design and implementation details, as well
> >>>> as logistics like how best to include it in the Kafka repository & project,
> >>>> to the subsequent JIRAs or follow up KIPs.
> >>>>
> >>>> Looking forward to your feedback!
> >>>>
> >>>> -Ewen
> >>>>
> >>>> P.S. Preemptive relevant XKCD: https://xkcd.com/927/
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Ewen
> >>
>
