I think the only problem we came up with was that Kafka KopyKat abbreviates as KKK which is not ideal in the US. Copykat would still be googlable without that issue. :-)
-Jay

On Thu, Jun 18, 2015 at 1:20 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:

> Just a comment on the name. KopyKat? More unique, easy to write,
> pronounce, remember...
>
> Otis
>
> > On Jun 18, 2015, at 13:36, Jay Kreps <j...@confluent.io> wrote:
> >
> > 1. We were calling the plugins connectors (which is kind of a generic
> > way to say either source or sink) and the framework copycat. The pro
> > of copycat is it is kind of fun. The con is that it doesn't really say
> > what it does. The Kafka Connector Framework would be a duller but more
> > intuitive name, but I suspect people would then just shorten it to KCF
> > which again has no intuitive meaning.
> >
> > 2. Potentially. One alternative we had thought of wrt the consumer was
> > to have the protocol just handle the group management part and have
> > the partition assignment be purely driven by the client. At the time
> > copycat wasn't even a twinkle in our eyes so we weren't really
> > thinking about that. There were pros and cons to this and we decided
> > it was better to do partition assignment on the broker side. We could
> > revisit this, it might not be a massive change in the consumer, but it
> > would definitely add work there. I do agree that if we have learned
> > one thing it is to keep clients away from zk. This zk usage is more
> > limited though, in that there is no intention of having copycat in
> > different languages as the clients are.
> >
> > 4. I think the idea is to include the structural schema information
> > you have available so it can be taken advantage of. Obviously the
> > easiest approach would just be to have a static schema for the
> > messages like timestamp + string/byte[]. However this means that if
> > the source has schema information there is no real official way to
> > propagate that. Having a real built-in schema mechanism gives you a
> > little more power to make the data usable.
> > So if you were publishing apache logs the low-touch generic way would
> > just be to have the schema be "string" since that is what apache log
> > entries are. However if you had the actual format string used for the
> > log you could use that to have a richer schema and parse out the
> > individual fields, which is significantly more usable. The advantage
> > of this is that systems like databases, Hadoop, and so on that have
> > some notion of schemas can take advantage of this information that is
> > captured with the source data. So, e.g. the JDBC plugin can map the
> > individual fields to columns automatically, and you can support
> > features like projecting out particular fields and renaming fields
> > easily without having to write custom source-specific code.
> >
> > -Jay
> >
> >> On Tue, Jun 16, 2015 at 5:00 PM, Joe Stein <joe.st...@stealth.ly> wrote:
> >>
> >> Hey Ewen, very interesting!
> >>
> >> I like the idea of the connector and making one side always being
> >> Kafka for all the reasons you mentioned. It makes having to build
> >> consumers (over and over and over (and over)) again for these types of
> >> tasks much more consistent for everyone.
> >>
> >> Some initial comments (will read a few more times and think more
> >> through it).
> >>
> >> 1) Copycat, it might be weird/hard to talk about producers, consumers,
> >> brokers and copycat for what and how "kafka" runs. I think the other
> >> naming makes sense but maybe we can call it something else? "Sinks" or
> >> whatever (don't really care just bringing up it might be something to
> >> consider). We could also just call it "connectors"...dunno....
> >> producers, consumers, brokers and connectors...
> >>
> >> 2) Can we do copycat-workers without having to rely on Zookeeper? So
> >> much work has been done to remove this dependency if we can do
> >> something without ZK let's try (or at least abstract it so it is easier
> >> later to make it pluggable).
> >>
> >> 3) Even though connectors being managed in project has already been
> >> rejected... maybe we want to have a few (or one) that are in the
> >> project and maintained. This makes out of the box really out of the
> >> box (if only file or hdfs or something).
> >>
> >> 4) "all records include schemas which describe the format of their
> >> data" I don't totally get this... a lot of data doesn't have the
> >> schema with it, we have to plug that in... so would the plugin you are
> >> talking about for the serializer inject the schema to use with the
> >> record when it sees the data?
> >>
> >> ~ Joe Stein
> >> - - - - - - - - - - - - - - - - -
> >>
> >> http://www.stealth.ly
> >> - - - - - - - - - - - - - - - - -
> >>
> >> On Tue, Jun 16, 2015 at 4:33 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> >>
> >>> Oops, linked the wrong thing. Here's the correct one:
> >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
> >>>
> >>> -Ewen
> >>>
> >>> On Tue, Jun 16, 2015 at 4:32 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I just posted KIP-26 - Add Copycat, a connector framework for data
> >>>> import/export here:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >>>>
> >>>> This is a large KIP compared to what we've had so far, and is a bit
> >>>> different from most. We're proposing the addition of a fairly big new
> >>>> component to Kafka because we think including it as part of Kafka
> >>>> rather than as an external project is in the best interest of both
> >>>> Copycat and Kafka itself.
> >>>>
> >>>> The goal with this KIP is to decide whether such a tool would make
> >>>> sense in Kafka, give a high level sense of what it would entail, and
> >>>> scope what would be included vs what would be left to third-parties.
> >>>> I'm hoping to leave discussion of specific design and implementation
> >>>> details, as well as logistics like how best to include it in the
> >>>> Kafka repository & project, to the subsequent JIRAs or follow up KIPs.
> >>>>
> >>>> Looking forward to your feedback!
> >>>>
> >>>> -Ewen
> >>>>
> >>>> P.S. Preemptive relevant XKCD: https://xkcd.com/927/
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Ewen