I think we want the connectors to be federated just because trying to maintain all the connectors centrally would be really painful. I think if we really do this well we would want to have >100 of these connectors so it really won't make sense to maintain them with the project. I think the thought was just to include the framework and maybe one simple connector as an example.
Thoughts?

-Jay

On Fri, Jun 19, 2015 at 2:38 PM, Gwen Shapira <gshap...@cloudera.com> wrote:

> I think BikeShed will be a great name.
>
> Can you clarify the scope? The KIP discusses a framework and also a few
> examples for connectors. Does the addition include just the framework
> (and perhaps an example or two), or do we plan to start accepting
> connectors to the Apache Kafka project?
>
> Gwen
>
> On Thu, Jun 18, 2015 at 3:09 PM, Jay Kreps <j...@confluent.io> wrote:
>
> > I think the only problem we came up with was that Kafka KopyKat
> > abbreviates as KKK, which is not ideal in the US. Copykat would still
> > be googlable without that issue. :-)
> >
> > -Jay
> >
> > On Thu, Jun 18, 2015 at 1:20 PM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> > > Just a comment on the name. KopyKat? More unique, easy to write,
> > > pronounce, remember...
> > >
> > > Otis
> > >
> > > On Jun 18, 2015, at 13:36, Jay Kreps <j...@confluent.io> wrote:
> > >
> > > > 1. We were calling the plugins connectors (which is kind of a
> > > > generic way to say either source or sink) and the framework
> > > > Copycat. The pro of Copycat is that it is kind of fun. The con is
> > > > that it doesn't really say what it does. The Kafka Connector
> > > > Framework would be a duller but more intuitive name, but I suspect
> > > > people would then just shorten it to KCF, which again has no
> > > > intuitive meaning.
> > > >
> > > > 2. Potentially. One alternative we had thought of wrt the consumer
> > > > was to have the protocol just handle the group management part and
> > > > have the partition assignment be purely driven by the client. At
> > > > the time Copycat wasn't even a twinkle in our eyes, so we weren't
> > > > really thinking about that. There were pros and cons to this, and
> > > > we decided it was better to do partition assignment on the broker
> > > > side.
> > > > We could revisit this; it might not be a massive change in the
> > > > consumer, but it would definitely add work there. I do agree that
> > > > if we have learned one thing, it is to keep clients away from ZK.
> > > > This ZK usage is more limited, though, in that there is no
> > > > intention of having Copycat in different languages as the clients
> > > > are.
> > > >
> > > > 4. I think the idea is to include the structural schema
> > > > information you have available so it can be taken advantage of.
> > > > Obviously the easiest approach would just be to have a static
> > > > schema for the messages like timestamp + string/byte[]. However,
> > > > this means that if the source has schema information there is no
> > > > real official way to propagate it. Having a real built-in schema
> > > > mechanism gives you a little more power to make the data usable.
> > > > So if you were publishing Apache logs, the low-touch generic way
> > > > would just be to have the schema be "string", since that is what
> > > > Apache log entries are. However, if you had the actual format
> > > > string used for the log, you could use that to have a richer
> > > > schema and parse out the individual fields, which is significantly
> > > > more usable. The advantage of this is that systems like databases,
> > > > Hadoop, and so on that have some notion of schemas can take
> > > > advantage of this information that is captured with the source
> > > > data. So, e.g., the JDBC plugin can map the individual fields to
> > > > columns automatically, and you can support features like
> > > > projecting out particular fields and renaming fields easily
> > > > without having to write custom source-specific code.
> > > >
> > > > -Jay
> > > >
> > > > On Tue, Jun 16, 2015 at 5:00 PM, Joe Stein <joe.st...@stealth.ly>
> > > > wrote:
> > > >
> > > > > Hey Ewen, very interesting!
> > > > > I like the idea of the connector and making one side always be
> > > > > Kafka, for all the reasons you mentioned. It makes having to
> > > > > build consumers (over and over and over (and over)) again for
> > > > > these types of tasks much more consistent for everyone.
> > > > >
> > > > > Some initial comments (will read a few more times and think more
> > > > > through it).
> > > > >
> > > > > 1) Copycat: it might be weird/hard to talk about producers,
> > > > > consumers, brokers, and Copycat for what and how "Kafka" runs. I
> > > > > think the other naming makes sense, but maybe we can call it
> > > > > something else? "Sinks" or whatever (don't really care, just
> > > > > bringing up that it might be something to consider). We could
> > > > > also just call it "connectors"... dunno... producers, consumers,
> > > > > brokers, and connectors...
> > > > >
> > > > > 2) Can we do copycat-workers without having to rely on
> > > > > ZooKeeper? So much work has been done to remove this dependency;
> > > > > if we can do something without ZK, let's try (or at least
> > > > > abstract it so it is easier later to make it pluggable).
> > > > >
> > > > > 3) Even though connectors being managed in the project has
> > > > > already been rejected... maybe we want to have a few (or one)
> > > > > that are in the project and maintained. This makes out of the
> > > > > box really out of the box (if only file or HDFS or something).
> > > > >
> > > > > 4) "all records include schemas which describe the format of
> > > > > their data": I don't totally get this... a lot of data doesn't
> > > > > have the schema with it; we have to plug that in... so would the
> > > > > plugin you are talking about for the serializer inject the
> > > > > schema to use with the record when it sees the data?
> > > > > ~ Joe Stein
> > > > > - - - - - - - - - - - - - - - - -
> > > > > http://www.stealth.ly
> > > > > - - - - - - - - - - - - - - - - -
> > > > >
> > > > > On Tue, Jun 16, 2015 at 4:33 PM, Ewen Cheslack-Postava <
> > > > > e...@confluent.io> wrote:
> > > > >
> > > > > > Oops, linked the wrong thing. Here's the correct one:
> > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
> > > > > >
> > > > > > -Ewen
> > > > > >
> > > > > > On Tue, Jun 16, 2015 at 4:32 PM, Ewen Cheslack-Postava <
> > > > > > e...@confluent.io> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I just posted KIP-26 - Add Copycat, a connector framework
> > > > > > > for data import/export, here:
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > > > > >
> > > > > > > This is a large KIP compared to what we've had so far, and
> > > > > > > is a bit different from most. We're proposing the addition
> > > > > > > of a fairly big new component to Kafka because we think
> > > > > > > including it as part of Kafka rather than as an external
> > > > > > > project is in the best interest of both Copycat and Kafka
> > > > > > > itself.
> > > > > > >
> > > > > > > The goal with this KIP is to decide whether such a tool
> > > > > > > would make sense in Kafka, give a high-level sense of what
> > > > > > > it would entail, and scope what would be included vs. what
> > > > > > > would be left to third parties. I'm hoping to leave
> > > > > > > discussion of specific design and implementation details, as
> > > > > > > well as logistics like how best to include it in the Kafka
> > > > > > > repository & project, to the subsequent JIRAs or follow-up
> > > > > > > KIPs.
> > > > > > >
> > > > > > > Looking forward to your feedback!
> > > > > > >
> > > > > > > -Ewen
> > > > > > >
> > > > > > > P.S. Preemptive relevant XKCD: https://xkcd.com/927/
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Ewen
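The richer-schema idea in Jay's point 4 (and the question behind Joe's point 4) can be sketched roughly as follows: with only the generic "string" schema, an Apache log entry is one opaque value, but if the log format is known, the same entry can be parsed into named, typed fields that schema-aware sinks could map to columns. This is an illustrative sketch only; the regex, field names, and fallback convention are assumptions, not part of any actual Copycat API.

```python
import re

# A pattern for the Apache Common Log Format. The group names below are
# hypothetical field names chosen for illustration.
COMMON_LOG = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_entry(line: str) -> dict:
    """Return a structured record, or fall back to the generic 'string' schema."""
    m = COMMON_LOG.match(line)
    if m is None:
        # Low-touch generic schema: the whole entry is a single string.
        return {"value": line}
    rec = m.groupdict()
    # A richer schema allows typed fields instead of raw substrings.
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec

entry = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
         '"GET /apache_pb.gif HTTP/1.0" 200 2326')
record = parse_entry(entry)
```

With fields like `status` and `bytes` available by name, a sink could project out or rename individual fields (e.g. map them to JDBC columns) without source-specific code, which is the advantage Jay describes.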