Hey Ewen, very interesting! I like the idea of the connector and of making one side always Kafka, for all the reasons you mentioned. It saves everyone from having to build consumers (over and over and over (and over)) again for these types of tasks, and makes things much more consistent.
Some initial comments (I will read it a few more times and think it through more).

1) Copycat: it might be weird/hard to talk about producers, consumers, brokers and copycat when describing what "kafka" is and how it runs. I think the other naming makes sense, but maybe we can call it something else? "Sinks" or whatever (I don't really care, just bringing it up as something to consider). We could also just call it "connectors"... dunno... producers, consumers, brokers and connectors...

2) Can we do copycat-workers without having to rely on Zookeeper? So much work has been done to remove this dependency; if we can do something without ZK, let's try (or at least abstract it so it is easier to make it pluggable later).

3) Even though managing connectors in the project has already been rejected... maybe we want to have a few (or one) that are in the project and maintained. This makes out of the box really out of the box (even if only file or hdfs or something).

4) "all records include schemas which describe the format of their data" I don't totally get this... a lot of data doesn't have the schema with it, we have to plug that in... so would the serializer plugin you are talking about inject the schema to use with the record when it sees the data?

~ Joe Stein
- - - - - - - - - - - - - - - - -
http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Tue, Jun 16, 2015 at 4:33 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:

> Oops, linked the wrong thing. Here's the correct one:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
>
> -Ewen
>
> On Tue, Jun 16, 2015 at 4:32 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > Hi all,
> >
> > I just posted KIP-26 - Add Copycat, a connector framework for data
> > import/export here:
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >
> > This is a large KIP compared to what we've had so far, and is a bit
> > different from most. We're proposing the addition of a fairly big new
> > component to Kafka because we think including it as part of Kafka rather
> > than as an external project is in the best interest of both Copycat and
> > Kafka itself.
> >
> > The goal with this KIP is to decide whether such a tool would make sense
> > in Kafka, give a high level sense of what it would entail, and scope what
> > would be included vs what would be left to third-parties. I'm hoping to
> > leave discussion of specific design and implementation details, as well
> > logistics like how best to include it in the Kafka repository & project, to
> > the subsequent JIRAs or follow up KIPs.
> >
> > Looking forward to your feedback!
> >
> > -Ewen
> >
> > P.S. Preemptive relevant XKCD: https://xkcd.com/927/
> >
>
> --
> Thanks,
> Ewen
>