Just a comment on the name. KopyKat? More unique, easy to write, pronounce, 
remember...

Otis

 

> On Jun 18, 2015, at 13:36, Jay Kreps <j...@confluent.io> wrote:
> 
> 1. We were calling the plugins connectors (which is kind of a generic way
> to say either source or sink) and the framework copycat. The pro of copycat
> is that it is kind of fun. The con is that it doesn't really say what it
> does. The Kafka Connector Framework would be a duller but more intuitive
> name, but I suspect people would then just shorten it to KCF, which again
> has no intuitive meaning.
> 
> 2. Potentially. One alternative we had thought of wrt the consumer was to
> have the protocol just handle the group management part and have the
> partition assignment be purely driven by the client. At the time copycat
> wasn't even a twinkle in our eyes so we weren't really thinking about that.
> There were pros and cons to this and we decided it was better to do
> partition assignment on the broker side. We could revisit this, it might
> not be a massive change in the consumer, but it would definitely add work
> there. I do agree that if we have learned one thing, it is to keep clients
> away from zk. This zk usage is more limited though, in that there is no
> intention of implementing copycat in different languages the way the
> clients are.
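> 
> Just to illustrate what the client-driven alternative would have meant,
> something like this (a hypothetical sketch, not anything we actually
> built; all names made up):
> 
>     import java.util.List;
>     import java.util.Map;
> 
>     // Hypothetical: the broker protocol only tracks group membership;
>     // every client runs the same deterministic assignment function
>     // locally over the agreed-upon member list.
>     public interface ClientSideAssignor {
>         // memberIds and partitions come from the group management
>         // protocol; the result maps each member to the partitions it owns.
>         Map<String, List<Integer>> assign(List<String> memberIds,
>                                           List<Integer> partitions);
>     }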
> 
> 4. I think the idea is to include the structural schema information you
> have available so it can be taken advantage of. Obviously the easiest
> approach would just be to have a static schema for the messages like
> timestamp + string/byte[]. However this means that if the source has schema
> information there is no real official way to propagate that. Having a real
> built-in schema mechanism gives you a little more power to make the data
> usable. So if you were publishing apache logs the low-touch generic way
> would just be to have the schema be "string" since that is what apache log
> entries are. However if you had the actual format string used for the log
> you could use that to have a richer schema and parse out the individual
> fields, which is significantly more usable. The advantage of this is that
> systems like databases, Hadoop, and so on that have some notion of schemas
> can take advantage of this information that is captured with the source
> data. So, e.g. the JDBC plugin can map the individual fields to columns
> automatically, and you can support features like projecting out particular
> fields and renaming fields easily without having to write custom
> source-specific code.
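> 
> To make that concrete, here is a rough sketch of the two options (none of
> this API exists yet, so the class and field names below are made up purely
> for illustration):
> 
>     import java.util.List;
> 
>     // Hypothetical schema sketch -- minimal stand-ins for whatever
>     // schema types copycat ends up with.
>     public class ApacheLogSchemas {
>         static class Field {
>             final String name;
>             final String type;
>             Field(String name, String type) {
>                 this.name = name;
>                 this.type = type;
>             }
>         }
> 
>         static class Schema {
>             final String name;
>             final List<Field> fields;
>             Schema(String name, List<Field> fields) {
>                 this.name = name;
>                 this.fields = fields;
>             }
>         }
> 
>         // Low-touch generic option: the whole log line is one opaque string.
>         static final Schema GENERIC = new Schema("apache_log_line",
>             List.of(new Field("line", "string")));
> 
>         // Richer option: fields parsed out using the known log format
>         // string, e.g. "%h %l %u %t \"%r\" %>s %b" (common log format).
>         static final Schema RICH = new Schema("apache_log_entry", List.of(
>             new Field("remote_host", "string"),
>             new Field("timestamp", "int64"),
>             new Field("request", "string"),
>             new Field("status", "int32"),
>             new Field("bytes_sent", "int64")));
>     }
> 
> With the richer schema, a sink like the JDBC plugin can map those fields
> straight to columns instead of dumping one opaque string column.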
> 
> -Jay
> 
>> On Tue, Jun 16, 2015 at 5:00 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>> 
>> Hey Ewen, very interesting!
>> 
>> I like the idea of the connector and making one side always be Kafka for
>> all the reasons you mentioned. It makes having to build consumers (over and
>> over and over (and over)) again for these types of tasks much more
>> consistent for everyone.
>> 
>> Some initial comments (will read a few more times and think it through
>> more).
>> 
>> 1) Copycat: it might be weird/hard to talk about producers, consumers,
>> brokers and copycat when describing what "kafka" is and how it runs. I
>> think the other naming makes sense but maybe we can call it something
>> else? "Sinks" or whatever (don't really care, just bringing up that it
>> might be something to consider). We could also just call it
>> "connectors"...dunno.... producers, consumers, brokers and connectors...
>> 
>> 2) Can we do copycat-workers without having to rely on ZooKeeper? So much
>> work has been done to remove this dependency; if we can do something
>> without ZK let's try (or at least abstract it so it is easier to make it
>> pluggable later).
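>> 
>> Something like this is what I mean by making it pluggable (interface and
>> method names are made up just to illustrate the idea):
>> 
>>     import java.util.List;
>> 
>>     // Hypothetical coordination SPI: keep the ZK dependency behind an
>>     // interface so another backend can be swapped in later.
>>     public interface CopycatCoordinator {
>>         void join(String workerId);            // register this worker
>>         void leave(String workerId);           // deregister on shutdown
>>         List<String> members();                // current group membership
>>         void onMembershipChange(Runnable cb);  // hook to trigger rebalance
>>     }
>> 
>> Then one implementation can use ZK today and another could move to
>> broker-side group management later.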
>> 
>> 3) Even though managing connectors in the project has already been
>> rejected... maybe we want to have a few (or one) that are in the project
>> and maintained. That makes out of the box really out of the box (even if
>> only file or hdfs or something).
>> 
>> 4) "all records include schemas which describe the format of their data" I
>> don't totally get this... a lot of data doesn't have the schema with it, we
>> have to plug that in... so would the plugin you are talking about for
>> serializer would inject the schema to use with the record when it sees the
>> data?
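>> 
>>     // Hypothetical converter sketch -- all names made up, just to make
>>     // my question concrete: the plugin supplies the schema at read time
>>     // because the raw bytes do not carry one.
>>     public interface RecordConverter {
>>         SchemaAndValue toCopycatData(byte[] raw);
>> 
>>         class SchemaAndValue {
>>             public final Object schema; // stand-in for a real schema type
>>             public final Object value;  // the decoded record payload
>>             public SchemaAndValue(Object schema, Object value) {
>>                 this.schema = schema;
>>                 this.value = value;
>>             }
>>         }
>>     }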
>> 
>> 
>> ~ Joe Stein
>> - - - - - - - - - - - - - - - - -
>> 
>>  http://www.stealth.ly
>> - - - - - - - - - - - - - - - - -
>> 
>> On Tue, Jun 16, 2015 at 4:33 PM, Ewen Cheslack-Postava <e...@confluent.io>
>> wrote:
>> 
>>> Oops, linked the wrong thing. Here's the correct one:
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
>>> 
>>> -Ewen
>>> 
>>> On Tue, Jun 16, 2015 at 4:32 PM, Ewen Cheslack-Postava <e...@confluent.io>
>>> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I just posted KIP-26 - Add Copycat, a connector framework for data
>>>> import/export here:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>>> 
>>>> This is a large KIP compared to what we've had so far, and is a bit
>>>> different from most. We're proposing the addition of a fairly big new
>>>> component to Kafka because we think including it as part of Kafka rather
>>>> than as an external project is in the best interest of both Copycat and
>>>> Kafka itself.
>>>> 
>>>> The goal with this KIP is to decide whether such a tool would make sense
>>>> in Kafka, give a high-level sense of what it would entail, and scope what
>>>> would be included vs what would be left to third parties. I'm hoping to
>>>> leave discussion of specific design and implementation details, as well
>>>> as logistics like how best to include it in the Kafka repository &
>>>> project, to the subsequent JIRAs or follow-up KIPs.
>>>> 
>>>> Looking forward to your feedback!
>>>> 
>>>> -Ewen
>>>> 
>>>> P.S. Preemptive relevant XKCD: https://xkcd.com/927/
>>> 
>>> 
>>> --
>>> Thanks,
>>> Ewen
>> 
