I'll respond to specific comments, but at the bottom of this email I've
included some comparisons with other connector frameworks and Kafka
import/export tools. This definitely isn't an exhaustive list, but
hopefully it will clarify how I'm thinking about where Copycat should live wrt
these other systems.

Since Jay replied with 2 essays as I was writing this up, there may be some
duplication. Sorry for the verbosity...

@Roshan - The main gist is that by designing a framework around Kafka, we
don't have to generalize in a way that loses important features. Of the
systems you mentioned, the ones that are fairly general and have lots of
connectors don't offer the parallelism or delivery semantics that could be
achieved by building around Kafka (e.g. Flume), and the ones that do offer
these benefits are almost all highly specific to just one or two systems
(e.g. Camus). Since Kafka is
increasingly becoming a central hub for streaming data (and buffer for
batch systems), one *common* system for integrating all these pieces is
pretty compelling.
Import: Flume is just one of many similar systems designed around log
collection. See notes below, but one major point is that they generally
don't provide any sort of guaranteed delivery semantics.
Export: Same deal here, you either get good delivery semantics and
parallelism for one system or a lot of connectors with very limited
guarantees. Copycat is intended to make it very easy to write connectors,
provide good (configurable!) delivery semantics and parallelism, and work for
a wide variety of systems (e.g. both batch and streaming).
YARN: My point isn't that YARN is bad, it's that tying to any particular
cluster manager severely limits the applicability of the tool. The goal is
to make Copycat agnostic to the cluster manager so it can run under Mesos,
YARN, etc.
Exactly once: You accomplish this in any system by managing offsets in the
destination system atomically with the data or through some kind of
deduplication. Jiangjie actually just gave a great talk about this issue at
a recent Kafka meetup; perhaps he can share the slides. When you
see all the details involved, you'll see why I think it might be nice to
have the framework help you manage the complexities of achieving different
delivery semantics ;)
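
To make that a little more concrete, here's a rough sketch of the first
approach (storing offsets in the destination system atomically with the data)
for a JDBC sink. To be clear, this is not Copycat code -- it uses the plain
Java consumer, and the topic, table, and connection details are all made up --
it's just meant to show the bookkeeping the framework would otherwise handle
for you:

    import java.sql.*;
    import java.time.Duration;
    import java.util.*;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;

    public class ExactlyOnceJdbcSink {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "jdbc-sink");
            // Never commit offsets to Kafka; the destination DB is the source of truth.
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/sink");
                 KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                db.setAutoCommit(false);
                TopicPartition tp = new TopicPartition("events", 0);
                consumer.assign(Collections.singletonList(tp));

                // On (re)start, resume from the offset recorded in the destination.
                try (PreparedStatement q = db.prepareStatement(
                        "SELECT next_offset FROM kafka_offsets WHERE topic = ? AND kafka_partition = ?")) {
                    q.setString(1, tp.topic());
                    q.setInt(2, tp.partition());
                    ResultSet rs = q.executeQuery();
                    if (rs.next()) consumer.seek(tp, rs.getLong(1));
                }

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    if (records.isEmpty()) continue;
                    long nextOffset = -1;
                    try (PreparedStatement insert = db.prepareStatement(
                            "INSERT INTO events (k, v) VALUES (?, ?)")) {
                        for (ConsumerRecord<String, String> r : records) {
                            insert.setString(1, r.key());
                            insert.setString(2, r.value());
                            insert.addBatch();
                            nextOffset = r.offset() + 1;
                        }
                        insert.executeBatch();
                    }
                    try (PreparedStatement update = db.prepareStatement(
                            "UPDATE kafka_offsets SET next_offset = ? WHERE topic = ? AND kafka_partition = ?")) {
                        update.setLong(1, nextOffset);
                        update.setString(2, tp.topic());
                        update.setInt(3, tp.partition());
                        update.executeUpdate();
                    }
                    // Data and offset commit atomically; a crash before this point just
                    // means we reprocess the same batch, never that we write it twice.
                    db.commit();
                }
            }
        }
    }

And that's just the single-partition, single-table happy path -- multiply by
partitions, rebalances, and schema handling (or the dedup alternative) and you
can see why having the framework own this logic is appealing.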
Connector variety: Addressed above.

@Jiangjie -
1. Yes, the expectation is that most coding is in the connectors. Ideally
the framework doesn't need many changes after we get the basics up and
running. But I'm not sure I understand what you mean about a library vs.
static framework?
2. This depends on packaging. We should at least have a separate jar, just
as we now do with clients. It's true that the tar.gz downloads would
contain both, but that probably makes sense since you need Kafka to do any
local testing with Copycat anyway, which you presumably want to do before
running any production jobs.

@Gwen -
I agree that the community around a project is really important. Some of
the issues you mentioned -- committership and dependencies -- are
definitely important considerations. The community aspect can easily make
or break something like Copycat. I think this is something Kafka needs to
address anyway (committership in particular, since committers are currently
overloaded).

One immediate benefit of including it in the same community is that Copycat
starts out with a great, supportive community behind it. We'd get to leverage
all the existing Kafka knowledge in the community, and Copycat patches are
more likely to be seen by Kafka devs who can give helpful reviews. I'll
definitely agree that there are some drawbacks too -- joining the mailing
lists might be a bit overwhelming if you only wanted help w/ Copycat :)

Another benefit, not to be overlooked, is that it avoids a bunch of extra
process: incubating an entire separate Apache project adds a *lot* of
overhead.

I also want to mention that the KIP specifically mentions that Copycat
should use public Kafka APIs, but I don't think this means development of
both should be decoupled. In particular, the distributed version of Copycat
needs functionality that is very closely related to functionality that
already exists in Kafka, some of which is exposed via public protocols
(worker membership needs to be tracked like consumers, worker assignments
have similar needs to consumer topic-partition assignments, offset commits
in Copycat are similar to consumer offset commits). It's hard to say whether
any of that can be directly reused, but if it can, it could pay off in
spades. Even if not, since there are so many similar issues involved, it'd
be worth it just to leverage previous experience. Even though Copycat
should be cleanly separated from the main Kafka code (just as the clients
are now cleanly separated from the broker), I think they can likely benefit
from careful co-evolution that is more difficult to achieve if they really
are separate communities.

On docs, you're right that we could address that issue just by adding a few
links, but that doesn't get to quite the level I was imagining. I think the
integration with the documentation should be fairly extensive -- Copycat ties
easily into the Getting Started section, has an embedded API that should be
explained alongside the client APIs, helps explain some use cases more
clearly, fits into discussions in the docs about design decisions (e.g. the
section on delivery semantics), etc. The fact that there is
basically nothing in the documentation today about getting data into and
out of Kafka actually makes it a lot harder for people to get started.
(Console producer/consumer don't count :)

Finally, incorporating the project into Kafka itself has another important
effect: it indicates to people that the Kafka devs have thought through the
best way to do import/export from Kafka, which is an important signal about
the quality of the framework (although admittedly not a clear indicator of
the quality of individual connectors). You can kind of get that just by
linking the project in the docs or semi-officially recommending it, but the
impact isn't the same.

Anyway, I agree that there are some drawbacks to making it part of Kafka
itself, but (obviously) I think the benefits outweigh the drawbacks.

And here's the promised review of related systems:

   1. *Log and metric collection, processing, and aggregation*

   Examples: Flume <https://flume.apache.org>, Logstash
   <http://logstash.net>, Fluentd <http://www.fluentd.org>, Heka
   <http://hekad.readthedocs.org>

   These systems are motivated by the need to collect and process large
   quantities of log or metric data from both application and infrastructure
   servers. This leads to a very common design using an *agent* on each
   node that collects the log data, possibly buffers it in case of faults, and
   forwards it either to a destination storage system or an aggregation agent
   which further processes the data before forwarding it again. In order to
   get the data from its source format into a format for the destination
   system, these systems have a framework for decoding, filtering, and
   encoding events.

   However, this design does not extend well to many other use cases. For example,
   these systems do not handle integration with batch systems like HDFS well
   because they are designed around the expectation that processing of each
   event will be handled promptly, with most failure handling left to the
   user. Some systems offer some type of buffering, and it may even be
   persistent, but they generally do not offer any delivery guarantees.

   These types of systems can also be operationally complex for a large
   pipeline. Collecting logs requires an agent per server anyway. However, a
   purely agent-based approach then requires other tasks, like copying data
   into Hadoop, to run their own agent, allocate dedicated server
   resources for it, and manually manage partitioning the job if it cannot be
   handled by a single process. Additionally, adding a new task may require
   reconfiguring upstream tasks as well since there is no standardized storage
   layer.
   2. *ETL for data warehousing*

   Examples: Gobblin <https://github.com/linkedin/gobblin>, Chukwa
   <http://chukwa.apache.org/>, Suro
   <http://techblog.netflix.com/2013/12/announcing-suro-backbone-of-netflixs.html>,
   Morphlines
   <http://cloudera.github.io/cdk/docs/current/cdk-morphlines/index.html>,
   HIHO <https://github.com/sonalgoyal/hiho>

   These systems are trying to bridge the gap from a disparate set of
   systems to data warehouses, most popularly HDFS. This focus on data
   warehouses leads to a common set of patterns in these systems. Most
   obviously, they focus primarily on batch jobs. In some systems these
   batches can be made quite small, but they are not designed to achieve the
   low latency required for stream processing applications. This design makes
   sense given their goals and the context they were designed in, but does not
   extend to the variety of data replication jobs that are required in a
   stream data platform.

   Another common feature is a flexible, pluggable data processing
   pipeline. In the context of ETL for a data warehouse this is a requirement
   if processing cannot be performed earlier in the data pipeline. Data must
   be processed into a form suitable for long term storage, querying, and
   analysis before it hits HDFS. However, this greatly complicates these tools
   – both their use and implementation – and requires users to learn how to
   process data in the ETL framework rather than use other, existing tools
   they might already understand.

   Finally, because of the very specific use case, these systems generally
   only work with a single sink (HDFS) or a small set of sinks that are very
   similar (e.g. HDFS and S3). Again, given the specific application domain
   this is a reasonable design tradeoff, but limits the use of these systems
   for other types of data copying jobs.
   3. *Data pipelines management*

   Examples: NiFi <https://nifi.incubator.apache.org/>

   These systems try to make building a data pipeline as easy as possible.
   Instead of focusing on configuration and execution of individual jobs that
   copy data between two systems, they give the operator a view of the entire
   pipeline and focus on ease of use through a GUI. At their core, they
   require the same basic components (individual copy tasks, data sources and
   sinks, intermediate queues, etc.), but the default view for these systems
   is of the entire pipeline.

   Because these systems “own” the data pipeline as a whole, they may not
   work well at the scale of an entire organization where different teams may
   need to control different parts of the pipeline. A large organization may
   have many mini data pipelines managed in a tool like this instead of one
   large data pipeline. However, this holistic view allows for better global
   handling of processing errors and makes it easier for these systems to
   integrate monitoring and metrics.

   Additionally, these systems are designed around generic processor
   components which can be connected arbitrarily to create the data pipeline.
   This offers great flexibility, but provides few guarantees for reliability
   and delivery semantics. These systems usually have some queuing between
   stages, but this queuing usually provides limited fault tolerance, much
   like the log and metric processing systems.


On Mon, Jun 22, 2015 at 4:02 PM, Jay Kreps <j...@confluent.io> wrote:

> Hey Gwen,
>
> That makes a lot of sense. Here was the thinking on our side.
>
> I guess there are two questions, where does Copycat go and where do the
> connectors go?
>
> I'm in favor of Copycat being in Kafka and the connectors being federated.
>
> Arguments for federating connectors:
> - There will be like >> 100 connectors so if we keep them all in the same
> repo it will be a lot.
> - These plugin apis are a fantastic area for open source contribution--well
> defined, bite sized, immediately useful, etc.
> - If I wrote connector A I'm not particularly qualified to review connector
> B. These things require basic Kafka knowledge but mostly they're very
> system specific. Putting them all in one project ends up being kind of a
> mess.
> - Many people will have in-house systems that require custom connectors
> anyway.
> - You can't centrally maintain all the connectors so you need in any case
> need to solve the whole "app store" experience for connectors (Ewen laughs
> at me every time I say "app store for connectors"). Once you do that it
> makes sense to just use the mechanism for everything.
> - Many vendors we've talked to want to be able to maintain their own
> connector and release it with their system not release it with Kafka or
> another third party project.
> - There is definitely a role for testing and certification of the
> connectors but it's probably not something the main project should take on.
>
> Federation doesn't necessarily mean that there can only be one repository
> for each connector. We have a single repo for the connectors we're building
> at confluent just for simplicity. It just means that regardless of where
> the connector is maintained it integrates as a first-class citizen.
>
> Basically I think really nailing federated connectors is pretty central to
> having a healthy connector ecosystem which is the primary thing for making
> this work.
>
> Okay now the question of whether the copycat apis/framework should be in
> Kafka or be an external project. We debated this a lot internally.
>
> I was on the pro-kafka-inclusion side so let me give that argument. I think
> the apis for pulling data into Kafka or pushing into a third party system
> are actually really a core thing to what Kafka is. Kafka currently provides
> a push producer and pull consumer because those are the harder problems to
> solve, but about half the time you need the opposite (a pull producer and
> push consumer). It feels weird to include any new thing, but I actually
> feel like these apis are super central and natural to include in Kafka (in
> fact they are so natural many other system only have that style of API).
>
> I think the key question is whether we can do a good job at designing these
> apis. If we can then we should really have an official set of apis. Having
> official Kafka apis that are documented as part of the main docs and are
> part of each release will do a ton to help foster the connector ecosystem
> because it will be kind of a default way of doing Kaka integration and all
> the people building in-house from-scratch connectors will likely just use
> it. If it is a separate project then it is a separate discovery and
> adoption decision (this is somewhat irrational but totally true).
>
> I think one assumption we are making is that the copycat framework won't be
> huge. It should be a manageable chunk of code.
>
> I agree with your description of the some of the cons of bundling. However
> I think there are pros as well and some of them are quite important.
>
> The biggest is that for some reasons things that are maintained and
> documented together end up feeling and working like a single product. This
> is sort of a fuzzy thing. But one complaint I have about the Hadoop
> ecosystem (and it is one of the more amazing products of open source in the
> history of the world, so forgive the criticism) is that it FEELs like a
> loosely affiliated collection of independent things kind of bolted
> together. Products that are more centralized can give a much more holistic
> feel to usage (configuration, commands, monitoring, etc) and things that
> aren't somehow always drift apart (maybe just because the committers are
> different).
>
> So I actually totally agree with what you said about Spark. And if we end
> up trying to include a machine learning library or anything far afield I
> think I would agree we would have exactly that problem.
>
> But I think the argument I would make is that this is actually a gap in our
> existing product, not a new product and so having that identity is
> important.
>
> -Jay
>
> On Sun, Jun 21, 2015 at 9:24 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
> > Ah, I see this in rejected alternatives now. Sorry :)
> >
> > I actually prefer the idea of a separate project for framework +
> > connectors over having the framework be part of Apache Kafka.
> >
> > Looking at nearby examples: Hadoop has created a wide ecosystem of
> > projects, with Sqoop and Flume supplying connectors. Spark on the
> > other hand keeps its subprojects as part of Apache Spark.
> >
> > When I look at both projects, I see that Flume and Sqoop created
> > active communities (that was especially true a few years back when we
> > were rapidly growing), with many companies contributing. Spark OTOH
> > (and with all respect to my friends at Spark), has tons of
> > contributors to its core, but much less activity on its sub-projects
> > (for example, SparkStreaming). I strongly believe that SparkStreaming
> > is under-served by being a part of Spark, especially when compared to
> > Storm which is an independent project with its own community.
> >
> > The way I see it, connector frameworks are significantly simpler than
> > distributed data stores (although they are pretty large in terms of
> > code base, especially with copycat having its own distributed
> > processing framework). Which means that the barrier to contribution to
> > connector frameworks is lower, both for contributing to the framework
> > and for contributing connectors. Separate communities can also have
> > different rules regarding dependencies and committership.
> > Committership is the big one, and IMO what prevents SparkStreaming
> > from growing - I can give someone commit bit on Sqoop without giving
> > them any power over Hadoop. Not true for Spark and SparkStreaming.
> > This means that a CopyCat community (with its own sexy cat logo) will
> > be able to attract more volunteers and grow at a faster pace than core
> > Kafka, making it more useful to the community.
> >
> > The other part is that just like Kafka will be more useful with a
> > connector framework, a connector framework tends to work better when
> > there are lots of connectors. So if we decide to partition the Kafka /
> > Connector framework / Connectors triad, I'm not sure which
> > partitioning makes more sense. Giving CopyCat (I love the name. You
> > can say things like "get the data into MySQL and CC Kafka") its own
> > community will allow the CopyCat community to accept connector
> > contributions, which is good for CopyCat and for Kafka adoption.
> > Oracle and Netezza contributed connectors to Sqoop, they probably
> > couldn't contribute it at all if Sqoop was inside Hadoop, and they
> > can't really opensource their own stuff through Github, so it was a
> > win for our community. This doesn't negate the possibility to create
> > connectors for CopyCat and not contribute them to the community (like
> > the popular Teradata connector for Sqoop).
> >
> > Regarding ease of use and adoption: Right now, a lot of people adopt
> > Kafka as stand-alone piece, while Hadoop usually shows up through a
> > distribution. I expect that soon people will start adopting Kafka
> > through distributions, so the framework and a collection of connectors
> > will be part of every distribution. In the same way that no one thinks
> > of Sqoop or Flume as stand alone projects. With a bunch of Kafka
> > distributions out there, people will get Kafka + Framework +
> > Connectors, with a core connection portion being common to multiple
> > distributions - this will allow even easier adoption, while allowing
> > the Kafka community to focus on core Kafka.
> >
> > The point about documentation that Ewen has made in the KIP is a good
> > one. We definitely want to point people to the right place for export
> > / import tools. However, it sounds solvable with few links.
> >
> > Sorry for the lengthy essay - I'm a bit passionate about connectors
> > and want to see CopyCat off to a great start in life :)
> >
> > (BTW. I think Apache is a great place for CopyCat. I'll be happy to
> > help with the process of incubating it)
> >
> >
> > On Fri, Jun 19, 2015 at 2:47 PM, Jay Kreps <j...@confluent.io> wrote:
> > > I think we want the connectors to be federated just because trying to
> > > maintain all the connectors centrally would be really painful. I think
> if
> > > we really do this well we would want to have >100 of these connectors
> so
> > it
> > > really won't make sense to maintain them with the project. I think the
> > > thought was just to include the framework and maybe one simple
> connector
> > as
> > > an example.
> > >
> > > Thoughts?
> > >
> > > -Jay
> > >
> > > On Fri, Jun 19, 2015 at 2:38 PM, Gwen Shapira <gshap...@cloudera.com>
> > wrote:
> > >
> > >> I think BikeShed will be a great name.
> > >>
> > >> Can you clarify the scope? The KIP discusses a framework and also few
> > >> examples for connectors. Does the addition include just the framework
> > >> (and perhaps an example or two), or do we plan to start accepting
> > >> connectors to Apache Kafka project?
> > >>
> > >> Gwen
> > >>
> > >> On Thu, Jun 18, 2015 at 3:09 PM, Jay Kreps <j...@confluent.io> wrote:
> > >> > I think the only problem we came up with was that Kafka KopyKat
> > >> abbreviates
> > >> > as KKK which is not ideal in the US. Copykat would still be
> googlable
> > >> > without that issue. :-)
> > >> >
> > >> > -Jay
> > >> >
> > >> > On Thu, Jun 18, 2015 at 1:20 PM, Otis Gospodnetic <
> > >> > otis.gospodne...@gmail.com> wrote:
> > >> >
> > >> >> Just a comment on the name. KopyKat? More unique, easy to write,
> > >> >> pronounce, remember...
> > >> >>
> > >> >> Otis
> > >> >>
> > >> >>
> > >> >>
> > >> >> > On Jun 18, 2015, at 13:36, Jay Kreps <j...@confluent.io> wrote:
> > >> >> >
> > >> >> > 1. We were calling the plugins connectors (which is kind of a
> > generic
> > >> way
> > >> >> > to say either source or sink) and the framework copycat. The pro
> of
> > >> >> copycat
> > >> >> > is it is kind of fun. The con is that it doesn't really say what
> it
> > >> does.
> > >> >> > The Kafka Connector Framework would be a duller but more
> intuitive
> > >> name,
> > >> >> > but I suspect people would then just shorten it to KCF which
> again
> > >> has no
> > >> >> > intuitive meaning.
> > >> >> >
> > >> >> > 2. Potentially. One alternative we had thought of wrt the
> consumer
> > >> was to
> > >> >> > have the protocol just handle the group management part and have
> > the
> > >> >> > partition assignment be purely driven by the client. At the time
> > >> copycat
> > >> >> > wasn't even a twinkle in our eyes so we weren't really thinking
> > about
> > >> >> that.
> > >> >> > There were pros and cons to this and we decided it was better to
> do
> > >> >> > partition assignment on the broker side. We could revisit this,
> it
> > >> might
> > >> >> > not be a massive change in the consumer, but it would definitely
> > add
> > >> work
> > >> >> > there. I do agree that if we have learned one thing it is to keep
> > >> clients
> > >> >> > away from zk. This zk usage is more limited though, in that there
> > is
> > >> no
> > >> >> > intention of having copycat in different languages as the clients
> > are.
> > >> >> >
> > >> >> > 4. I think the idea is to include the structural schema
> information
> > >> you
> > >> >> > have available so it can be taken advantage of. Obviously the
> > easiest
> > >> >> > approach would just be to have a static schema for the messages
> > like
> > >> >> > timestamp + string/byte[]. However this means that i the source
> has
> > >> >> schema
> > >> >> > information there is no real official way to propagate that.
> > Having a
> > >> >> real
> > >> >> > built-in schema mechanism gives you a little more power to make
> the
> > >> data
> > >> >> > usable. So if you were publishing apache logs the low-touch
> generic
> > >> way
> > >> >> > would just be to have the schema be "string" since that is what
> > apache
> > >> >> log
> > >> >> > entries are. However if you had the actual format string used for
> > the
> > >> log
> > >> >> > you could use that to have a richer schema and parse out the
> > >> individual
> > >> >> > fields, which is significantly more usable. The advantage of this
> > is
> > >> that
> > >> >> > systems like databases, Hadoop, and so on that have some notion
> of
> > >> >> schemas
> > >> >> > can take advantage of this information that is captured with the
> > >> source
> > >> >> > data. So, e.g. the JDBC plugin can map the individual fields to
> > >> columns
> > >> >> > automatically, and you can support features like projecting out
> > >> >> particular
> > >> >> > fields and renaming fields easily without having to write custom
> > >> >> > source-specific code.
> > >> >> >
> > >> >> > -Jay
> > >> >> >
> > >> >> >> On Tue, Jun 16, 2015 at 5:00 PM, Joe Stein <
> joe.st...@stealth.ly>
> > >> >> wrote:
> > >> >> >>
> > >> >> >> Hey Ewen, very interesting!
> > >> >> >>
> > >> >> >> I like the idea of the connector and making one side always
> being
> > >> Kafka
> > >> >> for
> > >> >> >> all the reasons you mentioned. It makes having to build
> consumers
> > >> (over
> > >> >> and
> > >> >> >> over and over (and over)) again for these type of tasks much
> more
> > >> >> >> consistent for everyone.
> > >> >> >>
> > >> >> >> Some initial comments (will read a few more times and think more
> > >> through
> > >> >> >> it).
> > >> >> >>
> > >> >> >> 1) Copycat, it might be weird/hard to talk about producers,
> > >> consumers,
> > >> >> >> brokers and copycat for what and how "kafka" runs. I think the
> > other
> > >> >> naming
> > >> >> >> makes sense but maybe we can call it something else? "Sinks" or
> > >> whatever
> > >> >> >> (don't really care just bringing up it might be something to
> > >> consider).
> > >> >> We
> > >> >> >> could also just call it "connectors"...dunno.... producers,
> > >> consumers,
> > >> >> >> brokers and connectors...
> > >> >> >>
> > >> >> >> 2) Can we do copycat-workers without having to rely on
> Zookeeper?
> > So
> > >> >> much
> > >> >> >> work has been done to remove this dependency if we can do
> > something
> > >> >> without
> > >> >> >> ZK lets try (or at least abstract it so it is easier later to
> > make it
> > >> >> >> pluggable).
> > >> >> >>
> > >> >> >> 3) Even though connectors being managed in project has already
> > been
> > >> >> >> rejected... maybe we want to have a few (or one) that are in the
> > >> project
> > >> >> >> and maintained. This makes out of the box really out of the box
> > (if
> > >> only
> > >> >> >> file or hdfs or something).
> > >> >> >>
> > >> >> >> 4) "all records include schemas which describe the format of
> their
> > >> >> data" I
> > >> >> >> don't totally get this... a lot of data doesn't have the schema
> > with
> > >> >> it, we
> > >> >> >> have to plug that in... so would the plugin you are talking
> about
> > for
> > >> >> >> serializer would inject the schema to use with the record when
> it
> > >> sees
> > >> >> the
> > >> >> >> data?
> > >> >> >>
> > >> >> >>
> > >> >> >> ~ Joe Stein
> > >> >> >> - - - - - - - - - - - - - - - - -
> > >> >> >>
> > >> >> >>  http://www.stealth.ly
> > >> >> >> - - - - - - - - - - - - - - - - -
> > >> >> >>
> > >> >> >> On Tue, Jun 16, 2015 at 4:33 PM, Ewen Cheslack-Postava <
> > >> >> e...@confluent.io>
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >>> Oops, linked the wrong thing. Here's the correct one:
> > >> >> >>
> > >> >>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
> > >> >> >>>
> > >> >> >>> -Ewen
> > >> >> >>>
> > >> >> >>> On Tue, Jun 16, 2015 at 4:32 PM, Ewen Cheslack-Postava <
> > >> >> >> e...@confluent.io>
> > >> >> >>> wrote:
> > >> >> >>>
> > >> >> >>>> Hi all,
> > >> >> >>>>
> > >> >> >>>> I just posted KIP-26 - Add Copycat, a connector framework for
> > data
> > >> >> >>>> import/export here:
> > >> >> >>
> > >> >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > >> >> >>>>
> > >> >> >>>> This is a large KIP compared to what we've had so far, and is
> a
> > bit
> > >> >> >>>> different from most. We're proposing the addition of a fairly
> > big
> > >> new
> > >> >> >>>> component to Kafka because we think including it as part of
> > Kafka
> > >> >> >> rather
> > >> >> >>>> than as an external project is in the best interest of both
> > Copycat
> > >> >> and
> > >> >> >>>> Kafka itself.
> > >> >> >>>>
> > >> >> >>>> The goal with this KIP is to decide whether such a tool would
> > make
> > >> >> >> sense
> > >> >> >>>> in Kafka, give a high level sense of what it would entail, and
> > >> scope
> > >> >> >> what
> > >> >> >>>> would be included vs what would be left to third-parties. I'm
> > >> hoping
> > >> >> to
> > >> >> >>>> leave discussion of specific design and implementation
> details,
> > as
> > >> >> well
> > >> >> >>>> logistics like how best to include it in the Kafka repository
> &
> > >> >> >> project,
> > >> >> >>> to
> > >> >> >>>> the subsequent JIRAs or follow up KIPs.
> > >> >> >>>>
> > >> >> >>>> Looking forward to your feedback!
> > >> >> >>>>
> > >> >> >>>> -Ewen
> > >> >> >>>>
> > >> >> >>>> P.S. Preemptive relevant XKCD: https://xkcd.com/927/
> > >> >> >>>
> > >> >> >>>
> > >> >> >>> --
> > >> >> >>> Thanks,
> > >> >> >>> Ewen
> > >> >> >>
> > >> >>
> > >>
> >
>



-- 
Thanks,
Ewen
