I agree with you, Rhys, that Kafka Connect is an integral part of Apache
Kafka, and it makes perfect sense, in many cases, not to overload the core
or the clients with responsibilities related to exporting data to, and
importing data from, specific systems. That can be true even when Kafka
itself is such a system. Connect already provides a scalable and
fault-tolerant infrastructure for such "connectors" (in the broader sense of
the word).

But I see the arguments for including connectors in AK and for achieving
better mirroring between Kafka clusters as orthogonal. My reference to
MirrorMaker was meant to say that MirrorMaker was released in AK before
Connect or Streams were in place, not that it's an outdated tool. It
frequently receives updates (copying headers, handling messages without
timestamps, etc.). Also, independently, I'm not dogmatic about excluding
connectors from AK. It might happen in the future. But for the particular
connector suggested by this KIP, I don't see a compelling reason to do so.
MirrorMaker is still maintained, and if this KIP intended to propose its
replacement, then the "Compatibility, Deprecation, and Migration Plan"
section would need to be considerably more detailed. Having said that, I
would still favor having such a connector live in a separate repo right
now, given that several other connectors and tools are already available to
copy data between Kafka clusters and, again, that MirrorMaker is not
deprecated.

In any case, thanks for bringing this subject up. I also welcome this
discussion.

Konstantine



On Wed, Sep 26, 2018 at 2:02 PM McCaig, Rhys <rhys_mcc...@comcast.com>
wrote:

> Hi Konstantine,
>
> Thank you for your thoughtful comments!
>
> > However, I don't think the apache/kafka repository is the right place to
> > host such a Connector.
>
> <snip>
> > I find this approach very appealing. AK focuses on providing the core
> > infrastructure for Connect, that is required in every Kafka Connect
> > deployment, as well as offering the means to generically install, deploy
> > and operate connectors.
>
> I personally flip-flopped on this, with similar thoughts, when I
> initially considered raising a KIP for this functionality.
>
> When I initially developed a Kafka source connector, it was out of
> necessity: MirrorMaker requires zkconnect strings, which I didn't have
> access to for the source cluster, and Confluent’s proprietary connector
> also required ZooKeeper connections, though it has since been updated to
> remove this limitation.
>
> While I understand the point of view that MirrorMaker dates from the early
> days of Apache Kafka, it has become a critical tool for replicating data
> across Kafka clusters for a large portion of the community who are
> managing Kafka at scale. As such, I suspect there is a lot of interest
> in the Kafka project supporting topic replication across clusters. While
> one approach (which I don’t have the knowledge or time to address) could be
> to include it as a core component of Kafka itself (such as Apache Pulsar’s
> global topics), my view is that at this point in time, Kafka Connect is
> considered *the* way to ship data in and out of a specific Kafka cluster,
> regardless of the external system.
>
> I’d welcome further discussion on whether the community thinks this is the
> right approach for the Kafka project to take with regard to Kafka topic
> mirroring. I *think* that it's important and common enough that there
> should be support in the project, and MirrorMaker is, as you mention,
> showing its age.
>
> Cheers,
> Rhys
>
>
>
>
> > On Sep 26, 2018, at 10:42 AM, Konstantine Karantasis <konstant...@confluent.io> wrote:
> >
> > Hi Rhys,
> >
> > Thanks for the proposal, and apologies for the late feedback. Utilizing
> > Connect to mirror Kafka topics is definitely a plausible proposal for a
> > very useful use case.
> >
> > However, I don't think the apache/kafka repository is the right place to
> > host such a Connector. Currently, no full-featured, production-ready
> > connectors are hosted in AK. The only two connectors shipped with AK
> > (FileStreamSourceConnector and FileStreamSinkConnector) are there only
> > as example implementations.
> >
> > I find this approach very appealing. AK focuses on providing the core
> > infrastructure for Connect, that is required in every Kafka Connect
> > deployment, as well as offering the means to generically install, deploy
> > and operate connectors. But all the connectors reside outside AK and
> > comprise a vibrant ecosystem of open source and proprietary components
> > that, essentially - even for the most useful and ubiquitous of the
> > connectors - are optional for users to install and use. This seems simple
> > and flexible, both in terms of releasing and using/deploying software
> > related to Kafka Connect. I might even say that I'd be in favor of
> > extending this approach to all the Connect components, including
> > Transformations and Converters.
> >
> > I'm aware that MirrorMaker is part of AK, but to me it dates back to the
> > early days of Apache Kafka, when the project and its ecosystem were
> > smaller, Connect and Streams had not been implemented yet, and
> > mirroring topics between Kafka clusters was already a basic need. With a
> > much richer ecosystem now, and larger, well-defined packages in AK, I
> > think the approach that decouples connectors from the Connect framework
> > itself is a good one.
> >
> > In my opinion, the fact that this connector targets Kafka itself as a
> > source is not an adequate reason to include it in apache/kafka within the
> > Connect framework. It seems it can evolve naturally, as every other
> > connector, in its own repository.
> >
> > Regards,
> > Konstantine
> >
> >
> > On Sat, Aug 4, 2018 at 7:20 PM McCaig, Rhys <rhys_mcc...@comcast.com> wrote:
> >
> >> Hi All,
> >>
> >> If there are no further comments on this KIP, I’ll start a vote early
> >> this week.
> >>
> >> Rhys
> >>
> >> On Aug 1, 2018, at 12:32 AM, McCaig, Rhys <rhys_mcc...@cable.comcast.com> wrote:
> >>
> >> Hi All,
> >>
> >> I’ve updated the proposal to include the improvements suggested by
> >> Stephane.
> >>
> >> I have also submitted a PR to implement this functionality in Kafka.
> >> https://github.com/apache/kafka/pull/5438
> >>
> >> I don’t have a benchmark against MirrorMaker yet, as I currently only
> >> have a local Docker stack available to me, though I have seen very good
> >> performance in that test stack (200k messages/sec with 100-byte messages
> >> on containers with limited compute resources). Further benchmarking might
> >> take a few days.
> >>
> >> Review and comments would be appreciated.
> >>
> >> Cheers,
> >> Rhys
> >>
> >>
> >> On Jun 18, 2018, at 9:00 AM, McCaig, Rhys <rhys_mcc...@cable.comcast.com> wrote:
> >>
> >> Hi Stephane,
> >>
> >> Thanks for your feedback and apologies for the delay in my response.
> >>
> >> Are there any performance benchmarks against Mirror Maker available? I'm
> >> interested to know if this is more performant / scalable.
> >> Regarding the implementation, here's some feedback:
> >>
> >>
> >> Currently I don’t have any performance benchmarks, but I think this is a
> >> great idea; I'll see if I can set something up in the next week or so.
> >>
> >> - I think it's worth mentioning that this solution does not rely on
> >> consumer groups, and therefore tracking progress may be tricky. Can you
> >> think of a way to expose that?
> >>
> >> This is a reasonable concern. I’m not sure how to track this other than
> >> by looking at the Kafka Connect offsets. Once a message is passed to the
> >> framework, I'm unaware of a way to get at the committed offsets on the
> >> producer side. Any thoughts?
> >>
> >> - Some code could go in a config Validator, I believe:
> >>
> >> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
> >>
> >> - I think your KIP mentions `source.admin.` and `source.consumer.`, but I
> >> don't see them reflected in the code yet.
> >>
> >> - Is there a way to be flexible and merge list and regex, or offer the two
> >> simultaneously? For example: source_topics=my_static_topic,prefix.*
> >>
> >> Agreed on all of the above. I will incorporate these into the code later
> >> this week, as I'll get some time back to work on this.
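On the `source.admin.` / `source.consumer.` point above: a connector can accept Kafka client settings under a prefix and forward them, with the prefix stripped, to the client it creates. The following is a minimal JDK-only sketch of that idea, not the KIP's or the Comcast code's actual implementation; the class name `PrefixedConfigs`, the method `withPrefix`, the host name, and the property values are hypothetical. Kafka's own `AbstractConfig#originalsWithPrefix` follows the same strip-the-prefix approach.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the KIP or the
// Comcast MirrorTool code. It keeps the keys that start with a prefix and
// strips that prefix before the settings are handed to a Kafka client.
public final class PrefixedConfigs {

    private PrefixedConfigs() {}

    /** Returns the entries whose key starts with {@code prefix}, with the prefix removed. */
    public static Map<String, String> withPrefix(Map<String, String> props, String prefix) {
        // TreeMap gives a stable, sorted order when printing.
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            String key = entry.getKey();
            // Skip a key that is exactly the prefix, since no client setting follows it.
            if (key.startsWith(prefix) && key.length() > prefix.length()) {
                result.put(key.substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example connector properties; host name and values are made up.
        Map<String, String> props = new HashMap<>();
        props.put("source_topics", "my_static_topic,prefix.*");
        props.put("source.consumer.bootstrap.servers", "source-broker:9092");
        props.put("source.consumer.max.poll.records", "500");
        props.put("source.admin.request.timeout.ms", "30000");

        // Settings that would go to the consumer reading the source cluster.
        System.out.println(withPrefix(props, "source.consumer."));
        // prints {bootstrap.servers=source-broker:9092, max.poll.records=500}

        // Settings that would go to an admin client (assumed here to be used for listing topics).
        System.out.println(withPrefix(props, "source.admin."));
        // prints {request.timeout.ms=30000}
    }
}
```

Unprefixed connector settings such as `source_topics` are left out of both client maps, which keeps connector-level options from leaking into client configurations.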
> >>
> >> Cheers,
> >> Rhys
> >>
> >>
> >>
> >> On Jun 6, 2018, at 7:16 PM, Stephane Maarek <steph...@simplemachines.com.au>
> >> wrote:
> >>
> >> Hi Rhys,
> >>
> >> I think this will be a great addition.
> >>
> >> Are there any performance benchmarks against Mirror Maker available? I'm
> >> interested to know if this is more performant / scalable.
> >> Regarding the implementation, here's some feedback:
> >>
> >> - I think it's worth mentioning that this solution does not rely on
> >> consumer groups, and therefore tracking progress may be tricky. Can you
> >> think of a way to expose that?
> >>
> >>
> >> - Some code could go in a config Validator, I believe:
> >>
> >> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
> >>
> >> - I think your KIP mentions `source.admin.` and `source.consumer.`, but I
> >> don't see them reflected in the code yet.
> >>
> >> - Is there a way to be flexible and merge list and regex, or offer the two
> >> simultaneously? For example: source_topics=my_static_topic,prefix.*
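One way to read the list-plus-regex suggestion above is a single comma-separated setting in which each entry is either a literal topic name or a pattern. The sketch below is a JDK-only illustration under that assumption; the class `TopicSelector` and its classification rule are hypothetical, not taken from the KIP or the Comcast code. Because `.` is legal in Kafka topic names, it treats an entry as a regex only if it contains some other metacharacter (such as `*`).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the KIP or the
// Comcast MirrorTool code. Parses a mixed list such as
// "my_static_topic,prefix.*" into exact names plus regex patterns.
public final class TopicSelector {
    // Characters that mark an entry as a regex. '.' is deliberately excluded
    // because it is a legal character in Kafka topic names.
    private static final String REGEX_MARKERS = "*+?[](){}|^$\\";

    private final Set<String> literals = new HashSet<>();
    private final List<Pattern> patterns = new ArrayList<>();

    public TopicSelector(String sourceTopics) {
        for (String raw : sourceTopics.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            if (isPattern(entry)) {
                patterns.add(Pattern.compile(entry));
            } else {
                literals.add(entry);
            }
        }
    }

    private static boolean isPattern(String entry) {
        for (char c : entry.toCharArray()) {
            if (REGEX_MARKERS.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** True if the topic is listed exactly or fully matches one of the patterns. */
    public boolean matches(String topic) {
        if (literals.contains(topic)) {
            return true;
        }
        for (Pattern p : patterns) {
            if (p.matcher(topic).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TopicSelector selector = new TopicSelector("my_static_topic,prefix.*");
        System.out.println(selector.matches("my_static_topic")); // true
        System.out.println(selector.matches("prefix.orders"));   // true
        System.out.println(selector.matches("other_topic"));     // false
    }
}
```

With this rule, an entry like `logs.app` matches only the topic `logs.app`, not `logsXapp`. The trade-off is that a pattern containing only `.` as a metacharacter cannot be expressed; separate list and regex settings would avoid that ambiguity at the cost of a second config key.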
> >>
> >> Hope that helps
> >> Stephane
> >>
> >> Kind regards,
> >> Stephane
> >>
> >> Stephane Maarek | Developer
> >>
> >> +61 416 575 980
> >> steph...@simplemachines.com.au
> >> simplemachines.com.au
> >> Level 2, 145 William Street, Sydney NSW 2010
> >>
> >> On 5 June 2018 at 09:04, McCaig, Rhys <rhys_mcc...@comcast.com> wrote:
> >>
> >> Hi All,
> >>
> >> As I didn’t get any comments on this KIP, and two additional KIPs numbered
> >> 308 have since been created, I'm bumping this and renumbering the KIP to
> >> 310 to remove the duplication:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-310%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
> >>
> >> Let me know if you have any comments or feedback; I would love to hear
> >> them.
> >>
> >> Cheers,
> >> Rhys
> >>
> >> On May 28, 2018, at 10:23 PM, McCaig, Rhys <rhys_mcc...@comcast.com>
> >> wrote:
> >>
> >> Sorry for the bad link to the KIP, here it is:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
> >>
> >> On May 28, 2018, at 10:19 PM, McCaig, Rhys <rhys_mcc...@comcast.com>
> >> wrote:
> >>
> >> Hi All,
> >>
> >> I added a KIP to include a Kafka Source Connector with Kafka Connect.
> >> Here is the KIP:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
> >>
> >> Looking forward to your feedback and suggestions.
> >>
> >> Cheers,
> >> Rhys
> >>