Regarding hops to stop cycles and to allow a range of multi-cluster topologies, see https://www.rabbitmq.com/federated-exchanges.html, where something very similar was done in RabbitMQ.
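For readers following the hop-count idea discussed in this thread, here is a minimal sketch of how a replicator might apply a hop limit. It is not MM2 code: the header name `mm2-hops`, the class `HopLimitSketch`, and the use of a plain `Map` in place of Kafka record headers are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class HopLimitSketch {
    // Hypothetical header name; nothing in KIP-382 or Kafka defines it.
    static final String HOPS_HEADER = "mm2-hops";

    /** Reads the hop count from the headers; a missing header means the record has not been replicated yet. */
    static int hopsOf(Map<String, byte[]> headers) {
        byte[] raw = headers.get(HOPS_HEADER);
        return raw == null ? 0 : ByteBuffer.wrap(raw).getInt();
    }

    /**
     * Returns the headers for the replicated copy with the hop count incremented,
     * or null when the record has already travelled maxHops and must not be forwarded.
     */
    static Map<String, byte[]> replicate(Map<String, byte[]> headers, int maxHops) {
        int hops = hopsOf(headers);
        if (hops >= maxHops) {
            return null; // only forward while hops are less than the configured limit
        }
        Map<String, byte[]> out = new HashMap<>(headers);
        out.put(HOPS_HEADER, ByteBuffer.allocate(Integer.BYTES).putInt(hops + 1).array());
        return out;
    }

    public static void main(String[] args) {
        // Mesh topology: every cluster is directly connected, so one hop is enough.
        Map<String, byte[]> firstCopy = replicate(new HashMap<>(), 1);
        System.out.println("hops after first replication: " + hopsOf(firstCopy));
        System.out.println("forwarded again: " + (replicate(firstCopy, 1) != null));
    }
}
```

With a limit of 1, a record is copied once to each directly connected cluster and never travels further, which is the mesh case; larger limits would correspond to ring or tree layouts.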
On 12/7/18, 12:47 AM, "Michael Pearce" <michael.pea...@ig.com> wrote:

Nice proposal.

Some comments.

On the section around cycle detection:

I would like to see support for this to be done by hops as well, e.g. by using a header for the number of hops: as MM2 replicates a message it increases the hop count, and you can make MM2 configurable to only produce messages onwards where hops are less than x. This then allows a ring (hops = number of brokers in the ring), a mesh (every cluster interconnected, so hops = 1), or even a tree (a more fine-grained setup) cluster topology.
FYI we do this currently with the current MirrorMaker, using a custom handler.

On the section around running a standalone MirrorMaker connector:

I would suggest making this as easy to run as the MirrorMakers are today, with a simple single sh script. I assume this is what is proposed in section "Running MirrorMaker in legacy mode", but I would even do this before MM is removed, with a -2 variant, e.g.
./bin/kafka-mirror-maker-2.sh --consumer consumer.properties --producer producer.properties

Lastly, I see no section on providing support for MirrorMaker Handlers. Today people can add handlers to have a little extra custom logic if needed, and the handler API is public today, so it should be supported going forwards so that people are not rewriting these en masse.

On 12/5/18, 5:36 PM, "Ryanne Dolan" <ryannedo...@gmail.com> wrote:

Sönke,

> The only thing that I could come up with is the limitation to a single offset commit interval

Yes, and other internal properties, e.g. those used by the internal consumers and producers, which, granted, probably are not often changed from their defaults, but that apply to Connectors across the entire cluster.
Ryanne On Wed, Dec 5, 2018 at 3:21 AM Sönke Liebau <soenke.lie...@opencore.com.invalid> wrote: > Hi Ryanne, > > when you say "Currently worker configs apply across the entire cluster, > which is limiting even for use-cases involving a single Kafka cluster.", > may I ask you to elaborate on those limitations a little? > The only thing that I could come up with is the limitation to a single > offset commit interval value for all running connectors. > Maybe also the limitation to shared config providers.. > > But you sound like you had painful experiences with this before, maybe > you'd like to share the burden :) > > Best regards, > Sönke > > On Wed, Dec 5, 2018 at 5:15 AM Ryanne Dolan <ryannedo...@gmail.com> wrote: > > > Sönke, > > > > I think so long as we can keep the differences at a very high level (i.e. > > the "control plane"), there is little downside to MM2 and Connect > > coexisting. I do expect them to converge to some extent, with features > from > > MM2 being pulled into Connect whenever this is possible without breaking > > things. > > > > I could definitely see your idea re hierarchies or groups of connectors > > being useful outside MM2. Currently "worker configs" apply across the > > entire cluster, which is limiting even for use-cases involving a single > > Kafka cluster. If Connect supported multiple workers in the same cluster, > > it would start to look a lot like a MM2 cluster. > > > > Ryanne > > > > On Tue, Dec 4, 2018 at 3:26 PM Sönke Liebau > > <soenke.lie...@opencore.com.invalid> wrote: > > > > > Hi Ryanne, > > > > > > thanks for your response! > > > > > > It seems like you have already done a lot of investigation into the > > > existing code and the solution design and all of what you write makes > > sense > > > to me. Would it potentially be worth adding this to the KIP, now that > you > > > had to write it up because of me anyway? 
> > > > > > However, I am afraid that I am still not entirely convinced of the > > > fundamental benefit this provides over an extended Connect that has the > > > following functionality: > > > - allow for organizing connectors into a hierarchical structure - > > > "clusters/us-west/..." > > > - allow defining external Kafka clusters to be used by Source and Sink > > > connectors instead of the local cluster > > > > > > Personally I think both of these features are useful additions to > > Connect, > > > I'll address both separately below. > > > > > > Allowing to structure connectors in a hierarchy > > > Organizing running connectors will grow more important as corporate > > > customers adapt Connect and installations grow in size. Additionally > this > > > could be useful for ACLs in case they are ever added to Connect, as you > > > could allow specific users access only to specific namespaces (and > until > > > ACLs are added it would facilitate using a reverse proxy for the same > > > effect). > > > > > > Allow accessing multiple external clusters > > > The reasoning for this feature is pretty much the same as for a central > > > Mirror Maker cluster, if a company has multiple clusters for whatever > > > reason but wants to have ingest centralized in one system aka one > Connect > > > cluster they would need the ability to read from and write to an > > arbitrary > > > number of Kafka clusters. > > > I haven't really looked at the code, just poked around a couple of > > minutes, > > > but it appears like this could be done with fairly low effort. My > general > > > idea would be to leave the existing configuration options untouched - > > > Connect will always need a "primary" cluster that is used for storage > of > > > internal data (config, offsets, status) there is no need to break > > existing > > > configs. But additionally allow adding named extra clusters by > specifying > > > options like > > > external.sales_cluster.bootstrap_servers=... 
> > > external.sales_cluster.ssl.keystore.location=... > > > external.marketing_cluster.bootstrap_servers=... > > > > > > The code for status, offset and config storage is mostly isolated in > the > > > Kafka[Offset|Status|Config]BackingStore classes and could remain pretty > > > much unchanged. > > > > > > Producer and consumer creation for Tasks is done in the Worker as of > > > KAFKA-7551 and is isolated in two functions. We could add a two more > > > functions with an extra argument for the external cluster name to be > used > > > and return fitting consumers/producers. > > > The source and sink config would then simply gain an optional setting > to > > > specify the cluster name. > > > > > > I am very sure that I am missing a few large issues with these ideas, > I'm > > > mostly back-of-the-napkin designing here, but it might be worth a > second > > > look. > > > > > > Once we decide to diverge into two clusters: MirrorMaker and Connect, I > > > think realistically the chance of those two ever being merged again > > because > > > they grow back together is practically zero - hence my hesitation. > > > > > > ---- > > > > > > All of that being said, I am absolutely happy to agree to disagree, I > > think > > > to a certain extent this is down to a question of personal > > > style/preference. And as this is your baby and you have put a lot more > > > effort and thought into it than I ever will I'll shut up now :) > > > > > > Again, thanks for all your good work! > > > > > > Best regards, > > > Sönke > > > > > > On Fri, Nov 30, 2018 at 9:00 PM Ryanne Dolan <ryannedo...@gmail.com> > > > wrote: > > > > > > > Thanks Sönke. > > > > > > > > > it just feels to me like an awful lot of Connect functionality > would > > > need > > > > to be reimplemented or at least wrapped > > > > > > > > Connect currently has two drivers, ConnectDistributed and > > > > ConnectStandalone. Both set up a Herder, which manages Workers. 
> > > > I've implemented a third driver which sets up multiple Herders, one for each Kafka cluster as specified in a config file. From the Herder level down, nothing is changed or duplicated -- it's just Connect.
> > > >
> > > > For the REST API, Connect wraps a Herder in a RestServer class, which creates a Jetty server with a few JAX-RS resources. One of these resources is ConnectorsResource, which is the real meat of the REST API, enabling start, stop, creation, deletion, and configuration of Connectors.
> > > >
> > > > I've added MirrorRestServer, which wraps a set of Herders instead of one. The server exposes a single resource, ClustersResource, which is only a few lines of code:
> > > >
> > > > @GET
> > > > @Path("/")
> > > > public Collection<String> listClusters() {
> > > >   return clusters.keySet();
> > > > }
> > > >
> > > > @Path("/{cluster}")
> > > > public ConnectorsResource getConnectorsForCluster(@PathParam("cluster") String cluster) {
> > > >   return new ConnectorsResource(clusters.get(cluster));
> > > > }
> > > >
> > > > (simplified a bit and subject to change)
> > > >
> > > > The ClustersResource defers to the existing ConnectorsResource, which again is most of the Connect API. With this in place, I can make requests like:
> > > >
> > > > GET /clusters
> > > >
> > > > GET /clusters/us-west/connectors
> > > >
> > > > PUT /clusters/us-west/connectors/us-east/config
> > > > { "topics" : "topic1" }
> > > >
> > > > etc.
> > > >
> > > > So on the whole, very little code is involved in implementing "MirrorMaker clusters". I won't rule out adding additional features on top of this basic API, but nothing should require re-implementing what is already in Connect.
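The request paths described above can be illustrated without any JAX-RS dependency. The following is only a sketch of the routing idea, assuming a hypothetical `ClusterRoutingSketch` class that stands in for the per-cluster lookup; it is not the MirrorRestServer or ClustersResource implementation, and a real server would hand the remaining path to the per-cluster Connect resource rather than return it.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

class ClusterRoutingSketch {
    /** The cluster a request targets plus the Connect-style path left over after the cluster segment. */
    static final class Route {
        final String cluster;
        final String connectPath;

        Route(String cluster, String connectPath) {
            this.cluster = cluster;
            this.connectPath = connectPath;
        }
    }

    // Stand-in for the map of cluster name to Herder; only the names matter for routing here.
    private final Set<String> clusters;

    ClusterRoutingSketch(String... clusterNames) {
        this.clusters = new LinkedHashSet<>(Arrays.asList(clusterNames));
    }

    /** Equivalent of GET /clusters. */
    Set<String> listClusters() {
        return clusters;
    }

    /** Splits "/clusters/{cluster}/..." into the cluster name and the remaining path; empty if the cluster is unknown. */
    Optional<Route> resolve(String path) {
        final String prefix = "/clusters/";
        if (!path.startsWith(prefix)) {
            return Optional.empty();
        }
        String rest = path.substring(prefix.length());
        int slash = rest.indexOf('/');
        String cluster = slash < 0 ? rest : rest.substring(0, slash);
        if (!clusters.contains(cluster)) {
            return Optional.empty();
        }
        String connectPath = slash < 0 ? "/" : rest.substring(slash);
        return Optional.of(new Route(cluster, connectPath));
    }

    public static void main(String[] args) {
        ClusterRoutingSketch router = new ClusterRoutingSketch("us-west", "us-east");
        System.out.println(router.listClusters());
        router.resolve("/clusters/us-west/connectors/us-east/config")
              .ifPresent(r -> System.out.println(r.cluster + " handles " + r.connectPath));
    }
}
```

The point of the sketch is that everything after the cluster segment is an ordinary Connect path, which is why the existing resource classes can be reused unchanged.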
> > > > > > > > > Wouldn't it be a viable alternative to look into extending Connect > > > itself > > > > > > > > Maybe Connect will evolve to the point where Connect clusters and > > > > MirrorMaker clusters are indistinguishable, but I think this is > > unlikely, > > > > since really no use-case outside replication would benefit from the > > added > > > > complexity. Moreover, I think support for multiple Kafka clusters > would > > > be > > > > hard to add without significant changes to the existing APIs and > > configs, > > > > which all assume a single Kafka cluster. I think Connect-as-a-Service > > and > > > > Replication-as-a-Service are sufficiently different use-cases that we > > > > should expect the APIs and configuration files to be at least > slightly > > > > different, even if both use the same framework underneath. That > said, I > > > do > > > > plan to contribute a few improvements to the Connect framework in > > support > > > > of MM2 -- just nothing within the scope of the current KIP. > > > > > > > > Thanks again! > > > > Ryanne > > > > > > > > > > > > On Fri, Nov 30, 2018 at 3:47 AM Sönke Liebau > > > > <soenke.lie...@opencore.com.invalid> wrote: > > > > > > > > > Hi Ryanne, > > > > > > > > > > thanks. I missed the remote to remote replication scenario in my > > train > > > of > > > > > thought, you are right. > > > > > > > > > > That being said I have to admit that I am not yet fully on board > with > > > the > > > > > concept, sorry. But I might just be misunderstanding what your > > > intention > > > > > is. Let me try and explain what I think it is you are trying to do > > and > > > > why > > > > > I am on the fence about that and take it from there. > > > > > > > > > > You want to create an extra mirrormaker driver class which will > take > > > > > multiple clusters as configuration options. 
Based on these clusters > > it > > > > will > > > > > then reuse the connect workers and create as many as necessary to > be > > > able > > > > > to replicate to/from each of those configured clusters. It will > then > > > > > expose a rest api (since you stated subset of Connect rest api I > > assume > > > > it > > > > > will be a new / own one?) that allows users to send requests like > > > > > "replicate topic a from cluster 1 to cluster 1" and start a > connector > > > on > > > > > the relevant worker that can offer this "route". > > > > > This can be extended to a cluster by starting mirror maker drivers > on > > > > other > > > > > nodes with the same config and it would offer all the connect > > features > > > of > > > > > balancing restarting in case of failure etc. > > > > > > > > > > If this understanding is correct then it just feels to me like an > > awful > > > > lot > > > > > of Connect functionality would need to be reimplemented or at least > > > > > wrapped, which potentially could mean additional effort for > > maintaining > > > > and > > > > > extending Connect down the line. Wouldn't it be a viable > alternative > > to > > > > > look into extending Connect itself to allow defining "remote > > clusters" > > > > > which can then be specified in the connector config to be used > > instead > > > of > > > > > the local cluster? I imagine that change itself would not be too > > > > extensive, > > > > > the main effort would probably be in coming up with a sensible > config > > > > > structure and ensuring backwards compatibility with existing > > connector > > > > > configs. > > > > > This would still allow to use a regular Connect cluster for an > > > arbitrary > > > > > number of clusters, thus still having a dedicated MirrorMaker > cluster > > > by > > > > > running only MirrorMaker Connectors in there if you want the > > > isolation. 
I > > > > > agree that it would not offer the level of abstraction around > > > replication > > > > > that your concept would enable to implement, but I think if would > be > > > far > > > > > less implementation and maintenance effort. > > > > > > > > > > But again, all of that is based on my, potentially flawed, > > > understanding > > > > of > > > > > your proposal, please feel free to correct me :) > > > > > > > > > > Best regards, > > > > > Sönke > > > > > > > > > > On Fri, Nov 30, 2018 at 1:39 AM Ryanne Dolan < > ryannedo...@gmail.com> > > > > > wrote: > > > > > > > > > > > Sönke, thanks for the feedback! > > > > > > > > > > > > > the renaming policy [...] can be disabled [...] The KIP itself > > > does > > > > > not > > > > > > mention this > > > > > > > > > > > > Good catch. I've updated the KIP to call this out. > > > > > > > > > > > > > "MirrorMaker clusters" I am not sure I fully understand the > issue > > > you > > > > > > are trying to solve > > > > > > > > > > > > MirrorMaker today is not scalable from an operational > perspective. > > > > Celia > > > > > > Kung at LinkedIn does a great job of explaining this problem [1], > > > which > > > > > has > > > > > > caused LinkedIn to drop MirrorMaker in favor of Brooklin. With > > > > Brooklin, > > > > > a > > > > > > single cluster, single API, and single UI controls replication > > flows > > > > for > > > > > an > > > > > > entire data center. With MirrorMaker 2.0, the vision is much the > > > same. > > > > > > > > > > > > If your data center consists of a small number of Kafka clusters > > and > > > an > > > > > > existing Connect cluster, it might make more sense to re-use the > > > > Connect > > > > > > cluster with MirrorSource/SinkConnectors. There's nothing wrong > > with > > > > this > > > > > > approach for small deployments, but this model also doesn't > scale. 
> > > This > > > > > is > > > > > > because Connect clusters are built around a single Kafka cluster > -- > > > > what > > > > > I > > > > > > call the "primary" cluster -- and all Connectors in the cluster > > must > > > > > either > > > > > > consume from or produce to this single cluster. If you have more > > than > > > > one > > > > > > "active" Kafka cluster in each data center, you'll end up needing > > > > > multiple > > > > > > Connect clusters there as well. > > > > > > > > > > > > The problem with Connect clusters for replication is way less > > severe > > > > > > compared to legacy MirrorMaker. Generally you need one Connect > > > cluster > > > > > per > > > > > > active Kafka cluster. As you point out, MM2's SinkConnector means > > you > > > > can > > > > > > get away with a single Connect cluster for topologies that center > > > > around > > > > > a > > > > > > single primary cluster. But each Connector within each Connect > > > cluster > > > > > must > > > > > > be configured independently, with no high-level view of your > > > > replication > > > > > > flows within and between data centers. > > > > > > > > > > > > With MirrorMaker 2.0, a single MirrorMaker cluster manages > > > replication > > > > > > across any number of Kafka clusters. Much like Brooklin, MM2 does > > the > > > > > work > > > > > > of setting up connectors between clusters as needed. This > > > > > > Replication-as-a-Service is a huge win for larger deployments, as > > > well > > > > as > > > > > > for organizations that haven't adopted Connect. > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > https://www.slideshare.net/ConfluentInc/more-data-more-problems-scaling-kafkamirroring-pipelines-at-linkedin > > > > > > > > > > > > Keep the questions coming! Thanks. 
> > > > > > Ryanne > > > > > > > > > > > > On Thu, Nov 29, 2018 at 3:30 AM Sönke Liebau < > > > > soenke.lie...@opencore.com > > > > > > > > > > > > wrote: > > > > > > > > > > > >> Hi Ryanne, > > > > > >> > > > > > >> first of all, thanks for the KIP, great work overall and much > > > needed I > > > > > >> think! > > > > > >> > > > > > >> I have a small comment on the renaming policy, in one of the > mails > > > on > > > > > >> this thread you mention that this can be disabled (to replicate > > > topic1 > > > > > in > > > > > >> cluster A as topic1 on cluster B I assume). The KIP itself does > > not > > > > > mention > > > > > >> this, from reading just the KIP one might get the assumption > that > > > > > renaming > > > > > >> is mandatory. It might be useful to add a sentence or two around > > > > > renaming > > > > > >> policies and what is possible here. I assume you intend to make > > > these > > > > > >> pluggable? > > > > > >> > > > > > >> Regarding the latest addition of "MirrorMaker clusters" I am not > > > sure > > > > I > > > > > >> fully understand the issue you are trying to solve and what > > exactly > > > > > these > > > > > >> scripts will do - but that may just me being dense about it :) > > > > > >> I understand the limitation to a single source and target > cluster > > > that > > > > > >> Connect imposes, but isn't this worked around by the fact that > you > > > > have > > > > > >> MirrorSource- and MirrorSinkConnectors and one part of the > > equation > > > > will > > > > > >> always be under your control? > > > > > >> The way I understood your intention was that there is a > (regular, > > > not > > > > > MM) > > > > > >> Connect Cluster somewhere next to a Kafka Cluster A and if you > > > deploy > > > > a > > > > > >> MirrorSourceTask to that it will read messages from a remote > > > cluster B > > > > > and > > > > > >> replicate them into the local cluster A. 
If you deploy a > > > > MirrorSinkTask > > > > > it > > > > > >> will read from local cluster A and replicate into cluster B. > > > > > >> > > > > > >> Since in both causes the configuration for cluster B will be > > passed > > > > into > > > > > >> the connector in the ConnectorConfig contained in the rest > > request, > > > > > what's > > > > > >> to stop us from starting a third connector with a > MirrorSourceTask > > > > > reading > > > > > >> from cluster C? > > > > > >> > > > > > >> I am a bit hesitant about the entire concept of having extra > > scripts > > > > to > > > > > >> run an entire separate Connect cluster - I'd much prefer an > option > > > to > > > > > use a > > > > > >> regular connect cluster from an ops point of view. Is it maybe > > worth > > > > > >> spending some time investigating whether we can come up with a > > > change > > > > to > > > > > >> connect that enables what MM would need? > > > > > >> > > > > > >> Best regards, > > > > > >> Sönke > > > > > >> > > > > > >> > > > > > >> > > > > > >> On Tue, Nov 27, 2018 at 10:02 PM Ryanne Dolan < > > > ryannedo...@gmail.com> > > > > > >> wrote: > > > > > >> > > > > > >>> Hey y'all, I'd like you draw your attention to a new section in > > > > KIP-382 > > > > > >>> re > > > > > >>> MirrorMaker Clusters: > > > > > >>> > > > > > >>> > > > > > >>> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382:+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-MirrorMakerClusters > > > > > >>> > > > > > >>> A common concern I hear about using Connect for replication is > > that > > > > all > > > > > >>> SourceConnectors in a Connect cluster must use the same target > > > Kafka > > > > > >>> cluster, and likewise all SinkConnectors must use the same > source > > > > Kafka > > > > > >>> cluster. In order to use multiple Kafka clusters from Connect, > > > there > > > > > are > > > > > >>> two possible approaches: > > > > > >>> > > > > > >>> 1) use an intermediate Kafka cluster, K. 
> > > > > >>> SourceConnectors (A, B, C) write to K and SinkConnectors (X, Y, Z) read from K. This enables flows like A -> K -> X but means that some topologies require extraneous hops, and means that K must be scaled to handle records from all sources and sinks.
> > > > > >>>
> > > > > >>> 2) use multiple Connect clusters, one for each target cluster. Each cluster has multiple SourceConnectors, one for each source cluster. This enables direct replication of A -> X but means there is a proliferation of Connect clusters, each of which must be managed separately.
> > > > > >>>
> > > > > >>> Both options are viable for small deployments involving a small number of Kafka clusters in a small number of data centers. However, neither is scalable, especially from an operational standpoint.
> > > > > >>>
> > > > > >>> KIP-382 now introduces "MirrorMaker clusters", which are distinct from Connect clusters. A single MirrorMaker cluster provides "Replication-as-a-Service" among any number of Kafka clusters via a high-level REST API based on the Connect API. Under the hood, MirrorMaker sets up Connectors between each pair of Kafka clusters. The REST API enables on-the-fly reconfiguration of each Connector, including updates to topic whitelists/blacklists.
> > > > > >>>
> > > > > >>> To configure MirrorMaker 2.0, you need a configuration file that lists connection information for each Kafka cluster (broker lists, SSL settings etc). At a minimum, this looks like:
> > > > > >>>
> > > > > >>> clusters=us-west, us-east
> > > > > >>> cluster.us-west.broker.list=us-west-kafka-server:9092
> > > > > >>> cluster.us-east.broker.list=us-east-kafka-server:9092
> > > > > >>>
> > > > > >>> You can specify topic whitelists and other connector-level settings here too, or you can use the REST API to remote-control a running cluster.
> > > > > >>>
> > > > > >>> I've also updated the KIP with minor changes to bring it in line with the current implementation.
> > > > > >>>
> > > > > >>> Looking forward to your feedback, thanks!
> > > > > >>> Ryanne
> > > > > >>>
> > > > > >>> On Mon, Nov 19, 2018 at 10:26 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
> > > > > >>>
> > > > > >>> > Dan, you've got it right. ACL sync will be done by MM2 automatically (unless disabled) according to simple rules:
> > > > > >>> >
> > > > > >>> > - If a principal has READ access on a topic in a source cluster, the same principal should have READ access on downstream replicated topics ("remote topics").
> > > > > >>> > - Only MM2 has WRITE access on "remote topics".
> > > > > >>> >
> > > > > >>> > This covers sync from upstream topics like "topic1" to downstream remote topics like "us-west.topic1". What's missing from the KIP, as you point out, is ACL sync between normal topics (non-remote). If a consumer has READ access to topic1 in an upstream cluster, should it have READ access in topic1 in a downstream cluster?
> > > > > >>> >
> > > > > >>> > I think the answer generally is no, you don't want to give principals blanket permissions across all DCs automatically.
For > example, > > > I've > > > > > >>> seen > > > > > >>> > scenarios where certain topics are replicated between an > > internal > > > > and > > > > > >>> > external Kafka cluster. You don't want to accidentally push > ACL > > > > > changes > > > > > >>> > across this boundary. > > > > > >>> > > > > > > >>> > Moreover, it's clear that MM2 "owns" downstream remote topics > > > like > > > > > >>> > "us-west.topic1" -- MM2 is the only producer and the only > admin > > > of > > > > > >>> these > > > > > >>> > topics -- so it's natural to have MM2 set the ACL for these > > > topics. > > > > > >>> But I > > > > > >>> > think it would be surprising if MM2 tried to manipulate > topics > > it > > > > > >>> doesn't > > > > > >>> > own. So I think granting permissions across DCs is probably > > > outside > > > > > >>> MM2's > > > > > >>> > purview, but I agree it'd be nice to have tooling to help > with > > > > this. > > > > > >>> > > > > > > >>> > Thanks. > > > > > >>> > Ryanne > > > > > >>> > > > > > > >>> > -- > > > > > >>> > www.ryannedolan.info > > > > > >>> > > > > > > >>> > > > > > > >>> > On Mon, Nov 19, 2018 at 3:58 PM daniel.loci...@gmail.com < > > > > > >>> > daniel.loci...@gmail.com> wrote: > > > > > >>> > > > > > > >>> >> Hi guys, > > > > > >>> >> > > > > > >>> >> This is an exciting topic. could I have a word here? > > > > > >>> >> I understand there are many scenarios that we can apply > > > > mirrormaker. > > > > > >>> I am > > > > > >>> >> at the moment working on active/active DC solution using > > > > > MirrorMaker; > > > > > >>> our > > > > > >>> >> goal is to allow the clients to failover to connect the > other > > > > kafka > > > > > >>> >> cluster (on the other DC) when an incident happens. 
> > > > > >>> >> > > > > > >>> >> To do this, I need: > > > > > >>> >> 1 MirrorMaker to replicate the partitioned messages in a > > > > sequential > > > > > >>> order > > > > > >>> >> (in timely fashion) to the same partition on the other > cluster > > > > (also > > > > > >>> need > > > > > >>> >> keep the promise that both clusters creates the same number > of > > > > > >>> partitions > > > > > >>> >> for a topic) – so that a consumer can pick up the right > order > > of > > > > the > > > > > >>> latest > > > > > >>> >> messages > > > > > >>> >> 2 MirrorMaker to replicate the local consumer offset to the > > > other > > > > > >>> side – > > > > > >>> >> so that the consumer knows where is the offset/ latest > > messages > > > > > >>> >> 3 MirrorMaker to provide cycle detection for messages across > > the > > > > > DCs. > > > > > >>> >> > > > > > >>> >> I can see the possibility for Remote Topic to solve all > these > > > > > >>> problems, > > > > > >>> >> as long as the consumer can see the remote topic equally as > > the > > > > > local > > > > > >>> >> topic, i.e. For a consumer which has a permission to consume > > > > topic1, > > > > > >>> on > > > > > >>> >> subscribe event it can automatically subscribe both > > > remote.topic1 > > > > > and > > > > > >>> >> local.topic1. First we need to find a way for topic ACL > > granting > > > > for > > > > > >>> the > > > > > >>> >> consumer across the DCs. Secondly the consumer need to be > able > > > to > > > > > >>> subscribe > > > > > >>> >> topics with wildcard or suffix. Last but not the least, the > > > > consumer > > > > > >>> has to > > > > > >>> >> deal with the timely ordering of the messages from the 2 > > topics. > > > > > >>> >> > > > > > >>> >> My understanding is, all of these should be configurable to > be > > > > > turned > > > > > >>> on > > > > > >>> >> or off, to fit for different use cases. 
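The wish to subscribe to a topic together with its remote counterparts can be sketched with a regular expression, since Kafka consumers accept a `java.util.regex.Pattern` when subscribing. This is an illustration only: the class `RemoteTopicPatternSketch` and the helper `localAndRemote` are hypothetical, and the sketch assumes remote topics are named `<cluster>.<topic>` (as in "us-west.topic1" elsewhere in this thread) with no dots in the cluster alias.

```java
import java.util.regex.Pattern;

class RemoteTopicPatternSketch {
    /**
     * Builds a pattern that matches the local topic and any single-prefix remote copy,
     * e.g. "topic1" and "us-west.topic1", but not "topic10" or "us-west.other".
     */
    static Pattern localAndRemote(String topic) {
        // Optional "<cluster>." prefix followed by the literal topic name.
        return Pattern.compile("(?:[^.]+\\.)?" + Pattern.quote(topic));
    }

    public static void main(String[] args) {
        Pattern p = localAndRemote("topic1");
        for (String name : new String[] {"topic1", "us-west.topic1", "topic10", "us-west.other"}) {
            System.out.println(name + " -> " + p.matcher(name).matches());
        }
    }
}
```

A pattern like this only solves the subscription part; ACLs and the relative ordering of records across the local and remote topics remain separate concerns, as the message above points out.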
> > > > > >>> >> > > > > > >>> >> Interesting I was going to support topic messages with extra > > > > headers > > > > > >>> of > > > > > >>> >> source DC info, for cycle detection….. > > > > > >>> >> > > > > > >>> >> Looking forward your reply. > > > > > >>> >> > > > > > >>> >> Regards, > > > > > >>> >> > > > > > >>> >> Dan > > > > > >>> >> On 2018/10/23 19:56:02, Ryanne Dolan <ryannedo...@gmail.com > > > > > > wrote: > > > > > >>> >> > Alex, thanks for the feedback. > > > > > >>> >> > > > > > > >>> >> > > Would it be possible to utilize the > > > > > >>> >> > > Message Headers feature to prevent infinite recursion > > > > > >>> >> > > > > > > >>> >> > This isn't necessary due to the topic renaming feature > which > > > > > already > > > > > >>> >> > prevents infinite recursion. > > > > > >>> >> > > > > > > >>> >> > If you turn off topic renaming you lose cycle detection, > so > > > > maybe > > > > > we > > > > > >>> >> could > > > > > >>> >> > provide message headers as an optional second mechanism. > I'm > > > not > > > > > >>> >> opposed to > > > > > >>> >> > that idea, but there are ways to improve efficiency if we > > > don't > > > > > >>> need to > > > > > >>> >> > modify or inspect individual records. > > > > > >>> >> > > > > > > >>> >> > Ryanne > > > > > >>> >> > > > > > > >>> >> > On Tue, Oct 23, 2018 at 6:06 AM Alex Mironov < > > > > > alexandr...@gmail.com > > > > > >>> > > > > > > >>> >> wrote: > > > > > >>> >> > > > > > > >>> >> > > Hey Ryanne, > > > > > >>> >> > > > > > > > >>> >> > > Awesome KIP, exited to see improvements in MirrorMaker > > > land, I > > > > > >>> >> particularly > > > > > >>> >> > > like the reuse of Connect framework! Would it be > possible > > to > > > > > >>> utilize > > > > > >>> >> the > > > > > >>> >> > > Message Headers feature to prevent infinite recursion? > For > > > > > >>> example, > > > > > >>> >> MM2 > > > > > >>> >> > > could stamp every message with a special header payload > > > (e.g. 
> > > > > >>> >> > > MM2="cluster-name-foo") so in case another MM2 instance > > sees > > > > > this > > > > > >>> >> message > > > > > >>> >> > > and it is configured to replicate data into > > > "cluster-name-foo" > > > > > it > > > > > >>> >> would > > > > > >>> >> > > just skip it instead of replicating it back. > > > > > >>> >> > > > > > > > >>> >> > > On Sat, Oct 20, 2018 at 5:48 AM Ryanne Dolan < > > > > > >>> ryannedo...@gmail.com> > > > > > >>> >> > > wrote: > > > > > >>> >> > > > > > > > >>> >> > > > Thanks Harsha. Done. > > > > > >>> >> > > > > > > > > >>> >> > > > On Fri, Oct 19, 2018 at 1:03 AM Harsha Chintalapani < > > > > > >>> >> ka...@harsha.io> > > > > > >>> >> > > > wrote: > > > > > >>> >> > > > > > > > > >>> >> > > > > Ryanne, > > > > > >>> >> > > > > Makes sense. Can you please add this under > > > rejected > > > > > >>> >> alternatives > > > > > >>> >> > > > so > > > > > >>> >> > > > > that everyone has context on why it wasn’t picked. > > > > > >>> >> > > > > > > > > > >>> >> > > > > Thanks, > > > > > >>> >> > > > > Harsha > > > > > >>> >> > > > > On Oct 18, 2018, 8:02 AM -0700, Ryanne Dolan < > > > > > >>> >> ryannedo...@gmail.com>, > > > > > >>> >> > > > > wrote: > > > > > >>> >> > > > > > > > > > >>> >> > > > > Harsha, concerning uReplicator specifically, the > > project > > > > is > > > > > a > > > > > >>> >> major > > > > > >>> >> > > > > inspiration for MM2, but I don't think it is a good > > > > > >>> foundation for > > > > > >>> >> > > > anything > > > > > >>> >> > > > > included in Apache Kafka. uReplicator uses Helix to > > > solve > > > > > >>> >> problems that > > > > > >>> >> > > > > Connect also solves, e.g. REST API, live > configuration > > > > > >>> changes, > > > > > >>> >> cluster > > > > > >>> >> > > > > management, coordination etc. 
> > > > > >>> >> > > > > This also means that existing tooling, dashboards etc. that work with Connectors do not work with uReplicator, and any future tooling would need to treat uReplicator as a special case.
> > > > > >>> >> > > > >
> > > > > >>> >> > > > > Ryanne
> > > > > >>> >> > > > >
> > > > > >>> >> > > > > On Wed, Oct 17, 2018 at 12:30 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
> > > > > >>> >> > > > >
> > > > > >>> >> > > > >> Harsha, yes I can do that. I'll update the KIP accordingly, thanks.
> > > > > >>> >> > > > >>
> > > > > >>> >> > > > >> Ryanne
> > > > > >>> >> > > > >>
> > > > > >>> >> > > > >> On Wed, Oct 17, 2018 at 12:18 PM Harsha <ka...@harsha.io> wrote:
> > > > > >>> >> > > > >>
> > > > > >>> >> > > > >>> Hi Ryanne,
> > > > > >>> >> > > > >>> Thanks for the KIP. I am also curious about why not use the uReplicator design as the foundation, given it already resolves some of the fundamental issues in the current MirrorMaker: updating the configs on the fly and running the mirror maker agents in a worker model which can be deployed in Mesos or container orchestrations. If possible, can you document in the rejected alternatives what the missing parts are that made you consider a new design from the ground up.
> > > > > >>> >> > > > >>>
> > > > > >>> >> > > > >>> Thanks,
> > > > > >>> >> > > > >>> Harsha
> > > > > >>> >> > > > >>>
> > > > > >>> >> > > > >>> On Wed, Oct 17, 2018, at 8:34 AM, Ryanne Dolan wrote:
> > > > > >>> >> > > > >>> > Jan, these are two separate issues.
> > > > > >>> >> > > > >>> >
> > > > > >>> >> > > > >>> > 1) consumer coordination should not, ideally, involve unreliable or slow connections. Naively, a KafkaSourceConnector would coordinate via the source cluster. We can do better than this, but I'm deferring this optimization for now.
> > > > > >>> >> > > > >>> >
> > > > > >>> >> > > > >>> > 2) exactly-once between two clusters is mind-bending. But keep in mind that transactions are managed by the producer, not the consumer. In fact, it's the producer that requests that offsets be committed for the current transaction. Obviously, these offsets are committed in whatever cluster the producer is sending to.
> > > > > >>> >> > > > >>> >
> > > > > >>> >> > > > >>> > These two issues are closely related. They are both resolved by not coordinating or committing via the source cluster. And in fact, this is the general model of SourceConnectors anyway, since most SourceConnectors _only_ have a destination cluster.
> > > > > >>> >> > > > >>> >
> > > > > >>> >> > > > >>> > If there is a lot of interest here, I can expound further on this aspect of MM2, but again I think this is premature until this first KIP is approved. I intend to address each of these in separate KIPs following this one.
> > > > > >>> >> > > > >>> >
> > > > > >>> >> > > > >>> > Ryanne
> > > > > >>> >> > > > >>> >
> > > > > >>> >> > > > >>> > On Wed, Oct 17, 2018 at 7:09 AM Jan Filipiak <jan.filip...@trivago.com> wrote:
> > > > > >>> >> > > > >>> >
> > > > > >>> >> > > > >>> > > This is not a performance optimisation. It's a fundamental design choice.
> > > > > >>> >> > > > >>> > >
> > > > > >>> >> > > > >>> > > I never really took a look at how Streams does exactly-once (it's a trap anyway, and you can usually deal with at-least-once downstream pretty easily). But I am very certain it's not going to get anywhere if the offset commit cluster and the record produce cluster are not the same.
> > > > > >>> >> > > > >>> > >
> > > > > >>> >> > > > >>> > > Pretty sure that without this _design choice_ you can skip exactly-once already.
> > > > > >>> >> > > > >>> > >
> > > > > >>> >> > > > >>> > > Best Jan
> > > > > >>> >> > > > >>> > >
> > > > > >>> >> > > > >>> > > On 16.10.2018 18:16, Ryanne Dolan wrote:
> > > > > >>> >> > > > >>> > > > > But one big obstacle in this was always that group coordination happened on the source cluster.
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > Jan, thank you for bringing up this issue with legacy MirrorMaker. I totally agree with you. This is one of several problems with MirrorMaker I intend to solve in MM2, and I already have a design and prototype that solves this and related issues. But as you pointed out, this KIP is already rather complex, and I want to focus on the core feature set rather than performance optimizations for now. If we can agree on what MM2 looks like, it will be very easy to agree to improve its performance and reliability.
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > That said, I look forward to your support on a subsequent KIP that addresses consumer coordination and rebalance issues. Stay tuned!
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > Ryanne
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > On Tue, Oct 16, 2018 at 6:58 AM Jan Filipiak <jan.filip...@trivago.com> wrote:
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > Hi,
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > Currently MirrorMaker is usually run collocated with the target cluster. This is all nice and good. But one big obstacle in this was always that group coordination happened on the source cluster. So when the network was congested, you sometimes lose group membership and have to rebalance and all this.
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > So one big request from us would be support for having coordination cluster != source cluster.
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > I would generally say a LAN is better than a WAN for doing group coordination, and there is no reason we couldn't have a group consuming topics from a different cluster and committing offsets to another one, right?
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > Other than that, it feels like the KIP has too many features, many of which are not really wanted and are counterproductive, but I will just wait and see how the discussion goes.
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > Best Jan
> > > > > >>> >> > > > >>> > > >
> > > > > >>> >> > > > >>> > > > On 15.10.2018 18:16, Ryanne Dolan wrote:
> > > > > >>> >> > > > >>> > > > > Hey y'all!
> > > > > >>> >> > > > >>> > > > >
> > > > > >>> >> > > > >>> > > > > Please take a look at KIP-382:
> > > > > >>> >> > > > >>> > > > >
> > > > > >>> >> > > > >>> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > > > > >>> >> > > > >>> > > > >
> > > > > >>> >> > > > >>> > > > > Thanks for your feedback and support.
> > > > > >>> >> > > > >>> > > > >
> > > > > >>> >> > > > >>> > > > > Ryanne
> > > > > >>> >> > >
> > > > > >>> >> > > --
> > > > > >>> >> > > Best,
> > > > > >>> >> > > Alex Mironov
> > > > > >>
> > > > > >> --
> > > > > >> Sönke Liebau
> > > > > >> Partner
> > > > > >> Tel. +49 179 7940878
> > > > > >> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
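The header-based cycle detection that Alex Mironov suggests earlier in this thread could be sketched roughly as follows. This is only an illustration in plain Python, not how MM2 works (per Ryanne's reply, MM2 relies on topic renaming instead). Record headers are modeled as a simple dict, the "MM2" header key is taken from Alex's example, and the function names `stamp` and `should_replicate` are hypothetical.

```python
from typing import Dict

# Header key borrowed from the example in Alex's message; the value is
# assumed to be the name of the cluster the record originally came from.
PROVENANCE_HEADER = "MM2"


def stamp(headers: Dict[str, str], source_cluster: str) -> Dict[str, str]:
    """Return a copy of the headers marked with the record's origin cluster.

    An existing stamp is left untouched, so a record that has already been
    replicated once keeps pointing at the cluster it first came from.
    """
    stamped = dict(headers)
    stamped.setdefault(PROVENANCE_HEADER, source_cluster)
    return stamped


def should_replicate(headers: Dict[str, str], target_cluster: str) -> bool:
    """Skip records whose stamp names the cluster we are about to write to."""
    return headers.get(PROVENANCE_HEADER) != target_cluster


# A record read from "cluster-name-foo" is stamped on its first hop.
record_headers = stamp({}, "cluster-name-foo")

# Replicating onwards to another cluster is allowed ...
print(should_replicate(record_headers, "cluster-name-bar"))
# ... but replicating back into the origin cluster is not.
print(should_replicate(record_headers, "cluster-name-foo"))
```

Under these assumptions, in a ring A -> B -> C -> A a record stamped at A keeps its stamp through B and C and is dropped before being written back to A. As Ryanne points out, the cost is that every record has to be inspected and modified, which topic renaming avoids.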