Please find my response inline.

On Mon, Jan 31, 2022 at 9:17 PM Michael Marshall <mmarsh...@apache.org>
wrote:

> I think this is a very appropriate direction to take Pulsar's
> geo-replication. Your proposal is essentially to make the
> inter-cluster configuration event driven. This increases fault
> tolerance and better decouples clusters.
>
> Thank you for your detailed proposal. After reading through it, I have
> some questions :)
>
> 1. What do you think about using protobuf to define the event
> protocol? I know we already have a topic policy event stream
> defined with Java POJOs, but since this feature is specifically
> designed for egressing cloud providers, ensuring compact data transfer
> would keep egress costs down. Additionally, protobuf can help make it
> clear that the schema is strict, should evolve thoughtfully, and
> should be designed to work between clusters of different versions.
>

>> I don't see a need for protobuf for this particular use case, for two
reasons: (a) policy changes don't generate heavy traffic (on the order of
1 rps), and (b) this path doesn't need performance optimization. It's
similar to storing a policy as text instead of protobuf: with a limited
number of update operations and relatively low complexity, it doesn't
impact footprint size or performance. I agree protobuf could be another
option, but it isn't needed here. A POJO can also support schema and
versioning.
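
For illustration, a versioned POJO event could look roughly like the sketch
below (the source cluster and update time come from the PIP; the other
field names are hypothetical):

    // Hypothetical sketch of a versioned metadata-change event POJO.
    public class MetadataChangeEvent {
        // Schema version, bumped on incompatible changes so that brokers
        // on different releases can decide how to interpret the payload.
        private int version = 1;

        // Cluster that originated the change, used to filter out events
        // that replicate back to their source.
        private String sourceCluster;

        // Millisecond timestamp of the policy update, used to drop stale
        // or duplicate events on the destination cluster.
        private long updatedAt;

        // Resource identity and serialized policy payload.
        private String resourceName;
        private byte[] policies;

        // Getters/setters omitted for brevity.
    }

A producer could publish such events with Pulsar's built-in JSON schema,
e.g. Schema.JSON(MetadataChangeEvent.class), which already provides basic
schema-compatibility checks between producer and consumer.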



>
> 2. In your view, which tenant/namespace will host
> `metadataSyncEventTopic`? Will there be several of these topics or is
> it just hosted in a system tenant/namespace? This question gets back
> to my questions about system topics on this mailing list last week [0]. I
> view this topic as a system topic, so we'd need to make sure that it
> has the right authorization rules and that it won't be affected by calls
> like "clearNamespaceBacklog".


>> It doesn't matter much whether it's a system topic or not, because the
topic is configurable: the system admin can decide where to host it and
configure it according to the required persistence policy. I would keep
this topic separate, because it serves a specific purpose with its own
schema, replication policy, and retention policy.
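
As a rough illustration (the tenant/namespace names are made up, and this
is only a sketch of one possible setup), the admin could dedicate a
namespace to the event topic and pin its replication and retention policies
through the admin API:

    import java.util.Set;
    import org.apache.pulsar.client.admin.PulsarAdmin;
    import org.apache.pulsar.common.policies.data.RetentionPolicies;

    public class MetadataSyncTopicSetup {
        public static void main(String[] args) throws Exception {
            // Hypothetical namespace; the real one is whatever the admin
            // configures metadataSyncEventTopic to live in.
            String namespace = "my-tenant/metadata-sync";

            try (PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://broker.example.com:8080")
                    .build()) {
                // Replicate the event topic's namespace to every cluster
                // that should receive policy updates.
                admin.namespaces().setNamespaceReplicationClusters(
                        namespace, Set.of("us-east", "eu-west"));

                // Retain events long enough for lagging clusters to catch
                // up (infinite here: -1/-1), per the operator's needs.
                admin.namespaces().setRetention(
                        namespace, new RetentionPolicies(-1, -1));
            }
        }
    }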



>
> 3. Which broker will host the metadata update publisher? I assume we
> want the producer to be collocated with the bundle that hosts the
> event topic. How will this be coordinated?
>
>> This is already explained in the PIP, in the section "Event publisher and
handler". Every isolated cluster deployed on a separate cloud platform will
have a source region and will be part of the replicated clusters for the
event topic. In the source region, brokers create a failover consumer on
that topic, and the broker holding the active consumer watches the metadata
changes and publishes them to the event topic.
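
Conceptually, the coordination could look like the sketch below: every
broker in the source region subscribes to the event topic with a Failover
subscription, and only the broker that Pulsar elects as the active consumer
publishes metadata changes (topic and class names are illustrative, not
taken from the prototype):

    import org.apache.pulsar.client.api.*;

    public class MetadataChangePublisherElection {
        public static void main(String[] args) throws Exception {
            // Illustrative topic name; the real one is whatever
            // metadataSyncEventTopic is configured to.
            String eventTopic = "persistent://my-tenant/metadata-sync/events";

            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://broker.example.com:6650")
                    .build();

            // Each broker in the source region creates a failover consumer
            // on the event topic. Pulsar elects exactly one active consumer;
            // the listener callbacks tell this broker whether it currently
            // owns the publisher role.
            client.newConsumer()
                    .topic(eventTopic)
                    .subscriptionName("metadata-change-publisher")
                    .subscriptionType(SubscriptionType.Failover)
                    .consumerEventListener(new ConsumerEventListener() {
                        @Override
                        public void becameActive(Consumer<?> consumer, int partitionId) {
                            // This broker owns the publisher role: start
                            // watching the local metadata store and produce
                            // change events to the event topic.
                        }

                        @Override
                        public void becameInactive(Consumer<?> consumer, int partitionId) {
                            // Another broker took over; stop publishing.
                        }
                    })
                    .subscribe();
        }
    }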



>
> 4. Why isn't a topic a `ResourceType`? Is this because the topic level
> policies already have this feature? If so, is there a way to integrate
> this feature with the existing topic policy feature?
>
>> Yes, ResourceType can be extended to cover topics as well.



>
> 5. By decentralizing the metadata store, it looks like there is a
> chance for conflicts due to concurrent updates. How do we handle those
> conflicts?
>
>> The PIP talks about this briefly, but I will update it with more
explanation. MetadataChangeEvent contains the source cluster and the update
time. The Tenant/Namespace resources will also carry a lastUpdatedTime,
which helps destination clusters discard stale or duplicate events and
handle race conditions. In addition, a snapshot-sync task runs as an extra
safeguard so that all clusters eventually converge with each other.
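
A minimal sketch of the kind of check a destination cluster could apply
when consuming an event is shown below; the event and policy types here are
hypothetical stand-ins for whatever fields the PIP's MetadataChangeEvent
and stored policies actually expose:

    /**
     * Illustrative last-writer-wins check on a destination cluster.
     */
    public class MetadataEventDeduplicator {

        // Hypothetical minimal views of the event and the stored policy.
        record ChangeEvent(String sourceCluster, long updatedAt) {}
        record StoredPolicy(long lastUpdatedTime) {}

        private final String localClusterName;

        public MetadataEventDeduplicator(String localClusterName) {
            this.localClusterName = localClusterName;
        }

        /** Returns true only if the event should be applied locally. */
        public boolean shouldApply(ChangeEvent event, StoredPolicy local) {
            // Ignore events that originated from this cluster itself
            // (they would otherwise loop back through replication).
            if (event.sourceCluster().equals(localClusterName)) {
                return false;
            }
            // Drop stale or duplicate events: apply only if the update is
            // newer than what is already stored. The periodic snapshot-sync
            // task covers anything that slips through.
            return event.updatedAt() > local.lastUpdatedTime();
        }
    }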



> I'll also note that I previously proposed a system event topic here
> [1] and it was proposed again here [2]. Those features were for
> different use cases, but ultimately looked very similar. In my view, a
> stream of system events is a very natural feature to expect in a
> streaming technology. I wonder if there is a way to generalize this
> feature to fulfill local cluster consumers and geo-replication
> consumers. Even if this PIP only implements the geo-replication
> portion of the feature, it'd be good to design it in an extensible fashion.
>
>> I think answer (2) addresses this concern as well.



> Thanks,
> Michael
>
> [0] https://lists.apache.org/thread/pj4n4wzm3do8nkc52l7g7obh0sktzm17
> [1] https://lists.apache.org/thread/h4cbvwjdomktsq2jo66x5qpvhdrqk871
> [2] https://lists.apache.org/thread/0xkg0gpsobp0dbgb6tp9xq097lpm65bx
>
>
>
> On Sun, Jan 30, 2022 at 10:33 PM Rajan Dhabalia <rdhaba...@apache.org>
> wrote:
> >
> > Hi,
> >
> > I would like to start a discussion about PIP-136: Sync Pulsar policies
> > across multiple clouds.
> >
> > PIP documentation: https://github.com/apache/pulsar/issues/13728
> >
> > *Motivation*
> > Apache Pulsar is a cloud-native, distributed messaging framework which
> > natively provides geo-replication. Many organizations deploy pulsar
> > instances on-prem and on multiple different cloud providers and at the
> same
> > time they would like to enable replication between multiple clusters
> > deployed in different cloud providers. Pulsar already provides various
> > proxy options (Pulsar proxy/ enterprise proxy solutions on SNI) to
> fulfill
> > security requirements when brokers are deployed on different security
> zones
> > connected with each other. However, sometimes it's not possible to share
> > metadata-store (global zookeeper) between pulsar clusters deployed on
> > separate cloud provider platforms, and synchronizing configuration
> metadata
> > (policies) can be a critical path to share tenant/namespace/topic
> policies
> > between clusters and administrate pulsar policies uniformly across all
> > clusters. Therefore, we need a mechanism to sync configuration metadata
> > between clusters deployed on the different cloud platforms.
> >
> > *Sync Pulsar policies across multiple clouds*
> > https://github.com/apache/pulsar/issues/13728
> > Prototype git-hub-link
> > <
> https://github.com/rdhabalia/pulsar/commit/e59803b942918076ce6376b50b35ca827a49bcf6
> >
> > Thanks,
> > Rajan
>
