Hi Rajan,

Thanks for the great proposal.

Will all the namespace policies be replicated to the remote cluster? The PIP title mentions policies, but the `MetadataChangeEvent` does not appear to define any namespace policies.

If the PIP does cover namespace policy replication, some policies do not need to be replicated to another cluster, for example the rate limiter and the max producers/consumers limits. PIP-92 (https://github.com/apache/pulsar/wiki/PIP-92%3A-Topic-policy-across-multiple-clusters) introduced a --global option so that a policy can be applied either globally or locally.

Does a new partitioned topic also need to be replicated to the remote cluster?

We already have a PulsarEvent struct that defines the Pulsar system events. It looks like we could use a unified event definition based on PulsarEvent.

Everything else looks good to me.

Regards,
Penghui

On Sat, Mar 19, 2022 at 1:32 AM Joe F <joefranc...@gmail.com> wrote:

> +1
>
> On Thu, Mar 17, 2022 at 12:07 PM Rajan Dhabalia <rdhaba...@apache.org>
> wrote:
>
> > Hi,
> >
> > I would like to start VOTE on PIP-136:
> > https://github.com/apache/pulsar/issues/13728
> >
> > Thanks,
> > Rajan
> >
> > On Tue, Feb 8, 2022 at 4:58 PM Rajan Dhabalia <dhabalia...@gmail.com>
> > wrote:
> >
> > >
> > > >> How do we designate the host broker? Is it manual? How does it work
> > > when the host broker is removed from the cluster?
> > > No, it will not be manual but as I explained earlier a broker which has a
> > > failover consumer to consume remote events will be the publisher for
> > > metadata update. If that broker is removed then a new failover
> > > consumer/broker will be selected for the same.
> > >
> > > >> I look forward to seeing more about this design for conflict
> > > resolution.
> > > Sure, I have updated PIP to handle such race condition:
> > > https://github.com/apache/pulsar/issues/13728
> > >
> > >
> > > >> (1) scenarios where the Pulsar cluster operators and tenant admins are
> > > different entities and tenants can be malicious, or more probably, write
> > > bad code that will produce malicious outcomes.
> > > I agree, Pulsar should have provision to prevent such scenarios where
> > > changes from one tenant in a cluster can impact other clusters. This PIP
> > > considers the tenant/admin will be the same at both the ends but that can
> > > not be true in all cases. We can add an enhancement later or we can create
> > > a separate PIP to start discussion with the possible solutions.
> > >
> > > Thanks,
> > > Rajan
> > >
> > > On Thu, Feb 3, 2022 at 9:59 AM Joe F <joefranc...@gmail.com> wrote:
> > >
> > >> >On my first reading, it wasn't clear if there was only one topic
> > >> required for this feature. I now see that the topic is not tied to a
> > >> specific tenant or namespace. As such, we can avoid complicated
> > >> authorization questions by putting the required event topic(s) into a
> > >> "system" tenant and namespace
> > >>
> > >> We should consider complicated questions. We can say why we chose not to
> > >> address it, or why it does not apply. for a particular situation
> > >>
> > >> Many namespace policies are administered by tenants. As such any tenant
> > >> can load this topic. Is it possible for one abusive tenant to make your
> > >> system topic dysfunctional?
> > >>
> > >> Pulsar committers should think about
> > >> (1) scenarios where the Pulsar cluster operators and tenant admins are
> > >> different entities and tenants can be malicious, or more probably, write
> > >> bad code that will produce malicious outcomes.
> > >> (2) whether the changes introduce additional SPOFs into the cluster.
> > >>
> > >> I don't think this PIP has those issues, but as a matter of practice, I
> > >> would like to see backend/system PIPs consider these questions and
> > >> explicitly state the conclusions with rationale
> > >>
> > >> Joe
> > >>
> > >>
> > >> On Wed, Feb 2, 2022 at 9:27 PM Michael Marshall <mmarsh...@apache.org>
> > >> wrote:
> > >>
> > >> > Thanks for your responses.
> > >> >
> > >> > > I don't see a need of protobuf for this particular usecase
> > >> >
> > >> > If no one else feels strongly on this point, I am good with using a
> > >> > POJO.
> > >> >
> > >> > > It doesn't matter if it's system-topic or not because it's
> > >> > > configurable and the admin of the system can decide and configure it
> > >> > > according to the required persistent policy.
> > >> >
> > >> > On my first reading, it wasn't clear if there was only one topic
> > >> > required for this feature. I now see that the topic is not tied to a
> > >> > specific tenant or namespace. As such, we can avoid complicated
> > >> > authorization questions by putting the required event topic(s) into a
> > >> > "system" tenant and namespace, by default. The `pulsar/system` tenant
> > >> > and namespace seem appropriate to me.
> > >> >
> > >> > > I would keep the system topic
> > >> > > separate because this topic serves a specific purpose with specific
> > >> > > schema, replication policy and retention policy.
> > >> >
> > >> > I think we need a more formal definition for system topics. This topic
> > >> > is exactly the kind of topic I would call a system topic: its intended
> > >> > producers and consumers are Pulsar components. However, because
> > >> > this feature can live on a topic in a system namespace, we can avoid
> > >> > the classification discussion for this PIP.
> > >> >
> > >> > > Source region will have a broker which will create a failover
> > >> > > consumer on that topic and a broker with an active consumer will
> > >> > > watch the metadata changes and publish the changes to the event
> > >> > > topic.
> > >> >
> > >> > How do we designate the host broker? Is it manual? How does it work
> > >> > when the host broker is removed from the cluster?
> > >> >
> > >> > If we collocate the active consumer with the broker hosting the event
> > >> > topic, can we skip creating the failover consumer?
> > >> >
> > >> > > PIP briefly talks about it but I will update the PIP with more
> > >> > > explanation.
> > >> >
> > >> > I look forward to seeing more about this design for conflict
> > >> > resolution.
> > >> >
> > >> > Thanks,
> > >> > Michael
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Feb 1, 2022 at 3:01 AM Rajan Dhabalia <dhabalia...@gmail.com>
> > >> > wrote:
> > >> > >
> > >> > > Please find my response inline.
> > >> > >
> > >> > > On Mon, Jan 31, 2022 at 9:17 PM Michael Marshall <mmarsh...@apache.org>
> > >> > > wrote:
> > >> > >
> > >> > > > I think this is a very appropriate direction to take Pulsar's
> > >> > > > geo-replication. Your proposal is essentially to make the
> > >> > > > inter-cluster configuration event driven. This increases fault
> > >> > > > tolerance and better decouples clusters.
> > >> > > >
> > >> > > > Thank you for your detailed proposal. After reading through it, I
> > >> > > > have some questions :)
> > >> > > >
> > >> > > > 1. What do you think about using protobuf to define the event
> > >> > > > protocol? I know we already have a topic policy event stream
> > >> > > > defined with Java POJOs, but since this feature is specifically
> > >> > > > designed for egressing cloud providers, ensuring compact data
> > >> > > > transfer would keep egress costs down.
> > >> > > > Additionally, protobuf can help make it
> > >> > > > clear that the schema is strict, should evolve thoughtfully, and
> > >> > > > should be designed to work between clusters of different versions.
> > >> > > >
> > >> > >
> > >> > > >>> I don't see a need of protobuf for this particular usecase
> > >> > > >> because of two reasons:
> > >> > > >> a. policy changes don't generate huge traffic which could be 1 rps
> > >> > > >> b. and it doesn't need performance optimization.
> > >> > > >> It should be similar as storing policy in text instead protobuf
> > >> > > >> which doesn't impact footprint size or performance due to limited
> > >> > > >> number of update operations and relatively less complexity. I agree
> > >> > > >> that protobuf could be another option but in this case it's not
> > >> > > >> needed. Also, POJO can also support schema and versioning.
> > >> > >
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > 2. In your view, which tenant/namespace will host
> > >> > > > `metadataSyncEventTopic`? Will there be several of these topics or
> > >> > > > is it just hosted in a system tenant/namespace? This question gets
> > >> > > > back to my questions about system topics on this mailing list last
> > >> > > > week [0]. I view this topic as a system topic, so we'd need to make
> > >> > > > sure that it has the right authorization rules and that it won't be
> > >> > > > affected by calls like "clearNamespaceBacklog".
> > >> > >
> > >> > >
> > >> > > >> It doesn't matter if it's system-topic or not because it's
> > >> > > >> configurable and the admin of the system can decide and configure it
> > >> > > >> according to the required persistent policy. I would keep the system
> > >> > > >> topic separate because this topic serves a specific purpose with
> > >> > > >> specific schema, replication policy and retention policy.
> > >> > >
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > 3. Which broker will host the metadata update publisher? I assume
> > >> > > > we want the producer to be collocated with the bundle that hosts
> > >> > > > the event topic. How will this be coordinated?
> > >> > > >
> > >> > > >> It's already explained into PIP in section: "Event publisher and
> > >> > > >> handler"
> > >> > > >> Every isolated cluster deployed on a separate cloud platform will
> > >> > > >> have a source region and part of replicated clusters for the event
> > >> > > >> topic. The Source region will have a broker which will create a
> > >> > > >> failover consumer on that topic and a broker with an active
> > >> > > >> consumer will watch the metadata changes and publish the changes
> > >> > > >> to the event topic.
> > >> > >
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > 4. Why isn't a topic a `ResourceType`? Is this because the topic
> > >> > > > level policies already have this feature? If so, is there a way to
> > >> > > > integrate this feature with the existing topic policy feature?
> > >> > > >
> > >> > > >> Yes, ResourceType can be extensible to a topic as well.
> > >> > >
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > 5. By decentralizing the metadata store, it looks like there is a
> > >> > > > chance for conflicts due to concurrent updates. How do we handle
> > >> > > > those conflicts?
> > >> > > >
> > >> > > >> PIP briefly talks about it but I will update the PIP with more
> > >> > > >> explanation. MetadataChangeEvent contains source-cluster and updated
> > >> > > >> time. Also, resources Tenant/Namespace will also contain
> > >> > > >> lastUpdatedTime which will help to destination clusters to handle
> > >> > > >> stale/duplicate events and race conditions. Also, snapshot-sync an
> > >> > > >> additional task helps all clusters to be synced with each other
> > >> > > >> eventually.
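
To make the stale/duplicate handling described in the answer to question 5 a bit more concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: the class `MetadataSyncSketch`, its `ChangeEvent` type, the field names, and the tie-break on the source cluster name are all hypothetical and are not taken from the PIP or the prototype. The only part that comes from the thread is the idea that an event carries a source cluster and an updated time, and that a destination cluster compares these against what it last applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the PIP-136 prototype.
final class MetadataSyncSketch {

    /** Minimal stand-in for the change event described in the thread. */
    static final class ChangeEvent {
        final String resourceName;   // e.g. "my-tenant/my-namespace"
        final String sourceCluster;  // cluster that produced the update
        final long updatedTime;      // timestamp set by the source cluster
        final String payload;        // serialized policies, opaque here

        ChangeEvent(String resourceName, String sourceCluster, long updatedTime, String payload) {
            this.resourceName = resourceName;
            this.sourceCluster = sourceCluster;
            this.updatedTime = updatedTime;
            this.payload = payload;
        }
    }

    // Last event applied locally, keyed by resource name.
    private final Map<String, ChangeEvent> applied = new HashMap<>();

    /**
     * Applies the event only if it is newer than what this cluster holds.
     * Returns false for stale or duplicate events, which are dropped.
     */
    boolean apply(ChangeEvent incoming) {
        ChangeEvent current = applied.get(incoming.resourceName);
        if (current != null && !isNewer(incoming, current)) {
            return false;
        }
        applied.put(incoming.resourceName, incoming);
        return true;
    }

    // Assumption: equal timestamps are ordered by source cluster name so that
    // every cluster picks the same winner. The thread does not specify this.
    private static boolean isNewer(ChangeEvent a, ChangeEvent b) {
        if (a.updatedTime != b.updatedTime) {
            return a.updatedTime > b.updatedTime;
        }
        return a.sourceCluster.compareTo(b.sourceCluster) > 0;
    }

    /** Returns the payload last applied for the resource, or null if none. */
    String currentPayload(String resourceName) {
        ChangeEvent e = applied.get(resourceName);
        return e == null ? null : e.payload;
    }
}
```

With a check like this, replaying the same event or receiving an older one from another cluster leaves the local state unchanged. Clock skew between clusters is not addressed here, which is presumably where the snapshot-sync task mentioned above would help.
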
> > >> > >
> > >> > >
> > >> > >
> > >> > > > I'll also note that I previously proposed a system event topic here
> > >> > > > [1] and it was proposed again here [2]. Those features were for
> > >> > > > different use cases, but ultimately looked very similar. In my
> > >> > > > view, a stream of system events is a very natural feature to expect
> > >> > > > in a streaming technology. I wonder if there is a way to generalize
> > >> > > > this feature to fulfill local cluster consumers and geo-replication
> > >> > > > consumers. Even if this PIP only implements the geo-replication
> > >> > > > portion of the feature, it'd be good to design it in an extensible
> > >> > > > fashion.
> > >> > > >
> > >> > > >> I think answer (2) addresses this concern as well.
> > >> > >
> > >> > >
> > >> > >
> > >> > > > Thanks,
> > >> > > > Michael
> > >> > > >
> > >> > > > [0] https://lists.apache.org/thread/pj4n4wzm3do8nkc52l7g7obh0sktzm17
> > >> > > > [1] https://lists.apache.org/thread/h4cbvwjdomktsq2jo66x5qpvhdrqk871
> > >> > > > [2] https://lists.apache.org/thread/0xkg0gpsobp0dbgb6tp9xq097lpm65bx
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Sun, Jan 30, 2022 at 10:33 PM Rajan Dhabalia <rdhaba...@apache.org>
> > >> > > > wrote:
> > >> > > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > I would like to start a discussion about PIP-136: Sync Pulsar
> > >> > > > > policies across multiple clouds.
> > >> > > > >
> > >> > > > > PIP documentation: https://github.com/apache/pulsar/issues/13728
> > >> > > > >
> > >> > > > > *Motivation*
> > >> > > > > Apache Pulsar is a cloud-native, distributed messaging framework
> > >> > > > > which natively provides geo-replication.
> > >> > > > > Many organizations deploy pulsar
> > >> > > > > instances on-prem and on multiple different cloud providers and at
> > >> > > > > the same time they would like to enable replication between multiple
> > >> > > > > clusters deployed in different cloud providers. Pulsar already
> > >> > > > > provides various proxy options (Pulsar proxy/ enterprise proxy
> > >> > > > > solutions on SNI) to fulfill security requirements when brokers are
> > >> > > > > deployed on different security zones connected with each other.
> > >> > > > > However, sometimes it's not possible to share metadata-store (global
> > >> > > > > zookeeper) between pulsar clusters deployed on separate cloud
> > >> > > > > provider platforms, and synchronizing configuration metadata
> > >> > > > > (policies) can be a critical path to share tenant/namespace/topic
> > >> > > > > policies between clusters and administrate pulsar policies uniformly
> > >> > > > > across all clusters. Therefore, we need a mechanism to sync
> > >> > > > > configuration metadata between clusters deployed on the different
> > >> > > > > cloud platforms.
> > >> > > > >
> > >> > > > > *Sync Pulsar policies across multiple clouds*
> > >> > > > > https://github.com/apache/pulsar/issues/13728
> > >> > > > > Prototype git-hub-link
> > >> > > > > <https://github.com/rdhabalia/pulsar/commit/e59803b942918076ce6376b50b35ca827a49bcf6>
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Rajan