Fwd: [Discuss] Pulsar Release Notes Architecture

2022-01-11 Thread Yu
Hi Pulsarers,

To move [PIP 112: Generate Release Notes Automatically] [1] forward, we need to
define the Pulsar Release Notes Architecture, which includes only major
changes (important features/enhancements/bug fixes) in list form, rather
than a raw dump of PRs.

The draft is available at [2]; feel free to comment within 72 hours. If there
are no further suggestions, we'll implement the design as described in the
draft. Thanks!

[1]
https://docs.google.com/document/d/1Ul2qIChDe8QDlDwJBICq1VviYZhdk1djKJJC5wXAGsI/edit
[2]
https://docs.google.com/document/d/1tptgyTPD9AUrKBYS09yVeqm9VfTffy6oGkd9KvlMMPg/edit#

BR,
Anonymitaet


Re: [DISCUSSION] PIP-124: Create init subscription before sending message to DLQ

2022-01-11 Thread Matteo Merli
> If we want to hold that the DLQ is not a normal topic, then I can see
> why we would have a DLQ specific feature here.

I think that, good or bad, the impression users have that the DLQ is not a
"normal" topic comes from 2 factors:
 1. The experience with traditional messaging systems (JMS and others),
where DLQs are handled in slightly different ways compared to
other topics.
 2. The name "DLQ", which in a way implies a "queue"... which can
be implemented on a topic, using a subscription.


Re: [DISCUSSION] PIP-124: Create init subscription before sending message to DLQ

2022-01-11 Thread Michael Marshall
> It looks like a feature that supports retaining data while there are no subscriptions.

Yes, that is my proposed feature. How we handle messages on a topic
with an empty set of subscriptions is a design decision.

Note that when there are no subscriptions for a topic, the following two
statements are both vacuously true (in a set-theoretic sense):

1. All messages are acknowledged for every subscription.
2. No messages are acknowledged for any subscription.

Pulsar's current design only uses option 1. I propose that we make it
possible to use option 2. (Option 2 would solve the DLQ concerns here.)

> so it only guarantees that the first
> subscription can retrieve all the data

Yes, that is true. However, it is also true in this DLQ PIP, since the
current design only creates a single subscription. I think the important
nuance is that we're deciding how to handle a topic with no
subscriptions.

> they should create the topic and subscription manually or use the
> consumer to trigger the topic auto-creation, not the producer.

When producers create arbitrary topics, this design forces
the producer to create subscriptions, which is the same design as in
this PIP. I think we should avoid having producers create subscriptions.

> It is not easy to determine consumer behavior on the producer side. But a
> DLQ is not a normal topic from the user's point of view

If we want to hold that the DLQ is not a normal topic, then I can see
why we would have a DLQ specific feature here.

Thanks,
Michael
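
As context for the 4-hour retention workaround described in the quoted thread
below, here is a minimal sketch of one way to do it, using the namespace-level
retention admin API; the namespace name, admin endpoint, and values are
illustrative only and are not taken from the thread.

```
// Sketch of the workaround discussed below: retain data for a few hours at the
// namespace level so messages on auto-created output topics survive until
// subscriptions exist. Namespace, URL, and values are illustrative.
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class RetentionWorkaroundSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
                .build()) {
            // Retain messages for 240 minutes (4 hours); -1 means no size limit.
            admin.namespaces().setRetention("public/outputs",
                    new RetentionPolicies(240, -1));
        }
    }
}
```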



On Sun, Jan 9, 2022 at 10:20 PM PengHui Li  wrote:
>
> > I think we should consider adding a new policy for Pulsar topics: a
> > namespace (or topic) policy that makes it possible to retain messages
> > indefinitely when a topic has no subscriptions.
>
> It looks like a feature that supports retaining data while there are no
> subscriptions. With infinite data retention, the data will not be removed
> after all the subscriptions have acked the messages. But with
> "retain_data_no_subscriptions", the data will be removed after all the
> subscriptions have acked the messages. Subsequent subscriptions still can't
> retrieve all the data, so it only guarantees that the first subscription can
> retrieve all the data. If users want that guarantee for all subscriptions
> (both existing ones and ones created later), that is equivalent to infinite
> data retention.
>
> For an auto-created topic, the subscription can only be determined at the
> time of creation, and it may or may not be created. If users know which
> consumers there are, and these consumers need to receive every message sent
> by the producer, they should create the topic and subscription manually, or
> use the consumer to trigger the topic auto-creation, not the producer.
>
> It is not easy to determine consumer behavior on the producer side. But a
> DLQ is not a normal topic from the user's point of view; it's a local
> container for a subscription to store the messages that the consumer can't
> process. It's a case of "consumers determining consumer behavior", and I
> think this is the most essential difference.
>
> Regards,
> Penghui
>
> On Sat, Jan 8, 2022 at 12:34 PM Michael Marshall 
> wrote:
>
> > Thanks for your response, Penghui.
> >
> > I support simplifying message loss prevention for DLQ topics. However,
> > it's not clear to me why we should only simplify it for DLQ topics.
> >
> > As a Pulsar user, I encountered many of the challenges you mention
> > when producing to auto created topics. In my architecture, I had
> > consumers reading from an input topic, transforming the data, and then
> > producing to an arbitrary number of output topics. My business logic
> > required that I not lose any messages, which is essentially the same
> > expectation from DLQ users here. I ended up increasing the retention
> > policy to about 4 hours on the output topics to minimize the possibility
> > of losing data. I had to scale up my bookkeeper cluster because of the
> > extra retention. If I had been able to ensure my auto created topic
> > would not delete messages before I created my subscriptions, I would
> > have had no retention policy and a smaller bookie cluster.
> >
> > > Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> > > But the issue that the proposal wants to resolve is not to introduce a
> > > specific behavior for the DLQ topic or something
> >
> > I'm not sure this statement aligns with the PIP. It seems to me that
> > the PIP proposes solving the message loss issues by adding a DLQ
> > specific feature to the pulsar client.
> >
> > Earlier, I proposed expanding the CreateProducer command to be able to
> > create a subscription. This solution is not right: it tightly couples
> > producers and consumers, which we want to avoid.
> >
> > I think we should consider adding a new policy for Pulsar topics: a
> > namespace (or topic) policy that makes it possible to retain messages
> > indefinitely when a topic has no subscriptions.
> >
> > Our 

Re: [VOTE] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder

2022-01-11 Thread Michael Marshall
+1 - assuming we ensure that the `ThresholdShedder` has unit test coverage.

Thanks,
Michael


On Tue, Jan 11, 2022 at 9:53 PM r...@apache.org  
wrote:
>
> +1 (non-binding)
>
> --
>
> Thanks
> Xiaolong Ran
>
Haiting Jiang  wrote on Wed, Jan 12, 2022 at 09:52:
>
> > +1
> >
> > On 2022/01/10 06:47:44 Hang Chen wrote:
> > > This is the voting thread for PIP-122. It will stay open for at least 48
> > > hours.
> > >
> > > https://github.com/apache/pulsar/issues/13340
> > >
> > > Pasted below for quoting convenience.
> > >
> > > 
> > >
> > > ### Motivation
> > > The ThresholdShedder load balance policy has been available since Pulsar
> > > 2.6.0, introduced by https://github.com/apache/pulsar/pull/6772. It can
> > > resolve many load balance issues of `OverloadShedder` and works well in
> > > many Pulsar production clusters.
> > >
> > > In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance
> > > policy is `OverloadShedder`.
> > >
> > > I think 2.10 is a good time to change the default load balance policy
> > > to `ThresholdShedder`; it will make throughput more balanced across
> > > brokers.
> > >
> > > ### Proposed Changes
> > > In the 2.10 release, for `broker.conf`, change
> > > `loadBalancerLoadSheddingStrategy` from
> > > `org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to
> > > `org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
> > >
> >
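
For reference, the proposed default change quoted above amounts to the
following line in `broker.conf` (the key and class names are taken from the
proposal itself):

```
# broker.conf: proposed 2.10 default load shedding strategy (PIP-122)
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder
```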


Re: [VOTE] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder

2022-01-11 Thread r...@apache.org
+1 (non-binding)

--

Thanks
Xiaolong Ran

Haiting Jiang  wrote on Wed, Jan 12, 2022 at 09:52:

> +1
>
> On 2022/01/10 06:47:44 Hang Chen wrote:
> > This is the voting thread for PIP-122. It will stay open for at least 48
> > hours.
> >
> > https://github.com/apache/pulsar/issues/13340
> >
> > Pasted below for quoting convenience.
> >
> > 
> >
> > ### Motivation
> > The ThresholdShedder load balance policy has been available since Pulsar
> > 2.6.0, introduced by https://github.com/apache/pulsar/pull/6772. It can
> > resolve many load balance issues of `OverloadShedder` and works well in
> > many Pulsar production clusters.
> >
> > In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance
> > policy is `OverloadShedder`.
> >
> > I think 2.10 is a good time to change the default load balance policy
> > to `ThresholdShedder`; it will make throughput more balanced across
> > brokers.
> >
> > ### Proposed Changes
> > In the 2.10 release, for `broker.conf`, change
> > `loadBalancerLoadSheddingStrategy` from
> > `org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to
> > `org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
> >
>


Re: [VOTE] PIP-121: Pulsar cluster level auto failover on client side

2022-01-11 Thread r...@apache.org
+1 (non-binding)

--
Thanks
Xiaolong Ran

Zike Yang  wrote on Wed, Jan 12, 2022 at 09:58:

> +1 (non-binding)
>
> On Wed, Jan 12, 2022 at 9:52 AM Haiting Jiang 
> wrote:
> >
> > +1
> >
> > On 2022/01/12 00:09:26 Matteo Merli wrote:
> > > +1
> > > --
> > > Matteo Merli
> > > 
> > >
> > > On Tue, Jan 11, 2022 at 12:07 PM Neng Lu  wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > On Mon, Jan 10, 2022 at 12:40 AM PengHui Li 
> wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Penghui
> > > > >
> > > > > On Mon, Jan 10, 2022 at 4:38 PM Enrico Olivelli <
> eolive...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Enrico
> > > > > >
> > > > > > On Mon, Jan 10, 2022 at 07:45 Hang Chen
> > > > > >  wrote:
> > > > > > >
> > > > > > > This is the voting thread for PIP-121. It will stay open for
> at least
> > > > > 48
> > > > > > > hours.
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/issues/13315
> > > > > > >
> > > > > > > Pasted below for quoting convenience.
> > > > > > >
> > > > > > > -
> > > > > > > ### Motivation
> > > > > > > We have geo-replication to support Pulsar cluster level
> failover. We
> > > > > > > can set up Pulsar cluster A as a primary cluster in data
> center A, and
> > > > > > > setup Pulsar cluster B as backup cluster in data center B.
> Then we
> > > > > > > configure geo-replication between cluster A and cluster B. All
> the
> > > > > > > clients are connected to the Pulsar cluster by DNS. If cluster
> A is
> > > > > > > down, we should switch the DNS to point the target Pulsar
> cluster from
> > > > > > > cluster A to cluster B. After the clients are resolved to
> cluster B,
> > > > > > > they can produce and consume messages normally. After cluster A
> > > > > > > recovers, the administrator should switch the DNS back to
> cluster A.
> > > > > > >
> > > > > > > However, the current method has two shortcomings.
> > > > > > > 1. The administrator should monitor the status of all Pulsar
> clusters,
> > > > > > > and switch the DNS as soon as possible when cluster A is down.
> The
> > > > > > > switch and recovery is not automatic and recovery time is
> controlled
> > > > > > > by the administrator, which will put the administrator under
> heavy
> > > > > > > load.
> > > > > > > 2. The Pulsar client and DNS system have a cache. When the
> > > > > > > administrator switches the DNS from cluster A to cluster B, it will
> > > > > > > take some time for the cache to time out, which will delay client
> > > > > > > recovery time and lead to produce/consume message failures.
> > > > > > >
> > > > > > > ### Goal
> > > > > > > It's better to provide an automatic cluster level failure
> recovery
> > > > > > > mechanism to make pulsar cluster failover more effective. We
> should
> > > > > > > support pulsar clients auto switching from cluster A to
> cluster B when
> > > > > > > it detects cluster A has been down according to the configured
> > > > > > > detecting policy and switch back to cluster A when it has
> recovered.
> > > > > > > The reason why we should switch back to cluster A is that most
> > > > > > > applications may be deployed in data center A and they have low
> > > > > > > network cost for communicating with pulsar cluster A. If they
> keep
> > > > > > > visiting pulsar cluster B, they have high network cost, and
> cause high
> > > > > > > produce/consume latency.
> > > > > > >
> > > > > > > In order to improve the DNS cache problem, we should provide an
> > > > > > > administrator controlled switch provider for administrators to
> update
> > > > > > > service URLs.
> > > > > > >
> > > > > > > In the end, we should provide an auto service URL switch
> provider and
> > > > > > > administrator controlled switch provider.
> > > > > > >
> > > > > > > ### Design
> > > > > > > We have already provided the `ServiceUrlProvider` interface to
> support
> > > > > > > different service URLs. In order to support automatic cluster
> level
> > > > > > > failure auto recovery, we can provide different
> ServiceUrlProvider
> > > > > > > implementations. For current requirements, we can provide
> > > > > > > `AutoClusterFailover` and `ControlledClusterFailover`.
> > > > > > >
> > > > > > >  AutoClusterFailover
> > > > > > > In order to support auto switching from the primary cluster to
> the
> > > > > > > secondary, we can provide a probe task, which will probe the
> activity
> > > > > > > of the primary cluster and the secondary one. When it finds the
> > > > > > > primary cluster failed more than `failoverDelayMs`, it will
> switch to
> > > > > > > the secondary cluster by calling `updateServiceUrl`. After
> switching
> > > > > > > to the secondary cluster, the `AutoClusterFailover` will
> continue to
> > > > > > > probe the primary cluster. If the primary cluster comes back
> and
> > > > > > > remains active for `switchBackDelayMs`, it will switch back to
> the
> > > > > > > primary cluster.
> > > > > > > The APIs are 
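
As a rough illustration of the mechanism the design above builds on (the
existing `ServiceUrlProvider` interface plus `PulsarClient#updateServiceUrl`,
both mentioned in the proposal), here is a greatly simplified sketch. It is not
the proposed `AutoClusterFailover` implementation; the class name, probe
handling, and URLs are illustrative only.

```
// Greatly simplified sketch of the mechanism PIP-121 builds on: a custom
// ServiceUrlProvider that can switch the client between a primary and a
// secondary cluster URL via PulsarClient#updateServiceUrl. The real
// AutoClusterFailover proposed in the PIP adds health probing and the
// failoverDelayMs / switchBackDelayMs policy; all names here are illustrative.
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.ServiceUrlProvider;

public class SimpleFailoverProvider implements ServiceUrlProvider {
    private final String primaryUrl;
    private final String secondaryUrl;
    private volatile PulsarClient client;

    public SimpleFailoverProvider(String primaryUrl, String secondaryUrl) {
        this.primaryUrl = primaryUrl;
        this.secondaryUrl = secondaryUrl;
    }

    @Override
    public void initialize(PulsarClient client) {
        this.client = client;
        // A real implementation would start a probe task here and call
        // switchTo(secondaryUrl) / switchTo(primaryUrl) based on probe results.
    }

    @Override
    public String getServiceUrl() {
        return primaryUrl; // URL used when the client is first created
    }

    // Called by the (omitted) probe task when a switch is needed.
    void switchTo(String serviceUrl) throws PulsarClientException {
        client.updateServiceUrl(serviceUrl);
    }

    public static void main(String[] args) throws Exception {
        ServiceUrlProvider provider = new SimpleFailoverProvider(
                "pulsar://cluster-a:6650", "pulsar://cluster-b:6650");
        PulsarClient client = PulsarClient.builder()
                .serviceUrlProvider(provider)
                .build();
        // ... produce / consume as usual; the provider can later redirect the client ...
        client.close();
    }
}
```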

Re: [VOTE] PIP-117: Change Pulsar standalone defaults

2022-01-11 Thread r...@apache.org
+1 (non-binding)

--
Thanks
Xiaolong Ran

Neng Lu  wrote on Wed, Jan 12, 2022 at 04:45:

> +1 (non-binding)
>
> On Wed, Jan 5, 2022 at 7:19 AM Lan Liang  wrote:
>
> > +1
> >
> >
> >
> >
> >
> >
> > Best Regards,
> > Lan Liang
> > On 12/23/2021 19:21,Haiting Jiang wrote:
> > +1
> >
> > Thanks,
> > Haiting
> >
> > On 2021/12/23 05:35:03 Michael Marshall wrote:
> > +1
> >
> > - Michael
> >
> > On Wed, Dec 22, 2021 at 6:18 PM Sijie Guo  wrote:
> >
> > +1
> >
> > On Tue, Dec 21, 2021 at 3:49 PM Matteo Merli  wrote:
> >
> > This is the voting thread for PIP-117. It will stay open for at least
> 48h.
> >
> > https://github.com/apache/pulsar/issues/13302
> >
> > 
> >
> > ## Motivation
> >
> > Pulsar standalone is the "Pulsar in a box" version of a Pulsar cluster,
> > where
> > all the components are started within the context of a single JVM
> process.
> >
> > Users are using the standalone as a way to get quickly started with
> Pulsar
> > or
> > in all the cases where it makes sense to have a single node deployment.
> >
> > Right now, the standalone starts by default with many components, several
> > of which are quite complex, since they are designed to be deployed in a
> > distributed fashion.
> >
> > ## Goal
> >
> > Simplify the components of Pulsar standalone to achieve:
> >
> > 1. Reduce complexity
> > 2. Reduce startup time
> > 3. Reduce memory and CPU footprint of running standalone
> >
> > ## Proposed changes
> >
> > The proposal here is to change some of the default implementations that
> are
> > used for the Pulsar standalone.
> >
> > 1. **Metadata Store implementation** -->
> > Change from ZooKeeper to RocksDB
> >
> > 2. **Pulsar functions package backend** -->
> > Change from using DistributedLog to using local filesystem, storing
> > the
> > jars directly in the data folder instead of uploading them into BK.
> >
> > 3. **Pulsar functions state store implementation** -->
> > Change the state store to be backed by a MetadataStore-based backend,
> > with the RocksDB implementation.
> >
> > 4. **Table Service** -->
> > Do not start BK table service by default
> >
> > ## Compatibility considerations
> >
> > In order to avoid compatibility issues where users have existing Pulsar
> > standalone services that they want to upgrade without conflicts, we will
> > follow the principle of keeping the old defaults where there is existing
> > data on the disk.
> >
> > We will add a file, serving as a flag, in the `data/standalone`
> > directory, for example `new-2.10-defaults`.
> >
> > If the file is present, or if the data directory is completely missing,
> we
> > will adopt the new set of default configuration settings.
> >
> > If the file is not there, we will continue to use existing defaults and
> we
> > will
> > not break the upgrade operation.
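
As a rough sketch of the compatibility rule quoted above (new defaults only
when the `new-2.10-defaults` flag file exists or there is no existing
standalone data directory), the class and method below are illustrative, not
the actual standalone startup code:

```
// Illustrative sketch of the compatibility rule above; not actual Pulsar code.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StandaloneDefaultsCheck {
    // Use the new 2.10 standalone defaults only if the flag file is present,
    // or if there is no existing standalone data directory at all.
    static boolean useNew210Defaults(String dataDir) {
        Path standaloneData = Paths.get(dataDir, "standalone");
        Path flagFile = standaloneData.resolve("new-2.10-defaults");
        return Files.exists(flagFile) || !Files.exists(standaloneData);
    }

    public static void main(String[] args) {
        System.out.println("Use new defaults: " + useNew210Defaults("data"));
    }
}
```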
> >
> >
> > --
> > Matteo Merli
> > 
> >
> >
> >
>
> --
> Best Regards,
> Neng
>


Re: [VOTE] PIP-132: Include message header size when check maxMessageSize of non-batch message on the client side.

2022-01-11 Thread r...@apache.org
+1 (non-binding)

--
Thanks
Xiaolong Ran

mattison chao  wrote on Wed, Jan 12, 2022 at 10:15:

> +1  (non-binding)
>
> On Wed, 12 Jan 2022 at 09:59, Zike Yang 
> wrote:
>
> > +1 (non-binding)
> >
> > On Wed, Jan 12, 2022 at 9:58 AM Haiting Jiang 
> > wrote:
> > >
> > > This is the voting thread for PIP-132. It will stay open for at least
> 48
> > hours.
> > >
> > > https://github.com/apache/pulsar/issues/13591
> > >
> > > Pasted below for quoting convenience.
> > >
> > > 
> > >
> > > ## Motivation
> > >
> > > Currently, Pulsar client (Java) only checks payload size for max
> message
> > size validation.
> > >
> > > Client throws TimeoutException if we produce a message with too many
> > properties, see [1].
> > > But the root cause is that is trigged TooLongFrameException in broker
> > server.
> > >
> > > In this PIP, I propose to include message header size when check
> > maxMessageSize of non-batch
> > > messages, this brings the following benefits.
> > > 1. Clients can throw InvalidMessageException immediately if properties
> > takes too much storage space.
> > > 2. This will make the behaviour consistent with topic level max message
> > size check in broker.
> > > 3. Strictly limit the entry size less than maxMessageSize, avoid
> sending
> > message to bookkeeper failed.
> > >
> > > ## Goal
> > >
> > > Include message header size when check maxMessageSize for non-batch
> > message on the client side.
> > >
> > > ## Implementation
> > >
> > > ```
> > > // Add a size check in
> > org.apache.pulsar.client.impl.ProducerImpl#processOpSendMsg
> > > if (op.msg != null // for non-batch messages only
> > > && op.getMessageHeaderAndPayloadSize() >
> ClientCnx.getMaxMessageSize()) {
> > > // finish send op with InvalidMessageException
> > > releaseSemaphoreForSendOp(op);
> > > op.sendComplete(new PulsarClientException(new InvalidMessageException,
> > op.sequenceId));
> > > }
> > >
> > >
> > > //
> >
> org.apache.pulsar.client.impl.ProducerImpl.OpSendMsg#getMessageHeaderAndPayloadSize
> > >
> > > public int getMessageHeaderAndPayloadSize() {
> > > ByteBuf cmdHeader = cmd.getFirst();
> > > cmdHeader.markReaderIndex();
> > > int totalSize = cmdHeader.readInt();
> > > int cmdSize = cmdHeader.readInt();
> > > int msgHeadersAndPayloadSize = totalSize - cmdSize - 4;
> > > cmdHeader.resetReaderIndex();
> > > return msgHeadersAndPayloadSize;
> > > }
> > > ```
> > >
> > > ## Reject Alternatives
> > > Add a new property like "maxPropertiesSize" or "maxHeaderSize" in
> > broker.conf and pass it to
> > > client like maxMessageSize. But the implementation is much more
> complex,
> > and don't have the
> > > benefit 2 and 3 mentioned in Motivation.
> > >
> > > ## Compatibility Issue
> > > As a matter of fact, this PIP narrows down the sendable range.
> > Previously, when maxMessageSize
> > > is 1KB, it's ok to send message with 1KB properties and 1KB payload.
> But
> > with this PIP, the
> > > sending will fail with InvalidMessageException.
> > >
> > > One conservative way is to add a boolean config
> > "includeHeaderInSizeCheck" to enable this
> > > feature. But I think it's OK to enable this directly as it's more
> > reasonable, and I don't see good
> > > migration plan if we add a config for this.
> > >
> > > [1] https://github.com/apache/pulsar/issues/13560
> > >
> > > Thanks,
> > > Haiting Jiang
> >
> >
> >
> > --
> > Zike Yang
> >
>


[Discuss] Create new issues to SDKs in different languages

2022-01-11 Thread r...@apache.org
Hello everyone:

At present, all our PIPs and related feature changes land mainly in the Java
language, and all new features are merged into the Java SDK first. For the
SDKs in other languages this is completely a black box: their maintainers
don't know what changes or optimizations have been made on the Java SDK side.

The most typical problem is that when users of the other-language SDKs hit
issues and the maintainers of those SDKs want to fix them, we do not know that
the Java SDK side has already made the relevant changes. So the current
approach is to constantly check where the gap with the Java SDK is, which puts
a great burden on the maintainers themselves.

So here is an idea: the committers/PMC members responsible for reviewing the
Java SDK could help evaluate whether a PIP or new change needs to be supported
in the other languages, and then create the corresponding issue in the
corresponding SDK repository. That makes it easy for the maintainers of the
other-language SDKs to evaluate the priority of the feature, and it can also
attract contributors who are good at certain languages to claim the issue and
contribute the feature.

--
Thanks
Xiaolong Ran


Re: [DISCUSSION] PIP-129: Introduce intermediate state for ledger deletion

2022-01-11 Thread Hang Chen
+1 Good Job. Looking forward to this feature.

Best,
Hang

PengHui Li  wrote on Wed, Jan 12, 2022 at 10:54:
>
> +1 It's a nice approach for making sure the ledger can be deleted correctly.
>
> Regards,
> Penghui
>
> On Wed, Jan 12, 2022 at 10:23 AM Zhanpeng Wu 
> wrote:
>
> > https://github.com/apache/pulsar/issues/13526
> >
> > 
> >
> > ## Motivation
> >
> > Under the current ledger-trimming design in
> > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`,
> > we need to collect those ledgers that need to be deleted first, and then
> > perform the asynchronous deletion of the ledger concurrently, but we do not
> > continue to pay attention to whether the deletion operation is completed.
> > If the meta-information update has been successfully completed but an error
> > occurs during the asynchronous deletion, the ledger may not be deleted, but
> > at the logical level we think that the deletion has been completed, which
> > will make this part of the data remain in the storage layer forever (such
> > as bk). As the usage time of the cluster becomes longer, the residual data
> > that cannot be deleted will gradually increase.
> >
> > In order to achieve this goal, we can separate the logic of
> > meta-information update and ledger deletion. In the trimming process, we
> > can first mark which ledgers are deletable, and update the results to the
> > metadatastore. We can perform the deletion of marked ledgers asynchronously
> > in the callback of updating the meta information, so that the original
> > logic can be retained seamlessly. Therefore, when we are rolling upgrade or
> > rollback, the only difference is whether the deleted ledger is marked for
> > deletion.
> >
> > To be more specific:
> > 1. for upgrade, only the marker information of ledger has been added, and
> > the logical sequence of deletion has not changed.
> > 2. for rollback, some ledgers that have been marked for deletion may not be
> > deleted due to the restart of the broker. This behavior is consistent with
> > the original version.
> >
> > In addition, if the ledger that has been marked is not deleted
> > successfully, the marker will not be removed. So for this part of ledgers,
> > every time trimming is triggered, it will be deleted again, which is
> > equivalent to a check and retry mechanism.
> >
> > ## Goal
> >
> > We need to modify some logic in
> > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`
> > so that the ledger deletion logic in ledger-trimming is split into two
> > stages, marking and deleting. Once the marker information is updated to the
> > metadatastore, every trimming will try to trigger the ledger deletion until
> > all the deleteable ledgers are successfully deleted.
> >
> > ## Implementation
> >
> > This proposal aims to separate the deletion logic in ledger-trimming, so
> > that `ManagedLedgerImpl#internalTrimLedgers` is responsible for marking the
> > deletable ledgers and then perform actual ledger deletion according to the
> > metadatastore.
> >
> > Therefore, the entire trimming process is broken down into the following
> > steps:
> >
> > 1. mark deletable ledgers and update ledger metadata.
> > 2. do acutual ledger deletion after metadata is updated.
> >
> > For step 1, we can store the marker of deletable information in
> > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#propertiesMap`. When
> > retrieving the deleted ledger information, we can directly query by
> > iterating `propertiesMap`. If this solution is not accepted, maybe we can
> > create a new znode to store these information, but this approach will not
> > be able to reuse the current design.
> >
> > For step 2, we can perform the deletion of marked ledgers asynchronously in
> > the callback of updating the meta information. And every trimming will
> > trigger the check and delete for those deleteable ledgers.
> >
> > Related PR: https://github.com/apache/pulsar/pull/13575
> >


Re: [DISCUSSION] PIP-129: Introduce intermediate state for ledger deletion

2022-01-11 Thread PengHui Li
+1 It's a nice approach for making sure the ledger can be deleted correctly.

Regards,
Penghui

On Wed, Jan 12, 2022 at 10:23 AM Zhanpeng Wu 
wrote:

> https://github.com/apache/pulsar/issues/13526
>
> 
>
> ## Motivation
>
> Under the current ledger-trimming design in
> `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`,
> we need to collect those ledgers that need to be deleted first, and then
> perform the asynchronous deletion of the ledger concurrently, but we do not
> continue to pay attention to whether the deletion operation is completed.
> If the meta-information update has been successfully completed but an error
> occurs during the asynchronous deletion, the ledger may not be deleted, but
> at the logical level we think that the deletion has been completed, which
> will make this part of the data remain in the storage layer forever (such
> as bk). As the usage time of the cluster becomes longer, the residual data
> that cannot be deleted will gradually increase.
>
> In order to achieve this goal, we can separate the logic of
> meta-information update and ledger deletion. In the trimming process, we
> can first mark which ledgers are deletable, and update the results to the
> metadatastore. We can perform the deletion of marked ledgers asynchronously
> in the callback of updating the meta information, so that the original
> logic can be retained seamlessly. Therefore, when we are rolling upgrade or
> rollback, the only difference is whether the deleted ledger is marked for
> deletion.
>
> To be more specific:
> 1. for upgrade, only the marker information of ledger has been added, and
> the logical sequence of deletion has not changed.
> 2. for rollback, some ledgers that have been marked for deletion may not be
> deleted due to the restart of the broker. This behavior is consistent with
> the original version.
>
> In addition, if the ledger that has been marked is not deleted
> successfully, the marker will not be removed. So for this part of ledgers,
> every time trimming is triggered, it will be deleted again, which is
> equivalent to a check and retry mechanism.
>
> ## Goal
>
> We need to modify some logic in
> `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`
> so that the ledger deletion logic in ledger-trimming is split into two
> stages, marking and deleting. Once the marker information is updated to the
> metadatastore, every trimming will try to trigger the ledger deletion until
> all the deleteable ledgers are successfully deleted.
>
> ## Implementation
>
> This proposal aims to separate the deletion logic in ledger-trimming, so
> that `ManagedLedgerImpl#internalTrimLedgers` is responsible for marking the
> deletable ledgers and then perform actual ledger deletion according to the
> metadatastore.
>
> Therefore, the entire trimming process is broken down into the following
> steps:
>
> 1. mark deletable ledgers and update ledger metadata.
> 2. do acutual ledger deletion after metadata is updated.
>
> For step 1, we can store the marker of deletable information in
> `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#propertiesMap`. When
> retrieving the deleted ledger information, we can directly query by
> iterating `propertiesMap`. If this solution is not accepted, maybe we can
> create a new znode to store these information, but this approach will not
> be able to reuse the current design.
>
> For step 2, we can perform the deletion of marked ledgers asynchronously in
> the callback of updating the meta information. And every trimming will
> trigger the check and delete for those deleteable ledgers.
>
> Related PR: https://github.com/apache/pulsar/pull/13575
>


[DISCUSSION] PIP-129: Introduce intermediate state for ledger deletion

2022-01-11 Thread Zhanpeng Wu
https://github.com/apache/pulsar/issues/13526



## Motivation

Under the current ledger-trimming design in
`org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`,
we first collect the ledgers that need to be deleted and then perform the
asynchronous deletions concurrently, but we do not track whether the deletion
operations actually complete. If the meta-information update completes
successfully but an error occurs during the asynchronous deletion, the ledger
may not be deleted, yet at the logical level we consider the deletion done,
which leaves this data in the storage layer (such as BookKeeper) forever. As
the cluster runs longer, the residual data that cannot be deleted gradually
accumulates.

To address this, we can separate the logic of the meta-information update and
the ledger deletion. In the trimming process, we can first mark which ledgers
are deletable and update the result to the metadata store. We can then perform
the deletion of the marked ledgers asynchronously in the callback of the
metadata update, so that the original logic is retained seamlessly. Therefore,
during a rolling upgrade or rollback, the only difference is whether a deleted
ledger is marked for deletion.

To be more specific:
1. For upgrade, only the ledger marker information is added; the logical
sequence of deletion does not change.
2. For rollback, some ledgers that have been marked for deletion may not be
deleted due to the restart of the broker. This behavior is consistent with
the original version.

In addition, if a ledger that has been marked is not deleted successfully,
the marker will not be removed. So for these ledgers, deletion is attempted
again every time trimming is triggered, which is equivalent to a
check-and-retry mechanism.

## Goal

We need to modify some logic in
`org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`
so that the ledger deletion logic in ledger-trimming is split into two
stages, marking and deleting. Once the marker information is updated in the
metadata store, every trimming run will try to trigger the ledger deletion
until all the deletable ledgers are successfully deleted.

## Implementation

This proposal aims to separate the deletion logic in ledger-trimming, so
that `ManagedLedgerImpl#internalTrimLedgers` is responsible for marking the
deletable ledgers and then perform actual ledger deletion according to the
metadatastore.

Therefore, the entire trimming process is broken down into the following
steps:

1. Mark deletable ledgers and update the ledger metadata.
2. Do the actual ledger deletion after the metadata is updated.

For step 1, we can store the deletable marker in
`org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#propertiesMap`. When
retrieving the to-be-deleted ledger information, we can query directly by
iterating `propertiesMap`. If this solution is not accepted, we could instead
create a new znode to store this information, but that approach would not
reuse the current design.

For step 2, we can perform the deletion of marked ledgers asynchronously in
the callback of the metadata update, and every trimming run will trigger the
check and deletion for those deletable ledgers.

Related PR: https://github.com/apache/pulsar/pull/13575
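
A compact sketch of the two-stage flow described above; the method names
(markDeletable, persistLedgerMetadata, asyncDeleteLedger,
removeDeletableMarker) are hypothetical stand-ins, not the actual
ManagedLedgerImpl API:

```
// Illustrative two-phase trimming sketch. The abstract methods below are
// hypothetical stand-ins, not the actual ManagedLedgerImpl API.
import java.util.List;
import java.util.concurrent.CompletableFuture;

abstract class TwoPhaseTrimSketch {

    void internalTrimLedgers(List<Long> deletableLedgerIds) {
        // Stage 1: only mark the ledgers as deletable (e.g. in propertiesMap)
        // and persist that marker to the metadata store.
        deletableLedgerIds.forEach(this::markDeletable);

        persistLedgerMetadata().thenRun(() -> {
            // Stage 2: perform the actual deletions in the metadata-update
            // callback. If a deletion fails, the marker stays in place and the
            // next trimming run retries it (check-and-retry behaviour).
            for (Long ledgerId : deletableLedgerIds) {
                asyncDeleteLedger(ledgerId)
                        .thenRun(() -> removeDeletableMarker(ledgerId));
            }
        });
    }

    abstract void markDeletable(long ledgerId);
    abstract CompletableFuture<Void> persistLedgerMetadata();
    abstract CompletableFuture<Void> asyncDeleteLedger(long ledgerId);
    abstract void removeDeletableMarker(long ledgerId);
}
```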


Re: [DISCUSSION] PIP-135: Include MetadataStore backend for Etcd

2022-01-11 Thread 陳智弘
+1

Haiting Jiang  wrote on Wed, Jan 12, 2022 at 09:50:

> +1
>
> On 2022/01/12 01:44:21 PengHui Li wrote:
> > +1
> >
> > Penghui
> >
> > On Wed, Jan 12, 2022 at 8:39 AM mattison chao 
> > wrote:
> >
> > > +1
> > >
> > > On Wed, 12 Jan 2022 at 08:09, Matteo Merli  wrote:
> > >
> > > > https://github.com/apache/pulsar/issues/13717
> > > >
> > > > -
> > > >
> > > > ## Motivation
> > > >
> > > > Since all the pieces that composed the proposal in PIP-45 were
> finally
> > > > merged
> > > > and are currently ready for 2.10 release, it is now possible to add
> other
> > > > metadata backends that can be used to support a BookKeeper + Pulsar
> > > > cluster.
> > > >
> > > > One of the popular systems that is most commonly used as an
> alternative
> > > of
> > > > ZooKeeper is Etcd, thus it makes sense to have this as the first
> > > > non-zookeeper
> > > > implementation.
> > > >
> > > > ## Goal
> > > >
> > > > Provide an Etcd implementation for the `MetadataStore` API. This will
> > > allow
> > > > users to deploy Pulsar clusters using Etcd service for the metadata
> and
> > > it
> > > > will
> > > > not require the presence of ZooKeeper.
> > > >
> > > >
> > > > ## Implementation
> > > >
> > > >  * Use the existing JEtcd Java client library for Etcd
> > > >  * Extends the `AbstractBatchedMetadataStore` class, in order to
> reuse
> > > the
> > > >transparent batching logic that will be shared with the ZooKeeper
> > > >implementation.
> > > >
> > > > Work in progress: https://github.com/apache/pulsar/pull/13225
> > > >
> > > >
> > > >
> > > > --
> > > > Matteo Merli
> > > > 
> > > >
> > >
> >
>


Re: [VOTE] PIP-132: Include message header size when check maxMessageSize of non-batch message on the client side.

2022-01-11 Thread mattison chao
+1  (non-binding)

On Wed, 12 Jan 2022 at 09:59, Zike Yang 
wrote:

> +1 (non-binding)
>
> On Wed, Jan 12, 2022 at 9:58 AM Haiting Jiang 
> wrote:
> >
> > This is the voting thread for PIP-132. It will stay open for at least 48
> hours.
> >
> > https://github.com/apache/pulsar/issues/13591
> >
> > Pasted below for quoting convenience.
> >
> > 
> >
> > ## Motivation
> >
> > Currently, Pulsar client (Java) only checks payload size for max message
> size validation.
> >
> > Client throws TimeoutException if we produce a message with too many
> properties, see [1].
> > But the root cause is that is trigged TooLongFrameException in broker
> server.
> >
> > In this PIP, I propose to include message header size when check
> maxMessageSize of non-batch
> > messages, this brings the following benefits.
> > 1. Clients can throw InvalidMessageException immediately if properties
> takes too much storage space.
> > 2. This will make the behaviour consistent with topic level max message
> size check in broker.
> > 3. Strictly limit the entry size less than maxMessageSize, avoid sending
> message to bookkeeper failed.
> >
> > ## Goal
> >
> > Include message header size when check maxMessageSize for non-batch
> message on the client side.
> >
> > ## Implementation
> >
> > ```
> > // Add a size check in
> org.apache.pulsar.client.impl.ProducerImpl#processOpSendMsg
> > if (op.msg != null // for non-batch messages only
> > && op.getMessageHeaderAndPayloadSize() > ClientCnx.getMaxMessageSize()) {
> > // finish send op with InvalidMessageException
> > releaseSemaphoreForSendOp(op);
> > op.sendComplete(new PulsarClientException(new InvalidMessageException,
> op.sequenceId));
> > }
> >
> >
> > //
> org.apache.pulsar.client.impl.ProducerImpl.OpSendMsg#getMessageHeaderAndPayloadSize
> >
> > public int getMessageHeaderAndPayloadSize() {
> > ByteBuf cmdHeader = cmd.getFirst();
> > cmdHeader.markReaderIndex();
> > int totalSize = cmdHeader.readInt();
> > int cmdSize = cmdHeader.readInt();
> > int msgHeadersAndPayloadSize = totalSize - cmdSize - 4;
> > cmdHeader.resetReaderIndex();
> > return msgHeadersAndPayloadSize;
> > }
> > ```
> >
> > ## Reject Alternatives
> > Add a new property like "maxPropertiesSize" or "maxHeaderSize" in
> broker.conf and pass it to
> > client like maxMessageSize. But the implementation is much more complex,
> and don't have the
> > benefit 2 and 3 mentioned in Motivation.
> >
> > ## Compatibility Issue
> > As a matter of fact, this PIP narrows down the sendable range.
> Previously, when maxMessageSize
> > is 1KB, it's ok to send message with 1KB properties and 1KB payload. But
> with this PIP, the
> > sending will fail with InvalidMessageException.
> >
> > One conservative way is to add a boolean config
> "includeHeaderInSizeCheck" to enable this
> > feature. But I think it's OK to enable this directly as it's more
> reasonable, and I don't see good
> > migration plan if we add a config for this.
> >
> > [1] https://github.com/apache/pulsar/issues/13560
> >
> > Thanks,
> > Haiting Jiang
>
>
>
> --
> Zike Yang
>


Re: [VOTE] PIP-132: Include message header size when check maxMessageSize of non-batch message on the client side.

2022-01-11 Thread Hang Chen
+1 (binding)

Best,
Hang

Zike Yang  wrote on Wed, Jan 12, 2022 at 09:59:
>
> +1 (non-binding)
>
> On Wed, Jan 12, 2022 at 9:58 AM Haiting Jiang  wrote:
> >
> > This is the voting thread for PIP-132. It will stay open for at least 48 
> > hours.
> >
> > https://github.com/apache/pulsar/issues/13591
> >
> > Pasted below for quoting convenience.
> >
> > 
> >
> > ## Motivation
> >
> > Currently, Pulsar client (Java) only checks payload size for max message 
> > size validation.
> >
> > Client throws TimeoutException if we produce a message with too many 
> > properties, see [1].
> > But the root cause is that is trigged TooLongFrameException in broker 
> > server.
> >
> > In this PIP, I propose to include message header size when check 
> > maxMessageSize of non-batch
> > messages, this brings the following benefits.
> > 1. Clients can throw InvalidMessageException immediately if properties 
> > takes too much storage space.
> > 2. This will make the behaviour consistent with topic level max message 
> > size check in broker.
> > 3. Strictly limit the entry size less than maxMessageSize, avoid sending 
> > message to bookkeeper failed.
> >
> > ## Goal
> >
> > Include message header size when check maxMessageSize for non-batch message 
> > on the client side.
> >
> > ## Implementation
> >
> > ```
> > // Add a size check in 
> > org.apache.pulsar.client.impl.ProducerImpl#processOpSendMsg
> > if (op.msg != null // for non-batch messages only
> > && op.getMessageHeaderAndPayloadSize() > ClientCnx.getMaxMessageSize()) {
> > // finish send op with InvalidMessageException
> > releaseSemaphoreForSendOp(op);
> > op.sendComplete(new PulsarClientException(new InvalidMessageException, 
> > op.sequenceId));
> > }
> >
> >
> > // 
> > org.apache.pulsar.client.impl.ProducerImpl.OpSendMsg#getMessageHeaderAndPayloadSize
> >
> > public int getMessageHeaderAndPayloadSize() {
> > ByteBuf cmdHeader = cmd.getFirst();
> > cmdHeader.markReaderIndex();
> > int totalSize = cmdHeader.readInt();
> > int cmdSize = cmdHeader.readInt();
> > int msgHeadersAndPayloadSize = totalSize - cmdSize - 4;
> > cmdHeader.resetReaderIndex();
> > return msgHeadersAndPayloadSize;
> > }
> > ```
> >
> > ## Reject Alternatives
> > Add a new property like "maxPropertiesSize" or "maxHeaderSize" in 
> > broker.conf and pass it to
> > client like maxMessageSize. But the implementation is much more complex, 
> > and don't have the
> > benefit 2 and 3 mentioned in Motivation.
> >
> > ## Compatibility Issue
> > As a matter of fact, this PIP narrows down the sendable range. Previously, 
> > when maxMessageSize
> > is 1KB, it's ok to send message with 1KB properties and 1KB payload. But 
> > with this PIP, the
> > sending will fail with InvalidMessageException.
> >
> > One conservative way is to add a boolean config "includeHeaderInSizeCheck" 
> > to enable this
> > feature. But I think it's OK to enable this directly as it's more 
> > reasonable, and I don't see good
> > migration plan if we add a config for this.
> >
> > [1] https://github.com/apache/pulsar/issues/13560
> >
> > Thanks,
> > Haiting Jiang
>
>
>
> --
> Zike Yang


Re: [VOTE] PIP-132: Include message header size when check maxMessageSize of non-batch message on the client side.

2022-01-11 Thread Zike Yang
+1 (non-binding)

On Wed, Jan 12, 2022 at 9:58 AM Haiting Jiang  wrote:
>
> This is the voting thread for PIP-132. It will stay open for at least 48 
> hours.
>
> https://github.com/apache/pulsar/issues/13591
>
> Pasted below for quoting convenience.
>
> 
>
> ## Motivation
>
> Currently, Pulsar client (Java) only checks payload size for max message size 
> validation.
>
> Client throws TimeoutException if we produce a message with too many 
> properties, see [1].
> But the root cause is that is trigged TooLongFrameException in broker server.
>
> In this PIP, I propose to include message header size when check 
> maxMessageSize of non-batch
> messages, this brings the following benefits.
> 1. Clients can throw InvalidMessageException immediately if properties takes 
> too much storage space.
> 2. This will make the behaviour consistent with topic level max message size 
> check in broker.
> 3. Strictly limit the entry size less than maxMessageSize, avoid sending 
> message to bookkeeper failed.
>
> ## Goal
>
> Include message header size when check maxMessageSize for non-batch message 
> on the client side.
>
> ## Implementation
>
> ```
> // Add a size check in 
> org.apache.pulsar.client.impl.ProducerImpl#processOpSendMsg
> if (op.msg != null // for non-batch messages only
> && op.getMessageHeaderAndPayloadSize() > ClientCnx.getMaxMessageSize()) {
> // finish send op with InvalidMessageException
> releaseSemaphoreForSendOp(op);
> op.sendComplete(new PulsarClientException(new InvalidMessageException, 
> op.sequenceId));
> }
>
>
> // 
> org.apache.pulsar.client.impl.ProducerImpl.OpSendMsg#getMessageHeaderAndPayloadSize
>
> public int getMessageHeaderAndPayloadSize() {
> ByteBuf cmdHeader = cmd.getFirst();
> cmdHeader.markReaderIndex();
> int totalSize = cmdHeader.readInt();
> int cmdSize = cmdHeader.readInt();
> int msgHeadersAndPayloadSize = totalSize - cmdSize - 4;
> cmdHeader.resetReaderIndex();
> return msgHeadersAndPayloadSize;
> }
> ```
>
> ## Reject Alternatives
> Add a new property like "maxPropertiesSize" or "maxHeaderSize" in broker.conf 
> and pass it to
> client like maxMessageSize. But the implementation is much more complex, and 
> don't have the
> benefit 2 and 3 mentioned in Motivation.
>
> ## Compatibility Issue
> As a matter of fact, this PIP narrows down the sendable range. Previously, 
> when maxMessageSize
> is 1KB, it's ok to send message with 1KB properties and 1KB payload. But with 
> this PIP, the
> sending will fail with InvalidMessageException.
>
> One conservative way is to add a boolean config "includeHeaderInSizeCheck" to 
> enable this
> feature. But I think it's OK to enable this directly as it's more reasonable, 
> and I don't see good
> migration plan if we add a config for this.
>
> [1] https://github.com/apache/pulsar/issues/13560
>
> Thanks,
> Haiting Jiang



-- 
Zike Yang


Re: [VOTE] PIP-121: Pulsar cluster level auto failover on client side

2022-01-11 Thread Zike Yang
+1 (non-binding)

On Wed, Jan 12, 2022 at 9:52 AM Haiting Jiang  wrote:
>
> +1
>
> On 2022/01/12 00:09:26 Matteo Merli wrote:
> > +1
> > --
> > Matteo Merli
> > 
> >
> > On Tue, Jan 11, 2022 at 12:07 PM Neng Lu  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > On Mon, Jan 10, 2022 at 12:40 AM PengHui Li  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Penghui
> > > >
> > > > On Mon, Jan 10, 2022 at 4:38 PM Enrico Olivelli 
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Enrico
> > > > >
> > > > On Mon, Jan 10, 2022 at 07:45 Hang Chen
> > > >  wrote:
> > > > > >
> > > > > > This is the voting thread for PIP-121. It will stay open for at 
> > > > > > least
> > > > 48
> > > > > > hours.
> > > > > >
> > > > > > https://github.com/apache/pulsar/issues/13315
> > > > > >
> > > > > > Pasted below for quoting convenience.
> > > > > >
> > > > > > -
> > > > > > ### Motivation
> > > > > > We have geo-replication to support Pulsar cluster level failover. We
> > > > > > can set up Pulsar cluster A as a primary cluster in data center A, 
> > > > > > and
> > > > > > setup Pulsar cluster B as backup cluster in data center B. Then we
> > > > > > configure geo-replication between cluster A and cluster B. All the
> > > > > > clients are connected to the Pulsar cluster by DNS. If cluster A is
> > > > > > down, we should switch the DNS to point the target Pulsar cluster 
> > > > > > from
> > > > > > cluster A to cluster B. After the clients are resolved to cluster B,
> > > > > > they can produce and consume messages normally. After cluster A
> > > > > > recovers, the administrator should switch the DNS back to cluster A.
> > > > > >
> > > > > > However, the current method has two shortcomings.
> > > > > > 1. The administrator should monitor the status of all Pulsar 
> > > > > > clusters,
> > > > > > and switch the DNS as soon as possible when cluster A is down. The
> > > > > > switch and recovery is not automatic and recovery time is controlled
> > > > > > by the administrator, which will put the administrator under heavy
> > > > > > load.
> > > > > > 2. The Pulsar client and DNS system have a cache. When the
> > > > > > administrator switches the DNS from cluster A to Cluster B, it will
> > > > > > take some time for the cache to time out, which will delay client
> > > > > > recovery time and lead to produce/consume message failures.
> > > > > >
> > > > > > ### Goal
> > > > > > It's better to provide an automatic cluster level failure recovery
> > > > > > mechanism to make pulsar cluster failover more effective. We should
> > > > > > support pulsar clients auto switching from cluster A to cluster B 
> > > > > > when
> > > > > > it detects cluster A has been down according to the configured
> > > > > > detecting policy and switch back to cluster A when it has recovered.
> > > > > > The reason why we should switch back to cluster A is that most
> > > > > > applications may be deployed in data center A and they have low
> > > > > > network cost for communicating with pulsar cluster A. If they keep
> > > > > > visiting pulsar cluster B, they have high network cost, and cause 
> > > > > > high
> > > > > > produce/consume latency.
> > > > > >
> > > > > > In order to improve the DNS cache problem, we should provide an
> > > > > > administrator controlled switch provider for administrators to 
> > > > > > update
> > > > > > service URLs.
> > > > > >
> > > > > > In the end, we should provide an auto service URL switch provider 
> > > > > > and
> > > > > > administrator controlled switch provider.
> > > > > >
> > > > > > ### Design
> > > > > > We have already provided the `ServiceUrlProvider` interface to 
> > > > > > support
> > > > > > different service URLs. In order to support automatic cluster level
> > > > > > failure auto recovery, we can provide different ServiceUrlProvider
> > > > > > implementations. For current requirements, we can provide
> > > > > > `AutoClusterFailover` and `ControlledClusterFailover`.
> > > > > >
> > > > > >  AutoClusterFailover
> > > > > > In order to support auto switching from the primary cluster to the
> > > > > > secondary, we can provide a probe task, which will probe the 
> > > > > > activity
> > > > > > of the primary cluster and the secondary one. When it finds the
> > > > > > primary cluster failed more than `failoverDelayMs`, it will switch 
> > > > > > to
> > > > > > the secondary cluster by calling `updateServiceUrl`. After switching
> > > > > > to the secondary cluster, the `AutoClusterFailover` will continue to
> > > > > > probe the primary cluster. If the primary cluster comes back and
> > > > > > remains active for `switchBackDelayMs`, it will switch back to the
> > > > > > primary cluster.
> > > > > > The APIs are listed as follows.
> > > > > >
> > > > > > In order to support multiple secondary clusters, use List to store
> > > > > > secondary cluster urls. When the primary cluster probe fails for
> > > > > > failoverDelayMs, it will 

[VOTE] PIP-132: Include message header size when check maxMessageSize of non-batch message on the client side.

2022-01-11 Thread Haiting Jiang
This is the voting thread for PIP-132. It will stay open for at least 48 hours.

https://github.com/apache/pulsar/issues/13591

Pasted below for quoting convenience.



## Motivation

Currently, the Pulsar client (Java) only checks the payload size for max
message size validation.

The client throws a TimeoutException if we produce a message with too many
properties, see [1]. But the root cause is that a TooLongFrameException is
triggered on the broker server.

In this PIP, I propose to include the message header size when checking
maxMessageSize for non-batch messages. This brings the following benefits.
1. Clients can throw an InvalidMessageException immediately if the properties
take too much storage space.
2. It makes the behaviour consistent with the topic-level max message size
check in the broker.
3. It strictly limits the entry size to less than maxMessageSize, avoiding
failed sends to BookKeeper.

## Goal

Include the message header size when checking maxMessageSize for non-batch
messages on the client side.

## Implementation

```
// Add a size check in org.apache.pulsar.client.impl.ProducerImpl#processOpSendMsg
if (op.msg != null // for non-batch messages only
        && op.getMessageHeaderAndPayloadSize() > ClientCnx.getMaxMessageSize()) {
    // finish send op with InvalidMessageException
    releaseSemaphoreForSendOp(op);
    op.sendComplete(new PulsarClientException(
            new InvalidMessageException("Message size exceeds maxMessageSize"),
            op.sequenceId));
}

// org.apache.pulsar.client.impl.ProducerImpl.OpSendMsg#getMessageHeaderAndPayloadSize
public int getMessageHeaderAndPayloadSize() {
    ByteBuf cmdHeader = cmd.getFirst();
    cmdHeader.markReaderIndex();
    // The buffer starts with the total frame size followed by the command size;
    // the remainder after the command is the message metadata headers plus payload.
    int totalSize = cmdHeader.readInt();
    int cmdSize = cmdHeader.readInt();
    int msgHeadersAndPayloadSize = totalSize - cmdSize - 4;
    cmdHeader.resetReaderIndex();
    return msgHeadersAndPayloadSize;
}
```

## Rejected Alternatives
Add a new property like "maxPropertiesSize" or "maxHeaderSize" in broker.conf
and pass it to the client like maxMessageSize. But the implementation is much
more complex, and it doesn't provide benefits 2 and 3 mentioned in the
Motivation.

## Compatibility Issue
As a matter of fact, this PIP narrows down the sendable range. Previously, when
maxMessageSize was 1 KB, it was OK to send a message with 1 KB of properties
and a 1 KB payload. But with this PIP, the send will fail with an
InvalidMessageException.

One conservative way is to add a boolean config "includeHeaderInSizeCheck" to
enable this feature. But I think it's OK to enable this directly as it's more
reasonable, and I don't see a good migration plan if we add a config for this.

[1] https://github.com/apache/pulsar/issues/13560

Thanks,
Haiting Jiang


Re: [VOTE] PIP-121: Pulsar cluster level auto failover on client side

2022-01-11 Thread mattison chao
+1

On Wed, 12 Jan 2022 at 09:52, Haiting Jiang  wrote:

> +1
>
> On 2022/01/12 00:09:26 Matteo Merli wrote:
> > +1
> > --
> > Matteo Merli
> > 
> >
> > On Tue, Jan 11, 2022 at 12:07 PM Neng Lu  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > On Mon, Jan 10, 2022 at 12:40 AM PengHui Li 
> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Penghui
> > > >
> > > > On Mon, Jan 10, 2022 at 4:38 PM Enrico Olivelli  >
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Enrico
> > > > >
> > > > > On Mon, Jan 10, 2022 at 07:45 Hang Chen
> > > > >  wrote:
> > > > > >
> > > > > > This is the voting thread for PIP-121. It will stay open for at
> least
> > > > 48
> > > > > > hours.
> > > > > >
> > > > > > https://github.com/apache/pulsar/issues/13315
> > > > > >
> > > > > > Pasted below for quoting convenience.
> > > > > >
> > > > > > -
> > > > > > ### Motivation
> > > > > > We have geo-replication to support Pulsar cluster level
> failover. We
> > > > > > can set up Pulsar cluster A as a primary cluster in data center
> A, and
> > > > > > setup Pulsar cluster B as backup cluster in data center B. Then
> we
> > > > > > configure geo-replication between cluster A and cluster B. All
> the
> > > > > > clients are connected to the Pulsar cluster by DNS. If cluster A
> is
> > > > > > down, we should switch the DNS to point the target Pulsar
> cluster from
> > > > > > cluster A to cluster B. After the clients are resolved to
> cluster B,
> > > > > > they can produce and consume messages normally. After cluster A
> > > > > > recovers, the administrator should switch the DNS back to
> cluster A.
> > > > > >
> > > > > > However, the current method has two shortcomings.
> > > > > > 1. The administrator should monitor the status of all Pulsar
> clusters,
> > > > > > and switch the DNS as soon as possible when cluster A is down.
> The
> > > > > > switch and recovery is not automatic and recovery time is
> controlled
> > > > > > by the administrator, which will put the administrator under
> heavy
> > > > > > load.
> > > > > > 2. The Pulsar client and DNS system have a cache. When the
> > > > > > administrator switches the DNS from cluster A to Cluster B, it
> will
> > > > > > take some time for the cache to time out, which will delay client
> > > > > > recovery time and lead to produce/consume message failures.
> > > > > >
> > > > > > ### Goal
> > > > > > It's better to provide an automatic cluster level failure
> recovery
> > > > > > mechanism to make pulsar cluster failover more effective. We
> should
> > > > > > support pulsar clients auto switching from cluster A to cluster
> B when
> > > > > > it detects cluster A has been down according to the configured
> > > > > > detecting policy and switch back to cluster A when it has
> recovered.
> > > > > > The reason why we should switch back to cluster A is that most
> > > > > > applications may be deployed in data center A and they have low
> > > > > > network cost for communicating with pulsar cluster A. If they
> keep
> > > > > > visiting pulsar cluster B, they have high network cost, and
> cause high
> > > > > > produce/consume latency.
> > > > > >
> > > > > > In order to improve the DNS cache problem, we should provide an
> > > > > > administrator controlled switch provider for administrators to
> update
> > > > > > service URLs.
> > > > > >
> > > > > > In the end, we should provide an auto service URL switch
> provider and
> > > > > > administrator controlled switch provider.
> > > > > >
> > > > > > ### Design
> > > > > > We have already provided the `ServiceUrlProvider` interface to
> support
> > > > > > different service URLs. In order to support automatic cluster
> level
> > > > > > failure auto recovery, we can provide different
> ServiceUrlProvider
> > > > > > implementations. For current requirements, we can provide
> > > > > > `AutoClusterFailover` and `ControlledClusterFailover`.
> > > > > >
> > > > > >  AutoClusterFailover
> > > > > > In order to support auto switching from the primary cluster to
> the
> > > > > > secondary, we can provide a probe task, which will probe the
> activity
> > > > > > of the primary cluster and the secondary one. When it finds the
> > > > > > primary cluster failed more than `failoverDelayMs`, it will
> switch to
> > > > > > the secondary cluster by calling `updateServiceUrl`. After
> switching
> > > > > > to the secondary cluster, the `AutoClusterFailover` will
> continue to
> > > > > > probe the primary cluster. If the primary cluster comes back and
> > > > > > remains active for `switchBackDelayMs`, it will switch back to
> the
> > > > > > primary cluster.
> > > > > > The APIs are listed as follows.
> > > > > >
> > > > > > In order to support multiple secondary clusters, use List to
> store
> > > > > > secondary cluster urls. When the primary cluster probe fails for
> > > > > > failoverDelayMs, it will start to probe the secondary cluster
> list one
> > > > > > by one, once it finds the active 

Re: [VOTE] PIP-121: Pulsar cluster level auto failover on client side

2022-01-11 Thread Haiting Jiang
+1

On 2022/01/12 00:09:26 Matteo Merli wrote:
> +1
> --
> Matteo Merli
> 
> 
> On Tue, Jan 11, 2022 at 12:07 PM Neng Lu  wrote:
> >
> > +1 (non-binding)
> >
> > On Mon, Jan 10, 2022 at 12:40 AM PengHui Li  wrote:
> >
> > > +1 (binding)
> > >
> > > Penghui
> > >
> > > On Mon, Jan 10, 2022 at 4:38 PM Enrico Olivelli 
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Enrico
> > > >
> > > > Il giorno lun 10 gen 2022 alle ore 07:45 Hang Chen
> > > >  ha scritto:
> > > > >
> > > > > This is the voting thread for PIP-121. It will stay open for at least
> > > 48
> > > > > hours.
> > > > >
> > > > > https://github.com/apache/pulsar/issues/13315
> > > > >
> > > > > Pasted below for quoting convenience.
> > > > >
> > > > > -
> > > > > ### Motivation
> > > > > We have geo-replication to support Pulsar cluster level failover. We
> > > > > can set up Pulsar cluster A as a primary cluster in data center A, and
> > > > > setup Pulsar cluster B as backup cluster in data center B. Then we
> > > > > configure geo-replication between cluster A and cluster B. All the
> > > > > clients are connected to the Pulsar cluster by DNS. If cluster A is
> > > > > down, we should switch the DNS to point the target Pulsar cluster from
> > > > > cluster A to cluster B. After the clients are resolved to cluster B,
> > > > > they can produce and consume messages normally. After cluster A
> > > > > recovers, the administrator should switch the DNS back to cluster A.
> > > > >
> > > > > However, the current method has two shortcomings.
> > > > > 1. The administrator should monitor the status of all Pulsar clusters,
> > > > > and switch the DNS as soon as possible when cluster A is down. The
> > > > > switch and recovery is not automatic and recovery time is controlled
> > > > > by the administrator, which will put the administrator under heavy
> > > > > load.
> > > > > 2. The Pulsar client and DNS system have a cache. When the
> > > > > administrator switches the DNS from cluster A to Cluster B, it will
> > > > > take some time for the cache to time out, which will delay client
> > > > > recovery and lead to produce/consume messages failing.
> > > > >
> > > > > ### Goal
> > > > > It's better to provide an automatic cluster level failure recovery
> > > > > mechanism to make pulsar cluster failover more effective. We should
> > > > > support Pulsar clients automatically switching from cluster A to cluster B
> > > > > when they detect cluster A is down according to the configured detection
> > > > > policy, and switching back to cluster A when it has recovered.
> > > > > The reason why we should switch back to cluster A is that most
> > > > > applications may be deployed in data center A and they have low
> > > > > network cost for communicating with pulsar cluster A. If they keep
> > > > > visiting pulsar cluster B, they have high network cost, and cause high
> > > > > produce/consume latency.
> > > > >
> > > > > In order to mitigate the DNS cache problem, we should provide an
> > > > > administrator controlled switch provider for administrators to update
> > > > > service URLs.
> > > > >
> > > > > In the end, we should provide an auto service URL switch provider and
> > > > > administrator controlled switch provider.
> > > > >
> > > > > ### Design
> > > > > We have already provided the `ServiceUrlProvider` interface to support
> > > > > different service URLs. In order to support automatic cluster level
> > > > > failure auto recovery, we can provide different ServiceUrlProvider
> > > > > implementations. For current requirements, we can provide
> > > > > `AutoClusterFailover` and `ControlledClusterFailover`.
> > > > >
> > > > >  AutoClusterFailover
> > > > > In order to support auto switching from the primary cluster to the
> > > > > secondary, we can provide a probe task, which will probe the activity
> > > > > of the primary cluster and the secondary one. When it finds the
> > > > > primary cluster has failed for more than `failoverDelayMs`, it will switch to
> > > > > the secondary cluster by calling `updateServiceUrl`. After switching
> > > > > to the secondary cluster, the `AutoClusterFailover` will continue to
> > > > > probe the primary cluster. If the primary cluster comes back and
> > > > > remains active for `switchBackDelayMs`, it will switch back to the
> > > > > primary cluster.
> > > > > The APIs are listed as follows.
> > > > >
> > > > > In order to support multiple secondary clusters, use List to store
> > > > > secondary cluster urls. When the primary cluster probe fails for
> > > > > failoverDelayMs, it will start to probe the secondary cluster list one
> > > > > by one, once it finds the active cluster, it will switch to the target
> > > > > cluster. Notice: If you configured multiple clusters, you should turn
> > > > > on cluster level geo-replication to ensure the topic data sync between
> > > > > all primary and secondary clusters. Otherwise, it may distribute the
> > > > > topic data into different clusters. And 

Re: [VOTE] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder

2022-01-11 Thread Haiting Jiang
+1

On 2022/01/10 06:47:44 Hang Chen wrote:
> This is the voting thread for PIP-122. It will stay open for at least 48
> hours.
> 
> https://github.com/apache/pulsar/issues/13340
> 
> Pasted below for quoting convenience.
> 
> 
> 
> ### Motivation
> The ThresholdShedder load balance policy has been available since Pulsar 2.6.0
> (https://github.com/apache/pulsar/pull/6772). It can resolve many load
> balance issues of `OverloadShedder` and works well in many Pulsar
> production clusters.
> 
> In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance
> policy is `OverloadShedder`.
> 
> I think it's a good time for 2.10 to change the default load balance
> policy to `ThresholdShedder`; it will make throughput more balanced
> between brokers.
> 
> ### Proposed Changes
> In the 2.10 release, for `broker.conf`, change
> `loadBalancerLoadSheddingStrategy` from
> `org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to
> `org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
> 
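For readers following along, the proposed default change boils down to a single line in `broker.conf`. A minimal before/after snippet, using the config key and class names quoted in the proposal above:

```
# broker.conf -- shedding strategy used by the load balancer
# Default up to and including 2.9.x:
#loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.OverloadShedder
# Proposed default starting with 2.10:
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder
```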


Re: [DISCUSSION] PIP-135: Include MetadataStore backend for Etcd

2022-01-11 Thread Haiting Jiang
+1

On 2022/01/12 01:44:21 PengHui Li wrote:
> +1
> 
> Penghui
> 
> On Wed, Jan 12, 2022 at 8:39 AM mattison chao 
> wrote:
> 
> > +1
> >
> > On Wed, 12 Jan 2022 at 08:09, Matteo Merli  wrote:
> >
> > > https://github.com/apache/pulsar/issues/13717
> > >
> > > -
> > >
> > > ## Motivation
> > >
> > > Since all the pieces that composed the proposal in PIP-45 were finally
> > > merged
> > > and are currently ready for 2.10 release, it is now possible to add other
> > > metadata backends that can be used to support a BookKeeper + Pulsar
> > > cluster.
> > >
> > > One of the popular systems that is most commonly used as an alternative
> > to
> > > ZooKeeper is Etcd, thus it makes sense to have this as the first
> > > non-zookeeper
> > > implementation.
> > >
> > > ## Goal
> > >
> > > Provide an Etcd implementation for the `MetadataStore` API. This will
> > allow
> > > users to deploy Pulsar clusters using Etcd service for the metadata and
> > it
> > > will
> > > not require the presence of ZooKeeper.
> > >
> > >
> > > ## Implementation
> > >
> > >  * Use the existing JEtcd Java client library for Etcd
> > >  * Extends the `AbstractBatchedMetadataStore` class, in order to reuse
> > the
> > >transparent batching logic that will be shared with the ZooKeeper
> > >implementation.
> > >
> > > Work in progress: https://github.com/apache/pulsar/pull/13225
> > >
> > >
> > >
> > > --
> > > Matteo Merli
> > > 
> > >
> >
> 


Re: [DISCUSSION] PIP-135: Include MetadataStore backend for Etcd

2022-01-11 Thread PengHui Li
+1

Penghui

On Wed, Jan 12, 2022 at 8:39 AM mattison chao 
wrote:

> +1
>
> On Wed, 12 Jan 2022 at 08:09, Matteo Merli  wrote:
>
> > https://github.com/apache/pulsar/issues/13717
> >
> > -
> >
> > ## Motivation
> >
> > Since all the pieces that composed the proposal in PIP-45 were finally
> > merged
> > and are currently ready for 2.10 release, it is now possible to add other
> > metadata backends that can be used to support a BookKeeper + Pulsar
> > cluster.
> >
> > One of the popular systems that is most commonly used as an alternative
> to
> > ZooKeeper is Etcd, thus it makes sense to have this as the first
> > non-zookeeper
> > implementation.
> >
> > ## Goal
> >
> > Provide an Etcd implementation for the `MetadataStore` API. This will
> allow
> > users to deploy Pulsar clusters using Etcd service for the metadata and
> it
> > will
> > not require the presence of ZooKeeper.
> >
> >
> > ## Implementation
> >
> >  * Use the existing JEtcd Java client library for Etcd
> >  * Extends the `AbstractBatchedMetadataStore` class, in order to reuse
> the
> >transparent batching logic that will be shared with the ZooKeeper
> >implementation.
> >
> > Work in progress: https://github.com/apache/pulsar/pull/13225
> >
> >
> >
> > --
> > Matteo Merli
> > 
> >
>


Re: [DISCUSSION] PIP-135: Include MetadataStore backend for Etcd

2022-01-11 Thread mattison chao
+1

On Wed, 12 Jan 2022 at 08:09, Matteo Merli  wrote:

> https://github.com/apache/pulsar/issues/13717
>
> -
>
> ## Motivation
>
> Since all the pieces that composed the proposal in PIP-45 were finally
> merged
> and are currently ready for 2.10 release, it is now possible to add other
> metadata backends that can be used to support a BookKeeper + Pulsar
> cluster.
>
> One of the popular systems that is most commonly used as an alternative to
> ZooKeeper is Etcd, thus it makes sense to have this as the first
> non-zookeeper
> implementation.
>
> ## Goal
>
> Provide an Etcd implementation for the `MetadataStore` API. This will allow
> users to deploy Pulsar clusters using Etcd service for the metadata and it
> will
> not require the presence of ZooKeeper.
>
>
> ## Implementation
>
>  * Use the existing JEtcd Java client library for Etcd
>  * Extends the `AbstractBatchedMetadataStore` class, in order to reuse the
>transparent batching logic that will be shared with the ZooKeeper
>implementation.
>
> Work in progress: https://github.com/apache/pulsar/pull/13225
>
>
>
> --
> Matteo Merli
> 
>


Re: [VOTE] PIP-121: Pulsar cluster level auto failover on client side

2022-01-11 Thread Matteo Merli
+1
--
Matteo Merli


On Tue, Jan 11, 2022 at 12:07 PM Neng Lu  wrote:
>
> +1 (non-binding)
>
> On Mon, Jan 10, 2022 at 12:40 AM PengHui Li  wrote:
>
> > +1 (binding)
> >
> > Penghui
> >
> > On Mon, Jan 10, 2022 at 4:38 PM Enrico Olivelli 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Enrico
> > >
> > > Il giorno lun 10 gen 2022 alle ore 07:45 Hang Chen
> > >  ha scritto:
> > > >
> > > > This is the voting thread for PIP-121. It will stay open for at least
> > 48
> > > > hours.
> > > >
> > > > https://github.com/apache/pulsar/issues/13315
> > > >
> > > > Pasted below for quoting convenience.
> > > >
> > > > -
> > > > ### Motivation
> > > > We have geo-replication to support Pulsar cluster level failover. We
> > > > can set up Pulsar cluster A as a primary cluster in data center A, and
> > > > setup Pulsar cluster B as backup cluster in data center B. Then we
> > > > configure geo-replication between cluster A and cluster B. All the
> > > > clients are connected to the Pulsar cluster by DNS. If cluster A is
> > > > down, we should switch the DNS to point the target Pulsar cluster from
> > > > cluster A to cluster B. After the clients are resolved to cluster B,
> > > > they can produce and consume messages normally. After cluster A
> > > > recovers, the administrator should switch the DNS back to cluster A.
> > > >
> > > > However, the current method has two shortcomings.
> > > > 1. The administrator should monitor the status of all Pulsar clusters,
> > > > and switch the DNS as soon as possible when cluster A is down. The
> > > > switch and recovery is not automatic and recovery time is controlled
> > > > by the administrator, which will put the administrator under heavy
> > > > load.
> > > > 2. The Pulsar client and DNS system have a cache. When the
> > > > administrator switches the DNS from cluster A to Cluster B, it will
> > > > take some time for the cache to time out, which will delay client
> > > > recovery and lead to produce/consume messages failing.
> > > >
> > > > ### Goal
> > > > It's better to provide an automatic cluster level failure recovery
> > > > mechanism to make pulsar cluster failover more effective. We should
> > > > support Pulsar clients automatically switching from cluster A to cluster B
> > > > when they detect cluster A is down according to the configured detection
> > > > policy, and switching back to cluster A when it has recovered.
> > > > The reason why we should switch back to cluster A is that most
> > > > applications may be deployed in data center A and they have low
> > > > network cost for communicating with pulsar cluster A. If they keep
> > > > visiting pulsar cluster B, they have high network cost, and cause high
> > > > produce/consume latency.
> > > >
> > > > In order to mitigate the DNS cache problem, we should provide an
> > > > administrator controlled switch provider for administrators to update
> > > > service URLs.
> > > >
> > > > In the end, we should provide an auto service URL switch provider and
> > > > administrator controlled switch provider.
> > > >
> > > > ### Design
> > > > We have already provided the `ServiceUrlProvider` interface to support
> > > > different service URLs. In order to support automatic cluster level
> > > > failure auto recovery, we can provide different ServiceUrlProvider
> > > > implementations. For current requirements, we can provide
> > > > `AutoClusterFailover` and `ControlledClusterFailover`.
> > > >
> > > >  AutoClusterFailover
> > > > In order to support auto switching from the primary cluster to the
> > > > secondary, we can provide a probe task, which will probe the activity
> > > > of the primary cluster and the secondary one. When it finds the
> > > > primary cluster has failed for more than `failoverDelayMs`, it will switch to
> > > > the secondary cluster by calling `updateServiceUrl`. After switching
> > > > to the secondary cluster, the `AutoClusterFailover` will continue to
> > > > probe the primary cluster. If the primary cluster comes back and
> > > > remains active for `switchBackDelayMs`, it will switch back to the
> > > > primary cluster.
> > > > The APIs are listed as follows.
> > > >
> > > > In order to support multiple secondary clusters, use List to store
> > > > secondary cluster urls. When the primary cluster probe fails for
> > > > failoverDelayMs, it will start to probe the secondary cluster list one
> > > > by one, once it finds the active cluster, it will switch to the target
> > > > cluster. Notice: if you configure multiple clusters, you should turn
> > > > on cluster-level geo-replication to ensure topic data is synced between
> > > > all primary and secondary clusters. Otherwise, topic data may be spread
> > > > across different clusters, and the consumers won't get the whole data
> > > > of the topic.
> > > >
> > > > In order to support different authentication configurations between
> > > > clusters, we provide the authentication relation configurations
> > > > updated with the target 

Re: [VOTE] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder

2022-01-11 Thread Matteo Merli
+1
--
Matteo Merli


On Tue, Jan 11, 2022 at 12:05 PM Neng Lu  wrote:
>
> +1 (non-binding)
>
> On Mon, Jan 10, 2022 at 12:39 AM Enrico Olivelli 
> wrote:
>
> > +1 (binding)
> >
> > Enrico
> >
> > Il giorno lun 10 gen 2022 alle ore 07:47 Hang Chen
> >  ha scritto:
> >
> > >
> > > This is the voting thread for PIP-122. It will stay open for at least 48
> > > hours.
> > >
> > > https://github.com/apache/pulsar/issues/13340
> > >
> > > Pasted below for quoting convenience.
> > >
> > > 
> > >
> > > ### Motivation
> > > The ThresholdShedder load balance policy has been available since Pulsar 2.6.0
> > > (https://github.com/apache/pulsar/pull/6772). It can resolve many load
> > > balance issues of `OverloadShedder` and works well in many Pulsar
> > > production clusters.
> > >
> > > In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance
> > > policy is `OverloadShedder`.
> > >
> > > I think it's a good time for 2.10 to change the default load balance
> > > policy to `ThresholdShedder`; it will make throughput more balanced
> > > between brokers.
> > >
> > > ### Proposed Changes
> > > In the 2.10 release, for `broker.conf`, change
> > > `loadBalancerLoadSheddingStrategy` from
> > > `org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to
> > > `org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
> >


[DISCUSSION] PIP-135: Include MetadataStore backend for Etcd

2022-01-11 Thread Matteo Merli
https://github.com/apache/pulsar/issues/13717

-

## Motivation

Since all the pieces that composed the proposal in PIP-45 were finally merged
and are currently ready for 2.10 release, it is now possible to add other
metadata backends that can be used to support a BookKeeper + Pulsar cluster.

One of the popular systems that is most commonly used as an alternative to
ZooKeeper is Etcd, thus it makes sense to have this as the first non-zookeeper
implementation.

## Goal

Provide an Etcd implementation for the `MetadataStore` API. This will allow
users to deploy Pulsar clusters using Etcd service for the metadata and it will
not require the presence of ZooKeeper.


## Implementation

 * Use the existing JEtcd Java client library for Etcd
 * Extends the `AbstractBatchedMetadataStore` class, in order to reuse the
   transparent batching logic that will be shared with the ZooKeeper
   implementation.

Work in progress: https://github.com/apache/pulsar/pull/13225



--
Matteo Merli
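For readers who want to try the new backend once it lands, here is a minimal sketch of creating an Etcd-backed store through the `MetadataStore` API. The `etcd:` URL scheme and the endpoint below are assumptions (the actual scheme is defined by the implementation in the PR above), and the factory/config calls assume the API shape merged under PIP-45.

```java
import java.util.Optional;

import org.apache.pulsar.metadata.api.MetadataStore;
import org.apache.pulsar.metadata.api.MetadataStoreConfig;
import org.apache.pulsar.metadata.api.MetadataStoreFactory;

public class EtcdMetadataStoreExample {
    public static void main(String[] args) throws Exception {
        // Assumed URL scheme ("etcd:") and endpoint; both are illustrative only.
        MetadataStore store = MetadataStoreFactory.create(
                "etcd:http://localhost:2379",
                MetadataStoreConfig.builder().build());

        // The MetadataStore API is backend-agnostic, so usage is the same as
        // with the ZooKeeper (or RocksDB) implementations.
        store.put("/example/key", "value".getBytes(), Optional.empty()).join();
        store.get("/example/key")
                .thenAccept(res -> System.out.println("present: " + res.isPresent()))
                .join();

        store.close();
    }
}
```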



Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States

2022-01-11 Thread Enrico Olivelli
Thank you for posting your PIP !

I am sharing some of Neng's concerns.
We should define clearly how security works.

Also, currently the function defines some "namespace" for the state
storage, and we recently added support for custom state storage
implementations. With this change, each function could potentially access
other state storage namespaces (think about using a database per
tenant).

We should also state something about guarantees while accessing
multiple storages and/or about transactional (atomic?) access.


Enrico

Il giorno mar 11 gen 2022 alle ore 21:38 Neng Lu  ha scritto:
>
> Before we advance further, we first need to get on the same page about the
> pros and cons of allowing this feature.
>
> If functions can access (especially write) other functions'
> state, data ownership becomes a mess, isolation is broken, and data
> security might be compromised.
>
>
>
>
>
> On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill 
> wrote:
>
> > Original PIP: https://github.com/apache/pulsar/issues/13633
> >
> > Pasted below for quoting convenience.
> >
> > -
> >
> > ## Motivation
> >
> > Certain uses of Pulsar functions could benefit from the ability to access
> > the states of other functions. Currently functions can only access their
> > own states, and so sharing information between functions requires other
> > solutions such as writing to a separate database.
> >
> > ## Goal
> >
> > The goal is to enable the ability for a function to access another
> > function's state. Given another function's tenant, namespace, and name, any
> > function should be able to access the other function's state for read and
> > write purposes. This PIP is not concerned with expanding the capabilities
> > of function states; it only deals with expanding access to function states.
> >
> > ## API Changes
> >
> > The Pulsar function API would be modified to have the function context
> > implement the following interface for accessing function states using a
> > tenant, namespace, and name.
> >
> > ```
> > public interface SharedContext {
> > /**
> >  * Update the state value for the key.
> >  *
> >  * @param key   name of the key
> >  * @param value state value of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > void putState(String key, ByteBuffer value, String tenant, String ns,
> > String name);
> >
> > /**
> >  * Update the state value for the key, but don't wait for the
> > operation to be completed
> >  *
> >  * @param key   name of the key
> >  * @param value state value of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > CompletableFuture<Void> putStateAsync(String key, ByteBuffer value,
> > String tenant, String ns, String name);
> >
> > /**
> >  * Retrieve the state value for the key.
> >  *
> >  * @param key name of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  * @return the state value for the key.
> >  */
> > ByteBuffer getState(String key, String tenant, String ns, String name);
> >
> > /**
> >  * Retrieve the state value for the key, but don't wait for the
> > operation to be completed
> >  *
> >  * @param key name of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  * @return the state value for the key.
> >  */
> > CompletableFuture<ByteBuffer> getStateAsync(String key, String tenant,
> > String ns, String name);
> >
> > /**
> >  * Delete the state value for the key.
> >  *
> >  * @param key   name of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > void deleteState(String key, String tenant, String ns, String name);
> >
> > /**
> >  * Delete the state value for the key, but don't wait for the
> > operation to be completed
> >  *
> >  * @param key   name of the key
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > CompletableFuture<Void> deleteStateAsync(String key, String tenant,
> > String ns, String name);
> >
> > /**
> >  * Increment the builtin distributed counter referred by key.
> >  *
> >  * @param keyThe name of the key
> >  * @param amount The amount to be incremented
> >  * @param tenant the state tenant name
> >  * @param ns the state namespace name
> >  * @param name the state store name
> >  */
> > void incrCounter(String key, long amount, String tenant, String ns,
> > String name);
> >
> >  

Re: [VOTE] PIP-117: Change Pulsar standalone defaults

2022-01-11 Thread Neng Lu
+1 (non-binding)

On Wed, Jan 5, 2022 at 7:19 AM Lan Liang  wrote:

> +1
>
>
>
>
>
>
> Best Regards,
> Lan Liang
> On 12/23/2021 19:21,Haiting Jiang wrote:
> +1
>
> Thanks,
> Haiting
>
> On 2021/12/23 05:35:03 Michael Marshall wrote:
> +1
>
> - Michael
>
> On Wed, Dec 22, 2021 at 6:18 PM Sijie Guo  wrote:
>
> +1
>
> On Tue, Dec 21, 2021 at 3:49 PM Matteo Merli  wrote:
>
> This is the voting thread for PIP-117. It will stay open for at least 48h.
>
> https://github.com/apache/pulsar/issues/13302
>
> 
>
> ## Motivation
>
> Pulsar standalone is the "Pulsar in a box" version of a Pulsar cluster,
> where
> all the components are started within the context of a single JVM process.
>
> Users are using the standalone as a way to get started quickly with Pulsar or
> in all the cases where it makes sense to have a single-node deployment.
>
> Right now, the standalone starts by default with many components, several of
> which are quite complex, since they are designed to be deployed in a
> distributed fashion.
>
> ## Goal
>
> Simplify the components of Pulsar standalone to achieve:
>
> 1. Reduce complexity
> 2. Reduce startup time
> 3. Reduce memory and CPU footprint of running standalone
>
> ## Proposed changes
>
> The proposal here is to change some of the default implementations that are
> used for the Pulsar standalone.
>
> 1. **Metadata Store implementation** -->
> Change from ZooKeeper to RocksDB
>
> 2. **Pulsar functions package backend** -->
> Change from using DistributedLog to using the local filesystem, storing the
> jars directly in the data folder instead of uploading them into BK.
>
> 3. **Pulsar functions state store implementation** -->
> Change the state store to be backed by a MetadataStore-based backend,
> with the RocksDB implementation.
>
> 4. **Table Service** -->
> Do not start BK table service by default
>
> ## Compatibility considerations
>
> In order to avoid compatibility issues where users have existing Pulsar
> standalone services that they want to upgrade without conflicts, we will
> follow the principle of keeping the old defaults where there is existing
> data on the disk.
>
> We will add a file, serving as a flag, in the `data/standalone`
> directory, for example `new-2.10-defaults`.
>
> If the file is present, or if the data directory is completely missing, we
> will adopt the new set of default configuration settings.
>
> If the file is not there, we will continue to use existing defaults and we
> will
> not break the upgrade operation.
>
>
> --
> Matteo Merli
> 
>
>
>

-- 
Best Regards,
Neng
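The compatibility rule in the quoted proposal is easy to capture in a few lines. A small sketch of the decision logic; the `new-2.10-defaults` file name comes from the PIP, while the class and method names below are made up for illustration:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StandaloneDefaultsCheck {

    // New defaults apply when the standalone data directory does not exist yet
    // (a fresh install) or when the marker file from the proposal is present.
    static boolean useNew210Defaults(Path standaloneDataDir) {
        if (!Files.exists(standaloneDataDir)) {
            return true;
        }
        return Files.exists(standaloneDataDir.resolve("new-2.10-defaults"));
    }

    public static void main(String[] args) {
        Path dataDir = Paths.get("data", "standalone");
        System.out.println("use new 2.10 defaults: " + useNew210Defaults(dataDir));
    }
}
```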


Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States

2022-01-11 Thread Neng Lu
Before we advance further, we first need to get on the same page about the
pros and cons of allowing this feature.

If functions can access (especially write) other functions'
state, data ownership becomes a mess, isolation is broken, and data
security might be compromised.





On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill 
wrote:

> Original PIP: https://github.com/apache/pulsar/issues/13633
>
> Pasted below for quoting convenience.
>
> -
>
> ## Motivation
>
> Certain uses of Pulsar functions could benefit from the ability to access
> the states of other functions. Currently functions can only access their
> own states, and so sharing information between functions requires other
> solutions such as writing to a separate database.
>
> ## Goal
>
> The goal is to enable the ability for a function to access another
> function's state. Given another function's tenant, namespace, and name, any
> function should be able to access the other function's state for read and
> write purposes. This PIP is not concerned with expanding the capabilities
> of function states; it only deals with expanding access to function states.
>
> ## API Changes
>
> The Pulsar function API would be modified to have the function context
> implement the following interface for accessing function states using a
> tenant, namespace, and name.
>
> ```
> public interface SharedContext {
> /**
>  * Update the state value for the key.
>  *
>  * @param key   name of the key
>  * @param value state value of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> void putState(String key, ByteBuffer value, String tenant, String ns,
> String name);
>
> /**
>  * Update the state value for the key, but don't wait for the
> operation to be completed
>  *
>  * @param key   name of the key
>  * @param value state value of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> CompletableFuture<Void> putStateAsync(String key, ByteBuffer value,
> String tenant, String ns, String name);
>
> /**
>  * Retrieve the state value for the key.
>  *
>  * @param key name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  * @return the state value for the key.
>  */
> ByteBuffer getState(String key, String tenant, String ns, String name);
>
> /**
>  * Retrieve the state value for the key, but don't wait for the
> operation to be completed
>  *
>  * @param key name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  * @return the state value for the key.
>  */
> CompletableFuture<ByteBuffer> getStateAsync(String key, String tenant,
> String ns, String name);
>
> /**
>  * Delete the state value for the key.
>  *
>  * @param key   name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> void deleteState(String key, String tenant, String ns, String name);
>
> /**
>  * Delete the state value for the key, but don't wait for the
> operation to be completed
>  *
>  * @param key   name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> CompletableFuture<Void> deleteStateAsync(String key, String tenant,
> String ns, String name);
>
> /**
>  * Increment the builtin distributed counter referred by key.
>  *
>  * @param keyThe name of the key
>  * @param amount The amount to be incremented
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> void incrCounter(String key, long amount, String tenant, String ns,
> String name);
>
> /**
>  * Increment the builtin distributed counter referred by key
>  * but don't wait for the completion of the increment operation
>  *
>  * @param keyThe name of the key
>  * @param amount The amount to be incremented
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> CompletableFuture<Void> incrCounterAsync(String key, long amount,
> String tenant, String ns, String name);
>
> /**
>  * Retrieve the counter value for the key.
>  *
>  * @param key name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  * @return the amount of the counter value for this key
>  */
> long getCounter(String key, 
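The quoted interface is cut off by the archive, but the shape of the proposal is clear enough to sketch how a function might use it. A hedged example, assuming the function's `Context` ends up exposing the proposed `SharedContext` methods; the tenant, namespace, and store names below are made up:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class CrossStateFunction implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) throws Exception {
        // Read a value from another function's state store, addressed by
        // tenant, namespace, and state store name as proposed in the PIP.
        ByteBuffer other = context.getState("last-input", "public", "default", "other-function");
        if (other != null) {
            System.out.println("other function saw: "
                    + StandardCharsets.UTF_8.decode(other));
        }

        // Write into this function's own store using the same overloads.
        context.putState("last-input",
                ByteBuffer.wrap(input.getBytes(StandardCharsets.UTF_8)),
                "public", "default", "this-function");
        return null;
    }
}
```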

Re: [VOTE] PIP-121: Pulsar cluster level auto failover on client side

2022-01-11 Thread Neng Lu
+1 (non-binding)

On Mon, Jan 10, 2022 at 12:40 AM PengHui Li  wrote:

> +1 (binding)
>
> Penghui
>
> On Mon, Jan 10, 2022 at 4:38 PM Enrico Olivelli 
> wrote:
>
> > +1 (binding)
> >
> > Enrico
> >
> > Il giorno lun 10 gen 2022 alle ore 07:45 Hang Chen
> >  ha scritto:
> > >
> > > This is the voting thread for PIP-121. It will stay open for at least
> 48
> > > hours.
> > >
> > > https://github.com/apache/pulsar/issues/13315
> > >
> > > Pasted below for quoting convenience.
> > >
> > > -
> > > ### Motivation
> > > We have geo-replication to support Pulsar cluster level failover. We
> > > can set up Pulsar cluster A as a primary cluster in data center A, and
> > > setup Pulsar cluster B as backup cluster in data center B. Then we
> > > configure geo-replication between cluster A and cluster B. All the
> > > clients are connected to the Pulsar cluster by DNS. If cluster A is
> > > down, we should switch the DNS to point the target Pulsar cluster from
> > > cluster A to cluster B. After the clients are resolved to cluster B,
> > > they can produce and consume messages normally. After cluster A
> > > recovers, the administrator should switch the DNS back to cluster A.
> > >
> > > However, the current method has two shortcomings.
> > > 1. The administrator should monitor the status of all Pulsar clusters,
> > > and switch the DNS as soon as possible when cluster A is down. The
> > > switch and recovery is not automatic and recovery time is controlled
> > > by the administrator, which will put the administrator under heavy
> > > load.
> > > 2. The Pulsar client and DNS system have a cache. When the
> > > administrator switches the DNS from cluster A to Cluster B, it will
> > > take some time for the cache to time out, which will delay client
> > > recovery and lead to produce/consume messages failing.
> > >
> > > ### Goal
> > > It's better to provide an automatic cluster level failure recovery
> > > mechanism to make pulsar cluster failover more effective. We should
> > > support Pulsar clients automatically switching from cluster A to cluster B
> > > when they detect cluster A is down according to the configured detection
> > > policy, and switching back to cluster A when it has recovered.
> > > The reason why we should switch back to cluster A is that most
> > > applications may be deployed in data center A and they have low
> > > network cost for communicating with pulsar cluster A. If they keep
> > > visiting pulsar cluster B, they have high network cost, and cause high
> > > produce/consume latency.
> > >
> > > In order to mitigate the DNS cache problem, we should provide an
> > > administrator controlled switch provider for administrators to update
> > > service URLs.
> > >
> > > In the end, we should provide an auto service URL switch provider and
> > > administrator controlled switch provider.
> > >
> > > ### Design
> > > We have already provided the `ServiceUrlProvider` interface to support
> > > different service URLs. In order to support automatic cluster level
> > > failure auto recovery, we can provide different ServiceUrlProvider
> > > implementations. For current requirements, we can provide
> > > `AutoClusterFailover` and `ControlledClusterFailover`.
> > >
> > >  AutoClusterFailover
> > > In order to support auto switching from the primary cluster to the
> > > secondary, we can provide a probe task, which will probe the activity
> > > of the primary cluster and the secondary one. When it finds the
> > > primary cluster has failed for more than `failoverDelayMs`, it will switch to
> > > the secondary cluster by calling `updateServiceUrl`. After switching
> > > to the secondary cluster, the `AutoClusterFailover` will continue to
> > > probe the primary cluster. If the primary cluster comes back and
> > > remains active for `switchBackDelayMs`, it will switch back to the
> > > primary cluster.
> > > The APIs are listed as follows.
> > >
> > > In order to support multiple secondary clusters, use List to store
> > > secondary cluster urls. When the primary cluster probe fails for
> > > failoverDelayMs, it will start to probe the secondary cluster list one
> > > by one, once it finds the active cluster, it will switch to the target
> > > cluster. Notice: if you configure multiple clusters, you should turn
> > > on cluster-level geo-replication to ensure topic data is synced between
> > > all primary and secondary clusters. Otherwise, topic data may be spread
> > > across different clusters, and the consumers won't get the whole data
> > > of the topic.
> > >
> > > In order to support different authentication configurations between
> > > clusters, we provide the authentication relation configurations
> > > updated with the target cluster.
> > >
> > > ```Java
> > > public class AutoClusterFailover implements ServiceUrlProvider {
> > >
> > >private AutoClusterFailover(AutoClusterFailoverBuilderImpl builder)
> {
> > > //
> > > }
> > >
> > > @Override
> > > public void 
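The code block quoted above is truncated by the archive. For illustration only, here is a sketch of how a client could plug in such a provider, assuming a builder along the lines implied by the quoted `AutoClusterFailoverBuilderImpl` constructor; the builder method names and the import package are assumptions, not part of the quoted text:

```java
import java.util.Collections;
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.ServiceUrlProvider;
import org.apache.pulsar.client.impl.AutoClusterFailover;   // package is an assumption here

public class AutoFailoverClientExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical builder: primary/secondary URLs plus the two delays
        // (failoverDelayMs / switchBackDelayMs) described in the PIP.
        ServiceUrlProvider failover = AutoClusterFailover.builder()
                .primary("pulsar://cluster-a.example.com:6650")
                .secondary(Collections.singletonList("pulsar://cluster-b.example.com:6650"))
                .failoverDelay(30, TimeUnit.SECONDS)
                .switchBackDelay(60, TimeUnit.SECONDS)
                .build();

        // The client is built against the provider rather than a fixed URL, so
        // the probe task can call updateServiceUrl() when the primary is down.
        PulsarClient client = PulsarClient.builder()
                .serviceUrlProvider(failover)
                .build();

        // ... create producers/consumers as usual ...
        client.close();
    }
}
```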

Re: [VOTE] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder

2022-01-11 Thread Neng Lu
+1 (non-binding)

On Mon, Jan 10, 2022 at 12:39 AM Enrico Olivelli 
wrote:

> +1 (binding)
>
> Enrico
>
> Il giorno lun 10 gen 2022 alle ore 07:47 Hang Chen
>  ha scritto:
>
> >
> > This is the voting thread for PIP-122. It will stay open for at least 48
> > hours.
> >
> > https://github.com/apache/pulsar/issues/13340
> >
> > Pasted below for quoting convenience.
> >
> > 
> >
> > ### Motivation
> > The ThresholdShedder load balance policy has been available since Pulsar 2.6.0
> > (https://github.com/apache/pulsar/pull/6772). It can resolve many load
> > balance issues of `OverloadShedder` and works well in many Pulsar
> > production clusters.
> >
> > In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance
> > policy is `OverloadShedder`.
> >
> > I think it's a good time for 2.10 to change the default load balance
> > policy to `ThresholdShedder`; it will make throughput more balanced
> > between brokers.
> >
> > ### Proposed Changes
> > In the 2.10 release, for `broker.conf`, change
> > `loadBalancerLoadSheddingStrategy` from
> > `org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to
> > `org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
>