[VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2

2023-11-06 Thread Yunze Xu
This is the second release candidate for Apache Pulsar Client C++,
version 3.4.0.

It fixes the following issues:
https://github.com/apache/pulsar-client-cpp/milestone/5?closed=1

*** Please download, test and vote on this release. This vote will stay open
for at least 72 hours ***

Note that we are voting upon the source (tag); binaries are provided for
convenience.

Source and binary files:
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-cpp/pulsar-client-cpp-3.4.0-candidate-2/

SHA-512 checksums:

10517590a2e4296d6767a044e58dd32c79e404a5136cf41126f4cb2416ed0ef8fb1ad7aa7da54c37c12ed71b05a527ed08bfac9d50d3550fa5d475c1e8c00950
 apache-pulsar-client-cpp-3.4.0.tar.gz


The tag to be voted upon:
v3.4.0-candidate-2 (f337eff7caae93730ec1260810655cbb5a345e70)
https://github.com/apache/pulsar-client-cpp/releases/tag/v3.4.0-candidate-2

Pulsar's KEYS file containing the PGP keys used to sign the release:
https://downloads.apache.org/pulsar/KEYS

Please download the source package, and follow the README to compile and test.
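
As a minimal sketch of the checksum part of that verification, the snippet
below recomputes the SHA-512 digest of the downloaded tarball and compares it
with the value listed above. It assumes the tarball was saved in the current
directory under the file name given in this mail; the helper names
`sha512_of_file` and `matches_expected` are illustrative only. It does not
check the PGP signature, which still has to be verified separately against the
KEYS file.

```python
import hashlib
import os

# Checksum and file name copied from this vote mail.
EXPECTED_SHA512 = (
    "10517590a2e4296d6767a044e58dd32c79e404a5136cf41126f4cb2416ed0ef8fb"
    "1ad7aa7da54c37c12ed71b05a527ed08bfac9d50d3550fa5d475c1e8c00950"
)
TARBALL = "apache-pulsar-client-cpp-3.4.0.tar.gz"  # assumed local path


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with an expected hex string."""
    return sha512_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    if os.path.exists(TARBALL):
        ok = matches_expected(TARBALL, EXPECTED_SHA512)
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print(f"{TARBALL} not found; download it first")
```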


Re: [DISCUSS] Don't retain null-key messages during topic compaction

2023-11-06 Thread Cong Zhao
Hi mattison,

Thanks for your suggestion. I agree with you: we can add a configuration to 
smooth the migration for this change.

Let's see if anyone else has any other ideas, and if everyone agrees with this 
approach, I'll implement it.

Thanks,
Cong Zhao

On 2023/11/07 03:03:58 mattison chao wrote:
> Hi, Cong
> 
> IMO, please do not break the previous behavior directly. We can migrate
> smoothly: add a configuration and give a timeline for changing its default
> and then removing it in later release versions.
> 
> For example:
> 
> - Add configuration: `compactionRemainNullKey=true` by default (current 
> behaviour)
> - Make `compactionRemainNullKey=false` the default in 3.2.0
> - Delete the configuration `compactionRemainNullKey` in 3.3.0.
> 
> This approach will avoid breaking changes and give our users enough time to 
> migrate their usage.
> 
> Plus, I think it’s fair to cherry-pick it to all the previous active 
> branches. 
> 
> 
> Thanks!
> Mattison
> 
> 
> 
> > On Nov 7, 2023, at 10:55, Cong Zhao wrote:
> > 
> > Hi, Pulsar community
> > 
> > Currently, we retain all null-key messages during topic compaction, which I
> > don't think is necessary because when you use topic compaction,
> > it means that you want to retain the value according to the key, so
> > retaining null-key messages is meaningless.
> > 
> > Additionally, retaining all null-key messages will double the storage cost,
> > and we'll never be able to clean them up since the compacted topic does not
> > support the retention policy yet.
> > 
> > In summary, I don't think we should retain null-key messages during topic
> > compaction.
> > Looking forward to your feedback!
> > 
> > Thanks,
> > Cong Zhao
> 
> 


Re: [DISCUSS] Don't retain null-key messages during topic compaction

2023-11-06 Thread mattison chao
Hi, Cong

IMO, please do not break the previous behavior directly. We can migrate
smoothly: add a configuration and give a timeline for changing its default
and then removing it in later release versions.

For example:

- Add configuration: `compactionRemainNullKey=true` by default (current 
behaviour)
- Make `compactionRemainNullKey=false` the default in 3.2.0
- Delete the configuration `compactionRemainNullKey` in 3.3.0.

This approach will avoid breaking changes and give our users enough time to 
migrate their usage.
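
As a rough sketch of what such a switch would control, the toy function below
compacts an in-memory log of (key, value) pairs. This is not the broker's
compactor: `compact` and its `retain_null_key` parameter are hypothetical
stand-ins for the proposed `compactionRemainNullKey` setting, and it ignores
details such as empty-value messages acting as deletes.

```python
from typing import List, Optional, Tuple

Message = Tuple[Optional[str], str]  # (key, value); key may be None


def compact(messages: List[Message], retain_null_key: bool = True) -> List[Message]:
    """Keep only the latest message per key, preserving log order.

    Null-key messages are kept only when retain_null_key is True
    (the current behaviour described in this thread).
    """
    # Remember the position of the last message seen for each non-null key.
    last_index = {}
    for i, (key, _) in enumerate(messages):
        if key is not None:
            last_index[key] = i

    compacted = []
    for i, (key, value) in enumerate(messages):
        if key is None:
            if retain_null_key:
                compacted.append((key, value))
        elif last_index[key] == i:
            compacted.append((key, value))
    return compacted


log = [("a", "1"), (None, "x"), ("a", "2"), ("b", "3"), (None, "y")]
with_null = compact(log, retain_null_key=True)
without_null = compact(log, retain_null_key=False)
```

With the flag set to true, `with_null` still carries both null-key messages
next to the latest values for `a` and `b`; with it set to false, only the
keyed messages remain.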

Plus, I think it’s fair to cherry-pick it to all the previous active branches. 


Thanks!
Mattison



> On Nov 7, 2023, at 10:55, Cong Zhao wrote:
> 
> Hi, Pulsar community
> 
> Currently, we retain all null-key messages during topic compaction, which I
> don't think is necessary because when you use topic compaction,
> it means that you want to retain the value according to the key, so
> retaining null-key messages is meaningless.
> 
> Additionally, retaining all null-key messages will double the storage cost,
> and we'll never be able to clean them up since the compacted topic does not
> support the retention policy yet.
> 
> In summary, I don't think we should retain null-key messages during topic
> compaction.
> Looking forward to your feedback!
> 
> Thanks,
> Cong Zhao



[DISCUSS] Don't retain null-key messages during topic compaction

2023-11-06 Thread Cong Zhao
Hi, Pulsar community

Currently, we retain all null-key messages during topic compaction, which I
don't think is necessary because when you use topic compaction,
it means that you want to retain the value according to the key, so
retaining null-key messages is meaningless.

Additionally, retaining all null-key messages will double the storage cost,
and we'll never be able to clean them up since the compacted topic does not
support the retention policy yet.

In summary, I don't think we should retain null-key messages during topic
compaction.
Looking forward to your feedback!

Thanks,
Cong Zhao


Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-06 Thread Girish Sharma
Hello Lari, inline once again.


On Mon, Nov 6, 2023 at 5:44 PM Lari Hotari wrote:

> Hi Girish,
>
> Replies inline. We are getting into a very detailed discussion. We
> could also discuss this topic in one of the upcoming Pulsar Community
> meetings. However, I might miss the next meeting that is scheduled
> this Thursday.
>

Is this every Thursday? I am willing to meet at a separate time as well if
enough folks with a viewpoint on this can meet together. I assume that the
community meeting has a much bigger agenda, so detailed discussions are not
possible?



> Although I currently oppose your proposal PIP-310, I am
> supporting solving your problems related to rate limiting. :)
> Let's continue the discussion since that is necessary so that we could
> make progress. I hope this makes sense from your perspective.
>
>
It is all good, as long as the final goal is met within reasonable
timelines.


>
> I acknowledge that there are different usages, but my assumption is
> that we could implement a generic solution that could be configured to
> handle each specific use case.
> I haven't yet seen any evidence that the requirements in your case are
> so special that they justify adding a pluggable interface for rate
>

Well, the blacklisting use case is a very specific use case. I am
explaining below why that can't be done using metrics and a separate
blacklisting API.


> limiters. Exposing yet another pluggable interface in Pulsar will add
> complexity without gains. Each supported public interface is a
> maintenance burden if we care about the quality of the exposed
> interfaces and put effort in ensuring that the interfaces are
> supported in future versions. Exposing an interface will also lock
> down or slow down some future refactorings.
>

This actually might be a blessing in disguise, at least for RateLimiter and
PublishRateLimiter.java: being an internal interface, it has gone unchecked
and gotten out of hand. Explained more below.


> One concrete example of this is the desired behavior of bursting. In
> token bucket rate limiting, bursting is about using the buffered
> tokens in the "token bucket" and having a configurable limit for the
> buffer (the "bucket"). This buffer will usually only contain tokens
> when the actual rate has been lower than the configured maximum rate
> for some duration.
>
> However, there could be an expectation for a different type of
> bursting which is more like "auto scaling" of the rate limit in a way
> where the end-to-end latency of the produced messages
> is taken into account. The expected behavior might be about scaling
> the rate temporarily to a higher rate so that the queues can be
>

I would like to keep auto-scaling out of scope for this discussion. That
opens up another huge can of worms, especially given the gaps in proper
scale-down support in Pulsar.


>
> I don't know what "bursting" means for you. Would it be possible to
> provide concrete examples of desired behavior? That would be very
> helpful in making progress.
>
>
Here are a few different use cases:

   - One or more producers are producing at a near-constant rate into a topic,
   with equal distribution among partitions. Due to a hiccup in their
   downstream component, the produce rate drops to 0 for a few seconds, and
   to compensate, the produce rate tries to double over the next few seconds.
   - With a visitor-based produce rate (where the produce rate goes up during
   the day and down at night; think of the hourly view-count pattern of a
   popular website), there are cases when, due to certain external or
   internal triggers, the views, and thus the produce rate, spike for a few
   minutes. It is also important to keep this in check so that bots cannot
   DDoS the system. That might be the responsibility of an upstream system
   such as an API gateway, but we cannot ignore it completely.
   - In streaming systems that use micro-batches, the produce rate may
   fluctuate constantly, based on batch failures or retries.

In all of these situations, setting the throughput of the topic to be the
absolute maximum of the various spikes observed during the day is very
suboptimal.

Moreover, in each of these situations, once bursting support is present in
the system, it would also need proper checks in place to penalize producers
that try to misuse the system. In a true multi-tenant platform, this is
critical. Thus, blacklisting actually goes hand in hand here. Explained more
below.



> It's interesting that you mention that you would like to improve the
> PublishRateLimiter interface.
> How would you change it?
>
>
The current interface of PublishRateLimiter has duplicate methods. I am
assuming after an initial implementation (poller), the next implementation
simply added more methods into the interface rather than actually using the
ones already existing.
For instance, there are both `tryAcquire` and 

Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-06 Thread Lari Hotari
Hi Girish,

Replies inline. We are getting into a very detailed discussion. We
could also discuss this topic in one of the upcoming Pulsar Community
meetings. However, I might miss the next meeting that is scheduled
this Thursday.
Although I currently oppose your proposal PIP-310, I am
supporting solving your problems related to rate limiting. :)
Let's continue the discussion since that is necessary so that we could
make progress. I hope this makes sense from your perspective.

On Sat, 4 Nov 2023 at 17:53, Girish Sharma wrote:
>
> There are challenges in this. As explained in the PIP, there are several
> different usages of rate limiter, stats, unloading, etc. While I am open to
> having a burstable rate limiter in Pulsar out of the box, it might complicate
> things considering backward compatibility etc. More on this below.
>

I acknowledge that there are different usages, but my assumption is
that we could implement a generic solution that could be configured to
handle each specific use case.
I haven't yet seen any evidence that the requirements in your case are
so special that they justify adding a pluggable interface for rate
limiters. Exposing yet another pluggable interface in Pulsar will add
complexity without gains. Each supported public interface is a
maintenance burden if we care about the quality of the exposed
interfaces and put effort in ensuring that the interfaces are
supported in future versions. Exposing an interface will also lock
down or slow down some future refactorings.
There will be a need to refactor and improve rate limiters as part of
the flow control and back pressure improvements within the Pulsar
broker. I'd rather keep the rate limiter internal interfaces an
internal implementation detail instead of leaking the details to an
exposed public interface. That's why we should primarily look for a
generic solution. I hope we could put effort in looking into the
characteristics of your requirements and attempt to sketch a design
for a generic solution that could be configured for your purposes.

> > The problems you are describing seem to be common to many Pulsar use cases,
> > and therefore, I think they should be handled directly in Pulsar.
> >
>
> I personally haven't seen many burstability related discussions; so this
> feature might actually not be that useful for all current Pulsar users.

In general there are not many advanced discussions on the mailing list
about the Pulsar internals.
It may also be difficult for others to recognize that specific
behaviors could be resolved by enhancing flow control, back pressure,
and rate limiting/throttling mechanisms.

> I would personally suggest we tackle this problem in parts so that it's
> available incrementally over versions rather than making the scope so big
> that it takes pulsar 4.0 for these features to land.

Sure, quick delivery time is the goal of everyone. Before talking
about schedules, we should be able to discuss the use case and the
design of the desired type of rate limiting and throttling in more
depth.

One concrete example of this is the desired behavior of bursting. In
token bucket rate limiting, bursting is about using the buffered
tokens in the "token bucket" and having a configurable limit for the
buffer (the "bucket"). This buffer will usually only contain tokens
when the actual rate has been lower than the configured maximum rate
for some duration.
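
To make this first meaning of bursting concrete, here is a minimal token
bucket sketch. It is an illustration under stated assumptions, not the
broker's rate limiter: the `TokenBucket` class, its method names, and the
explicit `now` argument (used instead of a real clock so the behavior is
deterministic) are all hypothetical.

```python
class TokenBucket:
    """Toy token bucket: tokens accrue at a fixed rate up to a capacity."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec   # configured maximum sustained rate
        self.capacity = capacity   # size of the "bucket" (burst limit)
        self.tokens = 0.0          # start with an empty bucket
        self.last = now            # time of the last refill

    def _refill(self, now):
        # Add tokens for the elapsed time, never exceeding the capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, permits, now):
        """Consume permits if enough tokens are buffered; otherwise refuse."""
        self._refill(now)
        if permits <= self.tokens:
            self.tokens -= permits
            return True
        return False


bucket = TokenBucket(rate_per_sec=10, capacity=50, now=0.0)
steady = bucket.try_acquire(10, now=1.0)    # one second of tokens is available
denied = bucket.try_acquire(1, now=1.0)     # bucket is empty again
burst = bucket.try_acquire(50, now=11.0)    # 10 idle seconds, capped at capacity
```

After ten idle seconds the bucket holds only 50 tokens rather than 100, so
the burst is bounded by the configured bucket size and is only available
because the actual rate stayed below the maximum for a while.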

However, there could be an expectation for a different type of
bursting which is more like "auto scaling" of the rate limit in a way
where the end-to-end latency of the produced messages
is taken into account. The expected behavior might be about scaling
the rate temporarily to a higher rate so that the queues can be
cleared and that the latency of the messages being sent stays under a
target latency. The current
org.apache.pulsar.broker.service.PublishRateLimiter interface cannot
control aspects that would be needed to handle this type of bursting
where we actually need to scale up the rate limit based on end-to-end
feedback. The PublishRateLimiter interface doesn't have feedback
loops currently.

I don't know what "bursting" means for you. Would it be possible to
provide concrete examples of desired behavior? That would be very
helpful in making progress.

> Yes, we were also thinking on the same terms once this is pluggable. The
> idea was to have some numbers and real world usage backing an
> implementation of rate limiter before merging it back into pulsar. Any
> decision we would take right now would be limited only by theoretical
> discussion of the implementation and our assumption that it covers 99% of
> the use cases, probably just like how the precise and poller ones came into
> being.

This will remain theoretical discussion without concrete examples of
desired behavior or what problem we want to solve.
Initially, we need to have a good grasp on the problem we want to
solve. The solution might change while we experiment and iterate. It
will be 

[DISCUSS] PIP-317: Add `bookkeeperDeleted` field to show whether a ledger is deleted from the Bookie while using tiered storage

2023-11-06 Thread 刘燊
Hi community,

The motivation behind this PIP is to provide administrators and users with 
better insights into the state of the ledgers and the overall storage usage. By 
including the `bookkeeperDeleted` field in the `ledgers`, we can make it easier 
for users to understand the current state of their ledgers, which can be 
helpful for monitoring and troubleshooting purposes.


I look forward to the discussion.
PIP: https://github.com/apache/pulsar/pull/21521
Related PR/issue: https://github.com/apache/pulsar/pull/20833




--

Best Regards,
Shen Liu