[VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2
This is the second release candidate for Apache Pulsar Client C++, version 3.4.0.

It fixes the following issues:
https://github.com/apache/pulsar-client-cpp/milestone/5?closed=1

*** Please download, test and vote on this release. This vote will stay open for at least 72 hours ***

Note that we are voting upon the source (tag), binaries are provided for convenience.

Source and binary files:
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-cpp/pulsar-client-cpp-3.4.0-candidate-2/

SHA-512 checksums:
10517590a2e4296d6767a044e58dd32c79e404a5136cf41126f4cb2416ed0ef8fb1ad7aa7da54c37c12ed71b05a527ed08bfac9d50d3550fa5d475c1e8c00950  apache-pulsar-client-cpp-3.4.0.tar.gz

The tag to be voted upon:
v3.4.0-candidate-2 (f337eff7caae93730ec1260810655cbb5a345e70)
https://github.com/apache/pulsar-client-cpp/releases/tag/v3.4.0-candidate-2

Pulsar's KEYS file containing PGP keys you use to sign the release:
https://downloads.apache.org/pulsar/KEYS

Please download the source package, and follow the README to compile and test.
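For voters who want to script the checksum step, here is a minimal sketch in Python. The helper names `sha512_of_file` and `matches_checksum` are illustrative and not part of any release tooling. The sketch only covers the SHA-512 comparison and does not replace verifying the PGP signature against the KEYS file.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

A voter would call `matches_checksum("apache-pulsar-client-cpp-3.4.0.tar.gz", "<published digest>")` on the downloaded source package and expect `True`.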
Re: [DISCUSS] Don't retain null-key messages during topic compaction
Hi mattison,

Thanks for your suggestion. I agree with you: we can add a configuration to smooth the migration for this change.

Let's see if anyone else has other ideas. If everyone agrees with this approach, I'll implement it.

Thanks,
Cong Zhao

On 2023/11/07 03:03:58 mattison chao wrote:
> Hi, Cong
>
> IMO, Please do not break the previous directly. We can migrate it smoothly.
> We can add a configuration and give the
> Timeline of making this configuration default and removing it in the next
> release version.
>
> For example:
>
> - Add configuration: `compactionRemainNullKey=true` by default (current
> behaviour)
> - Make `compactionRemainNullKey=false` default in the 3.2.0
> - Delete the configuration `compactionRemainNullKey` in 3.3.0.
>
> This approach will avoid breaking changes and give our users enough time to
> migrate their usage.
>
> Plus, I think it's fair to cherry-pick it to all the previous active
> branches.
>
> Thanks!
> Mattison
>
> > On Nov 7, 2023, at 10:55, Cong Zhao wrote:
> >
> > Hi, Pulsar community
> >
> > Currently, we retain all null-key messages during topic compaction, which I
> > don't think is necessary because when you use topic compaction,
> > it means that you want to retain the value according to the key, so
> > retaining null-key messages is meaningless.
> >
> > Additionally, retaining all null-key messages will double the storage cost,
> > and we'll never be able to clean them up since the compacted topic has not
> > supported the retention policy yet.
> >
> > In summary, I don't think we should retain null-key messages during topic
> > compaction.
> > Looking forward to your feedback!
> >
> > Thanks,
> > Cong Zhao
Re: [DISCUSS] Don't retain null-key messages during topic compaction
Hi, Cong

IMO, please do not break the previous behaviour directly. We can migrate smoothly: add a configuration, and give a timeline for changing its default and then removing it in later releases.

For example:

- Add configuration: `compactionRemainNullKey=true` by default (current behaviour)
- Make `compactionRemainNullKey=false` the default in 3.2.0
- Delete the configuration `compactionRemainNullKey` in 3.3.0

This approach will avoid breaking changes and give our users enough time to migrate their usage.

Plus, I think it's fair to cherry-pick it to all the previous active branches.

Thanks!
Mattison

> On Nov 7, 2023, at 10:55, Cong Zhao wrote:
>
> Hi, Pulsar community
>
> Currently, we retain all null-key messages during topic compaction, which I
> don't think is necessary because when you use topic compaction,
> it means that you want to retain the value according to the key, so
> retaining null-key messages is meaningless.
>
> Additionally, retaining all null-key messages will double the storage cost,
> and we'll never be able to clean them up since the compacted topic has not
> supported the retention policy yet.
>
> In summary, I don't think we should retain null-key messages during topic
> compaction.
> Looking forward to your feedback!
>
> Thanks,
> Cong Zhao
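To make the two behaviours in this thread concrete, here is a minimal sketch in Python of a compaction pass over an in-memory list. It is not the broker's compaction code. The function `compact` and its `retain_null_key` parameter are hypothetical stand-ins for the proposed `compactionRemainNullKey` setting, and other details of real compaction are ignored.

```python
def compact(messages, retain_null_key=True):
    """Keep the latest value per key; optionally keep every null-key message.

    `messages` is a list of (key, value) tuples in publish order.
    `retain_null_key` is a hypothetical stand-in for `compactionRemainNullKey`.
    """
    # First pass: remember the position of the last message for each non-null key.
    latest_index = {}
    for i, (key, _) in enumerate(messages):
        if key is not None:
            latest_index[key] = i

    # Second pass: emit survivors in their original order.
    compacted = []
    for i, (key, value) in enumerate(messages):
        if key is None:
            if retain_null_key:
                compacted.append((key, value))
        elif latest_index[key] == i:
            compacted.append((key, value))
    return compacted
```

With `retain_null_key=True` every null-key message survives compaction, so such messages are never reduced; with `False` only the latest value per key remains.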
[DISCUSS] Don't retain null-key messages during topic compaction
Hi, Pulsar community

Currently, we retain all null-key messages during topic compaction. I don't think this is necessary: using topic compaction means that you want to retain the latest value for each key, so retaining null-key messages is meaningless.

Additionally, retaining all null-key messages will double the storage cost, and we'll never be able to clean them up, since compacted topics do not support the retention policy yet.

In summary, I don't think we should retain null-key messages during topic compaction.

Looking forward to your feedback!

Thanks,
Cong Zhao
Re: [DISCUSS] PIP-310: Support custom publish rate limiters
Hello Lari, inline once again.

On Mon, Nov 6, 2023 at 5:44 PM Lari Hotari wrote:

> Hi Girish,
>
> Replies inline. We are getting into a very detailed discussion. We
> could also discuss this topic in one of the upcoming Pulsar Community
> meetings. However, I might miss the next meeting that is scheduled
> this Thursday.
>

Is this every Thursday? I am willing to meet at a separate time as well if enough folks with a viewpoint on this can meet together. I assume the community meeting has a much bigger agenda, so detailed discussions are not possible?

> Although I am currently opposing to your proposal PIP-310, I am
> supporting solving your problems related to rate limiting. :)
> Let's continue the discussion since that is necessary so that we could
> make progress. I hope this makes sense from your perspective.
>

It is all good, as long as the final goal is met within reasonable timelines.

> I acknowledge that there are different usages, but my assumption is
> that we could implement a generic solution that could be configured to
> handle each specific use case.
> I haven't yet seen any evidence that the requirements in your case are
> so special that it justifies adding a pluggable interface for rate
>

Well, the blacklisting use case is a very specific use case. I am explaining below why that can't be done using metrics and a separate blacklisting API.

> limiters. Exposing yet another pluggable interface in Pulsar will add
> complexity without gains. Each supported public interface is a
> maintenance burden if we care about the quality of the exposed
> interfaces and put effort in ensuring that the interfaces are
> supported in future versions. Exposing an interface will also lock
> down or slow down some future refactorings.
>

This actually might be a blessing in disguise, at least for RateLimiter and PublishRateLimiter.java: being an internal interface, it has gotten out of hand and gone unchecked. Explained more below.

> One concrete example of this is the desired behavior of bursting. In
> token bucket rate limiting, bursting is about using the buffered
> tokens in the "token bucket" and having a configurable limit for the
> buffer (the "bucket"). This buffer will usually only contain tokens
> when the actual rate has been lower than the configured maximum rate
> for some duration.
>
> However, there could be an expectation for a different type of
> bursting which is more like "auto scaling" of the rate limit in a way
> where the end-to-end latency of the produced messages
> is taken into account. The expected behavior might be about scaling
> the rate temporarily to a higher rate so that the queues can be
>

I would like to keep auto-scaling out of scope for this discussion. That opens up another huge can of worms, especially given the gaps in proper scale-down support in Pulsar.

> I don't know what "bursting" means for you. Would it be possible to
> provide concrete examples of desired behavior? That would be very
> helpful in making progress.
>

Here are a few different use cases:

- One or more producers are producing at a near-constant rate into a topic, with equal distribution among partitions. Due to a hiccup in their downstream component, the produce rate goes to 0 for a few seconds, and to compensate, the produce rate tries to double in the next few seconds.
- With a visitor-based produce rate (where the rate goes up during the day and down at night; think of the hourly view-count pattern of a popular website), there are cases when, due to certain external or internal triggers, the views, and thus the produce rate, spike for a few minutes. It is also important to keep this in check so that bots cannot DDoS the system. That might be the responsibility of an upstream system such as an API gateway, but we cannot ignore it completely.
- In streaming systems with micro-batches, there might be constant fluctuations in produce rate from time to time, based on batch failures or retries.

In all of these situations, setting the throughput of the topic to the absolute maximum of the various spikes observed during the day is very suboptimal.

Moreover, in each of these situations, once bursting support is present in the system, it would also need proper checks in place to penalize producers that try to misuse the system. In a true multi-tenant platform, this is very critical. Thus, blacklisting actually goes hand in hand here. Explained more below.

> It's interesting that you mention that you would like to improve the
> PublishRateLimiter interface.
> How would you change it?
>

The current interface of PublishRateLimiter has duplicate methods. I am assuming that after an initial implementation (poller), the next implementation simply added more methods to the interface rather than using the ones that already existed. For instance, there are both `tryAcquire` and
Re: [DISCUSS] PIP-310: Support custom publish rate limiters
Hi Girish,

Replies inline. We are getting into a very detailed discussion. We could also discuss this topic in one of the upcoming Pulsar Community meetings. However, I might miss the next meeting that is scheduled this Thursday.

Although I am currently opposed to your proposal PIP-310, I support solving your problems related to rate limiting. :)
Let's continue the discussion, since that is necessary for us to make progress. I hope this makes sense from your perspective.

On Sat, 4 Nov 2023 at 17:53, Girish Sharma wrote:
>
> There are challenges in this. As explained in the PIP, there are several
> different usages of rate limiter, stats, unloading, etc. While I am open to
> having a burstable rate limiter in pulsar out of box, it might complicate
> things considering backward compatibility etc. More on this below.
>

I acknowledge that there are different usages, but my assumption is that we could implement a generic solution that could be configured to handle each specific use case. I haven't yet seen any evidence that the requirements in your case are so special that they justify adding a pluggable interface for rate limiters. Exposing yet another pluggable interface in Pulsar will add complexity without gains. Each supported public interface is a maintenance burden if we care about the quality of the exposed interfaces and put effort into ensuring that the interfaces are supported in future versions. Exposing an interface will also lock down or slow down some future refactorings.

There will be a need to refactor and improve rate limiters as part of the flow control and back pressure improvements within the Pulsar broker. I'd rather keep the rate limiter internal interfaces an internal implementation detail instead of leaking the details to an exposed public interface. That's why we should primarily look for a generic solution.
I hope we could put effort into looking at the characteristics of your requirements and attempt to sketch a design for a generic solution that could be configured for your purposes.

> > The problems you are describing seem to be common to many Pulsar use cases,
> > and therefore, I think they should be handled directly in Pulsar.
> >
>
> I personally haven't seen many burstability related discussions; so this
> feature might actually not be that useful for all current Pulsar users.

In general there are not many advanced discussions on the mailing list about the Pulsar internals. It may also be difficult for others to recognize that specific behaviors could be resolved by enhancing flow control, back pressure, and rate limiting/throttling mechanisms.

> I would personally suggest we tackle this problem in parts so that it's
> available incrementally over versions rather than making the scope so big
> that it takes pulsar 4.0 for these features to land.

Sure, quick delivery time is the goal of everyone. Before talking about schedules, we should be able to discuss the use case and the design of the desired type of rate limiting and throttling in more depth.

One concrete example of this is the desired behavior of bursting. In token bucket rate limiting, bursting is about using the buffered tokens in the "token bucket" and having a configurable limit for the buffer (the "bucket"). This buffer will usually only contain tokens when the actual rate has been lower than the configured maximum rate for some duration.

However, there could be an expectation for a different type of bursting which is more like "auto scaling" of the rate limit, in a way where the end-to-end latency of the produced messages is taken into account. The expected behavior might be about scaling the rate temporarily to a higher rate so that the queues can be cleared and the latency of the messages being sent stays under a target latency.
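The token-bucket form of bursting described above can be sketched as follows. This is a minimal illustration in Python with an explicit clock, not Pulsar's rate limiter. The class `TokenBucket` and its `try_acquire` method are hypothetical names, and the rate and capacity values used with it are assumptions chosen for the example.

```python
class TokenBucket:
    """Token bucket: tokens refill at `rate` per second, capped at `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # configured maximum sustained rate (tokens/second)
        self.capacity = capacity  # size of the buffer, which bounds the burst
        self.tokens = capacity    # the bucket starts full
        self.last = now           # time of the last refill

    def _refill(self, now):
        # Tokens only accumulate while the actual rate is below the configured
        # rate, and never beyond the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, permits, now):
        """Consume `permits` tokens if available at time `now`; return success."""
        self._refill(now)
        if permits <= self.tokens:
            self.tokens -= permits
            return True
        return False
```

With `TokenBucket(rate=100, capacity=200)`, a producer that has been idle for a few seconds can publish up to 200 messages at once, after which it is held to 100 per second until the bucket refills. This sketch has no feedback loop, so it cannot express the latency-driven "auto scaling" type of bursting.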
The current org.apache.pulsar.broker.service.PublishRateLimiter interface cannot control the aspects that would be needed to handle this type of bursting, where we actually need to scale up the rate limit based on end-to-end feedback. The PublishRateLimiter interface doesn't have feedback loops currently.

I don't know what "bursting" means for you. Would it be possible to provide concrete examples of desired behavior? That would be very helpful in making progress.

> Yes, we were also thinking on the same terms once this is pluggable. The
> idea was to have some numbers and real world usage backing an
> implementation of rate limiter before merging it back into pulsar. Any
> decision we would take right now would be limited only by theoretical
> discussion of the implementation and our assumption that it covers 99% of
> the use cases, probably just like how the precise and poller ones came into
> being.

This will remain a theoretical discussion without concrete examples of the desired behavior or of the problem we want to solve. Initially, we need to have a good grasp of the problem we want to solve. The solution might change while we experiment and iterate. It will be
[DISCUSS] PIP-317: Add `bookkeeperDeleted` field to show whether a ledger is deleted from the Bookie while using tiered storage
Hi community,

The motivation behind this PIP is to provide administrators and users with better insights into the state of the ledgers and the overall storage usage. By including the `bookkeeperDeleted` field in the `ledgers`, we can make it easier for users to understand the current state of their ledgers, which can be helpful for monitoring and troubleshooting purposes.

Looking forward to the discussion.

PIP: https://github.com/apache/pulsar/pull/21521
Related PR/issue: https://github.com/apache/pulsar/pull/20833

--
Best Regards,
Shen Liu