[GitHub] [pulsar-dotpulsar] blankensteiner commented on issue #105: Support - Custom authentication
blankensteiner commented on issue #105: URL: https://github.com/apache/pulsar-dotpulsar/issues/105#issuecomment-1149485733 Is this not covered by IAuthentication? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [DISCUSS] PIP-175: Extend time based release process
I'm not sure I fully understand the LTS release and feature release. > The LTS releases will be identified by being a `.0` version. For example: > * `3.0` -> LTS > * `3.1` -> regular release > * `3.2` -> regular release > * `4.0` -> LTS In this example, we can only introduce new features in 3.1 and 3.2, and 3.1.x and 3.2.x should be the patch releases based on the feature release? We can have one patch release per month. 3.0.x is the LTS release that will be supported for at most three years. After we have the 4.0 LTS release, we will still support the 3.0.x LTS for at least 18 months. And the 4.0 LTS will have all the new features from 3.x > This can be translated into: > * We support the last 2 LTS releases and the last 2 feature releases > * Security patches are provided for the past 3 LTS releases and 2 > feature releases Does this mean we can introduce new features in 3.x even if we have 4.x? And how many patch releases will we support for the feature releases, such as 3.1.x, 3.2.x, 3.3.x, 4.1.x, 4.2.x? I think 2 feature releases means 3.x and 4.x here? Thanks, Penghui On Wed, Jun 8, 2022 at 12:41 PM Michael Marshall wrote: > Thanks for putting together this PIP to continue this discussion, > Matteo. This is an important one. > > I'll need time to think over your points before I respond, but I want > to address two of them right away. > > > Actually, I was wrong. PIP-47 says the last 4 releases so 2.7 would be > included. > > The point though still remains in that there's nowhere in the website > > where a user could check until when the 2.7 release is going to be > > supported. > > We actually do have this documented on the website. I added this page > in February: > > https://pulsar.apache.org/docs/next/security-policy-and-supported-versions#supported-versions > . > > > we'd be releasing 2 LTSs super close between each other > > and we'd have to support 1 release more for the time being. > > I agree with this reasoning. 
If 2.10 is LTS, and I think it should be, > 2.11 shouldn't be LTS. > > Thanks, > Michael > > On Tue, Jun 7, 2022 at 6:39 PM Matteo Merli > wrote: > > > > > > There is a high cost to maintain a lot of old releases, backport bug > > > > fixes, and security patches. In general, we actively support the last > > > > 3 minor releases while continuing to develop the next release. E.g., > > > > 2.8, 2.9, and 2.10, while 2.11 is under development. > > > > > > Is 2.7 EOL? If so then we need to announce it explicitly. > > > > Actually, I was wrong. PIP-47 says the last 4 releases so 2.7 would be > included. > > The point though still remains in that there's nowhere in the website > > where a user could check until when the 2.7 release is going to be > > supported. > > > > > > We need to ensure that we have a date set in stone to deliver the > > > > release to users. > > > > > > I would like the new plan to address the delays in cherry picking > changes. These must never wait until a release is being made. We must keep > these up to date. If someone marks a PR for an older release then they are > volunteering to do the cherry pick within a few days. We need to be > prepared for a 0-day security release. > > > > I agree that this is a problem, though I'd prefer to keep it in a > > separate proposal, specifically targeted at the process for patch > > releases, to avoid putting too many things into a single discussion. > > > > > > The major version bump will not carry any special meaning in terms of > > > > "big features" included in the release or breaking API changes. > > > > Instead, it would simply signal the type of the release. > > > > > > From our existing release what is LTS? > > > > Good point, as we discussed earlier, 2.10 should be marked as LTS for > > being the last Java 8 release. I'll update the text to reflect this. > > > > > Does this mean that you are proposing the current Master as > release/3.0 or will it remain 2.11? 
> > > > I was actually not thinking of changing the denomination of 2.11. On > > one hand, it could make sense for being the first Java 17 release, but > > on the other, we'd be releasing 2 LTSs super close between each other > > and we'd have to support 1 release more for the time being. > > > > I'd like to hear more opinions here :) > > > > > > The support model will be: > > > > > > > > * LTS > > > > * Released every 18 months > > > > * Support for 24 months > > > > * Security patches for 36 months > > > > * Feature releases > > > > * Released every 3 months > > > > * Support for 6 months > > > > * Security patches for 6 months > > > > > > Are those times since the initial release? It would be helpful to have > a swim lane diagram. > > > > Yes, from the initial release (eg: 3.0.0) and yes we would have a > > clear diagram on the website. > > > > > > This can be translated into: > > > > * We support the last 2 LTS releases and the last 2 feature > releases > > > > * Security patches are provided for the past 3 LTS releases and 2 > > > > feature
Re: [DISCUSS] PIP-175: Extend time based release process
Thanks for putting together this PIP to continue this discussion, Matteo. This is an important one. I'll need time to think over your points before I respond, but I want to address two of them right away. > Actually, I was wrong. PIP-47 says the last 4 releases so 2.7 would be > included. > The point though still remains in that there's nowhere in the website > where a user could check until when the 2.7 release is going to be > supported. We actually do have this documented on the website. I added this page in February: https://pulsar.apache.org/docs/next/security-policy-and-supported-versions#supported-versions. > we'd be releasing 2 LTSs super close between each other > and we'd have to support 1 release more for the time being. I agree with this reasoning. If 2.10 is LTS, and I think it should be, 2.11 shouldn't be LTS. Thanks, Michael On Tue, Jun 7, 2022 at 6:39 PM Matteo Merli wrote: > > > > There is a high cost to maintain a lot of old releases, backport bug > > > fixes, and security patches. In general, we actively support the last > > > 3 minor releases while continuing to develop the next release. E.g., > > > 2.8, 2.9, and 2.10, while 2.11 is under development. > > > > Is 2.7 EOL? If so then we need to announce it explicitly. > > Actually, I was wrong. PIP-47 says the last 4 releases so 2.7 would be > included. > The point though still remains in that there's nowhere in the website > where a user could check until when the 2.7 release is going to be > supported. > > > > We need to ensure that we have a date set in stone to deliver the > > > release to users. > > > > I would like the new plan to address the delays in cherry picking changes. > > These must never wait until a release is being made. We must keep these up > > to date. If someone marks a PR for an older release then they are > > volunteering to do the cherry pick within a few days. We need to be > > prepared for a 0-day security release. 
> > I agree that this is a problem, though I'd prefer to keep it in a > separate proposal, specifically targeted at the process for patch > releases, to avoid putting too many things into a single discussion. > > > > The major version bump will not carry any special meaning in terms of > > > "big features" included in the release or breaking API changes. > > > Instead, it would simply signal the type of the release. > > > > From our existing release what is LTS? > > Good point, as we discussed earlier, 2.10 should be marked as LTS for > being the last Java 8 release. I'll update the text to reflect this. > > > Does this mean that you are proposing the current Master as release/3.0 or > > will it remain 2.11? > > I was actually not thinking of changing the denomination of 2.11. On > one hand, it could make sense for being the first Java 17 release, but > on the other, we'd be releasing 2 LTSs super close between each other > and we'd have to support 1 release more for the time being. > > I'd like to hear more opinions here :) > > > > The support model will be: > > > > > > * LTS > > > * Released every 18 months > > > * Support for 24 months > > > * Security patches for 36 months > > > * Feature releases > > > * Released every 3 months > > > * Support for 6 months > > > * Security patches for 6 months > > > > Are those times since the initial release? It would be helpful to have a > > swim lane diagram. > > Yes, from the initial release (eg: 3.0.0) and yes we would have a > clear diagram on the website. > > > > This can be translated into: > > > * We support the last 2 LTS releases and the last 2 feature releases > > > * Security patches are provided for the past 3 LTS releases and 2 > > > feature releases > > > > Please note that in the event of a security release that PMC members will > > generally need to do these in secret. > > No changes about that. This is only to set the user expectation for > how long they can expect the security patches. 
> > It doesn't change a comma on the PMC process of discussing such > releases, nor it would prevent doing additional security releases > outside of the "guaranteed" window. > > > What is the plan for bug fix / security releases on say 3.0? > > Since 3.0 would be LTS, based on the above-proposed table: 2y for bug > fixes - 3y for security patches
Re: [ANNOUNCE] New Committer: Dezhi Liu
Congrats! BR//Huanli > On Jun 8, 2022, at 10:25 AM, Yu wrote: > > Hi Dezhi, kudos to you! Well deserved! > > On Wed, Jun 8, 2022 at 9:52 AM Li Li wrote: > >> Congratulations Dezhi! >> >> Thanks, >> Li Li >> >>> On Jun 7, 2022, at 9:13 PM, PengHui Li wrote: >>> >>> Congratulations Dezhi! >>> >>> Penghui >>> On Jun 7, 2022, 17:22 +0800, Enrico Olivelli , >> wrote: Congratulations !! Enrico Il giorno mar 7 giu 2022 alle ore 10:57 Zike Yang ha >> scritto: > > Congratulations! > > Best Regards, > Zike Yang > > On Tue, Jun 7, 2022 at 3:52 PM ZhangJian He >> wrote: > >> Congratulations! >> >> Thanks >> ZhangJian He >> >> Haiting Jiang 于2022年6月7日周二 15:46写道: >> >>> Congrats! >>> >>> BR, >>> Haiting >>> >>> On 2022/06/07 06:46:00 Hang Chen wrote: The Project Management Committee (PMC) for Apache Pulsar has invited Dezhi Liu (https://github.com/liudezhi2098) to become a committer >> and we are pleased to announce that he has accepted. Dezhi Liu (with Github id liudezhi2098) contributed many >> improvements and bug fixes to Pulsar. Being a committer enables easier contribution to the project since there is no need to go via the patch submission process. This should enable better productivity. Welcome and Congratulations, Dezhi Liu! Please join us in congratulating and welcoming Dezhi Liu onboard! Best Regards, Hang Chen on behalf of the Pulsar PMC >>> >> >> >>
Re: [VOTE] PIP-166: Function add MANUAL delivery semantics
+1

Penghui
On Jun 8, 2022, 09:32 +0800, Rui Fu wrote:
> +1
>
> Best,
>
> Rui Fu
> On Jun 8, 2022, 04:51 +0800, Neng Lu wrote:
> > Hi All,
> >
> > +1 (non-binding)
> >
> > On Tue, Jun 7, 2022 at 5:42 AM Enrico Olivelli wrote:
> >
> > > I have left one last minute comment, can you please take a look? Then
> > > I will post my +1
> > >
> > > thanks
> > > Enrico
> >
> > --
> > Best Regards,
> > Neng
[DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol
Hello Pulsar Community, Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol. I look forward to your feedback. PIP: https://github.com/apache/pulsar/issues/15859 Thanks, Cong Zhao
[GitHub] [pulsar-dotpulsar] RobertIndie opened a new issue, #105: Support - Custom authentication
RobertIndie opened a new issue, #105: URL: https://github.com/apache/pulsar-dotpulsar/issues/105 The Pulsar Java client supports custom authentication: https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/AuthenticationFactory.html#create-java.lang.String-java.lang.String- We can add support for this.
Re: [ANNOUNCE] New Committer: Dezhi Liu
Hi Dezhi, kudos to you! Well deserved! On Wed, Jun 8, 2022 at 9:52 AM Li Li wrote: > Congratulations Dezhi! > > Thanks, > Li Li > > > On Jun 7, 2022, at 9:13 PM, PengHui Li wrote: > > > > Congratulations Dezhi! > > > > Penghui > > On Jun 7, 2022, 17:22 +0800, Enrico Olivelli , > wrote: > >> Congratulations !! > >> > >> Enrico > >> > >> Il giorno mar 7 giu 2022 alle ore 10:57 Zike Yang ha > scritto: > >>> > >>> Congratulations! > >>> > >>> Best Regards, > >>> Zike Yang > >>> > >>> On Tue, Jun 7, 2022 at 3:52 PM ZhangJian He > wrote: > >>> > Congratulations! > > Thanks > ZhangJian He > > Haiting Jiang 于2022年6月7日周二 15:46写道: > > > Congrats! > > > > BR, > > Haiting > > > > On 2022/06/07 06:46:00 Hang Chen wrote: > >> The Project Management Committee (PMC) for Apache Pulsar has invited > >> Dezhi Liu (https://github.com/liudezhi2098) to become a committer > and > >> we are pleased to announce that he has accepted. > >> > >> Dezhi Liu (with Github id liudezhi2098) contributed many > improvements > >> and bug fixes to Pulsar. > >> > >> Being a committer enables easier contribution to the project since > >> there is no need to go via the patch submission process. This should > >> enable better productivity. > >> > >> Welcome and Congratulations, Dezhi Liu! > >> > >> Please join us in congratulating and welcoming Dezhi Liu onboard! > >> > >> Best Regards, > >> Hang Chen on behalf of the Pulsar PMC > >> > > > > >
Re: [ANNOUNCE] New Committer: Dezhi Liu
Congratulations Dezhi! Thanks, Li Li > On Jun 7, 2022, at 9:13 PM, PengHui Li wrote: > > Congratulations Dezhi! > > Penghui > On Jun 7, 2022, 17:22 +0800, Enrico Olivelli , wrote: >> Congratulations !! >> >> Enrico >> >> Il giorno mar 7 giu 2022 alle ore 10:57 Zike Yang ha >> scritto: >>> >>> Congratulations! >>> >>> Best Regards, >>> Zike Yang >>> >>> On Tue, Jun 7, 2022 at 3:52 PM ZhangJian He wrote: >>> Congratulations! Thanks ZhangJian He Haiting Jiang 于2022年6月7日周二 15:46写道: > Congrats! > > BR, > Haiting > > On 2022/06/07 06:46:00 Hang Chen wrote: >> The Project Management Committee (PMC) for Apache Pulsar has invited >> Dezhi Liu (https://github.com/liudezhi2098) to become a committer and >> we are pleased to announce that he has accepted. >> >> Dezhi Liu (with Github id liudezhi2098) contributed many improvements >> and bug fixes to Pulsar. >> >> Being a committer enables easier contribution to the project since >> there is no need to go via the patch submission process. This should >> enable better productivity. >> >> Welcome and Congratulations, Dezhi Liu! >> >> Please join us in congratulating and welcoming Dezhi Liu onboard! >> >> Best Regards, >> Hang Chen on behalf of the Pulsar PMC >> >
Re: [VOTE] PIP-166: Function add MANUAL delivery semantics
+1

Best,
Rui Fu

On Jun 8, 2022, 04:51 +0800, Neng Lu wrote:
> Hi All,
>
> +1 (non-binding)
>
> On Tue, Jun 7, 2022 at 5:42 AM Enrico Olivelli wrote:
>
> > I have left one last minute comment, can you please take a look? Then
> > I will post my +1
> >
> > thanks
> > Enrico
>
> --
> Best Regards,
> Neng
Re: [DISCUSS] PIP-174: Provide new implementation for broker dispatch cache
+1 Great idea! Thanks, Hang Lari Hotari 于2022年6月8日周三 03:32写道: > > This is a very useful proposal. LGTM > > -Lari > > On Tue, Jun 7, 2022 at 3:48 AM Matteo Merli wrote: > > > https://github.com/apache/pulsar/issues/15954 > > > > WIP can be seen at: https://github.com/apache/pulsar/pull/15955 > > > > --- > > > > > > ## Motivation > > > > The current implementation of the read cache in the Pulsar broker has > > largely > > remained unchanged for a long time, except for a few minor tweaks. > > > > While the implementation is stable and reasonably efficient for > > typical workloads, > > the overhead required for managing the cache evictions in a broker > > that is running > > many topics can be pretty high in terms of extra CPU utilization and on > > the JVM > > garbage collection to track an increased number of medium-lived objects. > > > > The goal is to provide an alternative implementation that can adapt better > > to > > a wider variety of operating conditions. > > > > ### Current implementation details > > > > The broker cache is implemented as part of the `ManagedLedger` component, > > which sits in the Pulsar broker and provides a higher level of > > abstraction of top > > of BookKeeper. > > > > Each topic (and managed-ledger) has its own private cache space. This > > cache is implemented > > as a `ConcurrentSkipList` sorted map that maps `(ledgerId, entryId) -> > > payload`. The payload > > is a `ByteBuf` reference that can either be a slice of a `ByteBuf` that we > > got > > when reading from a socket, or it can be a copied buffer. > > > > Each topic cache is allowed to use the full broker max cache size before an > > eviction is triggered. The total cache size is effectively a resource > > shared across all > > the topics, where a topic can use a more prominent portion of it if it > > "asks for more". 
> > > > When the eviction happens, we need to do an expensive ranking of all > > the caches in the broker > > and do an eviction in a proportional way to the currently used space > > for each of them. > > > > The bigger problem is represented by the `ConcurrentSkipList` and the > > `ByteBuf` objects > > that need to be tracked. The skip list is essentially like a "tree" > > structure and needs to > > maintain Java objects for each entry in the cache. We also need to > > potentially have > > a huge number of ByteBuf objects. > > > > A cache workload is typically the worst-case scenario for each garbage > > collector implementation because it involves creating objects, storing > > them for some amount of > > time and then throwing them away. During that time, the GC would have > > already tenured these > > objects and copy them into an "old generation" space, and sometime > > later, a costly compaction > > of that memory would have to be performed. > > > > To mitigate the effect of the cache workload on the GC, we're being > > very aggressive in > > purging the cache by triggering time-based eviction. By putting a max > > TTL on the elements in > > the cache, we can avoid keeping the objects around for too long to be > > a problem for the GC. > > > > The reverse side of this is that we're artificially reducing the cache > > capacity to a very > > short time frame, reducing the cache usefulness. > > > > The other problem is the CPU cost involved in doing these frequent > > evictions, which can > > be very high when there are 10s of thousands of topics in a broker. > > > > > > ## Proposed changes > > > > Instead of dealing with individual caches for each topic, let's adopt > > a model where > > there is a single cache space for the broker. > > > > This cache is broken into N segments which act as a circular buffer. > > Whenever a segment > > is full, we start writing into the next one, and when we reach the > > last one, we will > > restart recycling the first segment. 
> > > > Each segment is composed of a buffer, an offset, and a hashmap which maps > > `(ledgerId, entryId) -> offset`. > > > > This model has been working very well for the BookKeeper `ReadCache`: > > > > https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/ReadCache.java > > > > There are two main advantages to this approach: > > > > 1. Entries are copied into the cache buffer (in direct memory), and > > we don't need to keep any > > long-lived Java objects around > > 2. The eviction becomes a completely trivial operation, buffers are > > just rotated and > > overwritten. We don't need to do any per-topic task or keep track > > of utilization. > > > > ### API changes > > > > No user-facing API changes are required. > > > > ### New configuration options > > > > The existing cache implementation will not be removed at this point. Users > > will > > be able to configure the old implementation in `broker.conf`. > > > > This option will be helpful in case of performance regressions would be > > seen for > > some use cases with the new cache
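To make the segment model in the quoted proposal concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the real broker is Java and would copy entries into direct memory, and every name below (`Segment`, `SegmentedCache`, `try_put`, and so on) is hypothetical rather than taken from the Pulsar or BookKeeper code. It shows the two properties the proposal highlights: each segment is a buffer plus an offset plus a `(ledgerId, entryId) -> offset` map, and eviction is just rotating to the next segment and overwriting it.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (ledgerId, entryId)


class Segment:
    """One fixed-size buffer, a write offset, and an index into the buffer."""

    def __init__(self, size: int) -> None:
        self.buffer = bytearray(size)
        self.offset = 0  # next free byte in the buffer
        self.index: Dict[Key, Tuple[int, int]] = {}  # key -> (offset, length)

    def try_put(self, key: Key, payload: bytes) -> bool:
        """Copy the payload into the buffer; return False if it does not fit."""
        if self.offset + len(payload) > len(self.buffer):
            return False
        start = self.offset
        self.buffer[start:start + len(payload)] = payload
        self.index[key] = (start, len(payload))
        self.offset += len(payload)
        return True

    def get(self, key: Key) -> Optional[bytes]:
        location = self.index.get(key)
        if location is None:
            return None
        start, length = location
        return bytes(self.buffer[start:start + length])

    def clear(self) -> None:
        """Recycle the segment: old bytes are simply overwritten later."""
        self.offset = 0
        self.index.clear()


class SegmentedCache:
    """A single cache space split into N segments used as a circular buffer."""

    def __init__(self, segment_count: int, segment_size: int) -> None:
        self.segment_size = segment_size
        self.segments: List[Segment] = [
            Segment(segment_size) for _ in range(segment_count)
        ]
        self.current = 0  # index of the segment currently being written

    def put(self, ledger_id: int, entry_id: int, payload: bytes) -> None:
        if len(payload) > self.segment_size:
            return  # assumption: entries larger than a segment are not cached
        key = (ledger_id, entry_id)
        if not self.segments[self.current].try_put(key, payload):
            # Current segment is full: rotate and recycle the next one.
            self.current = (self.current + 1) % len(self.segments)
            self.segments[self.current].clear()
            self.segments[self.current].try_put(key, payload)

    def get(self, ledger_id: int, entry_id: int) -> Optional[bytes]:
        key = (ledger_id, entry_id)
        count = len(self.segments)
        # Search from the newest segment backwards.
        for step in range(count):
            data = self.segments[(self.current - step) % count].get(key)
            if data is not None:
                return data
        return None
```

Note that nothing here tracks per-topic usage or ranks caches against each other: once the last segment fills up, the oldest segment is cleared in one step, which is the "trivial eviction" the proposal describes.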
Re: [DISCUSS] PIP-175: Extend time based release process
> > There is a high cost to maintain a lot of old releases, backport bug > > fixes, and security patches. In general, we actively support the last > > 3 minor releases while continuing to develop the next release. E.g., > > 2.8, 2.9, and 2.10, while 2.11 is under development. > > Is 2.7 EOL? If so then we need to announce it explicitly. Actually, I was wrong. PIP-47 says the last 4 releases so 2.7 would be included. The point though still remains in that there's nowhere in the website where a user could check until when the 2.7 release is going to be supported. > > We need to ensure that we have a date set in stone to deliver the > > release to users. > > I would like the new plan to address the delays in cherry picking changes. > These must never wait until a release is being made. We must keep these up to > date. If someone marks a PR for an older release then they are volunteering > to do the cherry pick within a few days. We need to be prepared for a 0-day > security release. I agree that this is a problem, though I'd prefer to keep it in a separate proposal, specifically targeted at the process for patch releases, to avoid putting too many things into a single discussion. > > The major version bump will not carry any special meaning in terms of > > "big features" included in the release or breaking API changes. > > Instead, it would simply signal the type of the release. > > From our existing release what is LTS? Good point, as we discussed earlier, 2.10 should be marked as LTS for being the last Java 8 release. I'll update the text to reflect this. > Does this mean that you are proposing the current Master as release/3.0 or > will it remain 2.11? I was actually not thinking of changing the denomination of 2.11. On one hand, it could make sense for being the first Java 17 release, but on the other, we'd be releasing 2 LTSs super close between each other and we'd have to support 1 release more for the time being. 
I'd like to hear more opinions here :) > > The support model will be: > > > > * LTS > > * Released every 18 months > > * Support for 24 months > > * Security patches for 36 months > > * Feature releases > > * Released every 3 months > > * Support for 6 months > > * Security patches for 6 months > > Are those times since the initial release? It would be helpful to have a swim > lane diagram. Yes, from the initial release (eg: 3.0.0) and yes we would have a clear diagram on the website. > > This can be translated into: > > * We support the last 2 LTS releases and the last 2 feature releases > > * Security patches are provided for the past 3 LTS releases and 2 > > feature releases > > Please note that in the event of a security release that PMC members will > generally need to do these in secret. No changes about that. This is only to set the user expectation for how long they can expect the security patches. It doesn't change a comma on the PMC process of discussing such releases, nor it would prevent doing additional security releases outside of the "guaranteed" window. > What is the plan for bug fix / security releases on say 3.0? Since 3.0 would be LTS, based on the above-proposed table: 2y for bug fixes - 3y for security patches
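The support windows discussed above (counted from the initial release, e.g. 3.0.0) can be worked through with a small Python sketch. The month counts come from the proposed table; the function names and the release dates used in the usage note are hypothetical and chosen only for illustration.

```python
import calendar
from datetime import date
from typing import Tuple

# (bug-fix support months, security patch months), per the proposed table.
SUPPORT_MONTHS = {
    "lts": (24, 36),
    "feature": (6, 6),
}


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month.
    """
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def support_windows(kind: str, initial_release: date) -> Tuple[date, date]:
    """Return (end of bug-fix support, end of security patches)."""
    support, security = SUPPORT_MONTHS[kind]
    return add_months(initial_release, support), add_months(initial_release, security)
```

For example, with a hypothetical LTS released on 2023-05-02, `support_windows("lts", date(2023, 5, 2))` gives bug-fix support until 2025-05-02 and security patches until 2026-05-02, matching the "2y for bug fixes - 3y for security patches" summary.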
Re: [DISCUSS] PIP-175: Extend time based release process
Hi - Interesting, some responses inline. > On Jun 7, 2022, at 3:25 PM, Matteo Merli wrote: > > https://github.com/apache/pulsar/issues/15966 > > > > ## Motivation > > In PIP-47 > (https://github.com/apache/pulsar/wiki/PIP-47:-Time-Based-Release-Plan), > we have adopted a time-based release plan. This was the first attempt > at establishing a new principle on how releases should b > > The main two benefits of this approach have been: > > 1. Clarity for users and developers on when to expect a release > 2. Breaking a hard relationship between feature and release: a > particular feature will be included in the release if it is completed > in time. Otherwise, it will be bubbled up to the next release. > > The motivation for the current proposal is to extend the existing > process to address the issues that we have seen and that were left out > of the scope of PIP-47. > > ## Summary of existing issues in the process > > ### Short maintenance cycles for releases > > Since we're doing a 3 months release cycle, we are ending with 4 > releases done per year, even though it's more close to 3 releases. > > There is a high cost to maintain a lot of old releases, backport bug > fixes, and security patches. In general, we actively support the last > 3 minor releases while continuing to develop the next release. E.g., > 2.8, 2.9, and 2.10, while 2.11 is under development. Is 2.7 EOL? If so then we need to announce it explicitly. > > The result is that a user adopting a particular release is forced to > upgrade in a < 1-year timeframe to keep up to date and use a supported > release. This timeframe is too short for many users as it imposes a > lot of forced upgrades, for which they are not prepared in terms of > available time and required effort. > > ### Live Upgrade/Downgrade compatibility path > > In Pulsar, we guarantee that users have a way to do live upgrades and > downgrades with zero downtime. 
> > This is very powerful because it gives them the freedom to upgrade to > a new release with the assurance of being able to roll back to the > previous release in case any functional or performance regressions are > encountered. > > Today, this compatibility is guaranteed across minor versions. Eg: I > can do `2.7 -> 2.8 -> 2.7` as a live upgrade. > > What is not guaranteed is to "skip" releases. E.g.: `2.7 -> 2.9` might > work or not, but it's not guaranteed. In that case an intermediated > upgrade would be required: `2.7 -> 2.8 -> 2.9`. > > The reasons for which the "skip" upgrade might not work are multiple: > 1. Incompatible upgrade of some dependency (e.g., ZooKeeper) that > might not be compatible with an older version. > 2. Adoption of a new metadata format or data format on disk. > Every time we introduce a new incompatible format change (outside > of a regular Protobuf field addition), we do it in a 2 steps way: > - In a new release, we introduce the new feature/format, > disabled by default. The new release can read both old and new > formats, though it keeps writing the old format by default. > - In a subsequent release, we change the default to the new format > > Note that this consideration is separate from the compatibility > between clients and brokers, where we ***never*** break compatibility. > The oldest available Pulsar client can still talk with the newest > Pulsar broker, and vice versa, a new client, will be perfectly fine > with an older broker (except the new features won't be working). > > ### Releases getting delayed > > Another problem we have been experiencing is that release cycles have > been stretching considerably. Part of this has been because we have > been reaching the end of the release window, preparing a candidate, > and then taking a long time to flush out all issues found at the last > minute in the new release. > > We need to ensure that we have a date set in stone to deliver the > release to users. 
I would like the new plan to address the delays in cherry picking changes. These must never wait until a release is being made. We must keep these up to date. If someone marks a PR for an older release then they are volunteering to do the cherry pick within a few days. We need to be prepared for a 0-day security release. > > ## Proposal > > The proposal to address the above issues is composed of 2 parts. > > ### 1. Establish Long Term Support releases > > We need to provide a way for users to quickly understand the expected > lifecycle timeline of a given release and for that timeline to be long > enough not to be a constant update mandate. > > At the same time, we need to ensure that we maintainers are not > spending all the time just maintaining a huge list of old releases. > > For that, we can use the established concept of "Long Term Releases" or LTS. > > We will perform LTS releases at a fixed cadence every 18 months, and > we will keep doing regular feature releases every 3 months as we're > currently doing. > >
[DISCUSS] PIP-175: Extend time based release process
https://github.com/apache/pulsar/issues/15966 ## Motivation In PIP-47 (https://github.com/apache/pulsar/wiki/PIP-47:-Time-Based-Release-Plan), we have adopted a time-based release plan. This was the first attempt at establishing a new principle on how releases should b The main two benefits of this approach have been: 1. Clarity for users and developers on when to expect a release 2. Breaking a hard relationship between feature and release: a particular feature will be included in the release if it is completed in time. Otherwise, it will be bubbled up to the next release. The motivation for the current proposal is to extend the existing process to address the issues that we have seen and that were left out of the scope of PIP-47. ## Summary of existing issues in the process ### Short maintenance cycles for releases Since we're doing a 3 months release cycle, we are ending with 4 releases done per year, even though it's more close to 3 releases. There is a high cost to maintain a lot of old releases, backport bug fixes, and security patches. In general, we actively support the last 3 minor releases while continuing to develop the next release. E.g., 2.8, 2.9, and 2.10, while 2.11 is under development. The result is that a user adopting a particular release is forced to upgrade in a < 1-year timeframe to keep up to date and use a supported release. This timeframe is too short for many users as it imposes a lot of forced upgrades, for which they are not prepared in terms of available time and required effort. ### Live Upgrade/Downgrade compatibility path In Pulsar, we guarantee that users have a way to do live upgrades and downgrades with zero downtime. This is very powerful because it gives them the freedom to upgrade to a new release with the assurance of being able to roll back to the previous release in case any functional or performance regressions are encountered. Today, this compatibility is guaranteed across minor versions. 
Eg: I can do `2.7 -> 2.8 -> 2.7` as a live upgrade. What is not guaranteed is to "skip" releases. E.g.: `2.7 -> 2.9` might work or not, but it's not guaranteed. In that case an intermediated upgrade would be required: `2.7 -> 2.8 -> 2.9`. The reasons for which the "skip" upgrade might not work are multiple: 1. Incompatible upgrade of some dependency (e.g., ZooKeeper) that might not be compatible with an older version. 2. Adoption of a new metadata format or data format on disk. Every time we introduce a new incompatible format change (outside of a regular Protobuf field addition), we do it in a 2 steps way: - In a new release, we introduce the new feature/format, disabled by default. The new release can read both old and new formats, though it keeps writing the old format by default. - In a subsequent release, we change the default to the new format Note that this consideration is separate from the compatibility between clients and brokers, where we ***never*** break compatibility. The oldest available Pulsar client can still talk with the newest Pulsar broker, and vice versa, a new client, will be perfectly fine with an older broker (except the new features won't be working). ### Releases getting delayed Another problem we have been experiencing is that release cycles have been stretching considerably. Part of this has been because we have been reaching the end of the release window, preparing a candidate, and then taking a long time to flush out all issues found at the last minute in the new release. We need to ensure that we have a date set in stone to deliver the release to users. ## Proposal The proposal to address the above issues is composed of 2 parts. ### 1. Establish Long Term Support releases We need to provide a way for users to quickly understand the expected lifecycle timeline of a given release and for that timeline to be long enough not to be a constant update mandate. 
At the same time, we need to ensure that we maintainers are not spending all our time just maintaining a huge list of old releases.

For that, we can use the established concept of "Long Term Support" (LTS) releases.

We will perform LTS releases at a fixed cadence of every 18 months, and we will keep doing regular feature releases every 3 months as we currently do.

The LTS releases will be identified by being a `.0` version. For example:

* `3.0` -> LTS
* `3.1` -> regular release
* `3.2` -> regular release
* `4.0` -> LTS

The major version bump will not carry any special meaning in terms of "big features" included in the release or breaking API changes. Instead, it will simply signal the type of the release.

#### Compatibility between releases

A live upgrade/downgrade between one LTS release and the next will be guaranteed. For example:

* `3.0 -> 4.0 -> 3.0` : OK
* `3.2 -> 4.0 -> 3.2` : OK
* `3.2 -> 4.4 -> 3.2` : OK
* `3.2 -> 5.0` : Not OK

#### Release support expectation

We will publish clear guidelines on the
Re: [VOTE] PIP-166: Function add MANUAL delivery semantics
Hi All,

+1 (non-binding)

--
Best Regards,
Neng
Re: [DISCUSS] PIP-174: Provide new implementation for broker dispatch cache
This is a very useful proposal. LGTM

-Lari

On Tue, Jun 7, 2022 at 3:48 AM Matteo Merli wrote:

> https://github.com/apache/pulsar/issues/15954
>
> WIP can be seen at: https://github.com/apache/pulsar/pull/15955
>
> ---
>
> ## Motivation
>
> The current implementation of the read cache in the Pulsar broker has largely remained unchanged for a long time, except for a few minor tweaks.
>
> While the implementation is stable and reasonably efficient for typical workloads, the overhead required for managing the cache evictions in a broker that is running many topics can be pretty high, both in extra CPU utilization and in the pressure on JVM garbage collection from tracking an increased number of medium-lived objects.
>
> The goal is to provide an alternative implementation that can adapt better to a wider variety of operating conditions.
>
> ### Current implementation details
>
> The broker cache is implemented as part of the `ManagedLedger` component, which sits in the Pulsar broker and provides a higher level of abstraction on top of BookKeeper.
>
> Each topic (and managed-ledger) has its own private cache space. This cache is implemented as a `ConcurrentSkipList` sorted map that maps `(ledgerId, entryId) -> payload`. The payload is a `ByteBuf` reference that can either be a slice of a `ByteBuf` that we got when reading from a socket, or it can be a copied buffer.
>
> Each topic cache is allowed to use the full broker max cache size before an eviction is triggered. The total cache size is effectively a resource shared across all the topics, where a topic can use a larger portion of it if it "asks for more".
>
> When the eviction happens, we need to do an expensive ranking of all the caches in the broker and evict from each of them in proportion to its currently used space.
>
> The bigger problem is represented by the `ConcurrentSkipList` and the `ByteBuf` objects that need to be tracked. The skip list is essentially like a "tree" structure and needs to maintain Java objects for each entry in the cache. We also potentially need to keep a huge number of `ByteBuf` objects.
>
> A cache workload is typically the worst-case scenario for any garbage collector implementation because it involves creating objects, storing them for some amount of time, and then throwing them away. During that time, the GC will have already tenured these objects and copied them into an "old generation" space, and sometime later a costly compaction of that memory has to be performed.
>
> To mitigate the effect of the cache workload on the GC, we're being very aggressive in purging the cache by triggering time-based eviction. By putting a max TTL on the elements in the cache, we can avoid keeping the objects around long enough to become a problem for the GC.
>
> The downside is that we're artificially reducing the cache capacity to a very short time frame, reducing the cache's usefulness.
>
> The other problem is the CPU cost involved in doing these frequent evictions, which can be very high when there are tens of thousands of topics in a broker.
>
> ## Proposed changes
>
> Instead of dealing with individual caches for each topic, let's adopt a model where there is a single cache space for the broker.
>
> This cache is broken into N segments which act as a circular buffer. Whenever a segment is full, we start writing into the next one, and when we reach the last one, we start over by recycling the first segment.
>
> Each segment is composed of a buffer, an offset, and a hashmap which maps `(ledgerId, entryId) -> offset`.
>
> This model has been working very well for the BookKeeper `ReadCache`:
> https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/ReadCache.java
>
> There are two main advantages to this approach:
>
> 1. Entries are copied into the cache buffer (in direct memory), and we don't need to keep any long-lived Java objects around.
> 2. The eviction becomes a completely trivial operation: buffers are just rotated and overwritten. We don't need to do any per-topic task or keep track of utilization.
>
> ### API changes
>
> No user-facing API changes are required.
>
> ### New configuration options
>
> The existing cache implementation will not be removed at this point. Users will be able to configure the old implementation in `broker.conf`.
>
> This option will be helpful in case performance regressions are seen for some use cases with the new cache implementation.
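
For readers following the thread, the segmented circular-buffer model described in the quoted proposal can be sketched roughly as below. This is only an illustration under stated assumptions: the class name `SegmentedCacheSketch`, the `Position` and `Slot` records, and the coarse `synchronized` locking are all hypothetical and are not taken from the WIP pull request or from BookKeeper's `ReadCache`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a broker-wide cache split into N segments that are
// reused in a circular fashion. Not the actual Pulsar or BookKeeper code.
final class SegmentedCacheSketch {
    // Cache key, mirroring the (ledgerId, entryId) pair from the proposal.
    record Position(long ledgerId, long entryId) {}

    // Location of a cached payload inside one segment buffer.
    record Slot(int offset, int length) {}

    private final List<ByteBuffer> buffers = new ArrayList<>();
    private final List<Map<Position, Slot>> indexes = new ArrayList<>();
    private final int segmentSize;
    private int current = 0; // index of the segment currently being written

    SegmentedCacheSketch(int segmentCount, int segmentSize) {
        this.segmentSize = segmentSize;
        for (int i = 0; i < segmentCount; i++) {
            // Direct memory: cached payloads do not live on the Java heap.
            buffers.add(ByteBuffer.allocateDirect(segmentSize));
            indexes.add(new HashMap<>());
        }
    }

    synchronized void put(long ledgerId, long entryId, byte[] payload) {
        if (payload.length > segmentSize) {
            return; // larger than a whole segment: simply not cached
        }
        if (buffers.get(current).remaining() < payload.length) {
            // Segment is full: move to the next one and recycle it.
            // Eviction is just resetting the buffer and dropping its index.
            current = (current + 1) % buffers.size();
            buffers.get(current).clear();
            indexes.get(current).clear();
        }
        ByteBuffer segment = buffers.get(current);
        int offset = segment.position();
        segment.put(payload); // copy the entry into the segment buffer
        indexes.get(current).put(new Position(ledgerId, entryId),
                new Slot(offset, payload.length));
    }

    synchronized byte[] get(long ledgerId, long entryId) {
        Position key = new Position(ledgerId, entryId);
        for (int i = 0; i < buffers.size(); i++) {
            Slot slot = indexes.get(i).get(key);
            if (slot != null) {
                byte[] copy = new byte[slot.length()];
                // Read through a duplicate so the write position is untouched.
                ByteBuffer view = buffers.get(i).duplicate();
                view.position(slot.offset());
                view.get(copy);
                return copy;
            }
        }
        return null; // cache miss
    }
}
```

Note that in this sketch nothing is tracked per topic: an entry disappears only when the segment that holds it is recycled, which is what makes eviction cheap compared with ranking per-topic caches.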
Re: [DISCUSS] Implementation for HTTP endpoint producer/consumer
I would make a separate project and release it as a .nar. It can run as a Broker Protocol Handler or a Proxy Extension.

Then, if the project gets traction, we can add it to the Pulsar core repo.

Enrico
Re: [DISCUSS] PIP-174: Provide new implementation for broker dispatch cache
On Tue, Jun 7, 2022 at 6:37 AM Enrico Olivelli wrote:
> Great idea.
> I wonder which kind of metrics we could have. To see how each
> tenant/namespace is using the cache

It can be done, at the expense of some CPU cost, for example by adding a tag for the topic and computing the sizes at the time of eviction. Right now we do have that information, but we're not using or exposing it in any way.

In general, the cache usage would be exactly proportional to the bytes/s incoming rate across all the topics.
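
As a rough illustration of the proportionality point above, per-topic cache usage could be estimated from incoming byte rates alone, without tagging entries. The sketch below is an assumption-laden example, not an existing Pulsar metric: the class `CacheShareEstimate`, its `estimate` method, and the topic names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: estimates how a shared cache is split across topics,
// assuming usage is proportional to each topic's incoming bytes/s.
final class CacheShareEstimate {
    static Map<String, Long> estimate(Map<String, Double> bytesPerSecond, long cacheBytes) {
        double total = bytesPerSecond.values().stream()
                .mapToDouble(Double::doubleValue)
                .sum();
        Map<String, Long> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : bytesPerSecond.entrySet()) {
            // share_i = cacheBytes * rate_i / sum(rates); zero when there is no traffic
            long share = total == 0 ? 0 : Math.round(cacheBytes * (e.getValue() / total));
            shares.put(e.getKey(), share);
        }
        return shares;
    }
}
```

For instance, with a 1000-byte cache and two topics receiving 300 and 100 bytes/s, the estimate assigns them 750 and 250 bytes respectively. Aggregating the same estimate by tenant or namespace would only require summing the rates before the division.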
Re: [DISCUSS] Implementation for HTTP endpoint producer/consumer
If this is a REST endpoint, then call it REST. It is very likely that users will want to use HTTPS with it, so calling it HTTP is a misnomer.

All The Best,
Dave
Re: [DISCUSS] Implementation for HTTP endpoint producer/consumer
Thanks for bringing this up.

I think building a separate HTTP server to serve REST produce/consume requests might be a good idea. Like FunctionWorkerService, users could choose to run it with the broker for simplicity, or run it as a separate component if they want isolation and independent scaling.

I think we just missed this option when building V1; I think it's worth considering.

On Mon, Jun 6, 2022 at 21:33, mattison chao wrote:

> Hi, Pulsar Community,
>
> We have PIP-64, which introduces an HTTP REST API for producing/consuming messages
> (https://github.com/apache/pulsar/wiki/PIP-64%3A-Introduce-REST-endpoints-for-producing%2C-consuming-and-reading-messages).
> But this proposal does not define the implementation.
>
> We already have a producer HTTP API on the broker side, but it had some problems, so it was refactored in this patch:
> https://github.com/apache/pulsar/pull/15876.
>
> Then we added an HTTP consumer in this patch:
> https://github.com/apache/pulsar/pull/15942.
>
> But there are currently some ideas on which we have not reached a consensus, such as what @Lari Hotari mentioned on the pull request
> https://github.com/apache/pulsar/pull/15942:
>
> It might not be a good idea to add the implementation to the main Pulsar Admin API at all.
>
> HTTP consuming would be better handled in a separate component. PIP-64 doesn't determine that this should be part of the Pulsar Admin API, and we should revisit this decision. I think it's a bad idea to add HTTP consuming to the Pulsar Admin API and brokers.
>
> I want to discuss whether we should implement the HTTP endpoint in the broker or separate it into another component (like pulsar-WebSocket).
>
> Best,
>
> Mattison
New proposal for chunk messages with shared subscriptions
Hi folks,

I have recently been working on the implementation of PIP-37, see
https://github.com/apache/pulsar/wiki/PIP-37%3A-Large-message-size-handling-in-Pulsar#usecase-3-multiple-producers-with-shared-consumers

As we can see, https://github.com/apache/pulsar/pull/4400 only implements chunked messages for non-shared subscriptions. When I followed the **Option 2** section, I found that it works, but there are many details that need to be taken care of. For example:

- Should we add a marker type to indicate the chunk marker?
- Normally, markers such as transaction markers are not visible to the client, but we need to send the chunk marker to the client.
- What is the format of the chunk marker?
- Which compatibility problems would this design introduce?

I think we need a new proposal to explain it in detail, and I'm working on that as well as on a demo. Feel free to ping me if you have any concerns.

Thanks,
Yunze
Re: [DISCUSS] PIP-174: Provide new implementation for broker dispatch cache
Great idea.
I wonder which kind of metrics we could have, to see how each tenant/namespace is using the cache.

Enrico
Re: [DISCUSS] Apache Pulsar 2.9.3 release
Thanks for your update. I will continue with the 2.9.3 release.

Best,
Mattison

On Sat, 4 Jun 2022 at 04:04, Dave Fisher wrote:

> > On Jun 2, 2022, at 11:55 PM, mattison chao wrote:
> >
> > Hi Dave Fisher,
> >
> >> There are some PRs that are coming in that must be included.
> >
> > How's the progress on these PRs?
>
> They are merged.
>
> Regards,
> Dave
>
> > Best,
> > Mattison
> >
> > On Wed, 25 May 2022 at 21:33, Just do it wrote:
> >
> >> +1
> >>
> >> Thanks,
> >> Dezhi
> >>
> >> On Wed, May 25, 2022 at 9:10 AM, Hang Chen wrote:
> >>
> >> +1
> >>
> >> Thanks,
> >> Hang
> >>
> >> Dave Fisher wrote:
> >>
> >> There are some PRs that are coming in that must be included.
> >>
> >> Thanks,
> >> Dave
> >>
> >> On May 23, 2022, at 4:29 AM, PengHui Li wrote:
> >>
> >> +1
> >>
> >> Thanks
> >> Penghui
> >>
> >> On Mon, May 23, 2022 at 3:31 PM mattison chao <mattisonc...@apache.org> wrote:
> >>
> >> Hello, Pulsar community:
> >>
> >> I'd like to propose to release Apache Pulsar 2.9.3.
> >>
> >> Currently, we have 192 commits [0], and there are many transaction fixes and security fixes.
> >>
> >> There are also 22 open PRs [1]; I will follow them to make sure that the important fixes are included in 2.9.3.
> >>
> >> If you have any important fixes or any questions, please reply to this email, and we will evaluate whether to include them in 2.9.3.
> >>
> >> [0] https://github.com/apache/pulsar/pulls?q=is%3Amerged+is%3Apr+label%3Arelease%2F2.9.3+
> >> [1] https://github.com/apache/pulsar/pulls?q=is%3Aopen+is%3Apr+label%3Arelease%2F2.9.3+
> >>
> >> Best Regards
> >> Mattison
Re: [ANNOUNCE] New Committer: Dezhi Liu
Congratulations, Dezhi!

Penghui
Re: [DISCUSS] PIP-174: Provide new implementation for broker dispatch cache
+1

Penghui
Re: [VOTE] PIP-166: Function add MANUAL delivery semantics
I have left one last-minute comment; can you please take a look? Then I will post my +1.

Thanks,
Enrico
Re: [VOTE] PIP-166: Function add MANUAL delivery semantics
+1

On Mon, Jun 6, 2022 at 4:04 AM Baodi Shi wrote:

> Hi Pulsar Community,
>
> I would like to start a VOTE on "Function add MANUAL delivery semantics" (PIP-166).
>
> The proposal can be read at https://github.com/apache/pulsar/issues/15560
> and the discussion thread is available at
> https://lists.apache.org/thread/4f2w1mqvhhs3mvccbcg2sk19b60xwkgf
>
> Voting will stay open for at least 48h.
>
> Thanks,
> Baodi Shi
Re: [ANNOUNCE] New Committer: Dezhi Liu
Congratulations!!

Enrico
Re: [ANNOUNCE] New Committer: Dezhi Liu
Congratulations!

Best Regards,
Zike Yang
Re: [ANNOUNCE] New Committer: Dezhi Liu
Congratulations!

Thanks
ZhangJian He
Re: [ANNOUNCE] New Committer: Dezhi Liu
Congrats!

BR,
Haiting
[ANNOUNCE] New Committer: Dezhi Liu
The Project Management Committee (PMC) for Apache Pulsar has invited Dezhi Liu (https://github.com/liudezhi2098) to become a committer, and we are pleased to announce that he has accepted.

Dezhi Liu (GitHub id liudezhi2098) has contributed many improvements and bug fixes to Pulsar.

Being a committer enables easier contribution to the project since there is no need to go via the patch submission process. This should enable better productivity.

Welcome and congratulations, Dezhi Liu!

Please join us in congratulating and welcoming Dezhi Liu on board!

Best Regards,
Hang Chen on behalf of the Pulsar PMC