Re: [ANNOUNCE] New Committer: Zili Chen
Congratulations! tison On Thu, Nov 10, 2022 at 15:09, Enrico Olivelli wrote: > Congratulations! > > Enrico > > On Thu, Nov 10, 2022 at 08:07, Haiting Jiang wrote: > > > Congratulations! > > > > > > BR, > > Haiting > > > > On Thu, Nov 10, 2022 at 2:14 PM Jun Ma wrote: > > > > > > Congratulations! > > > > > > > > > > > > > > Yu : > > > > > > > > > The Project Management Committee (PMC) for Apache Pulsar has > invited > > Zili > > > > > Chen (https://github.com/tisonkun) > > > > > to become a committer and we are pleased to announce that he has > > > > accepted. > > > > > > > > > > Being a committer enables easier contribution to the > > > > > project since there is no need to go via the patch > > > > > submission process. This should enable better productivity. > > > > > > > > > > Welcome and congratulations, Zili Chen! > > > > > > > > > > Please join us in congratulating and welcoming Zili Chen onboard! > > > > > > > > > > Best Regards, > > > > > Yu on behalf of the Pulsar PMC > > > > > > > > > > > >
Re: [DISCUSS] Make the behavior of producers that use the mode WaitForExclusive clear
Yubiao,

On Thu, Nov 10, 2022 at 07:40, Yubiao Feng wrote:

> Hi all,
>
> Now if an Exclusive-Mode producer has been registered, we can register
> another WaitForExclusive-Mode producer. When the old producer goes
> offline, the new producer will take over as the new exclusive producer.
>
> < Highlight >But if some Shared-Mode producers have been registered, we can
> register another WaitForExclusive-Mode producer too, but even if all
> Shared-Mode goes offline, the new producer will not take over as the new
> exclusive producer.

This looks like a bug to me. When you start a -Exclusive producer you want to take control over the topic. I think that we should kick out all the connected Shared producers. The competition is among Exclusive producers; Shared producers don't play the game.

The fix should be easy.

Enrico

> Then, if we register a new WaitForExclusive-Mode
> producer, the earlier WaitForExclusive-Mode producer will take over as the
> new exclusive producer.
>
> < Highlight >I want to make the behavior of producers that use the mode
> WaitForExclusive clear: When no other mode of producer exists for a topic,
> the first producer in the WaitForExclusive queue can auto succeeds as the
> exclusive producer.
>
> Thanks
> Yubiao
>
Re: [ANNOUNCE] New Committer: Zili Chen
Congratulations! Enrico On Thu, Nov 10, 2022 at 08:07, Haiting Jiang wrote: > Congratulations! > > > BR, > Haiting > > On Thu, Nov 10, 2022 at 2:14 PM Jun Ma wrote: > > > > Congratulations! > > > > > > > > > > Yu : > > > > > > > The Project Management Committee (PMC) for Apache Pulsar has invited > Zili > > > > Chen (https://github.com/tisonkun) > > > > to become a committer and we are pleased to announce that he has > > > accepted. > > > > > > > > Being a committer enables easier contribution to the > > > > project since there is no need to go via the patch > > > > submission process. This should enable better productivity. > > > > > > > > Welcome and congratulations, Zili Chen! > > > > > > > > Please join us in congratulating and welcoming Zili Chen onboard! > > > > > > > > Best Regards, > > > > Yu on behalf of the Pulsar PMC > > > > > > > >
Re: [ANNOUNCE] New Committer: Zili Chen
Congratulations! BR, Haiting On Thu, Nov 10, 2022 at 2:14 PM Jun Ma wrote: > > Congratulations! > > > > > > Yu : > > > > > The Project Management Committee (PMC) for Apache Pulsar has invited Zili > > > Chen (https://github.com/tisonkun) > > > to become a committer and we are pleased to announce that he has > > accepted. > > > > > > Being a committer enables easier contribution to the > > > project since there is no need to go via the patch > > > submission process. This should enable better productivity. > > > > > > Welcome and congratulations, Zili Chen! > > > > > > Please join us in congratulating and welcoming Zili Chen onboard! > > > > > > Best Regards, > > > Yu on behalf of the Pulsar PMC > > > > >
[DISCUSS] Make the behavior of producers that use the mode WaitForExclusive clear
Hi all,

Currently, if an Exclusive-Mode producer has been registered, we can register another WaitForExclusive-Mode producer. When the old producer goes offline, the new producer takes over as the new exclusive producer.

< Highlight > But if some Shared-Mode producers have been registered, we can also register a WaitForExclusive-Mode producer; yet even if all Shared-Mode producers go offline, the new producer does not take over as the exclusive producer. Then, if we register another WaitForExclusive-Mode producer, the earlier WaitForExclusive-Mode producer takes over as the new exclusive producer.

< Highlight > I want to make the behavior of producers that use the WaitForExclusive mode clear: when no producer of any other mode exists for a topic, the first producer in the WaitForExclusive queue automatically succeeds as the exclusive producer.

Thanks
Yubiao
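To make the proposed rule concrete, here is a minimal sketch of it as a toy in-memory model. It is not Pulsar broker code: the class `TopicProducersModel` and all of its methods are hypothetical names invented for illustration, and the model only assumes the rule stated above (promote the head of the WaitForExclusive queue as soon as no exclusive or shared producer remains).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Toy model of one topic's producer bookkeeping (hypothetical, not broker code). */
class TopicProducersModel {
    private final Set<String> sharedProducers = new HashSet<>();
    private final Deque<String> waitingForExclusive = new ArrayDeque<>();
    private String exclusiveProducer; // null while nobody holds exclusive access

    void addShared(String name) {
        if (exclusiveProducer != null) {
            throw new IllegalStateException("topic is held exclusively by " + exclusiveProducer);
        }
        sharedProducers.add(name);
    }

    void addWaitForExclusive(String name) {
        waitingForExclusive.addLast(name);
        promoteIfPossible();
    }

    /** Removes a producer of any mode, then re-checks the promotion rule. */
    void remove(String name) {
        sharedProducers.remove(name);
        waitingForExclusive.remove(name);
        if (name.equals(exclusiveProducer)) {
            exclusiveProducer = null;
        }
        promoteIfPossible();
    }

    /** Proposed rule: with no other producer left, the queue head becomes exclusive. */
    private void promoteIfPossible() {
        if (exclusiveProducer == null && sharedProducers.isEmpty() && !waitingForExclusive.isEmpty()) {
            exclusiveProducer = waitingForExclusive.pollFirst();
        }
    }

    String exclusiveProducer() {
        return exclusiveProducer;
    }
}
```

In this model the promotion check runs on every removal, not only when another WaitForExclusive producer registers, which is the difference between the proposed behavior and the behavior described in the email.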
[GitHub] [pulsar] github-actions[bot] added a comment to the discussion: Pulsar Audit - Question
GitHub user github-actions[bot] added a comment to the discussion: Pulsar Audit - Question The issue had no activity for 30 days, mark with Stale label. GitHub link: https://github.com/apache/pulsar/discussions/18404#discussioncomment-4104450 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
Re: [ANNOUNCE] New Committer: Zili Chen
Congratulations! > > Yu : > > > The Project Management Committee (PMC) for Apache Pulsar has invited Zili > > Chen (https://github.com/tisonkun) > > to become a committer and we are pleased to announce that he has > accepted. > > > > Being a committer enables easier contribution to the > > project since there is no need to go via the patch > > submission process. This should enable better productivity. > > > > Welcome and congratulations, Zili Chen! > > > > Please join us in congratulating and welcoming Zili Chen onboard! > > > > Best Regards, > > Yu on behalf of the Pulsar PMC > > >
Re: [ANNOUNCE] New Committer: Zili Chen
Congratulations! Thanks ZhangJian He On Thu, 10 Nov 2022 at 14:01, houxiaoyu wrote: > Congratulations! > > Best, > Xiaoyu Hou > > On Thu, Nov 10, 2022 at 08:16, Yu wrote: > > > The Project Management Committee (PMC) for Apache Pulsar has invited Zili > > Chen (https://github.com/tisonkun) > > to become a committer and we are pleased to announce that he has > accepted. > > > > Being a committer enables easier contribution to the > > project since there is no need to go via the patch > > submission process. This should enable better productivity. > > > > Welcome and congratulations, Zili Chen! > > > > Please join us in congratulating and welcoming Zili Chen onboard! > > > > Best Regards, > > Yu on behalf of the Pulsar PMC > > >
Re: Request a site ID and tracking code for Apache Pulsar
Hi Martijn,

Thanks for your help! We have now successfully integrated Matomo into the Pulsar website.

I'd like to know how to create an account so that I can add segments or further analysis dashboards. It seems that Apache accounts do not carry over to the Matomo account system.

Best,
tison.

On Mon, Nov 7, 2022 at 16:49, tison wrote:

> Hi Martijn,
>
> Thank you!
> Best,
> tison.
>
> On Mon, Nov 7, 2022 at 16:44, Martijn Visser wrote:
>
>> Hi Tison,
>>
>> It most certainly is, but I haven't had the time to reply yet. My
>> apologies!
>>
>> Here's the tracking code for Pulsar. The code will need to be integrated
>> in any page you want to track by adding it before the tag. You can
>> find the results at https://analytics.apache.org
>>
>> var _paq = window._paq = window._paq || [];
>> /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
>> /* We explicitly disable cookie tracking to avoid privacy issues */
>> _paq.push(['disableCookies']);
>> _paq.push(['trackPageView']);
>> _paq.push(['enableLinkTracking']);
>> (function() {
>>   var u="https://analytics.apache.org/";
>>   _paq.push(['setTrackerUrl', u+'matomo.php']);
>>   _paq.push(['setSiteId', '32']);
>>   var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
>>   g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
>> })();
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, Nov 7, 2022 at 09:17, tison wrote:
>>
>> > Hi,
>> >
>> > Is here the correct place to send this request to?
>> >
>> > Best,
>> > tison.
>> >
>> > On Wed, Nov 2, 2022 at 00:36, Dave Fisher wrote:
>> >
>> > > - privacy.
>> > >
>> > > THANK YOU!
>> > >
>> > > Sent from my iPhone
>> > >
>> > > > On Nov 1, 2022, at 9:21 AM, tison wrote:
>> > > >
>> > > > Hi Privacy Team,
>> > > >
>> > > > As proposed in https://github.com/apache/pulsar/issues/15664, the
>> > > > Apache Pulsar community is actively migrating from Google Analytics
>> > > > to the Matomo solution.
>> > > >
>> > > > Reading from https://privacy.apache.org/matomo/, I send this email
>> > > > to request a site ID and tracking code for Apache Pulsar. I think
>> > > > this setting is public and the motivation is provided.
>> > > >
>> > > > dev@pulsar.a.o in cc. Please correct me if more prerequisites are
>> > > > needed.
>> > > >
>> > > > Best,
>> > > > tison.
Re: [ANNOUNCE] New Committer: Zili Chen
Congratulations! Best, Xiaoyu Hou On Thu, Nov 10, 2022 at 08:16, Yu wrote: > The Project Management Committee (PMC) for Apache Pulsar has invited Zili > Chen (https://github.com/tisonkun) > to become a committer and we are pleased to announce that he has accepted. > > Being a committer enables easier contribution to the > project since there is no need to go via the patch > submission process. This should enable better productivity. > > Welcome and congratulations, Zili Chen! > > Please join us in congratulating and welcoming Zili Chen onboard! > > Best Regards, > Yu on behalf of the Pulsar PMC >
Re: [ANNOUNCE] New Committer: Zili Chen
Congratulations to tisonkun / Zili Chen! Welcome! Sent from my iPhone > On Nov 9, 2022, at 4:16 PM, Yu wrote: > > The Project Management Committee (PMC) for Apache Pulsar has invited Zili > Chen (https://github.com/tisonkun) > to become a committer and we are pleased to announce that he has accepted. > > Being a committer enables easier contribution to the > project since there is no need to go via the patch > submission process. This should enable better productivity. > > Welcome and congratulations, Zili Chen! > > Please join us in congratulating and welcoming Zili Chen onboard! > > Best Regards, > Yu on behalf of the Pulsar PMC
Re: [ANNOUNCE] New Committer: Zili Chen
> Congratulations! Zili > On Nov 10, 2022, at 11:18 AM, Max Xu wrote: > > Congratulations! Zili
[GitHub] [pulsar] BewareMyPower added a comment to the discussion: [QUESTION] Thread safe problem about HandlerState#changeToReadyState
GitHub user BewareMyPower added a comment to the discussion: [QUESTION] Thread safe problem about HandlerState#changeToReadyState

Is the following implementation better?

```java
private static boolean notClosed(State state) {
    return state == State.Uninitialized || state == State.Connecting || state == State.RegisteringSchema;
}

// moves the state to ready if it wasn't closed
protected boolean changeToReadyState() {
    return STATE_UPDATER.getAndUpdate(this, state -> (notClosed(state) ? State.Ready : state)) == State.Ready;
}
```

GitHub link: https://github.com/apache/pulsar/discussions/18401#discussioncomment-4103573

This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
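One detail worth checking in the suggestion above: in `java.util.concurrent.atomic`, `getAndUpdate` returns the value held before the update, while `updateAndGet` returns the value after it. The following is a self-contained sketch of that difference, not the Pulsar class. `HandlerStateSketch`, its reduced `State` enum, and both method names are stand-ins made up for illustration.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

/** Reduced stand-in for HandlerState; the real class has more states. */
class HandlerStateSketch {
    enum State { Uninitialized, Connecting, RegisteringSchema, Ready, Closed }

    private static final AtomicReferenceFieldUpdater<HandlerStateSketch, State> STATE_UPDATER =
            AtomicReferenceFieldUpdater.newUpdater(HandlerStateSketch.class, State.class, "state");

    private volatile State state;

    HandlerStateSketch(State initial) {
        this.state = initial;
    }

    State state() {
        return state;
    }

    private static boolean notClosed(State s) {
        return s == State.Uninitialized || s == State.Connecting || s == State.RegisteringSchema;
    }

    /** Compares the PREVIOUS state with Ready, because getAndUpdate returns the old value. */
    boolean changeToReadyWithGetAndUpdate() {
        return STATE_UPDATER.getAndUpdate(this, s -> notClosed(s) ? State.Ready : s) == State.Ready;
    }

    /** Compares the NEW state with Ready, because updateAndGet returns the updated value. */
    boolean changeToReadyWithUpdateAndGet() {
        return STATE_UPDATER.updateAndGet(this, s -> notClosed(s) ? State.Ready : s) == State.Ready;
    }
}
```

Starting from `Connecting`, the `getAndUpdate` variant moves the state to `Ready` but returns `false`, whereas the `updateAndGet` variant returns `true`. Note that the `updateAndGet` variant also returns `true` when the state was already `Ready`, which the chained `compareAndSet` version would not; whether that is acceptable depends on the callers.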
[GitHub] [pulsar] codelipenghui added a comment to the discussion: [QUESTION] Thread safe problem about HandlerState#changeToReadyState
GitHub user codelipenghui added a comment to the discussion: [QUESTION] Thread safe problem about HandlerState#changeToReadyState The issue had no activity for 30 days, mark with Stale label. GitHub link: https://github.com/apache/pulsar/discussions/18401#discussioncomment-4103574 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] BewareMyPower created a discussion: [QUESTION] Thread safe problem about HandlerState#changeToReadyState
GitHub user BewareMyPower created a discussion: [QUESTION] Thread safe problem about HandlerState#changeToReadyState

I'm not sure if it's a bug. It's more of a question.

As we can see, https://github.com/apache/pulsar/blob/608929227824fe4303f46aa432e42af77bcbf625/pulsar-client/src/main/java/org/apache/pulsar/client/impl/HandlerState.java#L53-L56

`HandlerState#changeToReadyState` is not an atomic operation. I'm not sure whether there's a race case like the following timeline:

| Time | Event | State Before | State Now |
| :--- | :--- | :--- | :--- |
| 1 | `STATE_UPDATER.compareAndSet(this, State.Uninitialized, State.Ready)` | `State.Connecting` | `State.Connecting` |
| 2 | `setState(State.Uninitialized)` | `State.Connecting` | `State.Uninitialized` |
| 3 | `STATE_UPDATER.compareAndSet(this, State.Connecting, State.Ready)` | `State.Uninitialized` | `State.Uninitialized` |
| 4 | `STATE_UPDATER.compareAndSet(this, State.RegisteringSchema, State.Ready)` | `State.Uninitialized` | `State.Uninitialized` |

As we can see, there's a point in time at which the state was changed back to `Uninitialized` from `Connecting`. However, we should expect the state to be `Ready` because neither `Uninitialized` nor `Connecting` is a closed state.

I see that references to `changeToReadyState` in `ProducerImpl` and `ConsumerImpl` are protected by the lock directly or indirectly, like https://github.com/apache/pulsar/blob/608929227824fe4303f46aa432e42af77bcbf625/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L738-L739

I'm not sure if the lock works, because it requires that the relevant `setState` invocations also be protected by the lock, and I didn't check that in detail.

And in `TransactionMetaStoreHandler#connectionOpened`, there's no lock. https://github.com/apache/pulsar/blob/608929227824fe4303f46aa432e42af77bcbf625/pulsar-client/src/main/java/org/apache/pulsar/client/impl/TransactionMetaStoreHandler.java#L115-L117

I'm not sure if thread safety can be guaranteed. IMO, if there's no possibility that the state is changed back to `Connecting` or `Uninitialized` during `changeToReadyState`, it will be thread safe. Or is this race condition acceptable?

GitHub link: https://github.com/apache/pulsar/discussions/18401

This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
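The timeline in the question can be replayed deterministically on a single thread, which makes the outcome easier to see. This is only a sketch under assumptions: `ReadyStateRaceReplay` and its reduced `State` enum are hypothetical, an `AtomicReference` stands in for the field updater, and the `set` call at time 2 stands in for a `setState` issued by another thread between two `compareAndSet` attempts.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Single-threaded replay of the four timeline steps (hypothetical stand-in class). */
class ReadyStateRaceReplay {
    enum State { Uninitialized, Connecting, RegisteringSchema, Ready }

    // The handler starts in Connecting, as in the "State Before" column at time 1.
    private final AtomicReference<State> state = new AtomicReference<>(State.Connecting);

    /** Returns what the chained compareAndSet calls would return for this interleaving. */
    boolean run() {
        // Time 1: expects Uninitialized but sees Connecting, so the CAS fails.
        boolean changed = state.compareAndSet(State.Uninitialized, State.Ready);
        // Time 2: stands in for setState(State.Uninitialized) from another thread.
        state.set(State.Uninitialized);
        // Time 3: expects Connecting but now sees Uninitialized, so the CAS fails.
        changed = changed || state.compareAndSet(State.Connecting, State.Ready);
        // Time 4: expects RegisteringSchema, still sees Uninitialized, so the CAS fails.
        changed = changed || state.compareAndSet(State.RegisteringSchema, State.Ready);
        return changed;
    }

    State current() {
        return state.get();
    }
}
```

For this interleaving `run()` returns `false` and the state ends at `Uninitialized`, even though the state was never closed. Whether a real caller can produce this interleaving depends on the locking discussed above.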
Re: [ANNOUNCE] New Committer: Zili Chen
Congratulations! Zili Best, Max Xu On Thu, Nov 10, 2022 at 8:16 AM Yu wrote: > The Project Management Committee (PMC) for Apache Pulsar has invited Zili > Chen (https://github.com/tisonkun) > to become a committer and we are pleased to announce that he has accepted. > > Being a committer enables easier contribution to the > project since there is no need to go via the patch > submission process. This should enable better productivity. > > Welcome and congratulations, Zili Chen! > > Please join us in congratulating and welcoming Zili Chen onboard! > > Best Regards, > Yu on behalf of the Pulsar PMC >
Re: [DISCUSSION] Redesign the MessageId interface
I also changed my mind after I saw Flink's MessageIdUtils implementation. Now it's clear to me that:

- For application users, the APIs in the pulsar-client-api module are what they should use.
- For Pulsar ecosystem developers, the APIs in the pulsar-client module are interfaces.

So at the moment, these MessageId implementations could already be used in some external applications. In conclusion,

1. The MessageId interface should not be touched
2. The public methods of the MessageId implementations should not be touched

Based on these two key points, I'm going to look into these implementations and mark some as deprecated, but they should still work like before.

Thanks,
Yunze

On Thu, Nov 10, 2022 at 3:50 AM Rajan Dhabalia wrote: > > Hi, > > I was reading the email thread why we want to change MessageId interface: > https://lists.apache.org/thread/rdkqnkohbmkjjs61hvoqplhhngr0b0sd > >> Currently we have the following 5 implementations of MessageId: > >> These implementations are such a mess. For example, when users get a > MessageId from `Producer#send`: > > I think above discussion started by stating that MessageId has multiple > implementations eg: MessageIdImpl, BatchMessageIdImpl, ChunkMessageIdImpl, > etc.. and as a client we receive a single MessageId with send message API. > > Well, I see it's expected and very well defined behavior. No matter what > implementation client library is internally using, as a user for my topic I > should be able to publish and consume messages by providing specific > configurations. The moment a user has to know implementation details of a > client library such as chunk or batch message internals, there will be a > strong dependency created between application and server implementation > which is a RED flag for any system. Abstraction helps user adoption by > simplifying user API and allowing systems to enhance without worrying about > application dependency.
> Removing abstraction of MessageId and extracting implementation information > in getter API will prevent us from making any implementation change (eg: won't > be able to change chunk messageId behavior which we might need in future > for Shared-subscription), introduce hacks for any enhancement by making > sure existing internal implementation can't change (eg: if we can't change > chunk/batch-message-Id then we might create a new wrapper and attach to > existing messageId which will make things even worse), and even prevent us > from changing backend implementation (eg: we will be stuck with BK-Ledger > implementation forever). > Pulsar is used for many large scale business use cases with a large number > of users and it can create a nightmare for operators and users if we remove > abstractions and start following the practice of exposing system internals > to user applications. > > Thanks, > Rajan > > > On Tue, Nov 8, 2022 at 6:05 PM Yunze Xu > wrote: > > > Hi Joe, > > > > Then what would we expect users to do with the MessageId? It should only > > be passed to Consumer#seek or ReaderBuilder#startMessageId? > > > > What about the partition index? We have a `TopicMetadata` interface that > > returns > > the number of partitions. If the partition is also "implementation > > details", should we expose > > this interface? Or should we support customizing a MessageRouter because it > > returns the partition index? > > > > What about the batch index and batch size? For example, we have an > > enableBatchIndexAcknowledgment method to enable batch index ACK. If batch > > index is also "implementation details", how could users know what > > "batch > > index ack" means? > > > > Even for ledger id and entry id, this pair represents a logical storage > > position like the offset > > concept in Kafka (though each offset represents a message while each > > entry represents > > a batch). If you see the Message API, it also exposes many attributes. 
> > IMO, for the > > MessageIdData, only the ack_set (a long array serialized from the > > BitSet) is the implementation > > detail. > > > > The MessageId API should be flexible, not an abstract one. If not, why > > do we still implement > > the toString() method? We should not encourage users to print the > > MessageId. It would > > be easy to know what "ledger is 0, entry id is 1" means, users only > > need to know the concepts > > of ledger id and entry id. But it would be harder to know a tuple like > > "0:1:-1:-1" means. > > > > Thanks, > > Yunze > > > > On Tue, Nov 8, 2022 at 11:16 PM Joe F wrote: > > > > > > >Maybe this design is to hidden some details, but if > > > users don't know the details like ledger id and entry id, how could > > > you know what does "0:0:-1:0" mean? > > > > > > Abstractions exist for a reason. Ledgerid and entryid are > > implementation > > > details, and an application should not be interpreting that at all. > > > -j > > > > > > > > > On Tue, Nov 8, 2022 at 3:43 AM Yunze Xu > > > wrote: > > > > > > > I didn't look into these two methods at the
[GitHub] [pulsar] blaghed added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user blaghed added a comment to the discussion: Pulsar Operator On Kubernetes Hi all, I'm very much interested in the discussion revolving around the Cluster and Resource Operators for Pulsar. So, kind of hope that the reason this ticket went quiet is that you moved the ongoing convo over to Slack? Is there any update from that side that would make sense adding here? GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103122 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] tisonkun added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user tisonkun added a comment to the discussion: Pulsar Operator On Kubernetes Open-ended discussion. Moved to the Discussions forum. GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103123 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes @nicknezis I've invited you to join the channel. GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103121 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes @nicknezis Hi, sorry, I didn't know there was a Pulsar Slack. I have now created a Pulsar operator channel in the Pulsar Slack, and we will continue the discussion there. There's another team looking at the same thing, so we can discuss it together there. We've been talking about priorities for all function points, and we've been talking about rules. GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103120 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes @chrsoo OK GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103117 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] nicknezis added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user nicknezis added a comment to the discussion: Pulsar Operator On Kubernetes Why not just an #operator channel in the Pulsar Slack? We hang out in there so not sure if this is big enough effort to need a whole new Slack. Having a design spec laid out is a great idea because we are starting to build something similar. If we have an agreed upon approach, members from my team could contribute functionality to flesh out the implementation. Also we can perhaps keep separate Operators as Strimzi did for their Kafka Operators. This would allow the current big-sky Operator to exist for managing the cluster and my team could create another operator for managing the User,Topic, Tenant, Namespace type objects. https://strimzi.io/docs/overview/latest/#overview-components_str GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103119 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes I created slack https://pulsaroperator.slack.com/ GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103118 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes

> @sijie yes, exactly. Should we take your comment regarding StreamNative to mean that you are planning to open source an Operator for Pulsar on Kubernetes that deals with the full scope of CRD's for Pulsar management? How soon is "soon"?

@sijie any feedback on this?

> @sky-big regarding a specific design I would design it as explained above but I will see if I can produce a more detailed design specification as a PR.

@sky-big sorry for not getting back to you, the Pulsar project took a back seat for a while where I work. It is getting reprioritised again, so I will start looking into that PR again. Note that I am not talking about code at this point, but I should at least be able to put down how I think it should work.

GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103116

This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sijie added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sijie added a comment to the discussion: Pulsar Operator On Kubernetes I think both ansible and go operators can co-exist. I would also recommend submitting the operators to the operator hub as well so that it can be exposed to wider community. GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103114 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes The next step is to publish to [OperatorHub](https://operatorhub.io/) GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103115 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes @sijie yes, exactly. Should we take your comment regarding StreamNative to mean that you are planning to open source an Operator for Pulsar on Kubernetes that deals with the full scope of CRDs for Pulsar management? How soon is "soon"? @sky-big regarding a specific design, I would design it as explained above, but I will see if I can produce a more detailed design specification as a PR. GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103110 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes The Pulsar Manager component has been successfully added to the operator. GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103112 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] nicknezis added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user nicknezis added a comment to the discussion: Pulsar Operator On Kubernetes This is cool to see. A team I am on was going to start making a Pulsar Operator using Ansible OperatorSDK. https://github.com/operator-framework/operator-sdk/blob/master/doc/ansible/user-guide.md Here is an Ansible Operator example for reference. https://github.com/tomaszkiewicz/mysql-operator/tree/335556ee3331ee7deba9049eafd67ad130125174/playbooks Would the Pulsar team be interested in this work? Any opinions on Ansible vs Go? GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103113 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes @chrsoo OK, looking forward to your PR. GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103111 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sijie added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sijie added a comment to the discussion: Pulsar Operator On Kubernetes

@chrsoo @sky-big @wolfstudy

I think what @chrsoo is requesting is an operator for the full lifecycle management of all the resources within a Pulsar instance. This includes both a ClusterOperator and a ResourceOperator.

A ClusterOperator is what @sky-big is doing and what most of the *operators* are doing. It is used for setting up a cluster and managing the upgrade, downgrade, and scaling up and down of a cluster.

A ResourceOperator is more about using *CRDs* for provisioning *resources* within a Pulsar cluster, where a *resource* is something such as a role (key and tokens), a tenant, a namespace, a topic, a function, etc.

Once everything is modeled as CRDs, you can use Kubernetes to provision both the cluster and its resources. Thus you can apply a GitOps pattern in a CI/CD pipeline to manage everything.

@chrsoo: this is kind of aligned with what we (StreamNative) are working on. We should be able to share what we have done soon.

GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103109

This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes @chrsoo I generally understand what you mean. Starting with the Tenant CRDs, can you make a specific design and let everyone discuss it? GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103108 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes @wolfstudy @sijie @jiazhai Can you discuss it together? GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103107 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes @wolfstudy thanks for the tip (I am aware) but how is [pulsar-manager](https://github.com/apache/pulsar-manager) not **imperative**? A web UI where you do point-and-click is virtually the same thing as issuing commands on the command line. At each instance in time you tell the system what to do instead of telling it the desired target state and have it figure out the changes by itself. In the end I would like to **declare** the state of my cluster in an external git repository and then have a CI/CD pipeline provision it for me (gitops). The CRD proposed by @sky-big is a really good start and I could work with this towards my end goal, but I need to declare other parts of the configuration as well. Point-and-click will not do the job :-) GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103104 This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
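The declarative approach described above can be sketched as a small reconciliation step: the desired state is given as data, and the system derives the changes itself. This is an illustration only, with hypothetical names (`TenantReconciler`, `plan`) and plain strings standing in for tenants; a real operator would call the Pulsar admin API instead of returning action strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical reconciler: derives actions from desired vs. actual tenant names. */
class TenantReconciler {
    /** Returns the actions that move the actual set of tenants to the desired set. */
    static List<String> plan(Set<String> desired, Set<String> actual) {
        List<String> actions = new ArrayList<>();
        // Sort the names so the plan is deterministic regardless of the Set implementation.
        for (String tenant : new TreeSet<>(desired)) {
            if (!actual.contains(tenant)) {
                actions.add("create " + tenant);
            }
        }
        for (String tenant : new TreeSet<>(actual)) {
            if (!desired.contains(tenant)) {
                actions.add("delete " + tenant);
            }
        }
        return actions;
    }
}
```

With the desired state kept in a git repository, a pipeline would only need to feed the two sets into a step like this. An imperative workflow, whether through pulsar-admin or a web UI, leaves it to the operator to work out the create and delete steps by hand.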
[GitHub] [pulsar] wolfstudy added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user wolfstudy added a comment to the discussion: Pulsar Operator On Kubernetes

> I would really like to get out of an imperative approach based on
> pulsar-admin for managing the cluster.

@chrsoo Maybe [pulsar-manager](https://github.com/apache/pulsar-manager) is a good option for you.

GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103103
[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes

The difficulty here is where to draw the line between more or less static configuration and what should remain dynamically defined at runtime by the clients/users of the system. This might vary in different contexts.

I think it could be useful to define CRDs for the following Pulsar configuration items:

1. **Tenants** with namespaces and their relation to Clusters
1. **Persistent Topics** and their relation to tenants/namespaces
1. **Sources** and **Sinks** and their relation to clusters/topics

Starting with Tenants, the following CRDs might be useful:

* **PulsarTenant** - defines a Pulsar Tenant
* **PulsarNamespace** - defines a Pulsar Namespace for a Tenant
* **PulsarNamespaceBinding** - binds a Pulsar Namespace to a PulsarCluster

It would be good if Tenants and Namespaces were defined at the K8s cluster level, as Tenants and Namespaces could then be reused across multiple PulsarClusters defined in different K8s namespaces. The PulsarNamespaceBinding should be defined at the namespace level and in the same namespace as the PulsarCluster to which it is bound.

@sky-big - does all this make sense to you?

GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103106
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

@chrsoo Could you please give a specific example based on your idea? Then we can add this function to the Pulsar operator.

GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103105
[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes

Great initiative!

I know you have to start somewhere and cannot do everything at once, but have you thought about CRDs for Sources and Sinks? What about configuration like Channels, Tenants and Namespaces?

I would really like to get out of an imperative approach based on pulsar-admin for managing the cluster. Still trying to figure out how to get the whole configuration for a geo-replicated cluster in Git...

GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103102
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

The next step concerns Pulsar's stateful components (including BookKeeper and ZooKeeper): using the operator to control these complex stateful components.

GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103101
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

The basic functionality of the Pulsar operator is now implemented, including the Pulsar dashboard, Grafana, Prometheus and so on: [Pulsar Operator](https://github.com/sky-big/pulsar-operator). You are welcome to use it and contribute.

GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103100
[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes
GitHub user sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

Everyone is welcome to contribute.

GitHub link: https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103099
[GitHub] [pulsar] sky-big created a discussion: Pulsar Operator On Kubernetes
GitHub user sky-big created a discussion: Pulsar Operator On Kubernetes

Add a Pulsar operator for Kubernetes (https://github.com/sky-big/pulsar-operator), based on deployment/kubernetes/*.yaml

GitHub link: https://github.com/apache/pulsar/discussions/18398
[ANNOUNCE] New Committer: Zili Chen
The Project Management Committee (PMC) for Apache Pulsar has invited Zili Chen (https://github.com/tisonkun) to become a committer and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since there is no need to go via the patch submission process. This should enable better productivity. Welcome and congratulations, Zili Chen! Please join us in congratulating and welcoming Zili Chen onboard! Best Regards, Yu on behalf of the Pulsar PMC
Re: [DISCUSSION] Redesign the MessageId interface
Hi,

I was reading the email thread on why we want to change the MessageId interface:
https://lists.apache.org/thread/rdkqnkohbmkjjs61hvoqplhhngr0b0sd

>> Currently we have the following 5 implementations of MessageId:
>> These implementations are such a mess. For example, when users get a MessageId from `Producer#send`:

I think the above discussion started by stating that MessageId has multiple implementations (e.g. MessageIdImpl, BatchMessageIdImpl, ChunkMessageIdImpl, etc.) and that a client receives a single MessageId from the send-message API. Well, I see this as expected and very well defined behavior. No matter what implementation the client library uses internally, as a user of my topic I should be able to publish and consume messages by providing specific configurations. The moment a user has to know implementation details of a client library, such as chunk or batch message internals, a strong dependency is created between the application and the server implementation, which is a RED flag for any system. Abstraction helps user adoption by simplifying the user API and allowing systems to evolve without worrying about application dependencies.

Removing the abstraction of MessageId and exposing implementation information through getter APIs will prevent us from making any implementation change (e.g. we won't be able to change chunk messageId behavior, which we might need in the future for Shared subscriptions), introduce hacks for any enhancement because the existing internal implementation can't change (e.g. if we can't change the chunk/batch message id, we might create a new wrapper and attach it to the existing messageId, which will make things even worse), and even prevent us from changing the backend implementation (e.g. we will be stuck with the BK ledger implementation forever).

Pulsar is used for many large-scale business use cases with a large number of users, and it can create a nightmare for operators and users if we remove abstractions and start exposing system internals to user applications.

Thanks,
Rajan

On Tue, Nov 8, 2022 at 6:05 PM Yunze Xu wrote:

> Hi Joe,
>
> Then what would we expect users to do with the MessageId? It should only
> be passed to Consumer#seek or ReaderBuilder#startMessageId?
>
> What about the partition index? We have a `TopicMetadata` interface that
> returns the number of partitions. If the partition is also "implementation
> details", should we expose this interface? Or should we support customizing
> a MessageRouter because it returns the partition index?
>
> What about the batch index and batch size? For example, we have an
> enableBatchIndexAcknowledgment method to enable batch index ACK. If batch
> index is also "implementation details", how could users know what does
> "batch index ack" mean?
>
> Even for ledger id and entry id, this pair represents a logic storage
> position like the offset concept in Kafka (though each offset represents a
> message while each entry represents a batch). If you see the Message API,
> it also exposes many attributes. IMO, for the MessageIdData, only the
> ack_set (a long array serialized from the BitSet) is the implementation
> detail.
>
> The MessageId API should be flexible, not an abstract one. If not, why
> do we still implement the toString() method? We should not encourage users
> to print the MessageId. It would be easy to know what "ledger is 0, entry
> id is 1" means, users only need to know the concepts of ledger id and
> entry id. But it would be harder to know a tuple like "0:1:-1:-1" means.
>
> Thanks,
> Yunze
>
> On Tue, Nov 8, 2022 at 11:16 PM Joe F wrote:
> >
> > >Maybe this design is to hidden some details, but if
> > users don't know the details like ledger id and entry id, how could
> > you know what does "0:0:-1:0" mean?
> >
> > Abstractions exist for a reason. Ledgerid and entryid are implementation
> > details, and an application should not be interpreting that at all.
> >
> > -j
> >
> > On Tue, Nov 8, 2022 at 3:43 AM Yunze Xu wrote:
> > >
> > > I didn't look into these two methods at the moment. But I think it's
> > > possible to retain only the `fromByteArray`.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Tue, Nov 8, 2022 at 7:02 PM Enrico Olivelli wrote:
> > > >
> > > > On Tue, Nov 8, 2022 at 11:52, Yunze Xu wrote:
> > > > >
> > > > > Hi Enrico,
> > > > >
> > > > > > We also need a way to represent this as a String or a byte[]
> > > > >
> > > > > We already have the `toByteArray` method, right?
> > > >
> > > > Yes, correct. So we are fine. I forgot about it and I answered too
> > > > quickly.
> > > >
> > > > I am not sure if this can be in the scope of this initiative, but we
> > > > should somehow get rid of
> > > > stuff like "fromByteArrayWithTopic" vs "fromByteArray".
> > > >
> > > > Thanks
> > > > Enrico
> > > >
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > On Tue, Nov 8, 2022 at 6:43 PM Enrico Olivelli <eolive...@gmail.com> wrote:
Re: [DISCUSSION] Redesign the MessageId interface
FWIW, the Flink Pulsar connector parses the message id internals in a hacky way to get the next message id:
https://github.com/apache/flink/blob/421f057a7488fd64854a82424755f76b89561a0b/flink-connectors/flink-connector-pulsar/src/main/java/org/apache/flink/connector/pulsar/source/enumerator/cursor/MessageIdUtils.java
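To make the alternative discussed in this thread more concrete (the client offering operators such as increment, comparison and range checks, so that applications do not have to parse id internals themselves), here is a minimal Java sketch. `Position`, `next()`, `inRange` and the two private fields are hypothetical names chosen for illustration; they are not part of the Pulsar client API, and this is not the Flink connector's code.

```java
// Hypothetical opaque position type. The two fields stand in for whatever the
// client uses internally; callers never read them directly.
final class Position implements Comparable<Position> {
    private final long segment; // assumed internal detail (ledger-like counter)
    private final long offset;  // assumed internal detail (entry-like counter)

    Position(long segment, long offset) {
        this.segment = segment;
        this.offset = offset;
    }

    // "increment" operator: the position right after this one in the same segment.
    Position next() {
        return new Position(segment, offset + 1);
    }

    // "comparator" operator: order by segment first, then by offset.
    @Override
    public int compareTo(Position other) {
        int bySegment = Long.compare(segment, other.segment);
        return bySegment != 0 ? bySegment : Long.compare(offset, other.offset);
    }

    // "range" operator: true when fromInclusive <= this < toExclusive.
    boolean inRange(Position fromInclusive, Position toExclusive) {
        return compareTo(fromInclusive) >= 0 && compareTo(toExclusive) < 0;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Position && compareTo((Position) o) == 0;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(segment) * 31 + Long.hashCode(offset);
    }
}

final class PositionDemo {
    public static void main(String[] args) {
        Position start = new Position(7, 41);
        Position after = start.next();
        // The application only uses operators, never the fields themselves.
        System.out.println(after.compareTo(start) > 0);
        System.out.println(after.inRange(start, new Position(8, 0)));
    }
}
```

If the internal addressing ever changed, only the bodies of `next()`, `compareTo()` and `inRange()` would need to change in this sketch; the calling code in `PositionDemo` would stay the same.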
Re: [DISCUSSION] Redesign the MessageId interface
After reading Joe's comments I have changed my mind.
Actually it is better not to expose "ledgerId" and "entryId" to client applications.
They are useless pieces of information.
Also, if in the future we want to change the way we internally address a message, we will always have to support these fields.

toByteArray() is enough for an application to save the ID into an external database and then to recover a Subscription (or a Reader) from a certain point.
toString() is good only for debugging/logs; we can change it, but it is better not to touch it and to add tests for it.

It is better that the MessageId API stays as opaque as possible.

Enrico
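The point that a byte-array form is enough to save an id externally and restore it later can be illustrated with a small, self-contained Java sketch. `OpaqueId` and `CheckpointStore` are hypothetical stand-ins invented for this example (the in-memory map plays the role of the external database); they are not the Pulsar `MessageId` API, and the two-long layout is an arbitrary assumption that the caller never sees.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an opaque message id; NOT the Pulsar MessageId API.
final class OpaqueId {
    private final byte[] bytes; // internal layout is private to the "library"

    private OpaqueId(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    // Only the "library" knows that two longs are packed in here.
    static OpaqueId of(long a, long b) {
        return new OpaqueId(ByteBuffer.allocate(16).putLong(a).putLong(b).array());
    }

    byte[] toByteArray() {
        return bytes.clone();
    }

    static OpaqueId fromByteArray(byte[] data) {
        return new OpaqueId(data);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof OpaqueId && Arrays.equals(bytes, ((OpaqueId) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}

// Stand-in for an external database table keyed by subscription name.
final class CheckpointStore {
    private final Map<String, byte[]> rows = new HashMap<>();

    void save(String subscription, OpaqueId id) {
        rows.put(subscription, id.toByteArray());
    }

    OpaqueId load(String subscription) {
        return OpaqueId.fromByteArray(rows.get(subscription));
    }

    public static void main(String[] args) {
        CheckpointStore store = new CheckpointStore();
        OpaqueId id = OpaqueId.of(12L, 34L);
        store.save("my-subscription", id);
        // The restored id is equal to the saved one without the caller
        // ever looking inside the bytes.
        System.out.println(store.load("my-subscription").equals(id));
    }
}
```

The store only ever handles `byte[]`, so the id's internal layout could change without touching the application code, as long as old byte arrays remain readable by the library.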
Re: [DISCUSSION] Redesign the MessageId interface
Hi Jiaqi,

> I don't think `tostring` should be used in any serious case because it has no standard.

I agree. But it's better to leave it unchanged. As in my previous reply, it might be a de-facto standard because `toString()`-like methods are used in logging, not only for debugging. For example, there is a getLastMessageId API in the consumer, and users might log the last message ID.

Different representations across different languages are not a big issue, but they could confuse users and administrators.

For example, the C++ client supports constructing a MessageId with 4 arguments, but the 1st argument is the partition, not the ledger id. However, the string representation is still ":::". In the Java client, though, a non-batched message ID doesn't have the ":" suffix.

Thanks,
Yunze
Re: [DISCUSSION] Redesign the MessageId interface
Hi Joe,

I think the most controversial point is what a MessageId should be used for. In your opinion, it should only be used as an opaque, comparable object that represents the position of a message [1]. What I had in mind is that MessageId should be a wrapper of the MessageIdData in PulsarApi.proto [2].

I agree that on the application side there are not many cases that need to look into the details of a specific MessageId implementation. However, these "internal fields" are de-facto concepts in Pulsar. Yes, ledger id and entry id are concepts from BK and they might change. But I doubt there is any Pulsar application developer who doesn't know these two concepts.

Let's look at the clients for other languages, including C++ [3], Golang [4], Python [5], Rust [6][7] and C# [8]. Only the official Java client doesn't expose these fields.

I'm also okay with not changing the MessageId interface in the pulsar-client-api module, because the motivation is to simplify the messy implementations of MessageId in the pulsar-client module. We can add a MessageIdPojo class in the pulsar-common module or elsewhere. In the Pulsar repo, all MessageId implementations must extend the MessageIdPojo class.

```java
import lombok.Data;
import org.apache.pulsar.client.api.MessageId;

@Data
// MessageId is an interface, so the base class implements it rather than extending it.
class MessageIdPojo implements MessageId {
    private final long ledgerId;
    // ...
}
```

[1] https://pulsar.apache.org/docs/concepts-messaging#messages
[2] https://github.com/apache/pulsar/blob/b31c5a6a325728b5dc5faebd1a33386952d733d5/pulsar-common/src/main/proto/PulsarApi.proto#L57
[3] https://github.com/apache/pulsar-client-cpp/blob/main/include/pulsar/MessageId.h
[4] https://github.com/apache/pulsar-client-go/blob/d9c8b0ab9c14e8d571b632c93002ea20db1a2c16/pulsar/message.go#L147
[5] https://github.com/apache/pulsar-client-python/blob/75a57b427d4c6944c49f4b712344107b5444aa36/pulsar/__init__.py#L84
[6] https://github.com/streamnative/pulsar-rs/blob/de59974080daa248bfdeaea8510eb72ec8f30bac/src/consumer.rs#L1984
[7] https://github.com/streamnative/pulsar-rs/blob/de59974080daa248bfdeaea8510eb72ec8f30bac/src/consumer.rs#L1353
[8] https://github.com/apache/pulsar-dotpulsar/blob/0590b1ad6c4474d425662352ba62abb41bfb9f0a/src/DotPulsar/MessageId.cs#L56

Thanks,
Yunze
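For readers who want to see what a shared base class along the lines of the MessageIdPojo proposal might look like without Lombok or the Pulsar client on the classpath, here is a rough, self-contained Java sketch. Several things are assumptions made only for illustration: the local `MessageId` interface is a stand-in for `org.apache.pulsar.client.api.MessageId`, only `ledgerId` comes from the snippet in the message, the other fields are guesses, and the byte encoding is ad hoc rather than Pulsar's wire format.

```java
import java.nio.ByteBuffer;

// Local stand-in for org.apache.pulsar.client.api.MessageId, so the sketch
// compiles on its own.
interface MessageId extends Comparable<MessageId> {
    byte[] toByteArray();
}

// Hypothetical base class. Only ledgerId appears in the original snippet;
// entryId, partitionIndex and batchIndex are assumptions for illustration.
class MessageIdPojo implements MessageId {
    private final long ledgerId;
    private final long entryId;
    private final int partitionIndex;
    private final int batchIndex; // -1 means "not part of a batch" in this sketch

    MessageIdPojo(long ledgerId, long entryId, int partitionIndex, int batchIndex) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
        this.partitionIndex = partitionIndex;
        this.batchIndex = batchIndex;
    }

    long getLedgerId() { return ledgerId; }
    long getEntryId() { return entryId; }
    int getPartitionIndex() { return partitionIndex; }
    int getBatchIndex() { return batchIndex; }

    // Ad-hoc fixed-width encoding for the sketch; this is NOT Pulsar's wire format.
    @Override
    public byte[] toByteArray() {
        return ByteBuffer.allocate(24)
                .putLong(ledgerId).putLong(entryId)
                .putInt(partitionIndex).putInt(batchIndex)
                .array();
    }

    static MessageIdPojo fromByteArray(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        return new MessageIdPojo(buf.getLong(), buf.getLong(), buf.getInt(), buf.getInt());
    }

    // Field-by-field ordering; subclasses would inherit one comparison rule.
    @Override
    public int compareTo(MessageId other) {
        if (!(other instanceof MessageIdPojo)) {
            throw new IllegalArgumentException("expected a MessageIdPojo");
        }
        MessageIdPojo that = (MessageIdPojo) other;
        int c = Long.compare(ledgerId, that.ledgerId);
        if (c == 0) c = Long.compare(entryId, that.entryId);
        if (c == 0) c = Integer.compare(partitionIndex, that.partitionIndex);
        if (c == 0) c = Integer.compare(batchIndex, that.batchIndex);
        return c;
    }
}
```

One design consequence worth noting: with a single base class, comparison and serialization live in one place, while the public interface seen by applications can remain as small as the stand-in `MessageId` above.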
Re: [DISCUSSION] Redesign the MessageId interface
Thanks, this is very inspiring to me. But I have a different opinion on `tostring`. >>You can only see a representation from `toString` method and got some output like "0:0:-1:0". I don't think `tostring` should be used in any serious case because it has no standard. There is no constraint on how the messageId should be converted to a string. For example, in go client, `tostring` is not being supported now. If go client should implement a `tostring` method, does go client' s`tostring` must follow the java implement like "0:0:-1:0"? If user do need a string/[]byte to record a messageId, `toByteArray` will be enough. In user side, most of the time , I think users don't really care about the "messageId string" is meaningful. I think `tostring` only should be used in debug. Thanks, Jiaqi Shen Joe F 于2022年11月9日周三 20:25写道: > Messageid is an identifier which identifies a message. How that id is > constructed, or what it contains should not matter to an application, and > an application should not assume anything about the implementation of that > id. > > >What about the partition index? We have a `TopicMetadata` interface that > returns the number of partitions. > > Partitioning is a first class concept, and is designed to be used by > application. How a partition is implemented should not be used by the > application . > > [ People violate this all the time, and I regret that Pulsar did not > provide get_Nth_topicpartion(), which led to people hardcoding it as > topicname-N. and using that directly. Now we are stuck with it.] > > Similarly batch index and batch size. Those are all logical concepts > exposed to the user. For eg: batch size is something the app is allowed to > tune > > >Even for ledger id and entry id, this pair represents a logic storage > position like the offset concept in Kafka > These are not equivalent. In Pulsar these are implementation details, > while in Kafka those are logical concepts. 
>
> One might think that these are logical concepts in Pulsar, because if you reverse engineer the current msgid implementation, you observe some "properties".
>
> Ledger id/entry id are logical concepts in __Bookkeeper__ , not in Pulsar. There is the Managed Ledger abstraction on top of BK, and then there is Pulsar on top of ML. You will break two levels of abstraction to expose ledger/entryid to an application
>
> An application should only care about the operations that can be done with a messageId
>
> - getmsgid() to return the message id as an opaque object
>
> [Operators using one messageId ]
> -serde, like tostring(). for storage/retrieval of message identifier
> -getter/setter on logical properties of the message (partition id etc...)
> -increment/decrement
>
> [Operators that take multiple messageIds]
> -comparator
> -range
>
> Those are the kind of operators Pulsar should provide to a user. Applications should not implement these operators on their own by reverse engineering the msgId. No application should be directly using ledgerid or entryid for doing anything (math or logic),
>
> As long as Pulsar provides these operations with msgid to the application, it should not care whether it's represented as "0:1:-1:-1" or "a:b:-b-b", or "#xba4231!haxcy1826923f" or as a serialized binary object or..whatever it may be.
>
> >>But it would be harder to know a tuple like "0:1:-1:-1" means.
>
> A user shouldn't have to know what this means. That's the point.
>
> Pulsar itself changed the messageId multiple times as it added partitioning, batching and so on, and it might do so again. And bookkeeper could change its representation of ledgers, (for eg, to uuids and byte offsets) ML could replace BK with something else (for eg. a table in a db.) Anything is possible - Pulsar would then just have to change the implementation of the operator functions, and no application needs to be rewritten.
>
> -j
>
> On Tue, Nov 8, 2022 at 6:05 PM Yunze Xu wrote:
>
> > Hi Joe,
> >
> > Then what would we expect users to do with the MessageId? It should only be passed to Consumer#seek or ReaderBuilder#startMessageId?
> >
> > What about the partition index? We have a `TopicMetadata` interface that returns the number of partitions. If the partition is also "implementation details", should we expose this interface? Or should we support customizing a MessageRouter because it returns the partition index?
> >
> > What about the batch index and batch size? For example, we have an enableBatchIndexAcknowledgment method to enable batch index ACK. If batch index is also "implementation details", how could users know what does "batch index ack" mean?
> >
> > Even for ledger id and entry id, this pair represents a logic storage position like the offset concept in Kafka (though each offset represents a message while each entry represents a batch). If you see the Message API, it also exposes many
Re: [DISCUSSION] Redesign the MessageId interface
Messageid is an identifier which identifies a message. How that id is constructed, or what it contains should not matter to an application, and an application should not assume anything about the implementation of that id.

>What about the partition index? We have a `TopicMetadata` interface that returns the number of partitions.

Partitioning is a first class concept, and is designed to be used by application. How a partition is implemented should not be used by the application.

[ People violate this all the time, and I regret that Pulsar did not provide get_Nth_topicpartion(), which led to people hardcoding it as topicname-N. and using that directly. Now we are stuck with it.]

Similarly batch index and batch size. Those are all logical concepts exposed to the user. For eg: batch size is something the app is allowed to tune

>Even for ledger id and entry id, this pair represents a logic storage position like the offset concept in Kafka

These are not equivalent. In Pulsar these are implementation details, while in Kafka those are logical concepts.

One might think that these are logical concepts in Pulsar, because if you reverse engineer the current msgid implementation, you observe some "properties".

Ledger id/entry id are logical concepts in __Bookkeeper__, not in Pulsar. There is the Managed Ledger abstraction on top of BK, and then there is Pulsar on top of ML. You will break two levels of abstraction to expose ledger/entryid to an application

An application should only care about the operations that can be done with a messageId

- getmsgid() to return the message id as an opaque object

[Operators using one messageId]
- serde, like tostring(). for storage/retrieval of message identifier
- getter/setter on logical properties of the message (partition id etc...)
- increment/decrement

[Operators that take multiple messageIds]
- comparator
- range

Those are the kind of operators Pulsar should provide to a user.
Applications should not implement these operators on their own by reverse engineering the msgId. No application should be directly using ledgerid or entryid for doing anything (math or logic),

As long as Pulsar provides these operations with msgid to the application, it should not care whether it's represented as "0:1:-1:-1" or "a:b:-b-b", or "#xba4231!haxcy1826923f" or as a serialized binary object or..whatever it may be.

>>But it would be harder to know a tuple like "0:1:-1:-1" means.

A user shouldn't have to know what this means. That's the point.

Pulsar itself changed the messageId multiple times as it added partitioning, batching and so on, and it might do so again. And bookkeeper could change its representation of ledgers, (for eg, to uuids and byte offsets) ML could replace BK with something else (for eg. a table in a db.) Anything is possible - Pulsar would then just have to change the implementation of the operator functions, and no application needs to be rewritten.

-j

On Tue, Nov 8, 2022 at 6:05 PM Yunze Xu wrote:

> Hi Joe,
>
> Then what would we expect users to do with the MessageId? It should only be passed to Consumer#seek or ReaderBuilder#startMessageId?
>
> What about the partition index? We have a `TopicMetadata` interface that returns the number of partitions. If the partition is also "implementation details", should we expose this interface? Or should we support customizing a MessageRouter because it returns the partition index?
>
> What about the batch index and batch size? For example, we have an enableBatchIndexAcknowledgment method to enable batch index ACK. If batch index is also "implementation details", how could users know what does "batch index ack" mean?
>
> Even for ledger id and entry id, this pair represents a logic storage position like the offset concept in Kafka (though each offset represents a message while each entry represents a batch). If you see the Message API, it also exposes many attributes.
> IMO, for the MessageIdData, only the ack_set (a long array serialized from the BitSet) is the implementation detail.
>
> The MessageId API should be flexible, not an abstract one. If not, why do we still implement the toString() method? We should not encourage users to print the MessageId. It would be easy to know what "ledger is 0, entry id is 1" means, users only need to know the concepts of ledger id and entry id. But it would be harder to know a tuple like "0:1:-1:-1" means.
>
> Thanks,
> Yunze
>
> On Tue, Nov 8, 2022 at 11:16 PM Joe F wrote:
> >
> > >Maybe this design is to hidden some details, but if users don't know the details like ledger id and entry id, how could you know what does "0:0:-1:0" mean?
> >
> > Abstractions exist for a reason. Ledgerid and entryid are implementation details, and an application should not be interpreting that at all.
> > -j
> >
> >
> > On Tue, Nov 8, 2022 at 3:43 AM Yunze Xu wrote:
> >
> > > I didn't look into
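
Joe's list of operators can be illustrated with a small sketch of an id that applications can only serialize and compare. Everything here is hypothetical: `OpaqueId`, its two private fields, and the 16-byte encoding are invented for illustration and are not Pulsar's actual `MessageId` implementation or wire format; only the method names `toByteArray`/`fromByteArray` echo the Java client API.

```java
import java.nio.ByteBuffer;

/** Hypothetical opaque id: callers can store, restore and compare it, but never read its parts. */
final class OpaqueId implements Comparable<OpaqueId> {
    // Implementation details (think ledger id / entry id today); deliberately no getters.
    private final long first;
    private final long second;

    // In a real library only the library itself would construct ids.
    OpaqueId(long first, long second) {
        this.first = first;
        this.second = second;
    }

    /** Serde operator: bytes for storage; the layout is private to this class. */
    byte[] toByteArray() {
        return ByteBuffer.allocate(2 * Long.BYTES).putLong(first).putLong(second).array();
    }

    /** Inverse of toByteArray(): rebuilds the id from previously stored bytes. */
    static OpaqueId fromByteArray(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        return new OpaqueId(buf.getLong(), buf.getLong());
    }

    /** Comparator operator: ordering is defined here, not re-derived by applications. */
    @Override
    public int compareTo(OpaqueId other) {
        int c = Long.compare(first, other.first);
        return c != 0 ? c : Long.compare(second, other.second);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof OpaqueId && compareTo((OpaqueId) o) == 0;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(first) * 31 + Long.hashCode(second);
    }
}
```

If the representation changed (say, to a UUID plus a byte offset), only this class would change; code that stores the `toByteArray()` output and calls `compareTo` would keep working.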
Re: [DISCUSSION] Redesign the MessageId interface
Hi Haiting,

> But please make sure we have to make it compatible with previous implementations, like the `toString` method

Yeah, I agree, I will keep it compatible.

BTW, while I'm working on this, I found the MessageId implementations are more complicated than I thought. The MessageIdImpl class must be a POJO. Otherwise it cannot be passed into the admin API as the Entity.

But I still insist on returning an Optional instead of a T with a default value explained in the API docs.

Thanks,
Yunze

On Wed, Nov 9, 2022 at 3:02 PM Haiting Jiang wrote:
>
> Overall, this makes sense to me.
> The current status of MessageId is a bit messy, especially for client developers and senior users who are interested in the implementation details.
> But please make sure we have to make it compatible with previous implementations, like the `toString` method, I bet someone has already done the parsing and got the ledgerId and entryId from it.
>
>
> Thanks,
> Haiting
>
> On Tue, Nov 8, 2022 at 6:25 PM Yunze Xu wrote:
> >
> > Hi all,
> >
> > Currently we have the following 5 implementations of MessageId:
> >
> > - MessageIdImpl: (ledger id, entry id, partition index)
> > - BatchMessageIdImpl: adds (batch index, batch size, acker), where acker is a wrapper of a BitSet.
> > - ChunkMessageIdImpl: adds another MessageIdImpl that represents the first MessageIdImpl of a BitSet.
> > - MultiMessageIdImpl: adds a map that maps the topic name to the MessageId.
> > - TopicMessageIdImpl: adds the topic name and the partition name
> >
> > These implementations are such a mess. For example, when users get a MessageId from `Producer#send`:
> >
> > ```java
> > var id = producer.send("msg");
> > ```
> >
> > There is no getter to get some specific fields like ledger id. You can only see a representation from `toString` method and got some output like "0:0:-1:0".
> > Maybe this design is to hidden some details, but if users don't know the details like ledger id and entry id, how could you know what does "0:0:-1:0" mean? What if `MessageId#toString`'s implementation changed? Should it be treated as a breaking change?
> >
> > The original definition of the underlying MessageIdData is much more clear:
> >
> > ```proto
> > message MessageIdData {
> >     required uint64 ledgerId = 1;
> >     required uint64 entryId = 2;
> >     optional int32 partition = 3 [default = -1];
> >     optional int32 batch_index = 4 [default = -1];
> >     repeated int64 ack_set = 5;
> >     optional int32 batch_size = 6;
> >
> >     // For the chunk message id, we need to specify the first chunk message id.
> >     optional MessageIdData first_chunk_message_id = 7;
> > }
> > ```
> >
> > IMO, MessageId should be a wrapper of MessageIdData. It's more natural to have an interface like:
> >
> > ```java
> > interface MessageId {
> >     long ledgerId();
> >     long entryId();
> >     Optional<Integer> partition();
> >     Optional<Integer> batchIndex();
> >     // ...
> > ```
> >
> > Additionally, there are many places that use only the triple of (ledger id, entry id, batch index) as the key to represent the position. Currently, they are done by adding a conversion from BatchMessageIdImpl to MessageIdImpl. However, it's more intuitive to write something like:
> >
> > ```java
> > class MessageIdPosition implements Comparable<MessageIdPosition> {
> >     private final MessageId messageId;
> >     // TODO: compare only the triple (ledger, entry, batch)
> > ```
> >
> > Therefore, I'm going to write a proposal to redesign the MessageId interface only by adding some getters. Regarding the 5 existing implementations, I think we can drop them because they are a part of `pulsar-client`, not `pulsar-client-api`.
> >
> > Please feel free to share your points.
> >
> > Thanks,
> > Yunze
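
The position key described in the quoted proposal, one that compares only the (ledger id, entry id, batch index) triple, could look roughly like the sketch below. It is a self-contained illustration under stated assumptions: `PositionKey` and its fields are names invented here and are not part of `pulsar-client-api`, and using `-1` for "no batch index" simply mirrors the `MessageIdData` default quoted above.

```java
import java.util.Comparator;
import java.util.Objects;

/** Hypothetical position key that orders ids by (ledgerId, entryId, batchIndex) only. */
final class PositionKey implements Comparable<PositionKey> {
    // Compare ledger first, then entry, then batch index.
    private static final Comparator<PositionKey> ORDER =
            Comparator.comparingLong((PositionKey p) -> p.ledgerId)
                    .thenComparingLong(p -> p.entryId)
                    .thenComparingInt(p -> p.batchIndex);

    final long ledgerId;
    final long entryId;
    final int batchIndex; // assumption: -1 when the message is not part of a batch

    PositionKey(long ledgerId, long entryId, int batchIndex) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
        this.batchIndex = batchIndex;
    }

    @Override
    public int compareTo(PositionKey other) {
        return ORDER.compare(this, other);
    }

    // equals/hashCode use the same triple so the key is safe in maps and sets.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PositionKey)) {
            return false;
        }
        PositionKey p = (PositionKey) o;
        return ledgerId == p.ledgerId && entryId == p.entryId && batchIndex == p.batchIndex;
    }

    @Override
    public int hashCode() {
        return Objects.hash(ledgerId, entryId, batchIndex);
    }
}
```

Fields such as partition index, batch size, or the ack set are intentionally left out of the key, so two ids that differ only in those would map to the same position.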
Re: [DISCUSS] PIP-175: Extend time based release process
Hi all,

What's the status of this PIP?

There's an issue talking about fixable vulnerabilities in the latest release. https://github.com/apache/pulsar/issues/18348

From what I see, one of the problems is that we take too long to make a new release (over 2 months for 2.10.2). Hopefully this PIP can help with that.

Thanks,
Haiting

On 2022/06/07 22:25:24 Matteo Merli wrote:
> https://github.com/apache/pulsar/issues/15966
>
> ## Motivation
>
> In PIP-47 (https://github.com/apache/pulsar/wiki/PIP-47:-Time-Based-Release-Plan), we have adopted a time-based release plan. This was the first attempt at establishing a new principle on how releases should b
>
> The main two benefits of this approach have been:
>
> 1. Clarity for users and developers on when to expect a release
> 2. Breaking a hard relationship between feature and release: a particular feature will be included in the release if it is completed in time. Otherwise, it will be bubbled up to the next release.
>
> The motivation for the current proposal is to extend the existing process to address the issues that we have seen and that were left out of the scope of PIP-47.
>
> ## Summary of existing issues in the process
>
> ### Short maintenance cycles for releases
>
> Since we're doing a 3 months release cycle, we are ending with 4 releases done per year, even though it's more close to 3 releases.
>
> There is a high cost to maintain a lot of old releases, backport bug fixes, and security patches. In general, we actively support the last 3 minor releases while continuing to develop the next release. E.g., 2.8, 2.9, and 2.10, while 2.11 is under development.
>
> The result is that a user adopting a particular release is forced to upgrade in a < 1-year timeframe to keep up to date and use a supported release. This timeframe is too short for many users as it imposes a lot of forced upgrades, for which they are not prepared in terms of available time and required effort.
>
> ### Live Upgrade/Downgrade compatibility path
>
> In Pulsar, we guarantee that users have a way to do live upgrades and downgrades with zero downtime.
>
> This is very powerful because it gives them the freedom to upgrade to a new release with the assurance of being able to roll back to the previous release in case any functional or performance regressions are encountered.
>
> Today, this compatibility is guaranteed across minor versions. Eg: I can do `2.7 -> 2.8 -> 2.7` as a live upgrade.
>
> What is not guaranteed is to "skip" releases. E.g.: `2.7 -> 2.9` might work or not, but it's not guaranteed. In that case an intermediated upgrade would be required: `2.7 -> 2.8 -> 2.9`.
>
> The reasons for which the "skip" upgrade might not work are multiple:
> 1. Incompatible upgrade of some dependency (e.g., ZooKeeper) that might not be compatible with an older version.
> 2. Adoption of a new metadata format or data format on disk.
>    Every time we introduce a new incompatible format change (outside of a regular Protobuf field addition), we do it in a 2 steps way:
>    - In a new release, we introduce the new feature/format, disabled by default. The new release can read both old and new formats, though it keeps writing the old format by default.
>    - In a subsequent release, we change the default to the new format
>
> Note that this consideration is separate from the compatibility between clients and brokers, where we ***never*** break compatibility. The oldest available Pulsar client can still talk with the newest Pulsar broker, and vice versa, a new client, will be perfectly fine with an older broker (except the new features won't be working).
>
> ### Releases getting delayed
>
> Another problem we have been experiencing is that release cycles have been stretching considerably.
> Part of this has been because we have been reaching the end of the release window, preparing a candidate, and then taking a long time to flush out all issues found at the last minute in the new release.
>
> We need to ensure that we have a date set in stone to deliver the release to users.
>
> ## Proposal
>
> The proposal to address the above issues is composed of 2 parts.
>
> ### 1. Establish Long Term Support releases
>
> We need to provide a way for users to quickly understand the expected lifecycle timeline of a given release and for that timeline to be long enough not to be a constant update mandate.
>
> At the same time, we need to ensure that we maintainers are not spending all the time just maintaining a huge list of old releases.
>
> For that, we can use the established concept of "Long Term Releases" or LTS.
>
> We will perform LTS releases at a fixed cadence every 18 months, and we will keep doing regular feature releases every 3 months as we're currently doing.
>
> The LTS releases will be identified by being a `.0` version. For example:
> * `3.0`
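
The live upgrade rule described in the quoted PIP (moving between adjacent minor versions is guaranteed, skipping a release is not) can be sketched as a small path planner. This is only an illustration: `UpgradePath`, `isGuaranteed`, and `steps` are hypothetical names, and the sketch assumes versions within one major line whose minor numbers are consecutive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper modelling the "one minor version at a time" live upgrade rule. */
final class UpgradePath {
    /** A direct move is only guaranteed between the same or adjacent minor versions. */
    static boolean isGuaranteed(int fromMinor, int toMinor) {
        return Math.abs(toMinor - fromMinor) <= 1;
    }

    /** Lists the versions to deploy, in order, so that every hop is a guaranteed one. */
    static List<String> steps(int major, int fromMinor, int toMinor) {
        List<String> path = new ArrayList<>();
        int direction = Integer.compare(toMinor, fromMinor); // +1 upgrade, -1 downgrade, 0 no-op
        int minor = fromMinor;
        while (minor != toMinor) {
            minor += direction;
            path.add(major + "." + minor);
        }
        return path;
    }
}
```

For example, `UpgradePath.steps(2, 7, 9)` yields the intermediate hop through 2.8 before 2.9, matching the `2.7 -> 2.8 -> 2.9` path in the PIP text.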
[GitHub] [pulsar] Jason918 added a comment to the discussion: Why pulsar retention quota must exceed configured backlog quota
GitHub user Jason918 added a comment to the discussion: Why pulsar retention quota must exceed configured backlog quota

> Hello @Jason918 , @michaeljmarshall , but I still confused, could you please explain for me why backlog quota couldn't exceed the retention? Is there any risk if we set backlog quota larger than retention?

Messages are deleted once they meet the constraints of the retention policy. So if the backlog quota is larger than the retention, it simply won't take effect.

GitHub link: https://github.com/apache/pulsar/discussions/17697#discussioncomment-4094535

This is an automatically sent email for dev@pulsar.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org