Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread Aloys Zhang
Congratulations! tison

On Thu, Nov 10, 2022 at 15:09, Enrico Olivelli wrote:

> Congratulations!
>
> Enrico


Re: [DISCUSS] Make the behavior of producers that use the mode WaitForExclusive clear

2022-11-09 Thread Enrico Olivelli
Yubiao,

On Thu, Nov 10, 2022 at 07:40, Yubiao Feng wrote:

> Hi all,
>
> Now, if an Exclusive-Mode producer has been registered, we can register
> another WaitForExclusive-Mode producer. When the old producer goes
> offline, the new producer will take over as the new exclusive producer.
>
> < Highlight >But if some Shared-Mode producers have been registered, we can
> also register a WaitForExclusive-Mode producer, but even if all the
> Shared-Mode producers go offline, the new producer will not take over as
> the new exclusive producer.


This looks like a bug to me.
When you start an Exclusive or WaitForExclusive producer, you want to take
control over the topic.
I think that we should kick out all the connected Shared producers.
The competition is among Exclusive producers; Shared producers don't take part.
The fix should be easy.

Enrico


> Then, if we register another WaitForExclusive-Mode
> producer, the earlier WaitForExclusive-Mode producer will take over as the
> new exclusive producer.
>
> < Highlight >I want to make the behavior of producers that use the mode
> WaitForExclusive clear: when no producer of any other mode exists for a
> topic, the first producer in the WaitForExclusive queue can automatically
> succeed as the exclusive producer.
>
> Thanks
> Yubiao
>
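Enrico's alternative, disconnecting every connected Shared producer as soon as an exclusive-type producer arrives, could look roughly like the toy model below. Everything in it (`EvictSharedModel`, its method names, and the choice to reject new Shared producers after the takeover) is a hypothetical, single-threaded illustration, not Pulsar broker code; competition among several exclusive-type producers is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of "kick out Shared producers"; not the broker's implementation. */
public class EvictSharedModel {
    private final Set<String> shared = new LinkedHashSet<>(); // keeps registration order
    private String exclusiveOwner;                            // null while no exclusive owner

    /** Assumption: Shared producers are rejected while the topic is exclusively owned. */
    public boolean registerShared(String name) {
        if (exclusiveOwner != null) {
            return false;
        }
        shared.add(name);
        return true;
    }

    /**
     * An exclusive-type producer takes over at once and disconnects every Shared producer.
     * Returns the names of the evicted producers in registration order.
     */
    public List<String> registerExclusive(String name) {
        if (exclusiveOwner != null) {
            throw new IllegalStateException("queueing behind another exclusive producer is not modelled");
        }
        List<String> evicted = new ArrayList<>(shared);
        shared.clear();
        exclusiveOwner = name;
        return evicted;
    }

    public String exclusiveOwner() {
        return exclusiveOwner;
    }

    public static void main(String[] args) {
        EvictSharedModel topic = new EvictSharedModel();
        topic.registerShared("shared-1");
        topic.registerShared("shared-2");
        System.out.println(topic.registerExclusive("exclusive-1")); // [shared-1, shared-2]
    }
}
```

The trade-off versus the proposal in the original message is that existing Shared producers are cut off immediately instead of the exclusive producer waiting for them to leave.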


Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread Enrico Olivelli
Congratulations!

Enrico

On Thu, Nov 10, 2022 at 08:07, Haiting Jiang wrote:

> Congratulations!
>
>
> BR,
> Haiting


Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread Haiting Jiang
Congratulations!


BR,
Haiting

On Thu, Nov 10, 2022 at 2:14 PM Jun Ma  wrote:
>
> Congratulations!
>


[DISCUSS] Make the behavior of producers that use the mode WaitForExclusive clear

2022-11-09 Thread Yubiao Feng
Hi all,

Now, if an Exclusive-Mode producer has been registered, we can register
another WaitForExclusive-Mode producer. When the old producer goes
offline, the new producer will take over as the new exclusive producer.

< Highlight >But if some Shared-Mode producers have been registered, we can
also register a WaitForExclusive-Mode producer, but even if all the
Shared-Mode producers go offline, the new producer will not take over as
the new exclusive producer. Then, if we register another WaitForExclusive-Mode
producer, the earlier WaitForExclusive-Mode producer will take over as the
new exclusive producer.

< Highlight >I want to make the behavior of producers that use the mode
WaitForExclusive clear: when no producer of any other mode exists for a
topic, the first producer in the WaitForExclusive queue can automatically
succeed as the exclusive producer.

Thanks
Yubiao
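The proposed rule can be sketched as a small single-threaded model: WaitForExclusive producers wait in a FIFO queue, and the head of the queue is promoted whenever the topic has neither an exclusive owner nor any Shared producer. The class, its method names, and the assumption that Shared producers are rejected while an exclusive owner exists are hypothetical; the broker's current behaviour (promotion only when another WaitForExclusive producer registers) is not modelled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy model of the proposed WaitForExclusive rule; not broker code. */
public class WaitForExclusiveModel {
    public enum Mode { SHARED, EXCLUSIVE, WAIT_FOR_EXCLUSIVE }

    private final Set<String> shared = new HashSet<>();
    private String exclusiveOwner = null;                     // current exclusive producer, if any
    private final Deque<String> waiting = new ArrayDeque<>(); // FIFO queue of WaitForExclusive producers

    /** Returns true if the producer became active immediately. */
    public boolean register(String name, Mode mode) {
        switch (mode) {
            case SHARED:
                if (exclusiveOwner != null) return false; // assumed: rejected while exclusively owned
                shared.add(name);
                return true;
            case EXCLUSIVE:
                if (exclusiveOwner != null || !shared.isEmpty()) return false;
                exclusiveOwner = name;
                return true;
            default: // WAIT_FOR_EXCLUSIVE: enqueue, then try to promote
                waiting.addLast(name);
                promoteIfPossible();
                return name.equals(exclusiveOwner);
        }
    }

    /** A producer disconnects; re-check whether the head of the queue can take over. */
    public void close(String name) {
        shared.remove(name);
        waiting.remove(name);
        if (name.equals(exclusiveOwner)) exclusiveOwner = null;
        promoteIfPossible();
    }

    /** Proposed rule: promote the first waiter once no producer of another mode remains. */
    private void promoteIfPossible() {
        if (exclusiveOwner == null && shared.isEmpty() && !waiting.isEmpty()) {
            exclusiveOwner = waiting.pollFirst();
        }
    }

    public String exclusiveOwner() {
        return exclusiveOwner;
    }

    public static void main(String[] args) {
        WaitForExclusiveModel topic = new WaitForExclusiveModel();
        topic.register("shared-1", Mode.SHARED);
        topic.register("waiter-1", Mode.WAIT_FOR_EXCLUSIVE); // queued: a Shared producer is connected
        topic.close("shared-1");                             // the last Shared producer leaves
        System.out.println(topic.exclusiveOwner());          // waiter-1 under the proposed rule
    }
}
```

The key design point is that promotion is re-evaluated on every disconnect, not only when a new WaitForExclusive producer registers.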


[GitHub] [pulsar] github-actions[bot] added a comment to the discussion: Pulsar Audit - Question

2022-11-09 Thread GitBox


GitHub user github-actions[bot] added a comment to the discussion: Pulsar Audit 
- Question

The issue had no activity for 30 days, mark with Stale label.

GitHub link: 
https://github.com/apache/pulsar/discussions/18404#discussioncomment-4104450


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread Jun Ma
Congratulations!


>
> Yu wrote:
>
> > The Project Management Committee (PMC) for Apache Pulsar has invited Zili
> > Chen (https://github.com/tisonkun)
> > to become a committer and we are pleased to announce that he has
> accepted.
> >
> > Being a committer enables easier contribution to the
> > project since there is no need to go via the patch
> > submission process. This should enable better productivity.
> >
> > Welcome and congratulations, Zili Chen!
> >
> > Please join us in congratulating and welcoming Zili Chen onboard!
> >
> > Best Regards,
> > Yu on behalf of the Pulsar PMC
> >
>


Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread ZhangJian He
Congratulations!

Thanks
ZhangJian He

On Thu, 10 Nov 2022 at 14:01, houxiaoyu  wrote:

> Congratulations!
>
> Best,
> Xiaoyu Hou


Re: Request a site ID and tracking code for Apache Pulsar

2022-11-09 Thread tison
Hi Martijn,

Thanks for your help! We have now successfully integrated Matomo into the
Pulsar website.

I'd like to know how to create an account to add segments or further
analysis boards. It seems that an Apache account is not carried over to the
Matomo account system.

Best,
tison.


On Mon, Nov 7, 2022 at 16:49, tison wrote:

>
> Hi Martijn,
>
> Thank you!
> Best,
> tison.
>
>
> On Mon, Nov 7, 2022 at 16:44, Martijn Visser wrote:
>
>> Hi Tison,
>>
>> It most certainly is, but I haven't had the time to reply yet. My
>> apologies!
>>
>> Here's the tracking code for Pulsar. The code needs to be added to any
>> page you want to track, just before the </head> tag. You can find the
>> results at https://analytics.apache.org
>>
>> <!-- Matomo -->
>> <script>
>>   var _paq = window._paq = window._paq || [];
>>   /* tracker methods like "setCustomDimension" should be called before
>>      "trackPageView" */
>>   /* We explicitly disable cookie tracking to avoid privacy issues */
>>   _paq.push(['disableCookies']);
>>   _paq.push(['trackPageView']);
>>   _paq.push(['enableLinkTracking']);
>>   (function() {
>>     var u="https://analytics.apache.org/";
>>     _paq.push(['setTrackerUrl', u+'matomo.php']);
>>     _paq.push(['setSiteId', '32']);
>>     var d=document, g=d.createElement('script'),
>>         s=d.getElementsByTagName('script')[0];
>>     g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
>>   })();
>> </script>
>> <!-- End Matomo Code -->
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, Nov 7, 2022 at 09:17, tison wrote:
>>
>> > Hi,
>> >
>> > Is this the correct place to send this request?
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > On Wed, Nov 2, 2022 at 00:36, Dave Fisher wrote:
>> >
>> > > - privacy.
>> > >
>> > > THANK YOU!
>> > >
>> > > Sent from my iPhone
>> > >
>> > > > On Nov 1, 2022, at 9:21 AM, tison  wrote:
>> > > >
>> > > > Hi Privacy Team,
>> > > >
>> > > > As proposed in https://github.com/apache/pulsar/issues/15664, the Apache
>> > > > Pulsar community is actively migrating from Google Analytics to the
>> > > > Matomo solution.
>> > > >
>> > > > Following https://privacy.apache.org/matomo/, I am sending this email to
>> > > > request a site ID and tracking code for Apache Pulsar. I think this
>> > > > setting is public and the motivation is provided.
>> > > >
>> > > > dev@pulsar.a.o is in cc. Please correct me if more prerequisites are
>> > > > needed.
>> > > >
>> > > > Best,
>> > > > tison.
>> > >
>> > >
>> >
>>
>


Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread houxiaoyu
Congratulations!

Best,
Xiaoyu Hou

On Thu, Nov 10, 2022 at 08:16, Yu wrote:

> The Project Management Committee (PMC) for Apache Pulsar has invited Zili
> Chen (https://github.com/tisonkun)
> to become a committer and we are pleased to announce that he has accepted.
>
> Being a committer enables easier contribution to the
> project since there is no need to go via the patch
> submission process. This should enable better productivity.
>
> Welcome and congratulations, Zili Chen!
>
> Please join us in congratulating and welcoming Zili Chen onboard!
>
> Best Regards,
> Yu on behalf of the Pulsar PMC
>


Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread Dave Fisher
Congratulations to tisonkun / Zili Chen!

Welcome!

Sent from my iPhone

> On Nov 9, 2022, at 4:16 PM, Yu  wrote:
> 
> The Project Management Committee (PMC) for Apache Pulsar has invited Zili
> Chen (https://github.com/tisonkun)
> to become a committer and we are pleased to announce that he has accepted.
> 
> Being a committer enables easier contribution to the
> project since there is no need to go via the patch
> submission process. This should enable better productivity.
> 
> Welcome and congratulations, Zili Chen!
> 
> Please join us in congratulating and welcoming Zili Chen onboard!
> 
> Best Regards,
> Yu on behalf of the Pulsar PMC



Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread lin chen
Congratulations! Zili

> On Nov 10, 2022, at 11:18 AM, Max Xu wrote:
> 
> Congratulations! Zili


[GitHub] [pulsar] BewareMyPower added a comment to the discussion: [QUESTION] Thread safe problem about HandlerState#changeToReadyState

2022-11-09 Thread GitBox


GitHub user BewareMyPower added a comment to the discussion: [QUESTION] Thread 
safe problem about HandlerState#changeToReadyState

Is the following implementation better?

```java
private static boolean notClosed(State state) {
    return state == State.Uninitialized || state == State.Connecting
            || state == State.RegisteringSchema;
}

// moves the state to ready if it wasn't closed
protected boolean changeToReadyState() {
    // getAndUpdate returns the state *before* the update, so the transition
    // succeeded exactly when that previous state was not closed
    State previous = STATE_UPDATER.getAndUpdate(this,
            state -> notClosed(state) ? State.Ready : state);
    return notClosed(previous);
}
```
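`getAndUpdate` returns the value held *before* the update, so a boolean result should be derived from that previous state (for example with `notClosed(previous)`) rather than by comparing it with `State.Ready`, which would only be true if the handler was already ready. A standalone check of that contract, using `AtomicReference` (whose `getAndUpdate` has the same semantics as the `AtomicReferenceFieldUpdater` method) and an assumed subset of the `State` enum:

```java
import java.util.concurrent.atomic.AtomicReference;

/** Demonstrates that getAndUpdate returns the previous value; the State enum is an assumed subset. */
public class GetAndUpdateDemo {
    public enum State { Uninitialized, Connecting, RegisteringSchema, Ready, Closing, Closed }

    public static boolean notClosed(State s) {
        return s == State.Uninitialized || s == State.Connecting || s == State.RegisteringSchema;
    }

    /** Applies the proposed lambda once; returns {previous state, state after the update}. */
    public static State[] transition(State initial) {
        AtomicReference<State> ref = new AtomicReference<>(initial);
        State previous = ref.getAndUpdate(s -> notClosed(s) ? State.Ready : s);
        return new State[] { previous, ref.get() };
    }

    public static void main(String[] args) {
        State[] r = transition(State.Connecting);
        // The transition succeeded, yet "previous == State.Ready" would be false here.
        System.out.println(r[0] + " -> " + r[1]); // Connecting -> Ready
    }
}
```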

GitHub link: 
https://github.com/apache/pulsar/discussions/18401#discussioncomment-4103573





[GitHub] [pulsar] codelipenghui added a comment to the discussion: [QUESTION] Thread safe problem about HandlerState#changeToReadyState

2022-11-09 Thread GitBox


GitHub user codelipenghui added a comment to the discussion: [QUESTION] Thread 
safe problem about HandlerState#changeToReadyState

The issue had no activity for 30 days, mark with Stale label.

GitHub link: 
https://github.com/apache/pulsar/discussions/18401#discussioncomment-4103574





[GitHub] [pulsar] BewareMyPower created a discussion: [QUESTION] Thread safe problem about HandlerState#changeToReadyState

2022-11-09 Thread GitBox


GitHub user BewareMyPower created a discussion: [QUESTION] Thread safe problem 
about HandlerState#changeToReadyState

I'm not sure whether it's a bug; it's more of a question. As we can see in
https://github.com/apache/pulsar/blob/608929227824fe4303f46aa432e42af77bcbf625/pulsar-client/src/main/java/org/apache/pulsar/client/impl/HandlerState.java#L53-L56

`HandlerState#changeToReadyState` is not an atomic operation. I'm not sure
whether a race like the following timeline is possible:

| Time | Event | State Before | State Now |
| :--- | :--- | :--- | :--- |
| 1 | `STATE_UPDATER.compareAndSet(this, State.Uninitialized, State.Ready)` | `State.Connecting` | `State.Connecting` |
| 2 | `setState(State.Uninitialized)` | `State.Connecting` | `State.Uninitialized` |
| 3 | `STATE_UPDATER.compareAndSet(this, State.Connecting, State.Ready)` | `State.Uninitialized` | `State.Uninitialized` |
| 4 | `STATE_UPDATER.compareAndSet(this, State.RegisteringSchema, State.Ready)` | `State.Uninitialized` | `State.Uninitialized` |

As we can see, at one point the state is changed back from `Connecting` to
`Uninitialized`. However, we should expect the state to end up as `Ready`,
because neither `Uninitialized` nor `Connecting` is a closed state.

I see that the calls to `changeToReadyState` in `ProducerImpl` and `ConsumerImpl`
are protected by the lock, directly or indirectly, for example:
https://github.com/apache/pulsar/blob/608929227824fe4303f46aa432e42af77bcbf625/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L738-L739

I'm not sure the lock is sufficient, because it also requires the relevant
`setState` invocations to be protected by the same lock, and I didn't check
that in detail.

However, in `TransactionMetaStoreHandler#connectionOpened`, there's no lock:

https://github.com/apache/pulsar/blob/608929227824fe4303f46aa432e42af77bcbf625/pulsar-client/src/main/java/org/apache/pulsar/client/impl/TransactionMetaStoreHandler.java#L115-L117

I'm not sure whether thread safety can be guaranteed. IMO, if there's no
possibility that the state is changed back to `Connecting` or `Uninitialized`
during `changeToReadyState`, it is thread safe. Or is this race condition
acceptable?
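The timeline in the table can be replayed deterministically on one thread to see the effect. In this sketch `AtomicReference` stands in for the `STATE_UPDATER` field updater, the `State` enum is an assumed subset of the client's, and the "concurrent" `setState` is simply executed between the CAS calls at the point the table places it. The second method applies the same `setState` and then a single atomic `getAndUpdate`, the approach suggested in the follow-up comment in this discussion.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Single-threaded replay of the race timeline; names and the enum subset are assumptions. */
public class ReadyStateRaceReplay {
    public enum State { Uninitialized, Connecting, RegisteringSchema, Ready, Closing, Closed }

    static boolean notClosed(State s) {
        return s == State.Uninitialized || s == State.Connecting || s == State.RegisteringSchema;
    }

    /** Three separate CAS calls, interleaved with another thread's setState as in the table. */
    public static State replayChainedCas() {
        AtomicReference<State> state = new AtomicReference<>(State.Connecting);
        state.compareAndSet(State.Uninitialized, State.Ready);     // time 1: fails, state is Connecting
        state.set(State.Uninitialized);                            // time 2: setState from another thread
        state.compareAndSet(State.Connecting, State.Ready);        // time 3: fails, state is Uninitialized
        state.compareAndSet(State.RegisteringSchema, State.Ready); // time 4: fails
        return state.get(); // never reaches Ready, although no closed state was ever observed
    }

    /** Same setState, followed by one atomic read-modify-write instead of a CAS chain. */
    public static State replaySingleUpdate() {
        AtomicReference<State> state = new AtomicReference<>(State.Connecting);
        state.set(State.Uninitialized);
        state.getAndUpdate(s -> notClosed(s) ? State.Ready : s);
        return state.get();
    }

    public static void main(String[] args) {
        System.out.println(replayChainedCas());   // Uninitialized
        System.out.println(replaySingleUpdate()); // Ready
    }
}
```

With a single read-modify-write there is no window between observing a non-closed state and writing `Ready`. If the other thread's `setState` were ordered after the update instead, the final `Uninitialized` would be a later, explicit change rather than a missed transition.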

GitHub link: https://github.com/apache/pulsar/discussions/18401





Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread Max Xu
Congratulations! Zili

Best,
Max Xu


On Thu, Nov 10, 2022 at 8:16 AM Yu  wrote:

> The Project Management Committee (PMC) for Apache Pulsar has invited Zili
> Chen (https://github.com/tisonkun)
> to become a committer and we are pleased to announce that he has accepted.
>
> Being a committer enables easier contribution to the
> project since there is no need to go via the patch
> submission process. This should enable better productivity.
>
> Welcome and congratulations, Zili Chen!
>
> Please join us in congratulating and welcoming Zili Chen onboard!
>
> Best Regards,
> Yu on behalf of the Pulsar PMC
>


Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread Yunze Xu
I also changed my mind after I saw Flink's MessageIdUtils implementation.

Now it's clear to me that:
- For application users, the APIs in the pulsar-client-api module are
what they should use.
- For Pulsar ecosystem developers, the APIs in the pulsar-client
module are effectively public interfaces as well.

So at the moment, these MessageId implementations could already be used in some
external applications.

In conclusion,
1. The MessageId interface should not be touched
2. The public methods of the MessageId implementations should not be touched

Based on these two key points, I'm going to look into these implementations
and mark some as deprecated, but they should still work as before.

Thanks,
Yunze
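Yunze's point in the quoted exchange, that a string such as "0:1:-1:-1" is hard to read without knowing the fields, can be made concrete with a small decoder. `MessageIdString` is a hypothetical helper, not a Pulsar class, and it assumes the colon-separated order ledger id, entry id, partition index, batch index, which appears to match the Java client's `toString()` output for `MessageIdImpl`/`BatchMessageIdImpl` but should be checked against the client version in use; reading `-1` as "not set" is likewise an assumption. Such a helper is also exactly the kind of coupling to internals that Rajan argues against.

```java
/** Hypothetical decoder for colon-separated message id strings; the field order is assumed. */
public class MessageIdString {
    private static final String[] FIELD_NAMES = {
        "ledger id", "entry id", "partition index", "batch index"
    };

    public static String describe(String id) {
        String[] parts = id.split(":");
        if (parts.length < 2 || parts.length > FIELD_NAMES.length) {
            throw new IllegalArgumentException("unexpected message id: " + id);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            long value = Long.parseLong(parts[i]); // -1 presumably means "not set"
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(FIELD_NAMES[i]).append(" is ").append(value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ledger id is 0, entry id is 1, partition index is -1, batch index is -1
        System.out.println(describe("0:1:-1:-1"));
    }
}
```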

On Thu, Nov 10, 2022 at 3:50 AM Rajan Dhabalia  wrote:
>
> Hi,
>
> I was reading the email thread about why we want to change the MessageId interface:
> https://lists.apache.org/thread/rdkqnkohbmkjjs61hvoqplhhngr0b0sd
> >> Currently we have the following 5 implementations of MessageId:
> >> These implementations are such a mess. For example, when users get a
> >> MessageId from `Producer#send`:
>
> I think the above discussion started by stating that MessageId has multiple
> implementations, e.g. MessageIdImpl, BatchMessageIdImpl, ChunkMessageIdImpl,
> etc., and that as a client we receive a single MessageId from the send API.
>
> Well, I see this as expected and very well-defined behavior. No matter what
> implementation the client library uses internally, as a user of my topic I
> should be able to publish and consume messages by providing specific
> configurations. The moment a user has to know implementation details of a
> client library, such as chunk or batch message internals, a strong
> dependency is created between the application and the server
> implementation, which is a RED flag for any system. Abstraction helps user
> adoption by simplifying the user API and allowing systems to evolve without
> worrying about application dependencies.
> Removing the abstraction of MessageId and exposing implementation
> information through getter APIs will prevent us from making any
> implementation change (e.g. we won't be able to change the chunk messageId
> behavior, which we might need in the future for Shared subscriptions), force
> us to introduce hacks for any enhancement because the existing internal
> implementation can't change (e.g. if we can't change chunk/batch message
> IDs, we might create a new wrapper and attach it to the existing messageId,
> which will make things even worse), and even prevent us from changing the
> backend implementation (e.g. we will be stuck with the BK-Ledger
> implementation forever).
> Pulsar is used for many large-scale business use cases with a large number
> of users, and it can create a nightmare for operators and users if we
> remove abstractions and start exposing system internals to user
> applications.
>
> Thanks,
> Rajan
>
>
> On Tue, Nov 8, 2022 at 6:05 PM Yunze Xu 
> wrote:
>
> > Hi Joe,
> >
> > Then what would we expect users to do with the MessageId? Should it only
> > be passed to Consumer#seek or ReaderBuilder#startMessageId?
> >
> > What about the partition index? We have a `TopicMetadata` interface that
> > returns the number of partitions. If the partition is also an
> > "implementation detail", should we expose this interface? Or should we
> > support customizing a MessageRouter, given that it returns the partition
> > index?
> >
> > What about the batch index and batch size? For example, we have an
> > enableBatchIndexAcknowledgment method to enable batch index ACK. If the
> > batch index is also an "implementation detail", how could users know what
> > "batch index ack" means?
> >
> > Even the ledger id and entry id pair represents a logical storage
> > position, like the offset concept in Kafka (though each offset represents
> > a message while each entry represents a batch). If you look at the Message
> > API, it also exposes many attributes. IMO, for the MessageIdData, only the
> > ack_set (a long array serialized from the BitSet) is an implementation
> > detail.
> >
> > The MessageId API should be flexible, not an abstract one. If not, why
> > do we still implement the toString() method? We should not encourage users
> > to print the MessageId. It would be easy to know what "ledger id is 0,
> > entry id is 1" means; users only need to know the concepts of ledger id
> > and entry id. But it would be harder to know what a tuple like
> > "0:1:-1:-1" means.
> >
> > Thanks,
> > Yunze
> >
> > On Tue, Nov 8, 2022 at 11:16 PM Joe F  wrote:
> > >
> > > >Maybe this design is meant to hide some details, but if
> > > users don't know details like the ledger id and entry id, how could
> > > you know what "0:0:-1:0" means?
> > >
> > >  Abstractions exist for a reason. Ledgerid and entryid are implementation
> > > details, and an application should not be interpreting that at all.
> > > -j
> > >
> > >
> > > On Tue, Nov 8, 2022 at 3:43 AM Yunze Xu 
> > > wrote:
> > >
> > > > I didn't look into these two methods at the 

[GitHub] [pulsar] blaghed added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user blaghed added a comment to the discussion: Pulsar Operator On 
Kubernetes

Hi all,
I'm very much interested in the discussion revolving around the Cluster and
Resource Operators for Pulsar.
So I'm hoping that the reason this ticket went quiet is that you moved the
ongoing conversation over to Slack?
Is there any update from that side that would make sense to add here?

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103122





[GitHub] [pulsar] tisonkun added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user tisonkun added a comment to the discussion: Pulsar Operator On 
Kubernetes

Open-ended discussion. Moved to the Discussions forum.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103123





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

@nicknezis I've invited you to join the channel.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103121





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

@nicknezis Hi,
Sorry, I didn't know there was a Pulsar Slack. I have now created a pulsar
operator channel in the Pulsar Slack, and we will continue the discussion here.

Another team is looking at the same thing, so we can discuss it together
here. We've been talking about the priorities of all the feature points, and
we've been talking about rules.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103120





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

@chrsoo OK

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103117





[GitHub] [pulsar] nicknezis added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user nicknezis added a comment to the discussion: Pulsar Operator On 
Kubernetes

Why not just an #operator channel in the Pulsar Slack? We already hang out in
there, so I'm not sure this effort is big enough to need a whole new Slack.

Having a design spec laid out is a great idea because we are starting to build
something similar. If we have an agreed-upon approach, members of my team
could contribute functionality to flesh out the implementation.

Also, we could perhaps keep separate Operators, as Strimzi did for their Kafka
Operators. This would allow the current big-sky Operator to exist for managing
the cluster, and my team could create another operator for managing the
User, Topic, Tenant, and Namespace type objects.
https://strimzi.io/docs/overview/latest/#overview-components_str

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103119





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

I created a Slack workspace: https://pulsaroperator.slack.com/

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103118





[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user chrsoo added a comment to the discussion: Pulsar Operator On 
Kubernetes

> @sijie yes, exactly. Should we take your comment regarding StreamNative to 
> mean that you are planning to open source an Operator for Pulsar on 
> Kubernetes that deals with the full scope of CRDs for Pulsar management? How
> soon is "soon"?

@sijie any feedback on this?

> @sky-big regarding a specific design I would design it as explained above but 
> I will see if I can produce a more detailed design specification as a PR.

@sky-big sorry for not getting back to you; the Pulsar project took a back seat
for a while where I work. It is being reprioritised now, so I will start looking
into that PR again. Note that I am not talking about code at this point, but I
should at least be able to put down how I think it should work.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103116





[GitHub] [pulsar] sijie added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sijie added a comment to the discussion: Pulsar Operator On 
Kubernetes

I think both Ansible and Go operators can co-exist. I would also recommend
submitting the operators to OperatorHub so that they can be exposed to the
wider community.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103114





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

The next step is to publish to [OperatorHub](https://operatorhub.io/)

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103115





[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user chrsoo added a comment to the discussion: Pulsar Operator On 
Kubernetes

@sijie yes, exactly. Should we take your comment regarding StreamNative to mean 
that you are planning to open source an Operator for Pulsar on Kubernetes that 
deals with the full scope of CRDs for Pulsar management? How soon is "soon"?

@sky-big regarding a specific design I would design it as explained above but I 
will see if I can produce a more detailed design specification as a PR.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103110





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

Successfully added the Pulsar Manager component to the operator.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103112


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] nicknezis added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user nicknezis added a comment to the discussion: Pulsar Operator On 
Kubernetes

This is cool to see. A team I am on was going to start making a Pulsar
Operator using the Ansible Operator SDK.
https://github.com/operator-framework/operator-sdk/blob/master/doc/ansible/user-guide.md

Here is an Ansible Operator example for reference. 
https://github.com/tomaszkiewicz/mysql-operator/tree/335556ee3331ee7deba9049eafd67ad130125174/playbooks

Would the Pulsar team be interested in this work? Any opinions on Ansible vs Go?

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103113





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

@chrsoo OK, looking forward to your PR.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103111





[GitHub] [pulsar] sijie added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sijie added a comment to the discussion: Pulsar Operator On 
Kubernetes

@chrsoo @sky-big @wolfstudy I think what @chrsoo requests is an operator for the
full lifecycle management of all the resources within a Pulsar instance. This
includes both a ClusterOperator and a ResourceOperator.

A ClusterOperator is what @sky-big is doing and what most *operators* are
doing. It is used for setting up a cluster and managing upgrades, downgrades,
and scaling the cluster up and down.

A ResourceOperator is more about using *CRDs* for provisioning *resources*
within a Pulsar cluster. A *resource* can be a role (key and tokens), a
tenant, a namespace, a topic, a function, etc.

Once everything is modeled as CRDs, you can use Kubernetes to provision both
the cluster and its resources. Thus you can apply a GitOps pattern in a CI/CD
pipeline to manage everything.

@chrsoo : this is kind of aligned with what we (StreamNative) are working on. 
We shall be able to share what we have done soon.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103109





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

@chrsoo I generally understand what you mean. Starting with the Tenant CRDs, can
you write up a specific design for everyone to discuss?

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103108





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

@wolfstudy @sijie @jiazhai Can you discuss it together?

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103107





[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user chrsoo added a comment to the discussion: Pulsar Operator On 
Kubernetes

@wolfstudy thanks for the tip (I am aware) but how is 
[pulsar-manager](https://github.com/apache/pulsar-manager) not **imperative**?

A web UI where you point and click is virtually the same thing as issuing
commands on the command line. At each point in time you tell the system what
to do, instead of declaring the desired target state and having it figure out
the changes by itself.

In the end I would like to **declare** the state of my cluster in an external
git repository and then have a CI/CD pipeline provision it for me (GitOps).

The CRD proposed by @sky-big is a really good start and I could work with this
towards my end goal, but I need to declare other parts of the configuration as
well. Point-and-click will not do the job :-)
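The imperative/declarative distinction above can be illustrated with a minimal reconcile sketch: the operator compares the tenants declared in Git with the tenants that currently exist and derives the create/delete steps itself, instead of a person issuing them one by one. `TenantReconciler` and `TenantAction` are hypothetical names, not part of pulsar-manager, pulsar-admin or any existing operator, and pruning undeclared tenants is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical action produced by one reconcile pass (not a real Pulsar or Kubernetes type).
record TenantAction(String verb, String tenant) {}

final class TenantReconciler {
    private TenantReconciler() {}

    // Declarative style: callers state only the desired end state; the
    // reconciler derives the imperative steps that close the gap.
    static List<TenantAction> reconcile(Set<String> desired, Set<String> actual) {
        List<TenantAction> actions = new ArrayList<>();
        // Create every declared tenant that does not exist yet (sorted for stable output).
        for (String tenant : new TreeSet<>(desired)) {
            if (!actual.contains(tenant)) {
                actions.add(new TenantAction("create", tenant));
            }
        }
        // Assumption: tenants that are no longer declared get pruned.
        for (String tenant : new TreeSet<>(actual)) {
            if (!desired.contains(tenant)) {
                actions.add(new TenantAction("delete", tenant));
            }
        }
        return actions;
    }
}
```

Re-running the pass once the actions have been applied returns an empty list; that idempotence is what makes it safe to run from a CI/CD pipeline on every commit.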

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103104





[GitHub] [pulsar] wolfstudy added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user wolfstudy added a comment to the discussion: Pulsar Operator On 
Kubernetes

> I would really like to get out of an imperative approach based on 
> pulsar-admin for managing the cluster.

@chrsoo Maybe [pulsar-manager](https://github.com/apache/pulsar-manager) is
a good fit for you.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103103





[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user chrsoo added a comment to the discussion: Pulsar Operator On 
Kubernetes

The difficulty here is where to draw the line between more or less static 
configuration and what should remain dynamically defined at runtime by the 
clients/users of the system. This might vary in different contexts.

I think it could be useful to define CRDs for the following Pulsar 
configuration items:

1. **Tenants** with namespaces and their relation to Clusters
1. **Persistent Topics** and their relation to tenants/namespaces
1. **Sources** and **Sinks** and their relation to clusters/topics

Starting with Tenants, the following CRDs might be useful:

* **PulsarTenant** - defines a Pulsar Tenant
* **PulsarNamespace** - defines a Pulsar Namespace for a Tenant
* **PulsarNamespaceBinding** - binds a Pulsar Namespace to a PulsarCluster

It would be good if Tenants and Namespaces were defined at the K8s cluster level,
as Tenants and Namespaces could then be reused across multiple PulsarClusters
defined in different K8s namespaces. The PulsarNamespaceBinding should be
defined at the namespace level, in the same K8s namespace as the PulsarCluster
to which it is bound.
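A minimal sketch of how the three proposed resources and their scoping might be modeled, assuming the split described above: PulsarTenant and PulsarNamespace are cluster-scoped (no K8s namespace field), while a PulsarNamespaceBinding lives in the K8s namespace of the PulsarCluster it binds to. The record fields and the `isValidFor` check are hypothetical, not taken from sky-big/pulsar-operator or any published CRD schema; only the "tenant/namespace" naming follows Pulsar's convention.

```java
import java.util.List;

// Cluster-scoped (hypothetical fields): reusable by every PulsarCluster in the K8s cluster.
record PulsarTenant(String name, List<String> adminRoles) {}

// Cluster-scoped: a Pulsar namespace belonging to a tenant.
record PulsarNamespace(String tenant, String name) {
    // Pulsar addresses namespaces as "tenant/namespace".
    String fullName() {
        return tenant + "/" + name;
    }
}

// Namespace-scoped: a Pulsar cluster deployed into one K8s namespace.
record PulsarCluster(String k8sNamespace, String name) {}

// Namespace-scoped: binds a PulsarNamespace to one PulsarCluster.
record PulsarNamespaceBinding(String k8sNamespace, String pulsarNamespace, String clusterName) {
    // The binding must sit in the same K8s namespace as the cluster it refers to.
    boolean isValidFor(PulsarCluster cluster) {
        return k8sNamespace.equals(cluster.k8sNamespace())
                && clusterName.equals(cluster.name());
    }
}
```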

@sky-big - does all this make sense to you?

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103106





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

@chrsoo Could you please give a specific example based on your scenario? Then we
can add this feature to the Pulsar operator.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103105





[GitHub] [pulsar] chrsoo added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user chrsoo added a comment to the discussion: Pulsar Operator On 
Kubernetes

Great initiative! I know you have to start somewhere and cannot do everything
at once, but have you thought about CRDs for Sources and Sinks? What about
configuration like Channels, Tenants and Namespaces?

I would really like to get out of an imperative approach based on pulsar-admin 
for managing the cluster. Still trying to figure out how to get the whole 
configuration for a geo-replicated cluster in Git... 

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103102





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

The next step concerns Pulsar's stateful components (including BookKeeper and
ZooKeeper): using the operator to control these complex stateful components.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103101





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

The basic functionality of the Pulsar operator is implemented, including the
Pulsar dashboard, Grafana, Prometheus and so on: [Pulsar
Operator](https://github.com/sky-big/pulsar-operator). You are welcome to use
it and contribute.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103100





[GitHub] [pulsar] sky-big added a comment to the discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big added a comment to the discussion: Pulsar Operator On 
Kubernetes

Everyone is welcome to contribute.

GitHub link: 
https://github.com/apache/pulsar/discussions/18398#discussioncomment-4103099





[GitHub] [pulsar] sky-big created a discussion: Pulsar Operator On Kubernetes

2022-11-09 Thread GitBox


GitHub user sky-big created a discussion: Pulsar Operator On Kubernetes

Add a Pulsar operator for Kubernetes (https://github.com/sky-big/pulsar-operator)
based on deployment/kubernetes/*.yaml.


GitHub link: https://github.com/apache/pulsar/discussions/18398





[ANNOUNCE] New Committer: Zili Chen

2022-11-09 Thread Yu
The Project Management Committee (PMC) for Apache Pulsar has invited Zili
Chen (https://github.com/tisonkun)
to become a committer and we are pleased to announce that he has accepted.

Being a committer enables easier contribution to the
project since there is no need to go via the patch
submission process. This should enable better productivity.

Welcome and congratulations, Zili Chen!

Please join us in congratulating and welcoming Zili Chen onboard!

Best Regards,
Yu on behalf of the Pulsar PMC


Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread Rajan Dhabalia
Hi,

I was reading the email thread about why we want to change the MessageId
interface: https://lists.apache.org/thread/rdkqnkohbmkjjs61hvoqplhhngr0b0sd
>> Currently we have the following 5 implementations of MessageId:
>> These implementations are such a mess. For example, when users get a
>> MessageId from `Producer#send`:

I think the above discussion started by stating that MessageId has multiple
implementations, e.g. MessageIdImpl, BatchMessageIdImpl, ChunkMessageIdImpl,
etc., and that as a client we receive a single MessageId from the send API.

Well, I see that as expected and very well-defined behavior. No matter which
implementation the client library uses internally, as a user of my topic I
should be able to publish and consume messages just by providing specific
configurations. The moment a user has to know implementation details of a
client library, such as chunk or batch message internals, a strong
dependency is created between the application and the server implementation,
which is a RED flag for any system. Abstraction helps user adoption by
simplifying the user API and allowing the system to evolve without worrying
about application dependencies.
Removing the abstraction of MessageId and exposing implementation information
through getter APIs will prevent us from making any implementation change
(e.g. we won't be able to change the chunk messageId behavior, which we might
need in the future for Shared subscriptions), force hacks for any enhancement
because the existing internal implementation can't change (e.g. if we can't
change chunk/batch message IDs, we might create a new wrapper and attach it to
the existing messageId, which will make things even worse), and even prevent
us from changing the backend implementation (e.g. we will be stuck with the
BK ledger implementation forever).
Pulsar is used for many large-scale business use cases with a large number
of users, and it can create a nightmare for operators and users if we remove
abstractions and start following the practice of exposing system internals
to user applications.
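To make the abstraction argument concrete, here is a minimal sketch in which application code depends only on an opaque, comparable ID, so two different ID layouts can sit behind it without any change to the application. `OpaqueId`, `LedgerEntryId`, `OffsetId` and `AppLogic.latest` are hypothetical names for illustration; they are not the Pulsar client API.

```java
import java.util.List;

// All the application may rely on: IDs can be ordered.
interface OpaqueId extends Comparable<OpaqueId> {}

// Hypothetical layout mirroring today's (ledgerId, entryId) pair.
record LedgerEntryId(long ledgerId, long entryId) implements OpaqueId {
    @Override
    public int compareTo(OpaqueId o) {
        // IDs from different backends are not meant to be mixed.
        LedgerEntryId other = (LedgerEntryId) o;
        int c = Long.compare(ledgerId, other.ledgerId);
        return c != 0 ? c : Long.compare(entryId, other.entryId);
    }
}

// Hypothetical alternative backend that addresses messages by a single offset.
record OffsetId(long offset) implements OpaqueId {
    @Override
    public int compareTo(OpaqueId o) {
        return Long.compare(offset, ((OffsetId) o).offset);
    }
}

final class AppLogic {
    private AppLogic() {}

    // Written against the interface only, so it works with either layout.
    static OpaqueId latest(List<? extends OpaqueId> ids) {
        OpaqueId max = ids.get(0);
        for (OpaqueId id : ids) {
            if (id.compareTo(max) > 0) {
                max = id;
            }
        }
        return max;
    }
}
```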

Thanks,
Rajan


On Tue, Nov 8, 2022 at 6:05 PM Yunze Xu wrote:

> Hi Joe,
>
> Then what would we expect users to do with the MessageId? It should only
> be passed to Consumer#seek or ReaderBuilder#startMessageId?
>
> What about the partition index? We have a `TopicMetadata` interface that
> returns the number of partitions. If the partition is also an
> "implementation detail", should we expose this interface? Or should we
> support customizing a MessageRouter, given that it returns the partition
> index?
>
> What about the batch index and batch size? For example, we have an
> enableBatchIndexAcknowledgment method to enable batch index ACK. If the
> batch index is also an "implementation detail", how could users know what
> "batch index ack" means?
>
> Even for ledger id and entry id, this pair represents a logical storage
> position like the offset concept in Kafka (though each offset represents a
> message while each entry represents a batch). If you look at the Message
> API, it also exposes many attributes. IMO, for the MessageIdData, only the
> ack_set (a long array serialized from the BitSet) is an implementation
> detail.
>
> The MessageId API should be flexible, not an abstract one. If not, why
> do we still implement the toString() method? We should not encourage users
> to print the MessageId. It would be easy to know what "ledger id is 0, entry
> id is 1" means; users only need to know the concepts of ledger id and entry
> id. But it would be harder to know what a tuple like "0:1:-1:-1" means.
>
> Thanks,
> Yunze
>
> On Tue, Nov 8, 2022 at 11:16 PM Joe F wrote:
> >
> > >Maybe this design is meant to hide some details, but if
> > users don't know the details like ledger id and entry id, how could
> > you know what "0:0:-1:0" means?
> >
> > Abstractions exist for a reason. Ledgerid and entryid are implementation
> > details, and an application should not be interpreting that at all.
> > -j
> >
> >
> > On Tue, Nov 8, 2022 at 3:43 AM Yunze Xu wrote:
> >
> > > I haven't looked into these two methods yet, but I think it's
> > > possible to retain only `fromByteArray`.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Tue, Nov 8, 2022 at 7:02 PM Enrico Olivelli wrote:
> > > >
> > > > On Tue, Nov 8, 2022 at 11:52 AM Yunze Xu wrote:
> > > > >
> > > > > Hi Enrico,
> > > > >
> > > > > > We also need a way to represent this as a String or a byte[]
> > > > >
> > > > > We already have the `toByteArray` method, right?
> > > >
> > > > Yes, correct. So we are fine. I forgot about it and answered too quickly.
> > > >
> > > > I am not sure if this can be in the scope of this initiative, but we
> > > > should somehow get rid of stuff like "fromByteArrayWithTopic" vs
> > > > "fromByteArray".
> > > >
> > > > Thanks
> > > > Enrico
> > > >
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > On Tue, Nov 8, 2022 at 6:43 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > > > 

Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread tison
FWIW, the Flink Pulsar connector hackily parses the message id internals to
get the next message id:
https://github.com/apache/flink/blob/421f057a7488fd64854a82424755f76b89561a0b/flink-connectors/flink-connector-pulsar/src/main/java/org/apache/flink/connector/pulsar/source/enumerator/cursor/MessageIdUtils.java
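As a minimal sketch of the alternative to parsing ID internals, the "next ID" rule can live inside the ID type so that a connector only calls `next()`. `PositionId` is hypothetical, and its (ledger, entry + 1) rule is an illustrative assumption that ignores ledger roll-over and batch indexes, which is precisely the knowledge an application should not have to reproduce.

```java
import java.util.Objects;

// Hypothetical opaque position: fields are private, so callers cannot parse them.
final class PositionId implements Comparable<PositionId> {
    private final long ledgerId;
    private final long entryId;

    private PositionId(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }

    // In a real client only the library would create IDs; exposed here for the demo.
    static PositionId of(long ledgerId, long entryId) {
        return new PositionId(ledgerId, entryId);
    }

    // The library owns the "next position" rule (assumed: next entry, same ledger).
    PositionId next() {
        return new PositionId(ledgerId, entryId + 1);
    }

    @Override
    public int compareTo(PositionId o) {
        int c = Long.compare(ledgerId, o.ledgerId);
        return c != 0 ? c : Long.compare(entryId, o.entryId);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PositionId p && ledgerId == p.ledgerId && entryId == p.entryId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(ledgerId, entryId);
    }
}
```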

On Thu, Nov 10, 2022 at 01:03, Enrico Olivelli wrote:
> [...]

Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread Enrico Olivelli
After reading Joe's comments I have changed my mind.
Actually it is better to not expose "ledgerId" and "entryId" to client
applications.
They are useless pieces of information.
Also, if in the future we want to change the way we internally
address a message, we will still have to support these fields.

toByteArray() is enough for an application to save the ID into an
external database and then to recover a Subscription (or a Reader)
from a certain point.
toString() is good only for debugging/logs; we could change it, but it is
better not to touch it and to add "tests".

It is better that the MessageId API stays as opaque as possible.

Enrico
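A minimal sketch of the checkpoint pattern described above, assuming the application only ever handles serialized bytes. In the Java client the bytes would come from `MessageId.toByteArray()` and be turned back into an ID with `MessageId.fromByteArray(byte[])` before being passed to `ReaderBuilder#startMessageId`; here a hypothetical `CheckpointStore` backed by an in-memory map stands in for the external database, so the example runs without a broker.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for an external database table: subscription -> checkpoint.
final class CheckpointStore {
    private final Map<String, String> table = new HashMap<>();

    // Store the opaque bytes as text; the application never inspects them.
    void save(String subscription, byte[] messageIdBytes) {
        table.put(subscription, Base64.getEncoder().encodeToString(messageIdBytes));
    }

    // Return the saved bytes unchanged, ready to hand back to the client library.
    Optional<byte[]> load(String subscription) {
        return Optional.ofNullable(table.get(subscription))
                .map(encoded -> Base64.getDecoder().decode(encoded));
    }
}
```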


On Wed, Nov 9, 2022 at 15:50, Yunze Xu wrote:
> [...]

Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread Yunze Xu
Hi Jiaqi,

> I don't think `tostring` should be used in any serious case because it has
> no standard.

I agree. But it's better to keep it unchanged. As in my previous reply, it
might be a de-facto standard, because `toString()`-like methods are used
in logging, not only for debugging. For example, there is a getLastMessageId
API in the consumer, and users might log the last message ID.

Different representations across different languages are not a big issue,
but they could confuse users and administrators.

Here is an example: the C++ client supports constructing a MessageId with
4 arguments, but the 1st argument is the partition, not the ledger id.
However, the string representation is still
":::". Meanwhile, in the
Java client a non-batched message ID doesn't have the
":" suffix.

Thanks,
Yunze

On Wed, Nov 9, 2022 at 9:13 PM Jiaqi Shen wrote:
>
> Thanks, this is very inspiring to me.
>
> But I have a different opinion on `tostring`.
>
> >>You can only see a representation from `toString` method and got some
> >> output like "0:0:-1:0".
>
> I don't think `tostring` should be used in any serious case because it has
> no standard. There is no constraint on how the messageId should be
> converted to a string. For example, the Go client does not support
> `tostring` now. If the Go client implements a `tostring` method, must its
> `tostring` follow the Java implementation, like "0:0:-1:0"?
>
> If users do need a string/[]byte to record a messageId, `toByteArray` will
> be enough. On the user side, most of the time, I think users don't really
> care whether the "messageId string" is meaningful. I think `tostring` should
> only be used for debugging.
>
> Thanks,
> Jiaqi Shen
>
>
> On Wed, Nov 9, 2022 at 20:25, Joe F wrote:
>
> > Messageid is an identifier which identifies a message. How that id is
> > constructed, or what it contains, should not matter to an application,
> > and an application should not assume anything about the implementation
> > of that id.
> >
> > >What about the partition index? We have a `TopicMetadata` interface that
> > > returns the number of partitions.
> >
> > Partitioning is a first-class concept, and is designed to be used by
> > applications. How a partition is implemented should not be used by the
> > application.
> >
> > [People violate this all the time, and I regret that Pulsar did not
> > provide get_Nth_topicpartion(), which led to people hardcoding it as
> > topicname-N and using that directly. Now we are stuck with it.]
> >
> > Similarly, batch index and batch size: those are all logical concepts
> > exposed to the user. For example, batch size is something the app is
> > allowed to tune.
> >
> > >Even for ledger id and entry id, this pair represents a logic storage
> > > position like the offset concept in Kafka
> > These are not equivalent. In Pulsar these are implementation details,
> > while in Kafka those are logical concepts.
> >
> > One might think that these are logical concepts in Pulsar, because if you
> > reverse engineer the current msgid implementation, you observe some
> > "properties".
> >
> > Ledger id/entry id are logical concepts in __BookKeeper__, not in Pulsar.
> > There is the Managed Ledger abstraction on top of BK, and then there is
> > Pulsar on top of ML. You will break two levels of abstraction to expose
> > ledger/entryid to an application.
> >
> > An application should only care about the operations that can be done
> > with a messageId:
> >
> > - getmsgid() to return the message id as an opaque object
> >
> > [Operators using one messageId]
> > - serde, like tostring(), for storage/retrieval of the message identifier
> > - getter/setter on logical properties of the message (partition id, etc.)
> > - increment/decrement
> >
> > [Operators that take multiple messageIds]
> > - comparator
> > - range
> >
> > Those are the kinds of operators Pulsar should provide to a user.
> > Applications should not implement these operators on their own by reverse
> > engineering the msgId. No application should be directly using ledgerid or
> > entryid for doing anything (math or logic).
> >
> > As long as Pulsar provides these operations on msgid to the
> > application, the application should not care whether it's represented as
> > "0:1:-1:-1" or "a:b:-b-b", or "#xba4231!haxcy1826923f", or as a
> > serialized binary object, or whatever it may be.
> >
> > >>But it would be harder to know a tuple like "0:1:-1:-1" means.
> >
> > A user shouldn't have to know what this means. That's the point.
> >
> > Pulsar itself changed the messageId multiple times as it added
> > partitioning, batching and so on, and it might do so again. And BookKeeper
> > could change its representation of ledgers (e.g. to UUIDs and byte
> > offsets). ML could replace BK with something else (e.g. a table in a
> > db). Anything is possible; Pulsar would then just have to change the
> > implementation of the operator functions, and no application would need
> > to be rewritten.
> >
> > -j
> >
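Joe's operator list quoted above can be sketched as a small type whose representation stays private while the library supplies serde, increment/decrement, comparison and range checks. `SeqId`, `IdRange`, the single-long layout and the half-open range semantics are assumptions made for illustration, not a proposed Pulsar interface.

```java
import java.nio.ByteBuffer;

// Hypothetical opaque ID: one private field, every operation provided by the type.
final class SeqId implements Comparable<SeqId> {
    private final long seq;

    private SeqId(long seq) {
        this.seq = seq;
    }

    // Library-internal factory in practice; public here for the demo.
    static SeqId of(long seq) {
        return new SeqId(seq);
    }

    // serde: round-trips through bytes without exposing the layout.
    byte[] toBytes() {
        return ByteBuffer.allocate(Long.BYTES).putLong(seq).array();
    }

    static SeqId fromBytes(byte[] bytes) {
        return new SeqId(ByteBuffer.wrap(bytes).getLong());
    }

    // increment/decrement
    SeqId next() {
        return new SeqId(seq + 1);
    }

    SeqId prev() {
        return new SeqId(seq - 1);
    }

    // comparator
    @Override
    public int compareTo(SeqId o) {
        return Long.compare(seq, o.seq);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SeqId s && s.seq == seq;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(seq);
    }
}

// range: half-open interval [from, to) over opaque IDs.
record IdRange(SeqId from, SeqId to) {
    boolean contains(SeqId id) {
        return id.compareTo(from) >= 0 && id.compareTo(to) < 0;
    }
}
```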

Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread Yunze Xu
Hi Joe,

I think the most controversial point is what a MessageId should be used for.
In your opinion, it should only be used as a comparable object (opaque),
which represents the position of a message [1]. What I had in mind is that
MessageId should be a wrapper of the MessageIdData in PulsarApi.proto [2].

I agree that on the application side, there are not many cases that need to
look into the details of a specific MessageId implementation. However, these
"internal fields" are the de-facto concepts in Pulsar. Yeah, ledger id and
entry id are concepts from BK and they might change. But I doubt there is
any Pulsar application developer who doesn't know these two concepts.

Let's look at the clients of other languages, including C++ [3], Golang [4],
Python [5], Rust [6][7] and C# [8]. Only the official Java client doesn't
expose these fields.

I'm also okay with not changing the MessageId interface in the
pulsar-client-api module, because the motivation is to simplify the messy
implementations of MessageId in the pulsar-client module.

We can add a MessageIdPojo class in the pulsar-common module or elsewhere.
In the Pulsar repo, all MessageId implementations must extend the
MessageIdPojo class.

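```java
import lombok.Data;
import org.apache.pulsar.client.api.MessageId;

// MessageId is an interface in pulsar-client-api, so the POJO implements it.
// Abstract: concrete subclasses supply toByteArray() and compareTo().
@Data
abstract class MessageIdPojo implements MessageId {
    private final long ledgerId;
    // ... remaining MessageIdData fields
}
```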

[1] https://pulsar.apache.org/docs/concepts-messaging#messages
[2] https://github.com/apache/pulsar/blob/b31c5a6a325728b5dc5faebd1a33386952d733d5/pulsar-common/src/main/proto/PulsarApi.proto#L57
[3] https://github.com/apache/pulsar-client-cpp/blob/main/include/pulsar/MessageId.h
[4] https://github.com/apache/pulsar-client-go/blob/d9c8b0ab9c14e8d571b632c93002ea20db1a2c16/pulsar/message.go#L147
[5] https://github.com/apache/pulsar-client-python/blob/75a57b427d4c6944c49f4b712344107b5444aa36/pulsar/__init__.py#L84
[6] https://github.com/streamnative/pulsar-rs/blob/de59974080daa248bfdeaea8510eb72ec8f30bac/src/consumer.rs#L1984
[7] https://github.com/streamnative/pulsar-rs/blob/de59974080daa248bfdeaea8510eb72ec8f30bac/src/consumer.rs#L1353
[8] https://github.com/apache/pulsar-dotpulsar/blob/0590b1ad6c4474d425662352ba62abb41bfb9f0a/src/DotPulsar/MessageId.cs#L56

Thanks,
Yunze

On Wed, Nov 9, 2022 at 8:24 PM Joe F wrote:
> [...]

Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread Jiaqi Shen
Thanks, this is very inspiring to me.

But I have a different opinion on `toString`.

>>You can only see a representation from `toString` method and got some
>>output like "0:0:-1:0".

I don't think `toString` should be used in any serious case because it has
no standard: there is no constraint on how a MessageId should be
converted to a string. For example, the Go client does not support
`toString` now. If the Go client were to implement a `toString` method, would
it have to follow the Java implementation, like "0:0:-1:0"?

If users do need a string/[]byte to record a MessageId, `toByteArray` is
enough. On the user side, I think most of the time users don't really care
whether the "MessageId string" is meaningful. I think `toString` should only
be used for debugging.
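A minimal sketch of that approach, assuming a hypothetical `PositionId` type with an illustrative fixed byte layout (Pulsar's real `MessageId#toByteArray` serializes the protobuf `MessageIdData` instead): the application stores and restores opaque bytes, and `toString` is kept for log output only.

```java
import java.nio.ByteBuffer;

/** Hypothetical message id; the byte layout is an assumption, not Pulsar's format. */
record PositionId(long ledgerId, long entryId, int partition, int batchIndex) {

    /** Opaque binary form that an application can store and later restore. */
    byte[] toByteArray() {
        return ByteBuffer.allocate(Long.BYTES * 2 + Integer.BYTES * 2)
                .putLong(ledgerId)
                .putLong(entryId)
                .putInt(partition)
                .putInt(batchIndex)
                .array();
    }

    /** Inverse of toByteArray(); callers never look inside the bytes. */
    static PositionId fromByteArray(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        // Java evaluates constructor arguments left to right, matching the write order.
        return new PositionId(buf.getLong(), buf.getLong(), buf.getInt(), buf.getInt());
    }

    /** Human-readable form for debugging only; its format is not a contract. */
    @Override
    public String toString() {
        return ledgerId + ":" + entryId + ":" + partition + ":" + batchIndex;
    }
}
```

An application would persist `toByteArray()` and call `fromByteArray` on restart, so a change to the debug string could never break it.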

Thanks,
Jiaqi Shen


On Wed, Nov 9, 2022 at 20:25, Joe F wrote:

> [...]

Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread Joe F
A MessageId is an identifier that identifies a message. How that ID is
constructed, or what it contains, should not matter to an application, and
an application should not assume anything about the implementation of that
ID.

>What about the partition index? We have a `TopicMetadata` interface that
>returns the number of partitions.

Partitioning is a first-class concept and is designed to be used by
applications. How a partition is implemented should not be relied on by the
application.

[People violate this all the time, and I regret that Pulsar did not
provide get_Nth_topicpartition(), which led to people hardcoding it as
topicname-partition-N and using that directly. Now we are stuck with it.]
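A minimal sketch of the kind of accessor described above, assuming Pulsar's current `<topic>-partition-<N>` naming for partitions of a partitioned topic. `PartitionNames`, `nth`, and `indexOf` are hypothetical names, not an existing client API; the point is that if applications only call the accessor, a future naming change touches one class instead of every caller.

```java
/** Hypothetical helper that keeps partition-naming knowledge in one place. */
final class PartitionNames {
    // Suffix of the naming convention assumed by this sketch.
    private static final String SUFFIX = "-partition-";

    private PartitionNames() {}

    /** Name of the n-th partition of a partitioned topic. */
    static String nth(String topic, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("partition index must be >= 0: " + n);
        }
        return topic + SUFFIX + n;
    }

    /** Partition index encoded in a name, or -1 if the name is not a partition. */
    static int indexOf(String name) {
        int pos = name.lastIndexOf(SUFFIX);
        if (pos < 0) {
            return -1;
        }
        try {
            int idx = Integer.parseInt(name.substring(pos + SUFFIX.length()));
            return idx >= 0 ? idx : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```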

Similarly, batch index and batch size are logical concepts exposed to the
user. For example, batch size is something the app is allowed to tune.

>Even for ledger id and entry id, this pair represents a logic storage
>position like the offset concept in Kafka

These are not equivalent. In Pulsar these are implementation details,
while in Kafka those are logical concepts.

One might think that these are logical concepts in Pulsar, because if you
reverse engineer the current msgid implementation, you observe some
"properties".

Ledger ID/entry ID are logical concepts in __BookKeeper__, not in Pulsar.
There is the Managed Ledger (ML) abstraction on top of BookKeeper (BK), and
then Pulsar on top of ML. Exposing ledger/entry IDs to an application breaks
two levels of abstraction.

An application should only care about the operations that can be done
with a MessageId:

- getMessageId() to return the message ID as an opaque object

[Operators using one MessageId]
- serde, like toString(), for storage/retrieval of the message identifier
- getters/setters on logical properties of the message (partition ID, etc.)
- increment/decrement

[Operators that take multiple MessageIds]
- comparator
- range

Those are the kinds of operators Pulsar should provide to users.
Applications should not implement these operators on their own by
reverse-engineering the MessageId. No application should directly use the
ledger ID or entry ID for anything (math or logic).
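The operations listed above can be sketched as a small Java interface. Everything here is hypothetical (`OpaqueId`, `LedgerEntryId`, and the byte layout are illustrative names, not Pulsar API); the point is that callers only use serde, a getter, increment/decrement, comparison, and a range check, while the ledger/entry representation stays behind the interface.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.OptionalInt;

/** Hypothetical opaque message id exposing only application-level operations. */
interface OpaqueId extends Comparable<OpaqueId> {
    /** Serde: an opaque binary form for storage and retrieval. */
    byte[] toBytes();

    /** Getter for a logical property; empty for non-partitioned topics. */
    OptionalInt partition();

    /** Increment and decrement. */
    OpaqueId next();

    OpaqueId previous();

    /** Range operator over two ids: true if from <= this < to. */
    default boolean inRange(OpaqueId from, OpaqueId to) {
        return compareTo(from) >= 0 && compareTo(to) < 0;
    }

    /** Serde in the other direction; callers never see the layout. */
    static OpaqueId fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new LedgerEntryId(buf.getLong(), buf.getLong(), buf.getInt());
    }
}

/** One possible hidden representation (hypothetical); swappable without touching callers. */
record LedgerEntryId(long ledger, long entry, int partitionIndex) implements OpaqueId {
    private static final Comparator<LedgerEntryId> ORDER =
            Comparator.comparingLong(LedgerEntryId::ledger)
                    .thenComparingLong(LedgerEntryId::entry);

    @Override
    public byte[] toBytes() {
        return ByteBuffer.allocate(Long.BYTES * 2 + Integer.BYTES)
                .putLong(ledger).putLong(entry).putInt(partitionIndex).array();
    }

    @Override
    public OptionalInt partition() {
        return partitionIndex < 0 ? OptionalInt.empty() : OptionalInt.of(partitionIndex);
    }

    // Naive: steps the entry id only, ignoring ledger boundaries.
    @Override
    public OpaqueId next() {
        return new LedgerEntryId(ledger, entry + 1, partitionIndex);
    }

    @Override
    public OpaqueId previous() {
        return new LedgerEntryId(ledger, entry - 1, partitionIndex);
    }

    // Assumes both ids use this representation and belong to the same partition.
    @Override
    public int compareTo(OpaqueId other) {
        return ORDER.compare(this, (LedgerEntryId) other);
    }
}
```

What increment should mean across ledger boundaries is exactly the kind of detail such an interface would keep out of application code; the naive version here only steps the entry id.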

As long as Pulsar provides these MessageId operations to the application,
the application should not care whether the ID is represented as
"0:1:-1:-1", "a:b:-b-b", "#xba4231!haxcy1826923f", a serialized binary
object, or whatever else it may be.

>>But it would be harder to know a tuple like "0:1:-1:-1" means.

A user shouldn't have to know what this means. That's the point.

Pulsar itself changed the MessageId multiple times as it added
partitioning, batching, and so on, and it might do so again. BookKeeper
could change its representation of ledgers (for example, to UUIDs and byte
offsets), and ML could replace BK with something else (for example, a table
in a database). Anything is possible; Pulsar would then just have to change
the implementation of the operator functions, and no application would need
to be rewritten.

-j

On Tue, Nov 8, 2022 at 6:05 PM Yunze Xu wrote:

> Hi Joe,
>
> Then what would we expect users to do with the MessageId? It should only
> be passed to Consumer#seek or ReaderBuilder#startMessageId?
>
> What about the partition index? We have a `TopicMetadata` interface that
> returns the number of partitions. If the partition is also an
> "implementation detail", should we expose this interface? Or should we
> support customizing a MessageRouter, since it returns the partition index?
>
> What about the batch index and batch size? For example, we have an
> enableBatchIndexAcknowledgment method to enable batch index ACK. If the batch
> index is also an "implementation detail", how could users know what "batch
> index ACK" means?
>
> Even for ledger id and entry id, this pair represents a logical storage
> position like the offset concept in Kafka (though each offset represents a
> message while each entry represents a batch). If you look at the Message
> API, it also exposes many attributes. IMO, for the MessageIdData, only the
> ack_set (a long array serialized from the BitSet) is an implementation
> detail.
>
> The MessageId API should be flexible, not an abstract one. If not, why do
> we still implement the toString() method? We should not encourage users to
> print the MessageId. It would be easy to understand what "ledger ID is 0,
> entry ID is 1" means; users only need to know the concepts of ledger ID and
> entry ID. But it is harder to understand what a tuple like "0:1:-1:-1"
> means.
>
> Thanks,
> Yunze
>
> On Tue, Nov 8, 2022 at 11:16 PM Joe F  wrote:
> >
> > >Maybe this design is to hidden some details, but if
> > >users don't know the details like ledger id and entry id, how could
> > >you know what does "0:0:-1:0" mean?
> >
> >  Abstractions exist for a reason. Ledgerid and entryid are implementation
> > details, and an application should not be interpreting that at all.
> > -j
> >
> >
> > On Tue, Nov 8, 2022 at 3:43 AM Yunze Xu 
> > wrote:
> >
> > > I didn't look into 

Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread Yunze Xu
Hi Haiting,

> But please make sure we have to make it compatible with previous
> implementations, like the `toString` method

Yeah, I agree, I will keep it compatible.

BTW, while working on this, I found that the MessageId implementations
are more complicated than I thought. The MessageIdImpl class must be a
POJO; otherwise, it cannot be passed into the admin API as the Entity. But I
still insist on returning an `Optional<T>` instead of a plain `T` with a
default value explained in the API docs.
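A minimal sketch of the two API styles, using a hypothetical `BatchInfo` class (not a Pulsar type) and the `-1` default that `MessageIdData` declares for `batch_index`:

```java
import java.util.Optional;

/** Hypothetical holder contrasting two accessor styles; not a Pulsar type. */
final class BatchInfo {
    /** Sentinel mirroring the [default = -1] of batch_index in MessageIdData. */
    static final int NOT_BATCHED = -1;

    private final int batchIndex;

    BatchInfo(int batchIndex) {
        this.batchIndex = batchIndex;
    }

    /** Sentinel style: callers must read the docs to learn that -1 means "not batched". */
    int rawBatchIndex() {
        return batchIndex;
    }

    /** Optional style: the possible absence is visible in the signature. */
    Optional<Integer> batchIndex() {
        return batchIndex == NOT_BATCHED ? Optional.empty() : Optional.of(batchIndex);
    }
}
```

`OptionalInt` would avoid boxing; either way, the absence becomes part of the method signature instead of a documented magic value.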


Thanks,
Yunze

On Wed, Nov 9, 2022 at 3:02 PM Haiting Jiang  wrote:
>
> Overall, this makes sense to me.
> The current status of MessageId is a bit messy, especially for client
> developers and senior users who are interested in the implementation
> details.
> But please make sure we have to make it compatible with previous
> implementations, like the `toString` method. I bet someone has already
> parsed it to get the ledgerId and entryId.
>
>
> Thanks,
> Haiting
>
> On Tue, Nov 8, 2022 at 6:25 PM Yunze Xu  wrote:
> >
> > Hi all,
> >
> > Currently we have the following 5 implementations of MessageId:
> >
> > - MessageIdImpl: (ledger id, entry id, partition index)
> >   - BatchMessageIdImpl: adds (batch index, batch size, acker), where
> > acker is a wrapper of a BitSet.
> >   - ChunkMessageIdImpl: adds another MessageIdImpl that represents
> > the first chunk of a chunked message.
> >   - MultiMessageIdImpl: adds a map that maps the topic name to the
> > MessageId.
> > - TopicMessageIdImpl: adds the topic name and the partition name
> >
> > These implementations are such a mess. For example, when users get a
> > MessageId from `Producer#send`:
> >
> > ```java
> > var id = producer.send("msg");
> > ```
> >
> > There are no getters for specific fields like the ledger ID. You can
> > only see a representation from the `toString` method, which gives output
> > like "0:0:-1:0". Maybe this design is meant to hide some details, but if
> > users don't know details like the ledger ID and entry ID, how could
> > they know what "0:0:-1:0" means? What if `MessageId#toString`'s
> > implementation changed? Should that be treated as a breaking change?
> >
> > The original definition of the underlying MessageIdData is much
> > clearer:
> >
> > ```proto
> > message MessageIdData {
> >     required uint64 ledgerId = 1;
> >     required uint64 entryId  = 2;
> >     optional int32 partition = 3 [default = -1];
> >     optional int32 batch_index = 4 [default = -1];
> >     repeated int64 ack_set = 5;
> >     optional int32 batch_size = 6;
> >
> >     // For the chunk message id, we need to specify the first chunk message id.
> >     optional MessageIdData first_chunk_message_id = 7;
> > }
> > ```
> >
> > IMO, MessageId should be a wrapper of MessageIdData. It's more natural
> > to have an interface like:
> >
> > ```java
> > import java.util.Optional;
> >
> > interface MessageId {
> >     long ledgerId();
> >     long entryId();
> >     Optional<Integer> partition();
> >     Optional<Integer> batchIndex();
> >     // ...
> > }
> > ```
> >
> > Additionally, there are many places that use only the triple of
> > (ledger id, entry id, batch index) as the key to represent the position.
> > Currently, this is done by converting a BatchMessageIdImpl to a
> > MessageIdImpl. However, it's more intuitive to write something like:
> >
> > ```java
> > class MessageIdPosition implements Comparable<MessageIdPosition> {
> >     private final MessageId messageId;
> >     // TODO: compare only the triple (ledger, entry, batch)
> > }
> > ```
> >
> > Therefore, I'm going to write a proposal to redesign the MessageId
> > interface only by adding some getters. Regarding the 5 existing
> > implementations, I think we can drop them because they are a part
> > of `pulsar-client`, not `pulsar-client-api`.
> >
> > Please feel free to share your points.
> >
> > Thanks,
> > Yunze
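A minimal sketch of the position key described in the proposal above. It is hypothetical: `MessagePosition` stores the (ledger id, entry id, batch index) triple directly rather than wrapping a `MessageId`, and uses `-1` for non-batched messages, matching the `batch_index` default in `MessageIdData`. Because the record's `equals` uses the same three fields as `compareTo`, it behaves consistently in both sorted and hashed collections.

```java
import java.util.Comparator;

/**
 * Hypothetical position key built from the (ledger id, entry id, batch index)
 * triple; partition, batch size, and ack set are deliberately ignored.
 */
record MessagePosition(long ledgerId, long entryId, int batchIndex)
        implements Comparable<MessagePosition> {

    // Lexicographic order: ledger first, then entry, then batch index.
    private static final Comparator<MessagePosition> ORDER =
            Comparator.comparingLong(MessagePosition::ledgerId)
                    .thenComparingLong(MessagePosition::entryId)
                    .thenComparingInt(MessagePosition::batchIndex);

    @Override
    public int compareTo(MessagePosition other) {
        return ORDER.compare(this, other);
    }
}
```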


Re: [DISCUSS] PIP-175: Extend time based release process

2022-11-09 Thread Haiting Jiang
Hi all,

What's the status of this PIP?

There's an issue about fixable vulnerabilities in the latest release:
https://github.com/apache/pulsar/issues/18348

From what I see, one of the problems is that we take too long to make a new
release (over 2 months for 2.10.2). Hopefully, this PIP can help with the
issue.

Thanks,
Haiting

On 2022/06/07 22:25:24 Matteo Merli wrote:
> https://github.com/apache/pulsar/issues/15966
> 
> 
> 
> ## Motivation
> 
> In PIP-47 
> (https://github.com/apache/pulsar/wiki/PIP-47:-Time-Based-Release-Plan),
> we have adopted a time-based release plan. This was the first attempt
> at establishing a new principle on how releases should b
> 
> The two main benefits of this approach have been:
> 
>  1. Clarity for users and developers on when to expect a release
>  2. Breaking a hard relationship between feature and release: a
> particular feature will be included in the release if it is completed
> in time. Otherwise, it will be bubbled up to the next release.
> 
> The motivation for the current proposal is to extend the existing
> process to address the issues that we have seen and that were left out
> of the scope of PIP-47.
> 
> ## Summary of existing issues in the process
> 
> ### Short maintenance cycles for releases
> 
> Since we're on a 3-month release cycle, we end up with 4 releases
> per year, though in practice it's closer to 3.
> 
> There is a high cost to maintaining many old releases and backporting bug
> fixes and security patches. In general, we actively support the last
> 3 minor releases while continuing to develop the next release. E.g.,
> 2.8, 2.9, and 2.10, while 2.11 is under development.
> 
> The result is that a user adopting a particular release is forced to
> upgrade in a < 1-year timeframe to keep up to date and use a supported
> release. This timeframe is too short for many users as it imposes a
> lot of forced upgrades, for which they are not prepared in terms of
> available time and required effort.
> 
> ### Live Upgrade/Downgrade compatibility path
> 
> In Pulsar, we guarantee that users have a way to do live upgrades and
> downgrades with zero downtime.
> 
> This is very powerful because it gives them the freedom to upgrade to
> a new release with the assurance of being able to roll back to the
> previous release in case any functional or performance regressions are
> encountered.
> 
> Today, this compatibility is guaranteed across consecutive minor versions.
> E.g., I can do `2.7 -> 2.8 -> 2.7` as a live upgrade.
> 
> What is not guaranteed is to "skip" releases. E.g., `2.7 -> 2.9` might
> or might not work, but it's not guaranteed. In that case, an intermediate
> upgrade would be required: `2.7 -> 2.8 -> 2.9`.
> 
> There are multiple reasons why a "skip" upgrade might not work:
>   1. An incompatible upgrade of some dependency (e.g., ZooKeeper) that
> might not be compatible with an older version.
>   2. Adoption of a new metadata format or data format on disk.
>      Every time we introduce a new incompatible format change (outside
> of a regular Protobuf field addition), we do it in two steps:
>      - In a new release, we introduce the new feature/format,
> disabled by default. The new release can read both old and new
> formats, though it keeps writing the old format by default.
>      - In a subsequent release, we change the default to the new format.
> 
> Note that this consideration is separate from the compatibility
> between clients and brokers, where we ***never*** break compatibility.
> The oldest available Pulsar client can still talk with the newest
> Pulsar broker, and vice versa: a new client will work perfectly fine
> with an older broker (except that the new features won't work).
> 
> ### Releases getting delayed
> 
> Another problem we have been experiencing is that release cycles have
> been stretching considerably. Part of this has been because we have
> been reaching the end of the release window, preparing a candidate,
> and then taking a long time to flush out all issues found at the last
> minute in the new release.
> 
> We need to ensure that we have a date set in stone to deliver the
> release to users.
> 
> ## Proposal
> 
> The proposal to address the above issues is composed of 2 parts.
> 
> ### 1. Establish Long Term Support releases
> 
> We need to provide a way for users to quickly understand the expected
> lifecycle timeline of a given release and for that timeline to be long
> enough not to be a constant update mandate.
> 
> At the same time, we need to ensure that we maintainers are not
> spending all the time just maintaining a huge list of old releases.
> 
> For that, we can use the established concept of "Long Term Support" (LTS) releases.
> 
> We will perform LTS releases at a fixed cadence every 18 months, and
> we will keep doing regular feature releases every 3 months as we're
> currently doing.
> 
> The LTS releases will be identified by being a `.0` version. For example:
>  * `3.0` 
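The two-step format rollout in the PIP-175 text above, and why skipping a release breaks the live upgrade/downgrade guarantee, can be modeled in a few lines. This is a hypothetical sketch: the `FormatRollout` class and the N-1/N/N+1 release labels are illustrative assumptions, not Pulsar code.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the two-step format rollout; not broker code. */
final class FormatRollout {
    enum Format { OLD, NEW }

    /** What a release can read, and what it writes by default. */
    record Release(String name, Set<Format> readable, Format writes) {}

    // Before the change: only the old format exists.
    static final Release BEFORE = new Release("N-1", EnumSet.of(Format.OLD), Format.OLD);
    // Step 1: the new format is readable but still disabled for writes.
    static final Release STEP_1 =
            new Release("N", EnumSet.of(Format.OLD, Format.NEW), Format.OLD);
    // Step 2: the default flips to writing the new format.
    static final Release STEP_2 =
            new Release("N+1", EnumSet.of(Format.OLD, Format.NEW), Format.NEW);

    private FormatRollout() {}

    /**
     * A live upgrade (and a rollback) runs both versions side by side, so each
     * side must be able to read whatever the other side writes.
     */
    static boolean liveUpgradeSafe(Release from, Release to) {
        return to.readable().contains(from.writes()) && from.readable().contains(to.writes());
    }
}
```

In this model, `N-1 -> N` and `N -> N+1` are both safe, but `N-1 -> N+1` is not: the old release cannot read the new format that the skipped-to release writes, so a rollback would fail.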

[GitHub] [pulsar] Jason918 added a comment to the discussion: Why pulsar retention quota must exceed configured backlog quota

2022-11-09 Thread GitBox


GitHub user Jason918 added a comment to the discussion: Why pulsar retention 
quota must exceed configured backlog quota

> Hello @Jason918, @michaeljmarshall, I'm still confused. Could you please
> explain why the backlog quota can't exceed the retention? Is there any
> risk if we set the backlog quota larger than the retention?

Messages will be deleted once they meet the constraints of the retention
policy. So if the backlog quota is larger than the retention, it will never
take effect.
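The constraint can be expressed as a small check. This is a hypothetical, simplified sketch (`QuotaCheck` is not broker code): it only compares sizes in bytes, and treating a negative retention size as unlimited is an assumption made for illustration.

```java
/**
 * Hypothetical, simplified version of the constraint discussed above; not the
 * broker's validation code. Sizes are in bytes.
 */
final class QuotaCheck {
    private QuotaCheck() {}

    /** True if the backlog quota can still be reached before retention limits apply. */
    static boolean backlogQuotaCanTrigger(long backlogQuotaBytes, long retentionSizeBytes) {
        if (retentionSizeBytes < 0) {
            // Assumption for this sketch: a negative retention size means "unlimited".
            return true;
        }
        return backlogQuotaBytes < retentionSizeBytes;
    }
}
```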

GitHub link: 
https://github.com/apache/pulsar/discussions/17697#discussioncomment-4094535


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org