Re: [VOTE]PIP-247: Notifications for partitions update.

2023-03-05 Thread houxiaoyu
Thanks, Asaf. I will continue the discussion on the discussion thread.

On Sun, Mar 5, 2023 at 21:16, Asaf Mesika wrote:

> +1 with a note: I think the discussion thread has not reached consensus.
>
>
> On Thu, Mar 2, 2023 at 1:21 PM houxiaoyu  wrote:
>
> > Dear Community,
> >
> > I would like to start a VOTE on "PIP-247: Notifications for partitions
> > update."
> >
> > The proposal can be read at [0] and the discussion thread is available at
> > [1]
> >
> > Voting will stay open for at least 48h.
> >
> > [0] https://github.com/apache/pulsar/issues/19596
> > [1] https://lists.apache.org/thread/bcry0cz4z7kzot8pc4nhbktfv44xrk2y
> >
> > Thanks,
> > Xiaoyu Hou
> >
>


Re: [DISCUSS] PIP-247: Notifications for partitions update

2023-03-05 Thread houxiaoyu
Bump. Are there any other concerns or suggestions about this PIP? :)
Ping @Michael @Joe @Enrico

Thanks
Xiaoyu Hou

On Mon, Feb 27, 2023 at 14:10, houxiaoyu wrote:

> Hi Joe and Michael,
>
> I think I misunderstood your earlier replies. Now that I understand, let
> me explain again.
>
> Besides the reasons Asaf mentioned above, the topic list watcher also has
> some limits. For example, the length of `topicsPattern.pattern` must be
> less than `maxSubscriptionPatternLength` [0]. If a consumer subscribes to
> multiple partitioned topics, `topicsPattern.pattern` may become very long.
>
> So I think that it's better to have a separate notification implementation
> for partition updates.
>
> [0]
> https://github.com/apache/pulsar/blob/5d6932137d76d544f939bef27df25f61b4a4d00d/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/TopicListService.java#L115-L126
>
> Thanks,
> Xiaoyu Hou
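
To make the length concern above concrete, here is a minimal sketch of how a single regex covering the partitions of several topics grows with the number of topics. The class name, the helper names, and the limit of 50 are assumptions for illustration only; the real limit comes from broker configuration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TopicsPatternLength {
    // Hypothetical limit, for illustration only; the real value is broker configuration.
    static final int ASSUMED_MAX_PATTERN_LENGTH = 50;

    /** Builds one regex that matches every partition of every given partitioned topic. */
    static String partitionsPattern(List<String> topics) {
        String alternation = topics.stream()
                .map(Pattern::quote)               // escape regex metacharacters in topic names
                .collect(Collectors.joining("|"));
        return "(" + alternation + ")-partition-[0-9]+";
    }

    /** Mirrors the rule stated in the thread: the pattern must be shorter than the limit. */
    static boolean fitsLimit(String pattern) {
        return pattern.length() < ASSUMED_MAX_PATTERN_LENGTH;
    }

    public static void main(String[] args) {
        String many = partitionsPattern(List.of(
                "persistent://public/default/orders",
                "persistent://public/default/payments",
                "persistent://public/default/shipments"));
        System.out.println(many.length() + " chars, fits=" + fitsLimit(many));
    }
}
```

Each additional topic adds its full quoted name to the alternation, so the pattern length grows roughly linearly with the number of subscribed partitioned topics.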
>
> On Mon, Feb 27, 2023 at 10:56, houxiaoyu wrote:
>
>> Hi Michael,
>>
>> >  I think we just need the client to "subscribe" to a topic notification
>> for
>> >  "-partition-[0-9]+" to eliminate the polling
>>
>> If Pulsar users want to pub/sub on a partitioned topic, I think most of
>> them would create a simple producer or consumer like the following:
>> ```
>> Producer<byte[]> producer = client.newProducer().topic(topic).create();
>> producer.sendAsync(msg);
>> ```
>> ```
>> Consumer<byte[]> consumer = client.newConsumer()
>>         .topic(topic)
>>         .subscriptionName(subscription)
>>         .subscribe();
>> ```
>> I think there is no reason for users to use `topicsPattern` if a Pulsar
>> user just wants to subscribe to a partitioned topic. In addition,
>> `topicsPattern` can't be used for producers.
>>
>> So I think PIP-145 [0] will benefit regex subscriptions, and this
>> PIP [1] will benefit the common partitioned-topic pub/sub scenario.
>>
>> [0] https://github.com/apache/pulsar/issues/14505
>> [1] https://github.com/apache/pulsar/issues/19596
>>
>> Thanks
>> Xiaoyu Hou
>>
>> On Sat, Feb 25, 2023 at 01:29, Michael Marshall wrote:
>>
>>> > It's just that the way to implement the partitioned-topic metadata
>>> > notification mechanism is much like the notifications on regex sub changes
>>>
>>> Why do we need a separate notification implementation? The regex
>>> subscription feature is about discovering topics (not subscriptions)
>>> that match a regular expression. As Joe mentioned, I think we just
>>> need the client to "subscribe" to a topic notification for
>>> "-partition-[0-9]+" to eliminate the polling.
>>>
>>> Building on PIP 145, the work for this PIP would be in implementing a
>>> different `TopicsChangedListener` [1] so that the result of an added
>>> topic is to add a producer/consumer to the new partition.
>>>
>>> I support removing polling in our streaming platform, but I'd prefer
>>> to limit the number of notification systems we implement.
>>>
>>> Thanks,
>>> Michael
>>>
>>> [0] https://github.com/apache/pulsar/pull/16062
>>> [1]
>>> https://github.com/apache/pulsar/blob/82237d3684fe506bcb6426b3b23f413422e6e4fb/pulsar-client/src/main/java/org/apache/pulsar/client/impl/PatternMultiTopicsConsumerImpl.java#L169-L175
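
As a rough illustration of the "-partition-[0-9]+" idea above, the sketch below shows how a listener could decide whether a newly reported topic name is a partition of a given partitioned topic, and which index it has. The class and method names are hypothetical and are not part of the Pulsar client.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartitionNames {
    // Splits "<base>-partition-<n>" into the base topic name and the partition index.
    private static final Pattern SUFFIX = Pattern.compile("^(.*)-partition-([0-9]+)$");

    /** Returns the partition index if topicName is a partition of baseTopic, else empty. */
    static OptionalInt partitionIndex(String baseTopic, String topicName) {
        Matcher m = SUFFIX.matcher(topicName);
        if (m.matches() && m.group(1).equals(baseTopic)) {
            // Assumes the index fits in an int; absurdly long digit runs would throw here.
            return OptionalInt.of(Integer.parseInt(m.group(2)));
        }
        return OptionalInt.empty();
    }
}
```

A topics-changed listener could call `partitionIndex` for each added topic and create a producer or consumer only when the result is present.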
>>>
>>>
>>>
>>> On Fri, Feb 24, 2023 at 1:57 AM houxiaoyu  wrote:
>>> >
>>> > Hi Joe,
>>> >
>>> > When we use PartitionedProducerImpl or MultiTopicsConsumerImpl, there is
>>> > a poll task that regularly fetches the metadata of the partitioned topic
>>> > to detect changes in the number of partitions. This PIP wants to use a
>>> > notification mechanism to replace the metadata poll task.
>>> >
>>> > It's just that the way to implement the partitioned-topic metadata
>>> > notification mechanism is much like the notifications on regex sub changes.
>>> >
>>> > On Fri, Feb 24, 2023 at 13:37, Joe F wrote:
>>> >
>>> > > Why is this needed when we have notifications on regex sub changes?
>>> > > Aren't the partition names a well-defined regex?
>>> > >
>>> > > Joe
>>> > >
>>> > > On Thu, Feb 23, 2023 at 8:52 PM houxiaoyu wrote:
>>> > >
>>> > > > Hi Asaf,
>>> > > > thanks for your reminder.
>>> > > >
>>> > > > ## Changing
>>> > > > I have made the following changes to make sure the notification
>>> > > > arrives successfully:
>>> > > > 1. The watch success response `CommandWatchPartitionUpdateSuccess`
>>> > > > will contain all the concerned topics of this watcher.
>>> > > > 2. The notification `CommandPartitionUpdate` will always contain
>>> > > > all the concerned topics of this watcher.
>>> > > > 3. The notification `CommandPartitionUpdate` contains a
>>> > > > monotonically increasing version.
>>> > > > 4. A map `PartitonUpdateWatcherService#inFlightUpdate` (its type
>>> > > > involves a `Pair`) will keep track of in-flight updates.
>>> > > > 5. A timer will check for update timeouts through `inFlightUpdate`.
>>> > > > 6. The client acks with `CommandPartitionUpdateResult` to the broker
>>> > > > when it finishes updating.
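
The six points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the use of a (version, timestamp) entry per watcher, and the method names are all hypothetical and are not taken from the PIP's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class InFlightUpdates {
    /** Assumed entry shape: the newest version sent to a watcher and when it was sent. */
    static final class Pending {
        final long version;
        final long sentAtMillis;
        Pending(long version, long sentAtMillis) {
            this.version = version;
            this.sentAtMillis = sentAtMillis;
        }
    }

    private final Map<Long, Pending> inFlight = new ConcurrentHashMap<>();
    private long nextVersion = 0;

    /** Records that a notification with a fresh, strictly larger version was sent. */
    synchronized long notifyWatcher(long watcherId, long nowMillis) {
        long version = ++nextVersion;
        inFlight.put(watcherId, new Pending(version, nowMillis));
        return version;
    }

    /** Clears the entry only if the ack is for the newest version; stale acks are ignored. */
    boolean ack(long watcherId, long ackedVersion) {
        Pending p = inFlight.get(watcherId);
        return p != null && p.version == ackedVersion && inFlight.remove(watcherId, p);
    }

    /** Watchers whose newest notification has gone unacknowledged for longer than the timeout. */
    List<Long> timedOut(long nowMillis, long timeoutMillis) {
        List<Long> result = new ArrayList<>();
        inFlight.forEach((id, p) -> {
            if (nowMillis - p.sentAtMillis > timeoutMillis) {
                result.add(id);
            }
        });
        return result;
    }
}
```

Because every notification carries the full topic list and a larger version, a timer that finds a timed-out watcher can simply resend the newest state instead of replaying missed updates.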
>>> > > >
>>> > > > ## Details
>>> > > >
>>> > > > The following mechanism could make sure the newest notification
>>> arrived
>>> > > > successfully, copying the description from GH:
>>> > > >
>>> > > > A 

Re: [Vote] PIP-245: Make subscriptions of non-persistent topic non-durable

2023-03-05 Thread Yunze Xu
+1 (binding)

Thanks,
Yunze

On Fri, Mar 3, 2023 at 11:46 AM PengHui Li  wrote:
>
> +1 (binding)
>
> Penghui
>
> > On Feb 13, 2023, at 14:56, Jiuming Tao  
> > wrote:
> >
> > Hi all,
> >
> > I would like to start a VOTE on `PIP-245: Make subscriptions of 
> > non-persistent topic non-durable`.
> >
> > Motivation:
> >
> > There are two types of subscriptions for a topic: Durable and Non-durable.
> >
> > We create a Consumer with a Durable subscription and a Reader with a 
> > Non-durable subscription.
> >
> > But for a NonPersistentTopic, creating a Durable subscription is
> > meaningless: a NonPersistentSubscription doesn't have a ManagedCursor to
> > persist its data. After its consumers disconnect, the subscription can't
> > be removed automatically unless subscriptionExpirationTimeMinutes is set
> > to a value greater than 0.
> >
> > subscriptionExpirationTimeMinutes controls subscription expiration for
> > both NonPersistentTopic and PersistentTopic. If we set it to a value
> > greater than 0, it may lead to data loss (the durable subscriptions of a
> > PersistentTopic can also be removed).
> >
> > Non-durable subscriptions are removed automatically after all their
> > consumers disconnect; that is the existing logic.
> >
> > To remove the subscriptions of a NonPersistentTopic that have no active
> > consumers, and for the reasons above, we can make all the subscriptions
> > of a NonPersistentTopic Non-durable.
> >
> >
> >
> > For more details, you can read:
> > https://github.com/apache/pulsar/issues/19448
> >
> > The discussion thread is available at:
> > https://lists.apache.org/thread/2ltmyglnb25jy8nk58twkwbglws43bst
> >
> > Thanks,
> > Tao Jiuming
>


Re: [VOTE] Pulsar Node.js Client Release 1.8.1 Candidate 2

2023-03-05 Thread PengHui Li
+1 (binding)

- Verified checksum and signature
- Installed from npm
- Started a standalone and ran the example by following [0]

We should mention the following in the verification steps.

1. The pulsar_binary_host_mirror is required for the RC version
2. The example in the README needs to be copied to a .js file
3. A standalone is required when you try to run the example

[0] https://github.com/apache/pulsar-client-node#getting-started

Regards,
Penghui
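
For the checksum part of the verification steps above, here is a minimal sketch of comparing a downloaded file's SHA-512 digest with the published value. The class and method names are assumptions, and it does not cover the signature (asc) check, which needs the KEYS file and a PGP tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha512Check {
    /** Streams the input through SHA-512 and returns the digest as lowercase hex. */
    static String sha512Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true if the file's digest equals the published hex string. */
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(file)) {
            return sha512Hex(in).equalsIgnoreCase(expectedHex.trim());
        }
    }
}
```

Typical use would be `Sha512Check.matches(Path.of("<downloaded tarball>"), "<published checksum>")`, with both values taken from the vote email.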

On Thu, Mar 2, 2023 at 2:35 PM Yunze Xu 
wrote:

> +1 (binding)
> - Verified checksum and signature
> - Build from source on Ubuntu 20.04 WSL2
> - Test produce and consume with examples in this repo
> - Test TLS encryption and OAuth2 authentication on Ubuntu 20.04 and
> Windows 10 with https://github.com/BewareMyPower/pulsar-tls-examples
>
> Thanks,
> Yunze
>
> On Tue, Feb 28, 2023 at 5:08 PM Nozomi Kurihara 
> wrote:
> >
> > +1 (binding)
> >
> > * checked license headers
> > * verified checksum and signature
> > * install from npm and run producer/consumer
> >
> > Thanks,
> > Nozomi
> >
> > On Sun, Feb 26, 2023 at 12:23, Baodi Shi wrote:
> >
> > > Hi everyone,
> > >
> > > This is the second release candidate for Apache Pulsar Node.js client,
> > > version 1.8.1.
> > >
> > > It fixes the following issues:
> > >
> > > https://github.com/apache/pulsar-client-node/pulls?q=is%3Apr+label%3Arelease%2Fv1.8.1+is%3Aclosed
> > >
> > > Please download the source files and review this release candidate:
> > > - Download the source package, verify shasum and asc
> > > - Follow the README.md to build and run the Pulsar Node.js client.
> > >
> > > The release candidate package has been published to the npm registry:
> > > https://www.npmjs.com/package/pulsar-client/v/1.8.1-rc.2
> > > You can install it with
> > > `npm i pulsar-client@1.8.1-rc.2 --pulsar_binary_host_mirror=https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-node/`
> > > and verify the package.
> > >
> > > You can refer to this repository to verify tls related features:
> > >
> > >- https://github.com/shibd/pulsar-client-tls-test
> > >
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Source files:
> > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-node/pulsar-client-node-1.8.1-rc.2/
> > >
> > > Pulsar's KEYS file containing PGP keys we use to sign the release:
> > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > >
> > > SHA-512 checksum:
> > > e596fef3eba6fbd25413ccf6eee3cf0a22c24625ff699b4f6d49676ebe2a053f4864ecdee79eb4dbde4fde143e867ec5c1fe667d0a1db07370b9d2abdb806ac3  apache-pulsar-client-node-1.8.1.tar.gz
> > >
> > > The tag to be voted upon:
> > > v1.8.1-rc.2(f0a5e0b)
> > > https://github.com/apache/pulsar-client-node/releases/tag/v1.8.1-rc.2
> > >
> > > Please review and vote on the release candidate #2 for the version
> > > 1.8.1, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > >
> > >
> > > Thanks,
> > > Baodi Shi
> > >
>


Re: [PROPOSAL] Roadmap for 3.0 release

2023-03-05 Thread Cong Zhao
+1 Looks great!

Thanks,
Cong

> On Feb 18, 2023, at 06:44, Matteo Merli wrote:
> 
> Since the LTS release model has been formally approved, I'm proposing
> the following schedule for the release:
> 
> * Tue - 2023-04-11
>  - RC-1
>  - Code Freeze -- Only critical fixes will be merged in the 3.0
> release branch. Contributors should plan to have all the changes merged in
> before this date. Exceptions should be extremely rare and strongly
> motivated.
> 
> * Tue - 2023-04-18 - RC-2
> * Tue - 2023-04-25 - RC-3
> * Tue - 2023-05-02 - Announce 3.0 Release
> 
> These dates will be published on the website to present users with a
> "roadmap" and we should commit to and respect these dates.
> 
> I also wanted to propose trying out a model where we have 3 release
> managers for all major releases.
> 
> The reasoning behind this is for this small group of people to collaborate
> and divide the tasks for the release: merging patches from the "master"
> branch, preparing RC, and testing.
> 
> Since everyone also has other work duties and unexpected tasks that can pop
> up at any time, it will help to have redundancy in the release-management
> "team", so that we can release on the exact dates.
> 
> Thanks,
> Matteo
> 
> --
> Matteo Merli
> 



Re: Please stop cherry-picking (breaking) changes to the released branches

2023-03-05 Thread Xiangying Meng
> Maybe it is better to start a discussion on the mailing list when you want
> to cherry-pick something and wait for some time.
> If nobody objects then it is good to go

Because there are a large number of PRs that need to be cherry-picked,
some PRs may not strictly abide by this agreement when cherry-picked.
But from the perspective of a release manager,
I think there are three points that we should abide by.
1. As Enrico said in the discussion of starting release 2.10.3,
 it is better not to include new last-minute things in a release for
 a stable branch.
2. The release manager will send an email to notify everyone after all the
 PRs with the release label are cherry-picked.
 If a new PR needs to enter the release at that point,
 it is better to submit a PR to cherry-pick it and leave a message in
 reply to that email.
3. If the newly added commit is verified, whether by running CI in the
 author's own repository or in Pulsar's repository,
 then we can add the `Verified` label to it.
 Otherwise, it is best not to add the `Verified` label to the commit.

For example, this commit [1] is a new change that was merged just before
this release was started.
It had not been verified but carried the Verified label.
Due to my mistake, after noticing the Verified label, I didn't check
further to see whether the commit was actually verified.
This commit introduced some CI failures, so we now need to re-release.

Of course, this is just a small problem so far.
Fortunately, there is a new change [2] that needs to be imported to
branch-2.10 today,
so we discovered this problem in time.
After all, the current verification process for release candidates
cannot discover these problems.

Normally, PRs with minor changes and few conflicts with branch-x can
be cherry-picked directly without running CI,
but it is best not to do this immediately before a release.

Let's do our best to make the Pulsar community more mature and perfect.


Sincerely
Xiangying

[1]
https://github.com/apache/pulsar/commit/a3a242cc48e8d7454ee0c5c8fd872ae6f92ae4f7
[2]  https://github.com/apache/pulsar/pull/19711

On Tue, Feb 28, 2023 at 7:17 PM Enrico Olivelli  wrote:

> On Tue, Feb 28, 2023 at 12:11, PengHui Li wrote:
> >
> > > By the way, the main point in this email thread is that we should
> > > totally stop cherry-picking stuff that is not strictly needed
> >
> > Yes, the main issue we need to resolve is how to decide whether a
> > change strictly needs to be cherry-picked. Do you think it is a good
> > way to have the author provide the cherry-pick information, or have
> > reviewers add labels and confirm the labels are correct before merging?
> > Whoever wants to update a release/* label must provide the context.
> > Do not just change the release label without any information.
>
> Maybe it is better to start a discussion on the mailing list when you want
> to cherry-pick something and wait for some time.
> If nobody objects then it is good to go
>
> Like we slowed down the pace to add new big changes with PIP
> we could follow a slower workflow for cherry-picks.
>
> We could try this strategy for a while.
>
> After all, we are never in a hurry to merge a patch (urgent patches,
> for security issues follow a different path),
> we aren't cutting releases very often.
>
>
> Enrico
>
> >
> > Or, push PR for every cherry-pick to get approved by committers.
> >
> > Thanks,
> > Penghui
> >
> > On Tue, Feb 28, 2023 at 6:32 PM Enrico Olivelli wrote:
> >
> > > On Tue, Feb 28, 2023 at 11:19, Yubiao Feng wrote:
> > > >
> > > > Adding a suggestion:
> > > > - After a PR is reverted, we need to remove its "release-xxx" label,
> > > > which can reduce the release manager's work
> > >
> > > I think that it is up to the committer who merges the patch to
> > > cherry-pick immediately to the other branches.
> > > At that point you have enough context to merge the patch and for sure
> > > the committer knows the patch well.
> > >
> > > In Apache BookKeeper and in Apache ZooKeeper we have a script that
> > > does the merge against the target branch and
> > > then it allows you to cherry-pick the other branches.
> > >
> > > Delaying the merge too much makes things harder.
> > >
> > > By the way, the main point in this email thread is that we should
> > > totally stop cherry-picking stuff that is not strictly needed
> > >
> > >
> > > Enrico
> > >
> > > >
> > > > Thanks
> > > > Yubiao Feng
> > > >
> > > > On Mon, Feb 27, 2023 at 11:27 PM Enrico Olivelli <eolive...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello Committers,
> > > > > I believe that we should stop cherry-picking breaking changes like [1]
> > > > > to released branches.
> > > > > Really, this is something that we cannot do.
> > > > >
> > > > > When you decide to cherry-pick a commit to a "stable branch",
> > > > > currently branch-2.8, branch-2.9, branch-2.10 and branch-2.11 you
> > > > > always have to think about these things:
> > > > > - is 

Re: [DISCUSS] PMC/Committer Emiratus status

2023-03-05 Thread Asaf Mesika
Thanks to everyone who took the time to carefully answer with detailed
explanations.
I personally learned a lot about Apache projects this way (made me read
about it some more).

So my personal recap is:

   - The goal of knowing the health of the Apache Pulsar community can be
   achieved by looking at the monthly active contributors over time,
   displayed on the community page.
      - It would be nice to get those numbers on the mailing list itself
      as well.
   - Calculating engagement is not an easy task.
   - Kicking people out is not something you'd like to do in general, and
   specifically not to volunteers.
   - People's credit for their work, which is also expressed in PMC
   membership, never expires, because merit never expires: your work credit
   and earned rights should not expire.


I personally see PMC members answering someone who is neither a PMC member
nor a committer on this topic as a very healthy community indicator :)

Thanks!

Asaf

On Fri, Mar 3, 2023 at 10:22 AM Enrico Olivelli  wrote:

> This is an interesting discussion.
> Good to see this kind of a discussion on the dev@ mailing list, this
> way more people are aware of the fact that we are a project in the ASF
> and there is a Project Management Committee.
>
> I have been following a few Apache projects for a while, and I believe
> that this kind of discussions should be run on the private@ mailing
> list.
> It is the PMC that usually deals with this stuff.
>
> As Tison said, the common practice is that you never remove anyone
> from a PMC or from the Committers list.
>
> This happens only in rare cases where an individual behaves in such a
> way that the Project or the Foundation could be damaged,
> for instance if you speak on behalf of the project and you offend
> someone publicly.
>
> Inactive contributors/committers/PMC members do not do any harm to a
> project.
>
> Some projects have some rules that you cannot participate in official
> VOTEs if you are not "active".
>
> If anyone has some problems with someone in the community, then they
> can reach out to priv...@pulsar.apache.org and the PMC will listen to
> the problem and take actions.
>
> my 2 cents
>
> Enrico
>
> On Fri, Mar 3, 2023 at 04:39, Yunze Xu wrote:
> >
> > As a PMC member, I also don't like playing a game of determining who
> > should be removed from the PMC.
> >
> > I have heard the viewpoint that someone may participate in the community
> > only to join the PMC so that he can benefit from it, and that after
> > becoming a PMC member he is never active in the community. It might be
> > true, but I think it's acceptable. Making such a rule won't prevent such
> > cases. If he wants, he can make use of the rule and keep himself "active"
> > to avoid being kicked out of the PMC, though the active state is fake.
> >
> > I'm not against any particular way to remove (or something else that
> > sounds good) a PMC member, because none of these ways is perfect. However,
> > I'm STRONGLY AGAINST changing a rule that has been applied for some time
> > unless it can be proved that the rule is very harmful to the community.
> > You mentioned https://www.apache.org/dev/pmc.html#pmc-removal. But
> > please don't ignore the first sentence:
> >
> > > Projects can establish their own policy on handling inactive members,
> > > as long as they apply it CONSISTENTLY.
> >
> > In addition, Dave and Tison both mentioned that we have some boards or
> > webpages to see how many people are active. We don't need to remove
> > some PMC members just to know who has been active recently.
> >
> > BTW, I'm also curious about the motivation of this proposal. How do
> > inactive PMC members harm the community?
> >
> > Thanks,
> > Yunze
> >
> > On Fri, Mar 3, 2023 at 10:14 AM tison  wrote:
> > >
> > > Hi,
> > >
> > > In the proposal, it's unclear if you'd like to _mark_ the inactive
> > > members in emeritus status or _remove_ them from the LDAP group.
> > >
> > > I saw a similar discussion in the Flink community, resulting in
> > > "active" sentences in its Bylaws[1]. Here is some consensus from there:
> > >
> > > 1. Merits never expire. Following the Apache way, there's no reason to
> > > _remove_ a committer or PMC member from the LDAP group because of
> > > inactivity. I remember only a limited number of cases in which a member
> > > got removed because they _kept_ harming the community.
> > > 2. Emeritus status is set for unblocking consensus. The Flink community
> > > experienced some votes that could not get the required approvals in
> > > time and thus tried to unblock consensus by setting some members with
> > > binding votes in emeritus status. Do we see concrete issues showing that
> > > the Pulsar community cannot work well with the current PMC members and
> > > committers group?
> > > 3. Emeritus status is voluntary. I know that in other foundations, it
> > > can be judged or eagerly applied, but in ASF, we share a "Community of
> > > Peers" sense that everyone is a volunteer. They won't be "fired" because of
> 

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-05 Thread SiNan Liu
Hi! I updated the explanation of some things in the PIP issue. I also
added a new “flag” to the conf that selects among different
ProtobufNativeSchemaValidator implementations, and made the default
ProtobufNativeSchemaValidator check only whether the name of the
root message is the same.


Thanks,
sinan
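
The configuration-selected validator described above could look roughly like the sketch below. Everything here is a simplified assumption: schemas are reduced to their root message name, and the interface, registry, and method names are hypothetical rather than the PIP's real classes.

```java
import java.util.Map;

final class ValidatorSelection {
    /** Stand-in for a compatibility validator; a schema is just its root message name here. */
    interface NativeValidator {
        boolean isCompatible(String existingRootMessage, String newRootMessage);
    }

    /** Default behaviour described in the thread: only the root message name is compared. */
    static final NativeValidator ROOT_NAME_ONLY =
            (existing, updated) -> existing.equals(updated);

    /** Picks a validator by the configured name, falling back to the default. */
    static NativeValidator select(String configuredName, Map<String, NativeValidator> registry) {
        if (configuredName == null) {
            return ROOT_NAME_ONLY;
        }
        return registry.getOrDefault(configuredName, ROOT_NAME_ONLY);
    }
}
```

Keeping the root-name check as the fallback means an unset or unknown configuration value behaves like the default rather than failing.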


On Sun, Mar 5, 2023 at 20:21, Asaf Mesika wrote:

> On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu  wrote:
>
> > >
> > > Can you please explain how a Protobuf Schema descriptor can be
> > > validated for backward compatibility using Avro-based compatibility
> > > rules? Doesn't it expect the schema to be Avro, but it is actually a
> > > Protobuf descriptor?
> > > Is there some translation happening?
> >
> >
> > 1. *You can take a quick look at the previous design: PROTOBUF uses
> > an Avro struct for storage.*
> > https://github.com/apache/pulsar/pull/1954
> >
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> >
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
>
>
> Ok. So to summarize your code (easier to write it than send links):
> * The Pulsar client, when used with Protobuf Schema, actually converts the
> Protobuf descriptor into an Avro schema (using code found inside the Avro
> library) and saves that Avro schema as the schema. It doesn't save the
> Protobuf descriptor at all. Very confusing, I have to add; I never expected
> that.
> This explains why the ProtobufSchemaCompatibilityCheck just extends
> the Avro check without doing any translation.
>
> Thanks for that.
>
> Now that I finally understand this, I can say that you *must* explain it
> in the motivation part of your PIP.
>
>
>
> >
> >
> > 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> > to `org.apache.avro.Schema`.*
> >
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> >
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> >
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
>
>
> Actually those links don't really help.
> The main link that helps is:
>
> https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
>
>
> >
> >
> >
> >
> > > I'm sorry - I don't understand.
> > > I understand the different compatibility check strategies.
> > > If you just spell them out here, then as you say, just translate the
> > > Protobuf Descriptor into an Avro schema and run the Avro
> > > compatibility validation, no?
> > > I believe the answer is no, since you may want to verify different
> > > things when it comes to Protobuf, which are different than Avro.
> >
> >
> > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > avro-protobuf.*
> > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > official or third-party implementation. So this PIP does not use
> > avro-protobuf for protobuf compatibility checking.*
> >
> > 2. *By the way, this is implemented in much the same way that Apache Avro
> > does compatibility checking.*
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > writtenSchema and readSchema. We only need to change the order of the
> > arguments we pass to `canRead()`.*
> > ```java
> > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> >     switch (strategy) {
> >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> >         case CanBeReadMutualStrategy -> {
> >             canRead(toValidate, fromDescriptor);
> >             canRead(fromDescriptor, toValidate);
> >         }
> >     }
> > }
> > ```
> >
> 

Re: [VOTE] Pulsar Release 2.10.4 Candidate 1

2023-03-05 Thread Xiangying Meng
Hi, community,

Sorry to tell everyone that we may need to abort the release
2.10.4-candidate-1 because some CI checks cannot pass after #19674 [0] was
cherry-picked.
I will be sure to carry out the release process again as soon as it is
resolved.

Sincerely
Xiangying
[0] https://github.com/apache/pulsar/pull/19674


On Sat, Mar 4, 2023 at 12:06 PM Xiangying Meng  wrote:

> This is the first release candidate for Apache Pulsar, version 2.10.4.
>
> This release contains 99 commits by 34 contributors.
> https://github.com/apache/pulsar/compare/v2.10.3...v2.10.4-candidate-1
>
> *** Please download, test, and vote on this release. This vote will stay
> open for at least 72 hours ***
>
> Note that we are voting upon the source (tag), binaries are provided for
> convenience.
>
> Source and binary files:
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.10.4-candidate-1/
>
> SHA-512 checksums:
> 8cae74a5b586ab2378c2b2737c59507180af4b8efab4a99bc0dae233096036f5b18ab94255bea03e416d8d21958bedf684c8d4bd3982f458a547d3e1efa0f19f
>  apache-pulsar-2.10.4-bin.tar.gz
> 74e16c61ff6ae9e2a51e7ae24981598c71dabbff09c820bff9303c031882e1f15d029d06b6b5b6e4cc9a02b8957a102338ce09173c8744a59e5bd848b48b1d2a
>  apache-pulsar-2.10.4-src.tar.gz
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachepulsar-1210/
>
> The tag to be voted upon:
> v2.10.4-candidate-1 (d1aebd3e4c9503406845fb2e746a289e88e00fb2)
> https://github.com/apache/pulsar/releases/tag/v2.10.4-candidate-1
>
> Pulsar's KEYS file containing PGP keys we use to sign the release:
> https://downloads.apache.org/pulsar/KEYS
>
> Docker images:
>
> 
>
> https://hub.docker.com/layers/xiangyingmeng/pulsar/2.10.4/images/sha256-144d0380592a7e0578772eb2fa51da7cad70f1d5f8a2b46189669b15f0e6b4b6?context=repo
>
> 
>
> https://hub.docker.com/layers/xiangyingmeng/pulsar-all/2.10.4/images/sha256-bcf03c05be93ced24991afbcca13f4a4b5f183d9a7b877ae84e992e16ca599ee?context=repo
>
> Please download the source package, and follow the README to build
> and run the Pulsar standalone service.
>


Re: [DISCUSS] PIP-253: Expose producer metrics for deadLetterProducer and retryLetterProducer

2023-03-05 Thread Asaf Mesika
I would rather see them as attributes of ConsumerStats.
Add

ProducerStats deadLetterProducerStats;

ProducerStats retryLetterProducerStats();
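
To illustrate the shape of this suggestion, here is a minimal sketch using stand-in interfaces. These are not the real Pulsar `ConsumerStats` and `ProducerStats` types; the method names and the choice to return null before a producer exists are assumptions for illustration.

```java
final class DlqStatsSketch {
    /** Minimal stand-in for producer statistics; not the real Pulsar interface. */
    interface ProducerStatsView {
        long totalMsgsSent();
        long totalSendFailed();
    }

    /** Minimal stand-in for consumer statistics with the two suggested accessors. */
    interface ConsumerStatsView {
        long totalMsgsReceived();
        // Assumption: either accessor may return null if that producer was never created.
        ProducerStatsView deadLetterProducerStats();
        ProducerStatsView retryLetterProducerStats();
    }

    /** Treats a missing producer as having sent nothing. */
    static long sentOrZero(ProducerStatsView stats) {
        return stats == null ? 0L : stats.totalMsgsSent();
    }

    /** One-line summary a user could log from the consumer's stats object alone. */
    static String describe(ConsumerStatsView stats) {
        return "received=" + stats.totalMsgsReceived()
                + ", deadLetterSent=" + sentOrZero(stats.deadLetterProducerStats())
                + ", retrySent=" + sentOrZero(stats.retryLetterProducerStats());
    }
}
```

Hanging both accessors off the consumer's stats object lets users reach the dead-letter and retry metrics without any handle on the private producers.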


On Fri, Mar 3, 2023 at 2:54 AM Kai Levy  wrote:

> Hello!
>
> I created a new PIP because I discovered there's no way for a user to
> access the metrics for a consumer's deadLetterProducer /
> retryLetterProducer, since it is private to ConsumerImpl.java. I would like
> to propose an API change that would expose those statistics. More details
> on the github issue:
> https://github.com/apache/pulsar/issues/19698
>
> Thanks!
> Kai
>


Re: [VOTE]PIP-247: Notifications for partitions update.

2023-03-05 Thread Asaf Mesika
+1 with a note: I think the discussion thread has not reached consensus.


On Thu, Mar 2, 2023 at 1:21 PM houxiaoyu  wrote:

> Dear Community,
>
> I would like to start a VOTE on "PIP-247: Notifications for partitions
> update."
>
> The proposal can be read at [0] and the discussion thread is available at
> [1]
>
> Voting will stay open for at least 48h.
>
> [0] https://github.com/apache/pulsar/issues/19596
> [1] https://lists.apache.org/thread/bcry0cz4z7kzot8pc4nhbktfv44xrk2y
>
> Thanks,
> Xiaoyu Hou
>


Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-05 Thread Asaf Mesika
On Thu, Mar 2, 2023 at 12:57 PM 太上玄元道君  wrote:

> > I  think you should fix this explanation:
>
> Thanks! I would like to copy the context you provide to the PIP motivation,
> your description is more detailed, so developers don't have to go through
> the code.
>

Sure


>
> > Today the quota is checked periodically, right? So that's how the
> > operator knows the cost in terms of I/O is limited.
> > Now you are adding one additional I/O per collection, every 1 min by
> > default. That's a lot perhaps. How long is the check interval today?
> > default. That's a lot perhaps. How long is the check interval today?
>
> Actually, I don't want to introduce additional costs, I thought we
> could cache its result, so that it won't introduce additional costs.
> It may be that I did not make it clear in the PIP and caused this
> misunderstanding, sorry.
>

Ok, just to verify: you plan to modify the code that periodically runs the
backlog quota check so that its result is cached there? That way, when you
pull that information from that code every 1 min to expose it as a metric,
it will have zero I/O cost?
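
To make the question concrete, here is a minimal sketch of the caching idea, not the actual broker code. The class name `CachedBacklogAge` and both method names are hypothetical. It assumes the periodic quota checker is the only caller that touches storage, and that the metrics path only reads the cached publish time.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch: cache the result of the periodic backlog quota check. */
final class CachedBacklogAge {
    // Publish time (epoch millis) of the oldest unacknowledged message,
    // as seen by the last periodic check. -1 means "no check has run yet".
    private final AtomicLong oldestPublishTimeMillis = new AtomicLong(-1L);

    /** Called by the periodic quota checker. This is the only path that does I/O. */
    void onPeriodicCheck(LongSupplier readOldestPublishTimeFromStorage) {
        oldestPublishTimeMillis.set(readOldestPublishTimeFromStorage.getAsLong());
    }

    /** Called on every metrics scrape. Reads the cached value only, so no I/O. */
    long backlogAgeSeconds(long nowMillis) {
        long cached = oldestPublishTimeMillis.get();
        if (cached < 0) {
            return -1L; // nothing cached yet
        }
        return Math.max(0L, (nowMillis - cached) / 1000L);
    }

    public static void main(String[] args) {
        CachedBacklogAge cache = new CachedBacklogAge();
        long now = System.currentTimeMillis();
        // Simulated storage read: the oldest message was published 90 seconds ago.
        cache.onPeriodicCheck(() -> now - 90_000L);
        System.out.println("backlog age (s): " + cache.backlogAgeSeconds(now));
    }
}
```

Caching the publish time rather than the age means the reported age keeps advancing between checks, although it can still lag if the oldest message is acknowledged after the last check ran.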



>
> > The user today can calculate quota used for size based limit, since there
> > are two metrics that are exposed today on a topic level: "
> > pulsar_storage_backlog_quota_limit" and "pulsar_storage_backlog_size".
> You
> > can just divide the two to get a percentage.
> > For the time-based limit, the only metric exposed today is quota itself ,
> "
> > pulsar_storage_backlog_quota_limit_time".
>
> I only noticed `pulsar_storage_backlog_size` but missed
> `pulsar_storage_backlog_quota_limit` and
> `pulsar_storage_backlog_quota_limit_time`. Many thanks for your reminder.
>
>
> So, in this condition, we already have the following topic-level metrics:
> `pulsar_storage_backlog_size`: The total backlog size of the topics of this
> topic owned by this broker (in bytes).
> `pulsar_storage_backlog_quota_limit`: The total amount of the data in this
> topic that limits the backlog quota (bytes).
> `pulsar_storage_backlog_quota_limit_time`: The backlog quota limit in
> time (seconds). (This metric does not exist in the docs; they need to be
> updated.)
>
>
> We just need to add a new metric named
> `pulsar_storage_earliest_msg_publish_time_in_backlog` in the topic-level
> that indicates the publish time of the earliest message in the backlog.
> So users could get `pulsar_backlog_size_quota_used_percentage` by dividing
> `pulsar_storage_backlog_size` by
> `pulsar_storage_backlog_quota_limit` (`pulsar_storage_backlog_size` /
> `pulsar_storage_backlog_quota_limit`),
> and could get `pulsar_backlog_time_quota_used_percentage` by dividing `now -
> pulsar_storage_earliest_msg_publish_time_in_backlog` by
> `pulsar_storage_backlog_quota_limit_time` (`(now -
> pulsar_storage_earliest_msg_publish_time_in_backlog)` /
> `pulsar_storage_backlog_quota_limit_time`).
>

I think there is a problem with the name
`pulsar_storage_earliest_msg_publish_time_in_backlog` in the topic-level:
* First, I prefer exposing the age rather than the publish time.
* Second, it's a bit hard to figure out the meaning of the earliest msg in
the backlog.

Maybe `pulsar_storage_backlog_age_seconds`? In the explanation you can
write: "The age (time passed since it was published) of the earliest
unacknowledged message based on the topic's existing subscriptions".
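
For reference, the two ratios discussed above can be sketched as follows. This is only an illustration of the arithmetic. The class and method names are hypothetical, the time values are assumed to be in seconds, and treating a non-positive limit as "no quota configured" is an assumption of this sketch.

```java
/** Hypothetical helper showing the quota-usage arithmetic from this thread. */
final class BacklogQuotaUsage {
    private BacklogQuotaUsage() {}

    /** pulsar_storage_backlog_size / pulsar_storage_backlog_quota_limit */
    static double sizeQuotaUsedRatio(long backlogSizeBytes, long quotaLimitBytes) {
        if (quotaLimitBytes <= 0) {
            return Double.NaN; // assumption: non-positive limit means no size quota
        }
        return (double) backlogSizeBytes / quotaLimitBytes;
    }

    /** (now - earliest publish time) / pulsar_storage_backlog_quota_limit_time */
    static double timeQuotaUsedRatio(long nowSeconds, long earliestPublishTimeSeconds,
                                     long quotaLimitTimeSeconds) {
        if (quotaLimitTimeSeconds <= 0) {
            return Double.NaN; // assumption: non-positive limit means no time quota
        }
        // The subtraction must happen before the division.
        long backlogAgeSeconds = nowSeconds - earliestPublishTimeSeconds;
        return (double) backlogAgeSeconds / quotaLimitTimeSeconds;
    }

    public static void main(String[] args) {
        System.out.println(sizeQuotaUsedRatio(50, 200));
        System.out.println(timeQuotaUsedRatio(1_000, 970, 60));
    }
}
```

If the metric exposed the age directly, as suggested above, the second method would reduce to a single division and users would not need `now` in their queries.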



>
> The backlog quota time checker runs periodically, so we can cache its
> result, and it won't add much cost.
>
> Pulsar also exposed subscription-level  `backlogSize` and
> `earliestMsgPublishTimeInBacklog` in Pulsar-Admin
> <
> https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139
> >
> if
> `subscriptionBacklogSize` and `getEarliestTimeInBacklog` are true.
> We can also expose `backlogQuotaLimiteSize` and `backlogQuotaLimitTime` of
> the topic to PulsarAdmin.
>

What relationship do you see between Pulsar exposing
subscriptionBacklogSize and earliestMsgPublishTimeInBacklog at the
subscription level and exposing the backlog quota limits in Pulsar Admin?

Limits can be exposed to Pulsar Admin, since it has 0 cost associated with
it.
I think it's a good idea to do that.
The quota usage can also be exposed to pulsar admin, since we pull that
data from the backlog quota checker cache, so it has 0 cost as well.

As we said in a previous email, we can also expose
`backlogQuotaTimeOldestBacklogAgeSubscriptionName`.


>
> After users receive the backlog alert from metrics alerting systems, they
> can get the topic name, then, they can request Topics#getStats
> <
> https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139
> >
> to
> find out which subscriptions have a huge backlog.
>
>
I agree users can use PulsarAdmin getStats for a topic, with
getEarliestTimeInBacklog=true to find the oldest subscription responsible
for exceeding quota, but we can give them that information with 0 cost
since we already have that subscription name cached (we spent 

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-05 Thread Asaf Mesika
On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu  wrote:

> >
> > Can you please explain how a Protobuf Schema descriptor can be validated
> > for backward compatibility check using Avro based compatibility rules?
> > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > descriptor?
> > Is there some translation happening?
>
>
> 1. *You can take a quick look at the previous design: PROTOBUF stores its
> schema as an Avro structure.*
> https://github.com/apache/pulsar/pull/1954
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115


Ok. So to summarize your code (easier to write it than send links):
* Pulsar Client, when used with Protobuf Schema, actually converts the
Protobuf descriptor into an Avro schema (using code found inside the Avro
library) and saves that Avro schema as the schema. It's not saving the
Protobuf descriptor at all. Very confusing, I have to add - I never expected
that.
This explains why ProtobufSchemaCompatibilityCheck just extends the
Avro-based check without doing any translation.

Thanks for that.

Now that I finally understand this, I can say that you *must* explain that
in the motivation part of your PIP.



>
>
> 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> to `org.apache.avro.Schema`.*
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70


Actually those links don't really help.
The main link that helps is:
https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122


>
>
>
>
> I'm sorry - I don't understand.
> > I understand the different compatibility check strategies.
> > If you just spell them out here, then as you say, just translate the
> > Protobuf Descriptor into an Avro schema and run the Avro
> > compatibility validation, no?
> > I believe the answer is no, since you may want to verify different things
> > when it comes to Protobuf, which are different than Avro.
>
>
> 1.
> *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> avro-protobuf.*
>
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> *ProtobufNativeSchema needs a native compatibility check, but there is no
> official or third party implementation. So this PIP does not use
> avro-protobuf for protobuf compatibility checking.*
>
> 2. *By the way, this is implemented in much the same way that Apache avro
> does compatibility checking.*
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> writtenSchema and readSchema. We only need to change the order of arguments
> we pass to `canRead()`.*
> ```java
> private void validateWithStrategy(Descriptors.Descriptor toValidate,
>         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
>     switch (strategy) {
>         // The schema being validated reads data written with the existing one.
>         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
>         // The existing schema reads data written with the one being validated.
>         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
>         // Both directions must hold.
>         case CanBeReadMutualStrategy -> {
>             canRead(toValidate, fromDescriptor);
>             canRead(fromDescriptor, toValidate);
>         }
>     }
> }
>
> private void canRead(Descriptors.Descriptor writtenSchema,
>         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
>     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
>             readSchema);
> }
> ```
>
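
The argument swap in the quoted snippet can be tried out with a self-contained toy model. This is only a sketch and not the PIP's rule set. A schema is modeled here as a map from field name to a "required" flag, and the single rule (a reader fails if it requires a field the writer does not have) is an assumption picked because it is asymmetric. `ReadStrategyDemo` and its members are hypothetical names.

```java
import java.util.Map;

/** Toy model of the three read strategies; not the PIP's actual checks. */
final class ReadStrategyDemo {
    enum Strategy { CAN_READ_EXISTING, CAN_BE_READ_BY_EXISTING, MUTUAL }

    /**
     * Toy rule (assumption): data written with `written` can be read with
     * `reader` if every field the reader marks as required exists in `written`.
     */
    static boolean canRead(Map<String, Boolean> written, Map<String, Boolean> reader) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            if (field.getValue() && !written.containsKey(field.getKey())) {
                return false;
            }
        }
        return true;
    }

    /** Same argument ordering as the quoted validateWithStrategy. */
    static boolean validate(Strategy strategy, Map<String, Boolean> toValidate,
                            Map<String, Boolean> existing) {
        switch (strategy) {
            case CAN_READ_EXISTING:
                return canRead(existing, toValidate);
            case CAN_BE_READ_BY_EXISTING:
                return canRead(toValidate, existing);
            default: // MUTUAL: both directions must hold
                return canRead(toValidate, existing) && canRead(existing, toValidate);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> existing = Map.of("id", true);
        // The new schema adds a required field.
        Map<String, Boolean> updated = Map.of("id", true, "email", true);
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": " + validate(s, updated, existing));
        }
    }
}
```

With an asymmetric rule like this one, the same pair of schemas can pass one strategy and fail another, which is why only the argument order needs to change between strategies.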
>
I get that you want to take inspiration from the existing Avro schema
compatibility check for your code design.
I also understand you *won't* use any existing Avro code for that.
I also understand that you have to write the validation check on your own,
since there is no third-party implementation to rely on.

The only thing I