[DISCUSS] cherry-pick #21816 resolving the metrics missing issue for time-based backlog

2024-03-25 Thread PengHui Li
Hi all,

I want to start a discussion about cherry-picking #21816 [0] to the release
branches. This PR added metrics for the time-based backlog, which was
introduced in 2.8.0 [1]. Until now, there have been no relevant metrics to
assist users with daily monitoring, so it is hard to add alerts and
dashboards. This has become a blocker for users who want to use the
time-based backlog in production.

Since #21816 is not a bug fix, it hasn't been cherry-picked to the release
branches. But now, I believe having it in the release branches is worth it.

The target branches:

- branch-3.2
- branch-3.0
- branch-2.11
- branch-2.10

[0] https://github.com/apache/pulsar/pull/21816
[1] https://github.com/apache/pulsar/pull/10093

I will keep the discussion open for at least 48 hours.
If there are no objections, I will perform the cherry-picking.

Regards,
Penghui


Re: Suggestions on GitHub labels and issue templates

2024-03-25 Thread Lari Hotari
Thanks Kiryl, very good proposal.

> • (?) It probably makes sense to enable and track website and docs issues in
> the apache/pulsar-site repository, and add a clearly visible link in the
> apache/pulsar README.md.

Yes, that would work too. However, since docs issue reporting was
centralized in apache/pulsar in the past, I don't think it's a great idea
to move it back to pulsar-site unless there's a compelling reason. Instead
of moving the location of website and docs issues, we could improve the
template for docs issues and add a template for website issues.

-Lari



On Mon, 18 Mar 2024 at 15:39, Kiryl Valkovich  wrote:
>
> Comment with better formatting on GitHub: 
> https://github.com/apache/pulsar/issues/22277#issuecomment-2002553745
>
> • Deprecate the java label. Pulsar is written in Java and most PRs update
> Java code.
> • Instead of removing labels, deprecate them by renaming them with the
> deprecated/ prefix. Probably pick another prefix that sorts closer to the
> end of the alphabet to reduce noise.
> • Add the go label automatically using labeler:
> https://github.com/apache/pulsar/blob/master/.github/labeler.yml
> go:
> - changed-files:
>   - any-glob-to-any-file: '**/*.go'
>
> • Add component/* labels automatically based on the file path
> component/config:
> - changed-files:
>   - any-glob-to-any-file: 'conf/**/*'
>   - any-glob-to-any-file: 'pulsar-config-validation/**/*'
> component/client:
> - changed-files:
>   - any-glob-to-any-file: 'pulsar-client/**/*'
>   - any-glob-to-any-file: 'pulsar-client-*/**/*'
> ...
>
> • Rename the bug label to type/bug for consistency. Keep the red color.
> • (?) Rename component/* => area/* for shorter names, as
> https://github.com/kubernetes/kubernetes/labels does.
> • Rename the doc-required label to type/doc. Relabel open issues and PRs
> that have doc labels with type/doc.
> • Deprecate all other doc-* labels. If they are needed for some kind of
> workflow, simply use a project board with ToDo -> In Progress -> Done
> states.
> • (?) It probably makes sense to enable and track website and docs issues
> in the apache/pulsar-site repository, and add a clearly visible link in the
> apache/pulsar README.md.
> • Deprecate the question label. Instead, move such issues to Discussions
> -> Q&A.
> • Migrate issues labeled enhancement either to the type/feature label or to
> Discussions. Add a new "Suggest an idea" issue template that redirects to
> Discussions -> Ideas.
> • (?) Rename PIP => type/PIP for consistency.
> • Rename flaky-test => type/flaky-test for consistency.
> • Deprecate the lifecycle/stale label. Use Stale instead, and rename
> Stale => stale for consistency.
> • Add the ability to pick an area/* label from a dropdown on issue
> creation. systemd/systemd and a few other projects use this action for that:
> https://github.com/redhat-plumbers-in-action/advanced-issue-labeler?tab=readme-ov-file#real-life-examples
>
>
> Best,
> Kiryl
>
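As a rough illustration of the path-based rules proposed above, here is a Python sketch that maps a PR's changed files to labels. It is not the actions/labeler implementation: `RULES`, `glob_to_regex`, and `labels_for` are hypothetical names, a label is assumed to apply when any of its globs matches any changed file, and the glob translation only approximates labeler's minimatch syntax (no braces, negation, or dotfile handling).

```python
import re
from typing import Dict, List, Set

# Hypothetical rule table mirroring the labeler.yml proposal above:
# label -> path globs. A label applies if any glob matches any changed file.
RULES: Dict[str, List[str]] = {
    "go": ["**/*.go"],
    "component/config": ["conf/**/*", "pulsar-config-validation/**/*"],
    "component/client": ["pulsar-client/**/*", "pulsar-client-*/**/*"],
}


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob ('**', '*', '?') into an anchored regex.

    Only an approximation of minimatch: no braces, negation, or dotfile rules.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, across directories
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one character, never '/'
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def labels_for(changed_files: List[str]) -> Set[str]:
    """Return every label whose globs match at least one changed file."""
    labels = set()
    for label, globs in RULES.items():
        regexes = [glob_to_regex(g) for g in globs]
        if any(rx.match(path) for rx in regexes for path in changed_files):
            labels.add(label)
    return labels
```

For example, `labels_for(["pulsar-client-admin/pom.xml", "conf/broker.conf"])` returns `{"component/client", "component/config"}`, while `pulsar-client/...` paths only match the first client glob because `pulsar-client-*` requires a dash after `client`.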


Re: [DISCUSS] Release Pulsar C++ Client 3.5.1 and upgrade the verify process

2024-03-25 Thread Baodi Shi
+1. Thank you for pushing this discussion.

We can modify the release process: we'll require the release manager to
attach the PRs that upgrade the Python and Node.js clients when initiating
a candidate vote, and ensure their CI passes.

Once the C++ client release is successful, we can remove the candidate and
then push for the PRs to be merged.


Thanks,
Baodi Shi


On Mar 25, 2024 at 18:29:23, Yunze Xu  wrote:

> Hi all,
>
> Recently I found a regression [1] in the C++ client 3.5.0 (thanks to
> the reminder from @shibd). So I will push a fix and then release the
> C++ client 3.5.1.
>
> However, this is not the first time a regression has been introduced;
> see [2] for example. So I suggest that when verifying the C++ client, we
> also verify the Python and Node.js clients by upgrading their
> dependencies. See the updated release process in [3].
>
> [1] https://github.com/apache/pulsar-client-cpp/issues/420
> [2] https://lists.apache.org/thread/rjolgrlp4x1lmfj678k3hjco80kcb73c
> [3]
> https://github.com/apache/pulsar-client-cpp/wiki/Verify-the-candidate-release-in-your-local-env#verify-the-3rd-party-projects-that-depend-on-pulsar-c-client
>
> Thanks,
> Yunze
>


Re: [DISCUSS] Broken builds and CI Failures in Maintenance Branches; improving maintenance strategy to address root causes

2024-03-25 Thread PengHui Li
Hi, Lari

Thanks for driving the discussion. I agree that cherry-picking is painful,
especially when we need to maintain old branches for a long time.

Frankly, my first impression is that targeting bug fixes to branch-3.0, but
features and improvements to the master branch, will put more burden on
contributors and committers. They might merge changes to the wrong branches
for a while, because they need time to build muscle memory. Of course, we
can use CI to check the labels against the target branch, so this will not
be a blocker.

I agree that the merge-branch solution will resolve the ordering and
coordination issues arising from the cherry-pick solution. Coordination
means deciding whether a PR should be cherry-picked (Yunze pointed this out
to me).

I have a few questions about the merge-branch solution.

- It looks like we will eventually employ both the merge-branch and the
  cherry-pick solutions after we have 4.0, because at that time the target
  branch for bug fixes will be branch-4.0, and we will still have an
  18-month overlap.

- With the existing cherry-pick solution, if we can't cherry-pick a commit
  due to too many conflicts, we usually create a separate PR directly
  against the release branch. How do we handle this case with the
  merge-branch solution? If I understand correctly, we can also push
  separate PRs to the newer branches and always keep the newer branches'
  version when resolving merge conflicts from this commit?

- Is it possible to cherry-pick commits from master to the LTS branch? I'm
  asking because a PR might be classified as an improvement, but someone may
  later find that it should be included in the LTS version, for example,
  https://github.com/apache/pulsar/pull/21739. Maybe there are other ways to
  handle this case, e.g., pushing a PR directly, because we might get many
  more conflicts at that time.

- Do we need to wait for the PRs targeting branch-3.0 to be merged before
  cutting branch-4.0? If there are many comments on an existing PR, we don't
  want to ask the author to create a new PR targeting branch-4.0 to continue
  the review. Usually, release preparation after cutting a branch lasts at
  least 3 weeks. This sounds like a challenge, because we will only allow
  regression fixes to branch-4.0 during that time. We need to find a
  solution for it.

- Does the committer performing the branch merge need to resolve all the
  conflicts? I mean, if we have 20 commits to merge and maybe only one of
  them is urgent to merge to the new branch for a patch release, with the
  cherry-pick solution you can cherry-pick only that commit and create the
  patch release. I think with the merge-branch solution we must merge all
  the commits. Maybe I'm wrong.
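The difference raised in the last question can be shown with a toy Python model, assuming a branch is just an ordered list of commit ids. Real git tracks ancestry in a DAG and can stop on conflicts; this sketch ignores both, and `merge` and `cherry_pick` here are illustrative helpers, not git commands.

```python
from typing import List


def merge(source: List[str], target: List[str]) -> List[str]:
    """A merge carries over every source commit the target does not have yet."""
    return target + [c for c in source if c not in target]


def cherry_pick(commit: str, target: List[str]) -> List[str]:
    """A cherry-pick copies one commit; the copy gets a new id (marked with ')."""
    return target + [commit + "'"]


# Hypothetical branches: three pending fixes on the older branch, one urgent.
older = ["base", "fix-1", "fix-2", "urgent-fix"]
newer = ["base", "feature-x"]

print(merge(older, newer))               # brings fix-1, fix-2, and urgent-fix
print(cherry_pick("urgent-fix", newer))  # brings only a copy of urgent-fix
```

In this model, a later merge of `older` still brings the original `urgent-fix` alongside its copy, since the cherry-picked copy is a distinct commit.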

I would support the merge-branch solution, and we also need documentation to
clarify the points to note. If I understand correctly, we can also go back
to the current solution if we find that something is not working, right?
Cherry-picking remains very flexible even after merges have happened between
branches. At least it's worth trying.

Regards,
Penghui


On Wed, Mar 20, 2024 at 9:38 PM Yunze Xu  wrote:

> > However, in async work, people should have more patience to read and
> > write.
>
> I mean, it would be better to have something like a "TL;DR". Anyway,
> I'd like to apply this change starting from the next feature release (3.3.0).
>
> Thanks,
> Yunze
>
> On Tue, Mar 19, 2024 at 12:10 AM Lari Hotari  wrote:
> >
> > Thanks for the comments, Yunze.
> >
> > On 2024/03/18 05:48:39 Yunze Xu wrote:
> > > I'm afraid many people don't have patience to read all the contents.
> >
> > I agree. However, in async work, people should have more patience to
> > read and write. Synchronous meetings aren't a good solution either. The
> > lack of patience could be caused by a lack of interest. There's not a
> > large group of people in our community who are interested in improving
> > the maintenance strategy and also committed to investing their time and
> > effort in these activities. I hope more people sign up for this type of
> > effort and show their interest and commitment in improving Apache Pulsar.
> >
> > > Here is my summary in short (please correct me if I'm wrong):
> > > - For bug fixes, the target branch should be branch-3.0. Once the PR
> > > is merged into branch-3.0, check out branch-3.x, run
> > > `git merge branch-3.0`, and resolve the conflicts
> >
> > I didn't describe the details of how this is handled. It is different in
> > practice.
> >
> > > - For features, the target branch should be branch-3.x
> >
> > New features would continue to go to master (or "main" if we decide to
> > rename it). Bugs would be fixed in the branch where the feature containing
> > the bug was introduced, if that feature is missing from the LTS branch.
> >
> > > Since we introduced the LTS concept, I agree that we should make
> > > branch-3.0 the default branch. Cherry-picking is a disaster when
> > > cherry-picks happen in the wrong order.
> >
> > Yes.
> >
> 

[DISCUSS] Release Pulsar C++ Client 3.5.1 and upgrade the verify process

2024-03-25 Thread Yunze Xu
Hi all,

Recently I found a regression [1] in the C++ client 3.5.0 (thanks to
the reminder from @shibd). So I will push a fix and then release the
C++ client 3.5.1.

However, this is not the first time a regression has been introduced;
see [2] for example. So I suggest that when verifying the C++ client, we
also verify the Python and Node.js clients by upgrading their
dependencies. See the updated release process in [3].

[1] https://github.com/apache/pulsar-client-cpp/issues/420
[2] https://lists.apache.org/thread/rjolgrlp4x1lmfj678k3hjco80kcb73c
[3] 
https://github.com/apache/pulsar-client-cpp/wiki/Verify-the-candidate-release-in-your-local-env#verify-the-3rd-party-projects-that-depend-on-pulsar-c-client

Thanks,
Yunze


Re: [VOTE] Pulsar Client Python Release 3.5.0 Candidate 2

2024-03-25 Thread Yunze Xu
I'm cancelling this release because of the regression found in
https://github.com/apache/pulsar-client-cpp/issues/420.

I will prepare the fix and start the release of the C++ client 3.5.1.
Then I will continue with candidate 3.

Thanks,
Yunze

On Mon, Mar 25, 2024 at 3:48 PM PengHui Li  wrote:
>
> +1 (binding)
>
> - Checked the signature
> - Installed the wheel on macOS with Python 3.12
> - Ran the consumer and producer examples
>
> Regards,
> Penghui
>
> On Fri, Mar 22, 2024 at 11:55 PM Yunze Xu  wrote:
>
> > This is the 2nd release candidate for Apache Pulsar Client Python,
> > version 3.5.0.
> >
> > It fixes the following issues:
> > https://github.com/apache/pulsar-client-python/milestone/6?closed=1
> >
> > *** Please download, test and vote on this release. This vote will
> > stay open for at least 72 hours ***
> >
> > Python wheels:
> >
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-python-3.5.0-candidate-2/
> >
> > The supported Python versions are 3.8, 3.9, 3.10, 3.11, and 3.12. The
> > supported platforms and architectures are:
> > - Windows x86_64 (windows/)
> > - glibc-based Linux x86_64 (linux-glibc-x86_64/)
> > - glibc-based Linux arm64 (linux-glibc-arm64/)
> > - musl-based Linux x86_64 (linux-musl-x86_64/)
> > - musl-based Linux arm64 (linux-musl-arm64/)
> > - macOS universal 2 (macos/)
> >
> > Download the wheel (the `.whl` file) that matches your OS and Python
> > version, then install it:
> > - Windows: `py -m pip install *.whl --force-reinstall`
> > - Linux or macOS: `python3 -m pip install *.whl --force-reinstall`
> >
> > The tag to be voted upon: v3.5.0-candidate-2
> > (730c2d7dea60ff632688463662a6101cacb98c22)
> >
> > https://github.com/apache/pulsar-client-python/releases/tag/v3.5.0-candidate-2
> >
> > Pulsar's KEYS file containing the PGP keys used to sign the release:
> > https://downloads.apache.org/pulsar/KEYS
> >
> > Please download the Python wheels and follow the README to test.
> >
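To pick the right subdirectory from the platform list in the vote email, a small Python helper like the following could be used. It is a hypothetical sketch, not part of the release tooling: the mapping from `platform` values to directory names is an assumption based on that list, and treating any non-glibc result from `platform.libc_ver()` as musl is a heuristic.

```python
import platform
import sys

# Python versions listed as supported in the 3.5.0 candidate 2 vote email.
SUPPORTED_PYTHONS = {(3, 8), (3, 9), (3, 10), (3, 11), (3, 12)}


def wheel_dir(system: str, machine: str, libc: str) -> str:
    """Map platform facts to one of the candidate's wheel subdirectories."""
    machine = machine.lower()
    if system == "Windows":
        if machine in ("amd64", "x86_64"):
            return "windows/"  # only Windows x86_64 is listed
        raise ValueError(f"unsupported Windows architecture: {machine}")
    if system == "Darwin":
        return "macos/"  # universal2 wheels cover both x86_64 and arm64
    if system == "Linux":
        arch = {"x86_64": "x86_64", "amd64": "x86_64",
                "aarch64": "arm64", "arm64": "arm64"}.get(machine)
        if arch is None:
            raise ValueError(f"unsupported Linux architecture: {machine}")
        # Heuristic: libc_ver() reports "glibc" on glibc systems and usually
        # an empty string on musl-based ones (e.g. Alpine).
        flavor = "glibc" if libc == "glibc" else "musl"
        return f"linux-{flavor}-{arch}/"
    raise ValueError(f"unsupported OS: {system}")


def detect_wheel_dir() -> str:
    """Best-effort detection for the running interpreter."""
    if sys.version_info[:2] not in SUPPORTED_PYTHONS:
        major, minor = sys.version_info[:2]
        raise RuntimeError(f"Python {major}.{minor} is not in the supported list")
    libc, _version = platform.libc_ver()
    return wheel_dir(platform.system(), platform.machine(), libc)
```

For example, `wheel_dir("Linux", "aarch64", "")` returns `"linux-musl-arm64/"`. The pure `wheel_dir` function is kept separate from `detect_wheel_dir` so the mapping can be checked without the matching hardware.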


Re: [VOTE] PIP-345: Optimize finding message by timestamp

2024-03-25 Thread PengHui Li
Hi, Jiuming

Yes, "ManagedLedger#getEarliestMessagePublishTimeInBacklog" is not a good
one, and it should be the only method in ManagedLedger that has a publish
time concept. I think we mixed the concepts in
https://github.com/apache/pulsar/pull/12523, which is bad. It's better to
start a proposal to deprecate this method and change the existing
implementation.

> For finding messages by timestamp, we can introduce a `sparse index` to
> Pulsar: after an add-entry operation completes, add an index record to
> `ManagedLedgerIndex` and store the index in ML. What do you think?

Yes, we can have different options. If users do not have too much data in
one ledger (and this is configurable), it should be fine. We can just build
the index based on the ledger's timestamp (the ledger close time). By
default, this should be good for many use cases.

With the ManagedLedgerIndex abstraction, users can also develop their own
implementations for extreme performance requirements, which keeps the
Pulsar core clear and simple while working for the most common cases.
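A minimal Python sketch of the coarse, ledger-close-time index described above, assuming (1) timestamps are non-decreasing across the topic, (2) every entry's timestamp is no later than its ledger's close time, and (3) only the last ledger may still be open. Assumption (2) may not hold for client-set publish times. `Ledger` and `find_by_time` are hypothetical names, not Pulsar APIs.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Ledger:
    ledger_id: int
    timestamps: List[int]             # one timestamp per entry, in entry order
    close_time: Optional[int] = None  # None while the ledger is still open


def find_by_time(ledgers: List[Ledger], t: int) -> Optional[Tuple[int, int]]:
    """Return (ledger_id, entry_id) of the first entry with timestamp >= t."""
    # The index: one close time per closed ledger, in ledger order.
    close_times = [led.close_time for led in ledgers if led.close_time is not None]
    # Ledgers closed before t only hold entries older than t: skip them all.
    start = bisect.bisect_left(close_times, t)
    for ledger in ledgers[start:]:
        # Timestamps inside a ledger are sorted too, so binary-search them.
        entry_id = bisect.bisect_left(ledger.timestamps, t)
        if entry_id < len(ledger.timestamps):
            return ledger.ledger_id, entry_id
        # A ledger may close at or after t with no matching entry; keep going.
    return None
```

Only ledgers from the first one closed at or after `t` are inspected, which is what keeps the index cheap: it stores one timestamp per ledger rather than one per entry.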

Regards,
Penghui


On Mon, Mar 25, 2024 at 5:47 PM 太上玄元道君  wrote:

> Hi Penghui,
>
> Thanks for your feedback!
>
> I'm not sure about this either, since publishTimestamp is a messaging-layer
> concept, and ML, as a persistence layer, should not be aware of it.
>
> But in ML, I noticed some methods that search for messages by publish
> timestamp (say, ManagedLedgerImpl#getEarliestMessagePublishTimeInBacklog),
> so that's why I wanted to add publishTimestamp to ML.
>
> Introducing a secondary index to ML is a good idea; RocketMQ has a
> `Hash index`, and Kafka has a `Sparse index`.
>
> For finding messages by timestamp, we can introduce a `sparse index` to
> Pulsar: after an add-entry operation completes, add an index record to
> `ManagedLedgerIndex` and store the index in ML. What do you think?
>
> Thanks,
> Tao Jiuming
>
>
>
> On Mon, Mar 25, 2024 at 15:17, PengHui Li wrote:
>
> > Hi, Jiuming
> >
> > I'm sorry for not getting back to you sooner.
> >
> > First, I support the motivation to optimize this case, because it could be
> > a significant blocker for users who want infinite data retention, which is
> > a BIG differentiator from Apache Kafka. I have really seen cases with high
> > publish throughput, where one ledger could hold 1M entries, with 100M new
> > entries published to a topic.
> >
> > Then, I checked the details of the existing implementation. I think the
> > tricky part is that the publish time is not a ManagedLedger concept. The
> > changes you proposed will add publish time to the ManagedLedger module,
> > which doesn't look good to me, because it will couple the Pulsar concept
> > with the ManagedLedger concept.
> >
> > Essentially, the publish time could be a secondary index of the
> > ManagedLedger. My opinion is to have a general ManagedLedgerIndex
> > abstraction, and the Pulsar broker can create any index it wants. Since the
> > broker creates the index, the broker can control the index's behavior.
> > Then, the ManagedLedger can provide an API to search for an entry with a
> > ManagedLedgerIndex. With this option, we don't need to add the publish time
> > concept to ManagedLedger directly.
> >
> > In this case, when the broker searches for an entry with a predicate and an
> > index, the managed ledger will search the index first. Of course, if the
> > relevant entry cannot be found in the index, it will just fall back to the
> > "optimized full scan".
> >
> > Regards,
> > Penghui
> >
> >
> > On Mon, Mar 25, 2024 at 11:51 AM 太上玄元道君  wrote:
> >
> > > bump
> > >
> > > On Wed, Mar 20, 2024 at 16:23, 太上玄元道君 wrote:
> > >
> > > > bump
> > > >
> > > > On Tue, Mar 19, 2024 at 19:35, 太上玄元道君 wrote:
> > > >
> > > >> Hi Pulsar community,
> > > >>
> > > >> This thread is to start a vote for PIP-345: Optimize finding message by
> > > >> timestamp
> > > >>
> > > >> PIP: https://github.com/apache/pulsar/pull/22234
> > > >> Discuss thread:
> > > >> https://lists.apache.org/thread/5owc9os6wmy52zxbv07qo2jrfjm17hd2
> > > >>
> > > >> Thanks,
> > > >> Tao Jiuming
> > > >>
> > > >
> > >
> >
>


Re: [VOTE] PIP-345: Optimize finding message by timestamp

2024-03-25 Thread 太上玄元道君
Hi Penghui,

Thanks for your feedback!

I'm not sure about this either, since publishTimestamp is a messaging-layer
concept, and ML, as a persistence layer, should not be aware of it.

But in ML, I noticed some methods that search for messages by publish
timestamp (say, ManagedLedgerImpl#getEarliestMessagePublishTimeInBacklog),
so that's why I wanted to add publishTimestamp to ML.

Introducing a secondary index to ML is a good idea; RocketMQ has a
`Hash index`, and Kafka has a `Sparse index`.

For finding messages by timestamp, we can introduce a `sparse index` to
Pulsar: after an add-entry operation completes, add an index record to
`ManagedLedgerIndex` and store the index in ML. What do you think?
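The add-complete sparse index suggested above could look roughly like this Python sketch, in the spirit of Kafka's time index: it samples every `interval`-th entry (a simplification; Kafka samples by bytes), binary-searches the samples, and scans forward from the last sample older than the target. `SparseTimeIndex`, `on_add_complete`, and `find_first_at_or_after` are hypothetical names; positions are simplified to integer offsets starting at 0 (real managed-ledger positions are ledger-id/entry-id pairs), and timestamps are assumed to be non-decreasing.

```python
import bisect
from typing import List, Optional


class SparseTimeIndex:
    """Samples (timestamp, position) every `interval` entries as they are persisted."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.timestamps: List[int] = []
        self.positions: List[int] = []
        self._added = 0

    def on_add_complete(self, position: int, timestamp: int) -> None:
        """Hook called after an entry is persisted; keeps every interval-th one."""
        if self._added % self.interval == 0:
            self.timestamps.append(timestamp)
            self.positions.append(position)
        self._added += 1

    def scan_start(self, t: int) -> int:
        """Position from which to scan for the first entry with timestamp >= t."""
        i = bisect.bisect_left(self.timestamps, t)
        # Sample i-1 is strictly older than t, and so is everything before it.
        return self.positions[i - 1] if i > 0 else 0


def find_first_at_or_after(timestamps: List[int], index: SparseTimeIndex,
                           t: int) -> Optional[int]:
    """Scan forward from the index hint; timestamps[p] belongs to position p."""
    for p in range(index.scan_start(t), len(timestamps)):
        if timestamps[p] >= t:
            return p
    return None
```

The trade-off is the usual one for sparse indexes: a larger `interval` keeps the index small but lengthens the forward scan after each lookup.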

Thanks,
Tao Jiuming



On Mon, Mar 25, 2024 at 15:17, PengHui Li wrote:

> Hi, Jiuming
>
> I'm sorry for not getting back to you sooner.
>
> First, I support the motivation to optimize this case, because it could be
> a significant blocker for users who want infinite data retention, which is
> a BIG differentiator from Apache Kafka. I have really seen cases with high
> publish throughput, where one ledger could hold 1M entries, with 100M new
> entries published to a topic.
>
> Then, I checked the details of the existing implementation. I think the
> tricky part is that the publish time is not a ManagedLedger concept. The
> changes you proposed will add publish time to the ManagedLedger module,
> which doesn't look good to me, because it will couple the Pulsar concept
> with the ManagedLedger concept.
>
> Essentially, the publish time could be a secondary index of the
> ManagedLedger. My opinion is to have a general ManagedLedgerIndex
> abstraction, and the Pulsar broker can create any index it wants. Since the
> broker creates the index, the broker can control the index's behavior.
> Then, the ManagedLedger can provide an API to search for an entry with a
> ManagedLedgerIndex. With this option, we don't need to add the publish time
> concept to ManagedLedger directly.
>
> In this case, when the broker searches for an entry with a predicate and an
> index, the managed ledger will search the index first. Of course, if the
> relevant entry cannot be found in the index, it will just fall back to the
> "optimized full scan".
>
> Regards,
> Penghui
>
>
> On Mon, Mar 25, 2024 at 11:51 AM 太上玄元道君  wrote:
>
> > bump
> >
> > On Wed, Mar 20, 2024 at 16:23, 太上玄元道君 wrote:
> >
> > > bump
> > >
> > > On Tue, Mar 19, 2024 at 19:35, 太上玄元道君 wrote:
> > >
> > >> Hi Pulsar community,
> > >>
> > >> This thread is to start a vote for PIP-345: Optimize finding message by
> > >> timestamp
> > >>
> > >> PIP: https://github.com/apache/pulsar/pull/22234
> > >> Discuss thread:
> > >> https://lists.apache.org/thread/5owc9os6wmy52zxbv07qo2jrfjm17hd2
> > >>
> > >> Thanks,
> > >> Tao Jiuming
> > >>
> > >
> >
>


Re: [VOTE] PIP-344: Correct the behavior of the public API pulsarClient.getPartitionsForTopic(topicName)

2024-03-25 Thread PengHui Li
Hi, Yubiao

It's better to list the names of the 3 binding voters.

Thanks,
Penghui

On Mon, Mar 25, 2024 at 4:58 PM Yubiao Feng
 wrote:

> Closing the vote with 3 binding votes.
>
> Thanks
> Yubiao Feng
>
> On Sat, Mar 16, 2024 at 6:28 AM Yubiao Feng 
> wrote:
>
> > Hi All
> >
> > This thread is to start a vote for PIP-344.
> >
> > PIP: https://github.com/apache/pulsar/pull/22182
> > Discussion thread:
> > https://lists.apache.org/thread/z693blcxoqk0mj0rzyt1k7nvy72j18t5
> >
> > Thanks
> > Yubiao Feng
> >
>


Re: [VOTE] PIP-344: Correct the behavior of the public API pulsarClient.getPartitionsForTopic(topicName)

2024-03-25 Thread Yubiao Feng
Closing the vote with 3 binding votes.

Thanks
Yubiao Feng

On Sat, Mar 16, 2024 at 6:28 AM Yubiao Feng 
wrote:

> Hi All
>
> This thread is to start a vote for PIP-344.
>
> PIP: https://github.com/apache/pulsar/pull/22182
> Discussion thread:
> https://lists.apache.org/thread/z693blcxoqk0mj0rzyt1k7nvy72j18t5
>
> Thanks
> Yubiao Feng
>


Re: [RESULT] [VOTE] PIP-342: Support OpenTelemetry metrics in Pulsar client

2024-03-25 Thread PengHui Li
Sorry, I forgot to submit my PR review before. Just some minor comments
about the names. Please take a look.

Regards,
Penghui

On Fri, Mar 22, 2024 at 11:21 PM Matteo Merli 
wrote:

> Closing this vote with 4 binding and 4 non-binding +1s
>
> Binding +1s:
>  * Lari
>  * Mattison
>  * PengHui
>  * Matteo
>
> Non-Binding +1s:
>  * Dao Jun
>  * Apurva
>  * Asaf
>  * Zixuan
>
>
> Thanks,
> Matteo
>
>
> --
> Matteo Merli
> 
>
>
> On Thu, Mar 14, 2024 at 11:54 PM Zixuan Liu  wrote:
>
> > +1 (non-binding)
> >
> > Thanks,
> > Zixuan
> >
> > On Fri, Mar 15, 2024 at 09:47, PengHui Li wrote:
> >
> > > +1 (binding)
> > >
> > > Regards,
> > > Penghui
> > >
> > > On Fri, Mar 15, 2024 at 2:32 AM Asaf Mesika wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > On Thu, Mar 14, 2024 at 8:29 PM Apurva Telang <apurvatelan...@gmail.com> wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > On Thu, Mar 14, 2024 at 2:12 AM mattison chao <mattisonc...@gmail.com> wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Best,
> > > > > > Mattison
> > > > > > On Mar 14, 2024 at 15:55 +0800, Lari Hotari wrote:
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > -Lari
> > > > > > >
> > > > > > > On Thu, 14 Mar 2024 at 03:45, Matteo Merli <matteo.me...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > PIP: https://github.com/apache/pulsar/pull/22178
> > > > > > > >
> > > > > > > > WIP PR: https://github.com/apache/pulsar/pull/22179
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Matteo Merli
> > > > > > > > 
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Apurva Telang.
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Pulsar Client Python Release 3.5.0 Candidate 2

2024-03-25 Thread PengHui Li
+1 (binding)

- Checked the signature
- Installed the wheel on macOS with Python 3.12
- Ran the consumer and producer examples

Regards,
Penghui

On Fri, Mar 22, 2024 at 11:55 PM Yunze Xu  wrote:

> This is the 2nd release candidate for Apache Pulsar Client Python,
> version 3.5.0.
>
> It fixes the following issues:
> https://github.com/apache/pulsar-client-python/milestone/6?closed=1
>
> *** Please download, test and vote on this release. This vote will
> stay open for at least 72 hours ***
>
> Python wheels:
>
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-python-3.5.0-candidate-2/
>
> The supported Python versions are 3.8, 3.9, 3.10, 3.11, and 3.12. The
> supported platforms and architectures are:
> - Windows x86_64 (windows/)
> - glibc-based Linux x86_64 (linux-glibc-x86_64/)
> - glibc-based Linux arm64 (linux-glibc-arm64/)
> - musl-based Linux x86_64 (linux-musl-x86_64/)
> - musl-based Linux arm64 (linux-musl-arm64/)
> - macOS universal 2 (macos/)
>
> Download the wheel (the `.whl` file) that matches your OS and Python
> version, then install it:
> - Windows: `py -m pip install *.whl --force-reinstall`
> - Linux or macOS: `python3 -m pip install *.whl --force-reinstall`
>
> The tag to be voted upon: v3.5.0-candidate-2
> (730c2d7dea60ff632688463662a6101cacb98c22)
>
> https://github.com/apache/pulsar-client-python/releases/tag/v3.5.0-candidate-2
>
> Pulsar's KEYS file containing the PGP keys used to sign the release:
> https://downloads.apache.org/pulsar/KEYS
>
> Please download the Python wheels and follow the README to test.
>


Re: Suggestions on GitHub labels and issue templates

2024-03-25 Thread PengHui Li
Label updates:

- Removed the `java` label. We only have a few legacy PRs labeled with
`java`.
- Changed `component/*` to `area/*`
- Deprecated the `question` label
- Changed `PIP` to `type/PIP`
- Changed `flaky-test` to `type/flaky-test`

On Mon, Mar 25, 2024 at 3:17 PM PengHui Li  wrote:

> Yes, the PR is welcome.
>
> Best,
> Penghui
>
> On Mon, Mar 25, 2024 at 3:08 PM Kiryl Valkovich 
> wrote:
>
>> Hi PengHui,
>> Sure. If the PR is welcome here, I’ll submit it in a few days.
>>
>>
>> Best,
>> Kiryl
>>
>> > On Mar 25, 2024, at 6:07 AM, PengHui Li  wrote:
>> >
>> > Hi Kiryl,
>> >
>> > Thanks for your suggestions; they look good to me.
>> > I'll follow your suggestions on renaming or deprecating the labels.
>> >
>> > For the label automation, do you want to push a PR to add it?
>> >
>> > Regards,
>> > Penghui
>> >
>> >
>> > On Mon, Mar 18, 2024 at 9:39 PM Kiryl Valkovich 
>> > wrote:
>> >
>> >> Comment with better formatting on GitHub:
>> >> https://github.com/apache/pulsar/issues/22277#issuecomment-2002553745
>> >>
>> >> • Deprecate the java label. Pulsar is written in Java and most PRs update
>> >> Java code.
>> >> • Instead of removing labels, deprecate them by renaming them with the
>> >> deprecated/ prefix. Probably pick another prefix that sorts closer to the
>> >> end of the alphabet to reduce noise.
>> >> • Add the go label automatically using labeler:
>> >> https://github.com/apache/pulsar/blob/master/.github/labeler.yml
>> >> go:
>> >> - changed-files:
>> >>   - any-glob-to-any-file: '**/*.go'
>> >>
>> >> • Add component/* labels automatically based on the file path
>> >> component/config:
>> >> - changed-files:
>> >>   - any-glob-to-any-file: 'conf/**/*'
>> >>   - any-glob-to-any-file: 'pulsar-config-validation/**/*'
>> >> component/client:
>> >> - changed-files:
>> >>   - any-glob-to-any-file: 'pulsar-client/**/*'
>> >>   - any-glob-to-any-file: 'pulsar-client-*/**/*'
>> >> ...
>> >>
>> >> • Rename the bug label to type/bug for consistency. Keep the red color.
>> >> • (?) Rename component/* => area/* for shorter names, as
>> >> https://github.com/kubernetes/kubernetes/labels does.
>> >> • Rename the doc-required label to type/doc. Relabel open issues and PRs
>> >> that have doc labels with type/doc.
>> >> • Deprecate all other doc-* labels. If they are needed for some kind of
>> >> workflow, simply use a project board with ToDo -> In Progress -> Done
>> >> states.
>> >> • (?) It probably makes sense to enable and track website and docs issues
>> >> in the apache/pulsar-site repository, and add a clearly visible link in
>> >> the apache/pulsar README.md.
>> >> • Deprecate the question label. Instead, move such issues to Discussions
>> >> -> Q&A.
>> >> • Migrate issues labeled enhancement either to the type/feature label or
>> >> to Discussions. Add a new "Suggest an idea" issue template that redirects
>> >> to Discussions -> Ideas.
>> >> • (?) Rename PIP => type/PIP for consistency.
>> >> • Rename flaky-test => type/flaky-test for consistency.
>> >> • Deprecate the lifecycle/stale label. Use Stale instead, and rename
>> >> Stale => stale for consistency.
>> >> • Add the ability to pick an area/* label from a dropdown on issue
>> >> creation. systemd/systemd and a few other projects use this action for
>> >> that:
>> >> https://github.com/redhat-plumbers-in-action/advanced-issue-labeler?tab=readme-ov-file#real-life-examples
>> >>
>> >>
>> >> Best,
>> >> Kiryl
>> >>
>> >>
>>
>>


Re: Suggestions on GitHub labels and issue templates

2024-03-25 Thread PengHui Li
Yes, the PR is welcome.

Best,
Penghui

On Mon, Mar 25, 2024 at 3:08 PM Kiryl Valkovich 
wrote:

> Hi PengHui,
> Sure. If the PR is welcome here, I’ll submit it in a few days.
>
>
> Best,
> Kiryl
>
> > On Mar 25, 2024, at 6:07 AM, PengHui Li  wrote:
> >
> > Hi Kiryl,
> >
> > Thanks for your suggestions; they look good to me.
> > I'll follow your suggestions on renaming or deprecating the labels.
> >
> > For the label automation, do you want to push a PR to add it?
> >
> > Regards,
> > Penghui
> >
> >
> > On Mon, Mar 18, 2024 at 9:39 PM Kiryl Valkovich 
> > wrote:
> >
> >> Comment with better formatting on GitHub:
> >> https://github.com/apache/pulsar/issues/22277#issuecomment-2002553745
> >>
> >> • Deprecate the java label. Pulsar is written in Java and most PRs update
> >> Java code.
> >> • Instead of removing labels, deprecate them by renaming them with the
> >> deprecated/ prefix. Probably pick another prefix that sorts closer to the
> >> end of the alphabet to reduce noise.
> >> • Add the go label automatically using labeler:
> >> https://github.com/apache/pulsar/blob/master/.github/labeler.yml
> >> go:
> >> - changed-files:
> >>   - any-glob-to-any-file: '**/*.go'
> >>
> >> • Add component/* labels automatically based on the file path
> >> component/config:
> >> - changed-files:
> >>   - any-glob-to-any-file: 'conf/**/*'
> >>   - any-glob-to-any-file: 'pulsar-config-validation/**/*'
> >> component/client:
> >> - changed-files:
> >>   - any-glob-to-any-file: 'pulsar-client/**/*'
> >>   - any-glob-to-any-file: 'pulsar-client-*/**/*'
> >> ...
> >>
> >> • Rename the bug label to type/bug for consistency. Keep the red color.
> >> • (?) Rename component/* => area/* for shorter names, as
> >> https://github.com/kubernetes/kubernetes/labels does.
> >> • Rename the doc-required label to type/doc. Relabel open issues and PRs
> >> that have doc labels with type/doc.
> >> • Deprecate all other doc-* labels. If they are needed for some kind of
> >> workflow, simply use a project board with ToDo -> In Progress -> Done
> >> states.
> >> • (?) It probably makes sense to enable and track website and docs issues
> >> in the apache/pulsar-site repository, and add a clearly visible link in
> >> the apache/pulsar README.md.
> >> • Deprecate the question label. Instead, move such issues to Discussions
> >> -> Q&A.
> >> • Migrate issues labeled enhancement either to the type/feature label or
> >> to Discussions. Add a new "Suggest an idea" issue template that redirects
> >> to Discussions -> Ideas.
> >> • (?) Rename PIP => type/PIP for consistency.
> >> • Rename flaky-test => type/flaky-test for consistency.
> >> • Deprecate the lifecycle/stale label. Use Stale instead, and rename
> >> Stale => stale for consistency.
> >> • Add the ability to pick an area/* label from a dropdown on issue
> >> creation. systemd/systemd and a few other projects use this action for
> >> that:
> >> https://github.com/redhat-plumbers-in-action/advanced-issue-labeler?tab=readme-ov-file#real-life-examples
> >>
> >>
> >> Best,
> >> Kiryl
> >>
> >>
>
>


Re: [VOTE] PIP-345: Optimize finding message by timestamp

2024-03-25 Thread PengHui Li
Hi, Jiuming

I'm sorry for not getting back to you sooner.

First, I support the motivation to optimize this case because it could be a
significant
blocker for users who want infinite data retention, which is a BIG
differentiator
with Apache Kafka. And, I really saw the cases with high publish
throughput, and one
ledger could even hold 1M entries, 100M new entries published to a topic.

Then, I checked the details of the existing implementation. I think the
tricky part is that publish time is not a concept of the ManagedLedger. The
changes you proposed would add publish time to the ManagedLedger module,
which doesn't look good to me, because it would couple a Pulsar concept with
the ManagedLedger concept.

Essentially, the publish time could be a secondary index of the
ManagedLedger. My suggestion is to have a general ManagedLedgerIndex
abstraction, so that the Pulsar broker can create any index it wants. Since
the broker creates the index, the broker can control the index's behavior.
The ManagedLedger can then provide an API to search for an entry with a
ManagedLedgerIndex. With this option, we don't need to add the publish time
concept to the ManagedLedger directly.

In this case, when the broker searches for an entry with a predicate and an
index, the managed ledger searches the index first. Of course, if the
relevant entry cannot be found in the index, it just falls back to the
"optimized full scan".
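
A minimal Java sketch of this split, under stated assumptions: ManagedLedgerIndex is the name suggested above, but its shape here (a single lookup method returning a start position) and the names Position, Entry, PublishTimeIndex, search, and demo are all hypothetical illustrations, not existing Pulsar APIs. The broker owns the publish-time index; the ledger-side search only consults an opaque index and falls back to a scan when the index has no answer. It also assumes publish times are non-decreasing within the ledger, and a plain linear scan stands in for the "optimized full scan".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Predicate;

public class IndexedSearchSketch {

    // Hypothetical stand-in for a managed ledger position (ledgerId:entryId).
    record Position(long ledgerId, long entryId) {}

    // Hypothetical entry. Only the broker-side index and predicate read
    // publishTime; search() below treats entries opaquely.
    record Entry(Position position, long publishTime) {}

    // Hypothetical generic secondary-index abstraction. The broker supplies
    // the implementation, so the ledger layer never learns what a key means.
    interface ManagedLedgerIndex<K> {
        // A position to start scanning from, or empty if the index cannot help.
        Optional<Position> lookup(K key);
    }

    // Hypothetical broker-side index that samples (publishTime -> position).
    static final class PublishTimeIndex implements ManagedLedgerIndex<Long> {
        private final TreeMap<Long, Position> samples = new TreeMap<>();

        void add(long publishTime, Position position) {
            samples.putIfAbsent(publishTime, position);
        }

        @Override
        public Optional<Position> lookup(Long timestamp) {
            // Latest sample at or before the target; only a valid starting
            // point if publish times are non-decreasing (assumption).
            var floor = samples.floorEntry(timestamp);
            return floor == null ? Optional.empty() : Optional.of(floor.getValue());
        }
    }

    // Hypothetical ledger-side search: consult the index first and fall back
    // to scanning from the first entry when the index has nothing. A single
    // ledger is assumed, so entryId doubles as the list offset.
    static <K> Optional<Entry> search(List<Entry> entries, ManagedLedgerIndex<K> index,
                                      K key, Predicate<Entry> predicate) {
        int start = index.lookup(key).map(p -> (int) p.entryId()).orElse(0);
        for (int i = start; i < entries.size(); i++) {
            if (predicate.test(entries.get(i))) {
                return Optional.of(entries.get(i));
            }
        }
        return Optional.empty();
    }

    // Builds 1,000 entries published 10 ms apart, indexes every 100th one,
    // and returns the entryId of the first entry with publishTime >= target
    // (-1 if there is none).
    static long demo(long target) {
        List<Entry> entries = new ArrayList<>();
        PublishTimeIndex index = new PublishTimeIndex();
        for (int i = 0; i < 1_000; i++) {
            Entry entry = new Entry(new Position(1, i), 10_000L + i * 10L);
            entries.add(entry);
            if (i % 100 == 0) {
                index.add(entry.publishTime(), entry.position());
            }
        }
        return search(entries, index, target, e -> e.publishTime() >= target)
                .map(e -> e.position().entryId())
                .orElse(-1L);
    }

    public static void main(String[] args) {
        // The index points at sampled entry 500, so only two entries are checked.
        System.out.println(demo(15_005L)); // prints 501
    }
}
```

The point of the design choice is visible in search(): it never mentions publish time, so the same ledger API could serve any other broker-defined index.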

Regards,
Penghui


On Mon, Mar 25, 2024 at 11:51 AM 太上玄元道君  wrote:

> bump
>
> On Wed, Mar 20, 2024 at 4:23 PM 太上玄元道君 wrote:
>
> > bump
> >
> > On Tue, Mar 19, 2024 at 7:35 PM 太上玄元道君 wrote:
> >
> >> Hi Pulsar community,
> >>
> >> This thread is to start a vote for PIP-345: Optimize finding message by
> >> timestamp
> >>
> >> PIP: https://github.com/apache/pulsar/pull/22234
> >> Discuss thread:
> >> https://lists.apache.org/thread/5owc9os6wmy52zxbv07qo2jrfjm17hd2
> >>
> >> Thanks,
> >> Tao Jiuming
> >>
> >
>


Re: Suggestions on GitHub labels and issue templates

2024-03-25 Thread Kiryl Valkovich
Hi PengHui,
Sure. If the PR is welcome here, I’ll submit it in a few days.


Best,
Kiryl

> On Mar 25, 2024, at 6:07 AM, PengHui Li  wrote:
> 
> Hi Kiryl,
> 
> Thanks for your suggestions; they look good to me.
> I'll follow your suggestions on renaming or deprecating the labels.
> 
> For the label automation, do you want to push a PR to add it?
> 
> Regards,
> Penghui
> 
> 
> On Mon, Mar 18, 2024 at 9:39 PM Kiryl Valkovich 
> wrote:
> 
>> Comment with better formatting on GitHub:
>> https://github.com/apache/pulsar/issues/22277#issuecomment-2002553745
>> 
>>• Deprecate java label. Pulsar is written in Java and most PRs update
>> Java code.
>>• Instead of removing labels, deprecate them by renaming them to
>> deprecated/. Probably pick another prefix that is
>> alphabetically closer to the end of the alphabet to reduce noise.
>>• Add go label automatically using labeler:
>> https://github.com/apache/pulsar/blob/master/.github/labeler.yml
>> go:
>> - changed-files:
>>   - any-glob-to-any-file: '**/*.go'
>> 
>>• Add component/* labels automatically based on the file path
>> component/config:
>> - changed-files:
>>   - any-glob-to-any-file: 'conf/**/*'
>>   - any-glob-to-any-file: 'pulsar-config-validation/**/*'
>> component/client:
>> - changed-files:
>>   - any-glob-to-any-file: 'pulsar-client/**/*'
>>   - any-glob-to-any-file: 'pulsar-client-*/**/*'
>> ...
>> 
>>• Rename bug label to type/bug for consistency. Keep the red color.
>>• (?) Rename component/* => area/* for shorter names. The
>> https://github.com/kubernetes/kubernetes/labels has such naming.
>>• Rename doc-required label to type/doc. Relabel open issues and PRs
>> that have doc labels with type/doc.
>>• Deprecate all other doc-* labels. If they are needed for some kind of
>> workflow, use a project board with ToDo -> In Progress -> Done
>> states.
>>• (?) Probably it makes sense to enable and track website and docs
>> issues in apache/pulsar-site repository. And add a good visible link to
>> apache/pulsar README.md.
>>• Deprecate the question label. Instead, move such issues to
>> Discussions -> Q&A.
>>• Migrate issues with the enhancement label either to the type/feature
>> label or to Discussions. Add a new "Suggest an idea" issue template that
>> redirects to Discussions -> Ideas.
>>• (?) Rename PIP => type/PIP for consistency
>>• Rename flaky-test => type/flaky-test for consistency
>>• Deprecate lifecycle/stale label. Use Stale instead. Rename Stale =>
>> stale for consistency.
>>• Add the ability to pick an area/* label from the dropdown on issue
>> creation.
>> systemd/systemd and a few other projects use this action for that:
>> https://github.com/redhat-plumbers-in-action/advanced-issue-labeler?tab=readme-ov-file#real-life-examples
>> 
>> 
>> Best,
>> Kiryl
>> 
>>