Re: [DISCUSS] Broken builds and CI Failures in Maintenance Branches; improving maintenance strategy to address root causes

2024-03-18 Thread Lari Hotari
Thanks for the comments, Yunze.

On 2024/03/18 05:48:39 Yunze Xu wrote:
> I'm afraid many people don't have patience to read all the contents.

I agree. However, in async work, people should have more patience to read and 
write. Synchronous meetings aren't a good solution either. The lack of patience 
could be caused by lack of interest. There's not a large group of people in our 
community that are interested in improving the maintenance strategy and also 
committed to invest their time and effort in these activities. I hope more 
people sign up to this type of efforts and show their interest and commitment 
in improving Apache Pulsar.

> Here is my summary in short (please correct me if I'm wrong):
> - For bug fixes, the target branch should be branch-3.0. Once the PR
> is merged into branch-3.0, checkout the branch-3.x and run `git merge
> branch-3.0` and resolve the conflicts

I didn't describe the details of how this is handle. It is different in 
practice.

> - For features, the target branch should be branch-3.x

New features would continue to go to master (or "main" if we decide to rename 
it). Bugs would be fixed in the branch where the feature containing the bug was 
introduced if it is missing from the LTS branch.

> Since we introduced the LTS concept, I agree that we should make
> branch-3.0 as the default branch. Cherry-picking is a disaster when
> cherry-picks happen in the wrong order.

Yes. 

-Lari

On 2024/03/18 05:48:39 Yunze Xu wrote:
> I'm afraid many people don't have patience to read all the contents.
> Here is my summary in short (please correct me if I'm wrong):
> - For bug fixes, the target branch should be branch-3.0. Once the PR
> is merged into branch-3.0, checkout the branch-3.x and run `git merge
> branch-3.0` and resolve the conflicts
> - For features, the target branch should be branch-3.x
> 
> Since we introduced the LTS concept, I agree that we should make
> branch-3.0 as the default branch. Cherry-picking is a disaster when
> cherry-picks happen in the wrong order.
> 
> 
> Thanks,
> Yunze
> 
> On Tue, Mar 5, 2024 at 8:38 PM Lari Hotari  wrote:
> >
> > To enhance our maintenance processes, I've created a guide for
> > configuring "git mergetool" to resolve merge conflicts:
> >
> > https://pulsar.apache.org/contribute/setup-mergetool/
> >
> > For Apache Pulsar core developers, managing git merge conflict
> > resolution is a necessary task. To streamline this process, it's crucial
> > to set up tools that aid in visualizing and resolving these conflicts.
> >
> > I encourage you to follow the guide to set up a git mergetool. Your
> > feedback is valuable, and you're welcome to contribute improvements
> > directly to the website. You can do this by creating a PR by editing
> > https://github.com/apache/pulsar-site/edit/main/contribute/setup-mergetool.md
> > directly in your browser.
> >
> > -Lari
> >
> > On 2024/03/01 14:01:55 Lari Hotari wrote:
> > > Dear Pulsar Community,
> > >
> > > As we prepare for new releases in our maintenance branches, we have once
> > > again encountered issues with our cherry-picking process. Some of our
> > > maintenance branches are currently broken or were recently broken,
> > > containing compilation errors or failing tests. Many have encountered
> > > these issues, as we have seen new PRs come in to address the
> > > problems. The compilation problems are already being addressed by
> > > Heesung (release manager for 3.0.3) and myself. We aim to resolve these
> > > issues as soon as possible. Please join #dev channel on Apache Pulsar
> > > Slack to collaborate in real time to help with this and get updates.
> > >
> > > The cherry-picking process has always been problematic and lacks clear
> > > documentation in Apache Pulsar. This often leads to our maintenance
> > > branches breaking, especially as we approach release dates and begin
> > > cherry-picking fixes. This recurring issue has been the subject of
> > > multiple discussions over the years. The "feature freeze" in the release
> > > process does not mitigate the key problem with the cherry-picking
> > > approach.
> > >
> > > Furthermore, the cherry-picking process is mostly based on tribal
> > > knowledge and lacks clear documentation. I have previously expressed my
> > > concerns about this on the mailing list in this thread:
> > > https://lists.apache.org/thread/69mwjso51kzkrv5xgdmw04d9wngbg8br
> > >
> > > Many problems with cherry-picking arise because cherry-picks occur in
> > > the wrong order, or dependent changes are not picked. Some dependent
> > > changes shouldn't be picked since when we have made bug fixes in the
> > > master branch, it can already contain changes for new features that
> > > shouldn't be applied to maintenance branches. In those cases
> > > a backport of the fix is needed and the original developer of the
> > > PR might not be available to do this and there could be a significant
> > > delay for the release if delivering the backport takes time.
> > >
> > > 

Re: [DISCUSS] Consistent code style (esp. ws/indent) and autotools

2024-03-18 Thread Lari Hotari
Making a change to formatting is a broad change that cause more harm than 
benefit to the Pulsar project currently. 

We can reconsider after the maintenance strategy has been revisited.
The discussion about the maintenance strategy is in a separate thread [1].
Even with a different type of maintenance strategy, formatting changes could be 
just an unnecessary distraction. I'd rather put the effort on real refactoring 
improvements instead of spending a few months on whitespace changes.

-Lari

1 - https://lists.apache.org/thread/j6qvt45rndnvjypcmqxsfmddqt41bxjv

On 2024/03/18 09:27:57 Zixuan Liu wrote:
> Top
> 
> tison  于2023年6月30日周五 09:58写道:
> 
> > Hi,
> >
> > I'd like to propose applying a consistent code style (especially whitespace
> > and indent) with an autotool Spotless.
> >
> > // Background
> >
> > Over and over we argue contributors reformat their patch manually for
> > checkstyle violations, or even whitespace changes that are not detected by
> > checkstyle. [1]
> >
> > A common reason is that such style-only changes increase the burden to do
> > cherry-pick if a later bug fix is made around the code while we often do
> > not pick the style change barely or even it's coupled with an unpickable
> > commit.
> >
> > CheckStyle helps in some cases while we don't have a strong rule set nor
> > even apply it to tests (porting bug fix actually include the test).
> >
> > Manually fixing style violations can be boring and even annoying. That's
> > why it's effectively "suppressed" in mind.
> >
> > An autotool to do the formatting work can help and here comes Spotless[2].
> > Flink and Curator use it and Flink actually migrated from CheckStyle to
> > Spotless for its more strict consistent rule set and oneliner format. See
> > their original discussion for the context[3].
> >
> > // Proprosal
> >
> > Using Spotless as the auto-formatting tool in all around apache/pulsar.
> > Since it's an auto tool I can cover the task solely.
> >
> > // Concerns
> >
> > 1. Won't it be yet another noise to the codebase?
> >
> > Yes and no. I suggest we pick this change to all maintained versions so
> > that all of them align with the consistent style. Then from now on, no more
> > style conflict.
> >
> > Flink used the same strategy[4] and even we can do it starting from 2.10 as
> > Lari proposed to apply commits from the least recent maintained version. I
> > understand the maintained versions are: 2.10, 2.11, 3.0, and master.
> >
> > 2. Can it cover all the rule sets we use in CheckStyle now?
> >
> > Let's say we almost don't care what the style is but it's consistent and
> > "not awful".
> >
> > The default Google style takes line length = 80 which can be a
> > disappointment so in Curator I use the palantir style[5] which takes line
> > length = 120 which should keep consistent with current settings.
> >
> > A downside is that Spotless Maven plugin cannot detect "forbidden imports"
> > where we can still use CheckStyle[6] - this is a low-traffic path and it
> > should be fine.
> >
> > // Implementation
> >
> > 1. Introduce Spotless in the project and apply it around the codebase, for
> > all maintained versions.
> > 2. Remove the then redundant CheckStyle rules, while retaining the
> > forbidden imports part as it's still useful.
> >
> > Looking forward to your feedback!
> >
> > Best,
> > tison.
> >
> > [1]
> >
> > https://github.com/apache/pulsar/pull/20642/files#diff-cb9b09b67f54fccdd5155dbbcedd50970ee93d9ee85ade1e6b6984cab64dab5d
> > [2] https://github.com/diffplug/spotless
> > [3] https://lists.apache.org/thread/3kjkcz9gj6f8j477d1t3gnbkl61hsb7z
> > [4] https://lists.apache.org/thread/nxdm0pg84q913w0kxszm502myqcg3db0
> > [5] https://github.com/palantir/palantir-java-format
> > [6] https://issues.apache.org/jira/browse/FLINK-32154
> >
> 


Suggestions on GitHub labels and issue templates

2024-03-18 Thread Kiryl Valkovich
Comment with better formatting on GitHub: 
https://github.com/apache/pulsar/issues/22277#issuecomment-2002553745

• Deprecate java label. Pulsar is written in Java and most PRs update Java 
code.
• Instead of removing labels, deprecate them by renaming them to 
deprecated/. Probably pick another prefix that is alphabetically 
closer to the end of the alphabet to reduce noise.
• Add go label automatically using labeler: 
https://github.com/apache/pulsar/blob/master/.github/labeler.yml
go:
- changed-files:
- any-glob-to-any-file: '**/*.go'

• Add component/* labels automatically based on the file path
component/config:
- changed-files:
- any-glob-to-any-file: 'conf/**/*'
- any-glob-to-any-file: 'pulsar-config-validation/**/*'
component/client:
- changed-files:
- any-glob-to-any-file: 'pulsar-client/**/*'
- any-glob-to-any-file: 'pulsar-client-*/**/*'
...

• Rename bug label to type/bug for consistency. Keep the red color.
• (?) Rename component/* => area/* for shorter names. The 
https://github.com/kubernetes/kubernetes/labels has such naming.
• Rename doc-required label to type/doc. Relabel open issues and PRs with 
doc labels to the type/doc.
• Deprecate all other doc-* labels. If it is needed for some kind of 
workflow, simply use the board project with ToDo -> In Progress -> Done states.
• (?) Probably it makes sense to enable and track website and docs issues 
in apache/pulsar-site repository. And add a good visible link to apache/pulsar 
README.md.
• Deprecate the question label. Instead, move such issues to Discussions -> 
Q
• Migrate issues with the enhancement either to type/feature label or 
Discussions. Add a new Suggest an idea issue template that redirects to the 
Discussions -> Ideas
• (?) Rename PIP => type/PIP for consistency
• Rename flaky-test => type/flaky-test to consistency
• Deprecate lifecycle/stale label. Use Stale instead. Rename Stale => stale 
for consistency.
• Add the ability to pick an area/* label from the dropdown on issue 
creation.
systemd/systemd and a few other projects use this action for that: 
https://github.com/redhat-plumbers-in-action/advanced-issue-labeler?tab=readme-ov-file#real-life-examples


Best,
Kiryl



[DISCUSS] Improving Pulsar broker cache design

2024-03-18 Thread Lari Hotari
Hi all,

I'd like to start a discussion about improving Pulsar broker cache design.

In the Pulsar broker, there are two main scenarios for message
dispatching to consumers: tailing (hot) reads and catch-up (cold)
reads. In both scenarios consumers can be fast or slow which also
impacts the scenarios.
The Pulsar broker contains a cache for handling tailing reads. This
cache was extended to handle catch-up reads with PR #12258 (+other
follow-up PRs) in Pulsar 2.11.
This cache is referred to as the "broker cache" or more specifically,
the "managed ledger cache".

The issue "Slow backlog draining with high fanout topics"
https://github.com/apache/pulsar/issues/12257 describes a scenario why
the caching was extended to handle catch-up reads (PR #12258, in
Pulsar 2.11):

"When there are topics with a high publish rate and high fan-out, with
a large number of subscriptions or a large number of replicators, and
a backlog is built, it becomes really difficult to drain the backlog
for such topics while they are serving a high publish rate. This
problem can be reproduced with multiple topics (100), each with 10K
writes, 6 backlog replicators, and multiple subscriptions. It becomes
impossible to drain backlogs even if most of the cursors are draining
a similar backlog because a large number of cold-reads from the bookie
makes the overall backlog draining slower. Even the bookkeeper cache
is not able to keep up by caching entries and eventually has to do
cold-reads." (edited)

The problem described above could also occur in other cases. Under
heavy load, a high fan-out catch-up read could increase the load of
the system and it could cause a cascading failure which results in
partial outages and backlogs that could take hours to recover from. In
many cases, this could be avoided with a more optimal broker cache
implementation.

Optimizations in the broker cache are crucial for improving Pulsar
performance. Unlike Kafka, the Pulsar broker doesn't have access to
local files and cannot leverage the Linux page cache for efficient
caching.
The cache implementation in the Pulsar broker should be intelligent to
avoid unnecessary load on the system. However, there have been very
few improvements and focus on improving the Pulsar broker cache.

For most production configurations, understanding the tuning of the
broker cache is necessary. However, we don't have much documentation
in the Apache Pulsar project. I noticed this in 2022 when I was
preparing my talk "Performance tuning for Apache Pulsar Kubernetes
deployments" for ApacheCon 2022 (slides [1], talk [2]). The situation
hasn't improved since then in the project documentation. There are
better performance-related tutorials than my talk available, for
example, "Apache Pulsar Troubleshooting Performance Issues" by Hang
Chen [3] and "Apache Pulsar Troubleshooting backlog Issues" by Penghui
Li [4]. It would be great if this type of content were summarized in
the Apache Pulsar project documentation. It can be hard to find the
relevant information when it is needed.

The main intention of this email is not about improving documentation.
It is to highlight that the current implementation could be
significantly improved.
The broker cache should be designed in such a way that it minimizes
cache misses and maximizes cache hits within the available broker
cache memory.
I believe this can be achieved by developing an algorithm that
incorporates an optimization model for making optimal caching
decisions.

In addition to a better caching model and algorithm, there are
opportunities to leverage rate limiting to improve caching for both
tailing reads and catch-up reads.
For instance, instead of rate limiting individual consumers, consumers
could be dynamically grouped in a way where the speed at which the
group moves forward could be rate limited so that cache hits are
maximized and consumer side speed difference wouldn't cause the
consumers to fall out of the cached range. When consumers are catching
up to the tail, it doesn't always make sense to impose a rate limit
for an individual consumer if the entries to read are already cached,
as rate limiting would only increase the cache size and increase the
chance that the consumer falls out of the cached range.

Would anyone else be interested in contributing to the design and
improvement of the Pulsar broker cache?
It would also be useful to hear about real user experiences with
current problems in high fan-out scenarios in Pulsar.

- Lari

1 - 
https://www.apachecon.com/acna2022/slides/03_Hotari_Lari_Performance_tuning_Pulsar.pdf
2 - https://www.youtube.com/watch?v=WkdfILAx-4c
3 - https://www.youtube.com/watch?v=8_4bVctj2_E
4 - https://www.youtube.com/watch?v=17jQIOVeu4s


Re: [DISCUSS] Consistent code style (esp. ws/indent) and autotools

2024-03-18 Thread Zixuan Liu
Top

tison  于2023年6月30日周五 09:58写道:

> Hi,
>
> I'd like to propose applying a consistent code style (especially whitespace
> and indent) with an autotool Spotless.
>
> // Background
>
> Over and over we argue contributors reformat their patch manually for
> checkstyle violations, or even whitespace changes that are not detected by
> checkstyle. [1]
>
> A common reason is that such style-only changes increase the burden to do
> cherry-pick if a later bug fix is made around the code while we often do
> not pick the style change barely or even it's coupled with an unpickable
> commit.
>
> CheckStyle helps in some cases while we don't have a strong rule set nor
> even apply it to tests (porting bug fix actually include the test).
>
> Manually fixing style violations can be boring and even annoying. That's
> why it's effectively "suppressed" in mind.
>
> An autotool to do the formatting work can help and here comes Spotless[2].
> Flink and Curator use it and Flink actually migrated from CheckStyle to
> Spotless for its more strict consistent rule set and oneliner format. See
> their original discussion for the context[3].
>
> // Proprosal
>
> Using Spotless as the auto-formatting tool in all around apache/pulsar.
> Since it's an auto tool I can cover the task solely.
>
> // Concerns
>
> 1. Won't it be yet another noise to the codebase?
>
> Yes and no. I suggest we pick this change to all maintained versions so
> that all of them align with the consistent style. Then from now on, no more
> style conflict.
>
> Flink used the same strategy[4] and even we can do it starting from 2.10 as
> Lari proposed to apply commits from the least recent maintained version. I
> understand the maintained versions are: 2.10, 2.11, 3.0, and master.
>
> 2. Can it cover all the rule sets we use in CheckStyle now?
>
> Let's say we almost don't care what the style is but it's consistent and
> "not awful".
>
> The default Google style takes line length = 80 which can be a
> disappointment so in Curator I use the palantir style[5] which takes line
> length = 120 which should keep consistent with current settings.
>
> A downside is that Spotless Maven plugin cannot detect "forbidden imports"
> where we can still use CheckStyle[6] - this is a low-traffic path and it
> should be fine.
>
> // Implementation
>
> 1. Introduce Spotless in the project and apply it around the codebase, for
> all maintained versions.
> 2. Remove the then redundant CheckStyle rules, while retaining the
> forbidden imports part as it's still useful.
>
> Looking forward to your feedback!
>
> Best,
> tison.
>
> [1]
>
> https://github.com/apache/pulsar/pull/20642/files#diff-cb9b09b67f54fccdd5155dbbcedd50970ee93d9ee85ade1e6b6984cab64dab5d
> [2] https://github.com/diffplug/spotless
> [3] https://lists.apache.org/thread/3kjkcz9gj6f8j477d1t3gnbkl61hsb7z
> [4] https://lists.apache.org/thread/nxdm0pg84q913w0kxszm502myqcg3db0
> [5] https://github.com/palantir/palantir-java-format
> [6] https://issues.apache.org/jira/browse/FLINK-32154
>