Re: [DISCUSS] Don't retain null-key messages during topic compaction

2023-11-07 Thread Enrico Olivelli
This is a good point.

Is it worth making it configurable per topic and per namespace?

Even if this behavior is kind of a side effect or a bug, some
applications may rely on it.
Flags that change the behavior for the whole cluster are hard to adopt in big
clusters with many tenants.

We should always keep multi-tenancy in mind when we design features or
changes.


Thanks
Enrico

On Wed, Nov 8, 2023 at 07:55 Cong Zhao wrote:

> Hello everyone,
>
> I opened PIP-318: https://github.com/apache/pulsar/pull/21541 for this
> discussion.
>
> Any feedback and suggestions are welcome.
>
> Thanks,
> Cong Zhao
>
> On 2023/11/07 02:55:52 Cong Zhao wrote:
> > Hi, Pulsar community
> >
> > Currently, we retain all null-key messages during topic compaction,
> which I
> > don't think is necessary because when you use topic compaction,
> > it means that you want to retain the value according to the key, so
> > retaining null-key messages is meaningless.
> >
> > Additionally, retaining all null-key messages will double the storage
> cost,
> > and we'll never be able to clean them up since the compacted topic has
> not
> > supported the retention policy yet.
> >
> > In summary, I don't think we should retain null-key messages during topic
> > compaction.
> > Looking forward to your feedback!
> >
> > Thanks,
> > Cong Zhao
> >
>


Re: [DISCUSS] Don't retain null-key messages during topic compaction

2023-11-07 Thread Cong Zhao
Hello everyone,

I opened PIP-318: https://github.com/apache/pulsar/pull/21541 for this
discussion.

Any feedback and suggestions are welcome.

Thanks,
Cong Zhao

On 2023/11/07 02:55:52 Cong Zhao wrote:
> Hi, Pulsar community
> 
> Currently, we retain all null-key messages during topic compaction, which I
> don't think is necessary because when you use topic compaction,
> it means that you want to retain the value according to the key, so
> retaining null-key messages is meaningless.
> 
> Additionally, retaining all null-key messages will double the storage cost,
> and we'll never be able to clean them up since the compacted topic has not
> supported the retention policy yet.
> 
> In summary, I don't think we should retain null-key messages during topic
> compaction.
> Looking forward to your feedback!
> 
> Thanks,
> Cong Zhao
> 
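To make the proposal above concrete, here is a minimal Java sketch of key-based compaction over an in-memory log. It is not Pulsar's compactor: `Msg`, `compact`, and the `retainNullKey` flag are hypothetical names used only for illustration. The flag stands in for the idea of making the behavior configurable; how such a setting would be exposed per topic or namespace is not something this sketch assumes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CompactionSketch {

    /** Hypothetical message type: a nullable key plus a value. */
    static final class Msg {
        final String key;   // null means the message was published without a key
        final String value;

        Msg(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    /**
     * Keeps only the latest message per key, preserving log order.
     * Null-key messages are kept when retainNullKey is true (the behavior
     * described in the thread) and dropped when it is false (the proposal).
     */
    static List<Msg> compact(List<Msg> log, boolean retainNullKey) {
        // First pass: remember the position of the last message for each key.
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < log.size(); i++) {
            String key = log.get(i).key;
            if (key != null) {
                lastIndex.put(key, i);
            }
        }
        // Second pass: emit a keyed message only if it is the last one for its key.
        List<Msg> out = new ArrayList<>();
        for (int i = 0; i < log.size(); i++) {
            Msg m = log.get(i);
            boolean keep = (m.key == null) ? retainNullKey : lastIndex.get(m.key) == i;
            if (keep) {
                out.add(m);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Msg> log = Arrays.asList(
                new Msg("a", "1"), new Msg(null, "x"),
                new Msg("a", "2"), new Msg(null, "y"), new Msg("b", "3"));
        System.out.println(compact(log, true));   // null-key messages survive
        System.out.println(compact(log, false));  // null-key messages are dropped
    }
}
```

With `retainNullKey` set to true, every null-key message survives each compaction run, which is the storage concern raised in the proposal.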


Re: [VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2

2023-11-07 Thread Yubiao Feng
+1 (non-binding)

Thanks
Yubiao Feng

On Tue, Nov 7, 2023 at 3:03 PM Yunze Xu  wrote:

> This is the second release candidate for Apache Pulsar Client C++,
> version 3.4.0.
>
> It fixes the following issues:
> https://github.com/apache/pulsar-client-cpp/milestone/5?closed=1
>
> *** Please download, test and vote on this release. This vote will stay
> open
> for at least 72 hours ***
>
> Note that we are voting upon the source (tag), binaries are provided for
> convenience.
>
> Source and binary files:
>
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-cpp/pulsar-client-cpp-3.4.0-candidate-2/
>
> SHA-512 checksums:
>
>
> 10517590a2e4296d6767a044e58dd32c79e404a5136cf41126f4cb2416ed0ef8fb1ad7aa7da54c37c12ed71b05a527ed08bfac9d50d3550fa5d475c1e8c00950
>  apache-pulsar-client-cpp-3.4.0.tar.gz
>
>
> The tag to be voted upon:
> v3.4.0-candidate-2 (f337eff7caae93730ec1260810655cbb5a345e70)
> https://github.com/apache/pulsar-client-cpp/releases/tag/v3.4.0-candidate-2
>
> Pulsar's KEYS file containing PGP keys you use to sign the release:
> https://downloads.apache.org/pulsar/KEYS
>
> Please download the source package, and follow the README to compile and
> test.
>
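For voters who want to check the SHA-512 checksum programmatically, the following is a minimal Java sketch using only the JDK's `MessageDigest`. The class name `Sha512Check` and the command-line arguments are assumptions for illustration; the release instructions themselves do not prescribe a tool.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha512Check {

    /** Returns the SHA-512 digest of the given bytes as lowercase hex. */
    static String sha512Hex(byte[] data) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded tarball, args[1]: published checksum
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        String actual = sha512Hex(content);
        System.out.println(actual.equalsIgnoreCase(args[1]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

Reading the whole file into memory keeps the sketch short; for very large artifacts a streaming read with `MessageDigest.update` would be the more careful choice.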


Re: [DISCUSS] PIP-317: Add `bookkeeperDeleted` field to show whether a ledger is deleted from the Bookie while using tiered storage

2023-11-07 Thread Dezhi Liu
Hi 刘燊,
I think this feature is necessary.

On 2023/11/06 11:13:53 刘燊 wrote:
> Hi community,
> 
> The motivation behind this PIP is to provide administrators and users with 
> better insights into the state of the ledgers and the overall storage usage. 
> By including the `bookkeeperDeleted` field in the `ledgers`, we can make it 
> easier for users to understand the current state of their ledgers, which can 
> be helpful for monitoring and troubleshooting purposes.
> 
> 
> Looking forward to the discussion.
> PIP: https://github.com/apache/pulsar/pull/21521
> Related PR/issue: https://github.com/apache/pulsar/pull/20833
> 
> 
> 
> 
> --
> 
> Best Regards,
> Shen Liu


Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-07 Thread Rajan Dhabalia
Hi Lari/Girish,

I am sorry for jumping into the discussion late, but I would like to
acknowledge the need for a pluggable publish rate limiter; I had
also asked for it during the implementation of the publish rate limiter. There
are trade-offs between different rate-limiter implementations based on
accuracy, network usage, and simplicity, and users should be able to choose one
based on their requirements. However, we don't have a correct and extensible
publish rate limiter interface right now, and before making it pluggable we
have to make sure that it supports any type of implementation, for
example: token-based or sliding-window-based throttling, support for various
decaying functions (e.g. exponential decay:
https://en.wikipedia.org/wiki/Exponential_decay), etc. I haven't seen such
interface details and design in the PIP:
https://github.com/apache/pulsar/pull/21399/. So, I would encourage working
towards building a pluggable rate limiter, but the current PIP is not ready as it
doesn't cover generic interfaces that can support different types of
implementation.

Thanks,
Rajan
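As an illustration of what an implementation-agnostic interface could look like, here is a small Java sketch with a sliding-window implementation behind it. Everything in it is hypothetical: `PluggablePublishRateLimiter`, `tryAcquire`, and `SlidingWindowLimiter` are names invented for this example and are not taken from the PIP or from the Pulsar code base. Passing the current time in explicitly is a choice made only to keep the sketch deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RateLimiterSketch {

    /** Hypothetical plug-in contract; not an existing Pulsar interface. */
    interface PluggablePublishRateLimiter {
        /** Returns true if a publish of the given size is allowed at nowMillis. */
        boolean tryAcquire(long bytes, long nowMillis);
    }

    /** Sliding-window log: at most maxBytes within the last windowMillis. */
    static final class SlidingWindowLimiter implements PluggablePublishRateLimiter {
        private final long maxBytes;
        private final long windowMillis;
        private final Deque<long[]> events = new ArrayDeque<>(); // {timestamp, bytes}
        private long bytesInWindow;

        SlidingWindowLimiter(long maxBytes, long windowMillis) {
            this.maxBytes = maxBytes;
            this.windowMillis = windowMillis;
        }

        @Override
        public synchronized boolean tryAcquire(long bytes, long nowMillis) {
            // Evict entries that have slid out of the window.
            while (!events.isEmpty() && events.peekFirst()[0] <= nowMillis - windowMillis) {
                bytesInWindow -= events.pollFirst()[1];
            }
            if (bytesInWindow + bytes > maxBytes) {
                return false; // caller decides whether to throttle or reject
            }
            events.addLast(new long[] {nowMillis, bytes});
            bytesInWindow += bytes;
            return true;
        }
    }

    public static void main(String[] args) {
        PluggablePublishRateLimiter limiter = new SlidingWindowLimiter(100, 1000);
        System.out.println(limiter.tryAcquire(60, 0));    // fits in the window
        System.out.println(limiter.tryAcquire(60, 500));  // would exceed 100 bytes
    }
}
```

A token-based limiter could sit behind the same single-method contract, which is the point of keeping the interface free of algorithm-specific parameters.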

On Tue, Nov 7, 2023 at 10:02 AM Lari Hotari  wrote:

> Hi Girish,
>
> Replies inline.
>
> On Tue, 7 Nov 2023 at 15:26, Girish Sharma 
> wrote:
> >
> > Hello Lari, replies inline.
> >
> > I will also be going through some textbook rate limiters (the one you
> shared, plus others) and propose the one that at least suits our needs in
> the next reply.
>
>
> sounds good. I've been also trying to find more rate limiter resources
> that could be useful for our design.
>
> Bucket4J documentation gives some good ideas and it shows how the
> token bucket algorithm could be varied. For example, the "Refill
> styles" section [1] is useful to read as an inspiration.
> In network routers, there's a concept of "dual token bucket"
> algorithms and by googling you can find both Cisco and Juniper
> documentation referencing this.
> I also asked ChatGPT-4 to explain "dual token bucket" algorithm [2].
>
> 1 - https://bucket4j.com/8.6.0/toc.html#refill-types
> 2 - https://chat.openai.com/share/d4f4f740-f675-4233-964e-2910a7c8ed24
>
> >>
> >> It is bi-weekly on Thursdays. The meeting calendar, zoom link and
> >> meeting notes can be found at
> >> https://github.com/apache/pulsar/wiki/Community-Meetings .
> >>
> >
> > Would it make sense for me to join this time given that you are skipping
> it?
>
> Yes, it's worth joining regularly when one is participating in Pulsar
> core development. There's usually a chance to discuss all topics that
> Pulsar community members bring up to discussion. A few times there
> haven't been any participants and in that case, it's good to ask on
> the #dev channel on Pulsar Slack whether others are joining the
> meeting.
>
> >>
> >> ok. btw. "metrics" doesn't necessarily mean providing the rate limiter
> >> metrics via Prometheus. There might be other ways to provide this
> >> information for components that could react to this.
> >> For example, it could be a system topic where these rate limiters emit
> events.
> >>
> >
> > Are there any other system topics than
> `tenant/namespace/__change_events`? While it's an improvement over
> querying metrics, it would still mean one consumer per namespace and would
> form a cyclic dependency - for example, in case a broker is degrading due
> to mis-use of bursting, it might lead to delays in the consumption of the
> event from the __change_events topic.
>
> The number of events is a fairly low volume so the possible
> degradation wouldn't necessarily impact the event communications for
> this purpose. I'm not sure if there are currently any public event
> topics in Pulsar that are supported in a way that an application could
> directly read from the topic.  However since this is an advanced case,
> I would assume that we don't need to focus on this in the first phase
> of improving rate limiters.
>
> >
> >
> >> I agree. I just brought up this example to ensure that your
> >> expectation about bursting isn't about controlling the rate limits
> >> based on situational information, such as end-to-end latency
> >> information.
> >> Such a feature could be useful, but it does complicate things.
> >> However, I think it's good to keep this on the radar since this might
> >> be needed to solve some advanced use cases.
> >>
> >
> > I still envision auto-scaling to be admin API driven rather than produce
> throughput driven. That way, it remains deterministic in nature. But it
> probably doesn't make sense to even talk about it until (partition)
> scale-down is possible.
>
>
> I agree. It's better to exclude this from the discussion at the moment
> so that it doesn't cause confusion. Modifying the rate limits up and
> down automatically with some component could be considered out of
> scope in the first phase. However, that might be controversial since
> the externally observable behavior of rate limiting with bursting
> support seems to behave in a way where the rate limit changes
> 

Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-07 Thread Lari Hotari
Hi Girish,

Replies inline.

On Tue, 7 Nov 2023 at 15:26, Girish Sharma  wrote:
>
> Hello Lari, replies inline.
>
> I will also be going through some textbook rate limiters (the one you shared, 
> plus others) and propose the one that at least suits our needs in the next 
> reply.


sounds good. I've been also trying to find more rate limiter resources
that could be useful for our design.

Bucket4J documentation gives some good ideas and it shows how the
token bucket algorithm could be varied. For example, the "Refill
styles" section [1] is useful to read as an inspiration.
In network routers, there's a concept of "dual token bucket"
algorithms and by googling you can find both Cisco and Juniper
documentation referencing this.
I also asked ChatGPT-4 to explain "dual token bucket" algorithm [2].

1 - https://bucket4j.com/8.6.0/toc.html#refill-types
2 - https://chat.openai.com/share/d4f4f740-f675-4233-964e-2910a7c8ed24
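For readers unfamiliar with the token bucket algorithm mentioned above, here is a minimal Java sketch of a single bucket with continuous refill. It does not use Bucket4J and is not a proposed Pulsar design; `TokenBucketSketch`, `tryConsume`, and the explicit `nowMs` parameter are assumptions made so that the example is self-contained and deterministic.

```java
final class TokenBucketSketch {
    private final long capacity;       // maximum burst size, in tokens
    private final double tokensPerMs;  // sustained refill rate
    private double tokens;
    private long lastRefillMs;

    TokenBucketSketch(long capacity, double tokensPerSecond, long nowMs) {
        this.capacity = capacity;
        this.tokensPerMs = tokensPerSecond / 1000.0;
        this.tokens = capacity;        // start with a full bucket
        this.lastRefillMs = nowMs;
    }

    /** Tries to take n tokens at time nowMs; returns false if not enough are available. */
    synchronized boolean tryConsume(long n, long nowMs) {
        // Add the tokens earned since the last call, capped at the bucket capacity.
        tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * tokensPerMs);
        lastRefillMs = nowMs;
        if (tokens < n) {
            return false;
        }
        tokens -= n;
        return true;
    }

    public static void main(String[] args) {
        TokenBucketSketch bucket = new TokenBucketSketch(10, 5.0, 0);
        System.out.println(bucket.tryConsume(10, 0));   // full bucket allows a burst of 10
        System.out.println(bucket.tryConsume(1, 0));    // bucket is now empty
        System.out.println(bucket.tryConsume(1, 400));  // about 2 tokens refilled after 400 ms
    }
}
```

The capacity bounds how large a burst can be, while the refill rate bounds the long-term average, which is why variations of this algorithm are a natural starting point for bursting support.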

>>
>> It is bi-weekly on Thursdays. The meeting calendar, zoom link and
>> meeting notes can be found at
>> https://github.com/apache/pulsar/wiki/Community-Meetings .
>>
>
> Would it make sense for me to join this time given that you are skipping it?

Yes, it's worth joining regularly when one is participating in Pulsar
core development. There's usually a chance to discuss all topics that
Pulsar community members bring up to discussion. A few times there
haven't been any participants and in that case, it's good to ask on
the #dev channel on Pulsar Slack whether others are joining the
meeting.

>>
>> ok. btw. "metrics" doesn't necessarily mean providing the rate limiter
>> metrics via Prometheus. There might be other ways to provide this
>> information for components that could react to this.
>> For example, it could be a system topic where these rate limiters emit
>> events.
>>
>
> Are there any other system topics than `tenant/namespace/__change_events`?
> While it's an improvement over querying metrics, it would still mean one 
> consumer per namespace and would form a cyclic dependency - for example, in 
> case a broker is degrading due to mis-use of bursting, it might lead to 
> delays in the consumption of the event from the __change_events topic.

The number of events is a fairly low volume so the possible
degradation wouldn't necessarily impact the event communications for
this purpose. I'm not sure if there are currently any public event
topics in Pulsar that are supported in a way that an application could
directly read from the topic.  However since this is an advanced case,
I would assume that we don't need to focus on this in the first phase
of improving rate limiters.

>
>
>> I agree. I just brought up this example to ensure that your
>> expectation about bursting isn't about controlling the rate limits
>> based on situational information, such as end-to-end latency
>> information.
>> Such a feature could be useful, but it does complicate things.
>> However, I think it's good to keep this on the radar since this might
>> be needed to solve some advanced use cases.
>>
>
> I still envision auto-scaling to be admin API driven rather than produce 
> throughput driven. That way, it remains deterministic in nature. But it 
> probably doesn't make sense to even talk about it until (partition) 
> scale-down is possible.


I agree. It's better to exclude this from the discussion at the moment
so that it doesn't cause confusion. Modifying the rate limits up and
down automatically with some component could be considered out of
scope in the first phase. However, that might be controversial since
the externally observable behavior of rate limiting with bursting
support seems to behave in a way where the rate limit changes
automatically. The "auto scaling" aspect of rate limiting, modifying
the rate limits up and down, might be necessary as part of a rate
limiter implementation eventually. More of that later.

>
> In all of the 3 cases that I listed, the current behavior, with precise rate 
> limiting enabled, is to pause the netty channel in case the throughput 
> breaches the set limits. This eventually leads to timeout at the client side 
> in case the burst is significantly greater than the configured timeout on the 
> producer side.


Makes sense.

>
> The desired behavior in all three situations is to have a multiplier based 
> bursting capability for a fixed duration. For example, it could be that a 
> pulsar topic would be able to support 1.5x of the set quota for a burst 
> duration of up to 5 minutes. There also needs to be a cooldown period in such 
> a case that it would only accept one such burst every X minutes, say every 1 
> hour.


Thanks for sharing this concrete example. It will help a lot when
starting to design a solution which achieves this. I think that a
token bucket algorithm based design can achieve something very close
to what you are describing. In the first part I shared the references
to Bucket4J's "refill styles" and the concept of "dual token bucket"

Re: [OT] Evaluate Virtual thread [WAS][DISCUSS] Moving to Java 21

2023-11-07 Thread 太上玄元道君
In the past, threads usually blocked in LockSupport.park(…), and AQS
depends on it to park threads.

Virtual threads only solve thread blocking at the JDK layer, by rewriting
LockSupport.park(…).

In some conditions, the carrier thread will be blocked (I only remember the
following points)
1. JNI calling
2. synchronized
3. system calling
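The parking primitive referred to above is `java.util.concurrent.locks.LockSupport.park`. The following minimal Java sketch shows the park/unpark handshake that AQS-based locks build on. It deliberately uses a platform thread so that it also runs on JDKs older than 21; on JDK 21 the thread could be created with `Thread.ofVirtual().unstarted(...)` instead, in which case parking releases the carrier thread rather than blocking it. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

final class ParkSketch {

    /** Parks a worker until it is released; returns true if the worker resumed. */
    static boolean parkUntilReleased() {
        AtomicBoolean released = new AtomicBoolean(false);
        AtomicBoolean resumed = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            // park() may return spuriously, so the condition is re-checked in a loop.
            while (!released.get()) {
                LockSupport.park();
            }
            resumed.set(true);
        });
        worker.start();
        released.set(true);
        // unpark hands the worker a permit; this is safe even if it has not parked yet.
        LockSupport.unpark(worker);
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return resumed.get();
    }

    public static void main(String[] args) {
        System.out.println("worker resumed: " + parkUntilReleased());
    }
}
```

The cases in the list above are different: they block at a level below `LockSupport`, so the carrier thread itself stays occupied.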

On Tue, Nov 7, 2023 at 21:19 tison wrote:

> Hi,
>
> I checked the docs for Virtual Threads[1][2][3]. I came away with two
> major concerns about their real-world benefit for Pulsar's scenario:
>
> 1. All of the virtual threads share the same schedule pool, which means
> that all tasks run on virtual threads competing with each other. It can be
> better to separate different logical concurrent groups into dedicated
> groups, although Goroutines share the same global scheduler also.
>
> 2. The point where the virtual thread "yield" ("unmount" in the documents)
> is not quite clear. It's written to be "usually Blocking IO" but can be
> also Future::get or others. It's not easy to audit the change.
>
> Best,
> tison.
>
> [1] https://openjdk.org/jeps/444
> [2]
>
> https://blogs.oracle.com/javamagazine/post/going-inside-javas-project-loom-and-virtual-threads
> [3] https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html
>
>
> On Sat, Oct 21, 2023 at 13:22 Lari Hotari wrote:
>
> > Thanks for suggesting. That's a good way to prevent regressions. I made
> > the changes to schedule a daily build with JDK 21. Please review
> > https://github.com/apache/pulsar/pull/21410
> >
> > -Lari
> >
> > On 2023/10/20 12:22:44 Christophe Bornet wrote:
> > > Nice.
> > > Would it be possible to have a daily build on JDK 21 to ensure it runs
> > > properly ?
> > >
> > > On Fri, Oct 20, 2023 at 00:34 Lari Hotari wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I can now confirm that apache/pulsar master branch compiles and runs
> > all tests in Pulsar CI successfully with Java 21. Therefore, we have
> > already accomplished the first level of Java 21 support.
> > > >
> > > > Example of Pulsar CI build with Java 21:
> > > > https://github.com/lhotari/pulsar/actions/runs/6577911040
> > > >
> > > > This experiment was run with PR #21400 changes which adds an option
> in
> > manually triggered GitHub Actions based Pulsar CI builds with Java 21
> > selected as the runtime for the build and test runtime and also as the
> Java
> > runtime for docker images/containers used in integration & system tests
> > which are part of the Pulsar CI build.
> > > >
> > > > Please review the PR https://github.com/apache/pulsar/pull/21400,
> > let's get it merged.
> > > > By default, Java 17 will be used, so it should be ok to merge this to
> > master branch without any separate decisions such as PIPs.
> > > >
> > > > -Lari
> > > >
> > > > On 2023/10/19 12:23:03 Lari Hotari wrote:
> > > > > I have created https://github.com/apache/pulsar/pull/21400 which
> > parameterizes the JDK version used for the Pulsar CI GitHub Actions
> > workflow. When triggering the workflow
> > > > > manually, it's possible to choose between JDK 17 and JDK 21 from a
> > dropdown menu.
> > > > > The PR contains more details, please review. Once we have this
> > merged, it will be easy to experiment with Java 21 when needed.
> > > > >
> > > > > -Lari
> > > > >
> > > > > On 2023/10/19 03:06:39 tison wrote:
> > > > > > > I think Java 21 can open the door for Virtual Threads[1].
> > > > > >
> > > > > > Yep. This should be a good motivation for using JDK 21.
> > > > > >
> > > > > > We may start a survey in the community a few months later for JDK
> > 21
> > > > > > feedback (as we /will/ switch the runtime to JDK 21 in Docker)
> and
> > try to
> > > > > > switch the toolkit.
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > On Thu, Oct 19, 2023 at 10:56 Zixuan Liu wrote:
> > > > > >
> > > > > > > +1 for compatibility with Java 21.
> > > > > > >
> > > > > > > Next step: Migrating the Pulsar Server runtime to Java 21 from
> > Java 17
> > > > > > > in the Docker image and CI. Pulsar Client/Admin continues to
> use
> > Java
> > > > > > > 8.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Zixuan
> > > > > > >
> > > > > > > On Wed, Oct 18, 2023 at 06:02 Lari Hotari wrote:
> > > > > > > >
> > > > > > > > Dear Pulsar community,
> > > > > > > >
> > > > > > > > Java 21 was released on September 19th and has now become the
> > current
> > > > > > > Java LTS release.
> > > > > > > >
> > > > > > > > I've begun preparations in the Pulsar code base to allow for
> > Java 21 to
> > > > > > > be used as the development runtime for compiling the code and
> > running tests
> > > > > > > in the master branch. This is a proactive measure to gear up
> for
> > Java 21
> > > > > > > without committing to the switch just yet. It will help us
> > understand the
> > > > > > > necessary changes when we are able to compile the code and run
> > all tests
> > > > > > > with Java 21.
> > > > > > > >
> > > > > > > > For instance, I initiated the process with the following PRs:
> > > 

Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-07 Thread Asaf Mesika
I just want to add one thing to the mix here.

You can see from the number of plugin interfaces Pulsar has that somebody "left
the door open" for too long.
You can agree with me that the number of those interfaces is not normal for
any open source software. I know HBase, for example, or Kafka, and I have never
seen so many in them.

You can also see the lack of attention to code quality and high level
overview by the poor implementation of current rate limiter.

The feeling is: I just need this tiny little thing and I don't have time -
so over time Pulsar got into this unmaintainable mess of public APIs and
some parts are simply unreadable - such as the rate limiters. I *still*
don't understand how rate limiting works in Pulsar, even when I read the
background  and browsed quickly through the code.

I can see the people on this thread are highly talented - let's use this to
make Pulsar better, both from a bird's-eye view and your own
personal requirement.


On Tue, Nov 7, 2023 at 3:26 PM Girish Sharma 
wrote:

> Hello Lari, replies inline.
>
> I will also be going through some textbook rate limiters (the one you
> shared, plus others) and propose the one that at least suits our needs in
> the next reply.
>
> On Tue, Nov 7, 2023 at 2:49 PM Lari Hotari  wrote:
>
>>
>> It is bi-weekly on Thursdays. The meeting calendar, zoom link and
>> meeting notes can be found at
>> https://github.com/apache/pulsar/wiki/Community-Meetings .
>>
>>
> Would it make sense for me to join this time given that you are skipping
> it?
>
>
>>
>> ok. btw. "metrics" doesn't necessarily mean providing the rate limiter
>> metrics via Prometheus. There might be other ways to provide this
>> information for components that could react to this.
>> For example, it could be a system topic where these rate limiters emit
>> events.
>>
>>
> Are there any other system topics than `tenant/namespace/__change_events`?
> While it's an improvement over querying metrics, it would still mean one
> consumer per namespace and would form a cyclic dependency - for example, in
> case a broker is degrading due to mis-use of bursting, it might lead to
> delays in the consumption of the event from the __change_events topic.
>
> I agree. I just brought up this example to ensure that your
>> expectation about bursting isn't about controlling the rate limits
>> based on situational information, such as end-to-end latency
>> information.
>> Such a feature could be useful, but it does complicate things.
>> However, I think it's good to keep this on the radar since this might
>> be needed to solve some advanced use cases.
>>
>>
> I still envision auto-scaling to be admin API driven rather than produce
> throughput driven. That way, it remains deterministic in nature. But it
> probably doesn't make sense to even talk about it until (partition)
> scale-down is possible.
>
>
>>
>> >- A producer(s) is producing at a near constant rate into a topic,
>> with
>> >equal distribution among partitions. Due to a hiccup in their
>> downstream
>> >component, the produce rate goes to 0 for a few seconds, and thus, to
>> >compensate, in the next few seconds, the produce rate tries to
>> double up.
>>
>> Could you also elaborate on details such as what is the current
>> behavior of Pulsar rate limiting / throttling solution and what would
>> be the desired behavior?
>> Just guessing that you mean that the desired behavior would be to
>> allow the produce rate to double up for some time (configurable)?
>> Compared to what rate is it doubled?
>> Please explain in detail what the current and desired behaviors would
>> be so that it's easier to understand the gap.
>>
>
> In all of the 3 cases that I listed, the current behavior, with precise
> rate limiting enabled, is to pause the netty channel in case the throughput
> breaches the set limits. This eventually leads to timeout at the client
> side in case the burst is significantly greater than the configured timeout
> on the producer side.
>
> The desired behavior in all three situations is to have a multiplier based
> bursting capability for a fixed duration. For example, it could be that a
> pulsar topic would be able to support 1.5x of the set quota for a burst
> duration of up to 5 minutes. There also needs to be a cooldown period in
> such a case that it would only accept one such burst every X minutes, say
> every 1 hour.
>
>
>>
>> >- In a visitor based produce rate (where produce rate goes up in the
>> day
>> >and goes down in the night, think in terms of popular website hourly
>> view
>> >counts pattern) , there are cases when, due to certain
>> external/internal
>> >triggers, the views - and thus - the produce rate spikes for a few
>> minutes.
>>
>> Again, please explain the current behavior and desired behavior.
>> Explicit example values of number of messages, bandwidth, etc. would
>> also be helpful details.
>>
>
> Adding to what I wrote above, think of this pattern like the following:
> the produce rate 

Re: [DISCUSS] Replace stale bot with ping-pong workflow

2023-11-07 Thread Asaf Mesika
Tison let's start as you suggested by disabling it


On Tue, May 16, 2023 at 5:13 AM Yunze Xu  wrote:

> +1 to me
>
> Thanks,
> Yunze
>
> On Sun, May 14, 2023 at 9:28 PM Dave Fisher  wrote:
> >
> > Hi -
> >
> > I have not looked at all your links but I think this is a great idea.
> This will help everyone pay attention better.
> >
> > Best,
> > Dave
> >
> > Sent from my iPhone
> >
> > > On May 14, 2023, at 12:33 AM, tison  wrote:
> > >
> > > Of course, changing the workflow cannot magically increase the
> bandwidth to
> > > handle stale issues. That is what the triage guide wants to encourage
> > > committers to practice. But such a move can reduce the frustrating
> > > experience and explicitly express who is responsible for taking the
> next
> > > action to nudge the conversation.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > On Sun, May 14, 2023 at 15:28 tison wrote:
> > >
> > >> Hi devs,
> > >>
> > >> Recently, I have handled a large number of stale issues and noticed
> that
> > >> periodically notifying users that "the issue is stale" without any
> human
> > >> reaction can be a frustrating experience, e.g., ISSUE-13925[1].
> > >>
> > >> Learning from the INFRA JIRA project experience, I propose we replace
> the
> > >> stale bot with a ping-pong workflow. That is -
> > >>
> > >> ping - Labeling waiting-for-reviewer on issue created and commented by
> > >> non-committers
> > >> pong - Labeling waiting-for-user on issue responded by committers
> > >>
> > >> Here is a demo implementation[2] you can refer to and you can try the
> > >> workflow in my fork[3].
> > >>
> > >> Previous references -
> > >>
> > >> * The triage guide[4]
> > >> * [DISCUSS] Does stale bot make value for you?[5]
> > >> * [COMMITTER ATTENTION] You can close stale issues as not planned [6]
> > >>
> > >> Looking forward to your feedback :D
> > >>
> > >> Best,
> > >> tison.
> > >>
> > >> [1] https://github.com/apache/pulsar/issues/13925
> > >> [2] https://github.com/apache/pulsar/pull/20319
> > >> [3] https://github.com/tisonkun/pulsar
> > >> [4] https://pulsar.apache.org/contribute/develop-triage
> > >> [5] https://lists.apache.org/thread/tv774jqohdpx8x0dymsskrd90xwwfvgp
> > >> [6] https://lists.apache.org/thread/x2c7xod8y0wvh14nsb6bknf0dq3r9gls
> > >>
> > >>
> >
>
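The ping-pong rule quoted above reduces to one decision per issue event. Here is a minimal Java sketch of that decision; it is not the demo implementation referenced in the thread, and the class name, method name, and the way committers are looked up are all assumptions. Only the two label names come from the proposal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class PingPongLabels {
    static final String WAITING_FOR_REVIEWER = "waiting-for-reviewer";
    static final String WAITING_FOR_USER = "waiting-for-user";

    /**
     * Chooses the label after an issue is created or commented on:
     * a non-committer's activity pings the reviewers, a committer's
     * response hands the next action back to the user.
     */
    static String labelFor(String author, Set<String> committers) {
        return committers.contains(author) ? WAITING_FOR_USER : WAITING_FOR_REVIEWER;
    }

    public static void main(String[] args) {
        // Placeholder account names, used only for the demonstration.
        Set<String> committers = new HashSet<>(Arrays.asList("committer-1", "committer-2"));
        System.out.println(labelFor("some-user", committers));    // waiting-for-reviewer
        System.out.println(labelFor("committer-1", committers));  // waiting-for-user
    }
}
```

Because exactly one of the two labels applies after every event, it is always explicit who is expected to act next.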


Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-07 Thread Girish Sharma
Hello Lari, replies inline.

I will also be going through some textbook rate limiters (the one you
shared, plus others) and propose the one that at least suits our needs in
the next reply.

On Tue, Nov 7, 2023 at 2:49 PM Lari Hotari  wrote:

>
> It is bi-weekly on Thursdays. The meeting calendar, zoom link and
> meeting notes can be found at
> https://github.com/apache/pulsar/wiki/Community-Meetings .
>
>
Would it make sense for me to join this time given that you are skipping it?


>
> ok. btw. "metrics" doesn't necessarily mean providing the rate limiter
> metrics via Prometheus. There might be other ways to provide this
> information for components that could react to this.
> For example, it could be a system topic where these rate limiters emit
> events.
>
>
Are there any other system topics than `tenant/namespace/__change_events`?
While it's an improvement over querying metrics, it would still mean one
consumer per namespace and would form a cyclic dependency - for example, in
case a broker is degrading due to mis-use of bursting, it might lead to
delays in the consumption of the event from the __change_events topic.

I agree. I just brought up this example to ensure that your
> expectation about bursting isn't about controlling the rate limits
> based on situational information, such as end-to-end latency
> information.
> Such a feature could be useful, but it does complicate things.
> However, I think it's good to keep this on the radar since this might
> be needed to solve some advanced use cases.
>
>
I still envision auto-scaling to be admin API driven rather than produce
throughput driven. That way, it remains deterministic in nature. But it
probably doesn't make sense to even talk about it until (partition)
scale-down is possible.


>
> >- A producer(s) is producing at a near constant rate into a topic,
> with
> >equal distribution among partitions. Due to a hiccup in their
> downstream
> >component, the produce rate goes to 0 for a few seconds, and thus, to
> >compensate, in the next few seconds, the produce rate tries to double
> up.
>
> Could you also elaborate on details such as what is the current
> behavior of Pulsar rate limiting / throttling solution and what would
> be the desired behavior?
> Just guessing that you mean that the desired behavior would be to
> allow the produce rate to double up for some time (configurable)?
> Compared to what rate is it doubled?
> Please explain in detail what the current and desired behaviors would
> be so that it's easier to understand the gap.
>

In all of the 3 cases that I listed, the current behavior, with precise
rate limiting enabled, is to pause the netty channel in case the throughput
breaches the set limits. This eventually leads to timeout at the client
side in case the burst is significantly greater than the configured timeout
on the producer side.

The desired behavior in all three situations is to have a multiplier based
bursting capability for a fixed duration. For example, it could be that a
pulsar topic would be able to support 1.5x of the set quota for a burst
duration of up to 5 minutes. There also needs to be a cooldown period in
such a case that it would only accept one such burst every X minutes, say
every 1 hour.
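The desired behavior described above (a multiplier, a maximum burst duration, and a cooldown) can be sketched as a small state machine in Java. This is only an illustration under stated assumptions: `BurstQuotaSketch` and `limitAt` are hypothetical names, the cooldown is assumed to be measured from the start of the previous burst, and the time is passed in explicitly to keep the example deterministic.

```java
final class BurstQuotaSketch {
    private final double baseRate;       // configured quota, e.g. MB/s
    private final double multiplier;     // e.g. 1.5 for a 1.5x burst
    private final long burstDurationMs;  // e.g. 5 minutes
    private final long cooldownMs;       // e.g. 1 hour between burst starts
    private boolean everBurst;
    private long burstStartMs;

    BurstQuotaSketch(double baseRate, double multiplier, long burstDurationMs, long cooldownMs) {
        this.baseRate = baseRate;
        this.multiplier = multiplier;
        this.burstDurationMs = burstDurationMs;
        this.cooldownMs = cooldownMs;
    }

    /** Returns the rate limit that applies at nowMs for a producer asking for requestedRate. */
    synchronized double limitAt(long nowMs, double requestedRate) {
        if (requestedRate <= baseRate) {
            return baseRate; // no burst needed
        }
        boolean inBurst = everBurst && nowMs - burstStartMs < burstDurationMs;
        boolean cooledDown = !everBurst || nowMs - burstStartMs >= cooldownMs;
        if (!inBurst && cooledDown) {
            // Start a new burst window.
            everBurst = true;
            burstStartMs = nowMs;
            inBurst = true;
        }
        return inBurst ? baseRate * multiplier : baseRate;
    }

    public static void main(String[] args) {
        BurstQuotaSketch quota = new BurstQuotaSketch(10.0, 1.5, 300_000L, 3_600_000L);
        System.out.println(quota.limitAt(0L, 12.0));          // burst starts
        System.out.println(quota.limitAt(300_000L, 12.0));    // burst window is over
        System.out.println(quota.limitAt(3_600_000L, 12.0));  // cooldown elapsed, new burst
    }
}
```

A real limiter would also have to enforce the returned limit, for instance with a token bucket whose refill rate is switched between the base and the burst value.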


>
> >- In a visitor based produce rate (where produce rate goes up in the
> day
> >and goes down in the night, think in terms of popular website hourly
> view
> >counts pattern) , there are cases when, due to certain
> external/internal
> >triggers, the views - and thus - the produce rate spikes for a few
> minutes.
>
> Again, please explain the current behavior and desired behavior.
> Explicit example values of number of messages, bandwidth, etc. would
> also be helpful details.
>

Adding to what I wrote above, think of this pattern like the following: the
produce rate slowly increases from ~2MBps at around 4 AM to a known peak of
about 30MBps by 4 PM and stays around that peak until 9 PM after which it
again starts decreasing until it reaches ~2MBps around 2 AM.
Now, due to some external triggers, maybe a scheduled sale event, at 10PM,
the quota may spike up to 40MBps for 4-5 minutes and then again go back
down to the usual ~20MBps . Here is a rough image showcasing the trend.
[image: image.png]


>
> >It is also important to keep this in check so as to not allow bots to
> do
> >DDOS into your system, while that might be a responsibility of an
> upstream
> >system like API gateway, but we cannot be ignorant about that
> completely.
>
> what would be the desired behavior?
>

The desired behavior is that the burst support should be short lived (5-10
minutes) and limited to a fixed number of bursts in a duration (say - 1
burst per hour). Obviously, all of these should be configurable, maybe at a
broker level and not a topic level.


>
> >- In streaming systems, where there are micro batches, there might be
> >constant fluctuations in produce rate from time to time, 

Pulsar Flaky test report 2023-10-27 to 2023-11-06 for PR builds in CI

2023-11-07 Thread Lari Hotari
Dear Pulsar community,

Here's a report of the flaky tests in Pulsar CI during the observation
period of 2023-10-27 to 2023-11-06.

The Pulsar CI is in fairly good shape at the moment. We have been able
to reduce flakiness, and it is not currently slowing down PR
processing significantly.

The flaky test reporting has highlighted these tests as the most flaky ones:

https://github.com/apache/pulsar/issues/21287
PersistentDispatcherFailoverConsumerTest.testAddRemoveConsumer
11 failures

https://github.com/apache/pulsar/issues/13953
PulsarDebeziumOracleSourceTest.testDebeziumOracleDbSource
11 failures

https://github.com/apache/pulsar/issues/21469
fix PR: https://github.com/apache/pulsar/pull/21479
ExtensibleLoadManagerImplTest.testCheckOwnershipAsync
6 failures

https://github.com/apache/pulsar/issues/16786
PulsarFunctionsJavaProcessTest.testJavaExclamationFunction
6 failures

https://github.com/apache/pulsar/issues/21292
BrokerServiceLookupTest.testLookupConnectionNotCloseIfGetUnloadingExOrMetadataEx
3 failures

Putting focus on fixing the most flaky tests will be helpful.

More details in this Google sheet:
https://docs.google.com/spreadsheets/d/1gtu-XrLumjBFPk9kDKcJOQfxsvIE2EiuZO7IB7ab6q0/edit

Detailed reports and flaky test reporting source:
https://github.com/lhotari/pulsar-flakes/tree/master/2023-10-27-to-2023-11-06

In addition to the flaky test reporting, there's also thread leak
reporting in the Pulsar CI build.
Thread leaks could be one source of test flakiness and that's why it
is helpful to fix thread leaks in our tests and not introduce new
thread leaks.
You can view the thread leak reports in the unit test jobs of any of
the most recent Pulsar CI builds. For example, the scheduled builds
for the master branch can be found here:
https://github.com/apache/pulsar/actions/workflows/pulsar-ci.yaml?query=branch%3Amaster+event%3Aschedule
Example of a thread leak report:
https://github.com/apache/pulsar/actions/runs/6784235160/job/18440670499#step:16:23
(can be viewed by clicking on the "Report detected thread leaks" in
all unit test jobs)

To coordinate the work of fixing flaky tests,

1) Please search for an existing issue, or search all flaky issues with
"flaky" or the test class name (without package) in the search:
https://github.com/apache/pulsar/issues?q=is%3Aopen+flaky+sort%3Aupdated-desc

2) If there isn't an issue for a particular flaky test failure that you'd
like to fix, please create an issue using the "Flaky test" template at
https://github.com/apache/pulsar/issues/new/choose

3) Please comment on the issue that you are working on it.

Let's continue to reduce the flakiness to make contributing to Pulsar
a better experience!

-Lari


[OT] Evaluate Virtual thread [WAS][DISCUSS] Moving to Java 21

2023-11-07 Thread tison
Hi,

I checked the docs for Virtual Threads [1][2][3]. I have two major
concerns about their real-world benefit for Pulsar's scenario:

1. All of the virtual threads share the same scheduler pool, which means
that all tasks running on virtual threads compete with each other. It might
be better to separate different logical concurrency groups into dedicated
pools, although goroutines also share one global scheduler.

2. The point where a virtual thread yields ("unmounts" in the documents)
is not quite clear. It's described as "usually Blocking IO", but it can
also be Future::get or others. This makes the change hard to audit.

Best,
tison.

[1] https://openjdk.org/jeps/444
[2]
https://blogs.oracle.com/javamagazine/post/going-inside-javas-project-loom-and-virtual-threads
[3] https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html


On Sat, Oct 21, 2023 at 13:22, Lari Hotari wrote:

> Thanks for suggesting. That's a good way to prevent regressions. I made
> the changes to schedule a daily build with JDK 21. Please review
> https://github.com/apache/pulsar/pull/21410
>
> -Lari
>
> On 2023/10/20 12:22:44 Christophe Bornet wrote:
> > Nice.
> > Would it be possible to have a daily build on JDK 21 to ensure it runs
> > properly ?
> >
> > On Fri, Oct 20, 2023 at 00:34, Lari Hotari wrote:
> > >
> > > Hi all,
> > >
> > > I can now confirm that apache/pulsar master branch compiles and runs
> all tests in Pulsar CI successfully with Java 21. Therefore, we have
> already accomplished the first level of Java 21 support.
> > >
> > > Example of Pulsar CI build with Java 21:
> > > https://github.com/lhotari/pulsar/actions/runs/6577911040
> > >
> > > This experiment was run with the changes in PR #21400, which add an
> > > option to select Java 21 in manually triggered GitHub Actions based
> > > Pulsar CI builds. Java 21 is then used as the build and test runtime
> > > and also as the Java runtime for docker images/containers used in
> > > integration & system tests which are part of the Pulsar CI build.
> > >
> > > Please review the PR https://github.com/apache/pulsar/pull/21400,
> let's get it merged.
> > > By default, Java 17 will be used, so it should be ok to merge this to
> master branch without any separate decisions such as PIPs.
> > >
> > > -Lari
> > >
> > > On 2023/10/19 12:23:03 Lari Hotari wrote:
> > > > I have created https://github.com/apache/pulsar/pull/21400 which
> parameterizes the JDK version used for the Pulsar CI GitHub Actions
> workflow. When triggering the workflow
> > > > manually, it's possible to choose between JDK 17 and JDK 21 from a
> dropdown menu.
> > > > The PR contains more details, please review. Once we have this
> merged, it will be easy to experiment with Java 21 when needed.
> > > >
> > > > -Lari
> > > >
> > > > On 2023/10/19 03:06:39 tison wrote:
> > > > > > I think Java 21 can open the door for Virtual Threads[1].
> > > > >
> > > > > Yep. This should be a good motivation for using JDK 21.
> > > > >
> > > > > We may start a survey in the community a few months later for JDK
> 21
> > > > > feedback (as we /will/ switch the runtime to JDK 21 in Docker) and
> try to
> > > > > switch the toolkit.
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > On Thu, Oct 19, 2023 at 10:56, Zixuan Liu wrote:
> > > > >
> > > > > > +1 for compatibility with Java 21.
> > > > > >
> > > > > > Next step: Migrating the Pulsar Server runtime to Java 21 from
> Java 17
> > > > > > in the Docker image and CI. Pulsar Client/Admin continues to use
> Java
> > > > > > 8.
> > > > > >
> > > > > > Thanks,
> > > > > > Zixuan
> > > > > >
> > > > > > On Wed, Oct 18, 2023 at 06:02, Lari Hotari wrote:
> > > > > > >
> > > > > > > Dear Pulsar community,
> > > > > > >
> > > > > > > Java 21 was released on September 19th and has now become the
> current
> > > > > > Java LTS release.
> > > > > > >
> > > > > > > I've begun preparations in the Pulsar code base to allow for
> Java 21 to
> > > > > > be used as the development runtime for compiling the code and
> running tests
> > > > > > in the master branch. This is a proactive measure to gear up for
> Java 21
> > > > > > without committing to the switch just yet. It will help us
> understand the
> > > > > > necessary changes when we are able to compile the code and run
> all tests
> > > > > > with Java 21.
> > > > > > >
> > > > > > > For instance, I initiated the process with the following PRs:
> > > > > > > - Upgrade Mockito to 5.6.0 to support Java 21 [1]
> > > > > > > - Upgrade Gradle Enterprise Maven Extension to support Java 21
> [2]
> > > > > > > After these are merged, it should be possible to start running
> tests
> > > > > > with Java 21 to see what is possibly broken and continue
> iterating.
> > > > > > > Moreover, the upgrade to Lombok 1.18.30 for Java 21 support
> has already
> > > > > > been merged [3].
> > > > > > >
> > > > > > > Java 17 has been the recommended runtime for Pulsar server
> components
> > > > > > since the Pulsar 2.11 release [4]. Meanwhile, the Pulsar client
> 

[ANNOUNCE] Apache Pulsar Go Client 0.11.1 released

2023-11-07 Thread Zike Yang
The Apache Pulsar team is proud to announce Apache Pulsar Go Client
version 0.11.1.

Pulsar is a highly scalable, low latency messaging platform running on
commodity hardware. It provides simple pub-sub semantics over topics,
guaranteed at-least-once delivery of messages, automatic cursor management for
subscribers, and cross-datacenter replication.

For Pulsar release details and downloads, visit:
https://github.com/apache/pulsar-client-go/releases/tag/v0.11.1

Release Notes are at:
https://github.com/apache/pulsar-client-go/blob/master/CHANGELOG.md

We would like to thank the contributors that made the release possible.

Regards,

The Pulsar Team


Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-07 Thread Lari Hotari
Hi Girish,

I think we are starting to get into the concrete details of rate
limiters and how we could start improving the existing feature.
It is very helpful that you are sharing your insight and experience of
operating Pulsar at scale.
Replies inline.

On Mon, 6 Nov 2023 at 15:37, Girish Sharma  wrote:
> Is this every Thursday? I am willing to meet at a separate time as well if
> enough folks with a viewpoint on this can meet together. I assume that the
> community meeting has a much bigger agenda with detailed discussions not
> possible?

It is bi-weekly on Thursdays. The meeting calendar, zoom link and
meeting notes can be found at
https://github.com/apache/pulsar/wiki/Community-Meetings .

> It is all good, as long as the final goal is met within reasonable
> timelines.

+1

> Well, the blacklisting use case is a very specific use case. I am
> explaining below why that can't be done using metrics and a separate
> blacklisting API.

ok. btw. "metrics" doesn't necessarily mean providing the rate limiter
metrics via Prometheus. There might be other ways to provide this
information for components that could react to this.
For example, it could be a system topic where these rate limiters emit events.

> This actually might be a blessing in disguise, at least for RateLimiter and
> PublishRateLimiter.java, being an internal interface, it has gone out of
> hand and unchecked. Explained more below.

I agree. It is hard to reason about the existing solution.


> I would like to keep auto-scaling out of scope for this discussion. That
> opens up another huge can of worms, especially given the gaps in proper
> scale down support in Pulsar.

I agree. I just brought up this example to ensure that your
expectation about bursting isn't about controlling the rate limits
based on situational information, such as end-to-end latency
information.
Such a feature could be useful, but it does complicate things.
However, I think it's good to keep this on the radar since this might
be needed to solve some advanced use cases.

> > I don't know what "bursting" means for you. Would it be possible to
> > provide concrete examples of desired behavior? That would be very
> > helpful in making progress.
> >
> >
> Here are a few different use cases:

These use cases clarify your requirements a lot. Thanks for sharing.

>- A producer(s) is producing at a near constant rate into a topic, with
>equal distribution among partitions. Due to a hiccup in their downstream
>component, the produce rate goes to 0 for a few seconds, and thus, to
>compensate, in the next few seconds, the produce rate tries to double up.

Could you also elaborate on details such as what the current
behavior of Pulsar's rate limiting / throttling solution is and what would
be the desired behavior?
Just guessing that you mean that the desired behavior would be to
allow the produce rate to double up for some time (configurable)?
Compared to what rate is it doubled?
Please explain in detail what the current and desired behaviors would
be so that it's easier to understand the gap.

>- In a visitor based produce rate (where produce rate goes up in the day
>and goes down in the night, think in terms of popular website hourly view
>counts pattern) , there are cases when, due to certain external/internal
>triggers, the views - and thus - the produce rate spikes for a few minutes.

Again, please explain the current behavior and desired behavior.
Explicit example values of number of messages, bandwidth, etc. would
also be helpful details.

>It is also important to keep this in check so as to not allow bots to do
>DDOS into your system, while that might be a responsibility of an upstream
>system like API gateway, but we cannot be ignorant about that completely.

What would be the desired behavior?

>- In streaming systems, where there are micro batches, there might be
>constant fluctuations in produce rate from time to time, based on batch
>failure or retries.

Could you share examples with numbers for this use case too,
explaining the current behavior and the desired behavior?


>
> In all of these situations, setting the throughput of the topic to be the
> absolute maximum of the various spikes observed during the day is very
> suboptimal.

btw. A plain token bucket algorithm implementation doesn't have an
absolute maximum. The maximum average rate is controlled with the rate
of tokens added to the bucket. The capacity of the bucket controls how
much buffer there is to spend on spikes. If there's a need to also set
an absolute maximum rate, 2 token buckets could be chained to handle
that case. The second rate limiter could have an average rate of the
absolute maximum with a relatively small buffer (token bucket
capacity). There might also be more sophisticated algorithms which
vary the maximum rate in some way to smoothen spikes, but that might
just be completely unnecessary in Pulsar rate limiters.
Switching to use a 
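
To make the chained token bucket idea above concrete, here is a minimal Python sketch. The names TokenBucket and ChainedLimiter are made up for illustration and do not correspond to any existing Pulsar class. The first bucket is assumed to carry the average rate with a large capacity (the burst buffer), the second the absolute maximum rate with a small capacity. Time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Plain token bucket: `rate` tokens are added per second, up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last = now

    def available(self, now):
        # Lazily add the tokens earned since the last refill, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        return self.tokens

    def consume(self, amount):
        self.tokens -= amount


class ChainedLimiter:
    """Admits a request only if every bucket in the chain can pay for it."""

    def __init__(self, *buckets):
        self.buckets = buckets

    def try_acquire(self, amount, now):
        if all(b.available(now) >= amount for b in self.buckets):
            for b in self.buckets:
                b.consume(amount)
            return True
        return False
```

For example, chaining TokenBucket(rate=10, capacity=100) with TokenBucket(rate=30, capacity=30) would let a producer run at up to 30 units per second while the 100-unit buffer lasts, after which it is held to the average of 10 units per second.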

Re: [VOTE] Pulsar Client Go Release 0.11.1 Candidate 1

2023-11-07 Thread Zike Yang
Closing this vote with 3 binding +1s:

- Yunze
- Penghui
- Mattison

BR,
Zike Yang

On Tue, Nov 7, 2023 at 4:43 PM Zike Yang  wrote:
>
> > The KEYS file link is dead, I think it should be
> https://downloads.apache.org/pulsar/KEYS
>
> Thanks for your reminder. Yes, that should be the correct link. I have
> also pushed a PR to fix the release process:
> https://github.com/apache/pulsar-client-go/pull/1127 PTAL. Thanks!
>
> BR,
> Zike Yang
>
> On Tue, Nov 7, 2023 at 4:00 PM mattison chao  wrote:
> >
> > +1 (binding)
> >
> > - Built from the source
> > - Ran the test on pulsar 3.0.1
> >
> > Best,
> > Mattison
> >
> > > On Sep 11, 2023, at 18:07, Zike Yang  wrote:
> > >
> > > Hi everyone,
> > > Please review and vote on the release candidate #1 for the version
> > > 0.11.1, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > This is the first release candidate for Apache Pulsar Go client, version 
> > > 0.11.1.
> > >
> > > It fixes the following issues:
> > > https://github.com/apache/pulsar-client-go/compare/v0.11.0...v0.11.1-candidate-1
> > >
> > > Pulsar Client Go's KEYS file contains PGP keys we used to sign this 
> > > release:
> > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > >
> > > Please download these packages and review this release candidate:
> > > - Review release notes: 
> > > https://github.com/apache/pulsar-client-go/pull/1092
> > > - Download the source package (verify shasum, and asc) and follow the
> > > README.md to build and run the pulsar-client-go.
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Source file:
> > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-go-0.11.1-candidate-1/
> > >
> > > The tag to be voted upon:
> > > v0.11.1-candidate-1
> > > https://github.com/apache/pulsar-client-go/tree/v0.11.1-candidate-1
> > >
> > > SHA-512 checksums:
> > > d2209c652918acee8d2c77d52a0a556af16ff7fc3e30ad96d05e01285b83a61d1a1f0d32bace184f830e2dd2e4dd20910e9ce5ae23aac4a40eb3d19885cb0182
> > > apache-pulsar-client-go-0.11.1-src.tar.gz
> >


Re: [VOTE] Pulsar Client Go Release 0.11.1 Candidate 1

2023-11-07 Thread Zike Yang
> The KEYS file link is dead, I think it should be
https://downloads.apache.org/pulsar/KEYS

Thanks for your reminder. Yes, that should be the correct link. I have
also pushed a PR to fix the release process:
https://github.com/apache/pulsar-client-go/pull/1127 PTAL. Thanks!

BR,
Zike Yang

On Tue, Nov 7, 2023 at 4:00 PM mattison chao  wrote:
>
> +1 (binding)
>
> - Built from the source
> - Ran the test on pulsar 3.0.1
>
> Best,
> Mattison
>
> > On Sep 11, 2023, at 18:07, Zike Yang  wrote:
> >
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version
> > 0.11.1, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > This is the first release candidate for Apache Pulsar Go client, version 
> > 0.11.1.
> >
> > It fixes the following issues:
> > https://github.com/apache/pulsar-client-go/compare/v0.11.0...v0.11.1-candidate-1
> >
> > Pulsar Client Go's KEYS file contains PGP keys we used to sign this release:
> > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> >
> > Please download these packages and review this release candidate:
> > - Review release notes: https://github.com/apache/pulsar-client-go/pull/1092
> > - Download the source package (verify shasum, and asc) and follow the
> > README.md to build and run the pulsar-client-go.
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Source file:
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-go-0.11.1-candidate-1/
> >
> > The tag to be voted upon:
> > v0.11.1-candidate-1
> > https://github.com/apache/pulsar-client-go/tree/v0.11.1-candidate-1
> >
> > SHA-512 checksums:
> > d2209c652918acee8d2c77d52a0a556af16ff7fc3e30ad96d05e01285b83a61d1a1f0d32bace184f830e2dd2e4dd20910e9ce5ae23aac4a40eb3d19885cb0182
> > apache-pulsar-client-go-0.11.1-src.tar.gz
>


Re: [VOTE] Pulsar Client Go Release 0.11.1 Candidate 1

2023-11-07 Thread mattison chao
+1 (binding)

- Built from the source
- Ran the test on pulsar 3.0.1

Best,
Mattison

> On Sep 11, 2023, at 18:07, Zike Yang  wrote:
> 
> Hi everyone,
> Please review and vote on the release candidate #1 for the version
> 0.11.1, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> This is the first release candidate for Apache Pulsar Go client, version 
> 0.11.1.
> 
> It fixes the following issues:
> https://github.com/apache/pulsar-client-go/compare/v0.11.0...v0.11.1-candidate-1
> 
> Pulsar Client Go's KEYS file contains PGP keys we used to sign this release:
> https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> 
> Please download these packages and review this release candidate:
> - Review release notes: https://github.com/apache/pulsar-client-go/pull/1092
> - Download the source package (verify shasum, and asc) and follow the
> README.md to build and run the pulsar-client-go.
> 
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> 
> Source file:
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-go-0.11.1-candidate-1/
> 
> The tag to be voted upon:
> v0.11.1-candidate-1
> https://github.com/apache/pulsar-client-go/tree/v0.11.1-candidate-1
> 
> SHA-512 checksums:
> d2209c652918acee8d2c77d52a0a556af16ff7fc3e30ad96d05e01285b83a61d1a1f0d32bace184f830e2dd2e4dd20910e9ce5ae23aac4a40eb3d19885cb0182
> apache-pulsar-client-go-0.11.1-src.tar.gz
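
For the "verify shasum" step quoted above, a small Python sketch of the check could look like the following. The function names sha512_of and matches_published are hypothetical helpers, and the sketch only covers the SHA-512 comparison, not the .asc signature check against the KEYS file.

```python
import hashlib
import hmac


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the file's digest with the checksum published in the vote email."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha512_of(path), expected)
```

One would call matches_published with the path of the downloaded apache-pulsar-client-go-0.11.1-src.tar.gz and the checksum string from the vote email.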