Re: [DISCUSS] Don't retain null-key messages during topic compaction

2023-11-08 Thread Cong Zhao
Hi Enrico,

I don't think they conflict. We can do them separately.

We can add per-topic and per-namespace policies later when we need them.
For now, the current PIP can just add a configuration to broker.conf so
that we can stop retaining the null-key messages, and cherry-pick this fix
into other branches with `compactionRemainNullKey=true` as the default to
keep compatibility.
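A sketch of what that broker.conf entry could look like (the flag name `compactionRemainNullKey` comes from the discussion above; the final name and default are up to the PIP):

```properties
# broker.conf
# Whether topic compaction retains messages that have no key.
# Defaulting to true in released branches keeps the old behavior
# for compatibility; set to false to drop null-key messages.
compactionRemainNullKey=true
```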

Thanks,
Cong Zhao

On 2023/11/08 07:26:30 Enrico Olivelli wrote:
> This is a good point.
> 
> Is it worth making it configurable per topic and per namespace?
> 
> Even if this behavior is kind of a side effect or bug, maybe some
> applications rely on it.
> Flags that change the behavior per cluster are hard to adopt in big clusters
> with many tenants.
> 
> We should always keep multi-tenancy in mind when we design features or
> changes.
> 
> 
> Thanks
> Enrico
> 
> On Wed 8 Nov 2023, 07:55 Cong Zhao  wrote:
> 
> > Hello everyone,
> >
> > I opened PIP-318: https://github.com/apache/pulsar/pull/21541 for this
> > discussion.
> >
> > Any feedback and suggestions are welcome.
> >
> > Thanks,
> > Cong Zhao
> >
> > On 2023/11/07 02:55:52 Cong Zhao wrote:
> > > Hi, Pulsar community
> > >
> > > Currently, we retain all null-key messages during topic compaction, which I
> > > don't think is necessary: when you use topic compaction, it means you want
> > > to retain the latest value for each key, so retaining null-key messages is
> > > meaningless.
> > >
> > > Additionally, retaining all null-key messages will double the storage cost,
> > > and we'll never be able to clean them up, since compacted topics do not yet
> > > support retention policies.
> > >
> > > In summary, I don't think we should retain null-key messages during topic
> > > compaction.
> > > Looking forward to your feedback!
> > >
> > > Thanks,
> > > Cong Zhao
> > >
> >
> 


Re: [VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2

2023-11-08 Thread Yunze Xu
Hi Penghui,

It's caused by the relative path, and I explained it in that issue. Here
is an improvement for tests:
https://github.com/apache/pulsar-client-cpp/pull/340.

Thanks,
Yunze


Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-08 Thread Lari Hotari
Hi Girish,

replies inline.

On Thu, 9 Nov 2023 at 00:29, Girish Sharma  wrote:
> While dual-rate dual token bucket looks promising, there is still some
> challenge with respect to allowing a certain peak burst for/up to a bigger
> duration. I am explaining it below:

> Assume a 10MBps topic. Bursting support of 1.5x upto 2 minutes, once every
> 10 minute interval.

There are many ways to model dual token buckets.
When there are tokens in the bucket, they are consumed as fast as
possible. This is why there is a need for the second token bucket
which is used to rate limit the traffic to the absolute maximum rate.
Technically the second bucket rate limits the average rate for a short
time window.

I'd pick the first bucket for handling the 10MB rate.
The capacity of the first bucket would be 15MB * 120 = 1800MB. The fill
would happen in a special way; I'm not sure if Bucket4J has this at all.
To describe the way of adding tokens to the bucket: the tokens in
the bucket would remain the same when the rate is below 10MBps, since as
many tokens would be added to the bucket as are consumed by the actual
traffic. The left-over tokens (10MBps minus the actual rate) would go to a
separate filling bucket that gets poured into the actual bucket every
10 minutes.
This first bucket with this separate "filling bucket" would handle the
bursting up to 1800MB.
The second bucket would solely enforce the 1.5x limit of 15MB rate
with a small capacity bucket which enforces the average rate for a
short time window.
There's one nuance here: bursting is only possible if the average rate
has been below 10MBps long enough for burst tokens to accumulate.
It would be possible that for example 50% of the tokens would be
immediately available and 50% of the tokens are made available in the
"filling bucket" that gets poured into the actual bucket every 10
minutes. Without having some way to earn the burst, I don't think that
there's a reasonable way to make things usable. The 10MB limit
wouldn't have an actual meaning unless that is used to "earn" the
tokens to be used for the burst.
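A minimal sketch of this "earned burst" idea, using the numbers from the example (10MBps base, 1.5x burst, a pour every 10 minutes). The class and method names are illustrative, not Pulsar or Bucket4J APIs; the second bucket that would cap the absolute rate at 15MBps is left out, and time is passed in explicitly to keep the sketch deterministic:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the "filling bucket" scheme described above:
// unused base-rate capacity is earned into a fill bucket, which is poured
// into the burst bucket every 10 minutes. Bursting above the base rate is
// only admitted while earned tokens remain.
class EarnedBurstBucket {
    static final long BASE_RATE = 10_000_000L;            // 10 MB per second
    static final long BURST_CAPACITY = 15_000_000L * 120; // 1800 MB = 15 MB/s * 120 s

    private long burstTokens;   // tokens usable for bursting right now
    private long fillBucket;    // earned tokens, poured in every 10 minutes
    private long lastPourNanos;

    EarnedBurstBucket(long startNanos) {
        this.lastPourNanos = startNanos;
    }

    // Called once per second with the bytes published in that second.
    synchronized boolean tryConsume(long publishedBytes, long nowNanos) {
        long unused = BASE_RATE - publishedBytes;
        if (unused > 0) {
            // Unused base-rate capacity is earned into the fill bucket.
            fillBucket = Math.min(fillBucket + unused, BURST_CAPACITY);
        }
        if (nowNanos - lastPourNanos >= TimeUnit.MINUTES.toNanos(10)) {
            // Pour the earned tokens into the burst bucket every 10 minutes.
            burstTokens = Math.min(burstTokens + fillBucket, BURST_CAPACITY);
            fillBucket = 0;
            lastPourNanos = nowNanos;
        }
        long overBase = publishedBytes - BASE_RATE;
        if (overBase <= 0) {
            return true;                // within the base rate: always allowed
        }
        if (burstTokens >= overBase) {
            burstTokens -= overBase;    // burst paid for with earned tokens
            return true;
        }
        return false;                   // nothing earned yet: throttle
    }
}
```

The point the sketch makes concrete is that the 10MBps limit is what "earns" the burst: publishing below the base rate for a while is the only way to accumulate tokens that later admit traffic above it.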

One other detail of topic publishing throttling in Pulsar is that the
actual throttling happens after the limit has been exceeded.
This is due to the fact that Pulsar's network handling uses Netty
where you cannot block. When using the token bucket concepts,
the tokens are always first consumed and after that there's a chance
to pause message publishing.
In code, you can find this at
https://github.com/apache/pulsar/blob/c0eec1e46edeb46c888fa28f27b199ea7e7a1574/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1794
and digging down from there.

In the current rate limiters in Pulsar, the implementation is not
optimized to how Pulsar uses rate limiting. There's no need to use a
scheduler for adding "permits" as it's called in the current rate
limiter. The new tokens to add can be calculated based on the elapsed
time. Resuming from the blocking state (auto read disabled -> enabled)
requires a scheduler. The duration of the pause should be calculated
based on the rate and the average message size. Because of the nature
of asynchronous rate limiting in Pulsar topic publish throttling, the
token count can go negative, and in practice it does. The calculation of
the pause could also take this into account. The rate limiting will be
very accurate and efficient in this way since the scheduler will only
be needed when the token bucket runs out of tokens and there's really
a need to throttle. This is the change that I would do to the current
implementation in an experiment and see how things behave with the
revisited solution. Eliminating the precise rate limiter and having
just a single rate limiter would be part of this.
I think that the code base has multiple usages of the rate limiter.
The dispatching rate limiting might require some variation. IIRC, rate
limiting is also used for some other purposes in the code base in a
blocking manner; for example, unloading of bundles is rate limited.
Working on the code base will reveal this.
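To make the scheduler-free approach concrete, here is a hedged sketch (illustrative names, not the actual Pulsar classes): new tokens are derived from elapsed time on each call, consumption is unconditional because Netty handlers cannot block, the count may go negative, and the pause duration is computed from the deficit, so a scheduler is needed only to re-enable reads:

```java
// Sketch of a lazily-refilled token bucket: no scheduler adds "permits";
// tokens are recomputed from elapsed time whenever the bucket is touched.
class LazyTokenBucket {
    private final long ratePerSecond;
    private final long capacity;
    private long tokens;
    private long lastRefillNanos;

    LazyTokenBucket(long ratePerSecond, long startNanos) {
        this.ratePerSecond = ratePerSecond;
        this.capacity = ratePerSecond;   // one second's worth of tokens
        this.tokens = capacity;
        this.lastRefillNanos = startNanos;
    }

    // Consume unconditionally (Netty handlers cannot block); returns true
    // if publishing should pause (auto-read disabled) afterwards.
    synchronized boolean consumeAndCheckThrottle(long bytes, long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        // New tokens are computed from elapsed time; no scheduler needed.
        long newTokens = elapsed * ratePerSecond / 1_000_000_000L;
        if (newTokens > 0) {
            tokens = Math.min(tokens + newTokens, capacity);
            lastRefillNanos = nowNanos;
        }
        tokens -= bytes;                 // may go negative by design
        return tokens < 0;
    }

    // How long to pause before re-enabling reads, sized to the deficit.
    synchronized long pauseNanos() {
        return tokens >= 0 ? 0 : -tokens * 1_000_000_000L / ratePerSecond;
    }
}
```

A scheduled task is only ever created when `consumeAndCheckThrottle` returns true, which is exactly the "scheduler only when there's really a need to throttle" behavior described above.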

> While the number of events in the system topic would be fairly low per
> namespace, the issue is that this system topic lies on the same broker
> where the actual topic/partitions exist and those partitions are leading to
> degradation of this particular broker.

Yes, that could be possible but perhaps unlikely.

> Agreed, there is a challenge, as it's not as straightforward as I've
> demonstrated in the example above.

Yes, it might require some rounds of experimentation. Although in
general, I think it's a fairly straightforward problem to solve as
long as the requirements can be adapted to some small details that
make sense for bursting in the context of rate limiting. The detail is
that the long time average rate shouldn't go over the configured rate
even with the bursts. That's why the tokens usable for the burst
should be "earned". I'm not sure if it even is necessary to enforce

Re: [VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2

2023-11-08 Thread PengHui Li
Hi Yunze,

Thanks for the reminder. I enabled the proxy on my laptop.
After disabling the proxy, the timeout issue is gone.

Now, all the tests pass except two consistently failing tests related
to tokens.

I have created a GitHub issue.

https://github.com/apache/pulsar-client-cpp/issues/339

Thanks,
Penghui

On Thu, Nov 9, 2023 at 1:33 AM Yunze Xu  wrote:

> I think it's caused by ClientTest.testConnectTimeout. The
> resources don't seem to be cleaned up well in this test, so it affected the
> tests run later. Could you try the following command that ignores this test?
>
> ```bash
> ./pulsar-tests  --gtest_filter='-ClientTest.testConnectTimeout'
> ```
>
> If the rest of the tests pass, you can open an issue to track this
> flaky test. And I see the local host is "198.18.0.1", did you use any
> proxy when running the tests?
>
>
> > Is it possible to install the CPP client without building from the
> source code?
>
> Currently there is no "true" way to avoid building from source. You
> can edit the libpulsar ruby file to install the 3.4.0-candidate-2 (See
>
> https://github.com/Homebrew/homebrew-core/blob/master/CONTRIBUTING.md#to-contribute-a-fix-to-the-foo-formula
> )
> but it still builds from source. The only advantage is that all 3rd
> party dependencies are guaranteed to be installed from Homebrew.
> Maybe we can add a workflow to upload the pre-built libraries for
> macOS in the future.
>
> 
>
> BTW, I ran the `pulsar-tests` locally and it failed with two tests:
>
> ```
> [  FAILED  ] 2 tests, listed below:
> [  FAILED  ] CustomLoggerTest.testCustomLogger
> [  FAILED  ] LookupServiceTest.testMultiAddresses
> ```
>
> These tests seem flaky so I reran them individually by:
>
> ```
> ./tests/pulsar-tests --gtest_filter='CustomLoggerTest.*'
> ./tests/pulsar-tests --gtest_filter='*testMultiAddresses'
> ```
>
> `LookupServiceTest.testMultiAddresses` still failed. I will take a look
> soon.
>
> Thanks,
> Yunze
>
>
> On Wed, Nov 8, 2023 at 11:07 PM PengHui Li  wrote:
> >
> > Hi Yunze,
> >
> > I got an error when running the pulsar-tests (I tried multiple times,
> same
> > error)
> >
> > ```
> > 2023-11-08 22:56:43.676 INFO  [0x16db63000] ProducerImpl:216 |
> >
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1,
> > ] Created producer on broker [[::1]:58479 -> [::1]:6650]
> > 2023-11-08 22:56:43.693 INFO  [0x1f086e080] ProducerImpl:791 |
> >
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-0,
> > standalone-0-635] Closing producer for topic
> >
> persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-0
> > 2023-11-08 22:56:43.693 INFO  [0x1f086e080] ProducerImpl:791 |
> >
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1,
> > standalone-0-636] Closing producer for topic
> >
> persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1
> > 2023-11-08 22:56:43.693 INFO  [0x16db63000] ProducerImpl:755 |
> >
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-0,
> > standalone-0-635] Closed producer 0
> > 2023-11-08 22:56:43.693 INFO  [0x16db63000] ProducerImpl:755 |
> >
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1,
> > standalone-0-636] Closed producer 1
> > 2023-11-08 22:56:44.773 WARN  [0x16ba07000] ConnectionPool:91 | Deleting
> > stale connection from pool for pulsar://192.0.2.1:1234-0 use_count: 1 @
> > 0x13000ea00
> > 2023-11-08 22:56:44.773 INFO  [0x16ba07000] ClientConnection:268 | [
> > 198.18.0.1:58467 -> 192.0.2.1:1234] Destroyed connection to pulsar://
> > 192.0.2.1:1234
> > 2023-11-08 22:56:44.773 INFO  [0x16ba07000] ClientConnection:188 |
> [
> > -> pulsar://192.0.2.1:1234] Create ClientConnection, timeout=1000
> > 2023-11-08 22:56:44.773 INFO  [0x16ba07000] ConnectionPool:109 | Created
> > connection for pulsar://192.0.2.1:1234
> > 2023-11-08 22:56:44.775 INFO  [0x16ba07000] ClientConnection:398 | [
> > 198.18.0.1:58482 -> 192.0.2.1:1234] Connected to broker
> > 2023-11-08 22:56:45.774 ERROR [0x16ba07000] ClientConnection:612 | [
> > 198.18.0.1:58482 -> 192.0.2.1:1234] Connection was not established in
> 1000
> > ms, close the socket
> > 2023-11-08 22:56:45.775 INFO  [0x16ba07000] ClientConnection:1319 | [
> > 198.18.0.1:58482 -> 192.0.2.1:1234] Connection disconnected (refCnt: 2)
> > 2023-11-08 22:56:45.775 ERROR [0x16ba07000] ClientImpl:199 | Error
> > Checking/Getting Partition Metadata while creating producer on
> > persistent://public/default/test-connect-timeout -- TimeOut
> > zsh: segmentation fault  ./pulsar-tests
> > ```
> >
> > Reproduce steps:
> >
> > - Build from downloaded source code (passed)
> > - Start pulsar service (./pulsar-test-service-start.sh)
> > - Run the test(./pulsar-tests)
> >
> > Then, I got the above errors after a few minutes.
> > I'm not sure it's just a test 

Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-08 Thread Lari Hotari
Hi Girish,

Replies inline.

> > The current state of rate limiting is not acceptable in Pulsar. We
> > need to fix things in the core.
> >
>
> I wouldn't say that it's not acceptable. The precise one works as expected
> as a basic rate limiter. It's only when there are complex requirements that
> the current rate limiters fail.

Just to clarify: I consider the situation with the default rate limiter
suboptimal. The CPU overhead is significant, and you must explicitly
enable the "precise" rate limiters to resolve that, which is not
very obvious to any Pulsar user. I don't think that this situation
makes sense. If there's an abstraction of a rate limiter, it should be
efficient and usable in the default configuration of Pulsar.

>  The key here being "complex requirement". I am with Rajan here that
> whatever improvements we do to the core built-in rate limiter, would always
> miss one or the other complex requirement.

I haven't yet seen very complex requirements that relate directly to
the rate limiter. The scope could expand to the area of capacity
management, and I'm pretty sure that it gets there when we go further.
Capacity management is a broader concern than rate limiting. We all
know that capacity management is necessary in multi-tenant systems.
Rate limiting and throttling are one way to handle that. When moving on to
more complex requirements, it might be useful to go beyond rate limiting in
the conceptual design as well.

For example DynamoDB has the concept of capacity units (CU).
DynamoDB's conceptual design for capacity management is well described
in a paper "Amazon DynamoDB: A Scalable, Predictably Performant, and
Fully Managed NoSQL Database Service" and a related presentation [1].
There's also other related blog posts such as "Surprising Scalability
of Multitenancy" [2] and "The Road To Serverless: Multi-tenancy" [3]
which have been inspirational to me. The paper "Kora: A Cloud-Native
Event Streaming Platform For Kafka" [4] is also a very useful read to
learn about serverless capacity management.

Capacity management goes beyond rate limiting since it has a tight
relation to end-to-end flow control, load balancing, service levels and
possible auto-scaling solutions. One of the goals of capacity
management in a multi-tenant
system is to address the dreaded "noisy neighbor" problem in a cost
optimal and efficient way.

1 - https://www.usenix.org/conference/atc22/presentation/elhemali
2 - https://brooker.co.za/blog/2023/03/23/economics.html
3 - https://me.0x.me/dbaas3.html
4 - https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf

>
> I feel like we need both the things here. We can work on improving the
> built in rate limiter which does not try to solve all of my needs like
> blacklisting support, limiting the number of bursts in a window etc. The
> built in one can be improved with respect to code refactoring,
> extensibility, modularization etc. along with implementing a more well
> known rate limiting algorithm, like token bucket.
> Along with this, the newly improved interface can now make way for
> pluggable support.

Yes, I agree. Improving the rate limiter and exposing an interface
aren't exclusive.


> This is assuming that we do not improve the default built-in rate limiter
> at all. Think of this like AuthorizationProvider - there is a built-in one;
> it's good enough, but each organization would have its own requirements wrt
> how they handle authorization, and thus, most likely, any organization with
> well defined AuthN/AuthZ constructs would be plugging their own providers
> there.

I don't think that this is a valid comparison. The current rate
limiter has an explicit "contract". The user sets the maximum rate in
bytes and/or messages and the rate limiter takes care of enforcing
that limit. It's hard to see why that "contract" would have too many
interpretations of what it means.
Another reason is that I haven't seen any other messaging product
where there would be a need to add support for user provided rate
limiter algorithm. What makes Pulsar a special case that it would be
needed?
For authentication and authorization, it's a completely different
story. The abstractions require that you pick a specific
implementation for your way of doing authentication and authorization.
Many other systems out there do it in a somewhat similar way as Pulsar.

> The key thing here would be to make the public interface as minimal as
> possible while allowing for custom complex rate limiters as well. I believe
> this is easily doable without actually making the internal code more
> complex.

It's doable, but a different question is whether this is necessary in
the end. We'll see over time how we can improve the Pulsar core rate
limiter and whether there's a need to override it.
The current interfaces will change when the Pulsar core rate limiter
is improved. This work won't easily meet in the middle unless we start
by improving the core rate limiter.

What Pulsar does right now might have to 

Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-08 Thread Girish Sharma
Hello Lari,
I've now gone through a bunch of rate limiting algorithms, along with the
dual-rate dual-bucket algorithm.
Reply inline.

On Tue, Nov 7, 2023 at 11:32 PM Lari Hotari  wrote:

>
> Bucket4J documentation gives some good ideas and it shows how the
> token bucket algorithm could be varied. For example, the "Refill
> styles" section [1] is useful to read as an inspiration.
> In network routers, there's a concept of "dual token bucket"
> algorithms and by googling you can find both Cisco and Juniper
> documentation referencing this.
> I also asked ChatGPT-4 to explain "dual token bucket" algorithm [2].
>
> 1 - https://bucket4j.com/8.6.0/toc.html#refill-types
> 2 - https://chat.openai.com/share/d4f4f740-f675-4233-964e-2910a7c8ed24
>
>
While dual-rate dual token bucket looks promising, there is still some
challenge with respect to allowing a certain peak burst for/up to a bigger
duration. I am explaining it below:

Assume a 10MBps topic. Bursting support of 1.5x up to 2 minutes, once every
10 minute interval.

The first bucket has the capacity of the consistent, long-term peak that
the topic would observe, so basically -
`limit.capacity(10_000_000).refillGreedy(1_000_000, ofMillis(100))`. Here,
I am not refilling every millisecond because the topic will never receive
uniform, homogeneous traffic at a millisecond level. The pattern would always
be a batch of one or more messages in an instant, then nothing for the next
few instants, then another batch of one or more messages, and so on.
This first bucket is good enough to handle rate limiting in normal cases
without bursting support. This is also better than the current precise rate
limiter in pulsar which allows for hotspotting of produce within a second
as this one spreads the quota over 100ms time windows.

The second bucket would have to have a slower refill rate and also a less
granular refill rate - think of something like
`limit.capacity(5_000_000).refillGreedy(5_000_000, ofMinutes(10))`.
Now the problem with this approach is that it would allow only 1 second's
worth of additional 5MBps on the topic every 10 minutes.

Suppose I change the refill rate to
`limit.capacity(5_000_000).refillGreedy(5_000_000, ofSeconds(1))` - then it
does allow additional 5MBps burst every second, but now there is no cap
(the 2 minute cap).

I will have to think this through and do some more math to figure out if
this is possible at all using the dual-rate dual-token method. Inputs are
welcome.
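To make the limitation concrete, here is a plain-Java sketch (no Bucket4J dependency; the class names and the overage-only accounting for the second bucket are illustrative assumptions) of the two configurations above: bytes up to the base rate draw from the first bucket, and only the excess draws from the second, so a 5MB bucket refilled every 10 minutes yields roughly one second at 15MBps per window rather than two minutes at 1.5x:

```java
// Greedy-refill token bucket: tokens are topped up per elapsed refill period.
class TokenBucket {
    final long capacity, refillAmount, refillPeriodNanos;
    long tokens, lastRefill;

    TokenBucket(long capacity, long refillAmount, long refillPeriodNanos, long startNanos) {
        this.capacity = capacity;
        this.refillAmount = refillAmount;
        this.refillPeriodNanos = refillPeriodNanos;
        this.tokens = capacity;
        this.lastRefill = startNanos;
    }

    void refill(long nowNanos) {
        long periods = (nowNanos - lastRefill) / refillPeriodNanos;
        if (periods > 0) {
            tokens = Math.min(tokens + periods * refillAmount, capacity);
            lastRefill += periods * refillPeriodNanos;
        }
    }
}

class DualRateSketch {
    // capacity(10_000_000).refillGreedy(1_000_000, ofMillis(100)): base 10MBps.
    final TokenBucket base;
    // capacity(5_000_000).refillGreedy(5_000_000, ofMinutes(10)): burst budget.
    final TokenBucket burst;

    DualRateSketch(long startNanos) {
        base = new TokenBucket(10_000_000, 1_000_000, 100_000_000L, startNanos);
        burst = new TokenBucket(5_000_000, 5_000_000, 600_000_000_000L, startNanos);
    }

    boolean tryAcquire(long bytes, long nowNanos) {
        base.refill(nowNanos);
        burst.refill(nowNanos);
        long fromBase = Math.min(bytes, base.tokens);
        long overage = bytes - fromBase;   // only the excess hits the burst bucket
        if (overage > burst.tokens) {
            return false;                  // burst budget exhausted: reject
        }
        base.tokens -= fromBase;
        burst.tokens -= overage;
        return true;
    }
}
```

Driving this with per-second publishes of 15MB shows the first burst second succeed and the next fail, which is exactly the "1 second worth of additional 5MBps every 10 minutes" problem described above.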

>
> >>
> >> ok. btw. "metrics" doesn't necessarily mean providing the rate limiter
> >> metrics via Prometheus. There might be other ways to provide this
> >> information for components that could react to this.
> >> For example, it could be a system topic where these rate limiters emit
> >> events.
> >>
> >
> > Are there any other system topics than
> `tenant/namespace/__change_events`? While it's an improvement over
> querying metrics, it would still mean one consumer per namespace and would
> form a cyclic dependency - for example, in case a broker is degrading due
> to mis-use of bursting, it might lead to delays in the consumption of the
> event from the __change_events topic.
>
> The number of events is a fairly low volume so the possible
> degradation wouldn't necessarily impact the event communications for
> this purpose. I'm not sure if there are currently any public event
> topics in Pulsar that are supported in a way that an application could
> directly read from the topic. However, since this is an advanced case,
> I would assume that we don't need to focus on this in the first phase
> of improving rate limiters.
>
>
While the number of events in the system topic would be fairly low per
namespace, the issue is that this system topic lies on the same broker
where the actual topic/partitions exist and those partitions are leading to
degradation of this particular broker.



> I agree. It's better to exclude this from the discussion at the moment
> so that it doesn't cause confusion. Modifying the rate limits up and
> down automatically with some component could be considered out of
> scope in the first phase. However, that might be controversial since
> the externally observable behavior of rate limiting with bursting
> support seems to behave in a way where the rate limit changes
> automatically. The "auto scaling" aspect of rate limiting, modifying
> the rate limits up and down, might be necessary as part of a rate
> limiter implementation eventually. More of that later.
>
Not to drag this out in this discussion, but doesn't this basically
mean that there is no "rate limiting" at all? Basically as good as setting
it to `-1` and just letting the hardware handle it to its best extent.


> >
> > The desired behavior in all three situations is to have a multiplier
> based bursting capability for a fixed duration. For example, it could be
> that a pulsar topic would be able to support 1.5x of the set quota for a
> burst duration of up to 5 minutes. There also needs to be a cooldown period
> 

Writing the Pulsar article for Wikipedia. Looking for in-depth, reliable, secondary, independent resources on the subject.

2023-11-08 Thread Kiryl Valkovich
Hi everyone! I think that Pulsar deserves an article on Wikipedia.

The problem is that it’s surprisingly hard to find something in-depth, 
reliable, secondary, and independent on the subject.
Most of the good information is in Pulsar documentation, StreamNative blog, 
DataStax blog, Apache Foundation resources, and on other blogs mostly from 
authors who are now or have been employees in companies that are interested in 
Pulsar's success.
By Wikipedia's rules, all such sources are not "independent".
Any info from the documentation isn’t “secondary”.
Talks on YouTube aren’t “reliable” sources.

I pointed out to the moderator that the https://en.wikipedia.org/wiki/Apache_Kafka
article has about the same quality of sources. The answer was:

  *   I’ve marked that article as needing assistance, as it has similar 
problems to your draft.

Maybe someone can point me to such resources?

If you have experience writing articles for Wikipedia and know its rules and 
how to better deal with moderators, I won't refuse help.

Draft of the article. It became quite brief after all the edits: 
https://en.wikipedia.org/wiki/Draft:Apache_Pulsar
Talk: https://en.wikipedia.org/wiki/User_talk:Visortelle


Best,
Kiryl



Re: [VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2

2023-11-08 Thread Yunze Xu
I think it's caused by ClientTest.testConnectTimeout. The
resources don't seem to be cleaned up well in this test, so it affected the
tests run later. Could you try the following command that ignores this test?

```bash
./pulsar-tests  --gtest_filter='-ClientTest.testConnectTimeout'
```

If the rest of the tests pass, you can open an issue to track this
flaky test. And I see the local host is "198.18.0.1", did you use any
proxy when running the tests?


> Is it possible to install the CPP client without building from the source 
> code?

Currently there is no "true" way to avoid building from source. You
can edit the libpulsar ruby file to install 3.4.0-candidate-2 (see
https://github.com/Homebrew/homebrew-core/blob/master/CONTRIBUTING.md#to-contribute-a-fix-to-the-foo-formula)
but it still builds from source. The only advantage is that all 3rd
party dependencies are guaranteed to be installed from Homebrew.
Maybe we can add a workflow to upload the pre-built libraries for
macOS in the future.



BTW, I ran the `pulsar-tests` locally and it failed with two tests:

```
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] CustomLoggerTest.testCustomLogger
[  FAILED  ] LookupServiceTest.testMultiAddresses
```

These tests seem flaky so I reran them individually by:

```
./tests/pulsar-tests --gtest_filter='CustomLoggerTest.*'
./tests/pulsar-tests --gtest_filter='*testMultiAddresses'
```

`LookupServiceTest.testMultiAddresses` still failed. I will take a look soon.

Thanks,
Yunze


On Wed, Nov 8, 2023 at 11:07 PM PengHui Li  wrote:
>
> Hi Yunze,
>
> I got an error when running the pulsar-tests (I tried multiple times, same
> error)
>
> ```
> 2023-11-08 22:56:43.676 INFO  [0x16db63000] ProducerImpl:216 |
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1,
> ] Created producer on broker [[::1]:58479 -> [::1]:6650]
> 2023-11-08 22:56:43.693 INFO  [0x1f086e080] ProducerImpl:791 |
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-0,
> standalone-0-635] Closing producer for topic
> persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-0
> 2023-11-08 22:56:43.693 INFO  [0x1f086e080] ProducerImpl:791 |
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1,
> standalone-0-636] Closing producer for topic
> persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1
> 2023-11-08 22:56:43.693 INFO  [0x16db63000] ProducerImpl:755 |
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-0,
> standalone-0-635] Closed producer 0
> 2023-11-08 22:56:43.693 INFO  [0x16db63000] ProducerImpl:755 |
> [persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1,
> standalone-0-636] Closed producer 1
> 2023-11-08 22:56:44.773 WARN  [0x16ba07000] ConnectionPool:91 | Deleting
> stale connection from pool for pulsar://192.0.2.1:1234-0 use_count: 1 @
> 0x13000ea00
> 2023-11-08 22:56:44.773 INFO  [0x16ba07000] ClientConnection:268 | [
> 198.18.0.1:58467 -> 192.0.2.1:1234] Destroyed connection to pulsar://
> 192.0.2.1:1234
> 2023-11-08 22:56:44.773 INFO  [0x16ba07000] ClientConnection:188 | [
> -> pulsar://192.0.2.1:1234] Create ClientConnection, timeout=1000
> 2023-11-08 22:56:44.773 INFO  [0x16ba07000] ConnectionPool:109 | Created
> connection for pulsar://192.0.2.1:1234
> 2023-11-08 22:56:44.775 INFO  [0x16ba07000] ClientConnection:398 | [
> 198.18.0.1:58482 -> 192.0.2.1:1234] Connected to broker
> 2023-11-08 22:56:45.774 ERROR [0x16ba07000] ClientConnection:612 | [
> 198.18.0.1:58482 -> 192.0.2.1:1234] Connection was not established in 1000
> ms, close the socket
> 2023-11-08 22:56:45.775 INFO  [0x16ba07000] ClientConnection:1319 | [
> 198.18.0.1:58482 -> 192.0.2.1:1234] Connection disconnected (refCnt: 2)
> 2023-11-08 22:56:45.775 ERROR [0x16ba07000] ClientImpl:199 | Error
> Checking/Getting Partition Metadata while creating producer on
> persistent://public/default/test-connect-timeout -- TimeOut
> zsh: segmentation fault  ./pulsar-tests
> ```
>
> Reproduce steps:
>
> - Build from downloaded source code (passed)
> - Start pulsar service (./pulsar-test-service-start.sh)
> - Run the test(./pulsar-tests)
>
> Then, I got the above errors after a few minutes.
> I'm not sure whether it's just a test issue or not.
>
> One more question:
>
> Is it possible to install the CPP client without building from the source
> code?
> From
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-cpp/pulsar-client-cpp-3.4.0-candidate-2/
> I can find the package for Linux and Windows. But no package for macOS.
>
> Regards,
> Penghui
>
> On Wed, Nov 8, 2023 at 10:45 PM Yubiao Feng
>  wrote:
>
> > Hi all
> >
> > Sorry, I'll send another email explaining what tests were done.
> >
> > Please ignore the previous email.
> >
> > Thanks
> > Yubiao Feng
> >
> >
> > On Wed, Nov 8, 2023 at 11:48 AM 

Re: [VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2

2023-11-08 Thread PengHui Li
Hi Yunze,

I got an error when running the pulsar-tests (I tried multiple times, same
error)

```
2023-11-08 22:56:43.676 INFO  [0x16db63000] ProducerImpl:216 |
[persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1,
] Created producer on broker [[::1]:58479 -> [::1]:6650]
2023-11-08 22:56:43.693 INFO  [0x1f086e080] ProducerImpl:791 |
[persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-0,
standalone-0-635] Closing producer for topic
persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-0
2023-11-08 22:56:43.693 INFO  [0x1f086e080] ProducerImpl:791 |
[persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1,
standalone-0-636] Closing producer for topic
persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1
2023-11-08 22:56:43.693 INFO  [0x16db63000] ProducerImpl:755 |
[persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-0,
standalone-0-635] Closed producer 0
2023-11-08 22:56:43.693 INFO  [0x16db63000] ProducerImpl:755 |
[persistent://public/default/testPartitionedConsumerUnexpectedAckTimeout1699455403-partition-1,
standalone-0-636] Closed producer 1
2023-11-08 22:56:44.773 WARN  [0x16ba07000] ConnectionPool:91 | Deleting
stale connection from pool for pulsar://192.0.2.1:1234-0 use_count: 1 @
0x13000ea00
2023-11-08 22:56:44.773 INFO  [0x16ba07000] ClientConnection:268 | [
198.18.0.1:58467 -> 192.0.2.1:1234] Destroyed connection to pulsar://
192.0.2.1:1234
2023-11-08 22:56:44.773 INFO  [0x16ba07000] ClientConnection:188 | [
-> pulsar://192.0.2.1:1234] Create ClientConnection, timeout=1000
2023-11-08 22:56:44.773 INFO  [0x16ba07000] ConnectionPool:109 | Created
connection for pulsar://192.0.2.1:1234
2023-11-08 22:56:44.775 INFO  [0x16ba07000] ClientConnection:398 | [
198.18.0.1:58482 -> 192.0.2.1:1234] Connected to broker
2023-11-08 22:56:45.774 ERROR [0x16ba07000] ClientConnection:612 | [
198.18.0.1:58482 -> 192.0.2.1:1234] Connection was not established in 1000
ms, close the socket
2023-11-08 22:56:45.775 INFO  [0x16ba07000] ClientConnection:1319 | [
198.18.0.1:58482 -> 192.0.2.1:1234] Connection disconnected (refCnt: 2)
2023-11-08 22:56:45.775 ERROR [0x16ba07000] ClientImpl:199 | Error
Checking/Getting Partition Metadata while creating producer on
persistent://public/default/test-connect-timeout -- TimeOut
zsh: segmentation fault  ./pulsar-tests
```

Reproduce steps:

- Build from downloaded source code (passed)
- Start pulsar service (./pulsar-test-service-start.sh)
- Run the test(./pulsar-tests)

Then, I got the above errors after a few minutes.
I'm not sure whether it's just a test issue or not.

One more question:

Is it possible to install the CPP client without building from the source
code?
From
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-cpp/pulsar-client-cpp-3.4.0-candidate-2/
I can find packages for Linux and Windows, but no package for macOS.

Regards,
Penghui

On Wed, Nov 8, 2023 at 10:45 PM Yubiao Feng
 wrote:

> Hi all
>
> Sorry, I'll send another email explaining what tests were done.
>
> Please ignore the previous email.
>
> Thanks
> Yubiao Feng
>
>
> On Wed, Nov 8, 2023 at 11:48 AM Yubiao Feng 
> wrote:
>
> > +1 (no-binding)
> >
> > Thanks
> > Yubiao Feng
> >
> > On Tue, Nov 7, 2023 at 3:03 PM Yunze Xu  wrote:
> >
> >> This is the second release candidate for Apache Pulsar Client C++,
> >> version 3.4.0.
> >>
> >> It fixes the following issues:
> >> https://github.com/apache/pulsar-client-cpp/milestone/5?closed=1
> >>
> >> *** Please download, test and vote on this release. This vote will stay
> >> open
> >> for at least 72 hours ***
> >>
> >> Note that we are voting upon the source (tag), binaries are provided for
> >> convenience.
> >>
> >> Source and binary files:
> >>
> >>
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-cpp/pulsar-client-cpp-3.4.0-candidate-2/
> >>
> >> SHA-512 checksums:
> >>
> >>
> >>
> 10517590a2e4296d6767a044e58dd32c79e404a5136cf41126f4cb2416ed0ef8fb1ad7aa7da54c37c12ed71b05a527ed08bfac9d50d3550fa5d475c1e8c00950
> >>  apache-pulsar-client-cpp-3.4.0.tar.gz
> >>
> >>
> >> The tag to be voted upon:
> >> v3.4.0-candidate-2 (f337eff7caae93730ec1260810655cbb5a345e70)
> >>
> >>
> https://github.com/apache/pulsar-client-cpp/releases/tag/v3.4.0-candidate-2
> >>
> >> Pulsar's KEYS file containing PGP keys you use to sign the release:
> >> https://downloads.apache.org/pulsar/KEYS
> >>
> >> Please download the source package, and follow the README to compile and
> >> test.
> >>
> >
>
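
As an aside for other voters: the SHA-512 checksum above can be verified
with the sha512sum command, or programmatically; a minimal sketch (the
local file path is an assumption, not part of the release instructions):

```python
import hashlib

def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks
    so that large release tarballs do not need to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the result against the checksum posted in the vote thread, e.g.:
# sha512_of("apache-pulsar-client-cpp-3.4.0.tar.gz") == "1051759...00950"
```

The PGP signature should additionally be checked with gpg --verify after
importing the KEYS file linked above.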


Re: [VOTE] Pulsar Client C++ Release 3.4.0 Candidate 2

2023-11-08 Thread Yubiao Feng
Hi all

Sorry, I'll send another email explaining what tests were done.

Please ignore the previous email.

Thanks
Yubiao Feng


On Wed, Nov 8, 2023 at 11:48 AM Yubiao Feng 
wrote:

> +1 (no-binding)
>
> Thanks
> Yubiao Feng
>
> On Tue, Nov 7, 2023 at 3:03 PM Yunze Xu  wrote:
>
>> This is the second release candidate for Apache Pulsar Client C++,
>> version 3.4.0.
>>
>> It fixes the following issues:
>> https://github.com/apache/pulsar-client-cpp/milestone/5?closed=1
>>
>> *** Please download, test and vote on this release. This vote will stay
>> open
>> for at least 72 hours ***
>>
>> Note that we are voting upon the source (tag), binaries are provided for
>> convenience.
>>
>> Source and binary files:
>>
>> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-cpp/pulsar-client-cpp-3.4.0-candidate-2/
>>
>> SHA-512 checksums:
>>
>>
>> 10517590a2e4296d6767a044e58dd32c79e404a5136cf41126f4cb2416ed0ef8fb1ad7aa7da54c37c12ed71b05a527ed08bfac9d50d3550fa5d475c1e8c00950
>>  apache-pulsar-client-cpp-3.4.0.tar.gz
>>
>>
>> The tag to be voted upon:
>> v3.4.0-candidate-2 (f337eff7caae93730ec1260810655cbb5a345e70)
>>
>> https://github.com/apache/pulsar-client-cpp/releases/tag/v3.4.0-candidate-2
>>
>> Pulsar's KEYS file containing PGP keys you use to sign the release:
>> https://downloads.apache.org/pulsar/KEYS
>>
>> Please download the source package, and follow the README to compile and
>> test.
>>
>


[DISCUSS] PIP-316 Create a producerName field for DeadLetterPolicy

2023-11-08 Thread Jie crossover
Hi dev,
I proposed a PIP: https://github.com/apache/pulsar/pull/21507 to create a
producerName for DeadLetterPolicy.
Please take a look and give your feedback.
-- 
Best Regards!
crossoverJie


Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-08 Thread Girish Sharma
Hello Lari, while I have yet to reply to your email from yesterday, I am
trying to wrap up this discussion about the need for a pluggable rate
limiter, so I am replying to this first.

Comments inline


On Wed, Nov 8, 2023 at 5:35 PM Lari Hotari  wrote:

> Hi Girish,
>
> > On Asaf's comment on too many public interfaces in Pulsar and no other
> > Apache software having so many public interfaces - I would like to ask,
> has
> > that brought in any con though? For this particular use case, I feel like
> > having it as a public interface would actually improve the code quality
> > and design as the usage would be checked and changes would go through
> > scrutiny (unlike how the current PublishRateLimiter evolved unchecked).
> > Asaf - what are your thoughts on this? Are you okay with making the
> > PublishRateLimiter pluggable with a better interface?
>
> As a reply to this question, I'd like to highlight my previous email
> in this thread where I responded to Rajan.
> The current state of rate limiting is not acceptable in Pulsar. We
> need to fix things in the core.
>

I wouldn't say that it's not acceptable. The precise one works as expected
as a basic rate limiter. It's only when there are complex requirements that
the current rate limiters fail.
The key phrase here is "complex requirements". I am with Rajan here:
whatever improvements we make to the core built-in rate limiter would
always miss one complex requirement or another.

I feel like we need both things here. We can work on improving the
built-in rate limiter, which does not need to solve all of my needs (like
blacklisting support, limiting the number of bursts in a window, etc.).
The built-in one can be improved with respect to code refactoring,
extensibility, modularization, etc., along with implementing a more widely
known rate limiting algorithm, like token bucket.
Along with this, the newly improved interface can then make way for
pluggable support.
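
As an aside, the token bucket algorithm mentioned above is compact enough
to sketch; this is the generic textbook formulation, not Pulsar's current
code, and the parameter names are illustrative:

```python
import time

class TokenBucket:
    """Generic token bucket: allows bursts up to `capacity` tokens while
    enforcing a long-term average of `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate = rate            # refill rate, tokens per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.now = now              # injectable clock, eases testing
        self.last_refill = now()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Consume `tokens` if available; return False to signal throttling."""
        current = self.now()
        elapsed = current - self.last_refill
        self.last_refill = current
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
```

A broker-side limiter would typically pause reading from the producer
connection rather than reject outright, but the accounting is the same.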


>
> One of the downsides of adding a lot of pluggability and variation is
> the additional fragmentation and complexity that it adds to Pulsar. It
> doesn't really make sense if you need to install an external library
> to make the rate limiting feature usable in Pulsar. Rate limiting is
>

This assumes that we do not improve the default built-in rate limiter at
all. Think of this like AuthorizationProvider: there is a built-in one,
and it's good enough, but each organization has its own requirements for
how it handles authorization, and thus, most likely, any organization
with well-defined AuthN/AuthZ constructs would plug in its own provider
there.

The key thing here would be to make the public interface as minimal as
possible while still allowing for custom, complex rate limiters. I believe
this is easily doable without making the internal code more complex.
In fact, the way I envision the pluggable proposal, things become simpler
with respect to code flow, code ownership, and custom if/else logic.
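
To illustrate what "minimal" could mean here, a hypothetical sketch of
such a surface (in Python for brevity; all names are illustrative
assumptions, not Pulsar's actual API):

```python
from abc import ABC, abstractmethod

class PublishRateLimiter(ABC):
    """Hypothetical minimal contract for a pluggable publish rate limiter.
    Illustrative only; not the interface proposed in any PIP."""

    @abstractmethod
    def try_acquire(self, num_messages: int, num_bytes: int) -> bool:
        """Return True if the publish may proceed, False to throttle."""

    @abstractmethod
    def update(self, messages_per_second: int, bytes_per_second: int) -> None:
        """Apply a policy change (e.g. from topic/namespace policies)."""

    @abstractmethod
    def close(self) -> None:
        """Release timers and resources when the topic is unloaded."""

class UnlimitedRateLimiter(PublishRateLimiter):
    """Trivial implementation: never throttles."""
    def try_acquire(self, num_messages, num_bytes):
        return True
    def update(self, messages_per_second, bytes_per_second):
        pass
    def close(self):
        pass
```

A surface this small keeps custom implementations free to use any
algorithm internally without the broker knowing the details.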



> only needed when operating Pulsar at scale. The core feature must
> support this to make any sense.
>

I will give another example here. In big organizations where Pulsar is
actually being used at scale - and not just in terms of QPS/MBps, but
also in terms of the number of teams, tenants, namespaces, and unique
features being used - there would almost always be an in-house schema
registry. Thus, while Pulsar already has a built-in schema service and
registry, it is important that it also supports custom ones. This does
not speak badly of the base Pulsar package, but rather speaks to the
adaptability of the product.


>
> The side product of this could be the pluggable interface if the core
> feature cannot be extended to cover the requirements that Pulsar users
> really have.
> Girish, it has been extremely valuable that you have shared concrete
> examples of your rate limiting related requirements. I'm confident
> that we will be able to cover the majority of your requirements in
> Pulsar core. It might be the case that when you try out the new
> features, it might actually cover your needs sufficiently.
>

I really respect and appreciate the discussions we have had. One of the
problems I've had in the Pulsar community earlier is the lack of
participation, but I am getting a lot of participation this time, so it's
really good.

I am willing to take this forward to improve the default rate limiters,
but since we would _have_ to meet somewhere in the middle, at the end of
all this our organizational requirements would still remain unfulfilled
until we build _all_ of the things that I have spoken about.

Just as Rajan was late to the discussion and pointed out that they also
needed a custom rate limiter a while back, there may be others who are
either unaware of this mailing list, or have yet to look into rate
limiters altogether, who may find that whatever we have
built/modified/improved is still lacking.


>
> Let's keep the focus on improving the rate limiting in Pulsar core.
> 

Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-08 Thread Lari Hotari
Hi Girish,

> On Asaf's comment on too many public interfaces in Pulsar and no other
> Apache software having so many public interfaces - I would like to ask, has
> that brought in any con though? For this particular use case, I feel like
> having it as a public interface would actually improve the code quality
> and design as the usage would be checked and changes would go through
> scrutiny (unlike how the current PublishRateLimiter evolved unchecked).
> Asaf - what are your thoughts on this? Are you okay with making the
> PublishRateLimiter pluggable with a better interface?

As a reply to this question, I'd like to highlight my previous email
in this thread where I responded to Rajan.
The current state of rate limiting is not acceptable in Pulsar. We
need to fix things in the core.

One of the downsides of adding a lot of pluggability and variation is
the additional fragmentation and complexity that it adds to Pulsar. It
doesn't really make sense if you need to install an external library
to make the rate limiting feature usable in Pulsar. Rate limiting is
only needed when operating Pulsar at scale. The core feature must
support this to make any sense.

The side product of this could be the pluggable interface if the core
feature cannot be extended to cover the requirements that Pulsar users
really have.
Girish, it has been extremely valuable that you have shared concrete
examples of your rate limiting related requirements. I'm confident
that we will be able to cover the majority of your requirements in
Pulsar core. It might be the case that when you try out the new
features, it might actually cover your needs sufficiently.

Let's keep the focus on improving the rate limiting in Pulsar core.
The possible pluggable interface could follow.


-Lari

On Wed, 8 Nov 2023 at 10:46, Girish Sharma  wrote:
>
> Hello Rajan,
> I haven't updated the PIP with a better interface for PublishRateLimiter
> yet as the discussion here in this thread went in a different direction.
>
> Personally, I agree with you that even if we choose one algorithm and
> improve the built-in rate limiter, it still may not suit all use cases as
> you have mentioned.
>
> On Asaf's comment on too many public interfaces in Pulsar and no other
> Apache software having so many public interfaces - I would like to ask, has
> that brought in any con though? For this particular use case, I feel like
> having it as a public interface would actually improve the code quality
> and design as the usage would be checked and changes would go through
> scrutiny (unlike how the current PublishRateLimiter evolved unchecked).
> Asaf - what are your thoughts on this? Are you okay with making the
> PublishRateLimiter pluggable with a better interface?
>
>
>
>
>
> On Wed, Nov 8, 2023 at 5:43 AM Rajan Dhabalia  wrote:
>
> > Hi Lari/Girish,
> >
> > I am sorry for jumping late in the discussion but I would like to
> > acknowledge the requirement of pluggable publish rate-limiter and I had
> > also asked it during implementation of publish rate limiter as well. There
> > are trade-offs between different rate-limiter implementations based on
> > accuracy, n/w usage, simplification and user should be able to choose one
> > based on the requirement. However, we don't have correct and extensible
> > Publish rate limiter interface right now, and before making it pluggable we
> > have to make sure that it should support any type of implementation for
> > example: token based or sliding-window based throttling, support of various
> > decaying functions (eg: exponential decay:
> > https://en.wikipedia.org/wiki/Exponential_decay), etc.. I haven't seen
> > such
> > interface details and design in the PIP:
> > https://github.com/apache/pulsar/pull/21399/. So, I would encourage to
> > work
> > towards building pluggable rate-limiter but current PIP is not ready as it
> > doesn't cover such generic interfaces that can support different types of
> > implementation.
> >
> > Thanks,
> > Rajan
> >
> > On Tue, Nov 7, 2023 at 10:02 AM Lari Hotari  wrote:
> >
> > > Hi Girish,
> > >
> > > Replies inline.
> > >
> > > On Tue, 7 Nov 2023 at 15:26, Girish Sharma 
> > > wrote:
> > > >
> > > > Hello Lari, replies inline.
> > > >
> > > > I will also be going through some textbook rate limiters (the one you
> > > shared, plus others) and propose the one that at least suits our needs in
> > > the next reply.
> > >
> > >
> > > sounds good. I've been also trying to find more rate limiter resources
> > > that could be useful for our design.
> > >
> > > Bucket4J documentation gives some good ideas and it shows how the
> > > token bucket algorithm could be varied. For example, the "Refill
> > > styles" section [1] is useful to read as an inspiration.
> > > In network routers, there's a concept of "dual token bucket"
> > > algorithms and by googling you can find both Cisco and Juniper
> > > documentation referencing this.
> > > I also asked ChatGPT-4 to explain "dual token bucket" algorithm [2].

Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-08 Thread Lari Hotari
Hi Rajan,

Thank you for sharing your opinion. It appears you are in favor of
pluggable interfaces for rate limiters. I would like to offer a
perspective on why we should defer the pluggability aspect of rate
limiters. If there is a real need, it can be considered later.

Currently, there's a pressing need to enhance the built-in rate
limiters in Pulsar. The current state is not good; our default rate
limiter is nearly ineffective and unusable at scale due to its high
CPU usage and overhead. Additionally, while the precise rate limiter
doesn't have CPU issues, it does lack the capability for configuring
an average rate over an extended period.

I propose we eliminate the precise rate limiter as a distinct option
and converge towards a single, configurable rate limiter solution.
Using terms like "precise" in our configuration options exposes
unnecessary implementation details and is a practice we should avoid.

Our abstractions should be designed to allow for underlying
improvements without compromising the established "contract" of the
feature. For rate limiting, we can maintain the current external
behavior while completely overhauling the internal workings.

Introducing "bursting" features, which enable average rate
calculations over longer durations, will necessitate additional
configuration options to define this time window and possibly a
separate maximum rate.

Let's prioritize refining the core rate-limiting capabilities of
Pulsar. Afterwards, we can revisit the idea of pluggable rate limiters
if they still seem necessary.
Concentrating on Pulsar's built-in rate limiters will also help us
identify the interfaces needed for pluggable rate limiters.
Moreover, by focusing on the actual rate limiting behavior and
collating Pulsar user requirements, we can potentially design generic
solutions that address the majority of needs within Pulsar's core.

Hence, our immediate goal should be to advance these improvements and
integrate them into Pulsar's core as soon as possible.

-Lari
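
The bursting semantics sketched above (a long-term average rate combined
with a separate maximum rate) map naturally onto the "dual token bucket"
scheme used in network routers: traffic must pass both a committed-rate
bucket and a peak-rate bucket. A minimal illustration - a generic
textbook formulation with assumed names, not a proposed Pulsar
implementation:

```python
import time

class _Bucket:
    """One token bucket: refills at `rate` tokens/s up to `capacity`."""
    def __init__(self, rate, capacity, now):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.now, self.last = capacity, now, now()

    def _refill(self):
        current = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (current - self.last) * self.rate)
        self.last = current

    def peek(self, n):
        """Refill, then report availability without consuming."""
        self._refill()
        return self.tokens >= n

    def take(self, n):
        self.tokens -= n

class DualTokenBucket:
    """Permit traffic only if BOTH buckets have tokens: the committed
    (CIR) bucket bounds the long-term average, while the peak (PIR)
    bucket bounds short-term bursts."""
    def __init__(self, avg_rate, avg_burst, peak_rate, peak_burst,
                 now=time.monotonic):
        self.cir = _Bucket(avg_rate, avg_burst, now)
        self.pir = _Bucket(peak_rate, peak_burst, now)

    def try_acquire(self, n=1.0):
        # peek() does not consume, so a failure in either bucket
        # leaves the other bucket untouched.
        if self.cir.peek(n) and self.pir.peek(n):
            self.cir.take(n)
            self.pir.take(n)
            return True
        return False
```

In Pulsar terms, the CIR bucket could carry the configured average rate
over an extended window, and the PIR bucket the separate maximum rate
mentioned above.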


On Wed, 8 Nov 2023 at 02:13, Rajan Dhabalia  wrote:
>
> Hi Lari/Girish,
>
> I am sorry for jumping late in the discussion but I would like to
> acknowledge the requirement of pluggable publish rate-limiter and I had
> also asked it during implementation of publish rate limiter as well. There
> are trade-offs between different rate-limiter implementations based on
> accuracy, n/w usage, simplification and user should be able to choose one
> based on the requirement. However, we don't have correct and extensible
> Publish rate limiter interface right now, and before making it pluggable we
> have to make sure that it should support any type of implementation for
> example: token based or sliding-window based throttling, support of various
> decaying functions (eg: exponential decay:
> https://en.wikipedia.org/wiki/Exponential_decay), etc.. I haven't seen such
> interface details and design in the PIP:
> https://github.com/apache/pulsar/pull/21399/. So, I would encourage to work
> towards building pluggable rate-limiter but current PIP is not ready as it
> doesn't cover such generic interfaces that can support different types of
> implementation.
>
> Thanks,
> Rajan
>
> On Tue, Nov 7, 2023 at 10:02 AM Lari Hotari  wrote:
>
> > Hi Girish,
> >
> > Replies inline.
> >
> > On Tue, 7 Nov 2023 at 15:26, Girish Sharma 
> > wrote:
> > >
> > > Hello Lari, replies inline.
> > >
> > > I will also be going through some textbook rate limiters (the one you
> > shared, plus others) and propose the one that at least suits our needs in
> > the next reply.
> >
> >
> > sounds good. I've been also trying to find more rate limiter resources
> > that could be useful for our design.
> >
> > Bucket4J documentation gives some good ideas and it shows how the
> > token bucket algorithm could be varied. For example, the "Refill
> > styles" section [1] is useful to read as an inspiration.
> > In network routers, there's a concept of "dual token bucket"
> > algorithms and by googling you can find both Cisco and Juniper
> > documentation referencing this.
> > I also asked ChatGPT-4 to explain "dual token bucket" algorithm [2].
> >
> > 1 - https://bucket4j.com/8.6.0/toc.html#refill-types
> > 2 - https://chat.openai.com/share/d4f4f740-f675-4233-964e-2910a7c8ed24
> >
> > >>
> > >> It is bi-weekly on Thursdays. The meeting calendar, zoom link and
> > >> meeting notes can be found at
> > >> https://github.com/apache/pulsar/wiki/Community-Meetings .
> > >>
> > >
> > > Would it make sense for me to join this time given that you are skipping
> > it?
> >
> > Yes, it's worth joining regularly when one is participating in Pulsar
> > core development. There's usually a chance to discuss all topics that
> > Pulsar community members bring up to discussion. A few times there
> > haven't been any participants and in that case, it's good to ask on
> > the #dev channel on Pulsar Slack whether others are joining the
> > meeting.
> >
> > >>
> > >> ok. btw. "metrics" doesn't 

Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-08 Thread Girish Sharma
Hello Rajan,
I haven't updated the PIP with a better interface for PublishRateLimiter
yet as the discussion here in this thread went in a different direction.

Personally, I agree with you that even if we choose one algorithm and
improve the built-in rate limiter, it still may not suit all use cases as
you have mentioned.

On Asaf's comment on too many public interfaces in Pulsar and no other
Apache software having so many public interfaces - I would like to ask, has
that brought in any con though? For this particular use case, I feel like
having it as a public interface would actually improve the code quality
and design as the usage would be checked and changes would go through
scrutiny (unlike how the current PublishRateLimiter evolved unchecked).
Asaf - what are your thoughts on this? Are you okay with making the
PublishRateLimiter pluggable with a better interface?





On Wed, Nov 8, 2023 at 5:43 AM Rajan Dhabalia  wrote:

> Hi Lari/Girish,
>
> I am sorry for jumping late in the discussion but I would like to
> acknowledge the requirement of pluggable publish rate-limiter and I had
> also asked it during implementation of publish rate limiter as well. There
> are trade-offs between different rate-limiter implementations based on
> accuracy, n/w usage, simplification and user should be able to choose one
> based on the requirement. However, we don't have correct and extensible
> Publish rate limiter interface right now, and before making it pluggable we
> have to make sure that it should support any type of implementation for
> example: token based or sliding-window based throttling, support of various
> decaying functions (eg: exponential decay:
> https://en.wikipedia.org/wiki/Exponential_decay), etc.. I haven't seen
> such
> interface details and design in the PIP:
> https://github.com/apache/pulsar/pull/21399/. So, I would encourage to
> work
> towards building pluggable rate-limiter but current PIP is not ready as it
> doesn't cover such generic interfaces that can support different types of
> implementation.
>
> Thanks,
> Rajan
>
> On Tue, Nov 7, 2023 at 10:02 AM Lari Hotari  wrote:
>
> > Hi Girish,
> >
> > Replies inline.
> >
> > On Tue, 7 Nov 2023 at 15:26, Girish Sharma 
> > wrote:
> > >
> > > Hello Lari, replies inline.
> > >
> > > I will also be going through some textbook rate limiters (the one you
> > shared, plus others) and propose the one that at least suits our needs in
> > the next reply.
> >
> >
> > sounds good. I've been also trying to find more rate limiter resources
> > that could be useful for our design.
> >
> > Bucket4J documentation gives some good ideas and it shows how the
> > token bucket algorithm could be varied. For example, the "Refill
> > styles" section [1] is useful to read as an inspiration.
> > In network routers, there's a concept of "dual token bucket"
> > algorithms and by googling you can find both Cisco and Juniper
> > documentation referencing this.
> > I also asked ChatGPT-4 to explain "dual token bucket" algorithm [2].
> >
> > 1 - https://bucket4j.com/8.6.0/toc.html#refill-types
> > 2 - https://chat.openai.com/share/d4f4f740-f675-4233-964e-2910a7c8ed24
> >
> > >>
> > >> It is bi-weekly on Thursdays. The meeting calendar, zoom link and
> > >> meeting notes can be found at
> > >> https://github.com/apache/pulsar/wiki/Community-Meetings .
> > >>
> > >
> > > Would it make sense for me to join this time given that you are
> skipping
> > it?
> >
> > Yes, it's worth joining regularly when one is participating in Pulsar
> > core development. There's usually a chance to discuss all topics that
> > Pulsar community members bring up to discussion. A few times there
> > haven't been any participants and in that case, it's good to ask on
> > the #dev channel on Pulsar Slack whether others are joining the
> > meeting.
> >
> > >>
> > >> ok. btw. "metrics" doesn't necessarily mean providing the rate limiter
> > >> metrics via Prometheus. There might be other ways to provide this
> > >> information for components that could react to this.
> > >> For example, it could a be system topic where these rate limiters emit
> > events.
> > >>
> > >
> > > Are there any other system topics than
> > `tenent/namespace/__change_events` . While it's an improvement over
> > querying metrics, it would still mean one consumer per namespace and
> would
> > form a cyclic dependency - for example, in case a broker is degrading due
> > to mis-use of bursting, it might lead to delays in the consumption of the
> > event from the __change_events topic.
> >
> > The number of events is a fairly low volume so the possible
> > degradation wouldn't necessarily impact the event communications for
> > this purpose. I'm not sure if there are currently any public event
> > topics in Pulsar that are supported in a way that an application could
> > directly read from the topic.  However since this is an advanced case,
> > I would assume that we don't need to focus on this in the first phase
> > of