Replies inline

On Fri, 3 Nov 2023 at 20:48, Girish Sharma <scrapmachi...@gmail.com> wrote:

> Could you please elaborate more on these details? Here are some questions:
> > 1. What do you mean that it is too strict?
> >     - Should the rate limiting allow bursting over the limit for some
> >       time?
> >
>
> That's one of the major use cases, yes.
>

One possibility would be to improve the existing rate limiter to allow
bursting.
I think that Pulsar's out-of-the-box rate limiter should cover 99% of the
use cases, so that users don't have to implement their own rate limiter
algorithm.
The problems you are describing seem to be common to many Pulsar use cases,
and therefore, I think they should be handled directly in Pulsar.

Optimally, there would be a single solution that abstracts the rate
limiting in a way where it does the right thing based on the declarative
configuration.
I would prefer that over having a pluggable solution for rate limiter
implementations.

What would help is getting deeper in the design of the rate limiter itself,
without limiting ourselves to the existing rate limiter implementation in
Pulsar.

Textbooks describe algorithms such as "leaky bucket" [1] and "token
bucket" [2]. Both algorithms have several variations, and in some ways they
are very similar algorithms viewed from different angles.
A rate limiting algorithm would probably be easier to conceptualize and
understand if the implementation referenced the common algorithm names and
implementation choices found in textbooks.
It seems that a "token bucket" type of algorithm can be used to implement
rate limiting with bursting. In the token bucket algorithm, the size of the
token bucket defines how large a burst is allowed. The design could
also combine 2 rate limiters with different algorithms and/or
configuration parameters to achieve a desired behavior, for example a rate
limiter with bursting and a fixed maximum rate.
By default, the token bucket algorithm doesn't enforce a maximum rate for
bursts, but that could be achieved by chaining 2 rate limiters if that is
really needed.
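
To make the token bucket idea concrete, here is a minimal sketch of a
bucket with bursting and of chaining 2 buckets to cap the burst rate. This
is not Pulsar code: the class names (TokenBucket, ChainedLimiter), the
method names and the parameters are all made up for illustration, and time
is passed in explicitly to keep the example deterministic.

```java
/** Hypothetical token bucket; the capacity bounds how large a burst can be. */
final class TokenBucket {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;
    private final long capacity;       // maximum tokens held, i.e. the burst size
    private final long ratePerSecond;  // tokens added per second
    private long tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long ratePerSecond, long nowNanos) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity;        // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    // Adds the tokens earned since the last refill, capped at the capacity.
    private void refill(long nowNanos) {
        long earned = (nowNanos - lastRefillNanos) * ratePerSecond / NANOS_PER_SECOND;
        if (earned > 0) {
            tokens = Math.min(capacity, tokens + earned);
            // Advance only by the time that produced whole tokens.
            lastRefillNanos += earned * NANOS_PER_SECOND / ratePerSecond;
        }
    }

    synchronized long available(long nowNanos) {
        refill(nowNanos);
        return tokens;
    }

    synchronized boolean tryAcquire(long permits, long nowNanos) {
        refill(nowNanos);
        if (tokens < permits) {
            return false;
        }
        tokens -= permits;
        return true;
    }
}

/** Hypothetical chain: a request passes only if both buckets have enough tokens. */
final class ChainedLimiter {
    private final TokenBucket sustained; // large bucket: long-term rate plus burst budget
    private final TokenBucket peak;      // small bucket: caps the rate while bursting

    ChainedLimiter(TokenBucket sustained, TokenBucket peak) {
        this.sustained = sustained;
        this.peak = peak;
    }

    // Assumes the two buckets are only ever used through this chain.
    synchronized boolean tryAcquire(long permits, long nowNanos) {
        if (sustained.available(nowNanos) < permits || peak.available(nowNanos) < permits) {
            return false;
        }
        sustained.tryAcquire(permits, nowNanos);
        peak.tryAcquire(permits, nowNanos);
        return true;
    }
}
```

With a sustained bucket of 100 tokens refilled at 10/s and a peak bucket of
20 tokens refilled at 20/s, a client can spend its burst budget of 100, but
never faster than about 20 per second.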

The current Pulsar rate limiter could be implemented in a cleaner and more
efficient way. Instead of having a scheduler call a method once per second,
I think that the rate limiter could be implemented in a reactive way,
without a scheduler.
I wonder if there are others who would be interested in getting down into
such implementation details?
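
As one possible reading of "reactive, without a scheduler", here is a rough
sketch where the token balance is recomputed from the elapsed time on every
call and the caller is told how long to pause. The class name
(ReactiveRateLimiter), the method name and the idea of letting the balance
go negative are my own assumptions for illustration, not the existing
Pulsar implementation.

```java
/** Hypothetical scheduler-free limiter: state is updated only when it is called. */
final class ReactiveRateLimiter {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;
    private final long ratePerSecond;  // tokens earned per second
    private final long burstTokens;    // upper bound for the token balance
    private long tokens;               // may go negative; the debt is repaid by elapsed time
    private long lastUpdateNanos;

    ReactiveRateLimiter(long ratePerSecond, long burstTokens, long nowNanos) {
        this.ratePerSecond = ratePerSecond;
        this.burstTokens = burstTokens;
        this.tokens = burstTokens;
        this.lastUpdateNanos = nowNanos;
    }

    /**
     * Records that permits were consumed and returns how many nanoseconds
     * the caller should pause before continuing (0 means no pause).
     */
    synchronized long consumeAndGetPauseNanos(long permits, long nowNanos) {
        // Credit the tokens earned since the last call; no background task is needed.
        long earned = (nowNanos - lastUpdateNanos) * ratePerSecond / NANOS_PER_SECOND;
        if (earned > 0) {
            tokens = Math.min(burstTokens, tokens + earned);
            lastUpdateNanos += earned * NANOS_PER_SECOND / ratePerSecond;
        }
        tokens -= permits;
        if (tokens >= 0) {
            return 0;
        }
        // Time needed to earn back the deficit. This ignores the fraction of a
        // token already accrued, so it can slightly overestimate.
        return -tokens * NANOS_PER_SECOND / ratePerSecond;
    }
}
```

The caller decides what "pause" means, for example delaying the next read
for the returned duration. In real code the time would come from
System.nanoTime().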

1 - https://en.wikipedia.org/wiki/Leaky_bucket
2 - https://en.wikipedia.org/wiki/Token_bucket


> 2. What type of data loss are you experiencing?
> >
>
> Messages produced by the producers which eventually get timed out due to
> rate limiting.
>

Are you able to slow down producing on the client side? If that is
possible, there could be ways to improve client-side back pressure in the
Pulsar client. Currently, the client doesn't expose this information until
sending blocks or fails with an exception (ProducerQueueIsFullError).
Optimally, the client should slow down the rate of producing to the rate
that it can actually send to the broker.
Just curious: have you considered turning off producer send timeouts on the
client side completely, or making them longer? Would that address the data
loss problem?
Or is your event/message source "hot" so that you cannot stop or slow it
down, and it will just keep flowing at a certain rate?
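
To illustrate what I mean by client-side back pressure, here is a small
sketch that bounds the number of in-flight asynchronous sends with a
semaphore, so that the producing thread slows down to the rate at which
sends complete. It has no Pulsar dependency: BoundedAsyncSender is a
made-up name, and the sendAsync function is a stand-in for whatever
asynchronous send call the application uses.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Hypothetical wrapper that blocks the caller once too many sends are pending. */
final class BoundedAsyncSender<T, R> {
    private final Semaphore inFlight;
    private final Function<T, CompletableFuture<R>> sendAsync;

    BoundedAsyncSender(int maxInFlight, Function<T, CompletableFuture<R>> sendAsync) {
        this.inFlight = new Semaphore(maxInFlight);
        this.sendAsync = sendAsync;
    }

    CompletableFuture<R> send(T message) {
        // Blocks while maxInFlight sends are still waiting for completion.
        inFlight.acquireUninterruptibly();
        try {
            // Free the slot when the send completes, successfully or not.
            return sendAsync.apply(message).whenComplete((result, error) -> inFlight.release());
        } catch (RuntimeException e) {
            inFlight.release(); // the send never started, so give the slot back
            throw e;
        }
    }

    int availableSlots() {
        return inFlight.availablePermits();
    }
}
```

This only helps when the source can be slowed down; with a "hot" source the
blocked messages have to be buffered or dropped somewhere else.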

> I think the core implementation of how the broker fails fast at the time
> of rate limiting (whether it is by pausing netty channel or a new permits
> based model) does not change the actual issue I am targeting. Multiplexing
> has some impact on it - but yet again only limited, and can easily be fixed
> by the client by increasing the connections per broker. Even after assuming
> both these things are somehow "fixed", the fact remains that an absolutely
> strict rate limiter will lead to the above mentioned data loss for burst
> going above the limit and that a poller based rate limiter doesn't really
> rate limit anything as it allows all produce in the first interval of the
> next second.
>

Yes, it makes sense to have bursting configuration parameters in the rate
limiter.
As mentioned above, I think we could improve the existing rate limiter in
Pulsar to cover 99% of the use cases by making it stable and by including
bursting configuration options.
Is there additional functionality you feel the rate limiter needs beyond
bursting support?

One way to work around the multiplexing problem would be to add a
client-side option for producers and consumers that makes the client use a
dedicated TCP/IP connection that is not shared and doesn't come from the
connection pool.
Preventing connection multiplexing seems to be the only way to make the
current rate limiting deterministic and stable without adding explicit
flow control to the Pulsar binary protocol for producers.

Are there other community members with input on the design and
implementation of an improved rate limiter?
I’m eager to continue this conversation and work together towards a robust
solution.

-Lari
