Replies inline.

On Sat, Nov 4, 2023 at 8:55 PM Lari Hotari <lhot...@apache.org> wrote:

>
> One possibility would be to improve the existing rate limiter to allow
> bursting.
> I think that Pulsar's out-of-the-box rate limiter should cover 99% of the
> use cases instead of having one implementing their own rate limiter
> algorithm.
>

There are challenges with this. As explained in the PIP, there are several
different usages of the rate limiter, stats, unloading, etc. While I am open
to having a burstable rate limiter in Pulsar out of the box, it might
complicate things, considering backward compatibility etc. More on this below.


> The problems you are describing seem to be common to many Pulsar use cases,
> and therefore, I think they should be handled directly in Pulsar.
>

I personally haven't seen many burstability-related discussions, so this
feature might not actually be that useful for all current Pulsar users.


>
> Optimally, there would be a single solution that abstracts the rate
> limiting in a way where it does the right thing based on the declarative
> configuration.
> I would prefer that over having a pluggable solution for rate limiter
> implementations.
>
> What would help is getting deeper in the design of the rate limiter itself,
> without limiting ourselves to the existing rate limiter implementation in
> Pulsar.
>

I would personally suggest we tackle this problem in parts so that it
becomes available incrementally across versions, rather than making the
scope so big that it takes Pulsar 4.0 for these features to land.


>
> In textbooks, there are algorithms such as "leaky bucket" [1] and "token
> bucket" [2]. Both algorithms have several variations and in some ways they
> are very similar algorithms but looking from the different point of view.
> It would possibly be easier to conceptualize and understand a rate limiting
> algorithm if common algorithm names and implementation choices mentioned in
> textbooks would be referenced in the implementation.
> It seems that a "token bucket" type of algorithm can be used to implement
> rate limiting with bursting. In the token bucket algorithm, the size of the
> token bucket defines how large bursts will be allowed. The design could
> also be something where 2 rate limiters with different type of algorithms
> and/or configuration parameters are combined to achieve a desired behavior.
> For example, to achieve a rate limiter with bursting and a fixed maximum
> rate.
> By default, the token bucket algorithm doesn't enforce a maximum rate for
> bursts, but that could be achieved by chaining 2 rate limiters if that is
> really needed.
>

Yes, we were also thinking along the same lines once this is pluggable. The
idea was to have some numbers and real-world usage backing a rate limiter
implementation before merging it back into Pulsar. Any decision we take
right now would be based only on a theoretical discussion of the
implementation and our assumption that it covers 99% of the use cases,
probably just like how the precise and poller ones came into being.
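
To make the quoted token bucket description concrete, here is a minimal
sketch of the idea, not a proposal for the actual Pulsar code. The class
name TokenBucket, its fields, and the tryAcquire signature are all
hypothetical; the caller passes in the current time so the behavior is easy
to reason about. The bucket capacity bounds the burst size, and tokens are
refilled lazily from elapsed time, so no scheduler is involved.

```java
/**
 * Illustrative token bucket (hypothetical names, not Pulsar's RateLimiter).
 * Tokens accumulate at ratePerSecond up to capacity; each acquire consumes
 * tokens. The refill is computed lazily from the elapsed time on every call.
 */
final class TokenBucket {
    private final double ratePerSecond; // sustained rate: tokens added per second
    private final double capacity;      // bucket size: the largest burst allowed
    private double tokens;              // tokens currently available
    private long lastRefillNanos;       // time of the last refill

    TokenBucket(long ratePerSecond, long capacity, long nowNanos) {
        if (ratePerSecond <= 0 || capacity <= 0) {
            throw new IllegalArgumentException("rate and capacity must be positive");
        }
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.tokens = capacity;         // start full, so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Returns true and consumes the tokens if enough are available. */
    synchronized boolean tryAcquire(long permits, long nowNanos) {
        if (permits <= 0) {
            throw new IllegalArgumentException("permits must be positive");
        }
        refill(nowNanos);
        if (permits > tokens) {
            return false;               // not enough tokens: caller must throttle
        }
        tokens -= permits;
        return true;
    }

    /** Adds the tokens earned since the last call, capped at capacity. */
    private void refill(long nowNanos) {
        long elapsedNanos = nowNanos - lastRefillNanos;
        if (elapsedNanos <= 0) {
            return;                     // no time has passed (or clock went backwards)
        }
        tokens = Math.min(capacity, tokens + elapsedNanos * ratePerSecond / 1e9);
        lastRefillNanos = nowNanos;
    }
}
```

With a rate of 10 and a capacity of 100, a full bucket lets a burst of 100
through at once and then settles to 10 per second. A fixed maximum rate for
bursts could be approximated by requiring a second bucket with a higher rate
and a small capacity to also grant the permits, as the quoted text suggests.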


>
> The current Pulsar rate limiter implementation could be implemented in a
> cleaner way, which would also be more efficient. Instead of having a
> scheduler call a method once per second, I think that the rate limiter
> could be implemented in a reactive way where the algorithm is implemented
> without a scheduler.
> I wonder if there are others that would be interested in getting down into
> such implementation details?
>

That would mean touching the RateLimiter.java class, while my goal is to
improve/touch the outer classes, mainly around the PublishRateLimiter.java
interface.

With this discussion taking a much bigger turn, how or where do we limit
this PIP's scope? I am happy to work on follow-up PIPs that may come out
of this discussion.


> Are you able to slow down producing on the client side? If that is
> possible, there could be ways to improve ways to do client side back
> pressure with Pulsar Client. Currently, the client doesn't expose this
> information until the sending blocks or fails with an exception
> (ProducerQueueIsFullError). Optimally, the client should slow down the rate
> of producing to the rate that it can actually send to the broker.
> Just curious if you have considered turning off producing timeouts on the
> client side completely or making them longer? Would that address the data
> loss problem?
> Or is your event/message source "hot" so that you cannot stop or slow it
> down, and it will just keep on flowing with a certain rate?
>

Mostly, the sources are hot sources and can't be slowed down. The lack of a
clear error message (a client-server protocol limitation) is another
issue that I was planning to tackle in a separate PIP.


> Yes, it makes sense to have bursting configuration parameters in the rate
> limiter.
> As mentioned above, I think we could be improving the existing rate limiter
> in Pulsar to cover 99% of the use case by making it stable and by including
> the bursting configuration options.
> Is there additional functionality you feel the rate limiter needs beyond
> bursting support?
>
>

There are a few other custom things. For example, there would be cases of
short- and medium-term blacklisting of topics based on breaches of the rate
limit beyond a given SOP. I feel this is too specific to our organization
right now to be included in Pulsar itself.


> One way to workaround the multiplexing problem would be to add a client
> side option for producers and consumers, where you could specify that the
> client picks a separate TCP/IP connection that is not shared and isn't from
> the connection pool.
> Preventing connection multiplexing seems to be the only way to make the
> current rate limiting deterministic and stable without adding the explicit
> flow control to the Pulsar binary protocol for producers.
>

Actually, by default, and for 99% of the cases, multiplexing isn't an issue,
assuming:
* A single producer object is producing to a single topic (one or more
partitions)
* Producing happens in a round-robin manner (the default)

Given these assumptions, it is more than likely that all partitions see
uniform QPS and MBps, so turning auto-read off at the Netty layer doesn't
have that drastic an impact on the rate limiting aspect.



>
> Are there other community members with input on the design and
> implementation of an improved rate limiter?
> I’m eager to continue this conversation and work together towards a robust
> solution.
>

Again, I would love this to land in pieces so that the turnaround time (TAT)
for actual usage is much shorter. What do you suggest from that perspective?


>
> -Lari
>


-- 
Girish Sharma
