Hello Lari, while I am yet to reply to your email from yesterday, I am trying
to wrap up this discussion about the need for a pluggable rate limiter, so I
am replying to this first.

Comments inline


On Wed, Nov 8, 2023 at 5:35 PM Lari Hotari <lhot...@apache.org> wrote:

> Hi Girish,
>
> > On Asaf's comment on too many public interfaces in Pulsar and no other
> > Apache software having so many public interfaces - I would like to ask,
> has
> > that brought in any con though? For this particular use case, I feel like
> > having it has a public interface would actually improve the code quality
> > and design as the usage would be checked and changes would go through
> > scrutiny (unlike how the current PublishRateLimiter evolved unchecked).
> > Asaf - what are your thoughts on this? Are you okay with making the
> > PublishRateLimiter pluggable with a better interface?
>
> As a reply to this question, I'd like to highlight my previous email
> in this thread where I responded to Rajan.
> The current state of rate limiting is not acceptable in Pulsar. We
> need to fix things in the core.
>

I wouldn't say that it's not acceptable. The precise rate limiter works as
expected as a basic rate limiter. It's only when there are complex
requirements that the current rate limiters fall short.
The key phrase here is "complex requirements". I am with Rajan here that
whatever improvements we make to the core built-in rate limiter would always
miss one complex requirement or another.

I feel like we need both things here. We can work on improving the built-in
rate limiter, which does not need to solve all of my requirements like
blacklisting support, limiting the number of bursts in a window, etc. The
built-in one can be improved with respect to code refactoring, extensibility,
modularization, etc., along with implementing a more well-known rate limiting
algorithm, like token bucket.
Along with this, the newly improved interface can then make way for pluggable
support.
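
To make the token bucket direction concrete, here is a rough, hypothetical
sketch of what the core accounting could look like (all class and method names
are made up for illustration; this is not existing Pulsar code). Tokens are
refilled lazily based on elapsed time, and the count is allowed to go negative
so that messages already in flight are still handled while producers get
backpressured:

/**
 * Minimal, illustrative token bucket; not the actual Pulsar implementation.
 * Tokens refill lazily based on elapsed time; the count may go negative,
 * which signals that producers should be backpressured until it recovers.
 */
public class SimpleTokenBucket {
    private final long capacity;        // max tokens that can be buffered
    private final long ratePerSecond;   // average fill rate
    private long tokens;
    private long lastRefillNanos;

    public SimpleTokenBucket(long capacity, long ratePerSecond) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    /** Consume tokens for a batch; returns false if producers should be paused. */
    public synchronized boolean tryConsume(long permits) {
        refill();
        tokens -= permits; // may go negative: already-arrived messages are still handled
        return tokens >= 0;
    }

    private void refill() {
        long now = System.nanoTime();
        long newTokens = (now - lastRefillNanos) * ratePerSecond / 1_000_000_000L;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            lastRefillNanos = now;
        }
    }
}

A bursting policy like "1.5x for up to 10 minutes, at most once per hour"
could then be layered on top as a second bucket, along the lines of the dual
token bucket idea discussed further down in this thread.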


>
> One of the downsides of adding a lot of pluggability and variation is
> the additional fragmentation and complexity that it adds to Pulsar. It
> doesn't really make sense if you need to install an external library
> to make the rate limiting feature usable in Pulsar. Rate limiting is
>

This assumes that we do not improve the default built-in rate limiter at all.
Think of this like AuthorizationProvider - there is a built-in one and it's
good enough, but each organization has its own requirements for how it handles
authorization, and thus, most likely, any organization with well-defined
AuthN/AuthZ constructs would plug in its own provider there.

The key thing here would be to make the public interface as minimal as
possible while still allowing for custom, complex rate limiters. I believe
this is easily doable without actually making the internal code more complex.
In fact, the way I envision the pluggable proposal, things become simpler with
respect to code flow, code ownership and custom if/else branches.
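
To illustrate what I mean by minimal, a hypothetical sketch of the surface
could look something like this (method names and shapes are purely for
discussion, not the actual proposal):

/**
 * Hypothetical minimal contract for a pluggable publish rate limiter.
 * The shape is illustrative only; the PIP would define the real contract.
 */
public interface PublishRateLimiter extends AutoCloseable {

    /** Record an incoming produce request of the given size for this topic. */
    void handlePublish(int numMessages, long numBytes);

    /** Whether producers on this topic should currently be throttled (auto-read paused). */
    boolean isPublishRateExceeded();

    /** Invoked when the configured rates change for the topic or resource group. */
    void update(long maxMsgRatePerSec, long maxBytesPerSec);

    @Override
    void close();
}

A broker.conf key - hypothetically something like
publishRateLimiterFactoryClassName - could then select the implementation,
mirroring how authorizationProvider is wired today.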



> only needed when operating Pulsar at scale. The core feature must
> support this to make any sense.
>

I will give another example here. In big organizations where Pulsar is
actually being used at scale - not just in terms of QPS/MBps, but also in
terms of the number of teams, tenants, namespaces, and unique features being
used - there is almost always an in-house schema registry. Thus, while Pulsar
already has a built-in schema service and registry, it is important that it
also supports custom ones. This does not speak badly of the base Pulsar
package; it actually speaks to the adaptability of the product.


>
> The side product of this could be the pluggable interface if the core
> feature cannot be extended to cover the requirements that Pulsar users
> really have.
> Girish, it has been extremely valuable that you have shared concrete
> examples of your rate limiting related requirements. I'm confident
> that we will be able to cover the majority of your requirements in
> Pulsar core. It might be the case that when you try out the new
> features, it might actually cover your needs sufficiently.
>

I really respect and appreciate the discussions we have had. One of the
problems I've had in the Pulsar community earlier is the lack of
participation, but I am getting a lot of participation this time, so it's
really good.

I am willing to take this forward to improve the default rate limiters, but
since we would _have_ to meet somewhere in the middle, at the end of all this
our organizational requirements would still remain unfulfilled until we build
_all_ of the things that I have spoken about.

Just as Rajan was late to the discussion and pointed out that they also needed
a custom rate limiter a while back, there may be others who are either unaware
of this mailing list, or are yet to look into rate limiting altogether, who
may find that whatever we have built/modified/improved is still lacking.


>
> Let's keep the focus on improving the rate limiting in Pulsar core.
> The possible pluggable interface could follow.
>

I personally would like to work on improving the interface and making it
pluggable first. This is a smaller task from both a design and a coding
perspective. Meanwhile, I will create another proposal for improving the
built-in rate limiter. Since we have had a lot of discussion about how to
improve the rate limiter, and I will continue discussing which rate limiter
works best in my opinion, I think we can take the liberty of spending a bit of
extra time discussing and closing the improved rate limiter design. Of course,
I will keep the interface definition in mind while proposing the improved rate
limiter, and vice versa.
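
To show how the two proposals could fit together, the built-in token bucket
sketched earlier could simply become the default implementation behind the
minimal interface (again, purely hypothetical naming):

/**
 * Hypothetical default implementation: the token bucket sketch above plugged
 * into the minimal interface sketch. Illustrative only.
 */
public class TokenBucketPublishRateLimiter implements PublishRateLimiter {

    private volatile SimpleTokenBucket msgBucket;
    private volatile SimpleTokenBucket byteBucket;
    private volatile boolean exceeded;

    public TokenBucketPublishRateLimiter(long maxMsgRatePerSec, long maxBytesPerSec) {
        update(maxMsgRatePerSec, maxBytesPerSec);
    }

    @Override
    public void handlePublish(int numMessages, long numBytes) {
        // consume from both buckets; throttle if either one is exhausted
        boolean msgsOk = msgBucket.tryConsume(numMessages);
        boolean bytesOk = byteBucket.tryConsume(numBytes);
        exceeded = !(msgsOk && bytesOk);
    }

    @Override
    public boolean isPublishRateExceeded() {
        return exceeded;
    }

    @Override
    public void update(long maxMsgRatePerSec, long maxBytesPerSec) {
        // capacity of one second's worth of tokens in this sketch; a bursting
        // variant would size the capacity (or add a second bucket) differently
        msgBucket = new SimpleTokenBucket(maxMsgRatePerSec, maxMsgRatePerSec);
        byteBucket = new SimpleTokenBucket(maxBytesPerSec, maxBytesPerSec);
    }

    @Override
    public void close() {
        // nothing to release in this sketch
    }
}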

What are your thoughts here?


>
>
> -Lari
>
> On Wed, 8 Nov 2023 at 10:46, Girish Sharma <scrapmachi...@gmail.com>
> wrote:
> >
> > Hello Rajan,
> > I haven't updated the PIP with a better interface for PublishRateLimiter
> > yet as the discussion here in this thread went in a different direction.
> >
> > Personally, I agree with you that even if we choose one algorithm and
> > improve the built-in rate limiter, it still may not suit all use cases as
> > you have mentioned.
> >
> > On Asaf's comment on too many public interfaces in Pulsar and no other
> > Apache software having so many public interfaces - I would like to ask,
> has
> > that brought in any con though? For this particular use case, I feel like
> > having it has a public interface would actually improve the code quality
> > and design as the usage would be checked and changes would go through
> > scrutiny (unlike how the current PublishRateLimiter evolved unchecked).
> > Asaf - what are your thoughts on this? Are you okay with making the
> > PublishRateLimiter pluggable with a better interface?
> >
> >
> >
> >
> >
> > On Wed, Nov 8, 2023 at 5:43 AM Rajan Dhabalia <rdhaba...@apache.org>
> wrote:
> >
> > > Hi Lari/Girish,
> > >
> > > I am sorry for jumping late in the discussion but I would like to
> > > acknowledge the requirement of pluggable publish rate-limiter and I had
> > > also asked it during implementation of publish rate limiter as well.
> There
> > > are trade-offs between different rate-limiter implementations based on
> > > accuracy, n/w usage, simplification and user should be able to choose
> one
> > > based on the requirement. However, we don't have correct and extensible
> > > Publish rate limiter interface right now, and before making it
> pluggable we
> > > have to make sure that it should support any type of implementation for
> > > example: token based or sliding-window based throttling, support of
> various
> > > decaying functions (eg: exponential decay:
> > > https://en.wikipedia.org/wiki/Exponential_decay), etc.. I haven't seen
> > > such
> > > interface details and design in the PIP:
> > > https://github.com/apache/pulsar/pull/21399/. So, I would encourage to
> > > work
> > > towards building pluggable rate-limiter but current PIP is not ready
> as it
> > > doesn't cover such generic interfaces that can support different types
> of
> > > implementation.
> > >
> > > Thanks,
> > > Rajan
> > >
> > > On Tue, Nov 7, 2023 at 10:02 AM Lari Hotari <lhot...@apache.org>
> wrote:
> > >
> > > > Hi Girish,
> > > >
> > > > Replies inline.
> > > >
> > > > On Tue, 7 Nov 2023 at 15:26, Girish Sharma <scrapmachi...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hello Lari, replies inline.
> > > > >
> > > > > I will also be going through some textbook rate limiters (the one
> you
> > > > shared, plus others) and propose the one that at least suits our
> needs in
> > > > the next reply.
> > > >
> > > >
> > > > sounds good. I've been also trying to find more rate limiter
> resources
> > > > that could be useful for our design.
> > > >
> > > > Bucket4J documentation gives some good ideas and it's shows how the
> > > > token bucket algorithm could be varied. For example, the "Refill
> > > > styles" section [1] is useful to read as an inspiration.
> > > > In network routers, there's a concept of "dual token bucket"
> > > > algorithms and by googling you can find both Cisco and Juniper
> > > > documentation referencing this.
> > > > I also asked ChatGPT-4 to explain "dual token bucket" algorithm [2].
> > > >
> > > > 1 - https://bucket4j.com/8.6.0/toc.html#refill-types
> > > > 2 -
> https://chat.openai.com/share/d4f4f740-f675-4233-964e-2910a7c8ed24
> > > >
> > > > >>
> > > > >> It is bi-weekly on Thursdays. The meeting calendar, zoom link and
> > > > >> meeting notes can be found at
> > > > >> https://github.com/apache/pulsar/wiki/Community-Meetings .
> > > > >>
> > > > >
> > > > > Would it make sense for me to join this time given that you are
> > > skipping
> > > > it?
> > > >
> > > > Yes, it's worth joining regularly when one is participating in Pulsar
> > > > core development. There's usually a chance to discuss all topics that
> > > > Pulsar community members bring up to discussion. A few times there
> > > > haven't been any participants and in that case, it's good to ask on
> > > > the #dev channel on Pulsar Slack whether others are joining the
> > > > meeting.
> > > >
> > > > >>
> > > > >> ok. btw. "metrics" doesn't necessarily mean providing the rate
> limiter
> > > > >> metrics via Prometheus. There might be other ways to provide this
> > > > >> information for components that could react to this.
> > > > >> For example, it could a be system topic where these rate limiters
> emit
> > > > events.
> > > > >>
> > > > >
> > > > > Are there any other system topics than
> > > > `tenent/namespace/__change_events` . While it's an improvement over
> > > > querying metrics, it would still mean one consumer per namespace and
> > > would
> > > > form a cyclic dependency - for example, in case a broker is
> degrading due
> > > > to mis-use of bursting, it might lead to delays in the consumption
> of the
> > > > event from the __change_events topic.
> > > >
> > > > The number of events is a fairly low volume so the possible
> > > > degradation wouldn't necessarily impact the event communications for
> > > > this purpose. I'm not sure if there are currently any public event
> > > > topics in Pulsar that are supported in a way that an application
> could
> > > > directly read from the topic.  However since this is an advanced
> case,
> > > > I would assume that we don't need to focus on this in the first phase
> > > > of improving rate limiters.
> > > >
> > > > >
> > > > >
> > > > >> I agree. I just brought up this example to ensure that your
> > > > >> expectation about bursting isn't about controlling the rate limits
> > > > >> based on situational information, such as end-to-end latency
> > > > >> information.
> > > > >> Such a feature could be useful, but it does complicate things.
> > > > >> However, I think it's good to keep this on the radar since this
> might
> > > > >> be needed to solve some advanced use cases.
> > > > >>
> > > > >
> > > > > I still envision auto-scaling to be admin API driven rather than
> > > produce
> > > > throughput driven. That way, it remains deterministic in nature. But
> it
> > > > probably doesn't make sense to even talk about it until (partition)
> > > > scale-down is possible.
> > > >
> > > >
> > > > I agree. It's better to exclude this from the discussion at the
> moment
> > > > so that it doesn't cause confusion. Modifying the rate limits up and
> > > > down automatically with some component could be considered out of
> > > > scope in the first phase. However, that might be controversial since
> > > > the externally observable behavior of rate limiting with bursting
> > > > support seems to be behave in a way where the rate limit changes
> > > > automatically. The "auto scaling" aspect of rate limiting, modifying
> > > > the rate limits up and down, might be necessary as part of a rate
> > > > limiter implementation eventually. More of that later.
> > > >
> > > > >
> > > > > In all of the 3 cases that I listed, the current behavior, with
> precise
> > > > rate limiting enabled, is to pause the netty channel in case the
> > > throughput
> > > > breaches the set limits. This eventually leads to timeout at the
> client
> > > > side in case the burst is significantly greater than the configured
> > > timeout
> > > > on the producer side.
> > > >
> > > >
> > > > Makes sense.
> > > >
> > > > >
> > > > > The desired behavior in all three situations is to have a
> multiplier
> > > > based bursting capability for a fixed duration. For example, it
> could be
> > > > that a pulsar topic would be able to support 1.5x of the set quota
> for a
> > > > burst duration of up to 5 minutes. There also needs to be a cooldown
> > > period
> > > > in such a case that it would only accept one such burst every X
> minutes,
> > > > say every 1 hour.
> > > >
> > > >
> > > > Thanks for sharing this concrete example. It will help a lot when
> > > > starting to design a solution which achieves this. I think that a
> > > > token bucket algorithm based design can achieve something very close
> > > > to what you are describing. In the first part I shared the references
> > > > to Bucket4J's "refill styles" and the concept of "dual token bucket"
> > > > algorithms.
> > > > One possible solution is a dual token bucket where the first bucket
> > > > takes care of the burst of up to 5 minutes. The second token bucket
> > > > can cover the maximum rate of 1.5x. The cooldown period could be
> > > > handled in a way where the token bucket fills up the bucket once in
> > > > every 60 minutes. There could be implementation challenges related to
> > > > adjusting the enforced rate up and down since as you had also
> > > > observed, the token bucket algorithm will let all buffered tokens be
> > > > used.
> > > >
> > > > >
> > > > >
> > > > >>
> > > > >>
> > > > >> >    - In a visitor based produce rate (where produce rate goes
> up in
> > > > the day
> > > > >> >    and goes down in the night, think in terms of popular website
> > > > hourly view
> > > > >> >    counts pattern) , there are cases when, due to certain
> > > > external/internal
> > > > >> >    triggers, the views - and thus - the produce rate spikes for
> a
> > > few
> > > > minutes.
> > > > >>
> > > > >> Again, please explain the current behavior and desired behavior.
> > > > >> Explicit example values of number of messages, bandwidth, etc.
> would
> > > > >> also be helpful details.
> > > > >
> > > > >
> > > > > Adding to what I wrote above, think of this pattern like the
> following:
> > > > the produce rate slowly increases from ~2MBps at around 4 AM to a
> known
> > > > peak of about 30MBps by 4 PM and stays around that peak until 9 PM
> after
> > > > which is again starts decreasing until it reaches ~2MBps around 2 AM.
> > > > > Now, due to some external triggers, maybe a scheduled sale event,
> at
> > > > 10PM, the quota may spike up to 40MBps for 4-5 minutes and then
> again go
> > > > back down to the usual ~20MBps . Here is a rough image showcasing the
> > > trend.
> > > > >
> > > > >
> > > > >>
> > > > >>
> > > > >> >    It is also important to keep this in check so as to not allow
> > > bots
> > > > to do
> > > > >> >    DDOS into your system, while that might be a responsibility
> of an
> > > > upstream
> > > > >> >    system like API gateway, but we cannot be ignorant about that
> > > > completely.
> > > > >>
> > > > >> what would be the desired behavior?
> > > > >
> > > > >
> > > > > The desired behavior is that the burst support should be short
> lived
> > > > (5-10 minutes) and limited to a fixed number of bursts in a duration
> > > (say -
> > > > 1 burst per hour). Obviously, all of these should be configurable,
> maybe
> > > at
> > > > a broker level and not a topic level.
> > > >
> > > >
> > > > That can be handled with a dual token bucket algorithm based rate
> > > limiter.
> > > >
> > > > >
> > > > > I have yet to go in depth on both the textbook rate limiter
> models, but
> > > > if token bucket works on buffered tokens, it may not be much useful.
> > > > Consider a situation where the throughput is always at its absolute
> > > > configured peak, thus not reserving tokens at all. Again - this is
> the
> > > > understanding based on the above text. I am yet to go through the
> wiki
> > > page
> > > > with full details. In any case, I am very sure that an absolute
> maximum
> > > > bursting limit is needed per topic, since at the end of the day, we
> are
> > > > limited by hardware resources like network bandwidth, disk bandwidth
> etc.
> > > >
> > > >
> > > > Yes, if the actual throughput is always at the rate limiter's maximum
> > > > average rate, the tokens wouldn't be reserved at all in the default
> > > > token bucket algorithm. Is that a problem?
> > > > It might not be a problem at all. Spare tokens would be buffered
> > > > whenever the throughput is lower and therefore bursting could help
> > > > catch up to the average rate over a longer period of time.
> > > >
> > > > Btw. The challenge of no bursting when the actual throughput is
> > > > already at the peak rate is the original reason why I introduced the
> > > > concept of automatically adjusting the rate limit up and down based
> on
> > > > end-to-end latency information. If the rate limiter had a feedback
> > > > loop, it could know whether it needs to do bursting to prevent the
> > > > latencies from increasing. That was one of the reasons why I was
> > > > thinking about the automatic adjustments of the rate limit. However,
> > > > it's better to keep that out of the first phase for now.
> > > >
> > > > I believe that we will find many ways to apply the token bucket
> > > > algorithm to meet our requirements. A lot of flexibility can be added
> > > > with dual token buckets and by introducing new ways of adding/filling
> > > > tokens to the buckets. It's also possible to tweak a lot of small
> > > > details if that is necessary, such as smoothening out burst rates and
> > > > so on. However, it's better to keep things simple when possible.
> > > >
> > > >
> > > > > Suppose the SOP for bursting support is to support a burst of upto
> 1.5x
> > > > produce rate for upto 10 minutes, once in a window of 1 hour.
> > > > >
> > > > > Now if a particular producer is trying to burst to or beyond 1.5x
> for
> > > > more than 5 minutes - while we are not accepting the requests and the
> > > > producer client is continuously going into timeouts, there should be
> a
> > > way
> > > > to completely blacklist that producer so that even the client stops
> > > > attempting to produce and we unload the topics all together - thus,
> > > > releasing open connections and broker compute.
> > > >
> > > >
> > > > It seems that this part of the requirements is more challenging to
> > > > cover. We don't have to give up on trying to make progress on this
> > > > side of your requirements. It could be helpful to do thought
> > > > experiments on how some of the conditions could be detected. For
> > > > example, how would you detect if a particular producer is going
> beyond
> > > > the configured policy? It does seem intriguing to start adding the
> > > > support to the rate limiting area. The rate limiter has the
> > > > information about when bursting is actively happening. I would favor
> a
> > > > solution where the rate limiter could only emit events and not make
> > > > decisions about blocking a producer. However, I'm open to changing my
> > > > opinion on that detail of the design if it turns out to be a common
> > > > requirement, and it fits the overall architecture of the broker.
> > > >
> > > > The information about timeouts is on the client side. Perhaps there
> > > > are ways to pass the timeout configuration information along with the
> > > > producer creation, for example. The Pulsar binary protocol could have
> > > > some sort of clock sync so that client and broker clock difference
> > > > could be calculated with sufficient accuracy. However, the broker
> > > > cannot know when messages timeout that aren't at the broker at all.
> If
> > > > the end-to-end latency of sent messages start approaching the timeout
> > > > value, it's a good indication on the broker side that the client side
> > > > is approaching the state where messages will start timing out.
> > > >
> > > > In general, I think we should keep these requirements covered in the
> > > > design so that the rate limiter implementation could be extended to
> > > > support the use case in a way or another in the later phases.
> > > >
> > > >
> > > > >> > It's only async if the producers use `sendAsync` . Moreover,
> even if
> > > > it is
> > > > >> > async in nature, practically it ends up being quite linear and
> > > > >> > well-homogenised. I am speaking from experience of running
> 1000s of
> > > > >> > partitioned topics in production
> > > > >>
> > > > >> I'd assume that most high traffic use cases would use asynchronous
> > > > >> sending in a way or another. If you'd be sending synchronously, it
> > > > >> would be extremely inefficient if messages are sent one-by-one.
> The
> > > > >> asyncronouscity could also come from multiple threads in an
> > > > >> application.
> > > > >>
> > > > >> You are right that a well homogenised workload reduces the
> severity of
> > > > >> the multiplexing problem. That's also why the problem of
> multiplexing
> > > > >> hasn't been fixed since the problem isn't visibile and it's also
> very
> > > > >> hard to observe.
> > > > >> In a case where 2 producers with very different rate limiting
> options
> > > > >> share a connection, it definetely is a problem in the current
> > > > >
> > > > >
> > > > > Yes actually, I was talking to the team and we did observe a case
> where
> > > > there was a client app with 2 producer objects writing to two
> different
> > > > topics. When one of their topics was breaching quota, the other one
> was
> > > > also observing rate limiting even though it was under quota. So
> > > eventually,
> > > > this surely needs to be solved. One short term solution is to at
> least
> > > not
> > > > share connections across producer objects. But I still feel it's out
> of
> > > > scope of this discussion here.
> > > >
> > > > Thanks for sharing. The real stories from operating Pulsar at scale
> > > > are always interesting to hear!
> > > >
> > > > > What do you think about the quick solution of not sharing
> connections
> > > > across producer objects? I can raise a github issue explaining the
> > > > situation.
> > > >
> > > > It should be very straightforward to add a flag to producer and
> > > > consumer builder to use an isolated (non-shared) broker connection.
> > > > It's not optimal to add such flags, but I haven't heard of other ways
> > > > to solve this challenge where the Pulsar binary protocol doesn't have
> > > > to be modified or the backpressure solution revisited in the broker.
> > > > The proper solution requires doing both, adding a permit-based flow
> > > > control for producers and revisiting the broker side
> backpressure/flow
> > > > control and therefore it won't happen quickly.
> > > > Please go ahead and create a GH issue and share your context. That
> > > > will be very helpful.
> > > >
> > > > Some thoughts about proceeding to making things a reality so that it
> > > > gets implemented.
> > > >
> > > > One important part of development will be how we could test or
> > > > simulate a scenario and validate that the rate limiting algorithm
> > > > makes sense.
> > > > On the lowest level of unit tests, it will be helpful to have a
> > > > solution where the clock source used in unit test can be run quickly
> > > > so that simulated scenarios don't take a long time to run, but could
> > > > cover the essential features.
> > > >
> > > > It might be hard to get started on the rate limiter implementation by
> > > > looking at the existing code. The reason of the double methods in the
> > > > existing interface is due to it covering 2 completely different ways
> > > > of handling the rate limiting and trying to handle that in a single
> > > > concept. What is actually desired for at least the producing rate
> > > > limiting is that it's an asynchronous rate limiting where any
> messages
> > > > that have already arrived to the broker will be handled. The token
> > > > count could go to a negative value when implementing this in the
> token
> > > > bucket algorithm. If it goes below 0, the producers for the topic
> > > > should be backpressured by toggling the auto-read state and it should
> > > > schedule a job to resume the auto-reading after there are at least a
> > > > configurable amount of tokens available. There should be no need to
> > > > use the scheduler to add tokens to the bucket. Whenever tokens are
> > > > used, the new token count can be calculated based on the time since
> > > > tokens were last updated, by default the average max rate defines how
> > > > many tokens are added. The token count cannot increase larger than
> the
> > > > token bucket capacity. This is how simple a plain token bucket
> > > > algorithm is. That could be a starting point until we start covering
> > > > the advanced cases which require a dual token bucket algorithm.
> > > > If someone has the bandwidth, it would be fine to start experimenting
> > > > in this area with code to learn more.
> > > >
> > > > After trying something out, it's easier to be more confident about
> the
> > > > design of dual token bucket algorithm. That's one of the assumptions
> > > > that I'm making that dual token bucket would be sufficient.
> > > >
> > > > This has been a great discussion so far and it feels like we are
> > > > getting to the necessary level of detail.
> > > > Thanks for sharing many details of your requirements and what you
> have
> > > > experienced and learned when operating Pulsar at scale.
> > > > I feel excited about the upcoming improvements for rate limiting in
> > > > Pulsar. From my side I have limited availability for participating in
> > > > this work for the next 2-3 weeks. After that, I'll be ready to help
> in
> > > > making the improved rate limiting a reality.
> > > >
> > > > -Lari
> > > >
> > >
> >
> >
> > --
> > Girish Sharma
>


-- 
Girish Sharma
