Hi Girish,

I think we are starting to get into the concrete details of rate
limiters and how we could improve the existing feature.
It is very helpful that you are sharing your insights and experience
from operating Pulsar at scale.
Replies inline.

On Mon, 6 Nov 2023 at 15:37, Girish Sharma <scrapmachi...@gmail.com> wrote:
> Is this every thursday? I am willing to meet at a separate time as well if
> enough folks with a viewpoint on this can meet together. I assume that the
> community meeting has a much bigger agenda with detailed discussions not
> possible?

It is bi-weekly on Thursdays. The meeting calendar, zoom link and
meeting notes can be found at
https://github.com/apache/pulsar/wiki/Community-Meetings .

> It is all good, as long as the final goal is met within reasonable
> timelines.

+1

> Well, the blacklisting use case is a very specific use case. I am
> explaining below why that can't be done using metrics and a separate
> blacklisting API.

ok. btw. "metrics" doesn't necessarily mean providing the rate limiter
metrics via Prometheus. There might be other ways to provide this
information for components that could react to this.
For example, it could be a system topic where these rate limiters emit events.
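
To make that idea concrete, here is a rough sketch (the topic name
and event payload are made up, this is not an existing Pulsar
feature) of how such events could be emitted with the regular client
API:

    import org.apache.pulsar.client.api.Producer;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.Schema;

    public class RateLimiterEventEmitterSketch {
        public static void main(String[] args) throws Exception {
            try (PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")
                    .build()) {
                // hypothetical system topic for rate limiter events
                Producer<String> producer = client.newProducer(Schema.STRING)
                        .topic("persistent://pulsar/system/rate-limiter-events")
                        .create();
                // emit an event when a topic exceeds its publish rate
                producer.send("{\"topic\":\"persistent://tenant/ns/orders\","
                        + "\"event\":\"PUBLISH_RATE_EXCEEDED\"}");
                producer.close();
            }
        }
    }

A component that needs to react (for example to block a producer)
could then consume from that topic instead of polling metrics.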

> This actually might be a blessing in disguise, at least for RateLimiter and
> PublishRateLimiter.java, being an internal interface, it has gone out of
> hand and unchecked. Explained more below.

I agree. It is hard to reason about the existing solution.


> I would like to keep auto-scaling out of scope for this discussion. That
> opens up another huge can of worms, specially given the gaps in proper
> scale down support in pulsar.

I agree. I just brought up this example to ensure that your
expectation about bursting isn't about controlling the rate limits
based on situational information, such as end-to-end latency
information.
Such a feature could be useful, but it does complicate things.
However, I think it's good to keep this on the radar since this might
be needed to solve some advanced use cases.

> > I don't know what "bursting" means for you. Would it be possible to
> > provide concrete examples of desired behavior? That would be very
> > helpful in making progress.
> >
> >
> Here are a few different use cases:

These use cases clarify your requirements a lot. Thanks for sharing.

>    - A producer(s) is producing at a near constant rate into a topic, with
>    equal distribution among partitions. Due to a hiccup in their downstream
>    component, the produce rate goes to 0 for a few seconds, and thus, to
>    compensate, in the next few seconds, the produce rate tries to double up.

Could you also elaborate on the details, such as what the current
behavior of Pulsar's rate limiting / throttling solution is and what
the desired behavior would be?
Just guessing that you mean the desired behavior would be to allow
the produce rate to double for some (configurable) time? Compared to
what rate is it doubled?
Please explain in detail what the current and desired behaviors would
be so that it's easier to understand the gap.

>    - In a visitor based produce rate (where produce rate goes up in the day
>    and goes down in the night, think in terms of popular website hourly view
>    counts pattern) , there are cases when, due to certain external/internal
>    triggers, the views - and thus - the produce rate spikes for a few minutes.

Again, please explain the current behavior and desired behavior.
Explicit example values of number of messages, bandwidth, etc. would
also be helpful details.

>    It is also important to keep this in check so as to not allow bots to do
>    DDOS into your system, while that might be a responsibility of an upstream
>    system like API gateway, but we cannot be ignorant about that completely.

What would be the desired behavior?

>    - In streaming systems, where there are micro batches, there might be
>    constant fluctuations in produce rate from time to time, based on batch
>    failure or retries.

Could you share examples with numbers for this use case too,
explaining the current behavior and the desired behavior?


>
> In all of these situations, setting the throughput of the topic to be the
> absolute maximum of the various spikes observed during the day is very
> suboptimal.

btw. A plain token bucket algorithm implementation doesn't have an
absolute maximum. The maximum average rate is controlled by the rate
at which tokens are added to the bucket, and the capacity of the
bucket controls how much buffer there is to spend on spikes. If
there's a need to also set an absolute maximum rate, 2 token buckets
could be chained to handle that case: the second rate limiter would
have an average rate equal to the absolute maximum, with a relatively
small buffer (token bucket capacity). There might also be more
sophisticated algorithms which vary the maximum rate in some way to
smooth out spikes, but that might just be completely unnecessary in
Pulsar rate limiters.

Switching to a conceptual model of token buckets will make it a lot
easier to reason about the implementation when we come to that point.
In the design we can try to "meet in the middle", where the
requirements, the design and the implementation meet so that the
implementation is as simple as possible.

A token bucket is very simple, and it can be implemented in a way
where no timer is used to add tokens to the bucket. You only need a
timer (scheduler) to unblock the flow once the bucket has run out of
tokens. In the normal case, where the traffic doesn't exceed the
limits, the number of tokens can simply be updated each time before
tokens are consumed from the bucket, based on the time elapsed since
the previous update. I know I'm jumping too far ahead already. The
reason for this is that we aren't really dealing with a very complex
domain. Since the scope of adding bursting support to rate limiting
is not broad, this could be implemented in a short time frame and we
can really make this happen.
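
Here's a rough sketch of what I mean (purely illustrative, not the
actual or proposed implementation; the class and method names are
made up):

    /**
     * Illustrative lazily refilled token bucket: no timer is needed to
     * add tokens, they are added based on elapsed time whenever the
     * bucket is used.
     */
    public class TokenBucketSketch {
        private final long ratePerSecond; // average rate: tokens added per second
        private final long capacity;      // burst buffer: max stored tokens
        private long tokens;
        private long lastRefillNanos = System.nanoTime();

        public TokenBucketSketch(long ratePerSecond, long capacity) {
            this.ratePerSecond = ratePerSecond;
            this.capacity = capacity;
            this.tokens = capacity;
        }

        /** Try to consume tokens for a message or batch; true if allowed. */
        public synchronized boolean tryConsume(long requested) {
            refill();
            if (tokens >= requested) {
                tokens -= requested;
                return true;
            }
            return false;
        }

        /** How long to pause before retrying; only needed for unblocking. */
        public synchronized long nanosUntilTokensAvailable(long requested) {
            refill();
            if (tokens >= requested) {
                return 0L;
            }
            return ((requested - tokens) * 1_000_000_000L) / ratePerSecond;
        }

        // Lazily add tokens based on the time elapsed since the last refill.
        private void refill() {
            long elapsed = System.nanoTime() - lastRefillNanos;
            long newTokens = (elapsed * ratePerSecond) / 1_000_000_000L;
            if (newTokens > 0) {
                tokens = Math.min(capacity, tokens + newTokens);
                // advance only by the time converted into whole tokens so
                // that fractional tokens aren't lost between calls
                lastRefillNanos += (newTokens * 1_000_000_000L) / ratePerSecond;
            }
        }
    }

Chaining two of these (one with the average rate and a large capacity
for bursts, one with the absolute maximum rate and a small capacity)
would mean a publish is only allowed when both buckets have tokens.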

> Moreover, in each of these situations, once bursting support is present in
> the system, it would also need to have proper checks in place to penalize
> the producers from trying to mis-use the system. In a true multi-tenant
> platform, this is very critical. Thus, blacklisting actually goes hand in
> hand here. Explained more below.

I don't yet fully understand this part. What would be an example of a
case where a producer should be penalized?
When would an attempt to mis-use happen and why? How would you detect
mis-use and how would you penalize it?

> > It's interesting that you mention that you would like to improve the
> > PublishRateLimiter interface.
> > How would you change it?
> >
> >
> The current interface of PublishRateLimiter has duplicate methods. I am
> assuming after an initial implementation (poller), the next implementation
> simply added more methods into the interface rather than actually using the
> ones already existing.
> For instance, there are both `tryAcquire` and `incrementPublishCount`
> methods, there are both `checkPublishRate` and `isPublishRateExceeded`.
> Then there is the issue of misplaced responsibilities where when its
> precise, the complete responsibility of checking and responding back with
> whether its rate limited or not lies with the `PrecisePublishLimiter.java`
> but when its poller based, there is some logic inside
> `PublishRateLimiterImpl` and rest of the logic is spread across
> `AbstractTopic.java`

+100. Yes, we will need to deal with this. The abstractions are poor
and need to be replaced. I'd rather have a single abstraction for
rate limiting by messages/bytes and have it support bursting
configuration.
There shouldn't be a need to have a separate polling implementation
and a "precise" implementation. Those are smells of technical debt
that need to be addressed.
Proper abstractions don't leak implementation details such as
"polling" or "precise".

> > short and medium term blacklisting of topics based on breach of rate
> > > limiter beyond a given SOP. I feel this is very very specific to our
> > > organization right now to be included inside pulsar itself.
> >
> > This is outside of rate limiters. IIRC, there have been some
> > discussions in the community that an API for blocking individual
> > producers or producing to specific topics could be useful in some
> > cases.
> > An external component could observe metrics and control blocking if
> > there's an API for doing so.
> >
> >
> Actually, putting this in an external component that's based off of metrics
> is not a scalable or responsive solution.
> First of all, it puts a lot of pressure on the metrics system (prometheus)
> where we are now querying 1000s of metrics every minute/sub-minute
> uselessly. Since majority of the time, not even a single topic may need
> blacklisting, this is very very inefficient.
> Secondly, it makes the design such that this external component now needs
> to be in sync about the existing topics and their rate set inside the
> pulsar's zookeeper. This also puts extra pressure on zk-reads.
> Lastly, the response time for blacklisting the topic increases a lot in
> this approach.

I commented about this above. "Metrics" could mean using a system
topic for rate limiter events and that could eliminate some of your
concerns.

> This would be a much simpler and efficient model if it were reactive,
> based/triggered directly from within the rate limiter component. It can be
> fast and responsive, which is very critical when trying to prevent the
> system from abuse.

I agree that it could be simpler and more efficient from some
perspectives. Do you have examples of what type of behavior you would
like to have for handling this?
A concrete example with numbers would be very useful.

> It's only async if the producers use `sendAsync` . Moreover, even if it is
> async in nature, practically it ends up being quite linear and
> well-homogenised. I am speaking from experience of running 1000s of
> partitioned topics in production

I'd assume that most high-traffic use cases would use asynchronous
sending in one way or another. If you were sending synchronously, it
would be extremely inefficient to send messages one by one. The
asynchronicity could also come from multiple threads in an
application.

You are right that a well-homogenised workload reduces the severity
of the multiplexing problem. That's also why the problem of
multiplexing hasn't been fixed: it isn't visible and it's also very
hard to observe.
In a case where 2 producers with very different rate limiting options
share a connection, it definitely is a problem in the current
solution. Similarly if there's a message processing application where
the consumer and producer share the connection. That's not necessarily
a big problem, since backpressure happens in different directions of
the traffic. However, there are some reports of actual problems with
this. For example, Tao Jiuming brought up some of this in his reply
to the thread [1].

1 - https://lists.apache.org/thread/pz42sokfzgl9fwnmqp6np0q5w6y1vgvv

> I am happy if the discussion concludes eitherways - a pluggable
> implementation or a 99% use-case capturing configurable rate limiter. But
> from what I've seen, the participation in OSS threads can be very random
> and thus, I am afraid it might take a while before more folks pitch in
> their inputs and a clear direction of discussion is formed.

Let's make this happen! Looking forward to hearing more about the
detailed examples of current behavior and desired behavior.


-Lari
