grssam commented on code in PR #23398:
URL: https://github.com/apache/pulsar/pull/23398#discussion_r1794737770


##########
pip/pip-385.md:
##########
@@ -0,0 +1,348 @@
+# PIP-385: Add rate limit semantics to pulsar protocol and Java client
+
+<details>
+  <summary><h2>Table of Contents</h2></summary>
+
+- [PIP-38X: Add rate limit semantics to pulsar 
protocol](#pip-38x-add-rate-limit-semantics-to-pulsar-protocol)
+- [Background knowledge](#background-knowledge)
+    * [Challenges with the current 
approach](#challenges-with-the-current-approach)
+- [Motivation](#motivation)
+- [Goals](#goals)
+    * [In Scope](#in-scope)
+    * [Out of Scope](#out-of-scope)
+- [High Level Design](#high-level-design)
+    * [New binary protocol commands](#new-binary-protocol-commands)
+    * [Java client changes](#java-client-changes)
+- [Detailed Design](#detailed-design)
+    * [High-level Implementation Details](#high-level-implementation-details)
+        + [Broker Changes](#broker-changes)
+        + [Determining the throttling duration for 
clients](#determining-the-throttling-duration-for-clients)
+        + [Java Client Changes](#java-client-changes-1)
+    * [Public-facing Changes](#public-facing-changes)
+        + [Binary Protocol](#binary-protocol)
+        + [Java Client](#java-client)
+        + [Configuration](#configuration)
+        + [Metrics](#metrics)
+- [Backward & Forward Compatibility](#backward-forward-compatibility)
+    * [Upgrade / Downgrade / Rollback](#upgrade-downgrade-rollback)
+    * [Lower Protocol Client](#lower-protocol-client)
+    * [Lower Protocol Server](#lower-protocol-server)
+- [Links](#links)
+
+</details>
+
+# Background knowledge
+
+Being a multi tenant system, pulsar supports quality of service constructs 
like topic quotas in bytes per second and
+qps. On top of this, the fact that one broker has only certain limited 
resources, it has to additionally implement some
+other controls to limit the resources usage, like how much message buffer it 
has, etc.
+
+As such, pulsar induces throttling at multiple levels. Just looking at publish 
level throttling, here are the various
+levers that we can configure in pulsar which enables us to rate limit a 
producer, topic or an entire connection from a
+client:
+
+* At the core of it, we can set topic level publish rate in bytes and/or 
messages per second.
+* We can create a resource group (combination of one or more namespaces or 
tenants) and set a publish-rate for that
+  resource group.
+* We can set a broker config to throttle based on pending messages at a 
connection level.
+  See 
[maxPendingPublishRequestsPerConnection](https://github.com/apache/pulsar/blob/4b3b273c1c57741f9f9da2118eb4ec5dfeee2220/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java#L750)
+* We can set a broker config to throttle based on message buffer size at a 
thread level.
+  See 
[maxMessagePublishBufferSizeInMB](https://github.com/apache/pulsar/blob/4b3b273c1c57741f9f9da2118eb4ec5dfeee2220/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java#L1431C17-L1431C49)
+* We can set a broker level maximum publish rate per broker in bytes and/or 
messages.
+
+Currently, the way pulsar uses these levers and enforces these limits is by 
pausing reading further messages from an
+established connection for a topic. This is transparent to the clients, and 
they continue to publish further messages
+with an increased observed latency. Once the publish-rates are within the 
limits, broker resumes reading from the
+connection.
+
+Here is a small illustration to demonstrate the situation:
+
+```mermaid
+%%{init: {"mirrorActors": false, "rightAngles": false} }%%
+sequenceDiagram
+    Client->>Server: CreateProducer(reqId, myTopic)
+    Note right of Server: Check Authorization
+    Server-->>Client: ProducerSuccess(reqId, producerName)
+    Activate Client
+    Activate Server
+    Client->>Server: Send(1, message1)
+    Client->>Server: Send(2, message2)
+    Server-->>Client: SendReceipt(1, msgId1)
+    Client->>Server: Send(3, message3)
+    Client->>Server: Send(4, message4)
+    Note right of Server: Topic breaching quota
+    Activate Server
+    note right of Server: TCP channel read paused
+    Client-xServer: Send(5, message5)
+    Server-->>Client: SendReceipt(2,msgId2)
+    Server-->>Client: SendReceipt(3,msgId3)
+    Client-xServer: Send(6, message6)
+    Server-->>Client: SendReceipt(4,msgId4)
+    note right of Server: TCP channel read resumed
+    deactivate Server
+    Server-->>Server: read message 5
+    Server-->>Server: read message 6
+    Client->>Server: Send(7, message7)
+    Server-->>Client: SendReceipt(5,msgId5)
+    Server-->>Client: SendReceipt(6,msgId6)
+    Server-->>Client: SendReceipt(7,msgId7)
+
+    Client->>Server: CloseProducer(producerId, reqId)
+    Server-->>-Client: Success(reqId)
+    deactivate Client
+```
+
+## Challenges with the current approach
+
+The current approach may look perfectly fine when looking at the above 
example, but when looked from a wider scope,
+things start looking bad.
+Typically, the clients reuse a single TCP connection from the client to a 
broker to send messages to multiple topics.
+This is controlled by the client side property
+of 
[connectionsPerBroker](https://github.com/apache/pulsar/blob/4b3b273c1c57741f9f9da2118eb4ec5dfeee2220/pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ClientConfigurationData.java#L135)
+which defaults to 1. The situation is worsened by the fact that typically, a 
client is used to create producers for
+partitioned topics and generally an application may produce to more than one 
partitioned topic with the producers
+created from the same client object, thus all sharing the same tcp connection.
+
+In this situation, even when a single topic starts breaching the quota, the 
entire TCP connection is paused leading to
+a noisy neighbour effect where effectively all the topics that the client is 
producing to start getting throttled and
+observe high latencies.
+
+# Motivation
+
+The current method of inducing throttling when a topic or connection breaches 
quota has various challenges:
+
+* **Noisy neighbors** - Even if one topic is exceeding the quota, since the 
entire channel read is paused, all topics

Review Comment:
   @rdhabalia - gentle reminder . Would love to get a closure here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to