gortiz commented on code in PR #18519:
URL: https://github.com/apache/pinot/pull/18519#discussion_r3274449091


##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java:
##########
@@ -2167,6 +2167,73 @@ public static class MultiStageQueryRunner {
     public static final String KEY_OF_MAX_INBOUND_QUERY_DATA_BLOCK_SIZE_BYTES 
= "pinot.query.runner.max.msg.size.bytes";
     public static final int DEFAULT_MAX_INBOUND_QUERY_DATA_BLOCK_SIZE_BYTES = 
16 * 1024 * 1024;
 
+    /**
+     * Whether the sender side of every {@code GrpcSendingMailbox} respects 
gRPC client-side flow control by waiting
+     * on {@code ClientCallStreamObserver.isReady()} before pushing each chunk.
+     *
+     * <p>Default {@code true}. Set to {@code false} to restore the pre-1.6 
behaviour where the sender pushes
+     * unconditionally; useful as a production kill-switch if the gate causes 
an unexpected regression, and as an
+     * A/B knob for benchmarks (see {@code BenchmarkGrpcMailboxSend}).
+     *
+     * <p>Disabling this flag is what re-introduces the {@code 
OutOfDirectMemoryError} failure mode the gate exists
+     * to prevent. It is here as a safety valve, not as a recommended setting.
+     */
+    public static final String KEY_OF_GRPC_SENDER_BACKPRESSURE_ENABLED =
+        "pinot.query.runner.grpc.sender.backpressure.enabled";
+    public static final boolean DEFAULT_GRPC_SENDER_BACKPRESSURE_ENABLED = 
true;
+
+    /**
+     * Per-stream HTTP/2 flow control window, in bytes. The receiver 
advertises this value to the sender as
+     * the number of bytes it will accept before requiring a `WINDOW_UPDATE` 
frame. Wider windows let the
+     * sender push a whole `MseBlock` without {@link 
io.grpc.stub.ClientCallStreamObserver#isReady} flipping
+     * mid-block. Applied via `NettyServerBuilder.flowControlWindow` in 
`GrpcMailboxServer`.
+     *
+     * <p>This is per HTTP/2 stream, so total inbound buffering at the 
receiver scales as
+     * {@code value × #concurrent streams to this server}.
+     */
+    public static final String KEY_OF_GRPC_FLOW_CONTROL_WINDOW_BYTES =
+        "pinot.query.runner.grpc.flow.control.window.bytes";
+    public static final int DEFAULT_GRPC_FLOW_CONTROL_WINDOW_BYTES = 64 * 1024 
* 1024;

Review Comment:
   A better explaination from Claude:
   
     - flowControlWindow (64 MB): per-stream HTTP/2 inbound byte window the 
receiver advertises. Caps bytes in flight per stream
     before the sender has to wait for WINDOW_UPDATE.
     - writeBufferHighWaterMark (64 MB): per-channel Netty outbound queue size 
on the sender. Once the queue grows past this,
     Channel.isWritable() flips false and the application gate parks.
     - writeBufferLowWaterMark (32 MB): the drain threshold that flips 
isWritable() back to true.
   
     flowControlWindow and writeBufferHighWaterMark are aligned at 64 MB by 
design — they cap roughly the same conceptual thing (one peer's worth of 
in-flight bytes) from the two ends of the wire. We could collapse them into a 
single config, but I prefer keeping them separate so operators can tune them 
independently if their workload calls for it (e.g. an asymmetric one-way 
fan-in).
   
     writeBufferLowWaterMark intentionally is not equal to high — they're a 
hysteresis pair, not a single threshold. Netty marks the channel unwritable 
when the outbound queue grows above high, and writable again only when it 
drains below low. If we set them equal, the channel would oscillate between 
writable/unwritable on every byte once the queue is around that size, forcing 
constant park/wake on the application gate under sustained pressure. 
Half-of-high is the standard hysteresis ratio.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to