We have limits at the Netty layer, and also in the RSRpcServices layer,
specifically for multi-gets. The two pieces that come to mind are:

- Netty layer: https://issues.apache.org/jira/browse/HBASE-27947
- RSRpcServices: https://issues.apache.org/jira/browse/HBASE-27570

Note that the second one is not the most directly relevant issue, but it's
one where I did some work in that area and the one I could most easily find
right now. You might have to follow the code to find more info around
MultiActionResultTooLarge. Also, I think we're missing similar protection
for multi-puts.
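
To make the "pressure valve" idea from the quoted mail concrete, here is a
minimal sketch of what such a gauge might look like: an atomic counter of
estimated heap bytes held by in-flight request/response objects, with
admission rejected once a limit is crossed. All names here are hypothetical
for illustration; this is not existing HBase or Netty code, and a real
implementation would need to decide how sizes are estimated and where
release() gets called.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a server-wide "pressure valve": track estimated
// heap consumed by in-flight requests and reject new work over a limit.
// The caller would translate a false return into a retryable exception
// (e.g. something like a "server too busy" response to the client).
public class RequestPressureValve {
    private final long limitBytes;
    private final AtomicLong inFlightBytes = new AtomicLong();

    public RequestPressureValve(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Try to admit a request of the given estimated size; a false
     *  return means the caller should fail fast with a retryable error. */
    public boolean tryAdmit(long estimatedBytes) {
        long newTotal = inFlightBytes.addAndGet(estimatedBytes);
        if (newTotal > limitBytes) {
            inFlightBytes.addAndGet(-estimatedBytes); // roll back
            return false;
        }
        return true;
    }

    /** Must be called once the request's buffers are released. */
    public void release(long estimatedBytes) {
        inFlightBytes.addAndGet(-estimatedBytes);
    }

    public long inFlight() {
        return inFlightBytes.get();
    }
}
```

This exact-tracking version has the overhead Nick mentions; the crude
alternative he sketches (count only on admission, decay periodically) would
swap release() for a scheduled task that discounts the counter.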

On Mon, Nov 4, 2024 at 11:26 AM Nick Dimiduk <ndimi...@apache.org> wrote:

> Heya team,
>
> We hit some production troubles pertaining to clients sending very
> large multi-gets. Even with otherwise reasonable cell- and row-size
> limits, even with maximum multi-action sizes in place, even with QoS
> and our fancy IO-based Quotas, the pressure was enough to push over a
> region server or three. It got me thinking that we need some kind of
> pressure gauge in the RPC layer that can protect the RS. This wouldn't
> be a QoS or Quota kind of feature, it's not about fairness between
> tenants, rather it's a safety mechanism, a kind of pressure valve. I
> wonder if something like this already exists or maybe you know of a
> ticket already filed with some existing discussion.
>
> My napkin-sketch is something like a metric that tracks the amount of
> heap size consumed by active request and response objects. When the
> metrics hits a limit, we start to reject new requests with a retryable
> exception. I don't know if we want the overhead of tracking this value
> exactly, so maybe the value is populated only by new requests and then
> we have some crude mechanism of decay. Does Netty already have
> something like this? I'd say this is in lieu of an actual streaming
> RPC harness, but I think even a streaming system would benefit from
> such a backpressure strategy.
>
> It occurs to me that I don't know the current state of active memory
> tracking in the region server. I recall there was some work to make
> memstore and blockcache resize dynamically. Maybe this new system adds
> a 3rd component to the computation.
>
> Thoughts? Ideas?
>
> Thanks,
> Nick
>
