We have limits at the Netty layer, and also in the RSRpcServices layer specifically for multi-gets. The two pieces that come to mind are:

- Netty layer: https://issues.apache.org/jira/browse/HBASE-27947
- RSRpcServices: https://issues.apache.org/jira/browse/HBASE-27570

Note that the second one is not the most direct issue, but it's one where I did some work in that area and the one I could most easily find right now. You might have to follow the code from there to find more info around MultiActionResultTooLarge. Also, I think we're missing similar protection for multi-puts.
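To make the "pressure valve" idea below a little more concrete, here's a rough sketch of what a server-side gauge could look like: track the estimated heap held by in-flight request/response objects, reject new calls with a retryable failure once a configured limit is crossed, and release the reservation when the call completes. All of the names here (CallHeapPressureGauge, admit/release, the exception type) are invented for illustration and are not existing HBase or Netty API; a real implementation would have to plug into the RPC call lifecycle and use a better size estimate than the serialized request size.

import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative only: a gauge of heap held by in-flight RPC calls that
 * rejects new work once a limit is exceeded. Not existing HBase API.
 */
public final class CallHeapPressureGauge {

  /** Surfaced to the client as a retryable failure so it can back off. */
  public static final class CallHeapOverLimitException extends RuntimeException {
    CallHeapOverLimitException(String msg) { super(msg); }
  }

  private final long maxInFlightHeapBytes;
  private final AtomicLong inFlightHeapBytes = new AtomicLong();

  public CallHeapPressureGauge(long maxInFlightHeapBytes) {
    this.maxInFlightHeapBytes = maxInFlightHeapBytes;
  }

  /** Called when a request arrives, with a cheap size estimate. */
  public void admit(long estimatedBytes) {
    long newTotal = inFlightHeapBytes.addAndGet(estimatedBytes);
    if (newTotal > maxInFlightHeapBytes) {
      // Roll back the reservation and push back on the client.
      inFlightHeapBytes.addAndGet(-estimatedBytes);
      throw new CallHeapOverLimitException("in-flight call heap " + newTotal
          + " would exceed limit " + maxInFlightHeapBytes);
    }
  }

  /** Called once the response has been written (or the call failed). */
  public void release(long estimatedBytes) {
    inFlightHeapBytes.addAndGet(-estimatedBytes);
  }

  /** Exposed as a metric so operators can see how close the valve is to tripping. */
  public long inFlightHeapBytes() {
    return inFlightHeapBytes.get();
  }

  // Toy driver showing admit/release bracketing a simulated call.
  public static void main(String[] args) {
    CallHeapPressureGauge gauge = new CallHeapPressureGauge(64L * 1024 * 1024);
    long estimate = 16L * 1024 * 1024;
    gauge.admit(estimate);
    try {
      // ... handle the multi-get, build the response ...
    } finally {
      gauge.release(estimate);
    }
    System.out.println("in-flight heap after call: " + gauge.inFlightHeapBytes());
  }
}

The add-then-check-then-rollback pattern keeps the fast path lock-free; a stricter compareAndSet loop would avoid briefly overshooting the limit, at the cost of retries under contention.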
On Mon, Nov 4, 2024 at 11:26 AM Nick Dimiduk <ndimi...@apache.org> wrote:

> Heya team,
>
> We hit some production trouble pertaining to clients sending very
> large multi-gets. Even with otherwise reasonable cell- and row-size
> limits, even with maximum multi-action sizes in place, even with QoS
> and our fancy IO-based Quotas, the pressure was enough to push over a
> region server or three. It got me thinking that we need some kind of
> pressure gauge in the RPC layer that can protect the RS. This wouldn't
> be a QoS or Quota kind of feature; it's not about fairness between
> tenants, rather it's a safety mechanism, a kind of pressure valve. I
> wonder if something like this already exists, or maybe you know of a
> ticket already filed with some existing discussion.
>
> My napkin sketch is something like a metric that tracks the amount of
> heap consumed by active request and response objects. When the metric
> hits a limit, we start to reject new requests with a retryable
> exception. I don't know if we want the overhead of tracking this value
> exactly, so maybe the value is populated only by new requests and then
> we have some crude mechanism of decay. Does Netty already have
> something like this? I'd say this is in lieu of an actual streaming
> RPC harness, but I think even a streaming system would benefit from
> such a backpressure strategy.
>
> It occurs to me that I don't know the current state of active memory
> tracking in the region server. I recall there was some work to make
> the memstore and blockcache resize dynamically. Maybe this new system
> adds a 3rd component to the computation.
>
> Thoughts? Ideas?
>
> Thanks,
> Nick