Neither of those is a global limit, of course. I thought about adding a global limit when I added the netty support. At the time the problem was mostly a few slow connections, and adding a global limit (and thus global accounting) felt like overkill. Of course, we could add that on.
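For illustration, a minimal sketch of what that global accounting plus a retryable rejection could look like. The class name GlobalRpcMemoryLimiter, the RejectedRpcException type, and the reserve/release call points are hypothetical, not existing HBase code:

import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: global accounting of in-flight RPC request/response heap. */
public class GlobalRpcMemoryLimiter {

  /** Illustrative retryable exception; a real patch would likely reuse an existing one. */
  public static class RejectedRpcException extends IOException {
    RejectedRpcException(String msg) { super(msg); }
  }

  private final long limitBytes;
  private final AtomicLong inFlightBytes = new AtomicLong();

  public GlobalRpcMemoryLimiter(long limitBytes) {
    this.limitBytes = limitBytes;
  }

  /** Called when a request is decoded; rejects new work once the global budget is exhausted. */
  public void reserve(long estimatedBytes) throws RejectedRpcException {
    long newTotal = inFlightBytes.addAndGet(estimatedBytes);
    if (newTotal > limitBytes) {
      inFlightBytes.addAndGet(-estimatedBytes);
      throw new RejectedRpcException("In-flight RPC memory " + newTotal
          + " exceeds global limit " + limitBytes + "; retry later");
    }
  }

  /** Called once the response has been written and its buffers released. */
  public void release(long estimatedBytes) {
    inFlightBytes.addAndGet(-estimatedBytes);
  }
}

The reserve/release pair would wrap request decode and response flush; a single AtomicLong keeps the accounting cheap, at the cost of tracking only rough byte estimates rather than exact heap usage.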
On Mon, Nov 4, 2024 at 11:39 AM Bryan Beaudreault <bbeaudrea...@apache.org> wrote:
> We have limits at the netty layer, and also in the RSRpcServices layer,
> specifically for multigets. The two pieces that come to mind are:
>
> - Netty layer: https://issues.apache.org/jira/browse/HBASE-27947
> - RSRpcServices: https://issues.apache.org/jira/browse/HBASE-27570
>
> Note the second one is not the most direct issue, but one where I did some
> work in that area and the one I could most easily find right now. You might
> have to follow the code to find more info around MultiActionResultTooLarge.
> Also, I think we're missing something around similar protection for
> multi-puts.
>
> On Mon, Nov 4, 2024 at 11:26 AM Nick Dimiduk <ndimi...@apache.org> wrote:
>
>> Heya team,
>>
>> We hit some production trouble pertaining to clients sending very
>> large multi-gets. Even with otherwise reasonable cell- and row-size
>> limits, even with maximum multi-action sizes in place, even with QoS
>> and our fancy IO-based Quotas, the pressure was enough to push over a
>> region server or three. It got me thinking that we need some kind of
>> pressure gauge in the RPC layer that can protect the RS. This wouldn't
>> be a QoS or Quota kind of feature; it's not about fairness between
>> tenants, rather it's a safety mechanism, a kind of pressure valve. I
>> wonder if something like this already exists, or maybe you know of a
>> ticket already filed with some existing discussion.
>>
>> My napkin sketch is something like a metric that tracks the amount of
>> heap consumed by active request and response objects. When the
>> metric hits a limit, we start to reject new requests with a retryable
>> exception. I don't know if we want the overhead of tracking this value
>> exactly, so maybe the value is populated only by new requests and then
>> we have some crude mechanism of decay. Does Netty already have
>> something like this? I'd say this is in lieu of an actual streaming
>> RPC harness, but I think even a streaming system would benefit from
>> such a backpressure strategy.
>>
>> It occurs to me that I don't know the current state of active memory
>> tracking in the region server. I recall there was some work to make
>> memstore and blockcache resize dynamically. Maybe this new system adds
>> a 3rd component to the computation.
>>
>> Thoughts? Ideas?
>>
>> Thanks,
>> Nick
>>