Neither of those is a global limit, of course. I thought about adding a
global limit when I added the netty support. At the time the problem was
mostly around a few slow connections, and adding a global limit (and thus
global accounting) felt like overkill. Of course we could add that on.
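
For concreteness, the global accounting could be little more than an
atomic counter of estimated in-flight request/response bytes, acquired
when a call is decoded and released when its response is flushed. A
rough sketch, with the caveat that every name below is hypothetical and
not an existing HBase class:

import java.util.concurrent.atomic.AtomicLong;

public class GlobalRpcMemoryLimiter {
  private final AtomicLong inFlightBytes = new AtomicLong();
  private final long limitBytes;

  public GlobalRpcMemoryLimiter(long limitBytes) {
    this.limitBytes = limitBytes;
  }

  // Called when a request is decoded. Returns false when admitting
  // the call would push us over the cap; the caller would then reject
  // it with a retryable exception instead of queuing it.
  public boolean tryAcquire(long estimatedBytes) {
    while (true) {
      long current = inFlightBytes.get();
      if (current + estimatedBytes > limitBytes) {
        return false;
      }
      if (inFlightBytes.compareAndSet(current, current + estimatedBytes)) {
        return true;
      }
    }
  }

  // Called once the response has been written and its buffers released.
  public void release(long estimatedBytes) {
    inFlightBytes.addAndGet(-estimatedBytes);
  }
}

That's one CAS per call plus a size estimate, so the accounting itself
should be cheap; the harder part is picking good estimates and a sane
default cap.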

On Mon, Nov 4, 2024 at 11:39 AM Bryan Beaudreault <bbeaudrea...@apache.org>
wrote:

> We have limits at the netty layer, and also in the RSRpcServices layer
> specifically for multigets. The two pieces that come to mind are:
>
> - Netty layer: https://issues.apache.org/jira/browse/HBASE-27947
> - RSRpcServices: https://issues.apache.org/jira/browse/HBASE-27570
>
> Note the second one is not the most directly relevant issue; it's just
> some work I did in that area and the one I could most easily find right
> now. You might have to follow the code to find more info around
> MultiActionResultTooLarge. Also, I think we're missing similar
> protection for multi-puts.
>
> On Mon, Nov 4, 2024 at 11:26 AM Nick Dimiduk <ndimi...@apache.org> wrote:
>
>> Heya team,
>>
>> We hit some production troubles pertaining to clients sending very
>> large multi-gets. Even with otherwise reasonable cell- and row-size
>> limits, even with maximum multi-action sizes in place, even with QoS
>> and our fancy IO-based Quotas, the pressure was enough to push over a
>> region server or three. It got me thinking that we need some kind of
>> pressure gauge in the RPC layer that can protect the RS. This wouldn't
>> be a QoS or Quota kind of feature; it's not about fairness between
>> tenants. Rather, it's a safety mechanism, a kind of pressure valve. I
>> wonder if something like this already exists or maybe you know of a
>> ticket already filed with some existing discussion.
>>
>> My napkin-sketch is something like a metric that tracks the amount of
>> heap consumed by active request and response objects. When the
>> metric hits a limit, we start to reject new requests with a retryable
>> exception. I don't know if we want the overhead of tracking this value
>> exactly, so maybe the value is populated only by new requests and then
>> we have some crude mechanism of decay. Does Netty already have
>> something like this? I'd say this is in lieu of an actual streaming
>> RPC harness, but I think even a streaming system would benefit from
>> such a backpressure strategy.
>>
>> It occurs to me that I don't know the current state of active memory
>> tracking in the region server. I recall there was some work to make
>> memstore and blockcache resize dynamically. Maybe this new system adds
>> a third component to the computation.
>>
>> Thoughts? Ideas?
>>
>> Thanks,
>> Nick
>>
>
