I have observed the issue that Matteo describes and I also attributed the 
problem to the absence of a back pressure mechanism in the client. Issue #2497 
was not about that, though. There was some corruption going on that was leading 
to the server receiving garbage.

-Flavio 

> On 8 Jan 2021, at 22:47, Matteo Merli <mme...@apache.org> wrote:
> 
> On Fri, Jan 8, 2021 at 8:27 AM Enrico Olivelli <eolive...@gmail.com> wrote:
>> 
>> Hi Matteo,
>> in this comment you are talking about an issue you saw when WQ is greater 
>> than AQ
>> https://github.com/apache/bookkeeper/issues/2497#issuecomment-734423246
>> 
>> IIUC you are saying that if one bookie is slow the client continues to 
>> accumulate references to the entries that still have not received the 
>> confirmation from it.
>> I think that this is correct.
>> 
>> Have you seen problems in production related to this scenario?
>> Can you tell us more about them?
> 
> Yes, for simplicity, assume e=3, w=3, a=2.
> 
> If one bookie is slow (not down, just slow), the BK client will send the
> acks to the user that the entries are written after the first 2 acks.
> In the meantime, it will keep waiting for the 3rd bookie to respond.
> If the bookie responds within the timeout, the entries can now be
> dropped from memory, otherwise the write will time out internally and
> it will get replayed to a new bookie.
> 
> In both cases, the amount of memory used in the client will max out at
> "throughput" * "timeout". This can be a large amount of memory and
> easily cause OOM errors.
> 
> Part of the problem is that it cannot be solved from outside the BK
> client, since there's no visibility on what entries have 2 or 3 acks
> and therefore it's not possible to apply backpressure. Instead,
> there should be a backpressure mechanism in the BK client itself to
> prevent this kind of issue.
> One possibility there could be to use the same approach as described
> in 
> https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits,
> giving a max memory limit per BK client instance and throttling
> everything after the quota is reached.
> 
> 
> Matteo
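
To make the "throughput" * "timeout" bound and the per-client memory limit suggested above more concrete, here is a minimal sketch. It is not BookKeeper or Pulsar code: ClientMemoryLimiter and all of its methods are hypothetical names, and it assumes the client would reserve an entry's size before sending it and release it only once no pending write to any bookie still references the entry. With purely illustrative figures of 100 MB/s and a 30 second timeout, the worst case works out to about 3 GB held in the client.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, NOT part of the BookKeeper API: a byte budget shared
// by all in-flight entries of one client instance.
final class ClientMemoryLimiter {
    private final int limitBytes;
    private final Semaphore budget; // one permit represents one byte

    ClientMemoryLimiter(int limitBytes) {
        this.limitBytes = limitBytes;
        this.budget = new Semaphore(limitBytes);
    }

    // Reserve space before an entry is sent. Waits up to maxWaitMillis for
    // space to free up, which is what throttles the writer once the quota
    // is reached. Returns false if the space could not be reserved.
    boolean tryReserve(int entryBytes, long maxWaitMillis) {
        if (entryBytes > limitBytes) {
            return false; // could never fit in the budget
        }
        try {
            return budget.tryAcquire(entryBytes, maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    // Release space once the client no longer holds the entry. Assumption:
    // this happens when the last outstanding write (including any replay to
    // a new bookie) has completed, not when the first `a` acks arrive.
    void release(int entryBytes) {
        budget.release(entryBytes);
    }

    int usedBytes() {
        return limitBytes - budget.availablePermits();
    }

    // Worst-case client memory without a limiter: throughput * timeout.
    static long worstCaseBytes(long bytesPerSecond, long timeoutSeconds) {
        return Math.multiplyExact(bytesPerSecond, timeoutSeconds);
    }
}
```

The important design point is where release is called: if the budget were released as soon as the ack quorum is reached, the entries still waiting on the slow bookie would not be counted and the limit would not protect against the scenario described above.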
