Flavio

On Tue, 12 Jan 2021 at 17:26, Flavio Junqueira <f...@apache.org>
wrote:

> I have observed the issue that Matteo describes and I also attributed the
> problem to the absence of a back pressure mechanism in the client. Issue
> #2497 was not about that, though. There was some corruption going on that
> was leading to the server receiving garbage.
>

Correct, #2497 is not about the topic of this email, I just mentioned it
because the discussion started from that comment from Matteo.

We should work on some kind of back-pressure mechanism for the client, but
I am not sure what kind of support we should provide at the BK level.

Regarding the writer side of this story and memory usage:
when the caller passes a ByteBuf, we do not copy the original payload.
See PendingAddOp:
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L263
Here we simply wrap it in a ByteBufList:
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/checksum/DigestManager.java#L116

As soon as the application is notified of the result of the write
(success or failure), we release the reference to the payload (as I
have shown in this email thread),
so in theory the application has full control over the retained memory
and can apply its own memory management mechanisms.
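
To make the idea of application-side memory management concrete, here is a
minimal sketch of a byte quota that the caller could hold around each write.
MemoryLimiter and its methods are hypothetical names, not part of the
BookKeeper API, and the sketch assumes that payload sizes fit in an int. It
only bounds memory the application can account for between submitting a write
and being notified of its result.

```java
import java.util.concurrent.Semaphore;

// Hypothetical helper (not a BookKeeper class): a fixed byte quota shared by
// all in-flight writes of one application.
final class MemoryLimiter {
    private final int maxBytes;
    private final Semaphore permits; // one permit per byte of quota

    MemoryLimiter(int maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.maxBytes = maxBytes;
        this.permits = new Semaphore(maxBytes);
    }

    // Blocks the calling thread until 'bytes' of quota are available.
    void acquire(int bytes) throws InterruptedException {
        check(bytes);
        permits.acquire(bytes);
    }

    // Non-blocking variant: returns false when the quota is exhausted.
    boolean tryAcquire(int bytes) {
        check(bytes);
        return permits.tryAcquire(bytes);
    }

    // Returns quota; meant to be called once the write result is known.
    void release(int bytes) {
        check(bytes);
        permits.release(bytes);
    }

    int availableBytes() {
        return permits.availablePermits();
    }

    private void check(int bytes) {
        // A request larger than the whole quota could never be satisfied.
        if (bytes < 0 || bytes > maxBytes) {
            throw new IllegalArgumentException("invalid size: " + bytes);
        }
    }
}
```

The intended usage would be to call acquire(n) with the payload size before
submitting the add, and release(n) from the callback that reports success or
failure, so that producers slow down once the quota is used up.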


Enrico


> -Flavio
>
> > On 8 Jan 2021, at 22:47, Matteo Merli <mme...@apache.org> wrote:
> >
> > On Fri, Jan 8, 2021 at 8:27 AM Enrico Olivelli <eolive...@gmail.com>
> wrote:
> >>
> >> Hi Matteo,
> >> in this comment you are talking about an issue you saw when WQ is
> greater than AQ
> >> https://github.com/apache/bookkeeper/issues/2497#issuecomment-734423246
> >>
> >> IIUC you are saying that if one bookie is slow the client continues to
> accumulate references to the entries that still have not received the
> confirmation from it.
> >> I think that this is correct.
> >>
> >> Have you seen problems in production related to this scenario?
> >> Can you tell more about them?
> >
> > Yes, for simplicity, assume e=3, w=3, a=2.
> >
> > If one bookie is slow (not down, just slow), the BK client will send the
> > acks to the user that the entries are written after the first 2 acks.
> > In the meantime, it will keep waiting for the 3rd bookie to respond.
> > If the bookie responds within the timeout, the entries can now be
> > dropped from memory, otherwise the write will timeout internally and
> > it will get replayed to a new bookie.
> >
> > In both cases, the amount of memory used in the client will max at
> > "throughput" * "timeout". This can be a large amount of memory and
> > easily cause OOM errors.
> >
> > Part of the problem is that it cannot be solved from outside the BK
> > client, since there's no visibility on what entries have 2 or 3 acks
> > and therefore it's not possible to apply backpressure. Instead,
> > there should be a backpressure mechanism in the BK client itself to
> > prevent this kind of issue.
> > One possibility there could be to use the same approach as described
> > in
> https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits
> ,
> > giving a max memory limit per BK client instance and throttling
> > everything after the quota is reached.
> >
> >
> > Matteo
>
>
