Re: [DISCUSS] BP-61: Revert BP-59 to release ByteBuf using ByteBuf.release() instead of ReferenceCountUtil.safeRelease()

2023-02-02 Thread Andrey Yegorov
Netty recommends to NOT use .safeRelease() unless one absolutely needs to
swallow an exception:
https://github.com/netty/netty/blob/48adb2609c7210ff55bb2ae42772a579052d06e7/common/src/main/java/io/netty/util/ReferenceCountUtil.java#L107-L112

.release() is a convenient way to bypass null check if the object may
happen to be null e.g. in the finally block when it is possible that
allocation didn't happen.
though BK's code still does the null check and then uses .release().

I am not really sure why safeRelease() spread so much, maybe it was a quick
way to deal with errors in unit tests caused by improper setup/tearDown.

On an unrelated note, while looking through all that code I noticed that
memetable uses NIO ByteBuffers.
so naturally for the netty's ByteBuf entry has to be copied into nio buffer:
"memTable.addEntry(ledgerId, entryId, entry.nioBuffer(), this)"
And after getEntry() it is wrapped into netty's ByteBuf.
I wonder if switching it to ByteBuf would improve performance / reduce java
GC load.


On Wed, Feb 1, 2023 at 1:28 AM Hang Chen  wrote:

> > +1 , use new specification (ReferenceCountUtil.release) replace BP-59
>
> > I think it makes sense to replace ReferenceCountUtil.safeRelease
> with ReferenceCountUtil.release.
> FWIW ReferenceCountUtil.release's return value just says if the byte buf's
> ref count reached zero or not, it is not an error handling mechanism.
>
> The `ReferenceCountUtil.release` checks whether the ByteBuf is
> `ReferenceCounted`. If not, it will do nothing and return false.
> ```java
> public static boolean release(Object msg) {
> return msg instanceof ReferenceCounted ?
> ((ReferenceCounted)msg).release() : false;
> }
> ```
> All the use cases in BookKeeper about `ReferenceCountUtil.release` are
> not check the returned value.
> Even Though all the ByteBuf in BookKeeper code are ReferenceCounted,
> it still has risk of causing direct memory leak. One example is [1]
>
> My question is why not use `ByteBuf.release()` directly?
>
>
> [1] https://github.com/apache/bookkeeper/issues/3527
>
> Thanks,
> Hang
>
> Enrico Olivelli  于2023年2月1日周三 15:35写道:
> >
> > Horizon
> >
> > Il giorno lun 30 gen 2023 alle ore 11:14 Horizon
> > <1060026...@qq.com.invalid> ha scritto:
> > >
> > > Hi, Enrico. Just confirm, only revert
> https://github.com/apache/bookkeeper/pull/3528, it work well?
> >
> > yes
> > This is the only thing we reverted in our fork
> >
> > see the history
> > https://github.com/datastax/bookkeeper/commits/ds-4.14
> >
> > but I don't have all of the other commits in the history,
> > so maybe other changes may be bad
> >
> > Unfortunately currently we (both BookKeeper community and in my
> > company) don't have
> > dedicated tests for BookKeeper but only for Pulsar,
> > that makes things harder to track
> >
> > probably we should build some automated performance test:
> > a set of scripts that create a BK cluster  on k8s, run some
> > benchmarks/heavy load, verify that everything worked well
> > The workbench should be re-usable by anyone, I am not saying that we
> > should spend the ASF resources to run
> > this kind of tests, but that we should provide a reference load test.
> >
> >
> > Enrico
> >
> > >
> > > > 2023年1月30日 16:04,Enrico Olivelli  写道:
> > > >
> > > > I think that we must do something, current master branch is not
> stable.
> > > >
> > > > My colleague Massimiliano opened this issue
> > > > https://github.com/apache/bookkeeper/issues/3751
> > > >
> > > > Basically in the current master there is some problem that leads to
> > > > Netty BytBuf corruption.
> > > >
> > > > The problem is solved by reverting this PR
> > > > https://github.com/apache/bookkeeper/pull/3528 Fix memory leak when
> > > > reading entry but the connection disconnected
> > > >
> > > > Enrico
> > > >
> > > > Il giorno lun 30 gen 2023 alle ore 08:04 steven lu
> > > >  ha scritto:
> > > >>
> > > >> There are three ways to write release in current pulsar and
> bookkeeper
> > > >> projects:
> > > >> 1.ByteBuf.release: Call release directly without handling any
> exceptions;
> > > >> 2.ReferenceCountUtil.release: return processing status;
> > > >> 3. ReferenceCountUtil.safeRelease: without return value, print
> exception
> > > >> information
> > > >>
> > > >> I don't think it is necessary to revert these PRs. We can discuss
> what is
> > > >> the correct way to release, and then modify these three ways into
> the
> > > >> correct way.
> > > >>
> > > >> Hang Chen  于2023年1月30日周一 12:28写道:
> > > >>
> > > >>> Hi guys,
> > > >>>   In BP-59, which was not discussed in the dev mail list and
> without
> > > >>> a vote, refactored the ByteBuf release method by
> > > >>> ReferenceCountUtil.safeRelease() instead of ByteBuf.release().
> > > >>>
> > > >>> In the `ReferenceCountUtil.safeRelease()`, it catches the release
> > > >>> exception. This change can make the ByteBuf release without any
> > > >>> exceptions and makes the code run well, but it will make bugs hide
> > > >>> deeper and more challenging 

Re: [DISCUSS] BP-61: Revert BP-59 to release ByteBuf using ByteBuf.release() instead of ReferenceCountUtil.safeRelease()

2023-02-02 Thread Enrico Olivelli
Il giorno gio 2 feb 2023 alle ore 10:57 Yan Zhao
 ha scritto:
>
> > I think that we must do something, current master branch is not stable.
> >
> > My colleague Massimiliano opened this issue
> > https://github.com/apache/bookkeeper/issues/3751
> >
> > Basically in the current master there is some problem that leads to
> > Netty BytBuf corruption.
> >
> > The problem is solved by reverting this PR
> > https://github.com/apache/bookkeeper/pull/3528 Fix memory leak when
> > reading entry but the connection disconnected
> >
> > Enrico
>
> Hi, Enrico. We found that the pulsar timeout may not be introduced by 
> https://github.com/apache/bookkeeper/pull/3528. Only roll back the pulsar 
> version and still maintain https://github.com/apache/bookkeeper/pull/3528, 
> the problem is also solved.

I am sorry but https://github.com/apache/bookkeeper/issues/3751
doesn't report the full story ( I will ask Massimiliano do add more
logs)

The Bookie operation timed out because on the Bookie side the Bookie
sees a packet that is too big, and the cause is that something (the
Netty ByteBuf) is probably recycled



Enrico

>
> I guess the bk thread hangs for some reason in pulsar, the bk server handles 
> the request successfully and sends the response to the bk client, but the bk 
> thread hangs, it didn't handle the response, so timeout.
>
> Massimiliano is tracing it at pulsar side.


Re: [DISCUSS] BP-61: Revert BP-59 to release ByteBuf using ByteBuf.release() instead of ReferenceCountUtil.safeRelease()

2023-02-02 Thread Yan Zhao
> I think that we must do something, current master branch is not stable.
> 
> My colleague Massimiliano opened this issue
> https://github.com/apache/bookkeeper/issues/3751
> 
> Basically in the current master there is some problem that leads to
> Netty BytBuf corruption.
> 
> The problem is solved by reverting this PR
> https://github.com/apache/bookkeeper/pull/3528 Fix memory leak when
> reading entry but the connection disconnected
> 
> Enrico

Hi, Enrico. We found that the pulsar timeout may not be introduced by 
https://github.com/apache/bookkeeper/pull/3528. Only roll back the pulsar 
version and still maintain https://github.com/apache/bookkeeper/pull/3528, the 
problem is also solved.

I guess the bk thread hangs for some reason in pulsar, the bk server handles 
the request successfully and sends the response to the bk client, but the bk 
thread hangs, it didn't handle the response, so timeout.

Massimiliano is tracing it at pulsar side.