Thanks for starting the discussion, Yunze.

> - The fallibility can be shown from my previous example. Using a
> recyclable object is much more error-prone than using an immutable
> object (Java record).

Yes, it is very error-prone to use recyclable objects.

> Anyway, it's already the case here. The best efforts we can do without
> much refactoring is to keep the unique ownership for a recyclable
> object.

I agree.

Regarding how ownership is handled, let's first take a look at how
Netty handles this.
Netty has a best-effort solution to detect "use after release" issues.
If the application logs IllegalReferenceCountExceptions, it's usually
a sign of "use after release" or "double release" issues in the
implementation.

In Netty, the methods call "ensureAccessible" before accessing the
actual buffer. This is a best-effort solution as explained in this
example:
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java#L1431-L1456

ByteBuf's isAccessible implementation:
https://github.com/netty/netty/blob/94cfa608a8071e8a9005e0d52e0199b3584f4ae7/buffer/src/main/java/io/netty/buffer/ByteBuf.java#L2485-L2491

    /**
     * Used internally by {@link AbstractByteBuf#ensureAccessible()}
to try to guard
     * against using the buffer after it was released (best-effort).
     */
    boolean isAccessible() {
        return refCnt() != 0;
    }

I have an experiment in https://github.com/apache/pulsar/pull/22110 to
add something similar in Pulsar. Please note that it's just an
experiment and it's outdated.
At that time there were multiple "use after release" issues. It seems
that the Netty SSL issue (fixed in 4.1.111.Final) and the issues in
the Broker Entry Cache were the main sources of this type of problem.

In the Pulsar code base, there are some solutions for handing off the
ownership from one thread to another.
It's hard to get ownership passing right. One of the solutions that
has been used is the "waitingReadOp"/WAITING_READ_OP_UPDATER solution
in ManagedCursorImpl.
The solution missed a possibility where the object reference that is
held by another thread could have been recycled in between, and the
same instance received from the recycled pool could be set in the
waitingReadOp field while it is already being used for a different
purpose.
To avoid this case, https://github.com/apache/pulsar/pull/24569 added
an id field that is incremented for each usage, and in addition to the
OpReadEntry, the other thread holds the id. This way it is ensured
that the "hand off" to the other thread is consistent.
A similar pattern (passing a separate field) has been used in
ReadEntryCallbackWrapper:
https://github.com/apache/pulsar/blob/d2728253c666f7d4bd4f111356f3b97663603b6a/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2285-L2289
https://github.com/apache/pulsar/blob/d2728253c666f7d4bd4f111356f3b97663603b6a/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2430-L2443
In that solution, the readOpCount value is passed in the operation ctx.

These solutions alone aren't sufficient for handling "ownership". In
addition, the object instances must be safely passed from one thread
to another.
As long as the recycled instances aren't mutated in multiple threads
simultaneously and the object instance is always shared in a
"pipeline" (let's say in an asynchronous CompletableFuture pipeline)
that ensures "happens-before", it's fine.
The thread safety issues make the problem worse when there are
multiple threads that might mutate and read the instance (breaking
"happens-before" relationships), even though the instance hasn't been
recycled yet.
This has been the case for some issues in the past. Some of the
problems have been addressed by switching to use volatile fields. All
of this is error-prone and it's also hard to ensure that changes are
correct in code reviews.

I hope this is useful background information about "ownership passing".

-Lari

On Sun, 3 Aug 2025 at 09:17, Yunze Xu <[email protected]> wrote:
>
> Actually passing the ownership between threads should not be
> encouraged for recyclable objects. You can ask AI (like Gemini) the
> following question:
>
> > Is it a good practice to share a recyclable object (allocated by Netty 
> > Recycler) among different threads?
>
> The key point for the ownership passing case:
>
> > If you have a pattern of frequent cross-thread recycling, you're bypassing 
> > the performance benefits of the thread-local stacks and introducing 
> > overhead.
>
> Though this pattern has already been widely used in Pulsar, which
> trades the performance and fallibility for heap memory allocation.
> - The performance overhead is mainly the `pollLast()` and
> `relaxedOffer()` methods on a platform independent MPSC (multiple
> producers single consumer) queue. What's more, we can see many thread
> switching to guarantee thread safety on the fields of a recyclable
> object (e.g. the `ArrayList` used in `OpReadEntry#entries`)
> - The fallibility can be shown from my previous example. Using a
> recyclable object is much more error-prone than using an immutable
> object (Java record).
> - The saved heap memory is never measured, at least I've never seen a
> test report.
>
> Anyway, it's already the case here. The best efforts we can do without
> much refactoring is to keep the unique ownership for a recyclable
> object.
>
> Thanks,
> Yunze
>
> On Sun, Aug 3, 2025 at 1:35 PM Yunze Xu <[email protected]> wrote:
> >
> > Hi Penghui,
> >
> > Introducing the reference count adds unnecessary overhead and might
> > not make things better. Generally, a recyclable object should not be
> > used concurrently, though the recyclable object might be passed across
> > threads. We should avoid sharing the same recyclable object among
> > multiple threads in the code path, rather than adding a reference
> > count.
> >
> > I understand Java programmers hardly take object ownership (unique vs.
> > shared) into consideration because of GC, but when the Netty recycler
> > is involved, we have to face the ownership issue. It's not just for
> > the sake of performance, it's also for correctness.
> >
> > Given the following example:
> > - Thread A creates a recyclable object O and publishes the reference
> > to thread B and C
> > - Thread B executes a normal task on O and then recycles O
> > - Thread C detects if O has not been recycled by checking `O.field !=
> > null`, and then read `O.field` for following logic.
> >
> > Firstly, there could be an ABA issue that `O.field != null` does not
> > mean O has not been recycled. It could also be that the previous
> > referenced O has been recycled and allocated from the pool again. Then
> > `O.field` should not be expected.
> > Secondly, there could be a thread safety issue. Generally, the access
> > to O must be protected by a lock for thread safety, but we usually
> > don't do that because the fields are only initialized once in the
> > factory method of O. However, there is usually a `recycle()` method
> > that reset all fields to null before recycling the object, which means
> > that in a certain time point, thread B modifies O.field while thread C
> > reads O.field.
> >
> > The root cause is that the ownership of O is shared between B and C.
> > If we can guarantee the ownership is unique, this issue will
> > disappear.
> >
> > Thanks,
> > Yunze
> >
> > On Tue, Jul 29, 2025 at 11:29 PM PengHui Li <[email protected]> wrote:
> > >
> > > I think the question is a more general question about reusing objects, not
> > > only for failed recyclable objects.
> > > Can we add a reference counter to the recyclable objects (implement the
> > > ReferenceCounted interface),
> > > There could be some corner cases that were missed in the release of the
> > > counter.
> > > We can log and skip the object recycling.
> > > But it will prevent one object from being used by 2 references at the same
> > > time.
> > >
> > > - Penghui
> > >
> > >
> > >
> > > On Thu, Jul 17, 2025 at 9:17 AM Ran Gao <[email protected]> wrote:
> > >
> > > > Thanks for your explanation. Reusing the corrupted recycle object is
> > > > dangerous, and it's hard to investigate issues from the heap dump.
> > > >
> > > > Thanks,
> > > > Ran Gao
> > > >
> > > > On 2025/07/17 13:38:38 Lari Hotari wrote:
> > > > > Recycling can cause bugs that are very hard to detect. The main reason
> > > > to avoid recycling in exceptional cases is to prevent such bugs. One
> > > > possible scenario is when another thread continues to hold a reference 
> > > > to
> > > > an object instance. After the object instance has been recycled, it 
> > > > could
> > > > refer to a completely different object instance. We have encountered 
> > > > such
> > > > bugs in the past with Pulsar, which is why it's better to avoid the 
> > > > risk of
> > > > these bugs altogether.
> > > > >
> > > > > Netty doesn't recycle all object instances when recycling is enabled.
> > > > There's currently a per-thread limit of 4,096 object instances by 
> > > > default.
> > > > Omitting recycling in exceptional paths won't add noticeable extra GC
> > > > pressure because many instances will be discarded in any case, due to
> > > > recycling cache overflow when using default Netty settings.
> > > > >
> > > > > -Lari
> > > > >
> > > > > On 2025/07/17 12:47:22 Yunze Xu wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I noticed there is a difference of opinion for whether to recycle
> > > > > > failed recyclable objects. [1][2][3] So I'd like to open the
> > > > > > discussion in the mail list.
> > > > > >
> > > > > > Netty recycler is an object pool to reduce GC pressure for 
> > > > > > frequently
> > > > > > created and discarded objects. If a recyclable object fails, e.g. 
> > > > > > when
> > > > > > an `OpAddEntry` encounters an unexpected exception, IMO, it's
> > > > > > reasonable not to recycle it.
> > > > > >
> > > > > > When a recyclable object is not recycled, it just won't be reused 
> > > > > > from
> > > > > > the object pool. The worst case is that the memory of the whole 
> > > > > > object
> > > > > > pool is exhausted. After that
> > > > > >
> > > > > > It's true that if such objects are not recycled, there will be a
> > > > > > "memory leak". However, the worst result is that the pool memory is
> > > > > > exhausted. In this case, it will fall back to allocating objects via
> > > > > > the `newObject` method with a dummy handle [4]. The implementation 
> > > > > > of
> > > > > > this method usually allocates memory from JVM. In such cases, the
> > > > > > recyclable object will eventually be garbage collected.
> > > > > >
> > > > > > It's actually not a memory leak. Recyclable objects usually have 
> > > > > > short
> > > > > > life timing, for example, an `OpAddEntry` object is created when
> > > > > > starting an asynchronous send and recycled after the send is done. 
> > > > > > The
> > > > > > object is never referenced by any other object, so even if it's
> > > > > > allocated from JVM, it will eventually be garbage collected.
> > > > > >
> > > > > > The benefit to skip recycling objects for failures is that for rare
> > > > > > cases, retaining the object could help diagnose issues that are hard
> > > > > > to reproduce. Still take `OpAddEntry` for example, if it encountered
> > > > > > an unexpected exception, recycling it could set all fields with 
> > > > > > null,
> > > > > > so it's hard to know which ledger it belongs to (and other useful
> > > > > > fields) from the heap dump.
> > > > > >
> > > > > > [1] 
> > > > > > https://github.com/apache/pulsar/pull/24515#discussion_r2212602435
> > > > > > [2] 
> > > > > > https://github.com/apache/pulsar/pull/24522#discussion_r2210814071
> > > > > > [3] 
> > > > > > https://github.com/apache/pulsar/pull/24522#discussion_r2213100905
> > > > > > [4]
> > > > https://github.com/netty/netty/blob/ab2d2cf3ff3a6f055368405af7f6e9dd5b8d144e/common/src/main/java/io/netty/util/Recycler.java#L189
> > > > > >
> > > > > > Thanks,
> > > > > > Yunze
> > > > > >
> > > > >
> > > >

Reply via email to