This is an automated email from the ASF dual-hosted git repository.
xyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new d81c76ee702 [improve][pip] PIP-443: Stop using Netty Recycler in new code (#24766)
d81c76ee702 is described below
commit d81c76ee70245c3a321b19947507f69ffd4a05cf
Author: Yunze Xu <[email protected]>
AuthorDate: Mon Sep 29 19:51:25 2025 +0800
[improve][pip] PIP-443: Stop using Netty Recycler in new code (#24766)
---
pip/pip-443.md | 237 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 237 insertions(+)
diff --git a/pip/pip-443.md b/pip/pip-443.md
new file mode 100644
index 00000000000..1c908093248
--- /dev/null
+++ b/pip/pip-443.md
@@ -0,0 +1,237 @@
+# PIP-443: Stop using Netty Recycler in new code
+
+# Background knowledge
+
+Netty [`Recycler`](https://github.com/netty/netty/blob/netty-4.1.127.Final/common/src/main/java/io/netty/util/Recycler.java) is widely used in many places to reduce GC pressure by reusing objects. From the output of the following command:
+
+```bash
+find . -name "*.java" | grep src/main | xargs grep -n "new .*Recycler"
+```
+
+You can find:
+- 5 usages in `pulsar-common`
+- 10 usages in `pulsar-broker`
+- 10 usages in `pulsar-client`
+
+However, many of these usages are just the result of cargo cult programming, which blindly assumes that allocating objects from the `Recycler` is better than allocating them from heap memory. Hence, we should first understand how it works.
+
+`Recycler` is basically a thread-local object pool that provides:
+- a `get` method that takes an object from the pool, or calls the user-provided `newObject` method to create a new one if the pool is empty.
+- a `recycle` method that returns an object to the pool.
+
+Here is a typical implementation that wraps a `Recycler` to manage a pair of objects:
+
+```java
+@Data // Lombok: generates getters and setters for all fields
+public class RecyclablePair {
+
+    private final Recycler.Handle<RecyclablePair> handle;
+    private Object x;
+    private Object y;
+
+    private RecyclablePair(Recycler.Handle<RecyclablePair> handle) {
+        this.handle = handle;
+    }
+
+    public void recycle() {
+        this.x = null;
+        this.y = null;
+        handle.recycle(this);
+    }
+
+    public static RecyclablePair create(Object x, Object y) {
+        final RecyclablePair pair = RECYCLER.get();
+        pair.x = x;
+        pair.y = y;
+        return pair;
+    }
+
+    private static final Recycler<RecyclablePair> RECYCLER = new Recycler<RecyclablePair>() {
+        @Override
+        protected RecyclablePair newObject(Recycler.Handle<RecyclablePair> handle) {
+            return new RecyclablePair(handle);
+        }
+    };
+}
+```
+
+It's typically used with a "recycle-after-use" pattern like:
+
+```java
+var pair = RecyclablePair.create(a, b);
+process(pair);
+pair.recycle();
+```
+
+or
+
+```java
+var pair = RecyclablePair.create(a, b);
+executor.execute(() -> {
+    process(pair);
+    pair.recycle();
+});
+```
+
+Unlike a simple thread-local pool, recycling from a different thread is supported: in that case, `recycle()` pushes the object (`pair` in the code above) into a queue that implements [MessagePassingQueue](https://github.com/JCTools/JCTools/blob/v4.0.5/jctools-core/src/main/java/org/jctools/queues/MessagePassingQueue.java).
+
+The following snippet shows some key parts of the thread-local pool used by `Recycler`:
+
+```java
+private static final class LocalPool<T> implements MessagePassingQueue.Consumer<DefaultHandle<T>> {
+    private final ArrayDeque<DefaultHandle<T>> batch;
+    private volatile Thread owner;
+    private volatile MessagePassingQueue<DefaultHandle<T>> pooledHandles;
+```
+
+When `get` is called, it first gets the thread-local `LocalPool` and applies the following logic:
+1. Drain some handles from `pooledHandles` into `batch` if `batch` is empty.
+2. Poll a handle from `batch` and mark it as "claimed" (i.e. used by the current thread).
+3. If such a handle is available, return the object held by that handle. In the example above, `RecyclablePair#create` calls `RECYCLER.get()` to obtain a pooled object this way and then initializes its fields.
+4. Otherwise, call `newObject` to create a new object from heap memory and associate it with a new handle. With the default config, in 7 of 8 cases the handle is a "no-op handle" that won't push the object into `pooledHandles` when `recycle` is called. In the remaining 1 of 8 cases, the handle is a `DefaultHandle` associated with the `LocalPool` of this thread, and the object can be pushed to `pooledHandles` and `batch`.
+
+When `recycle` is called and the handle is a `DefaultHandle`:
+1. **Good Case**: if the object was allocated by `get()` in a `FastThreadLocalThread` and `recycle()` is called in the same thread, the handle is pushed to `batch` directly.
+2. **Bad Case**: otherwise, the handle is pushed to the `pooledHandles` queue, which is less efficient.
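+
+To make this flow more concrete, here is a simplified, self-contained sketch of a pool with the same get/recycle behavior. It only illustrates the logic described above and is not the actual Netty implementation: the real `Recycler` keeps one `LocalPool` per thread, uses a JCTools `MessagePassingQueue`, and by default gives only 1 of 8 newly created objects a real handle. All names and sizes below are arbitrary.
+
+```java
+import java.util.ArrayDeque;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.function.Supplier;
+
+// Illustrative sketch only: a per-owner-thread pool that mimics the flow above
+// (a batch for same-thread recycling, a concurrent queue for cross-thread recycling).
+final class SimplifiedPool<T> {
+    private final Supplier<T> newObject;
+    private final Thread owner = Thread.currentThread();      // thread that owns this pool
+    private final ArrayDeque<T> batch = new ArrayDeque<>();   // only accessed by the owner
+    private final Queue<T> pooledHandles = new ConcurrentLinkedQueue<>(); // cross-thread queue
+
+    SimplifiedPool(Supplier<T> newObject) {
+        this.newObject = newObject;
+    }
+
+    T get() { // must be called from the owner thread
+        if (batch.isEmpty()) {
+            // 1. drain some objects from the cross-thread queue into the batch
+            T transferred;
+            while (batch.size() < 16 && (transferred = pooledHandles.poll()) != null) {
+                batch.add(transferred);
+            }
+        }
+        // 2-3. reuse a pooled object if one is available
+        final T pooled = batch.poll();
+        // 4. otherwise fall back to a plain heap allocation
+        return pooled != null ? pooled : newObject.get();
+    }
+
+    void recycle(T object) {
+        if (Thread.currentThread() == owner) {
+            batch.add(object);           // good case: same-thread recycle, cheap
+        } else {
+            pooledHandles.offer(object); // bad case: cross-thread recycle, slower
+        }
+    }
+}
+```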
+
+# Motivation
+
+There are three main issues with using `Recycler` in Pulsar.
+
+## Issue 1: performance overhead
+
+As explained in the previous section, if a recyclable object like `RecyclablePair` is not allocated in a `FastThreadLocalThread`, recycling is not efficient. Here is an existing benchmark result from a GitHub Actions Ubuntu 24.04 runner:
+
+```
+recycler.RecyclerBenchmark.testRecord                        thrpt  25  468034078.305 ± 1453202.080  ops/s
+recycler.RecyclerBenchmark.testThreadLocalSingletonRecycler  thrpt  25  137434124.938 ±  173674.317  ops/s
+recycler.RecyclerBenchmark.testThreadLocalQueueRecycler      thrpt  25   70585776.893 ±  418944.351  ops/s
+recycler.RecyclerBenchmark.testRecycler                      thrpt  25   36061222.443 ±  368182.450  ops/s
+```
+
+- `testRecord` allocates a simple tuple `record` object from heap memory
+- `testThreadLocalSingletonRecycler` reuses a single tuple object stored in a thread local
+- `testThreadLocalQueueRecycler` allocates a recyclable tuple object from a thread-local queue (`ArrayDeque`) and adds it back to the queue (this simulates the case where the object is allocated in a `FastThreadLocalThread` and recycled in the same thread)
+- `testRecycler` allocates a recyclable tuple object using `Recycler` and then calls `recycle`
+
+The full benchmark code is
[here](https://github.com/BewareMyPower/JavaBenchmark/blob/v0.1.0/src/main/java/io/bewaremypower/recycler/RecyclerBenchmark.java).
+
+From the results above, allocation through the recycler is roughly 7x slower than plain heap allocation in the good case (468.0M vs 70.6M ops/s) and roughly 13x slower in the bad case (468.0M vs 36.1M ops/s).
+
+## Issue 2: error-prone usage
+
+Compared to the performance overhead, the error-prone usage is a more serious issue. Users have to call `recycle()` explicitly for each object, like the error-prone `malloc`/`free` pattern in C.
+
+```java
+var pair = RecyclablePair.create(a, b);
+process(pair);
+pair.recycle();
+```
+
+For example, if `process` in the code above throws an exception, `recycle()` will be skipped. [#13233](https://github.com/apache/pulsar/pull/13233) is a real-world fix for such an issue. This particular issue is not too serious, though: the worst case is that the object is not returned to the pool, so subsequent objects are allocated from heap memory.
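+
+A simple mitigation (shown here only as an illustration, not something the existing code always does) is to pair `create` with `recycle` in a try/finally block when the object stays on one thread:
+
+```java
+var pair = RecyclablePair.create(a, b);
+try {
+    process(pair);
+} finally {
+    pair.recycle(); // runs even if process() throws, so the object is still returned to the pool
+}
+```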
+
+A more serious issue is when the object is shared among different threads and `recycle()` is called twice, which throws an exception on the second call. [#24697](https://github.com/apache/pulsar/issues/24697) suffers from this issue and had not been fixed when this proposal was written. Since the exception is thrown by `recycle()` on the read path, it affects the client side: the consumer cannot receive messages anymore. [#24725](https://github.com/apache/pulsar/pull/24725) [...]
+
+The most serious issue is that if the object is shared by two threads and one thread has recycled it, the second thread could access:
+- the object whose fields have been reset to their initial values by `recycle()`
+- or the object that has already been reused by another task via `get()`
+
+The first case might lead to an NPE, which is relatively benign because it shows up in logs. The second case is really dangerous because the object is in an inconsistent state and the corruption might not surface as any exception.
+
+Since the ownership of a recyclable object must be unique, move semantics can be implemented like this:
+
+```java
+@RequiredArgsConstructor
+@Getter
+static class MovableRecyclableObject {
+
+    private static final Recycler<MovableRecyclableObject> RECYCLER = new Recycler<MovableRecyclableObject>() {
+        @Override
+        protected MovableRecyclableObject newObject(Handle<MovableRecyclableObject> handle) {
+            return new MovableRecyclableObject(handle);
+        }
+    };
+
+    private final Recycler.Handle<MovableRecyclableObject> handle;
+    private Object object;
+
+    public static MovableRecyclableObject create(Object object) {
+        final var result = RECYCLER.get();
+        result.object = object;
+        return result;
+    }
+
+    public MovableRecyclableObject move() {
+        final var result = MovableRecyclableObject.create(this.object);
+        this.recycle();
+        return result;
+    }
+
+    public void recycle() {
+        this.object = null;
+        handle.recycle(this);
+    }
+}
+```
+
+Before publishing the recyclable object to a different thread, `move()` must be called, so that if the original object is still accessed later (e.g. by another thread), it fails fast with an NPE. However, this makes the code harder to write because developers have to follow this pattern manually: forgetting to call `move()` does not cause any compile-time error or immediate runtime error.
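+
+For illustration, the move-before-publish pattern would look like the following sketch (reusing the `process` and `executor` placeholders from the earlier examples; `payload` is also just a placeholder):
+
+```java
+var obj = MovableRecyclableObject.create(payload);
+// Hand over ownership before publishing to another thread; `obj` is recycled
+// here, so any later access through it fails fast instead of corrupting shared state.
+var moved = obj.move();
+executor.execute(() -> {
+    process(moved.getObject());
+    moved.recycle();
+});
+```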
+
+Without following the move-before-publish pattern, we would have to make `recycle` synchronized, which adds more performance overhead:
+
+```java
+    public synchronized void recycle() {
+        if (this.object != null) {
+            this.object = null;
+            handle.recycle(this);
+        }
+    }
+```
+
+## Issue 3: questionable benefits
+
+How much GC pressure can be reduced by sacrificing memory allocation latency is hard to measure and has never been proved by any benchmark. When `Recycler` was introduced in Netty, Java 8 had not even been released. The main use case of `Recycler` in Netty is `ByteBuf`, to avoid the time-consuming allocation of buffer memory from direct memory, where the buffer carries a request that users process in the `FastThreadLocalThread`.
+
+However, Java GC has evolved a lot over the years. For example, Generational ZGC, introduced in Java 21, performs much better on tail latencies than previous garbage collectors. Here are some blog posts that verify the improvement:
+- https://www.morling.dev/blog/lower-java-tail-latencies-with-zgc/
+- https://netflixtechblog.com/bending-pause-times-to-your-will-with-generational-zgc-256629c9386b
+
+It's questionable whether `Recycler` is still useful on modern Java versions. Even if a particular usage of `Recycler` can reduce GC pressure significantly in some scenario, the benefit can easily be broken by a slight code change, which is hard to notice and verify.
+
+In Pulsar, the Netty `Recycler` is typically used to wrap a list of objects to avoid passing them as individual method parameters, which would make methods too long with duplicated parameters. For example, [`ManagedLedgerImpl#asyncReadOpEntry`](https://github.com/apache/pulsar/blob/d833b8bef21cb9e85f0f313eb9d49c7ca550fbbd/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2064) accepts an `OpReadEntry` instance, which is recyclable and includes many fields. This method is [...]
+
+However, as the code has become more complicated, after being passed to the entry cache for processing, the `OpReadEntry` is wrapped in another callback object: https://github.com/apache/pulsar/blob/d833b8bef21cb9e85f0f313eb9d49c7ca550fbbd/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/RangeEntryCacheImpl.java#L349-L365
+
+Where the recycler design on `OpReadEntry` originally saved the only heap allocation of each read operation, it now saves just 1 of 2 heap allocations. What's more, this wrapped callback is passed to `PendingReadsManager` and is eventually stored as a field of a new record type `ReadEntriesCallbackWithContext` that is added to an `ArrayList` in
https://github.com/apache/pulsar/blob/d833b8bef21cb9e85f0f313eb9d49c7ca550fbbd/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/PendingRead
[...]
+
+Now, each read operation has:
+- 1 object allocated by the Netty recycler (`OpReadEntry`)
+- 1 object allocated from heap memory (the wrapped callback in `RangeEntryCacheImpl`)
+- 1 object allocated from heap memory (the `ReadEntriesCallbackWithContext` object that has a field of the wrapped callback)
+  - This object is added to an `ArrayList` allocated from heap memory with at least 1 element
+
+Eventually, saving only the first heap allocation should not make a noticeable difference. In addition, JVM optimizations such as escape analysis can keep many short-lived objects from being allocated on the heap at all.
+
+# Goals
+
+Make a rule for whether to use the Netty `Recycler`, to avoid debates during code review like "the recycler is good, so you should not remove it".
+
+# High Level Design
+
+For new contributions, don't use Netty `Recycler`.
+
+If you need to reduce heap memory usage, consider using `FastThreadLocal` to cache a single object or an object queue, like what the Netty `Recycler` does for the **Good Case** mentioned in the **Background knowledge** section. This would also be more effective than the Netty `Recycler` for normal threads.
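+
+As an illustration only, a `FastThreadLocal`-backed object queue for the earlier `RecyclablePair`-style use case could look like the following sketch (the class name, capacity, and fields are made up for this example):
+
+```java
+import io.netty.util.concurrent.FastThreadLocal;
+import java.util.ArrayDeque;
+
+// Illustrative sketch only: a thread-local object queue that mirrors the
+// Recycler "Good Case" (allocate and recycle on the same thread) without any
+// cross-thread queue or handle bookkeeping.
+public final class PooledPair {
+    private static final int MAX_POOLED = 64; // arbitrary cap for this sketch
+
+    private static final FastThreadLocal<ArrayDeque<PooledPair>> POOL =
+            new FastThreadLocal<ArrayDeque<PooledPair>>() {
+                @Override
+                protected ArrayDeque<PooledPair> initialValue() {
+                    return new ArrayDeque<>();
+                }
+            };
+
+    private Object x;
+    private Object y;
+
+    private PooledPair() {
+    }
+
+    public static PooledPair create(Object x, Object y) {
+        PooledPair pair = POOL.get().poll();
+        if (pair == null) {
+            pair = new PooledPair(); // pool empty: allocate from the heap
+        }
+        pair.x = x;
+        pair.y = y;
+        return pair;
+    }
+
+    // Must be called from the same thread that called create()
+    public void recycle() {
+        this.x = null;
+        this.y = null;
+        final ArrayDeque<PooledPair> pool = POOL.get();
+        if (pool.size() < MAX_POOLED) {
+            pool.offer(this);
+        }
+    }
+
+    // getters omitted for brevity
+}
+```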
+
+For existing code, we generally should not spend effort removing recyclable classes. However, it is acceptable to remove them, especially when:
+- A known recycler issue is reported but the root cause is hard to analyze (example: [#24741](https://github.com/apache/pulsar/pull/24741))
+- The existing recyclable object makes code refactoring hard
+
+# Alternatives
+
+## Add an option to choose whether to use the Netty recycler
+
+This would help figure out whether GC pressure can actually be reduced when the recycler is enabled. However, the code would become much more complicated. Additionally, measuring GC pressure is not easy: in many dashboards, the GC time is mistakenly called "GC pause time" even when there is no actual pause. When objects aren't recycled, it's expected for some use cases that more CPU will be spent in GC. However, the Netty Recycler consumes a significant share of CPU too and there's no metric for it, so just [...]
+
+# Links
+
+* Mailing List discussion thread: http://lists.apache.org/thread/bvllwld260g39v5hfsyr8gl2bwb1kxqz
+* Mailing List voting thread: https://lists.apache.org/thread/bxonjp38p7yzdw8ytk6bkp344nd9vw45