qianye1001 commented on issue #10397:
URL: https://github.com/apache/rocketmq/issues/10397#issuecomment-4560519111

   ## 🤖 Automated Fix Proposal (v1)
   
   I've analyzed this issue against the latest upstream code and generated a 
fix specification.
   
   **Code Base:** c9e51d625e1d (develop)
   
   ### Summary
   
   **Root Cause:** When TLS certificates are hot-reloaded, both 
`NettyRemotingServer.loadSslContext()` and 
`ProxyAndTlsProtocolNegotiator.loadSslContext()` overwrite the `sslContext` 
field without releasing the old instance. Since `OpenSslContext` allocates 
native off-heap memory (SSL_CTX, X509 chain, EVP_PKEY), this native memory 
leaks on every certificate rotation cycle.
   
   **Fix Strategy:**
   1. Save the old `SslContext` reference before building the replacement
   2. After successful assignment of the new context, call 
`ReferenceCountUtil.release(oldSslContext)` to free native memory
   3. Use "build new, then release old" ordering to ensure `sslContext` is 
never null or prematurely released
   
   **Files to Modify:**
   - 
`remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingServer.java`
 — add old context release in `loadSslContext()`
   - 
`proxy/src/main/java/org/apache/rocketmq/proxy/grpc/ProxyAndTlsProtocolNegotiator.java`
 — add `volatile` + old context release in `loadSslContext()`
   
   **Risk Assessment:** LOW — straightforward resource lifecycle fix following 
Netty's reference-counting pattern. Fully backward compatible.
   
   ### ⚠️ Duplicate Issue Notice
   
   This issue describes the **same root cause** as issue #10395. An existing 
fix has been proposed in PR #10396 (branch `fix/issue-10395`). If PR #10396 is 
merged, this issue can be closed as a duplicate.
   
   If the community prefers a separate or revised fix for this issue, please 
provide feedback below.
   
   <details>
   <summary>Full Specification (click to expand)</summary>
   
   ```markdown
   # Fix Specification: TLS Certificate Hot-Reload Native Memory Leak
   
   **Issue:** #10397 - TLS certificate hot-reload leaks native memory due to 
unreleased old SslContext  
   **Related Issue:** #10395 (duplicate describing the same root cause)  
   **Existing PR:** #10396 (branch `fix/issue-10395`, commit `7d55555`)
   
   ---
   
   ## 1. Root Cause Analysis
   
   When TLS certificates are hot-reloaded, both 
`NettyRemotingServer.loadSslContext()` and 
`ProxyAndTlsProtocolNegotiator.loadSslContext()` overwrite the `sslContext` 
field with a newly built `SslContext` instance **without releasing the previous 
one**.
   
   Netty's `SslContext` (especially when using `SslProvider.OPENSSL` / 
`SslProvider.OPENSSL_REFCNT`) wraps native OpenSSL resources managed via 
reference counting. When the old reference is simply discarded:
   
   - The native SSL_CTX and associated buffers are never freed.
   - Each reload accumulates leaked native memory proportional to the size of 
the certificate chain and private key material.
   - Over time this leads to OOM or process termination by the OS OOM-killer.
   
   **Affected code paths:**
   
   | Location | Line | Behavior |
   |----------|------|----------|
   | `NettyRemotingServer.loadSslContext()` | 
remoting/.../NettyRemotingServer.java:186 | `sslContext = 
TlsHelper.buildSslContext(false);` overwrites without release |
   | `ProxyAndTlsProtocolNegotiator.loadSslContext()` | 
proxy/.../ProxyAndTlsProtocolNegotiator.java:118,131 | `sslContext = 
GrpcSslContexts.forServer(...)...build();` overwrites without release |
   
   The `sslContext` field in `NettyRemotingAbstract` is already declared 
`volatile` (line 121), so `NettyRemotingServer` is thread-safe for reads. 
However, in `ProxyAndTlsProtocolNegotiator` the field is `private static` 
without `volatile`, creating a potential visibility issue across threads.
   
   ---
   
   ## 2. Fix Strategy (Step by Step)
   
   ### NettyRemotingServer.loadSslContext()
   
   1. Before building the new `SslContext`, save the current value: `SslContext 
oldSslContext = this.sslContext;`
   2. Build the new context via `TlsHelper.buildSslContext(false)`.
   3. Assign the new context to `this.sslContext`.
   4. If `oldSslContext != null`, call 
`ReferenceCountUtil.release(oldSslContext)` to decrement its reference count 
and free native resources.
   5. Wrap the release in a try-catch to avoid disrupting the reload if release 
fails (log warning).
   
   ### ProxyAndTlsProtocolNegotiator.loadSslContext()
   
   1. Add `volatile` modifier to the `private static SslContext sslContext` 
field for thread-safe visibility.
   2. At the start of `loadSslContext()`, save: `SslContext oldSslContext = 
sslContext;`
   3. Build the new context (both test-mode and production paths already 
produce a new `SslContext`).
   4. Assign to `sslContext`.
   5. After successful assignment, release the old context: `if (oldSslContext 
!= null) ReferenceCountUtil.release(oldSslContext);`
   6. Wrap the release in a try-catch to avoid disrupting the reload.
   
   ### Ordering Rationale
   
   "Build new, then release old" ensures:
   - `sslContext` is neve
   ```
   
   </details>
   
   ### Next Steps
   
   - Reply `/approve` to proceed with generating a PR (or to endorse the 
existing PR #10396 as the fix for both issues)
   - Reply `/revise <feedback>` to request changes to the fix approach
   - Reply `/reject` to close this proposal
   
   *This proposal will expire in 72 hours if no response is received.*
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to