qianye1001 commented on issue #10302:
URL: https://github.com/apache/rocketmq/issues/10302#issuecomment-4542898215

   # Implementation Spec — apache/rocketmq#10302
   
   **Feature:** SNI multi-domain certificate support for Proxy TLS
   **Branch:** `develop`
   **Date:** 2026-05-26
   **Status:** Verified ✅ — ready for implementation
   
   ---
   
   ## 1. Context & Verified Current State
   
   All claims in the issue have been verified against `apache/rocketmq@develop`:
   
   | Claim | File | Verified |
   |---|---|---|
   | `ProxyConfig` only has `tlsCertPath` / `tlsKeyPath` | `proxy/src/main/java/org/apache/rocketmq/proxy/config/ProxyConfig.java:82-86` | ✅ |
   | `TlsCertificateManager` watches single cert/key pair | `proxy/src/main/java/org/apache/rocketmq/proxy/service/cert/TlsCertificateManager.java:35-49,86-116` | ✅ |
   | gRPC negotiator uses single static `SslContext`, no SNI | `proxy/src/main/java/org/apache/rocketmq/proxy/grpc/ProxyAndTlsProtocolNegotiator.java:80,106-139,253-302` | ✅ |
   | Remoting TLS helper builds single context (ALPN only) | `proxy/src/main/java/org/apache/rocketmq/proxy/remoting/MultiProtocolTlsHelper.java:53-99` | ✅ |
   | `NettyRemotingServer.TlsModeHandler` uses plain `SslHandler` | `remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingServer.java:123,180-186,485-536` | ✅ |
   | No `SniHandler`, `TlsSniManager`, `TlsDomainConfig`, etc. anywhere | repo-wide grep | ✅ |
   
   > Note: `MultiProtocolTlsHelper` is named for ALPN multiplexing (HTTP/2 vs 
remoting), not multi-certificate. It uses the global static 
`TlsSystemConfig.tlsServerCertPath`/`tlsServerKeyPath`.
   
   > Issue #10296 is a duplicate and already closed; #10302 is canonical.
   
   ---
   
   ## 2. Goals & Non-Goals
   
   ### Goals
   - Serve **multiple top-level domains** with different certificates on the 
**same Proxy port** for both gRPC and Remoting protocols.
   - Purely additive configuration: when `tlsDomainConfigs` is empty, runtime behavior is **identical** to the current single-cert mode.
   - Independent hot-reload per cert/key pair (file watcher).
   - Wildcard hostname matching, falling back to the default cert when no 
domain rule matches.
   
   ### Non-Goals
   - Multi-cert support in the **broker / NameServer / core remoting** beyond 
what Proxy needs (those still use a single global `TlsSystemConfig`). Out of 
scope for this issue.
   - mTLS / client-cert-based selection.
   - ACME / Let's Encrypt automation.
   - Per-domain cipher suite or protocol-version overrides.
   
   ---
   
   ## 3. High-Level Design
   
   ```
               ┌──────────────────────────────────────────┐
               │             ProxyConfig                  │
               │                                          │
               │  tlsCertPath / tlsKeyPath  (default)     │
               │  tlsDomainConfigs:                       │
               │    "*.example.com" -> {cert, key}        │
               │    "*.sample.org"  -> {cert, key}        │
               └─────────────────┬────────────────────────┘
                                 │
                                 ▼
               ┌──────────────────────────────────────────┐
               │             TlsSniManager                │
               │  - default SslContext                    │
               │  - Map<pattern, SslContext>              │
               │  - Mapping<String, SslContext> (Netty)   │
               │  - reload(pattern) / reloadDefault()     │
               └─────────────────┬────────────────────────┘
                                 │ DomainNameMapping / custom Mapping
                 ┌───────────────┴────────────────┐
                 ▼                                ▼
      gRPC pipeline (negotiator)        Remoting pipeline
      ┌──────────────────────┐         ┌──────────────────────┐
      │ HAProxyDecoder       │         │ HAProxyDecoder       │
      │ SniHandler ──┐       │         │ TlsModeHandler       │
      │              ▼       │         │   └─ SniHandler ─┐   │
      │           SslHandler │         │                  ▼   │
      │ HTTP/2 framer        │         │              SslHandler│
      └──────────────────────┘         └──────────────────────┘
                                 ▲
                                 │ file change events
                     ┌───────────┴────────────┐
                     │ TlsCertificateManager  │  (one FileWatchService
                     │  - per-domain watchers │   per cert+key pair)
                     └────────────────────────┘
   ```
   
   ---
   
   ## 4. File-Level Changes
   
   ### 4.1 New files (4)
   
   | Path | Purpose |
   |---|---|
   | `proxy/src/main/java/org/apache/rocketmq/proxy/config/TlsDomainConfig.java` | POJO: `certPath`, `keyPath`, optional `keyPassword`. Jackson-friendly (no-arg ctor + getters/setters). |
   | `proxy/src/main/java/org/apache/rocketmq/proxy/service/cert/TlsSniManager.java` | Holds default `SslContext` + `Map<String pattern, SslContext>`. Exposes a Netty `Mapping<String, SslContext>` for `SniHandler`. Provides `reload(String pattern)` and `reloadDefault()`. Thread-safe via `volatile` references and `ConcurrentHashMap`. |
   | `proxy/src/main/java/org/apache/rocketmq/proxy/service/cert/SniHostnameMatcher.java` | Pure-function matcher implementing the wildcard rules (§5). Unit-testable in isolation. |
   | `proxy/src/main/java/org/apache/rocketmq/proxy/service/cert/TlsContextProvider.java` | Indirection used by remoting `TlsModeHandler` to obtain either a `Mapping<String, SslContext>` (SNI mode) or a single `SslContext` (legacy mode). Lets remoting and gRPC share the same SNI manager. |
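
As a rough illustration of the first row, a `TlsDomainConfig` along these lines would satisfy the Jackson-friendly shape described in the table. This is only a sketch: the three field names come from the table, and the class is declared package-private here purely so the snippet compiles as a single file (the real class would presumably be `public`).

```java
/** Sketch of the per-domain TLS config POJO; field names follow the table above. */
final class TlsDomainConfig {
    private String certPath;
    private String keyPath;
    // Optional: stays null when the private key is not encrypted.
    private String keyPassword;

    // No-arg constructor so a bean-style deserializer can instantiate the class.
    TlsDomainConfig() {
    }

    public String getCertPath() {
        return certPath;
    }

    public void setCertPath(String certPath) {
        this.certPath = certPath;
    }

    public String getKeyPath() {
        return keyPath;
    }

    public void setKeyPath(String keyPath) {
        this.keyPath = keyPath;
    }

    public String getKeyPassword() {
        return keyPassword;
    }

    public void setKeyPassword(String keyPassword) {
        this.keyPassword = keyPassword;
    }
}
```

With plain getters and setters, a map entry such as `"*.example.com": { "certPath": ..., "keyPath": ... }` should bind to this type without custom deserializers.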
   
   ### 4.2 Modified files (8)
   
   | Path | Change |
   |---|---|
   | `proxy/src/main/java/org/apache/rocketmq/proxy/config/ProxyConfig.java` | Add `private Map<String, TlsDomainConfig> tlsDomainConfigs = new HashMap<>();` + getter/setter. Keep existing `tlsCertPath`/`tlsKeyPath` as the default fallback. |
   | `proxy/src/main/java/org/apache/rocketmq/proxy/service/cert/TlsCertificateManager.java` | Refactor to support N watched (cert, key) pairs. Internally keep a list of per-pair `FileWatchService` + listener. Each listener fires `onReload(pattern)` to `TlsSniManager` (or `onDefaultReload()`). Preserve existing single-pair behavior when `tlsDomainConfigs` is empty. |
   | `proxy/src/main/java/org/apache/rocketmq/proxy/grpc/ProxyAndTlsProtocolNegotiator.java` | Replace static single `SslContext` with a `TlsSniManager`. In `TlsModeHandler` pipeline, when bytes indicate TLS, insert `new SniHandler(tlsSniManager.asMapping())` instead of pre-baking an `InternalProtocolNegotiators.serverTls(ctx).newHandler(...)`. Use `SniHandler#newSslHandler(SslContext, ByteBufAllocator)` override hook to wrap the chosen context with gRPC's protocol negotiator (preserves ALPN/HTTP-2 behavior). When `tlsDomainConfigs` is empty, fall back to legacy code path verbatim (no SNI handler). |
   | `proxy/src/main/java/org/apache/rocketmq/proxy/remoting/MultiProtocolTlsHelper.java` | Add overload `buildSniContextProvider(ProxyConfig)` returning a `TlsContextProvider`. Existing `buildSslContext()` retained for back-compat. |
   | `proxy/src/main/java/org/apache/rocketmq/proxy/remoting/MultiProtocolRemotingServer.java` | Wire `TlsContextProvider` into the server. When SNI mode active, install `SniHandler` in pipeline before the protocol decoder; otherwise keep current `SslHandler`. |
   | `remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingServer.java` | Extend `TlsModeHandler` constructor to accept an optional `TlsContextProvider`. If present, add `new SniHandler(provider.mapping())` to pipeline instead of `sslContext.newHandler(...)`. **No behavior change** when provider is null; this preserves broker/nameserver behavior. |
   | `proxy/src/main/java/org/apache/rocketmq/proxy/ProxyStartup.java` | Build `TlsSniManager` from `ProxyConfig` once; pass it into both gRPC negotiator construction and remoting helper. |
   | `proxy/src/main/java/org/apache/rocketmq/proxy/grpc/GrpcServer.java` (or equivalent builder) | Pass `TlsSniManager` reference into `ProxyAndTlsProtocolNegotiator` constructor. |
   
   ### 4.3 New test files (4)
   
   | Path | Coverage |
   |---|---|
   | `proxy/src/test/java/org/apache/rocketmq/proxy/service/cert/SniHostnameMatcherTest.java` | Wildcard semantics matrix (§5). |
   | `proxy/src/test/java/org/apache/rocketmq/proxy/service/cert/TlsSniManagerTest.java` | Mapping lookup, fallback, concurrent reload, missing-file behavior. |
   | `proxy/src/test/java/org/apache/rocketmq/proxy/grpc/ProxyAndTlsProtocolNegotiatorSniTest.java` | `EmbeddedChannel` ClientHello with SNI extension → expected `SslContext` chosen. Legacy mode (no `tlsDomainConfigs`) still works. |
   | `proxy/src/test/java/org/apache/rocketmq/proxy/remoting/MultiProtocolTlsHelperSniTest.java` | Remoting path: SNI handler installed, ALPN unchanged. |
   
   Existing `ProxyAndTlsProtocolNegotiatorTest` and `TlsCertificateManagerTest` 
must remain green (no regressions in single-cert mode).
   
   ---
   
   ## 5. Wildcard Matching Algorithm (`SniHostnameMatcher`)
   
   Input: hostname `h` from ClientHello SNI (already lowercased by Netty).
   Configured patterns: set `P` (lowercased at load).
   
   ```
   1. If h is null or empty → return defaultContext       // no SNI sent
   2. If h ∈ P              → return P[h]                 // exact match (O(1)), wins over wildcards
   3. Split h into labels [l0, l1, …, lN]
   4. If N ≥ 1:                                           // replace only the leftmost label
        candidate = "*." + join(l1 … lN, ".")
        if candidate ∈ P → return P[candidate]            // a single "*" matches exactly one label
   5. Bare-domain fallback: if "*." + h ∈ P → return P["*." + h]
      (e.g. h == "example.com" with "*.example.com" ∈ P; the bare apex is treated
       as if it had an empty leading label that matches "*")
   6. Otherwise → return defaultContext
   ```
   
   Matrix (must be covered by `SniHostnameMatcherTest`):
   
   | Hostname | Pattern | Result |
   |---|---|---|
   | `foo.example.com` | `*.example.com` | match |
   | `example.com` | `*.example.com` | match (bare-domain rule) |
   | `a.b.example.com` | `*.example.com` | **no** match (multi-level) |
   | `foo.example.com` | `foo.example.com` | exact match (priority over 
wildcard) |
   | `bar.sample.org` | `*.example.com` | no match → default |
   | `EXAMPLE.com` (uppercase) | `*.example.com` | match (case-insensitive) |
   | `null` / empty SNI | any | default |
   
   ---
   
   ## 6. Configuration
   
   ### YAML example
   ```yaml
   tlsTestModeEnable: false
   tlsCertPath: /etc/rocketmq/tls/default.crt
   tlsKeyPath:  /etc/rocketmq/tls/default.key
   tlsCertWatchIntervalMs: 3600000
   
   tlsDomainConfigs:
     "*.example.com":
       certPath: /etc/rocketmq/tls/example.crt
       keyPath:  /etc/rocketmq/tls/example.key
     "*.sample.org":
       certPath: /etc/rocketmq/tls/sample.crt
       keyPath:  /etc/rocketmq/tls/sample.key
   ```
   
   ### JSON (Jackson-deserializable)
   ```json
   {
     "tlsCertPath": "/etc/rocketmq/tls/default.crt",
     "tlsKeyPath": "/etc/rocketmq/tls/default.key",
     "tlsDomainConfigs": {
       "*.example.com": {
         "certPath": "/etc/rocketmq/tls/example.crt",
         "keyPath": "/etc/rocketmq/tls/example.key"
       }
     }
   }
   ```
   
   Validation at startup (in `ProxyStartup`):
   - Each domain config must have non-blank `certPath` and `keyPath`.
   - Files must exist and be readable.
   - Pattern must match regex `^(\*\.)?([a-z0-9-]+\.)+[a-z]{2,}$` 
(case-insensitive, normalized to lower-case).
   - A wildcard may appear only once, as the entire leftmost label (`*.`); patterns such as `*.*.example.com` or `foo.*.com` are rejected.
   - Duplicate patterns → fail-fast.
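
The checks listed above could look roughly like the following. This is a hedged sketch, not the proposed `ProxyStartup` code: the class name `TlsDomainConfigValidatorSketch` and both method names are hypothetical, and only the regex is taken from the list. Because `tlsDomainConfigs` is a map, literal duplicate keys cannot survive parsing, so the duplicate check here targets keys that collide after lower-casing (for example `*.Example.com` and `*.example.com`).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical fail-fast validation for tlsDomainConfigs keys and file paths. */
final class TlsDomainConfigValidatorSketch {
    // Regex from the validation list, applied after lower-casing.
    private static final Pattern DOMAIN_PATTERN =
        Pattern.compile("^(\\*\\.)?([a-z0-9-]+\\.)+[a-z]{2,}$");

    /** Returns the lower-cased patterns in input order, or throws on the first problem. */
    static List<String> normalizePatterns(Collection<String> rawPatterns) {
        Set<String> normalized = new LinkedHashSet<>();
        for (String raw : rawPatterns) {
            String p = raw == null ? "" : raw.trim().toLowerCase(Locale.ROOT);
            if (!DOMAIN_PATTERN.matcher(p).matches()) {
                throw new IllegalArgumentException("invalid TLS domain pattern: " + raw);
            }
            if (!normalized.add(p)) {
                throw new IllegalArgumentException("duplicate TLS domain pattern: " + p);
            }
        }
        return new ArrayList<>(normalized);
    }

    /** Rejects blank paths and paths that are not readable regular files. */
    static void requireReadableFile(String label, String path) {
        if (path == null || path.trim().isEmpty()) {
            throw new IllegalArgumentException(label + " must not be blank");
        }
        Path file = Paths.get(path);
        if (!Files.isRegularFile(file) || !Files.isReadable(file)) {
            throw new IllegalArgumentException(label + " is not a readable file: " + path);
        }
    }
}
```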
   
   ---
   
   ## 7. Backward Compatibility
   
   - When `tlsDomainConfigs` is empty (or absent), `TlsSniManager` is **not 
constructed**; the negotiator/remoting take the existing legacy code paths 
byte-for-byte.
   - `TlsCertificateManager`'s public surface is preserved; the new multi-watcher logic only activates when more than the default pair is registered.
   - No changes to `TlsSystemConfig` (global statics) — broker/nameserver 
unaffected.
   - No new mandatory CLI flags; no protocol-level changes; rolling upgrade 
compatible.
   
   ---
   
   ## 8. Risks & Mitigations
   
   | Risk | Mitigation |
   |---|---|
   | gRPC's `InternalProtocolNegotiators.serverTls(...)` expects the `SslContext` at handler-construction time; using `SniHandler` defers context selection. | Subclass `SniHandler` and override `newSslHandler(SslContext, ByteBufAllocator)` to invoke gRPC's negotiator with the resolved context (the same wiring it does internally). Cover with `EmbeddedChannel` test sending a real ClientHello bytestream. |
   | Pipeline ordering with HAProxy protocol decoder. | Install order remains: `HAProxyMessageDecoder` → `TlsModeHandler` → (`SniHandler` → `SslHandler`) → app handlers. SNI handler must come **after** any proxy-protocol stripping so the first bytes it inspects are the real ClientHello. |
   | Hot-reload race: a connection mid-handshake while context swaps. | Only new connections pick up the new context. An in-flight handshake uses the snapshot it captured at `SniHandler.newSslHandler` time. Use `volatile` reference swap inside `TlsSniManager`, with no locking on the hot path. |
   | Misconfigured pattern silently falls through to default cert (browser shows "wrong domain"). | Log a `WARN` once per unique unmatched SNI hostname (rate-limited). |
   | Missing/unreadable cert at runtime reload. | Keep the previous good context; log `ERROR`; expose a metric `proxy_tls_cert_reload_failures_total{pattern=...}`. |
   | Memory cost of N `SslContext` instances. | Negligible (KBs each); document recommended cap ~50 domains. |
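
The two reload-related mitigations (lock-free swap, and keeping the previous good context on failure) can be sketched without Netty as below. Everything here is illustrative: `ReloadableContextsSketch`, its method names, and the generic type `C` (standing in for `SslContext`) are assumptions, and the in-memory failure counter merely stands in for the proposed metric. Readers call `current(...)` without locking; a failed loader leaves the old value in place.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder that swaps contexts atomically and survives failed reloads. */
final class ReloadableContextsSketch<C> {
    private final Map<String, C> byPattern = new ConcurrentHashMap<>();
    private final AtomicLong reloadFailures = new AtomicLong();

    void register(String pattern, C initial) {
        byPattern.put(pattern, initial);
    }

    /** Lock-free read used on the connection-accept path. */
    C current(String pattern) {
        return byPattern.get(pattern);
    }

    /**
     * Builds a fresh context off the hot path, then publishes it with one atomic put.
     * On any failure the previous context stays in place and the failure is counted.
     */
    boolean reload(String pattern, Callable<C> loader) {
        try {
            C fresh = loader.call();
            if (fresh == null) {
                throw new IllegalStateException("loader returned null");
            }
            byPattern.put(pattern, fresh);
            return true;
        } catch (Exception e) {
            reloadFailures.incrementAndGet();
            System.err.println("ERROR: TLS reload failed for " + pattern + ": " + e);
            return false;
        }
    }

    long failures() {
        return reloadFailures.get();
    }
}
```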
   
   ---
   
   ## 9. Test Plan
   
   ### Unit
   - `SniHostnameMatcherTest` — full matrix from §5.
   - `TlsSniManagerTest` — context registration, lookup, reload, concurrent 
reload while resolving (use `CountDownLatch`).
   - `TlsCertificateManagerTest` — extended for multi-pair watchers; ensure 
single-pair legacy path unchanged.
   
   ### Integration (Netty `EmbeddedChannel`)
   - Feed a synthetic ClientHello with SNI = `foo.example.com`; assert the 
resolved `SslContext` corresponds to `*.example.com`.
   - Feed ClientHello with no SNI; assert default context.
   - Feed ClientHello with SNI = `unknown.test`; assert default context + WARN 
log.
   - Verify ALPN still negotiates `h2` on gRPC pipeline after SNI resolution.
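
One possible way to obtain the ClientHello bytes for these tests is to let a client-mode JDK `SSLEngine` produce them, instead of hand-assembling the record. The helper below is a sketch under that assumption; `ClientHelloFixtureSketch` and `clientHello` are hypothetical names, and feeding the result into an `EmbeddedChannel` is left to the actual test.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.Collections;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLParameters;

/** Hypothetical test fixture: captures the first TLS record a JDK client would send. */
final class ClientHelloFixtureSketch {

    /** Pass a hostname to include the SNI extension, or null to omit it. */
    static byte[] clientHello(String sniHostname) {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            engine.setUseClientMode(true);
            if (sniHostname != null) {
                SSLParameters params = engine.getSSLParameters();
                params.setServerNames(
                    Collections.<SNIServerName>singletonList(new SNIHostName(sniHostname)));
                engine.setSSLParameters(params);
            }
            engine.beginHandshake();

            // The first wrap() of a client engine emits the ClientHello record.
            ByteBuffer out = ByteBuffer.allocate(engine.getSession().getPacketBufferSize());
            engine.wrap(ByteBuffer.allocate(0), out);
            out.flip();
            byte[] record = new byte[out.remaining()];
            out.get(record);
            return record;
        } catch (GeneralSecurityException | SSLException e) {
            throw new IllegalStateException("could not generate ClientHello", e);
        }
    }
}
```

In a test, the returned array could be wrapped in a Netty buffer and written inbound to the channel under test; no network connection is needed to generate it.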
   
   ### Manual / E2E
   - `openssl s_client -connect proxy:8443 -servername foo.example.com 
-showcerts` → returns `example.crt`.
   - `openssl s_client -connect proxy:8443 -servername foo.sample.org 
-showcerts` → returns `sample.crt`.
   - `openssl s_client -connect proxy:8443 -servername other.test -showcerts` → 
returns `default.crt`.
   - Touch a domain cert on disk; within `tlsCertWatchIntervalMs`, new 
connections present the updated cert (existing connections unaffected).
   - Restart Proxy with `tlsDomainConfigs` removed → behaves identically to 
current release.
   
   ### Compatibility
   - Run full existing TLS test suite 
(`proxy/src/test/java/.../ProxyAndTlsProtocolNegotiatorTest`, 
`TlsCertificateManagerTest`, `remoting/.../TlsTest`) — must pass without 
modification.
   
   ---
   
   ## 10. Rollout
   
   1. PR #1: introduce `TlsDomainConfig`, `SniHostnameMatcher`, `TlsSniManager` 
(with tests) — no wiring yet. Safe to merge.
   2. PR #2: extend `TlsCertificateManager` for multi-pair watching (with 
tests).
   3. PR #3: wire gRPC negotiator + remoting helper + `ProxyStartup`; gated on 
non-empty `tlsDomainConfigs`.
   4. Docs: update `docs/cn/Configuration_TLS.md` and 
`docs/en/Configuration_TLS.md` with SNI section + YAML example.
   
   ---
   
   ## 11. Open Questions
   
   1. Should the default cert be optional when `tlsDomainConfigs` is set (i.e. 
require SNI from clients)? Current design keeps the default mandatory — simpler 
and avoids handshake failures for legacy clients. Recommend: keep default 
mandatory.
   2. Should `TlsSniManager` also be wired into `NettyRemotingServer` outside 
of proxy (broker/namesrv)? Out of scope for this issue; track as a follow-up.
   3. Metric/log namespace conventions — align with 
`org.apache.rocketmq.proxy.metrics.*`.
   
   ---
   
   ## 12. Acceptance Criteria
   
   - [ ] `tlsDomainConfigs` accepted in `proxy.json` / yaml and parsed into 
`Map<String, TlsDomainConfig>`.
   - [ ] gRPC and Remoting both serve correct cert per SNI hostname on the same 
port.
   - [ ] Wildcard matching matrix (§5) fully covered by unit tests.
   - [ ] Hot-reload of any single cert/key pair does not interrupt other 
domains' traffic.
   - [ ] Empty/absent `tlsDomainConfigs` → bit-identical behavior to current 
release (verified by existing test suite).
   - [ ] Documentation updated in both `docs/cn` and `docs/en`.
   

