c-taylor opened a new issue, #13225:
URL: https://github.com/apache/trafficserver/issues/13225

   Worked Copilot suggestion from 
https://github.com/apache/trafficserver/pull/13098
   ----
   
   **Component:** `iocore/net` — TLS cert lookup
   
   **Files:**
   - `src/iocore/net/SSLCertLookup.cc`
   - `src/iocore/net/P_SSLCertLookup.h`
   
   ## Summary
   
   `SSLCertContext::operator=` (`src/iocore/net/SSLCertLookup.cc:245-256`) 
mutates `this->ctx` and the four sibling fields (`opt`, `userconfig`, 
`keyblock`, `ctx_type`) without acquiring `this->ctx_mutex`. The class's 
documented contract — established by `getCtx()` / `setCtx()` 
(`src/iocore/net/SSLCertLookup.cc:259-271`) and by the comment in 
`P_SSLCertLookup.h:122` ("Threadsafe Functions to get and set shared SSL_CTX 
pointer") — is that `ctx` is protected by `ctx_mutex`. `operator=` violates 
that invariant: a thread copy-assigning to `*cc` races against any concurrent 
`cc->getCtx()` / `cc->setCtx()` reader or writer, because the assigner takes a 
lock only on `other.ctx_mutex`, never on `this->ctx_mutex`.
   
   The same gap is present (less severely) for `opt`, `userconfig`, `keyblock`, 
`ctx_type`: those fields are read directly throughout the codebase without any 
lock, and `operator=` writes them with no synchronization at all.
   
   ## Why this matters
   
   Published `SSLCertContext` instances are mutated post-publication today. The 
cert-secret reload path calls `setCtx()` on entries returned by 
`lookup->find(...)`:
   
   - `src/iocore/net/SSLUtils.cc:1704`
   - `src/iocore/net/SSLUtils.cc:1729`
   
   There is currently no in-tree call site that invokes `operator=` on a 
published `SSLCertContext` (insertion goes through `SSLContextStorage::store()` 
→ `ctx_store.push_back(cc)` at `src/iocore/net/SSLCertLookup.cc:437-441`, which 
uses the copy *constructor*, not `operator=`, even on vector reallocation). So 
the bug is latent. But the class invariant is wrong as stated: a future change 
that assigns to a published entry — e.g., a partial-reload variant that swaps a 
slot in-place rather than rebuilding the whole `SSLCertLookup` — would silently 
introduce a data race that current readers via `getCtx()` would not be 
protected against.
   
   ## Reproduction sketch (conceptual — not currently triggerable in tree)
   
   ```cpp
   SSLCertContext *cc = lookup->find(name);   // published entry
   // Thread A:
   auto p = cc->getCtx();                     // lock_guard(ctx_mutex), reads 
ctx
   // Thread B:
   *cc = SSLCertContext(new_ctx, ...);        // takes other.ctx_mutex,
                                              // then writes cc->ctx with NO 
lock on cc->ctx_mutex
   ```
   
   Thread A and Thread B race on `cc->ctx` (a `std::shared_ptr`). Tearing 
during the assignment can leave `cc->ctx` observably half-updated to A; Thread 
A may also load a refcount/control-block pair that disagree, leading to 
use-after-free or double-decrement on the underlying `SSL_CTX`.
   
   ## Possible Fix
   
   Lock both `other.ctx_mutex` and `this->ctx_mutex` in `operator=`, using 
`std::lock` to acquire them in a deadlock-free order, and move *all* field 
assignments inside the locked section so the assignment is atomic from any 
reader's perspective. The self-assignment guard (`&other != this`) must remain 
to avoid double-locking the same mutex.
   
   Apply this patch to `src/iocore/net/SSLCertLookup.cc:245-256`:
   
   ```cpp
   SSLCertContext &
   SSLCertContext::operator=(SSLCertContext const &other)
   {
     if (&other != this) {
       std::unique_lock<std::mutex> other_lock(other.ctx_mutex, 
std::defer_lock);
       std::unique_lock<std::mutex> this_lock(this->ctx_mutex, std::defer_lock);
       std::lock(other_lock, this_lock);
       this->opt        = other.opt;
       this->userconfig = other.userconfig;
       this->keyblock   = other.keyblock;
       this->ctx_type   = other.ctx_type;
       this->ctx        = other.ctx;
     }
     return *this;
   }
   ```
   
   Note: the copy constructor at `src/iocore/net/SSLCertLookup.cc:234-242` is 
*not* affected — `this` is not yet observable during construction, so locking 
`this->ctx_mutex` is unnecessary. Locking `other.ctx_mutex` is sufficient and 
is already done.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to