mkrug1981 opened a new issue, #13247:
URL: https://github.com/apache/trafficserver/issues/13247
## Bug: slice plugin fatal assertion, `TSHttpTxnEffectiveUrlStringGet` called with invalid txn in `read_resp_hdr` (introduced by #11618)
### Summary
ATS crashes with a fatal assertion when the slice plugin calls
`TSHttpTxnEffectiveUrlStringGet` inside its server intercept response handler.
The call site was introduced by the Conditional Slicing feature (#11618). At
the point `read_resp_hdr` fires inside the intercept PluginVC, the original
transaction's HTTP state machine has already advanced past the hook phase where
this API is valid, causing `sdk_sanity_check_txn` to fail and ATS to abort.
This is a **fleet-wide, reproducible crash** observed on two independent
hosts running the same slice.so build, confirming it is not an isolated
incident.
### Version
- **ATS version:** 10.1.3 (RPM build path references 10.1.2 sources —
possible packaging issue worth investigating separately)
- **OS:** RHEL 8 / Linux x86_64, kernel 4.18.0-553.x.el8_10
- **Build date:** 2026-05-26
### Plugins loaded
`slice.so`, `cache_range_requests.so`, `header_rewrite.so`, `compress.so`,
`regex_remap.so`, `cachekey.so`, `background_fetch.so`, `maxmind_acl.so`,
`tslua.so`
### Steps to reproduce
1. Configure ATS with the `slice` plugin active on remap rules, with or
without `--minimum-size` (conditional slicing).
2. Serve traffic that triggers the slice intercept path (range requests or
slice-eligible objects).
3. Under production load, ATS fatally aborts.
### Observed behaviour
Fatal assertion in `src/api/InkAPI.cc:3940`:
```
Fatal: /rpmbuilddir/BUILD/trafficserver-10.1.2/src/api/InkAPI.cc:3940:
failed assertion `TS_SUCCESS == sdk_sanity_check_txn(txnp)`
```
Crash stack (from `traffic_crashlog`), identical across both affected hosts:
```
Thread [ET_NET N]:
crash_logger_invoke
gsignal / abort
ink_abort
_ink_assert
_TSReleaseAssert
TSHttpTxnEffectiveUrlStringGet(txnp, &urllen) ← assertion fires here
handle_server_resp(tsapi_cont*, TSEvent, Data*) [slice.so +0x4eb]
intercept_hook(tsapi_cont*, TSEvent, void*) [slice.so +0x41d]
INKContInternal::handle_event
PluginVC::process_write_side
PluginVC::main_handler
```
Identical function offsets (`handle_server_resp +0x4eb`, `intercept_hook
+0x41d`) on both hosts confirm the same slice.so binary is affected fleet-wide.
### Root cause
PR #11618 (Conditional Slicing) added a new `read_resp_hdr` function in
`plugins/slice/server.cc` that calls
`TSHttpTxnEffectiveUrlStringGet(txnp, &urllen)` to look up the effective URL
for updating the object size cache. This function is registered on
`TS_HTTP_READ_RESPONSE_HDR_HOOK` and is also invoked from within the intercept
handler chain via `handle_server_resp`.
When slice sets up a server intercept via `TSContCreate(intercept_hook,
mutex)`, the `PluginVC` drives a fake origin exchange. When
`handle_server_resp` fires inside that intercept, the `txnp` is the intercept
pseudo-transaction. The original HTTP SM-backed transaction has already
advanced past the `TS_HTTP_READ_REQUEST_HDR_HOOK` stage, so
`sdk_sanity_check_txn` (which validates that the pointer refers to an `HttpSM`
in an expected hook state) returns `TS_ERROR`, triggering the release assert.
This is a fundamental API lifecycle violation:
`TSHttpTxnEffectiveUrlStringGet` is only valid while the original transaction
is in a hook callback, not from inside an intercept server response handler.
### Suggested fix
The effective URL should be captured once in `read_request` (where `txnp` is
fully valid) and stored in the `Data` struct, then read from there in
`read_resp_hdr` instead of calling the API again.
**1. Add a field to the `Data` struct:**
```cpp
struct Data {
  // ... existing fields ...
  std::string effective_url; // captured at read_request time, before data.release()
};
```
**2. Capture in `read_request` before `data.release()`:**
```cpp
int urllen = 0;
char *urlstr = TSHttpTxnEffectiveUrlStringGet(txnp, &urllen);
if (urlstr != nullptr) {
  // Copy into the Data struct, then release the API-allocated buffer.
  data->effective_url.assign(urlstr, urllen);
  TSfree(urlstr);
}
// ... then data.release() as before
```
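The capture in step 2 could also be wrapped in a small helper so that the temporary buffer is released on every return path. The following is a minimal, self-contained sketch of that idea, not the plugin's code: `fake_effective_url_get` is a hypothetical stand-in for `TSHttpTxnEffectiveUrlStringGet`, `std::free` stands in for `TSfree`, and `capture_effective_url` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for TSHttpTxnEffectiveUrlStringGet: returns a
// heap-allocated copy of src and writes its length, or nullptr if there
// is no URL to return.
char *fake_effective_url_get(const char *src, int *length)
{
  *length = 0;
  if (src == nullptr) {
    return nullptr;
  }
  std::size_t const n = std::strlen(src);
  char *buf = static_cast<char *>(std::malloc(n + 1));
  if (buf == nullptr) {
    return nullptr;
  }
  std::memcpy(buf, src, n + 1);
  *length = static_cast<int>(n);
  return buf;
}

// Deleter standing in for TSfree (assumption: the real buffer is freed that way).
struct FreeDeleter {
  void operator()(char *p) const { std::free(p); }
};

// Copy the URL into an owned std::string. The unique_ptr frees the
// temporary buffer whether or not a URL was returned.
std::string capture_effective_url(const char *src)
{
  int urllen = 0;
  std::unique_ptr<char, FreeDeleter> urlstr(fake_effective_url_get(src, &urllen));
  if (!urlstr) {
    return {};
  }
  assert(urllen >= 0);
  return std::string(urlstr.get(), static_cast<std::size_t>(urllen));
}
```

In the plugin, the returned string would be assigned to `data->effective_url` before `data.release()`; an empty result signals that no URL was available.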
**3. Use the stored URL in `read_resp_hdr` (intercept path):**
```cpp
// Remove: char *urlstr = TSHttpTxnEffectiveUrlStringGet(txnp, &urllen);
// Replace with:
Data *data = static_cast<Data *>(TSContDataGet(contp));
if (data == nullptr || data->effective_url.empty()) {
  // no URL available: update stats, reenable, return
}
std::string_view url = data->effective_url;
```
**4. Non-intercept path (first-time large object discovery):**
For the non-intercepted txn hook, allocate a small struct at hook
registration time in `read_request` (while `txnp` is still valid), capture the
URL there, and pass it as cont data. Destroy it in the response handler after
use.
```cpp
struct RespHdrData {
  Config *config;
  std::string effective_url;
};

// In read_request, non-intercept branch:
auto *rhdata = new RespHdrData{config, {}};
int urllen = 0;
char *urlstr = TSHttpTxnEffectiveUrlStringGet(txnp, &urllen);
if (urlstr != nullptr) {
  rhdata->effective_url.assign(urlstr, urllen);
  TSfree(urlstr);
}
TSCont resp_contp = TSContCreate(read_resp_hdr, nullptr);
TSContDataSet(resp_contp, rhdata);
TSHttpTxnHookAdd(txnp, TS_HTTP_READ_RESPONSE_HDR_HOOK, resp_contp);
// TSContDestroy + delete rhdata after use in the handler
```
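The cleanup noted in the last comment of step 4 is easy to miss on early-return paths. One way to make it harder to leak is to take ownership of the cont data at the top of the handler. The sketch below models only that ownership pattern and does not use the ATS API: `FakeCont` is a hypothetical stand-in for a continuation, setting `destroyed` stands in for `TSContDestroy`, and `handle_resp_once` is an illustrative name.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Config; // opaque here; stands in for the plugin's Config type

struct RespHdrData {
  Config *config;
  std::string effective_url;
};

// Hypothetical stand-in for a continuation carrying one data pointer.
struct FakeCont {
  void *data = nullptr;
  bool destroyed = false;
};

// One-shot handler body: returns the captured URL (empty if none) and
// guarantees the RespHdrData is deleted on every return path.
std::string handle_resp_once(FakeCont &cont)
{
  // Take ownership first, so early returns below cannot leak.
  std::unique_ptr<RespHdrData> rhdata(static_cast<RespHdrData *>(cont.data));
  cont.data = nullptr;
  cont.destroyed = true; // stands in for TSContDestroy(contp)

  if (!rhdata || rhdata->effective_url.empty()) {
    return {}; // nothing captured at read_request time
  }
  return std::move(rhdata->effective_url);
}
```

In the real handler, the transaction would still need to be reenabled on each path; that part is omitted here because it depends on plugin code not shown in this report.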
### Expected behaviour
The slice plugin should not call `TSHttpTxnEffectiveUrlStringGet` from
within an intercept server response handler. The URL should be captured earlier
in the transaction lifecycle and stored for later use.
### Workaround
Disabling the `slice` plugin stops the crashes. No other workaround is known.
### Additional context
- Crash confirmed on two independent production hosts with identical
`slice.so` binary (confirmed by matching intra-plugin offsets despite different
ASLR load addresses and plugin sandbox UUIDs).
- The version string mismatch (binary reports 10.1.3, RPM build path shows
10.1.2) may indicate a separate packaging issue.
- Signal information and CPU registers are not present in the crashlog
because the crash is triggered by `abort()` rather than a hardware fault — this
is expected behaviour.
- 74 ET_NET threads were active at crash time, indicating a high-concurrency
production environment.