On Fri, Apr 10, 2026 at 9:37 AM Runtian Liu <[email protected]> wrote:
>
> Hi Andrés, Isaac,
>
> Thank you for the detailed write-up, Andrés. Your investigation into the 
> FastBuilder.reset() bug was the starting point for our own analysis, which 
> led us to identify an additional impact beyond the ClassCastException.
>
> Isaac — yes, we believe CASSANDRA-21260 and CASSANDRA-21216 are directly 
> related. CASSANDRA-21260 was filed by our team to track the SSTable header 
> contamination we've been seeing. Based on Andrés' findings about the stale 
> savedBuffer/savedNextKey in FastBuilder.reset(), we investigated whether the 
> same bug could explain our corrupted SSTable headers — and we believe it does.
>
> What we observed (CASSANDRA-21260)
>
> We have been seeing corrupted SSTable headers where an SSTable for one table 
> contains column metadata belonging to a completely different table. When we 
> deserialize the on-disk SerializationHeader.Component and compare it against 
> the table's TableMetadata, we find column names that are not part of the 
> table's schema — they belong to another table in the same keyspace. In one 
> case, a table with ~2000 columns had 29 foreign columns from a ~150-column 
> table embedded in its SSTable header.
>
> These corrupted SSTables are otherwise structurally valid — they are accepted 
> into the live set and only detected by explicit header validation we added. 
> The foreign columns do not correspond to dropped columns or any prior schema 
> version of the affected table. As noted in CASSANDRA-21260, once a corrupted 
> SSTable exists, compaction merges headers blindly, so the contamination 
> propagates to new SSTables indefinitely.
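
For anyone who wants to run a similar check, here is a minimal sketch of the kind of header validation described above. It is an illustration under assumptions, not the validation Runtian's team added: HeaderCheck and foreignColumns are hypothetical names, and plain strings stand in for the column names read from SerializationHeader.Component and TableMetadata.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Cassandra: column names are modelled as plain strings.
final class HeaderCheck {
    // Returns the header columns that are neither in the current schema nor among the dropped columns.
    static Set<String> foreignColumns(Set<String> schemaColumns,
                                      Set<String> droppedColumns,
                                      List<String> headerColumns) {
        Set<String> foreign = new LinkedHashSet<>();
        for (String name : headerColumns) {
            if (!schemaColumns.contains(name) && !droppedColumns.contains(name)) {
                foreign.add(name);
            }
        }
        return foreign;
    }

    public static void main(String[] args) {
        Set<String> foreign = foreignColumns(
                Set.of("id", "v1"),                               // assumed live schema
                Set.of("old_col"),                                // assumed dropped columns
                List.of("id", "v1", "old_col", "other_table_col"));
        System.out.println("foreign columns: " + foreign);
    }
}
```

A non-empty result would flag the SSTable as suspect. Dropped columns are excluded so that they are not reported as foreign.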

I do not want to cause scope creep here, but there is also
CASSANDRA-21000, which will keep deleted columns in there forever. I do
not see any issue with fixing it (details in the ticket), but at the
same time I cannot say with 100% certainty that it will not have side
effects we did not anticipate.

However, if we are going to change the logic or make fixes around
SerializationHeader, I think it would be good to consider including
this ticket as well, or at least to keep it in mind.

> How the FastBuilder bug (CASSANDRA-21216) causes this
>
> Building on Andrés' analysis of the FastBuilder state leakage, we traced a 
> path from the stale savedBuffer/savedNextKey all the way to on-disk SSTable 
> header contamination:
>
> 1. A schema disagreement (e.g. during column addition) causes an internode 
> READ_REQ deserialization to fail on a replica. 
> Columns.Serializer.deserialize() uses a thread-local pooled FastBuilder, and 
> if the table has more than 31 columns, the overflow populates savedBuffer and 
> savedNextKey before the exception. Since reset() does not clear these fields, 
> the FastBuilder is returned to the pool with stale ColumnMetadata from the 
> source table.
>
> 2. When a deletion-only mutation (partition delete or range tombstone) for a 
> different table is later deserialized on the same thread, 
> Columns.Serializer.deserialize() acquires the poisoned FastBuilder. The stale 
> ColumnMetadata from the source table are drained into the victim table's 
> Columns via propagateOverflow(). Because the mutation contains only a 
> deletion — no rows, no static row — no per-row column-subset deserialization 
> occurs, so the contaminated Columns survives without error. (Mutations with 
> actual row data would fail due to subset encoding mismatches, which is why 
> only deletion-only mutations propagate the contamination silently.)
>
> 3. When the contaminated PartitionUpdate is applied to the memtable, 
> ColumnsCollector.update() records the foreign ColumnMetadata. At flush, 
> BigTableWriter.openFinal() writes the SSTable using the in-memory 
> SerializationHeader directly, bypassing toHeader() validation. The result is 
> an on-disk SSTable whose header contains columns from the wrong table.
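
To make the leak in steps 1 and 2 concrete, here is a toy model of a pooled builder whose reset() forgets its overflow state. It is only a sketch under assumptions: PooledBuilderSketch, Builder and the saved list are hypothetical stand-ins for FastBuilder and savedBuffer/savedNextKey, and the real BTree builder is far more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: not Cassandra's FastBuilder. All names here are hypothetical.
final class PooledBuilderSketch {
    static final int LEAF_CAPACITY = 31; // overflow threshold taken from the thread

    static final class Builder {
        private final List<String> leaf = new ArrayList<>();
        private final List<String> saved = new ArrayList<>(); // stand-in for savedBuffer/savedNextKey
        private final boolean clearSavedOnReset;

        Builder(boolean clearSavedOnReset) {
            this.clearSavedOnReset = clearSavedOnReset;
        }

        void add(String column) {
            if (leaf.size() == LEAF_CAPACITY) { // leaf is full: spill it into the overflow state
                saved.addAll(leaf);
                leaf.clear();
            }
            leaf.add(column);
        }

        List<String> build() {
            List<String> out = new ArrayList<>(saved); // overflow state is drained into the result
            out.addAll(leaf);
            saved.clear();
            leaf.clear();
            return out;
        }

        void reset() {
            leaf.clear();
            if (clearSavedOnReset) {
                saved.clear(); // models the proposed fix; the buggy variant skips this
            }
        }
    }

    // A failed deserialization of a wide "source" table, then reuse of the same builder for a "victim".
    static List<String> reuseAfterFailure(boolean clearSavedOnReset) {
        Builder pooled = new Builder(clearSavedOnReset);
        for (int i = 0; i < 40; i++) {
            pooled.add("source_col_" + i); // more than 31 columns, so the builder overflows
        }
        pooled.reset();                    // failure path: builder goes back to the pool
        pooled.add("victim_col");
        return pooled.build();
    }

    public static void main(String[] args) {
        System.out.println("buggy reset: " + reuseAfterFailure(false).size() + " columns");
        System.out.println("fixed reset: " + reuseAfterFailure(true).size() + " columns");
    }
}
```

In this model the buggy variant hands the victim 32 columns, 31 of them from the source, while the variant that clears the saved state returns only victim_col. The counts are properties of the toy, not a claim about the exact numbers in the real builder.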
>
> This also affects small messages on the Netty event loop
>
> Andrés, your investigation focused on wide tables where messages exceed the 
> ~64KB large-message threshold and are deserialized on SEPWorker threads. We 
> found that the same contamination also occurs with small messages 
> deserialized on the Netty event loop.
>
> For messages under 64KB, processSmallMessage() deserializes the payload 
> inline on the event loop thread, which has its own 
> TinyThreadLocalPool<FastBuilder>. Since Netty binds each channel to a single 
> EventLoop, messages from the same peer are handled by the same thread — 
> making thread reuse virtually guaranteed rather than probabilistic.
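
The thread-affinity point can be illustrated without Netty. In the sketch below, a single-thread executor stands in for an event loop and a plain ThreadLocal list stands in for the per-thread builder pool; both are assumptions made for illustration, and EventLoopReuseSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: a single-thread executor plays the role of one event loop thread.
final class EventLoopReuseSketch {
    // Stand-in for a per-thread pool of builders.
    private static final ThreadLocal<List<String>> POOLED = ThreadLocal.withInitial(ArrayList::new);

    static List<String> run() {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            // First "message": leaves stale state behind in the thread-local.
            Runnable poison = () -> POOLED.get().add("stale_source_column");
            // Second "message": runs on the same thread, so it sees that state.
            Callable<List<String>> read = () -> new ArrayList<>(POOLED.get());
            loop.submit(poison).get();
            return loop.submit(read).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("second task sees: " + run());
    }
}
```

Because every task submitted to that executor runs on the same thread, the second task always observes what the first one left in the thread-local. That is the sense in which reuse is deterministic in this setup rather than a matter of scheduling luck.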
>
> This lowers the trigger threshold significantly: the source table only needs 
> more than 31 columns (for FastBuilder overflow) rather than the ~4200 needed 
> to exceed the large-message threshold. In our case, a 150-column table was 
> the contamination source. The 29 foreign columns we observed are consistent 
> with the 31 + 1 items retained in savedBuffer/savedNextKey, minus a few 
> consumed as internal BTree node keys during build().
>
> Summary
>
> We strongly support the proposed fix to clear savedBuffer and savedNextKey in 
> FastBuilder.reset(). Beyond the ClassCastException that Andrés identified, 
> the same bug can cause the silent SSTable header contamination tracked in 
> CASSANDRA-21260. We have written JVM dtests reproducing both the 
> large-message and small-message contamination paths and are happy to share 
> them.
>
> Best regards
> Runtian
