On Fri, Apr 10, 2026 at 9:37 AM Runtian Liu <[email protected]> wrote:
>
> Hi Andrés, Isaac,
>
> Thank you for the detailed write-up, Andrés. Your investigation into the
> FastBuilder.reset() bug was the starting point for our own analysis, which
> led us to identify an additional impact beyond the ClassCastException.
>
> Isaac — yes, we believe CASSANDRA-21260 and CASSANDRA-21216 are directly
> related. CASSANDRA-21260 was filed by our team to track the SSTable header
> contamination we've been seeing. Based on Andrés' findings about the stale
> savedBuffer/savedNextKey in FastBuilder.reset(), we investigated whether the
> same bug could explain our corrupted SSTable headers — and we believe it does.
>
> What we observed (CASSANDRA-21260)
>
> We have been seeing corrupted SSTable headers where an SSTable for one table
> contains column metadata belonging to a completely different table. When we
> deserialize the on-disk SerializationHeader.Component and compare it against
> the table's TableMetadata, we find column names that are not part of the
> table's schema — they belong to another table in the same keyspace. In one
> case, a table with ~2000 columns had 29 foreign columns from a ~150-column
> table embedded in its SSTable header.
>
> These corrupted SSTables are otherwise structurally valid — they are accepted
> into the live set and only detected by explicit header validation we added.
> The foreign columns do not correspond to dropped columns or any prior schema
> version of the affected table. As noted in CASSANDRA-21260, once a corrupted
> SSTable exists, compaction merges headers blindly, so the contamination
> propagates to new SSTables indefinitely.
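
For anyone following along, the kind of header validation described in the quoted text can be sketched roughly as below. This is only a toy model, assuming column names are compared as plain strings; it does not use Cassandra's SerializationHeader or TableMetadata classes, and the names HeaderCheckSketch and foreignColumns are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model: column names are plain strings; nothing here is Cassandra API.
final class HeaderCheckSketch {
    /** Returns header columns that are neither in the current schema nor recorded as dropped. */
    static Set<String> foreignColumns(Set<String> headerColumns,
                                      Set<String> schemaColumns,
                                      Set<String> droppedColumns) {
        Set<String> foreign = new TreeSet<>(headerColumns); // sorted copy for stable output
        foreign.removeAll(schemaColumns);
        foreign.removeAll(droppedColumns);
        return foreign;
    }

    public static void main(String[] args) {
        // Hypothetical column names, for illustration only.
        Set<String> header = Set.of("id", "name", "old_flag", "other_table_col");
        Set<String> schema = Set.of("id", "name");
        Set<String> dropped = Set.of("old_flag");
        // A non-empty result would flag the SSTable header as contaminated.
        System.out.println(foreignColumns(header, schema, dropped));
    }
}
```

In this toy example only "other_table_col" is reported, because "old_flag" is accounted for as a dropped column. A real check would also have to compare column types and kinds, which this sketch ignores.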

I do not want to cause scope creep here, but there is also CASSANDRA-21000, which keeps deleted columns in the header forever. I do not see any issue with fixing it (details are in the ticket), but I cannot say with 100% certainty that the fix will have no side effects we did not account for. Still, if we are changing the logic or making fixes around SerializationHeader, I think it would be worth considering that ticket as well, or at least keeping it in mind.

> How the FastBuilder bug (CASSANDRA-21216) causes this
>
> Building on Andrés' analysis of the FastBuilder state leakage, we traced a
> path from the stale savedBuffer/savedNextKey all the way to on-disk SSTable
> header contamination:
>
> 1. A schema disagreement (e.g. during column addition) causes an internode
> READ_REQ deserialization to fail on a replica.
> Columns.Serializer.deserialize() uses a thread-local pooled FastBuilder, and
> if the table has more than 31 columns, the overflow populates savedBuffer and
> savedNextKey before the exception. Since reset() does not clear these fields,
> the FastBuilder is returned to the pool with stale ColumnMetadata from the
> source table.
>
> 2. When a deletion-only mutation (partition delete or range tombstone) for a
> different table is later deserialized on the same thread,
> Columns.Serializer.deserialize() acquires the poisoned FastBuilder. The stale
> ColumnMetadata from the source table are drained into the victim table's
> Columns via propagateOverflow(). Because the mutation contains only a
> deletion — no rows, no static row — no per-row column-subset deserialization
> occurs, so the contaminated Columns survives without error. (Mutations with
> actual row data would fail due to subset encoding mismatches, which is why
> only deletion-only mutations propagate the contamination silently.)
>
> When the contaminated PartitionUpdate is applied to the memtable,
> ColumnsCollector.update() records the foreign ColumnMetadata.
> At flush,
> BigTableWriter.openFinal() writes the SSTable using the in-memory
> SerializationHeader directly, bypassing toHeader() validation. The result is
> an on-disk SSTable whose header contains columns from the wrong table.
>
> This also affects small messages on the Netty event loop
>
> Andrés, your investigation focused on wide tables where messages exceed the
> ~64KB large-message threshold and are deserialized on SEPWorker threads. We
> found that the same contamination also occurs with small messages
> deserialized on the Netty event loop.
>
> For messages under 64KB, processSmallMessage() deserializes the payload
> inline on the event loop thread, which has its own
> TinyThreadLocalPool<FastBuilder>. Since Netty binds each channel to a single
> EventLoop, messages from the same peer are handled by the same thread —
> making thread reuse virtually guaranteed rather than probabilistic.
>
> This lowers the trigger threshold significantly: the source table only needs
> more than 31 columns (for FastBuilder overflow) rather than the ~4200 needed
> to exceed the large-message threshold. In our case, a 150-column table was
> the contamination source. The 29 foreign columns we observed are consistent
> with the 31 + 1 items retained in savedBuffer/savedNextKey, minus a few
> consumed as internal BTree node keys during build().
>
> Summary
>
> We strongly support the proposed fix to clear savedBuffer and savedNextKey in
> FastBuilder.reset(). Beyond the ClassCastException that Andrés identified,
> the same bug can cause the silent SSTable header contamination tracked in
> CASSANDRA-21260. We have written JVM dtests reproducing both the
> large-message and small-message contamination paths and are happy to share
> them.
>
> Best regards
> Runtian
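
To make the state leakage described in the quoted analysis easier to picture, here is a minimal sketch of a pooled builder whose reset() leaves its overflow buffer populated. It is a toy model under stated assumptions, not Cassandra's BTree.FastBuilder: the names PooledBuilderSketch, ToyBuilder, LEAF_CAPACITY and simulate are hypothetical, columns are plain strings, and savedNextKey and the BTree node structure are not modelled at all.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Cassandra's BTree.FastBuilder; every name below is hypothetical.
final class PooledBuilderSketch {
    static final int LEAF_CAPACITY = 31; // stands in for the "more than 31 columns" overflow point

    static final class ToyBuilder {
        private final String[] leaf = new String[LEAF_CAPACITY];
        private int count;
        private final List<String> savedBuffer = new ArrayList<>(); // overflow storage
        private final boolean clearOverflowOnReset;

        ToyBuilder(boolean clearOverflowOnReset) {
            this.clearOverflowOnReset = clearOverflowOnReset;
        }

        void add(String column) {
            if (count == LEAF_CAPACITY) {
                // Leaf is full: spill it into the overflow buffer and start a new leaf.
                for (int i = 0; i < count; i++) savedBuffer.add(leaf[i]);
                count = 0;
            }
            leaf[count++] = column;
        }

        List<String> build() {
            List<String> out = new ArrayList<>(savedBuffer); // overflow entries come first
            for (int i = 0; i < count; i++) out.add(leaf[i]);
            return out;
        }

        void reset() {
            count = 0;
            if (clearOverflowOnReset) savedBuffer.clear(); // the modelled fix
        }
    }

    /** Reuses one builder for two "tables": an aborted 40-column read, then a 3-column one. */
    static List<String> simulate(boolean clearOverflowOnReset) {
        ToyBuilder pooled = new ToyBuilder(clearOverflowOnReset);
        for (int i = 0; i < 40; i++) pooled.add("source_col_" + i);
        pooled.reset(); // deserialization "failed": builder returns to the pool without build()
        for (int i = 0; i < 3; i++) pooled.add("victim_col_" + i);
        return pooled.build();
    }

    public static void main(String[] args) {
        System.out.println("reset without clearing overflow: " + simulate(false).size() + " columns");
        System.out.println("reset with clearing overflow:    " + simulate(true).size() + " columns");
    }
}
```

In this toy, the victim build ends up with 34 columns (31 stale source columns plus its own 3) when reset() does not clear the overflow, and with 3 columns when it does. The counts are a property of the toy only; it does not attempt to reproduce the 29 foreign columns reported in the quoted text.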
