Hi Andrés, Isaac,

Thank you for the detailed write-up, Andrés. Your investigation into the FastBuilder.reset() bug was the starting point for our own analysis, which led us to identify an additional impact beyond the ClassCastException.

Isaac: yes, we believe CASSANDRA-21260 and CASSANDRA-21216 are directly related. CASSANDRA-21260 was filed by our team to track the SSTable header contamination we've been seeing. Based on Andrés' findings about the stale savedBuffer/savedNextKey in FastBuilder.reset(), we investigated whether the same bug could explain our corrupted SSTable headers, and we believe it does.

What we observed (CASSANDRA-21260)

We have been seeing corrupted SSTable headers where an SSTable for one table contains column metadata belonging to a completely different table. When we deserialize the on-disk SerializationHeader.Component and compare it against the table's TableMetadata, we find column names that are not part of the table's schema; they belong to another table in the same keyspace. In one case, a table with ~2000 columns had 29 foreign columns from a ~150-column table embedded in its SSTable header.

These corrupted SSTables are otherwise structurally valid: they are accepted into the live set and only detected by explicit header validation we added. The foreign columns do not correspond to dropped columns or any prior schema version of the affected table.

As noted in CASSANDRA-21260, once a corrupted SSTable exists, compaction merges headers blindly, so the contamination propagates to new SSTables indefinitely.

How the FastBuilder bug (CASSANDRA-21216) causes this

Building on Andrés' analysis of the FastBuilder state leakage, we traced a path from the stale savedBuffer/savedNextKey all the way to on-disk SSTable header contamination:

1. A schema disagreement (e.g. during column addition) causes an internode READ_REQ deserialization to fail on a replica. Columns.Serializer.deserialize() uses a thread-local pooled FastBuilder, and if the table has more than 31 columns, the overflow populates savedBuffer and savedNextKey before the exception. Since reset() does not clear these fields, the FastBuilder is returned to the pool with stale ColumnMetadata from the source table.

2. When a deletion-only mutation (partition delete or range tombstone) for a different table is later deserialized on the same thread, Columns.Serializer.deserialize() acquires the poisoned FastBuilder. The stale ColumnMetadata entries from the source table are drained into the victim table's Columns via propagateOverflow(). Because the mutation contains only a deletion (no rows, no static row), no per-row column-subset deserialization occurs, so the contaminated Columns survives without error. (Mutations with actual row data would fail due to subset encoding mismatches, which is why only deletion-only mutations propagate the contamination silently.)

3. When the contaminated PartitionUpdate is applied to the memtable, ColumnsCollector.update() records the foreign ColumnMetadata.

4. At flush, BigTableWriter.openFinal() writes the SSTable using the in-memory SerializationHeader directly, bypassing toHeader() validation.

The result is an on-disk SSTable whose header contains columns from the wrong table.

This also affects small messages on the Netty event loop

Andrés, your investigation focused on wide tables where messages exceed the ~64KB large-message threshold and are deserialized on SEPWorker threads. We found that the same contamination also occurs with small messages deserialized on the Netty event loop.

For messages under 64KB, processSmallMessage() deserializes the payload inline on the event loop thread, which has its own TinyThreadLocalPool<FastBuilder>. Since Netty binds each channel to a single EventLoop, messages from the same peer are handled by the same thread, which makes thread reuse virtually guaranteed rather than probabilistic.

This lowers the trigger threshold significantly: the source table only needs more than 31 columns (for FastBuilder overflow) rather than the ~4200 needed to exceed the large-message threshold. In our case, a 150-column table was the contamination source.

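To make the leak concrete, here is a minimal, self-contained Java sketch of the pattern described above. It is a toy model, not Cassandra's code: ToyBuilder, deserialize() and victimColumns() are hypothetical names, the field names savedBuffer/savedNextKey are borrowed from this thread, and the assumption that the failure path only calls reset() before the builder is reused is ours. It only illustrates how a reused builder whose reset() leaves the overflow fields in place hands foreign entries to the next caller on the same thread.

```java
import java.util.ArrayList;
import java.util.List;

final class StaleBuilderDemo {
    /** Toy stand-in for a pooled builder. NOT Cassandra's BTree.FastBuilder. */
    static final class ToyBuilder {
        static final int LEAF_CAPACITY = 31; // mirrors the "more than 31 columns" threshold
        private final List<String> leaf = new ArrayList<>();
        private List<String> savedBuffer;    // overflow: a full leaf that was set aside
        private String savedNextKey;         // overflow: the key that followed the full leaf
        private final boolean clearOverflowOnReset;

        ToyBuilder(boolean clearOverflowOnReset) {
            this.clearOverflowOnReset = clearOverflowOnReset;
        }

        void add(String column) {
            if (leaf.size() == LEAF_CAPACITY) {
                if (savedBuffer != null)
                    throw new IllegalStateException("toy model supports a single overflow");
                savedBuffer = new ArrayList<>(leaf); // park the full leaf
                savedNextKey = column;               // remember the separator key
                leaf.clear();
                return;
            }
            leaf.add(column);
        }

        /** Success path: drains any overflow, then the current leaf, and clears everything. */
        List<String> build() {
            List<String> out = new ArrayList<>();
            if (savedBuffer != null) {
                out.addAll(savedBuffer);
                out.add(savedNextKey);
            }
            out.addAll(leaf);
            savedBuffer = null;
            savedNextKey = null;
            leaf.clear();
            return out;
        }

        /** Failure path: the buggy variant forgets the overflow fields. */
        void reset() {
            leaf.clear();
            if (clearOverflowOnReset) {
                savedBuffer = null;
                savedNextKey = null;
            }
        }
    }

    /** Adds names to the builder; throws at index failAt (-1 = never) to mimic a schema mismatch. */
    static List<String> deserialize(ToyBuilder builder, List<String> names, int failAt) {
        try {
            for (int i = 0; i < names.size(); i++) {
                if (i == failAt)
                    throw new IllegalStateException("unknown column: " + names.get(i));
                builder.add(names.get(i));
            }
            return builder.build();
        } catch (IllegalStateException e) {
            builder.reset(); // the builder is handed back for reuse on the same thread
            return null;
        }
    }

    /** Runs a failed 40-column deserialization, then a 2-column one on the same builder. */
    static List<String> victimColumns(boolean clearOverflowOnReset) {
        ToyBuilder pooled = new ToyBuilder(clearOverflowOnReset);
        List<String> source = new ArrayList<>();
        for (int i = 0; i < 40; i++)
            source.add("src_" + i);
        deserialize(pooled, source, 35); // fails after the overflow has been populated
        return deserialize(pooled, List.of("victim_a", "victim_b"), -1);
    }

    public static void main(String[] args) {
        System.out.println("buggy reset: " + victimColumns(false).size() + " columns");
        System.out.println("fixed reset: " + victimColumns(true).size() + " columns");
    }
}
```

With the buggy reset, the victim's two columns come back together with the 31 + 1 entries left over from the failed source deserialization; with the overflow fields cleared in reset(), only the victim's own two columns are returned. The toy does not model BTree node layout, so it does not reproduce how many of the retained entries end up as visible columns in a real header.
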
The 29 foreign columns we observed are consistent with the 31 + 1 items retained in savedBuffer/savedNextKey, minus a few consumed as internal BTree node keys during build().

Summary

We strongly support the proposed fix to clear savedBuffer and savedNextKey in FastBuilder.reset(). Beyond the ClassCastException that Andrés identified, the same bug can cause the silent SSTable header contamination tracked in CASSANDRA-21260.

We have written JVM dtests reproducing both the large-message and small-message contamination paths and are happy to share them.

Best regards,
Runtian
