[
https://issues.apache.org/jira/browse/CASSANDRA-21260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrés Beck-Ruiz updated CASSANDRA-21260:
-----------------------------------------
Status: Needs Committer (was: Patch Available)
> SSTable header contains unknown columns from other tables
> ---------------------------------------------------------
>
> Key: CASSANDRA-21260
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21260
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Yuqi Yan
> Assignee: Andrés Beck-Ruiz
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> Slack: [https://the-asf.slack.com/archives/CJZLTM05A/p1762461191355119]
> Multiple reports on this issue (reported by [~gilg]):
> {code:java}
> Hey, I have an issue with a 4.1.5 cluster. For some reason a few tables appear
> to have bad metadata - sstablemetadata reports certain columns which are not
> really there (sstabledump does not show said columns on those sstables), on a
> lot of sstables, most of the table actually. Not sure exactly how it got there
> (we suspect someone loaded sstables of other tables since wrong column names
> match ones in other tables), but right now the situation is some servers work
> fine and serve clients, while other servers, who have gone through restarts,
> are virtually without data for those tables, since on startup there were
> exceptions during sstables loading - "Unknown column xxx during
> deserialization" {code}
> reported by [~tolbertam]:
> {code:java}
> it's something I saw once recently, but I haven't been able to reproduce it.
> I suspect it has something to do with the refactoring around making Schema
> pluggable (https://issues.apache.org/jira/browse/CASSANDRA-17044). When I
> saw Gil's post it reminded me a lot of what I saw
> My suspicion is it's some kind of timing issue with schema changes being
> pushed out, while also being pulled from other instances.
> Like, if you have a bunch of concurrent schema changes on the same keyspaces,
> all submitted to different coordinators at relatively the same time,
> without schema disagreement, that has traditionally caused a lot of
> issues with Cassandra. {code}
> reported by [~curlylrt] / [~yukei]:
> {code:java}
> We were running 4.0.6 when the column was added to a table in another
> keyspace. And now, we are running 4.1.3 and we recently noticed this issue
> because we are seeing error logs when loading sstables during node restart.
> {code}
> Known facts
> * On restart, SSTables with unknown columns in the SSTable header will be
> ignored -> once an SSTable header gets contaminated, that means data loss on
> restart (until we scrub and fix the headers)
> * Because a contaminated SSTable will be ignored after restart, and any other
> node receiving the corrupted SSTable will fail to parse it and throw
> UnknownColumnException, these contaminated SSTables should not spread to
> other nodes
> * During compaction there is no schema check, i.e. once an SSTable has
> unexpected columns in its header, the new SSTable inherits them because the
> headers are merged blindly
> ** This was confirmed in a local test by injecting an SSTable with unexpected
> columns and running compaction
> * All the victim tables / offender tables share the same primary key (in
> our case)
> * Seems related to schema change
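
To illustrate the compaction behaviour listed under "Known facts", here is a toy model in Python. It is not Cassandra code and not the proposed patch: the function names, the representation of a header as a plain set of column names, and the example column names are all assumptions made for illustration. It only shows why a blind union of input headers carries an unknown column into the output, and what a check against the table schema might look like.

```python
from typing import Iterable, Set


def merge_headers_blindly(headers: Iterable[Set[str]]) -> Set[str]:
    """Union the column names of all input headers, with no schema check.

    Hypothetical stand-in for the 'merge blindly' behaviour described above.
    """
    merged: Set[str] = set()
    for header in headers:
        merged |= header
    return merged


def unknown_columns(header: Set[str], schema_columns: Set[str]) -> Set[str]:
    """Return the header columns that the table schema does not define."""
    return header - schema_columns


def merge_headers_checked(headers: Iterable[Set[str]],
                          schema_columns: Set[str]) -> Set[str]:
    """Same union, but refuse to produce a header with unknown columns.

    Purely illustrative; the real fix may take a different approach.
    """
    merged = merge_headers_blindly(headers)
    unknown = unknown_columns(merged, schema_columns)
    if unknown:
        raise ValueError(f"header has columns not in schema: {sorted(unknown)}")
    return merged


# Hypothetical table schema and two input SSTable headers.
schema = {"id", "name"}
clean_header = {"id", "name"}
contaminated_header = {"id", "name", "other_table_col"}

# The blind merge keeps the foreign column in the compacted output.
compacted = merge_headers_blindly([clean_header, contaminated_header])
leaked = unknown_columns(compacted, schema)
```

In this model `leaked` contains `other_table_col`, which mirrors the local test above: one contaminated input is enough to contaminate the compaction output, whereas `merge_headers_checked` would raise instead.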