[ 
https://issues.apache.org/jira/browse/CASSANDRA-21260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrés Beck-Ruiz updated CASSANDRA-21260:
-----------------------------------------
    Status: Needs Committer  (was: Patch Available)

> SSTable header contains unknown columns from other tables
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-21260
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21260
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Yuqi Yan
>            Assignee: Andrés Beck-Ruiz
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> Slack: [https://the-asf.slack.com/archives/CJZLTM05A/p1762461191355119]
> Multiple reports on this issue (reported by [~gilg]):
> {code:java}
> Hey, I have an issue with a 4.1.5 cluster. For some reason a few tables appear 
> to have bad metadata - sstablemetadata reports certain columns which are not 
> really there (sstabledump does not show said columns on those sstables), on a 
> lot of sstables, most of the table actually. Not sure exactly how it got there 
> (we suspect someone loaded sstables of other tables since wrong column names 
> match ones in other tables), but right now situation is some servers work 
> fine and serve clients, while other servers, who have gone through restarts, 
> are virtually without data for those tables, since on startup there were 
> exceptions during sstables loading - "Unknown column xxx during 
> deserialization" {code}
> reported by [~tolbertam] :
> {code:java}
> it's something I saw once recently, but I haven't been able to reproduce it.  
> I suspect it has something to do with the refactoring around making Schema 
> pluggable (https://issues.apache.org/jira/browse/CASSANDRA-17044).   When I 
> saw Gil's post it reminded me a lot of what I saw
> My suspicion is it's some kind of timing issue with schema changes being 
> pushed out, while also being pulled from other instances.
> Like, if you have a bunch of concurrent schema changes on the same keyspaces, 
> all submitted to different coordinators at relatively the same 
> time, without schema disagreement, that has traditionally caused a lot of 
> issues with Cassandra. {code}
> reported by [~curlylrt] / [~yukei] 
> {code:java}
> We were running 4.0.6 when the column was added to a table in another 
> keyspace. And now, we are running 4.1.3 and we recently noticed this issue 
> because we are seeing error logs when loading sstables during node restart. 
> {code}
> Known facts
>  * On restart, SSTables whose header contains unknown columns are ignored. 
> Once an SSTable header is contaminated, its data is effectively lost on 
> restart (until we scrub and fix the headers).
>  * A contaminated SSTable is ignored after restart, and any other node that 
> receives it cannot parse it and throws UnknownColumnException, so these 
> contaminated SSTables should not spread to other nodes.
>  * Compaction performs no schema check: once an SSTable has unexpected 
> columns in its header, the new SSTable inherits them because the headers 
> are merged blindly.
>  ** This was confirmed in a local test by injecting an SSTable with 
> unexpected columns and running compaction.
>  * All the victim tables / offender tables share the same primary key (in 
> our case).
>  * The issue seems related to schema changes.
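
The compaction behaviour listed under the known facts can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not Cassandra code, and `SSTableHeader`, `merge_headers_blindly`, `unknown_columns`, `merge_headers_checked`, and the column names are hypothetical names invented for illustration. It shows how taking the union of header columns with no schema check carries a foreign column into the compacted SSTable, and how a check against the table's schema would catch it. Dropped columns and column types are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableHeader:
    """Toy stand-in for the column list recorded in an SSTable header."""
    table: str
    columns: frozenset


def merge_headers_blindly(headers):
    """Union the header columns of all input SSTables, with no schema check."""
    merged = frozenset().union(*(h.columns for h in headers))
    return SSTableHeader(headers[0].table, merged)


def unknown_columns(header, schema_columns):
    """Return columns present in the header but absent from the table schema."""
    return header.columns - frozenset(schema_columns)


def merge_headers_checked(headers, schema_columns):
    """Hypothetical guard: keep only columns the current schema knows about."""
    merged = merge_headers_blindly(headers)
    kept = merged.columns & frozenset(schema_columns)
    return SSTableHeader(merged.table, kept)


# Hypothetical table schema and two input SSTables; the second header
# carries "order_total", a column that belongs to some other table.
schema = {"id", "name", "email"}
clean = SSTableHeader("users", frozenset({"id", "name"}))
contaminated = SSTableHeader("users", frozenset({"id", "email", "order_total"}))

blind = merge_headers_blindly([clean, contaminated])
checked = merge_headers_checked([clean, contaminated], schema)

print(sorted(unknown_columns(blind, schema)))    # ['order_total']
print(sorted(unknown_columns(checked, schema)))  # []
```

In this model a single contaminated input is enough to contaminate the output of every later compaction that includes it, which matches the observation that most SSTables of an affected table end up with the bad header.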



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
