[ https://issues.apache.org/jira/browse/CASSANDRA-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703045#comment-16703045 ]
Sam Tunnicliffe commented on CASSANDRA-14843: --------------------------------------------- +1 - there's an additional/unexpected dtest failure in {{putget_test.py::TestPutGet::test_wide_slice}}, but I've observed this failing on 3.0 yesterday, so I'm planning to investigate that separately. > Drop/add column name with different Kind can result in corruption > ----------------------------------------------------------------- > > Key: CASSANDRA-14843 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14843 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths > Reporter: Benedict > Assignee: Benedict > Priority: Major > Fix For: 3.0.18 > > > While we have always imposed that the type of any column name remains > consistent, we have not done the same for its kind. If a column’s kind is > changed via Drop/Add, the SerializationHeader’s column sets will be > interpreted incorrectly, and may either lead to CorruptSSTableException or > silent data corruption. > The problem occurs in SerializationHeader.Component.toHeader(). In this > method, we lookup columns from the metadata only by name, ignoring the static > status. If a column of the same name but different kind to our dropped > column exists in the schema, we will only add this column to the list of > fields we expect. So we will be missing a column from our set of expected > static columns. We will also add the regular variant to the set of regular > columns present, which it may not be. > There are a number of ways this can play out: > 1) There are no other static columns in the affected sstables. In this case > we will incorrectly treat the table as containing no static data, so we will > not attempt to read any static row. This leaves a static row to be read as > the first regular row, and will throw the exception in the description. > 1a) If for some reason the static row is absent - say, it has expired - then > depending on lexicographical ordering, and if the sstable did not contain any > instance of the regular variant, we may misattribute column data. This will > probably result in corruption exceptions, but if the columns were of the same > type it could be undetectable. > 2) There are other static columns on the table, and in the affected sstables. > In this case, the row will be correctly treated as a static row, but > depending on the dropped column’s lexicographical position relative to these, > it may result in column data being misattributed to other columns. This > *could* be undetectable corruption, if the column types were the same. > 3) Either of these above scenario could be replayed with the static/regular > relations replaced. > CASSANDRA-14591 would protect against 1a and 2. > Probably the safest and correct fix is to prevent adding a column back with a > different kind to the original, however this would require storing more > metadata about dropped columns. This is probably viable for 4.0, but for 3.0 > perhaps too invasive. > It is quite easy to make this particular method safe against this particular > corruption. However it is hard to be certain there aren’t other places where > assumptions were made about a name being of only one kind. There are only a > handful of places that *should* need to be of concern, specifically those > places that invoke CFMetaData.getDroppedColumn, so we should be fairly > confident that - if these call sites are safe - we are now insulated. This > might suffice for 3.0. Thoughts? -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org