[ https://issues.apache.org/jira/browse/CASSANDRA-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582286#comment-16582286 ]
Benedict edited comment on CASSANDRA-14568 at 8/16/18 9:48 AM:
---------------------------------------------------------------

OK. So, [here|https://github.com/belliottsmith/cassandra/commits/14568-2] is my second attempt at fixing this. While adding improved assertion logic, I realised we might have another bug around dropped static collection columns, which could have resulted in decoding a static collection deletion as a whole-static-row deletion (with unknown semantics, since I vaguely recall that our correctness in some places depends on there being no such deletions). In essence: if, on looking up the collectionNameBytes, we found no collectionName (for instance, because the column had been dropped), we would be left with only a complete static row bound to construct.

Perhaps I should split this fix into a separate ticket, for a separate CHANGES.txt mention?
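The dropped-column hazard can be illustrated with a small model. This is a hypothetical sketch, not Cassandra's LegacyLayout code: `lookup_collection_name`, `decode_static_collection_deletion`, and the two result classes are invented for illustration, and the schema is modelled as a plain set of column names.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union


@dataclass(frozen=True)
class StaticRowDeletion:
    """Hypothetical: a deletion covering the entire static row."""


@dataclass(frozen=True)
class StaticCollectionDeletion:
    """Hypothetical: a deletion of a single static collection column."""
    column: str


def lookup_collection_name(collection_name_bytes: bytes, schema_columns: Set[str]) -> Optional[str]:
    """Resolve legacy column-name bytes against the current schema.

    A dropped column is no longer in the schema, so the lookup yields None.
    """
    name = collection_name_bytes.decode("utf-8")
    return name if name in schema_columns else None


def decode_static_collection_deletion(
    collection_name_bytes: bytes, schema_columns: Set[str]
) -> Union[StaticRowDeletion, StaticCollectionDeletion]:
    collection_name = lookup_collection_name(collection_name_bytes, schema_columns)
    if collection_name is None:
        # Only a complete static row bound is left to construct, so the
        # collection deletion is silently widened to the whole static row.
        return StaticRowDeletion()
    return StaticCollectionDeletion(collection_name)


print(decode_static_collection_deletion(b"items", {"items", "tags"}))
# StaticCollectionDeletion(column='items')
print(decode_static_collection_deletion(b"items", {"tags"}))  # "items" was dropped
# StaticRowDeletion()
```

A guard that refuses to produce `StaticRowDeletion` from this path would surface the problem; how the linked branch actually handles the dropped column is not modelled here.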
We clearly need to introduce some upgrade dtests to cover these cases as well.

> Static collection deletions are corrupted in 3.0 -> 2.{1,2} messages
> --------------------------------------------------------------------
>
> Key: CASSANDRA-14568
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14568
> Project: Cassandra
> Issue Type: Bug
> Reporter: Benedict
> Assignee: Benedict
> Priority: Critical
> Fix For: 3.0.17, 3.11.3
>
>
> In 2.1 and 2.2, row and complex deletions were represented as range
> tombstones. LegacyLayout is our compatibility layer that translates the
> relevant RT patterns in 2.1/2.2 to row/complex deletions in 3.0, and vice
> versa. Unfortunately, it does not handle the special case of static row
> deletions; they are treated as regular row deletions. Since static rows are
> themselves never directly deleted, the only issue is with collection
> deletions.
> Collection deletions in 2.1/2.2 were encoded as a range tombstone consisting
> of a sequence of the clustering keys’ data for the affected row, followed by
> the bytes representing the name of the collection column. STATIC_CLUSTERING
> contains zero clusterings, so when the deletion is treated as one for a
> regular row, zero clusterings are written before the column name of the
> erased collection, and the column name ends up at position zero, where the
> first clustering key would normally be.
> This can exhibit itself in at least two ways:
> # If the type of your first clustering key is a variable-width type, new
> deletes will begin appearing that cover the clustering key represented by
> the column name.
> ** If you have multiple clustering keys, you will receive an RT covering all
> rows with a matching first clustering key.
> ** This RT will be valid as far as the system is concerned, and will go
> undetected unless there are outside data quality checks in place.
> # Otherwise, data of an invalid size will be written to the clustering and
> sent over the network to the 2.1 node.
> ** The 2.1/2.2 node will handle this just fine, even though the record is
> junk. Since it is a deletion covering impossible data, there will be no
> user-API-visible effect. But if it receives this as a write from a 3.0 node,
> it will dutifully persist the junk record.
> ** The 3.0 node that originally sent this junk may later coordinate a read
> of the partition, notice a digest mismatch, then read-repair and serialize
> the junk to disk.
> ** The sstable containing this record is now corrupt: deserialization
> expects fixed-width data but encounters too many (or too few) bytes, and is
> then at the wrong position to read its structural information.
> ** (Alternatively, when the 2.1 node is upgraded, this will occur on eventual
> compaction.)
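The position-zero problem can be sketched with a deliberately simplified model, assuming a legacy bound is just a list of byte-string components (the real 2.1/2.2 composite serialization carries additional framing that is omitted here). The helper names, the column name `items`, and the restriction to `text` and `int` clustering types are all hypothetical choices for illustration.

```python
import struct
from typing import List, Sequence, Tuple, Union

# Hypothetical, simplified model: a legacy RT bound is a list of byte-string
# components. Real 2.1/2.2 composites carry extra framing omitted here.

STATIC_CLUSTERING: Tuple[bytes, ...] = ()  # static rows have no clustering values


def legacy_collection_deletion_bound(clustering: Sequence[bytes], column_name: str) -> List[bytes]:
    """Clustering values of the row, followed by the collection column name.

    Treating a static deletion like a regular row deletion means the
    clustering is empty, so the column name lands at position zero.
    """
    return list(clustering) + [column_name.encode("utf-8")]


def read_first_clustering(bound: List[bytes], first_clustering_type: str) -> Tuple[str, Union[str, int]]:
    """How a 2.1/2.2 node would interpret component 0 of the bound."""
    value = bound[0]
    if first_clustering_type == "text":
        # Variable width: any UTF-8 bytes form a plausible clustering value.
        return ("clustering", value.decode("utf-8"))
    if first_clustering_type == "int":
        # Fixed width: an int clustering value must be exactly 4 bytes.
        if len(value) != 4:
            return ("invalid_size", len(value))
        return ("clustering", struct.unpack(">i", value)[0])
    raise ValueError(f"unsupported clustering type: {first_clustering_type}")


# Regular row with int clustering 7: the column name follows the clustering.
regular = legacy_collection_deletion_bound([struct.pack(">i", 7)], "items")
print(read_first_clustering(regular, "int"))  # ('clustering', 7)

# Static row: the column name is read as the first clustering value.
static = legacy_collection_deletion_bound(STATIC_CLUSTERING, "items")
print(read_first_clustering(static, "text"))  # ('clustering', 'items')
print(read_first_clustering(static, "int"))   # ('invalid_size', 5)
```

The `text` case mirrors the first failure mode (a well-formed RT over the clustering key "items"); the `int` case mirrors the second (5 bytes where a fixed-width 4-byte value is expected).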