[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZhaoYang updated CASSANDRA-15861:
---------------------------------
     Bug Category: Parent values: Correctness(12982)Level 1 values: Unrecoverable Corruption / Loss(13161)
       Complexity: Normal
    Discovered By: Unit Test
         Severity: Normal
           Status: Open  (was: Triage Needed)

> Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15861
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair, Consistency/Streaming
>            Reporter: ZhaoYang
>            Assignee: ZhaoYang
>            Priority: Normal
>             Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
>
> The test runs "nodetool repair" on node1 and kills node2 during the repair. At the end, node3 reports a checksum validation failure on an sstable transferred from node1.
>
> {code:java|title=what happened}
> 1. When repair starts on node1, it performs anti-compaction, which sets the sstable's repairAt to 0 and its pending repair id to the session-id.
> 2. Node1 then creates a {{ComponentManifest}} containing the lengths of the files to be transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts broadcasting a repair-failure-message to all participants in {{CoordinatorSession#fail}}.
> 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}}, which triggers an async background compaction.
> 5. Node1's background compaction mutates the sstable's repairAt to 0 and its pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no longer an in-progress repair.
> 6. Node1 actually sends the sstable to node3, and the size of the sstable's STATS component now differs from the original size recorded in the manifest.
> 7. At the end, node3 reports a checksum validation failure when it tries to mutate the sstable level and the "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}.
> {code}
>
> I believe a similar race may happen with leveled compaction, which may directly mutate an sstable's level if the sstable doesn't overlap with sstables at the next level.
>
> (Note: this isn't a problem in legacy streaming, as the STATS file length didn't matter.)
>
> Ideally, it would be great to make sstable STATS metadata immutable, just like other sstable components, so we don't have to worry about this special case. For now, I suggest using a {{StatsMetadata}} snapshot when initializing {{CassandraOutgoingFile}} instead of relying on the mutable on-disk STATS file.
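
The race in steps 2 to 6, and the effect of the proposed snapshot, can be illustrated with a minimal standalone sketch. It does not use Cassandra's classes: {{PlannedComponent}}, {{planFromDisk}}, {{planFromSnapshot}} and the file contents are hypothetical stand-ins, and the receiver's check is reduced to a simple length comparison against the planned length. It only shows why freezing the STATS bytes at planning time keeps the sent bytes consistent with the recorded length.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StatsSnapshotSketch {

    /** Hypothetical stand-in for one manifest entry: the length recorded at planning time. */
    static final class PlannedComponent {
        final long length;     // length promised to the receiver
        final byte[] snapshot; // frozen bytes, or null if the file is re-read at send time

        PlannedComponent(long length, byte[] snapshot) {
            this.length = length;
            this.snapshot = snapshot;
        }
    }

    /** Racy variant: records only the length and re-reads the mutable file later. */
    static PlannedComponent planFromDisk(Path stats) throws IOException {
        return new PlannedComponent(Files.size(stats), null);
    }

    /** Snapshot variant: copies the bytes into memory when the plan is created. */
    static PlannedComponent planFromSnapshot(Path stats) throws IOException {
        byte[] bytes = Files.readAllBytes(stats);
        return new PlannedComponent(bytes.length, bytes);
    }

    /** Bytes that would actually go on the wire at send time. */
    static byte[] bytesToSend(PlannedComponent planned, Path stats) throws IOException {
        return planned.snapshot != null ? planned.snapshot.clone() : Files.readAllBytes(stats);
    }

    /** Simplified receiver check: the received length must match the planned length. */
    static boolean receiverAccepts(PlannedComponent planned, byte[] received) {
        return received.length == planned.length;
    }

    /** Plan, then mutate the file (as background compaction would), then send. */
    static boolean simulate(boolean useSnapshot) {
        try {
            Path stats = Files.createTempFile("sketch-Statistics", ".db");
            try {
                // State after anti-compaction (step 1); the content format is made up.
                Files.write(stats, "repairedAt=0;pendingRepair=session-id".getBytes(StandardCharsets.UTF_8));
                PlannedComponent planned = useSnapshot ? planFromSnapshot(stats) : planFromDisk(stats);
                // Background compaction rewrites the metadata before the send (step 5).
                Files.write(stats, "repairedAt=0;pendingRepair=null".getBytes(StandardCharsets.UTF_8));
                return receiverAccepts(planned, bytesToSend(planned, stats));
            } finally {
                Files.deleteIfExists(stats);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("on-disk plan accepted by receiver:  " + simulate(false));
        System.out.println("snapshot plan accepted by receiver: " + simulate(true));
    }
}
```

In this sketch the on-disk plan is rejected because the file shrank between planning and sending, while the snapshot plan is accepted because the bytes sent are the ones that were measured. The trade-off is the memory held for the copy, which should be small for a metadata component.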