[ https://issues.apache.org/jira/browse/CASSANDRA-14400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449385#comment-16449385 ]
Kurt Greaves commented on CASSANDRA-14400:
------------------------------------------

Interesting... I can reproduce this, and after a bit of research I understand it better now.

bq. it might stay in pending until a compaction has run
bq. Nope, it's exclusively compaction. They were probably just compacted away on startup.

This is only partially true. The SSTable will actually switch from pending to repaired as soon as {{org.apache.cassandra.db.compaction.CompactionStrategyManager#getNextBackgroundTask}} is called, because pending repairs are the first thing it checks for. It doesn't require a "compaction" in the traditional sense; it just updates the metadata (which is why the generation number doesn't change).

A side effect of restarting is that we enable compaction on all keyspaces during startup and subsequently call {{getNextBackgroundTask()}}, which finds the SSTable pending repair and marks it as repaired.

So the caveat is that it will stay in pending until we _attempt_ to trigger a compaction, not necessarily until a compaction has run or the specific SSTable has been included in one. This seems perfectly fine to me; I'm just documenting findings here in case someone else gets a little confused like I did.

The same behaviour as shown in the {{sstablemetadata}} output below can be achieved with {{nodetool disableautocompaction; nodetool enableautocompaction}} rather than stopping/starting the node.
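To make the ordering concrete, here is a toy model of the behaviour described above. It is a sketch only, not Cassandra's code: {{SSTableInfo}}, {{Manager}} and the {{finalizedSessions}} map are hypothetical names invented for illustration, and only the method name {{getNextBackgroundTask}} is borrowed from the real class. It assumes a pending SSTable is promoted once its repair session is known to have finished, and it leaves out ordinary compaction selection entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Toy model only: none of these types are Cassandra's real classes.
final class PendingRepairSketch {

    /** Hypothetical stand-in for the repair fields that sstablemetadata prints. */
    static final class SSTableInfo {
        final int generation;   // e.g. the 39 in na-39-big
        long repairedAt;        // 0 means unrepaired
        UUID pendingRepair;     // null means no pending repair session

        SSTableInfo(int generation, UUID pendingRepair) {
            this.generation = generation;
            this.repairedAt = 0L;
            this.pendingRepair = pendingRepair;
        }
    }

    /** Hypothetical stand-in for the strategy manager's task selection. */
    static final class Manager {
        final List<SSTableInfo> sstables = new ArrayList<>();
        // Assumption: finished repair sessions mapped to the repairedAt value to stamp.
        final Map<UUID, Long> finalizedSessions = new HashMap<>();

        /** Returns a description of the chosen task, or null if there is nothing to do. */
        String getNextBackgroundTask() {
            // Pending repairs are checked before any ordinary compaction candidate.
            for (SSTableInfo s : sstables) {
                Long repairedAt = s.pendingRepair == null ? null : finalizedSessions.get(s.pendingRepair);
                if (repairedAt != null) {
                    // Metadata-only change: no data is rewritten, so the generation is untouched.
                    s.repairedAt = repairedAt;
                    s.pendingRepair = null;
                    return "promote na-" + s.generation + " to repaired";
                }
            }
            return null; // ordinary compaction selection is omitted from this sketch
        }
    }

    public static void main(String[] args) {
        UUID session = UUID.fromString("825565d0-4784-11e8-b1b1-8f56691c789f");
        SSTableInfo sstable = new SSTableInfo(39, session);

        Manager manager = new Manager();
        manager.sstables.add(sstable);
        manager.finalizedSessions.put(session, 1524549508277L);

        // Until something asks for a background task, the SSTable stays pending.
        System.out.println("before: repairedAt=" + sstable.repairedAt + " pending=" + sstable.pendingRepair);

        // Per the comment above, a restart or re-enabling autocompaction ends up making this call.
        String task = manager.getNextBackgroundTask();
        System.out.println("task: " + task);
        System.out.println("after: repairedAt=" + sstable.repairedAt + " pending=" + sstable.pendingRepair
                + " generation=" + sstable.generation);
    }
}
```

The point of the sketch is that the state change depends on the call being made, not on the SSTable taking part in a compaction, which matches the unchanged generation number in the real output.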
After repair:
{code:java}
CASSANDRA_INCLUDE=~/.ccm/kgreav-3nodes/node1/bin/cassandra.in.sh ~/werk/cstar/kgreav-cassandra/tools/bin/sstablemetadata na-39-big-Data.db
SSTable: /home/kurt/.ccm/kgreav-3nodes/node1/data0/aoeu/aoeu-c2c45b00439011e8bfc8737d74e3e5df/na-39-big
First token: -8223339496150845696 (derphead5731287)
Last token: -8023360031800191250 (derphead3351464)
Repaired at: 0
Pending repair: 825565d0-4784-11e8-b1b1-8f56691c789f

ccm node1 stop

CASSANDRA_INCLUDE=~/.ccm/kgreav-3nodes/node1/bin/cassandra.in.sh ~/werk/cstar/kgreav-cassandra/tools/bin/sstablemetadata na-39-big-Data.db
SSTable: /home/kurt/.ccm/kgreav-3nodes/node1/data0/aoeu/aoeu-c2c45b00439011e8bfc8737d74e3e5df/na-39-big
First token: -8223339496150845696 (derphead5731287)
Last token: -8023360031800191250 (derphead3351464)
SSTable Level: 0
Repaired at: 0
Pending repair: 825565d0-4784-11e8-b1b1-8f56691c789f

ccm node1 start

CASSANDRA_INCLUDE=~/.ccm/kgreav-3nodes/node1/bin/cassandra.in.sh ~/werk/cstar/kgreav-cassandra/tools/bin/sstablemetadata na-39-big-Data.db
SSTable: /home/kurt/.ccm/kgreav-3nodes/node1/data0/aoeu/aoeu-c2c45b00439011e8bfc8737d74e3e5df/na-39-big
First token: -8223339496150845696 (derphead5731287)
Last token: -8023360031800191250 (derphead3351464)
Repaired at: 1524549508277 (04/24/2018 05:58:28)
Pending repair: --
{code}

> Subrange repair doesn't always mark as repaired
> -----------------------------------------------
>
>                 Key: CASSANDRA-14400
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14400
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Kurt Greaves
>            Priority: Major
>
> So was just messing around with subrange repair on trunk and found that if I
> generated an SSTable with a single token and then tried to repair that
> SSTable using subrange repairs it wouldn't get marked as repaired.
>
> Before repair:
> {code:java}
> First token: -9223362383595311662 (derphead4471291)
> Last token: -9223362383595311662 (derphead4471291)
> Repaired at: 0
> Pending repair: 862395e0-4394-11e8-8f20-3b8ee110d005
> {code}
> Repair command:
> {code}
> ccm node1 nodetool "repair -st -9223362383595311663 -et -9223362383595311661 aoeu"
> [2018-04-19 05:44:42,806] Starting repair command #7 (c23f76c0-4394-11e8-8f20-3b8ee110d005), repairing keyspace aoeu with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], previewKind: NONE, # of ranges: 1, pull repair: false, force repair: false, optimise streams: false)
> [2018-04-19 05:44:42,843] Repair session c242d220-4394-11e8-8f20-3b8ee110d005 for range [(-9223362383595311663,-9223362383595311661]] finished (progress: 20%)
> [2018-04-19 05:44:43,139] Repair completed successfully
> [2018-04-19 05:44:43,140] Repair command #7 finished in 0 seconds
> {code}
> After repair SSTable hasn't changed and sstablemetadata outputs:
> {code}
> First token: -9223362383595311662 (derphead4471291)
> Last token: -9223362383595311662 (derphead4471291)
> Repaired at: 0
> Pending repair: 862395e0-4394-11e8-8f20-3b8ee110d005
> {code}
> And parent_repair_history states that the repair is complete/range was
> successful:
> {code}
> select * from system_distributed.parent_repair_history where parent_id=862395e0-4394-11e8-8f20-3b8ee110d005 ;
>  parent_id | columnfamily_names | exception_message | exception_stacktrace | finished_at | keyspace_name | options | requested_ranges | started_at | successful_ranges
> --------------------------------------+--------------------+-------------------+----------------------+---------------------------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+---------------------------------+-------------------------------------------------
>  862395e0-4394-11e8-8f20-3b8ee110d005 | {'aoeu'} | null | null | 2018-04-19 05:43:14.578000+0000 | aoeu | {'dataCenters': '', 'forceRepair': 'false', 'hosts': '', 'incremental': 'true', 'jobThreads': '1', 'optimiseStreams': 'false', 'parallelism': 'parallel', 'previewKind': 'NONE', 'primaryRange': 'false', 'pullRepair': 'false', 'sub_range_repair': 'true', 'trace': 'false'} | {'(-9223362383595311663,-9223362383595311661]'} | 2018-04-19 05:43:01.952000+0000 | {'(-9223362383595311663,-9223362383595311661]'}
> {code}
> Subrange repairs seem to work fine over large ranges and set {{Repaired at}}
> as expected, but I haven't figured out why it works for a large range versus
> a small range so far.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)