[jira] [Commented] (CASSANDRA-14400) Subrange repair doesn't always mark as repaired

Kurt Greaves (JIRA) Mon, 23 Apr 2018 23:40:39 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-14400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449385#comment-16449385 ]


Kurt Greaves commented on CASSANDRA-14400:
------------------------------------------

Interesting... I can reproduce this, and after a bit of research I understand it
better now.

bq. it might stay in pending until a compaction has run
bq. Nope, it's exclusively compaction. They were probably just compacted away on startup.

This is only partially true. The SSTable actually switches from pending to
repaired as soon as
{{org.apache.cassandra.db.compaction.CompactionStrategyManager#getNextBackgroundTask}}
is called, because pending repairs are the first thing that method checks for.
This doesn't require a "compaction" in the traditional sense; it just updates
the SSTable's metadata (which is why the generation number doesn't change).
Restarting has the side effect of enabling compaction on all keyspaces during
startup and subsequently calling {{getNextBackgroundTask()}}, which finds the
SSTable pending repair and marks it as repaired.

So the caveat is that it will stay pending until we _attempt_ to trigger a
compaction, not necessarily until a compaction has run or the specific SSTable
is included in one.
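A deliberately simplified Java model of the ordering described above may make it easier to follow. It is a sketch, not Cassandra's code: {{SimpleSSTable}}, {{SimpleStrategyManager}} and the {{completedSessions}} set are hypothetical, and the model assumes (as with Cassandra's incremental repair) that an SSTable is only promoted once its repair session has finished. Promotion only rewrites fields in place, mirroring the point that the generation number doesn't change.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for an SSTable's repair metadata (not Cassandra's class).
final class SimpleSSTable {
    final int generation;   // never changes: promotion is metadata-only
    long repairedAt = 0L;   // 0 means "not repaired"
    UUID pendingRepair;     // null means "not pending"

    SimpleSSTable(int generation, UUID pendingRepair) {
        this.generation = generation;
        this.pendingRepair = pendingRepair;
    }
}

// Hypothetical model of the check order described in the comment.
final class SimpleStrategyManager {
    final List<SimpleSSTable> sstables = new ArrayList<>();
    // Assumption: sessions that have finished, so their SSTables may be promoted.
    final Set<UUID> completedSessions = new HashSet<>();

    // Modelled loosely on getNextBackgroundTask(): pending repairs are checked
    // first, and promoting one just updates metadata instead of compacting.
    // In this model at most one SSTable is promoted per call.
    boolean getNextBackgroundTask(long sessionRepairedAt) {
        for (SimpleSSTable s : sstables) {
            if (s.pendingRepair != null && completedSessions.contains(s.pendingRepair)) {
                s.repairedAt = sessionRepairedAt;
                s.pendingRepair = null;
                return true;
            }
        }
        return false; // a real manager would go on to look for a normal compaction
    }

    // Enabling compaction (at startup, or via enableautocompaction) is what
    // triggers the check; until then the SSTable simply stays pending.
    boolean enableAutoCompaction(long sessionRepairedAt) {
        return getNextBackgroundTask(sessionRepairedAt);
    }
}
{code}

With a session in {{completedSessions}} and an SSTable pending on it, {{pendingRepair}} stays set until {{enableAutoCompaction}} (or a direct {{getNextBackgroundTask}} call) runs; afterwards {{pendingRepair}} is null, {{repairedAt}} is set and {{generation}} is unchanged.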

This seems perfectly fine to me; I'm just documenting my findings here in case
someone else gets confused like I did.

The same behaviour shown below can also be reproduced with
{{nodetool disableautocompaction; nodetool enableautocompaction}} instead of
stopping and starting the node.

After repair:
{code}
CASSANDRA_INCLUDE=~/.ccm/kgreav-3nodes/node1/bin/cassandra.in.sh \
  ~/werk/cstar/kgreav-cassandra/tools/bin/sstablemetadata na-39-big-Data.db
SSTable: /home/kurt/.ccm/kgreav-3nodes/node1/data0/aoeu/aoeu-c2c45b00439011e8bfc8737d74e3e5df/na-39-big
First token: -8223339496150845696 (derphead5731287)
Last token: -8023360031800191250 (derphead3351464)
Repaired at: 0
Pending repair: 825565d0-4784-11e8-b1b1-8f56691c789f

ccm node1 stop

CASSANDRA_INCLUDE=~/.ccm/kgreav-3nodes/node1/bin/cassandra.in.sh \
  ~/werk/cstar/kgreav-cassandra/tools/bin/sstablemetadata na-39-big-Data.db
SSTable: /home/kurt/.ccm/kgreav-3nodes/node1/data0/aoeu/aoeu-c2c45b00439011e8bfc8737d74e3e5df/na-39-big
First token: -8223339496150845696 (derphead5731287)
Last token: -8023360031800191250 (derphead3351464)
SSTable Level: 0
Repaired at: 0
Pending repair: 825565d0-4784-11e8-b1b1-8f56691c789f

ccm node1 start

CASSANDRA_INCLUDE=~/.ccm/kgreav-3nodes/node1/bin/cassandra.in.sh \
  ~/werk/cstar/kgreav-cassandra/tools/bin/sstablemetadata na-39-big-Data.db
SSTable: /home/kurt/.ccm/kgreav-3nodes/node1/data0/aoeu/aoeu-c2c45b00439011e8bfc8737d74e3e5df/na-39-big
First token: -8223339496150845696 (derphead5731287)
Last token: -8023360031800191250 (derphead3351464)
Repaired at: 1524549508277 (04/24/2018 05:58:28)
Pending repair: --
{code}
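The {{Repaired at}} value is milliseconds since the Unix epoch, with sstablemetadata printing a human-readable date next to it; a value of 0, as in the pending outputs above, means the SSTable is not marked repaired. A quick cross-check with {{java.time.Instant}} (the class name {{RepairedAtCheck}} is just illustrative) shows the printed date matches the raw value when read as UTC:

{code:java}
import java.time.Instant;

final class RepairedAtCheck {
    // Converts a repairedAt value (epoch milliseconds) to an ISO-8601 UTC string.
    static String toUtc(long repairedAtMillis) {
        return Instant.ofEpochMilli(repairedAtMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtc(1524549508277L)); // 2018-04-24T05:58:28.277Z
    }
}
{code}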



> Subrange repair doesn't always mark as repaired
> -----------------------------------------------
>
>                 Key: CASSANDRA-14400
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14400
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Kurt Greaves
>            Priority: Major
>
> So I was just messing around with subrange repair on trunk and found that if I
> generated an SSTable with a single token and then tried to repair that
> SSTable using subrange repair, it wouldn't get marked as repaired.
>
> Before repair:
> {code:java}
> First token: -9223362383595311662 (derphead4471291)
> Last token: -9223362383595311662 (derphead4471291)
> Repaired at: 0
> Pending repair: 862395e0-4394-11e8-8f20-3b8ee110d005
> {code}
> Repair command:
> {code}
> ccm node1 nodetool "repair -st -9223362383595311663 -et -9223362383595311661 aoeu"
> [2018-04-19 05:44:42,806] Starting repair command #7 (c23f76c0-4394-11e8-8f20-3b8ee110d005), repairing keyspace aoeu with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], previewKind: NONE, # of ranges: 1, pull repair: false, force repair: false, optimise streams: false)
> [2018-04-19 05:44:42,843] Repair session c242d220-4394-11e8-8f20-3b8ee110d005 for range [(-9223362383595311663,-9223362383595311661]] finished (progress: 20%)
> [2018-04-19 05:44:43,139] Repair completed successfully
> [2018-04-19 05:44:43,140] Repair command #7 finished in 0 seconds
> {code}
> After repair, the SSTable hasn't changed and sstablemetadata outputs:
> {code}
> First token: -9223362383595311662 (derphead4471291)
> Last token: -9223362383595311662 (derphead4471291)
> Repaired at: 0
> Pending repair: 862395e0-4394-11e8-8f20-3b8ee110d005
> {code}
> And parent_repair_history states that the repair is complete and the range is
> listed as successful:
> {code}
> select * from system_distributed.parent_repair_history where parent_id=862395e0-4394-11e8-8f20-3b8ee110d005 ;
>
>  parent_id            | 862395e0-4394-11e8-8f20-3b8ee110d005
>  columnfamily_names   | {'aoeu'}
>  exception_message    | null
>  exception_stacktrace | null
>  finished_at          | 2018-04-19 05:43:14.578000+0000
>  keyspace_name        | aoeu
>  options              | {'dataCenters': '', 'forceRepair': 'false', 'hosts': '', 'incremental': 'true', 'jobThreads': '1', 'optimiseStreams': 'false', 'parallelism': 'parallel', 'previewKind': 'NONE', 'primaryRange': 'false', 'pullRepair': 'false', 'sub_range_repair': 'true', 'trace': 'false'}
>  requested_ranges     | {'(-9223362383595311663,-9223362383595311661]'}
>  started_at           | 2018-04-19 05:43:01.952000+0000
>  successful_ranges    | {'(-9223362383595311663,-9223362383595311661]'}
> {code}
> Subrange repairs seem to work fine over large ranges and set {{Repaired at}}
> as expected, but so far I haven't figured out why it works for a large range
> but not for a small one.
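The repair log writes the requested subrange as (start, end], start-exclusive and end-inclusive, following Cassandra's token range convention. The hypothetical helper below (a sketch, not Cassandra's {{Range}} class) checks that the SSTable's single token, -9223362383595311662, lies inside (-9223362383595311663, -9223362383595311661], so the requested range did cover all of the SSTable's data.

{code:java}
// Hypothetical helper, not Cassandra's Range class: tests whether a token falls
// in a ring range (left, right], i.e. left-exclusive and right-inclusive.
final class TokenRangeCheck {
    static boolean containsToken(long left, long right, long token) {
        if (left < right) {
            return token > left && token <= right;
        }
        // left >= right: the range wraps around the end of the ring
        return token > left || token <= right;
    }

    public static void main(String[] args) {
        long start = -9223362383595311663L;        // -st from the repair command
        long end = -9223362383595311661L;          // -et from the repair command
        long sstableToken = -9223362383595311662L; // first == last token of the SSTable
        System.out.println(containsToken(start, end, sstableToken)); // true
    }
}
{code}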



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
