[jira] [Commented] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs

Paul Chandler (Jira) Wed, 02 Feb 2022 13:05:09 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486100#comment-17486100
 ]


Paul Chandler commented on CASSANDRA-17342:
-------------------------------------------

[~brandon.williams] These are the changes I have made:

In LocalSessions.java, I have modified the method maybeUpdateRepairedState to 
store the level data for later if the node is in the process of starting. I 
have added the method finaliseStates to process all the data at the end.

In RepairedState.java, I have added the method finaliseInitalLevels and 
refactored the add method to pull out the processLevels method, which now does 
the processing both during startup and during steady state.

I have created the unit test BulkRepairStateTest. This is basically a copy of 
RepairStateTest, but it calls the new methods instead.

I have not done that much testing, but the command nodetool repair_admin 
summarize-repaired does give the same results as before.
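
As a rough illustration of the approach described above (this is not the 
attached code), the following minimal Java sketch buffers entries while the 
node is starting and sorts them once at the end. The class 
DeferredRepairedState, its nested Level type, the sortCount counter and the 
method bodies are all hypothetical stand-ins, and the real RepairedState does 
more than sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: defer per-add processing until startup has finished. */
final class DeferredRepairedState {

    /** Illustrative stand-in for a repaired range and its repairedAt time. */
    static final class Level {
        final long start, end, repairedAt;

        Level(long start, long end, long repairedAt) {
            this.start = start;
            this.end = end;
            this.repairedAt = repairedAt;
        }
    }

    private final List<Level> pending = new ArrayList<>();
    private List<Level> levels = new ArrayList<>();
    private boolean starting = true;
    int sortCount = 0; // instrumentation for this example only

    /** During startup just buffer; in steady state process immediately. */
    void add(Level level) {
        if (starting) {
            pending.add(level); // cheap append, no sorting yet
            return;
        }
        List<Level> next = new ArrayList<>(levels);
        next.add(level);
        levels = processLevels(next);
    }

    /** Process everything buffered during startup in a single pass. */
    void finaliseInitialLevels() {
        levels = processLevels(pending);
        pending.clear();
        starting = false;
    }

    /** Shared processing step: here only a sort, newest repairedAt first. */
    private List<Level> processLevels(List<Level> input) {
        List<Level> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingLong((Level l) -> l.repairedAt).reversed());
        sortCount++;
        return sorted;
    }

    List<Level> levels() {
        return levels;
    }
}
```

With this shape, loading N records at startup costs one sort instead of N 
sorts over a growing list, while additions after startup still go through the 
same processing step.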

> Performance problem for node restart with incremental range repairs
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-17342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17342
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Paul Chandler
>            Priority: Normal
>             Fix For: 4.0.x
>
>         Attachments: BulkRepairStateTest.java, 
> IncrementalRepairStartupTest.java, LocalSessions.java, RepairedState.java
>
>
> There is a performance problem when restarting Cassandra for clusters doing 
> incremental repairs with range repairs. 
> We have clusters with 16 vnodes per node and are splitting each vnode into 
> 100 ranges. This causes a node to take over 30 minutes to process the data 
> stored in the system.repairs table before the node can restart. Even when we 
> reduce this to 10 ranges per vnode, it still takes 2 minutes to process. The 
> cluster has 22 keyspaces and an RF of 3, which creates around 8100 records in 
> the system.repairs table.
>  
> The problem seems to occur in the 
> org.apache.cassandra.repair.consistent.RepairedState class, where the add 
> method reprocesses the complete list, including sorting, every time a new 
> Range is added. This leads to processing time that grows much faster than 
> linearly with the number of ranges, as demonstrated in the attached unit test.
>  
> I have created a change that collects the data read from the 
> system.repairs table in the 
> org.apache.cassandra.repair.consistent.LocalSessions class and then processes 
> it as a group at the end. This reduces the processing time to a couple of 
> seconds, even for the 100-range version.
>  
> This is my first attempt at changing the Cassandra code, so I am in need of a 
> mentor to help me with the process and validate what I have done.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

