[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcus Eriksson updated CASSANDRA-17342:
----------------------------------------
          Fix Version/s: 4.0.3
                             (was: 4.0.x)
          Since Version: 4.0.0
    Source Control Link: https://github.com/apache/cassandra/commit/c60ad61b3b6145af100578f2c652819f61729018
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

Committed to 4.0 and merged up, thanks again for the patch!

trunk tests look bad, but similar to [non-patched trunk|https://app.circleci.com/pipelines/github/krummas/cassandra/775/workflows/b0ede5ae-db7c-4a1d-b6ff-22245922bb46]

[circleci 4.0|https://app.circleci.com/pipelines/github/krummas/cassandra/770/workflows/edfe8c85-0de6-4191-b4be-e7c4cb1a4c1e]
[circleci trunk|https://app.circleci.com/pipelines/github/krummas/cassandra/769/workflows/6eea562c-0354-41e2-b253-32da2f929193]

> Performance problem for node restart with incremental range repairs
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-17342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17342
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Paul Chandler
>            Assignee: Paul Chandler
>            Priority: Normal
>             Fix For: 4.0.3
>
>         Attachments: BulkRepairStateTest.java,
> IncrementalRepairStartupTest.java, LocalSessions.java, RepairedState.java
>
>
> There is a performance problem when restarting Cassandra on clusters that run
> incremental repairs as range repairs.
> We have clusters with 16 vnodes per node and split each vnode into 100
> ranges. With that split, a node takes over 30 minutes to process the data
> stored in the system.repairs table before it can restart. Even when we
> reduce this to 10 ranges per vnode, processing still takes 2 minutes. The
> cluster has 22 keyspaces and a replication factor of 3, which creates around
> 8100 records in the system.repairs table.
>
> The problem seems to occur in the
> org.apache.cassandra.repair.consistent.RepairedState class, where the add
> method reprocesses the complete list, including sorting it, every time a new
> Range is added. Processing time therefore grows much faster than linearly
> with the number of ranges, as demonstrated in the attached unit test.
>
> I have created a change that collects the data read from the
> system.repairs table in the
> org.apache.cassandra.repair.consistent.LocalSessions class and processes
> it as a group at the end. This reduces the processing time to a couple of
> seconds, even for the 100-range version.
>
> This is my first attempt at changing the Cassandra code, so I am in need of a
> mentor to help me with the process and to validate what I have done.
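
The cost difference described in the issue can be illustrated with a minimal Java sketch. It is a simplified model under stated assumptions, not the Cassandra code or the committed patch: `RangeBatchingSketch`, `Span`, `normalize`, `addOneAtATime`, `addAsBatch`, and the `elementsProcessed` counter are all hypothetical names, and plain `long` intervals stand in for token ranges.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified model only; all names here are hypothetical, not Cassandra APIs. */
final class RangeBatchingSketch {

    /** Half-open interval [start, end), a stand-in for a token range. */
    static final class Span {
        final long start;
        final long end;

        Span(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Span && ((Span) o).start == start && ((Span) o).end == end;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(start) * 31 + Long.hashCode(end);
        }
    }

    /** Counts how many elements pass through normalize(), to make the cost visible. */
    static long elementsProcessed = 0;

    /** Sorts the spans by start and merges any that overlap or touch. */
    static List<Span> normalize(List<Span> spans) {
        elementsProcessed += spans.size();
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingLong((Span s) -> s.start));
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).end >= s.start) {
                Span prev = merged.get(last);
                merged.set(last, new Span(prev.start, Math.max(prev.end, s.end)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    /** Per-add style: every insertion re-sorts and re-merges everything seen so far. */
    static List<Span> addOneAtATime(List<Span> input) {
        List<Span> seen = new ArrayList<>();
        List<Span> state = new ArrayList<>();
        for (Span s : input) {
            seen.add(s);
            state = normalize(seen);
        }
        return state;
    }

    /** Batched style: collect everything first, then sort and merge once. */
    static List<Span> addAsBatch(List<Span> input) {
        return normalize(input);
    }
}
```

Both paths return the same merged list. In this model, n spans pass n(n+1)/2 elements through `normalize` on the per-add path and n elements on the batched path, and each pass also pays for a sort.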