[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-17342:
    Fix Version/s: 4.1

> Performance problem for node restart with incremental range repairs
> ---
>
> Key: CASSANDRA-17342
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17342
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Paul Chandler
> Assignee: Paul Chandler
> Priority: Normal
> Fix For: 4.0.3, 4.1
>
> Attachments: BulkRepairStateTest.java, IncrementalRepairStartupTest.java, LocalSessions.java, RepairedState.java
>
>
> There is a performance problem when restarting Cassandra on clusters that run incremental repairs as range repairs.
> We have clusters with 16 vnodes per node and split each vnode into 100 ranges. This causes a node to take over 30 minutes to process the data stored in the system.repairs table before the node can restart. Even when we reduce this to 10 ranges per vnode, processing still takes 2 minutes. The cluster has 22 keyspaces and an RF of 3, which creates around 8100 records in the system.repairs table.
>
> The problem seems to occur in the org.apache.cassandra.repair.consistent.RepairedState class, where the add method reprocesses the complete list, including sorting, every time a new Range is added. This leads to exponential growth in processing time, as demonstrated in the attached unit test.
>
> I have created a change that collects the data read from the system.repairs table in the org.apache.cassandra.repair.consistent.LocalSessions class and then processes it as a group at the end. This reduces the processing time to a couple of seconds, even for the 100-range version.
>
> This is my first attempt at changing the Cassandra code, so I am in need of a mentor to help me with the process and to validate what I have done.
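The pattern described in the report can be illustrated with a small stand-alone sketch. This is not the committed patch and does not use Cassandra's classes: RangeStateSketch, Span, add, addAll and rebuild are hypothetical names invented for the illustration, and the merge logic is a simplified stand-in for whatever the real add method does. It only shows why rebuilding a sorted list on every insertion is costly (in this sketch the total work grows at least quadratically with the number of ranges) and why collecting the ranges first and processing them once avoids that.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Cassandra.
class RangeStateSketch {
    /** Simplified stand-in for a token range; not Cassandra's Range class. */
    static final class Span {
        final long left, right;
        Span(long left, long right) { this.left = left; this.right = right; }
        @Override public boolean equals(Object o) {
            return o instanceof Span && ((Span) o).left == left && ((Span) o).right == right;
        }
        @Override public int hashCode() { return Long.hashCode(left) * 31 + Long.hashCode(right); }
    }

    private List<Span> normalized = new ArrayList<>();
    int rebuilds = 0; // counts full sort-and-merge passes

    /** Per-item path: every call copies, re-sorts and re-merges everything seen so far. */
    void add(Span s) {
        List<Span> all = new ArrayList<>(normalized);
        all.add(s);
        normalized = rebuild(all);
    }

    /** Bulk path: collect first, then sort and merge a single time. */
    void addAll(Collection<Span> spans) {
        List<Span> all = new ArrayList<>(normalized);
        all.addAll(spans);
        normalized = rebuild(all);
    }

    /** Sorts by left bound and merges spans that touch or overlap. */
    private List<Span> rebuild(List<Span> input) {
        rebuilds++;
        input.sort(Comparator.comparingLong((Span s) -> s.left));
        List<Span> out = new ArrayList<>();
        for (Span s : input) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).right >= s.left) {
                Span prev = out.get(last);
                out.set(last, new Span(prev.left, Math.max(prev.right, s.right)));
            } else {
                out.add(s);
            }
        }
        return out;
    }

    List<Span> ranges() { return normalized; }

    public static void main(String[] args) {
        // Simulate rows loaded at startup; the count is arbitrary.
        List<Span> loaded = new ArrayList<>();
        for (long i = 0; i < 1000; i++) loaded.add(new Span(i * 10, i * 10 + 5));

        RangeStateSketch oneByOne = new RangeStateSketch();
        for (Span s : loaded) oneByOne.add(s);

        RangeStateSketch bulk = new RangeStateSketch();
        bulk.addAll(loaded);

        System.out.println("per-item rebuilds: " + oneByOne.rebuilds + ", bulk rebuilds: " + bulk.rebuilds);
        System.out.println("same result: " + oneByOne.ranges().equals(bulk.ranges()));
    }
}
```

Both paths end with the same normalized list; the difference is only how many times the full list is rebuilt, which is the cost the report attributes to the startup delay.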
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-17342:
    Fix Version/s: 4.0.3 (was: 4.0.x)
    Since Version: 4.0.0
    Source Control Link: https://github.com/apache/cassandra/commit/c60ad61b3b6145af100578f2c652819f61729018
    Resolution: Fixed
    Status: Resolved (was: Ready to Commit)

Committed to 4.0 and merged up, thanks again for the patch!

trunk tests look bad, but similar to [non-patched trunk|https://app.circleci.com/pipelines/github/krummas/cassandra/775/workflows/b0ede5ae-db7c-4a1d-b6ff-22245922bb46]

[circleci 4.0|https://app.circleci.com/pipelines/github/krummas/cassandra/770/workflows/edfe8c85-0de6-4191-b4be-e7c4cb1a4c1e]
[circleci trunk|https://app.circleci.com/pipelines/github/krummas/cassandra/769/workflows/6eea562c-0354-41e2-b253-32da2f929193]
[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-17342:
    Status: Ready to Commit (was: Review In Progress)
[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-17342:
    Reviewers: Brandon Williams, Marcus Eriksson, Marcus Eriksson (was: Brandon Williams, Marcus Eriksson)
    Status: Review In Progress (was: Patch Available)
[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-17342:
    Reviewers: Brandon Williams, Marcus Eriksson (was: Marcus Eriksson)
[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-17342:
    Test and Documentation Plan: run CI
    Status: Patch Available (was: Open)
[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-17342:
    Reviewers: Marcus Eriksson
[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Chandler updated CASSANDRA-17342:
    Attachment: LocalSessions.java
                RepairedState.java
                BulkRepairStateTest.java
[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-17342:
    Bug Category: Parent values: Degradation(12984) Level 1 values: Performance Bug/Regression(12997)
    Complexity: Normal
    Discovered By: User Report
    Fix Version/s: 4.0.x
    Severity: Normal
    Status: Open (was: Triage Needed)