Full repair taking an entire week sounds excessively long. Even with 1 TB of data per node, one week works out to less than 2 MB/s of repair throughput, which is very slow. Perhaps you should focus on finding and fixing the bottleneck in the full-repair speed instead.
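For reference, the arithmetic behind that estimate (a quick back-of-the-envelope check, using decimal units for TB and MB):

```python
# Repairing 1 TB per node in one week implies an average
# throughput of under 2 MB/s.
data_bytes = 1 * 1000**4      # 1 TB per node (decimal units)
duration_s = 7 * 24 * 3600    # one week in seconds

throughput_mb_s = data_bytes / duration_s / 1000**2
print(f"{throughput_mb_s:.2f} MB/s")  # ~1.65 MB/s
```

Even on modest hardware, a single node can usually stream and compare data far faster than that, which is why the week-long duration points at a bottleneck rather than an inherent limit.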

On 03/02/2024 16:18, Sebastian Marsching wrote:
Hi,

2. use an orchestration tool, such as Cassandra Reaper, to take care of that for you. You will still need monitoring and alerting to ensure the repairs run successfully, but fixing a stuck or failed repair is not very time-sensitive; you can usually leave it until Monday morning if it happens on a Friday night.

Does anyone know how such a schedule can be created in Cassandra Reaper?

I recently learned the hard way that running both a full and an incremental repair for the same keyspace and table in parallel is not a good idea (it caused a very unpleasant overload situation on one of our clusters).

At the moment, we have one schedule for the full repairs (every 90 days) and another for the incremental repairs (daily). But as a full repair takes much longer than a day (about a week, in our case), the two schedules collide: Cassandra Reaper starts an incremental repair while the full repair is still in progress.

Does anyone know how to avoid this? Ideally, the full repair would be paused (no new segments started) for the duration of the incremental repair. The second-best option would be to inhibit the incremental repair while a full repair is in progress.
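One workaround, if Reaper itself cannot express this dependency, would be to drive it from an external script via its REST API: pause the full-repair schedule before the daily incremental window and resume it afterwards. The sketch below only builds and prints the request; the host name and schedule ID are placeholders, and the endpoint shape (`PUT /repair_schedule/{id}?state=PAUSED`) is my recollection of the Reaper API, so please verify it against your Reaper version before relying on it:

```python
import urllib.request

# Hypothetical Reaper host; replace with your own deployment.
REAPER = "http://reaper.example.org:8080"

def set_schedule_state(schedule_id: str, state: str) -> urllib.request.Request:
    """Build a PUT request that flips a repair schedule to ACTIVE or PAUSED."""
    url = f"{REAPER}/repair_schedule/{schedule_id}?state={state}"
    return urllib.request.Request(url, method="PUT")

# Pause the full-repair schedule before the incremental run starts;
# a second call with state="ACTIVE" would resume it afterwards.
req = set_schedule_state("full-repair-schedule-id", "PAUSED")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # actually send it, once verified against your cluster
```

Running this from cron just before and after the incremental schedule's window would approximate the "pause the full repair" behaviour described above.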

Best regards,
Sebastian
