I don’t work for DataStax, that’s not my blog, and I’m on a phone and 
potentially missing nuance, but I’d never try to convert a cluster to 
incremental repair by disabling autocompaction. That approach sounds very 
much out of date, or it’s optimized for fixing one node in a cluster 
somehow. It didn’t make sense in the 4.0 era.

Instead, I’d leave compaction running and slowly run incremental repair 
across parts of the token range, slowing down as pending compactions 
increase.

I’d choose token ranges such that you’d repair 5-10% of the data on each 
node at a time. A rough sketch of what that loop could look like is below.
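
Here’s a minimal sketch of that in Python, assuming the default Murmur3 
partitioner’s token range and nodetool on the PATH; the keyspace name, slice 
count, and pending-compaction threshold are illustrative, not recommendations:

    #!/usr/bin/env python3
    # Walk the ring in small slices, running an incremental repair on each
    # slice and backing off while pending compactions pile up.
    import subprocess, time

    KEYSPACE = "my_keyspace"     # hypothetical keyspace name
    SLICES = 20                  # ~5% of the ring per repair call
    PENDING_LIMIT = 100          # back off above this many pending compactions
    MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1   # Murmur3Partitioner range

    def pending_compactions():
        out = subprocess.run(["nodetool", "compactionstats"],
                             capture_output=True, text=True, check=True).stdout
        for line in out.splitlines():
            if line.strip().startswith("pending tasks:"):
                return int(line.split(":")[1].split()[0])
        return 0

    step = (MAX_TOKEN - MIN_TOKEN) // SLICES
    for i in range(SLICES):
        start = MIN_TOKEN + i * step
        end = MAX_TOKEN if i == SLICES - 1 else start + step
        while pending_compactions() > PENDING_LIMIT:
            time.sleep(300)      # let compaction catch up before continuing
        # No -full flag: on 4.x this runs an incremental repair of the slice.
        subprocess.run(["nodetool", "repair", "-st", str(start),
                        "-et", str(end), KEYSPACE], check=True)

Even token slices are only a rough proxy for “5-10% of the data per node”, 
but with vnodes it’s usually close enough to control the pace.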



> On Nov 23, 2023, at 11:31 PM, Sebastian Marsching <sebast...@marsching.com> 
> wrote:
> 
> Hi,
> 
> we are currently in the process of migrating from C* 3.11 to C* 4.1 and we 
> want to start using incremental repairs after the upgrade has been completed. 
> It seems like all the really bad bugs that made using incremental repairs 
> dangerous in C* 3.x have been fixed in 4.x, and for our specific workload, 
> incremental repairs should offer a significant performance improvement.
> 
> Therefore, I am currently devising a plan for how we could migrate to using 
> incremental repairs. I am aware of the guide from DataStax 
> (https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsRepairNodesMigration.html),
>  but this guide is quite old and was written with C* 3.0 in mind, so I am not 
> sure whether this still fully applies to C* 4.x.
> 
> In addition to that, I am not sure whether this approach fits our workload. 
> In particular, I am wary about disabling autocompaction for an extended 
> period of time (if you are interested in the reasons why, they are at the end 
> of this e-mail).
> 
> Therefore, I am wondering whether a slightly different process might work 
> better for us:
> 
> 1. Run a full repair (we periodically run those anyway).
> 2. Mark all SSTables as repaired, even though they will include data that has 
> not been repaired yet because it was added while the repair process was 
> running (see the sketch after this list).
> 3. Run another full repair.
> 4. Start using incremental repairs (and the occasional full repair in order 
> to handle bit rot etc.).
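> 
> For step 2, I am thinking of something along these lines, using the offline 
> sstablerepairedset tool that ships with Cassandra. This is only a rough 
> sketch: the data path is just an example, and the node has to be stopped 
> while the tool runs.
> 
>     import glob, subprocess
>     # Node must be down: sstablerepairedset rewrites SSTable metadata offline.
>     sstables = glob.glob("/var/lib/cassandra/data/my_keyspace/*/*-Data.db")
>     with open("sstables.txt", "w") as f:
>         f.write("\n".join(sstables))
>     subprocess.run(["sstablerepairedset", "--really-set", "--is-repaired",
>                     "-f", "sstables.txt"], check=True)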
> 
> If I understood the interactions between full repairs and incremental repairs 
> correctly, step 3 should repair potential inconsistencies in the SSTables 
> that were marked as repaired in step 2, while avoiding the overstreaming 
> that would happen if we only marked the SSTables that already existed 
> before step 1 as repaired.
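> 
> To verify that steps 2 and 3 worked, I could check the repaired status with 
> the sstablemetadata tool, e.g. with a sketch like this (the path is again 
> just an example):
> 
>     import glob, subprocess
>     # "Repaired at: 0" in the sstablemetadata output means unrepaired.
>     for path in glob.glob("/var/lib/cassandra/data/my_keyspace/*/*-Data.db"):
>         out = subprocess.run(["sstablemetadata", path],
>                              capture_output=True, text=True).stdout
>         line = next((l for l in out.splitlines() if "Repaired at" in l), "?")
>         print(path, line.strip())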
> 
> Does anyone see a flaw in this concept, or have experience with a similar 
> scenario (migrating to incremental repairs in an environment with 
> high-density nodes, where a single table contains most of the data)?
> 
> I am also interested in hearing about potential problems other C* users 
> experienced when migrating to incremental repairs, so that we get a better 
> idea of what to expect.
> 
> Thanks,
> Sebastian
> 
> 
> Here is the explanation why I am being cautious:
> 
> More than 95 percent of our data is stored in a single table, and we use 
> high-density nodes (storing about 3 TB of data per node). This means that a full 
> repair for the whole cluster takes about a week.
> 
> The reason for this layout is that most of our data is “cold”, meaning that 
> it is written once, never updated, and rarely deleted or read. However, new 
> data is added continuously, so disabling autocompaction for the duration of a 
> full repair would lead to a high number of small SSTables accumulating over 
> the course of the week, and I am not sure how well the cluster would handle 
> such a situation (and the increased load when autocompaction is enabled 
> again).
