pauloricardomg commented on code in PR #4558:
URL: https://github.com/apache/cassandra/pull/4558#discussion_r2866738711


##########
conf/cassandra.yaml:
##########
@@ -2281,3 +2289,170 @@ drop_compact_storage_enabled: false
 #   compatibility mode would no longer toggle behaviors as when it was running 
in the UPGRADING mode.
 #
 storage_compatibility_mode: CASSANDRA_4
+
+
+# Prevents preparing a repair session or beginning a repair streaming session 
if pending compactions is over
+# the given value.  Defaults to disabled.
+# reject_repair_compaction_threshold: 1024
+
+# At least 20% of disk must be unused to run repair. This helps avoid disks filling up during
+# repair, since anti-compaction during repair may temporarily consume additional space.
+# If you want to disable this feature (not recommended), set the ratio to 0.0.
+# repair_disk_headroom_reject_ratio: 0.2

Review Comment:
   Addressed on 60b12b4ff00200dc0174ef2556e70f15baf96559



##########
src/java/org/apache/cassandra/repair/autorepair/AutoRepairUtils.java:
##########
@@ -451,6 +451,42 @@ public static boolean hasMultipleLiveMajorVersions()
         return majorVersions.size() > 1;
     }
 
+    /**
+     * Last version that does not support auto-repair.
+     * All nodes in the cluster must be running a version above this to enable 
auto-repair.
+     * Versions at or below this version (5.0.6) do not support auto-repair.
+     */
+    @VisibleForTesting
+    static final CassandraVersion LAST_UNSUPPORTED_VERSION_FOR_AUTO_REPAIR = 
new CassandraVersion("5.0.6");
+
+    /**
+     * Checks whether any node in the cluster is running an unsupported 
version for auto-repair.
+     *
+     * @return true if any live node has a version at or below 5.0.6 
(unsupported) or has an unknown version,
+     *         false if all nodes are running versions above 5.0.6 (supported)
+     */
+    public static boolean hasNodesBelowMinimumVersion()
+    {
+        Set<InetAddressAndPort> liveEndpoints = 
Gossiper.instance.getLiveMembers();
+        for (InetAddressAndPort endpoint : liveEndpoints)
+        {
+            CassandraVersion releaseVersion = 
Gossiper.instance.getReleaseVersion(endpoint);
+            if (releaseVersion == null)
+            {
+                logger.warn("Cannot determine version for endpoint {}, 
blocking auto-repair", endpoint);
+                return true;
+            }

Review Comment:
   added system property to skip check on 
0f8e9fbf87d70f5d04ea38525bf6a0f3f1cd736e



##########
test/distributed/org/apache/cassandra/distributed/upgrade/AutoRepairDisabledSchemaUpgradeTest.java:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.distributed.upgrade;
+
+import java.util.UUID;
+
+import com.vdurmont.semver4j.Semver;
+import com.vdurmont.semver4j.Semver.SemverType;
+import org.junit.Test;
+
+import org.apache.cassandra.config.CassandraRelevantProperties;
+import org.apache.cassandra.distributed.UpgradeableCluster;
+import org.apache.cassandra.distributed.api.ConsistencyLevel;
+import org.apache.cassandra.distributed.api.Feature;
+import org.apache.cassandra.distributed.impl.AbstractCluster;
+import org.apache.cassandra.distributed.shared.Versions;
+
+import static org.apache.cassandra.distributed.shared.Versions.Version;
+import static org.apache.cassandra.distributed.shared.Versions.find;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+/**
+ * Tests that upgrading from 5.0.6 (where auto-repair is not included) to the 
current version
+ * with AUTO_REPAIR_ENABLE=false maintains schema agreement between nodes.
+ * <p>
+ * This test verifies that when the auto-repair feature is disabled via the 
JVM property,
+ * the schema version remains consistent between upgraded and non-upgraded 
nodes,
+ * ensuring no schema disagreement occurs due to conditional auto-repair 
schema changes.
+ */
+public class AutoRepairDisabledSchemaUpgradeTest extends UpgradeTestBase
+{
+    private static final Semver v506 = new Semver("5.0.6", SemverType.STRICT);
+
+    @Test
+    public void testSchemaAgreementWithAutoRepairDisabled() throws Throwable
+    {
+        // Disable auto-repair feature to ensure no auto-repair schema changes 
are made
+        CassandraRelevantProperties.AUTOREPAIR_ENABLE.setBoolean(false);
+
+        Versions versions = find();
+        Version from = versions.get(v506);
+        Version to = AbstractCluster.CURRENT_VERSION;
+
+        assertNotNull("5.0.6 version not available - ensure dtest-5.0.6.jar is 
built", from);
+
+        try (UpgradeableCluster cluster = init(UpgradeableCluster.create(2, 
from,
+                config -> config.with(Feature.GOSSIP, Feature.NETWORK))))
+        {
+            // Create a simple table to ensure schema is propagated
+            cluster.schemaChange("CREATE TABLE " + KEYSPACE + ".tbl (pk int 
PRIMARY KEY, v int)");
+            cluster.coordinator(1).execute("INSERT INTO " + KEYSPACE + ".tbl 
(pk, v) VALUES (1, 1)", ConsistencyLevel.ALL);
+
+            // Verify initial schema agreement before upgrade
+            UUID schemaBefore1 = cluster.get(1).schemaVersion();
+            UUID schemaBefore2 = cluster.get(2).schemaVersion();
+            assertEquals("Schema versions should match before upgrade", 
schemaBefore1, schemaBefore2);
+
+            // Upgrade only node 1 to current version
+            cluster.get(1).shutdown().get();
+            cluster.get(1).setVersion(to);
+            cluster.get(1).startup();
+
+            // Wait for schema to settle
+            Thread.sleep(5000);
+
+            // Verify schema agreement after upgrade
+            UUID schemaAfter1 = cluster.get(1).schemaVersion();
+            UUID schemaAfter2 = cluster.get(2).schemaVersion();
+
+            assertNotNull("Node 1 schema version should not be null after 
upgrade", schemaAfter1);
+            assertNotNull("Node 2 schema version should not be null after 
upgrade", schemaAfter2);
+            assertEquals("Schema versions should match between upgraded and 
non-upgraded nodes", schemaAfter1, schemaAfter2);
+
+            // Verify data is still readable from both nodes
+            Object[][] result1 = cluster.coordinator(1).execute("SELECT * FROM 
" + KEYSPACE + ".tbl WHERE pk = 1", ConsistencyLevel.ALL);
+            Object[][] result2 = cluster.coordinator(2).execute("SELECT * FROM 
" + KEYSPACE + ".tbl WHERE pk = 1", ConsistencyLevel.ALL);
+            assertEquals("Data should be readable from node 1", 1, 
result1.length);
+            assertEquals("Data should be readable from node 2", 1, 
result2.length);

Review Comment:
   Done



##########
src/java/org/apache/cassandra/config/Config.java:
##########
@@ -348,6 +349,11 @@ public MemtableOptions()
     // The number of executors to use for building secondary indexes
     public volatile int concurrent_index_builders = 2;
 
+    // At least 20% of disk must be unused to run repair.
+    // To disable this feature (not recommended), set the ratio to 0.0.
+    @Replaces(oldName = "incremental_repair_disk_headroom_reject_ratio")

Review Comment:
   Addressed on 60b12b4ff00200dc0174ef2556e70f15baf96559



##########
src/java/org/apache/cassandra/schema/SystemDistributedKeyspace.java:
##########
@@ -197,7 +210,20 @@ private static TableMetadata.Builder parse(String table, 
String description, Str
 
     public static KeyspaceMetadata metadata()
     {
-        return 
KeyspaceMetadata.create(SchemaConstants.DISTRIBUTED_KEYSPACE_NAME, 
KeyspaceParams.simple(Math.max(DEFAULT_RF, 
DatabaseDescriptor.getDefaultKeyspaceRF())), Tables.of(RepairHistory, 
ParentRepairHistory, ViewBuildStatus, PartitionDenylistTable, 
AutoRepairHistory, AutoRepairPriority));
+        Tables tables;
+        if (CassandraRelevantProperties.AUTOREPAIR_ENABLE.getBoolean())

Review Comment:
   Good call. I added a note to NEWS.txt on 54cb5c9ba8



##########
src/java/org/apache/cassandra/schema/SystemDistributedKeyspace.java:
##########
@@ -83,8 +83,9 @@ private SystemDistributedKeyspace()
      * gen 4: compression chunk length reduced to 16KiB, 
memtable_flush_period_in_ms now unset on all tables in 4.0
      * gen 5: add ttl and TWCS to repair_history tables
      * gen 6: add denylist table
+     * gen 7: add auto_repair_history and auto_repair_priority tables for 
AutoRepair feature
      */
-    public static final long GENERATION = 6;
+    public static final long GENERATION = 7;

Review Comment:
   Good catch, fixed on 81d9547a0b2da74b31e61cd815a5eab5f0b21eb9



##########
src/java/org/apache/cassandra/cql3/statements/schema/TableAttributes.java:
##########
@@ -58,6 +59,10 @@ public final class TableAttributes extends 
PropertyDefinitions
     public void validate()
     {
         validate(validKeywords, obsoleteKeywords);
+
+        if (hasOption(AUTO_REPAIR) && 
!CassandraRelevantProperties.AUTOREPAIR_ENABLE.getBoolean())

Review Comment:
   Detected this condition and added a friendlier error message on 
https://github.com/apache/cassandra/pull/4558/changes/0bdf37cec5620afedd302a8d109723eb110c7dd8
   
   added a test that checks this scenario
   ```java
   ERROR [SSTableBatchOpen:1] 2026-02-28T00:14:27,240 SSTableReader.java:437 - 
The SSTable 
/home/paulo/workspace/cassandra-worktree/worktrees/cassandra-5.0/bin/../data/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09/nb-13-big
 contains an auto_repair column that is not recognized. This occurs when 
cassandra.autorepair.enable was previously set to true and is now false. Set 
-Dcassandra.autorepair.enable=true and restart.
   ```
   
   



##########
conf/cassandra.yaml:
##########
@@ -2281,3 +2289,170 @@ drop_compact_storage_enabled: false
 #   compatibility mode would no longer toggle behaviors as when it was running 
in the UPGRADING mode.
 #
 storage_compatibility_mode: CASSANDRA_4
+
+
+# Prevents preparing a repair session or beginning a repair streaming session 
if pending compactions is over
+# the given value.  Defaults to disabled.
+# reject_repair_compaction_threshold: 1024
+
+# At least 20% of disk must be unused to run repair. This helps avoid disks filling up during
+# repair, since anti-compaction during repair may temporarily consume additional space.
+# If you want to disable this feature (not recommended), set the ratio to 0.0.
+# repair_disk_headroom_reject_ratio: 0.2
+
+# Deprecated: this setting safeguarded incremental repairs only. Use repair_disk_headroom_reject_ratio
+# instead, as it safeguards all repair types.
+# incremental_repair_disk_headroom_reject_ratio: 0.2

Review Comment:
   Addressed on 60b12b4ff00200dc0174ef2556e70f15baf96559



##########
doc/modules/cassandra/pages/managing/operating/auto_repair.adoc:
##########
@@ -0,0 +1,460 @@
+= Auto Repair
+:navtitle: Auto Repair
+:description: Auto Repair concepts - How it works, how to configure it, and 
more.
+:keywords: CEP-37, Repair, Incremental, Preview
+
+Auto Repair is a fully automated scheduler that provides repair orchestration 
within Apache Cassandra. This
+significantly reduces operational overhead by eliminating the need for 
operators to deploy external tools to submit and
+manage repairs.
+
+At a high level, a dedicated thread pool is assigned to the repair scheduler. 
The repair scheduler in Cassandra
+maintains a new replicated table, `system_distributed.auto_repair_history`, 
which stores the repair history for all
+nodes, including details such as the last repair time. The scheduler selects 
the node(s) to begin repairs and
+orchestrates the process to ensure that every table and its token ranges are 
repaired.
+
+The algorithm can run repairs simultaneously on multiple nodes and splits 
token ranges into subranges, with necessary
+retries to handle transient failures. Automatic repair starts as soon as a 
Cassandra cluster is launched, similar to
+compaction, and if configured appropriately, does not require human 
intervention.
+
+The scheduler currently supports Full, Incremental, and Preview repair types 
with the following features. New repair
+types, such as Paxos repair or other future repair mechanisms, can be 
integrated with minimal development effort!
+
+
+== Features
+- Capability to run repairs on multiple nodes simultaneously.
+- A default implementation and an interface to override the dataset being 
repaired per session.
+- Extendable token split algorithms with two implementations readily available:
+.  Splits token ranges by placing a cap on the size of data repaired in one 
session and a maximum cap at the schedule
+level using xref:#repair-token-range-splitter[RepairTokenRangeSplitter] 
(default).
+.  Splits tokens evenly based on the specified number of splits using
+xref:#fixed-split-token-range-splitter[FixedSplitTokenRangeSplitter].
+- A new xref:#table-configuration[CQL table property] (`auto_repair`) offering:
+.  The ability to disable specific repair types at the table level, allowing 
the scheduler to skip one or more tables.
+.  Configuring repair priorities for certain tables to prioritize them over 
others.
+- Dynamic enablement or disablement of the scheduler for each repair type.
+- Configurable settings tailored to each repair job.
+- Rich configuration options for each repair type (e.g., Full, Incremental, or 
Preview repairs).
+- Comprehensive observability features that allow operators to configure 
alarms as needed.
+
+== Considerations
+
+Before enabling Auto Repair, please consult the 
xref:managing/operating/repair.adoc[Repair] guide to establish a base
+understanding of repairs.
+
+=== Full Repair
+
+Full Repairs operate over all data in the token range being repaired.  It is 
therefore important to run full repair
+with a longer schedule and with smaller assignments.
+
+=== Incremental Repair
+
+When enabled from the inception of a cluster, incremental repairs operate over 
unrepaired data and should finish
+quickly when run more frequently.
+
+Once incremental repair has been run, SSTables will be separated between data 
that have been incrementally repaired
+and data that have not.  Therefore, it is important to continually run 
incremental repair once it has been enabled so
+newly written data can be compacted together with previously repaired data, 
allowing overwritten and expired data to
+be eventually purged.
+
+Running incremental repair more frequently keeps the unrepaired set smaller 
and thus causes repairs to operate over
+a smaller set of data, so a shorter `min_repair_interval` such as `1h` is 
recommended for new clusters.
+
+==== Enabling Incremental Repair on existing clusters with a large amount of 
data
+[#enabling-ir]
+One should be careful when enabling incremental repair on a cluster for the 
first time. While
+xref:#repair-token-range-splitter[RepairTokenRangeSplitter] includes a default 
configuration to attempt to gracefully
+migrate to incremental repair over time, failure to take proper precaution 
could overwhelm the cluster with
+xref:managing/operating/compaction/overview.adoc#types-of-compaction[anticompactions].
+
+No matter how one goes about enabling and running incremental repair, it is 
recommended to run a cycle of full repairs
+for the entire cluster as pre-flight step to running incremental repair. This 
will put the cluster into a more
+consistent state which will reduce the amount of streaming between replicas 
when incremental repair initially runs.
+
+If you do not have strong data consistency requirements, one may consider using
+xref:managing/tools/sstable/sstablerepairedset.adoc[nodetool 
sstablerepairedset] to mark all SSTables as repaired
+before enabling incremental repair scheduling using Auto Repair. This will 
reduce the burden of initially running
+incremental repair because all existing data will be considered as repaired, 
so subsequent incremental repairs will
+only run against new data.
+
+If you do have strong data consistency requirements, then one must treat all 
data as initially unrepaired and run
+incremental repair against it.  Consult
+xref:#incremental-repair-defaults[RepairTokenRangeSplitter's Incremental 
repair defaults].
+
+In particular one should be mindful of the 
xref:managing/operating/compaction/overview.adoc[compaction strategy]
+you use for your tables and how it might impact incremental repair before 
running incremental repair for the first
+time:
+
+- *Large SSTables*: When using 
xref:managing/operating/compaction/stcs.adoc[SizeTieredCompactionStrategy] or 
any
+  compaction strategy which can create large SSTables including many 
partitions the amount of
+  
xref:managing/operating/compaction/overview.adoc#types-of-compaction[anticompaction]
 that might be required could be
+  excessive. Using a small `bytes_per_assignment` might contribute to repeated 
anticompactions over the same
+  unrepaired data.
+- *Partitions overlapping many SSTables*: If partitions overlap between many 
SSTables, the amount of SSTables included
+  in a repair might be large.  Therefore it is important to consider that many 
SSTables may be included in a repair
+  session and must all be anticompacted. 
xref:managing/operating/compaction/lcs.adoc[LeveledCompactionStrategy] is less
+  susceptible to this issue as it prevents overlapping of partitions within 
levels outside of L0, but if SSTables
+  start accumulating in L0 between incremental repairs, the cost of 
anticompaction will increase.
+  xref:managing/operating/compaction/ucs#sharding[UnifiedCompactionStrategy's 
sharding] can also be used to avoid
+  partitions overlapping SSTables.
+
+The xref:#repair-token-range-splitter[token_range_splitter] configuration for 
incremental repair includes a default
+configuration that attempts to conservatively migrate 100GiB of compressed 
data every day per node. Depending on
+requirements, data set and capability of a cluster's hardware, one may 
consider tuning these values to be more
+aggressive or conservative.
+
+=== Previewing Repaired Data
+
+The `preview_repaired` repair type executes repairs over the repaired data set 
to detect possible data inconsistencies.
+
+Inconsistencies in the repaired data set should not happen in practice and 
could indicate a possible bug in incremental
+repair.
+
+Running preview repairs is useful when considering using the
+xref:cassandra:managing/operating/compaction/tombstones.adoc#deletion[only_purge_repaired_tombstones]
 table compaction
+option to prevent data from possibly being resurrected when inconsistent 
replicas are missing tombstones from deletes.
+
+When enabled, the `BytesPreviewedDesynchronized` and 
`TokenRangesPreviewedDesynchronized`
+xref:cassandra:managing/operating/metrics.adoc#table-metrics[table metrics] 
can be used to detect inconsistencies in the
+repaired data set.
+
+== Configuring Auto Repair in cassandra.yaml
+
+Configuration for Auto Repair is managed in the `cassandra.yaml` file by the 
`auto_repair` property.

Review Comment:
   Addressed on bbb1258ea9e38fd1e0940798aeb348293400b278



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to