Alex, you're absolutely right that this isn’t a correctness issue—the system will eventually re-prepare the statement. The problem, however, shows up in real production environments under high QPS.
When a node is serving a heavy workload, the race condition described in the ticket causes repeated evictions followed by repeated re-prepare attempts. Instead of a single re-prepare, we see a *storm* of re-prepare requests hitting the coordinator. This quickly becomes expensive: it increases CPU usage, adds latency, and in our case escalated into a cluster-wide performance degradation. We actually experienced an outage triggered by this behavior. So while correctness is preserved, the operational impact is severe. Preventing the unnecessary eviction avoids the re-prepare storm entirely, which is why we believe this patch is important for stability in real clusters.

On Mon, Dec 15, 2025 at 8:00 AM Paulo Motta <[email protected]> wrote:

> I wanted to note I recently faced the issue described in this ticket in a
> real cluster. I'm not familiar with this area to understand if there any
> negative implications of this patch.
>
> So even if it's not a correctness issue per se, but fixes a practical
> issue faced by users without negative consequences I don't see why this
> should not be accepted, specially since it has been validated in production.
>
> On Mon, 15 Dec 2025 at 07:28 Alex Petrov <[email protected]> wrote:
>
>> iirc I reviewed it and mentioned this is not a correctness issue since we
>> would simply re-prepare. I can't recall why we needed to evict, but I think
>> this was for correctness reasons.
>>
>> Would you mind to elaborate why simply letting it to get re-prepared is
>> harmful behavior? Or am I missing something and this has larger
>> implications?
>>
>> To be clear, I am not opposed to this patch, just want to understand
>> implications better.
>>
>> On Sun, Dec 14, 2025, at 9:03 PM, Jaydeep Chovatia wrote:
>>
>> Hi
>>
>> I had reported this bug (CASSANDRA-17401
>> <https://issues.apache.org/jira/browse/CASSANDRA-17401>) in 2022 along
>> with the fix (PR#3059 <https://github.com/apache/cassandra/pull/3059>)
>> and a reproducible (PR#3058
>> <https://github.com/apache/cassandra/pull/3058>). I already applied this
>> fix internally, and it has been working fine for many years. Now we can see
>> one of the Cassandra users has been facing the exact same problem. I have
>> told them to go with the private fix for now.
>> Paulo and Alex had reviewed it partially, could you (or someone) please
>> complete the review so I can land to the official repo.
>>
>> Jaydeep
>>
