David Capwell created CASSANDRA-21410:
-----------------------------------------

             Summary: ShardDurability.markDefunct() called O(N²) times across 
topology updates, causing log spam and OOM in tests
                 Key: CASSANDRA-21410
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21410
             Project: Apache Cassandra
          Issue Type: Bug
          Components: Accord
            Reporter: David Capwell
            Assignee: David Capwell


ShardDurability.updateTopology() has a bug where defunct schedulers accumulate 
in the shardSchedulers map and are re-marked defunct on every subsequent 
topology change, producing O(N²) log messages.

The issue is in updateTopology():


{code}
shardSchedulers.putAll(prev);            // puts defunct schedulers back into the map
prev.forEach((r, s) -> s.markDefunct()); // marks them defunct (again)
{code}

When a topology change removes a shard range, its scheduler is marked defunct 
but kept in shardSchedulers (via putAll) so it can finish in-flight work before 
self-removing. However, on the next topology change, these already-defunct 
schedulers are copied into the new prev map, survive the removal loop (their 
range doesn't exist in the new topology), and get markDefunct() called again. 
Every subsequent topology change re-processes all previously-defunct schedulers 
that haven't yet self-removed.
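
The accumulation described above can be reproduced with a small toy model. 
This is a sketch only: ToyScheduler, ToyShardDurability, the string range 
keys, and the skipAlreadyDefunct flag are hypothetical stand-ins, and the 
shape of updateTopology() is inferred from the description in this ticket 
rather than copied from the Accord source. The guard shown is one possible way 
to avoid re-marking, not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical stand-ins, not the Accord classes.
class ToyScheduler {
    boolean defunct;
    int markDefunctCalls;

    void markDefunct() {
        markDefunctCalls++; // stands in for one log line per call
        defunct = true;
    }
}

class ToyShardDurability {
    final Map<String, ToyScheduler> shardSchedulers = new HashMap<>();
    final List<ToyScheduler> everCreated = new ArrayList<>();
    final boolean skipAlreadyDefunct; // hypothetical guard, off = behaviour described in the ticket

    ToyShardDurability(boolean skipAlreadyDefunct) {
        this.skipAlreadyDefunct = skipAlreadyDefunct;
    }

    void updateTopology(Set<String> ranges) {
        Map<String, ToyScheduler> prev = new HashMap<>(shardSchedulers);
        shardSchedulers.clear();
        for (String range : ranges) {
            // Removal loop: schedulers whose range is still live leave prev.
            ToyScheduler s = prev.remove(range);
            if (s == null) {
                s = new ToyScheduler();
                everCreated.add(s);
            }
            shardSchedulers.put(range, s);
        }
        // Retired schedulers stay in the map until they self-remove
        // (self-removal is never modelled here, which is the worst case).
        shardSchedulers.putAll(prev);
        prev.forEach((r, s) -> {
            if (skipAlreadyDefunct && s.defunct) return; // guard: mark each scheduler once
            s.markDefunct();
        });
    }

    int totalMarkDefunctCalls() {
        int total = 0;
        for (ToyScheduler s : everCreated) total += s.markDefunctCalls;
        return total;
    }

    // Each change replaces the single live range with a fresh one.
    static int simulate(int changes, boolean skipAlreadyDefunct) {
        ToyShardDurability d = new ToyShardDurability(skipAlreadyDefunct);
        d.updateTopology(Set.of("range-0"));
        for (int i = 1; i <= changes; i++) d.updateTopology(Set.of("range-" + i));
        return d.totalMarkDefunctCalls();
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + simulate(3, false)); // 1 + 2 + 3 = 6
        System.out.println("guarded:   " + simulate(3, true));  // 1 + 1 + 1 = 3
    }
}
```

In this model the unguarded variant grows quadratically with the number of 
changes, while the guarded variant marks each retired scheduler exactly once.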

In the worst case, where each of N topology changes retires one scheduler and 
none has self-removed in the meantime, markDefunct() is called 
1 + 2 + 3 + ... + N = N*(N+1)/2 times in total.
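
As a quick check of that formula, the sketch below compares the closed form 
with a direct sum. The class name MarkDefunctCount is illustrative, and the 
model assumes exactly one scheduler is retired per topology change and that 
none self-remove. The real workload need not match this assumption, so the 
result is an order-of-magnitude guide rather than a prediction of the observed 
log volume.

```java
// Illustrative helper, not part of Cassandra or Accord.
class MarkDefunctCount {
    // Sum 1 + 2 + ... + n by direct iteration.
    static long byLoop(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    // Closed form n * (n + 1) / 2; widened to long to avoid int overflow.
    static long closedForm(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        int n = 360; // the 24 x 15 test iterations mentioned in this ticket
        System.out.println(byLoop(n));     // 64980
        System.out.println(closedForm(n)); // 64980
    }
}
```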

This was observed in CI running ShortReadProtectionTest, which is parameterized 
with 24 combinations x 15 test methods = 360 iterations, each creating a new 
table (and thus a new topology epoch). With 
accord.shard_durability_target_splits=4, ShardDurability.java:173 produced 
173,534 INFO-level log lines across an 11-minute test run. The JUnit test 
formatter buffers all stdout in a ByteArrayOutputStream with no size cap, and 
the accumulated ~155 MiB of log output exhausted the 1G test JVM heap, causing 
an OOM.

This ticket / patch was generated by Opus 4.6



