[ https://issues.apache.org/jira/browse/CASSANDRA-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815259#comment-17815259 ]
Andres de la Peña edited comment on CASSANDRA-19336 at 2/7/24 11:59 AM:
------------------------------------------------------------------------

The CI results above look good to me:
* {{MemtableSizeTest.testTruncationReleasesLogSpace}} in 4.0 is CASSANDRA-17298.
* {{RepairJobTest.testNoTreesRetainedAfterDifference}} in 4.0 is CASSANDRA-17884.
* {{JMXFeatureTest.testOneNetworkInterfaceProvisioning}} in 4.1 is CASSANDRA-19261.
* {{MemtableSizeTest.testSize[skiplist]}} in 4.1 is CASSANDRA-17298.
* {{VectorUpdateDeleteTest.updateTest}} in 5.0 and trunk is CASSANDRA-19168.
* {{CompactionStrategyManagerTest.testAutomaticUpgradeConcurrency}} in trunk is unreported, but it can be [reproduced on the base branch|https://app.circleci.com/pipelines/github/adelapena/cassandra/3427/workflows/126307ce-d465-4736-a71e-82e0b8749598/jobs/103861/tests]. I've created CASSANDRA-19376 for it.
* {{HarrySimulatorTest}} in trunk is CASSANDRA-19279.
* {{InJVMTokenAwareExecutorTest.testRepair}} in trunk is a timeout seen on [Butler|https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-trunk/trunk].
* {{ConcurrentQuiescentCheckerIntegrationTest.testConcurrentReadWriteWorkload}} in trunk is on [Butler|https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-trunk/trunk].
* {{NativeTransportEncryptionOptionsTest.optionalTlsConnectionAllowedToRegularPortTest}} in trunk is CASSANDRA-19239.
* {{NativeTransportEncryptionOptionsTest.testOptionalMtlsModeDoNotAllowNonSSLConnections}} in trunk is CASSANDRA-19239.
* {{ConsistentBootstrapTest.coordinatorIsBehindTest}} in trunk is CASSANDRA-19343.

was (Author: adelapena):
The CI results above look good to me:
* {{MemtableSizeTest.testTruncationReleasesLogSpace}} in 4.0 is CASSANDRA-17298
* {{RepairJobTest.testNoTreesRetainedAfterDifference}} in 4.0 is CASSANDRA-17884
* {{JMXFeatureTest.testOneNetworkInterfaceProvisioning}} in 4.1 is CASSANDRA-19261
* {{MemtableSizeTest.testSize[skiplist]}} in 4.1 is CASSANDRA-17298
* {{VectorUpdateDeleteTest.updateTest}} in 5.0 and trunk is CASSANDRA-19168
* {{CompactionStrategyManagerTest.testAutomaticUpgradeConcurrency}} in trunk is unreported, but it can be [reproduced on the base branch|https://app.circleci.com/pipelines/github/adelapena/cassandra/3427/workflows/126307ce-d465-4736-a71e-82e0b8749598/jobs/103861/tests].
* {{HarrySimulatorTest}} in trunk is CASSANDRA-19279
* {{InJVMTokenAwareExecutorTest.testRepair}} in trunk is a timeout seen on [Butler|https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-trunk/trunk]
* {{ConcurrentQuiescentCheckerIntegrationTest.testConcurrentReadWriteWorkload}} in trunk is on [Butler|https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-trunk/trunk]
* {{NativeTransportEncryptionOptionsTest.optionalTlsConnectionAllowedToRegularPortTest}} in trunk is CASSANDRA-19239
* {{NativeTransportEncryptionOptionsTest.testOptionalMtlsModeDoNotAllowNonSSLConnections}} in trunk is CASSANDRA-19239
* {{ConsistentBootstrapTest.coordinatorIsBehindTest}} in trunk is CASSANDRA-19343.

> Repair causes out of memory
> ---------------------------
>
> Key: CASSANDRA-19336
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19336
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Andres de la Peña
> Assignee: Andres de la Peña
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> CASSANDRA-14096 introduced {{repair_session_space}} as a limit for the memory
> usage for Merkle tree calculations during repairs.
> This limit is applied to the set of Merkle trees built for a received
> validation request ({{VALIDATION_REQ}}), divided by the replication factor so
> as not to overwhelm the repair coordinator, which will have requested RF sets
> of Merkle trees. That way the repair coordinator should only use
> {{repair_session_space}} for the RF Merkle trees.
> However, a repair session without {{-pr}}/{{--partitioner-range}}
> will send RF*RF validation requests, because the repair coordinator node has
> RF-1 replicas and is also the replica of RF-1 nodes. Since all the requests
> are sent at the same time, at some point the repair coordinator can have up
> to RF*{{repair_session_space}} worth of Merkle trees if none of the
> validation responses is fully processed before the last response arrives.
> Even worse, if the cluster uses virtual nodes, many nodes can be replicas of
> the repair coordinator, and some nodes can be replicas of multiple token
> ranges. This means that the repair coordinator can send more than RF or
> RF*RF simultaneous validation requests.
> For example, in an 11-node cluster with RF=3 and 256 tokens, we have seen a
> repair session involving 44 groups of ranges to be repaired. This produces
> 44*3=132 validation requests contacting all the nodes in the cluster. When
> the responses for all these requests start to arrive at the coordinator, each
> containing up to {{repair_session_space}}/3 of Merkle trees, they accumulate
> faster than they are consumed, greatly exceeding {{repair_session_space}}
> and OOMing the node.
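
The arithmetic in the description can be checked with a small sketch. It is not Cassandra code: the class and method names are hypothetical, and the 300 MiB value for {{repair_session_space}} is an assumed figure chosen only because it divides evenly by RF=3. It computes the upper bound on Merkle tree memory held by the coordinator if every validation response arrives before any of them is consumed.

```java
// Back-of-the-envelope estimate of Merkle tree memory on a repair coordinator.
// All names and the 300 MiB figure are illustrative assumptions, not Cassandra code.
final class RepairMemoryEstimate {

    /** Limit applied to the trees built for one validation request: repair_session_space / RF. */
    static long perRequestLimit(long repairSessionSpace, int rf) {
        return repairSessionSpace / rf;
    }

    /** Upper bound if every response is held in memory before any of them is consumed. */
    static long worstCase(long repairSessionSpace, int rf, int validationRequests) {
        return perRequestLimit(repairSessionSpace, rf) * validationRequests;
    }

    public static void main(String[] args) {
        long spaceMiB = 300; // hypothetical repair_session_space, in MiB
        int rf = 3;

        // Intended case: the coordinator requests RF sets of Merkle trees.
        System.out.println("RF requests:    " + worstCase(spaceMiB, rf, rf) + " MiB");

        // Repair without -pr: RF*RF validation requests sent at the same time.
        System.out.println("RF*RF requests: " + worstCase(spaceMiB, rf, rf * rf) + " MiB");

        // Reported vnode case: 44 groups of ranges, 44*3 = 132 validation requests.
        System.out.println("132 requests:   " + worstCase(spaceMiB, rf, 44 * rf) + " MiB");
    }
}
```

Under these assumptions the bound equals {{repair_session_space}} for RF requests, RF times that for RF*RF requests, and 44 times that for the 132-request case, which matches the growth described in the report.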