[jira] [Comment Edited] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829743#comment-17829743 ] Manish Khandelwal edited comment on CASSANDRA-18762 at 3/22/24 1:36 PM: I think the reason for the OOM here is the same as the one described in https://issues.apache.org/jira/browse/CASSANDRA-19336. I applied the patch for https://issues.apache.org/jira/browse/CASSANDRA-19336 and all full repairs with -pr on the keyspace were successful. Without this patch, a single repair triggered almost 240 sessions (vnodes: 256, 11*11 cluster), resulting in 240*6 Merkle tree requests for one table. For a keyspace with 3 tables that number was an astonishing 240*6*3, exhausting direct buffer memory within a minute of running. After applying the patch, repairs ran without issue and with no memory pressure.
> Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair > Reporter: Brad Schoening > Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, > image-2023-12-06-15-58-55-007.png > > > We are seeing repeated failures of nodes with 16GB of heap on a VM with 32GB > of physical RAM due to direct memory. This seems to be related to > CASSANDRA-15202 which moved Merkle trees off-heap in 4.0. Using Cassandra > 4.0.6 with Java 11. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms.
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834){noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+ParallelRefProcEnabled > -XX
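The fan-out arithmetic in the comment above can be sketched in a few lines. This is an illustrative back-of-envelope model, not Cassandra code; the `repair_session_space` value and RF are assumed figures, while the session and request counts come from the comment (vnodes=256, ~240 sessions, 6 Merkle tree requests per session, 3 tables).

```python
# Illustrative fan-out model for the repair described above.
sessions = 240                       # sessions seen in one repair run
requests_per_session_per_table = 6   # Merkle tree requests per session, per table
tables = 3                           # tables in the keyspace

requests_one_table = sessions * requests_per_session_per_table  # 1440
requests_keyspace = requests_one_table * tables                 # 4320

# If each validation response can carry up to repair_session_space / RF
# of off-heap Merkle trees (values below are assumptions), responses that
# arrive faster than they are consumed exhaust direct memory quickly.
repair_session_space_mib = 256       # hypothetical setting
rf = 3
worst_case_mib = requests_keyspace * repair_session_space_mib / rf

print(requests_one_table, requests_keyspace, worst_case_mib)
```

Even if only a fraction of those responses are in flight at once, the worst case dwarfs any plausible `MaxDirectMemorySize`, which matches the OOM appearing within a minute.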
[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820725#comment-17820725 ] Manish Khandelwal commented on CASSANDRA-18762: --- [~bschoeni] were vnodes enabled on the 4-DC cluster when you ran the parallel repair and hit the direct buffer OOM? Also, how many vnodes (num_tokens) were configured?
[jira] [Commented] (CASSANDRA-19336) Repair causes out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819427#comment-17819427 ] Manish Khandelwal commented on CASSANDRA-19336: --- Fix Version should be 4.0.13 and probably 4.1.5. > Repair causes out of memory > --- > > Key: CASSANDRA-19336 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19336 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair > Reporter: Andres de la Peña > Assignee: Andres de la Peña > Priority: Normal > Fix For: 4.0.12, 4.1.4, 5.0-beta2, 5.1 > > Time Spent: 40m > Remaining Estimate: 0h > > CASSANDRA-14096 introduced {{repair_session_space}} as a limit for the memory > usage for Merkle tree calculations during repairs. This limit is applied to > the set of Merkle trees built for a received validation request > ({{VALIDATION_REQ}}), divided by the replication factor so as not to > overwhelm the repair coordinator, which will have requested RF sets of Merkle > trees. That way the repair coordinator should only use > {{repair_session_space}} for the RF Merkle trees. > However, a repair session without {{-pr}}/{{--partitioner-range}} > will send RF*RF validation requests, because the repair coordinator node has > RF-1 replicas and is also a replica of RF-1 nodes. Since all the requests > are sent at the same time, at some point the repair coordinator can have up > to RF*{{repair_session_space}} worth of Merkle trees if none of the > validation responses is fully processed before the last response arrives. > Even worse, if the cluster uses virtual nodes, many nodes can be replicas of > the repair coordinator, and some nodes can be replicas of multiple token > ranges. It would mean that the repair coordinator can send more than RF or > RF*RF simultaneous validation requests. > For example, in an 11-node cluster with RF=3 and 256 tokens, we have seen a > repair session involving 44 groups of ranges to be repaired.
This produces > 44*3=132 validation requests contacting all the nodes in the cluster. When > the responses for all these requests start to arrive to the coordinator, each > containing up to {{repair_session_space}}/3 of Merkle trees, they accumulate > quicker than they are consumed, greatly exceeding {{repair_session_space}} > and OOMing the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
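The accumulation argument in the ticket description above can be made concrete with a small sketch. The figures (RF=3, 44 range groups) are taken from the description; `repair_session_space` is normalised to 1.0 for illustration.

```python
# Model of the CASSANDRA-19336 accumulation described above.
rf = 3
range_groups = 44                     # groups of ranges in one repair session
validation_requests = range_groups * rf   # 44*3 = 132 requests to the cluster

# Each validation response may hold up to repair_session_space / RF of
# Merkle trees (the per-request budget the limit was designed around).
repair_session_space = 1.0            # normalised units
per_response = repair_session_space / rf

# If responses arrive faster than the coordinator consumes them, the
# outstanding total can reach many multiples of the configured limit:
peak_if_all_outstanding = validation_requests * per_response

print(validation_requests, peak_if_all_outstanding)
```

So even though each individual response respects the `repair_session_space / RF` budget, up to 44 times the configured limit can be resident at once in this example, which is the OOM mechanism the ticket fixes.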
[jira] [Comment Edited] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817832#comment-17817832 ] Manish Khandelwal edited comment on CASSANDRA-18762 at 2/16/24 10:34 AM: - Another update: setting -XX:MaxDirectMemorySize to a higher value of 10G (more than the 8G heap) resulted in repairs running successfully on multiple nodes, but failures still happened on some nodes. Will evaluate CASSANDRA-19336, but its description mentions repairs run without -pr, which is why I ignored it at first, as we are using full repairs with the -pr option. *Update*: After raising -XX:MaxDirectMemorySize to 12G, repairs were successful on all the nodes.
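As a workaround of the kind described in the comment above, the direct memory cap can be raised in the JVM options file. This is a sketch, not a recommendation from the ticket: the file name matches the Cassandra 4.x layout (conf/jvm-server.options, or the version-specific jvm8-server.options/jvm11-server.options), and the 8G/12G values mirror the figures reported in the comment. Note the flag takes a single leading dash, not the "--XX" written in the original comment.

```
# conf/jvm-server.options (Cassandra 4.x; exact file may vary by packaging)
-Xms8G
-Xmx8G
-XX:MaxDirectMemorySize=12G
```

Raising the cap only buys headroom; the underlying over-accumulation of Merkle trees is what CASSANDRA-19336 addresses.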
[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817590#comment-17817590 ] Manish Khandelwal commented on CASSANDRA-18762: --- We are also getting the same issue on a multi-DC setup. In a single DC things run fine for 11 nodes, but once another DC is added repairs start to fail pretty quickly, with the same error mentioned in this issue. Running repairs table by table succeeds most of the time, but a keyspace-level repair always fails for one of the keyspaces. This keyspace has three tables, all STCS, with one table having almost no data. Tried setting *-XX:MaxDirectMemorySize* but the results are the same, i.e., out of memory. We are on Java 8 and Cassandra 4.0.10. I think this should be easy to reproduce with multi-DC.
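The comment above observes that per-table repairs succeed where a whole-keyspace repair OOMs, since each invocation builds Merkle trees for only one table at a time. A simple way to drive that is to issue one nodetool invocation per table; the helper and the keyspace/table names below are hypothetical, while `-full` and `-pr` are real nodetool repair flags.

```python
# Build one full, primary-range repair command per table, so Merkle-tree
# memory is bounded by a single table's validation at a time.
import shlex

def per_table_repair_commands(keyspace, tables):
    """Return a nodetool command string per table (hypothetical helper)."""
    return [
        f"nodetool repair -full -pr {shlex.quote(keyspace)} {shlex.quote(t)}"
        for t in tables
    ]

cmds = per_table_repair_commands("my_keyspace", ["table_a", "table_b", "table_c"])
for c in cmds:
    print(c)
```

In practice these would be run sequentially (e.g. via subprocess or a shell loop) so that validations for different tables never overlap on the coordinator.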
[jira] (CASSANDRA-8272) 2ndary indexes can return stale data
[ https://issues.apache.org/jira/browse/CASSANDRA-8272 ] Manish Khandelwal deleted comment on CASSANDRA-8272: -- was (Author: manmagic3): Hi, reading the comments on this ticket indicates a performance hit for people using queries with "ALLOW FILTERING" with CL above ONE or LOCAL_ONE. But I see this ticket was fixed in Cassandra 3.11.7. When I upgraded to Cassandra 3.11.13 I did not see the dip in performance, but after upgrading to Cassandra 4.0.x I am seeing test cases using "ALLOW FILTERING" with CL LOCAL_QUORUM taking a hit. Is there something which only went into Cassandra 4.0.x and not into 3.11.7+? > 2ndary indexes can return stale data > > > Key: CASSANDRA-8272 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8272 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index > Reporter: Sylvain Lebresne > Assignee: Andres de la Peña > Priority: Normal > Labels: pull-request-available > Fix For: 3.0.21, 3.11.7, 4.0-beta1, 4.0 > > Time Spent: 7h > Remaining Estimate: 0h > > When replicas return 2ndary index results, it's possible for a single replica > to return a stale result, and that result will be sent back to the user, > potentially failing the CL contract. > For instance, consider 3 replicas A, B and C, and the following situation: > {noformat} > CREATE TABLE test (k int PRIMARY KEY, v text); > CREATE INDEX ON test(v); > INSERT INTO test(k, v) VALUES (0, 'foo'); > {noformat} > with every replica up to date. Now, suppose that the following queries are > done at {{QUORUM}}: > {noformat} > UPDATE test SET v = 'bar' WHERE k = 0; > SELECT * FROM test WHERE v = 'foo'; > {noformat} > then, if A and B acknowledge the update but C responds to the read before > having applied the update, the now-stale result will be returned (since > C will return it and A or B will return nothing).
> A potential solution would be that when we read a tombstone in the index (and > provided we make the index inherit the gcGrace of its parent CF), instead of > skipping that tombstone, we'd insert in the result a corresponding range > tombstone. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-8272) 2ndary indexes can return stale data
[ https://issues.apache.org/jira/browse/CASSANDRA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751279#comment-17751279 ] Manish Khandelwal commented on CASSANDRA-8272: -- Hi, reading the comments on this ticket indicates a performance hit for people using queries with "ALLOW FILTERING" at a CL above ONE or LOCAL_ONE. But I see this ticket was fixed in Cassandra 3.11.7. When I upgraded to Cassandra 3.11.13 I did not see the dip in performance, but after upgrading to Cassandra 4.0.x I am seeing test cases using "ALLOW FILTERING" with CL LOCAL_QUORUM taking a hit. Is there something that went only into Cassandra 4.0.x and not into 3.11.7+? > 2ndary indexes can return stale data > > > Key: CASSANDRA-8272 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8272 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index >Reporter: Sylvain Lebresne >Assignee: Andres de la Peña >Priority: Normal > Labels: pull-request-available > Fix For: 3.0.21, 3.11.7, 4.0-beta1, 4.0 > > Time Spent: 7h > Remaining Estimate: 0h > > When replicas return 2ndary index results, it's possible for a single replica > to return a stale result and that result will be sent back to the user, > potentially failing the CL contract. > For instance, consider 3 replicas A, B and C, and the following situation: > {noformat} > CREATE TABLE test (k int PRIMARY KEY, v text); > CREATE INDEX ON test(v); > INSERT INTO test(k, v) VALUES (0, 'foo'); > {noformat} > with every replica up to date. Now, suppose that the following queries are > done at {{QUORUM}}: > {noformat} > UPDATE test SET v = 'bar' WHERE k = 0; > SELECT * FROM test WHERE v = 'foo'; > {noformat} > then, if A and B acknowledge the insert but C responds to the read before > having applied the insert, then the now-stale result will be returned (since > C will return it and A or B will return nothing). 
> A potential solution would be that when we read a tombstone in the index (and > provided we make the index inherit the gcGrace of its parent CF), instead of > skipping that tombstone, we'd insert in the result a corresponding range > tombstone. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
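The race described in the quoted ticket can be reduced to a few lines. The sketch below is a hypothetical simulation in plain Python (not Cassandra code; the replica layout and function names are invented for illustration) of a secondary-index read at QUORUM: replicas A and B have applied the UPDATE, C has not, and because the coordinator unions per-replica index hits rather than reconciling "row absent" responses, C's stale row reaches the client.

```python
# Replica state after UPDATE test SET v = 'bar' WHERE k = 0 at QUORUM:
# A and B applied it; C responds to the read before applying it.
replicas = {
    "A": {0: "bar"},
    "B": {0: "bar"},
    "C": {0: "foo"},  # stale
}

def index_read(v, contacted):
    """Each contacted replica returns the keys its local index maps to v;
    the coordinator unions the results, so one stale hit survives."""
    result = set()
    for name in contacted:
        for k, value in replicas[name].items():
            if value == v:
                result.add(k)
    return result

# A QUORUM read for v = 'foo' that happens to contact A and C: A returns
# nothing, C returns the stale row, and the union surfaces it to the client.
print(index_read("foo", ["A", "C"]))  # {0} despite QUORUM being satisfied
print(index_read("foo", ["A", "B"]))  # set(): both contacted replicas are fresh
```

The point of the sketch is that nothing in the union step lets A's "no match" outvote C's stale match, which is exactly the CL violation the ticket describes.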
[jira] [Comment Edited] (CASSANDRA-16246) Unexpected warning "Ignoring Unrecognized strategy option" for NetworkTopologyStrategy when restarting
[ https://issues.apache.org/jira/browse/CASSANDRA-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746090#comment-17746090 ] Manish Khandelwal edited comment on CASSANDRA-16246 at 7/24/23 6:49 AM: I am seeing these warnings on the nodes of my DC1, about DC2, on Cassandra 4.0.10. What could be the reason? DC2 exists in our case. Nodes in DC2 are not showing such messages. 2023-07-23 17:03:09,706 AbstractReplicationStrategy.java:337 - Ignoring Unrecognized strategy option \{DC2} passed to NetworkTopologyStrategy for keyspace keyspace_name. Identified this issue when upgrading from 3.11.13 to Cassandra 4.0.10. Also saw similar logs when upgrading from Cassandra 3.11.13 to Cassandra 4.0.5. was (Author: manmagic3): I am seeing these warnings on the nodes of my DC1, about DC2, on Cassandra 4.0.10. What could be the reason? DC2 exists in our case. Nodes in DC2 are not showing such messages. 2023-07-23 17:03:09,706 AbstractReplicationStrategy.java:337 - Ignoring Unrecognized strategy option \{DC2} passed to NetworkTopologyStrategy for keyspace keyspace_name > Unexpected warning "Ignoring Unrecognized strategy option" for > NetworkTopologyStrategy when restarting > -- > > Key: CASSANDRA-16246 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16246 > Project: Cassandra > Issue Type: Bug > Components: Observability/Logging >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: 4.0-beta4, 4.0 > > Time Spent: 20m > Remaining Estimate: 0h > > During restart, a bunch of warning messages like > "AbstractReplicationStrategy.java:364 - Ignoring Unrecognized strategy option > {datacenter2} passed to NetworkTopologyStrategy for keyspace > distributed_test_keyspace" are logged. > The warnings are not expected since the mentioned DC exists. > It seems to be caused by an improper ordering during startup, so that when > opening keyspaces the node is not yet aware of the DCs. > The warning can be reproduced using the test below. 
> {code:java} > @Test > public void testEmitsWarningsForNetworkTopologyStategyConfigOnRestart() > throws Exception { > int nodesPerDc = 2; > try (Cluster cluster = builder().withConfig(c -> c.with(GOSSIP, NETWORK)) > .withRacks(2, 1, nodesPerDc) > .start()) { > cluster.schemaChange("CREATE KEYSPACE " + KEYSPACE + > " WITH replication = {'class': > 'NetworkTopologyStrategy', " + > "'datacenter1' : " + nodesPerDc + ", > 'datacenter2' : " + nodesPerDc + " };"); > cluster.get(2).nodetool("flush"); > System.out.println("Stop node 2 in datacenter 1"); > cluster.get(2).shutdown().get(); > System.out.println("Start node 2 in datacenter 1"); > cluster.get(2).startup(); > List<String> result = cluster.get(2).logs().grep("Ignoring > Unrecognized strategy option \\{datacenter2\\}").getResult(); > Assert.assertFalse(result.isEmpty()); > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16246) Unexpected warning "Ignoring Unrecognized strategy option" for NetworkTopologyStrategy when restarting
[ https://issues.apache.org/jira/browse/CASSANDRA-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746090#comment-17746090 ] Manish Khandelwal edited comment on CASSANDRA-16246 at 7/23/23 3:40 PM: I am seeing these warnings on the nodes of my DC1, about DC2, on Cassandra 4.0.10. What could be the reason? DC2 exists in our case. Nodes in DC2 are not showing such messages. 2023-07-23 17:03:09,706 AbstractReplicationStrategy.java:337 - Ignoring Unrecognized strategy option \{DC2} passed to NetworkTopologyStrategy for keyspace keyspace_name was (Author: manmagic3): I am seeing these warnings on the nodes of my DC1, about DC2, on Cassandra 4.0.10. What could be the reason? DC2 exists in our case. > Unexpected warning "Ignoring Unrecognized strategy option" for > NetworkTopologyStrategy when restarting > -- > > Key: CASSANDRA-16246 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16246 > Project: Cassandra > Issue Type: Bug > Components: Observability/Logging >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: 4.0-beta4, 4.0 > > Time Spent: 20m > Remaining Estimate: 0h > > During restart, a bunch of warning messages like > "AbstractReplicationStrategy.java:364 - Ignoring Unrecognized strategy option > {datacenter2} passed to NetworkTopologyStrategy for keyspace > distributed_test_keyspace" are logged. > The warnings are not expected since the mentioned DC exists. > It seems to be caused by an improper ordering during startup, so that when > opening keyspaces the node is not yet aware of the DCs. > The warning can be reproduced using the test below. 
> {code:java} > @Test > public void testEmitsWarningsForNetworkTopologyStategyConfigOnRestart() > throws Exception { > int nodesPerDc = 2; > try (Cluster cluster = builder().withConfig(c -> c.with(GOSSIP, NETWORK)) > .withRacks(2, 1, nodesPerDc) > .start()) { > cluster.schemaChange("CREATE KEYSPACE " + KEYSPACE + > " WITH replication = {'class': > 'NetworkTopologyStrategy', " + > "'datacenter1' : " + nodesPerDc + ", > 'datacenter2' : " + nodesPerDc + " };"); > cluster.get(2).nodetool("flush"); > System.out.println("Stop node 2 in datacenter 1"); > cluster.get(2).shutdown().get(); > System.out.println("Start node 2 in datacenter 1"); > cluster.get(2).startup(); > List<String> result = cluster.get(2).logs().grep("Ignoring > Unrecognized strategy option \\{datacenter2\\}").getResult(); > Assert.assertFalse(result.isEmpty()); > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16246) Unexpected warning "Ignoring Unrecognized strategy option" for NetworkTopologyStrategy when restarting
[ https://issues.apache.org/jira/browse/CASSANDRA-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746090#comment-17746090 ] Manish Khandelwal commented on CASSANDRA-16246: --- I am seeing these warnings on the nodes of my DC1, about DC2, on Cassandra 4.0.10. What could be the reason? DC2 exists in our case. > Unexpected warning "Ignoring Unrecognized strategy option" for > NetworkTopologyStrategy when restarting > -- > > Key: CASSANDRA-16246 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16246 > Project: Cassandra > Issue Type: Bug > Components: Observability/Logging >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: 4.0-beta4, 4.0 > > Time Spent: 20m > Remaining Estimate: 0h > > During restart, a bunch of warning messages like > "AbstractReplicationStrategy.java:364 - Ignoring Unrecognized strategy option > {datacenter2} passed to NetworkTopologyStrategy for keyspace > distributed_test_keyspace" are logged. > The warnings are not expected since the mentioned DC exists. > It seems to be caused by an improper ordering during startup, so that when > opening keyspaces the node is not yet aware of the DCs. > The warning can be reproduced using the test below. 
> {code:java} > @Test > public void testEmitsWarningsForNetworkTopologyStategyConfigOnRestart() > throws Exception { > int nodesPerDc = 2; > try (Cluster cluster = builder().withConfig(c -> c.with(GOSSIP, NETWORK)) > .withRacks(2, 1, nodesPerDc) > .start()) { > cluster.schemaChange("CREATE KEYSPACE " + KEYSPACE + > " WITH replication = {'class': > 'NetworkTopologyStrategy', " + > "'datacenter1' : " + nodesPerDc + ", > 'datacenter2' : " + nodesPerDc + " };"); > cluster.get(2).nodetool("flush"); > System.out.println("Stop node 2 in datacenter 1"); > cluster.get(2).shutdown().get(); > System.out.println("Start node 2 in datacenter 1"); > cluster.get(2).startup(); > List<String> result = cluster.get(2).logs().grep("Ignoring > Unrecognized strategy option \\{datacenter2\\}").getResult(); > Assert.assertFalse(result.isEmpty()); > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
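The startup-ordering explanation in the ticket above can be illustrated in a few lines. This is a hypothetical Python sketch, not Cassandra's actual validation code (the function name and data are invented): NetworkTopologyStrategy options are checked against the set of datacenters the node currently knows about, so if keyspaces are opened before the topology is loaded, a perfectly valid DC looks "unrecognized".

```python
# Validate NTS replication options against the currently known datacenters;
# any option naming an unknown DC would trigger the warning.
def validate_nts_options(options, known_dcs):
    return [dc for dc in options if dc not in known_dcs]

options = {"datacenter1": 2, "datacenter2": 2}

# During early startup the node may only know its own DC yet:
print(validate_nts_options(options, {"datacenter1"}))
# Once gossip/topology information is loaded, the same options validate cleanly:
print(validate_nts_options(options, {"datacenter1", "datacenter2"}))
```

The first call reports datacenter2 as unrecognized and the second reports nothing, mirroring how the warning appears on restart and then stops once the node is fully up.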
[jira] (CASSANDRA-14825) Expose table schema for drivers
[ https://issues.apache.org/jira/browse/CASSANDRA-14825 ] Manish Khandelwal deleted comment on CASSANDRA-14825: --- was (Author: manmagic3): Partial support for COMPACT STORAGE was restored via CASSANDRA-16217, so with this change (CASSANDRA-14825) the DESC output for "COMPACT STORAGE" tables gives a warning *Warning: Table keyspace.tablename omitted because it has constructs not compatible with CQL (was created via legacy API).* *Approximate structure, for reference:* *(this should not be used to reproduce this schema)* Is this the right behavior, since Cassandra 4 allows creating tables with COMPACT STORAGE and yet we are not describing the table schema? > Expose table schema for drivers > --- > > Key: CASSANDRA-14825 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14825 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Chris Lohfink >Assignee: Robert Stupp >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-beta1, 4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently the drivers recreate the CQL for the tables by putting together the > system table values. This is very difficult to keep up to date and buggy > enough that it's only even supported in the Java and Python drivers. Cassandra > already has some limited output available for snapshots that we could provide > in a virtual table or new query that the drivers can fetch. This can greatly > reduce the complexity of drivers while also reducing bugs like > CASSANDRA-14822 as the underlying schema and properties change. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14825) Expose table schema for drivers
[ https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610896#comment-17610896 ] Manish Khandelwal edited comment on CASSANDRA-14825 at 9/29/22 8:54 AM: Partial support for COMPACT STORAGE was restored via CASSANDRA-16217, so with this change (CASSANDRA-14825) the DESC output for "COMPACT STORAGE" tables gives a warning *Warning: Table keyspace.tablename omitted because it has constructs not compatible with CQL (was created via legacy API).* *Approximate structure, for reference:* *(this should not be used to reproduce this schema)* Is this the right behavior, since Cassandra 4 allows creating tables with COMPACT STORAGE and yet we are not describing the table schema? was (Author: manmagic3): Partial support for COMPACT STORAGE was restored via https://issues.apache.org/jira/browse/CASSANDRA-16217, so with this change (CASSANDRA-14825) the DESC output for "COMPACT STORAGE" tables gives a warning *Warning: Table keyspace.tablename omitted because it has constructs not compatible with CQL (was created via legacy API).* *Approximate structure, for reference:* *(this should not be used to reproduce this schema)* Is this the right behavior, since Cassandra 4 allows creating tables with COMPACT STORAGE and yet we are not describing the table schema? > Expose table schema for drivers > --- > > Key: CASSANDRA-14825 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14825 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Chris Lohfink >Assignee: Robert Stupp >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-beta1, 4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently the drivers recreate the CQL for the tables by putting together the > system table values. This is very difficult to keep up to date and buggy > enough that it's only even supported in the Java and Python drivers. 
Cassandra > already has some limited output available for snapshots that we could provide > in a virtual table or new query that the drivers can fetch. This can greatly > reduce the complexity of drivers while also reducing bugs like > CASSANDRA-14822 as the underlying schema and properties change. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14825) Expose table schema for drivers
[ https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610896#comment-17610896 ] Manish Khandelwal edited comment on CASSANDRA-14825 at 9/29/22 8:53 AM: Partial support for COMPACT STORAGE was restored via https://issues.apache.org/jira/browse/CASSANDRA-16217, so with this change (CASSANDRA-14825) the DESC output for "COMPACT STORAGE" tables gives a warning *Warning: Table keyspace.tablename omitted because it has constructs not compatible with CQL (was created via legacy API).* *Approximate structure, for reference:* *(this should not be used to reproduce this schema)* Is this the right behavior, since Cassandra 4 allows creating tables with COMPACT STORAGE and yet we are not describing the table schema? was (Author: manmagic3): Since partial support for COMPACT STORAGE was restored via https://issues.apache.org/jira/browse/CASSANDRA-16217, with this change the DESC output for "COMPACT STORAGE" tables gives a warning *Warning: Table keyspace.tablename omitted because it has constructs not compatible with CQL (was created via legacy API).* *Approximate structure, for reference:* *(this should not be used to reproduce this schema)* Is this the right behavior, since Cassandra 4 allows creating tables with COMPACT STORAGE and yet we are not describing the table schema? > Expose table schema for drivers > --- > > Key: CASSANDRA-14825 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14825 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Chris Lohfink >Assignee: Robert Stupp >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-beta1, 4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently the drivers recreate the CQL for the tables by putting together the > system table values. This is very difficult to keep up to date and buggy > enough that it's only even supported in the Java and Python drivers. 
Cassandra > already has some limited output available for snapshots that we could provide > in a virtual table or new query that the drivers can fetch. This can greatly > reduce the complexity of drivers while also reducing bugs like > CASSANDRA-14822 as the underlying schema and properties change. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers
[ https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610896#comment-17610896 ] Manish Khandelwal commented on CASSANDRA-14825: --- Since partial support for COMPACT STORAGE was restored via https://issues.apache.org/jira/browse/CASSANDRA-16217, with this change the DESC output for "COMPACT STORAGE" tables gives a warning *Warning: Table keyspace.tablename omitted because it has constructs not compatible with CQL (was created via legacy API).* *Approximate structure, for reference:* *(this should not be used to reproduce this schema)* Is this the right behavior, since Cassandra 4 allows creating tables with COMPACT STORAGE and yet we are not describing the table schema? > Expose table schema for drivers > --- > > Key: CASSANDRA-14825 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14825 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Chris Lohfink >Assignee: Robert Stupp >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-beta1, 4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently the drivers recreate the CQL for the tables by putting together the > system table values. This is very difficult to keep up to date and buggy > enough that it's only even supported in the Java and Python drivers. Cassandra > already has some limited output available for snapshots that we could provide > in a virtual table or new query that the drivers can fetch. This can greatly > reduce the complexity of drivers while also reducing bugs like > CASSANDRA-14822 as the underlying schema and properties change. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16843) auto-snapshots for dropped tables don't appear in nodetool listsnapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397097#comment-17397097 ] Manish Khandelwal edited comment on CASSANDRA-16843 at 8/11/21, 7:26 AM: - Looks okay to me. listsnapshots should only show snapshots of live tables. As for clearsnapshot, once a table is dropped there is no point in keeping its backup. Is there any use case that requires clearsnapshot not to clear snapshots of dropped tables? For example, if I have dropped a table x, its old snapshot does not appear in listsnapshots, which is correct in my opinion. For clearsnapshot, if old snapshots are getting cleared, I don't find any reason not to clear them (space is freed and old snapshots are not of any use as such). I think the functionality is correct. was (Author: manmagic3): Looks okay to me. listsnapshots should only show snapshots of live tables. As for clearsnapshot, once a table is dropped there is no point in keeping its backup. Is there any use case that requires clearsnapshot not to clear snapshots of dropped tables? For example, if I have dropped a table x, its old snapshot does not appear in listsnapshots, which is correct in my opinion. For clearsnapshot, if old snapshots are getting cleared, I don't find any reason not to clear them (space is freed and old snapshots are not of any use as such) > auto-snapshots for dropped tables don't appear in nodetool listsnapshots > > > Key: CASSANDRA-16843 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16843 > Project: Cassandra > Issue Type: Bug >Reporter: James Brown >Priority: Normal > > Auto snapshots from dropped tables don't seem to show up in {{nodetool > listsnapshots}} (even though they do get cleared by {{nodetool > clearsnapshot}}). This makes them kind of annoying to clean up, since you > need to muck about in the data directory to find them. 
> Erick on the mailing list said that this seems to be an oversight and that > clearsnapshot was fixed by > [CASSANDRA-6418|https://issues.apache.org/jira/browse/CASSANDRA-6418]. > I reproduced this both on 3.11.11 and 4.0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16843) auto-snapshots for dropped tables don't appear in nodetool listsnapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397097#comment-17397097 ] Manish Khandelwal edited comment on CASSANDRA-16843 at 8/11/21, 7:23 AM: - Looks okay to me. listsnapshots should only show snapshots of live tables. As for clearsnapshot, once a table is dropped there is no point in keeping its backup. Is there any use case that requires clearsnapshot not to clear snapshots of dropped tables? For example, if I have dropped a table x, its old snapshot does not appear in listsnapshots, which is correct in my opinion. For clearsnapshot, if old snapshots are getting cleared, I don't find any reason not to clear them (space is freed and old snapshots are not of any use as such) was (Author: manmagic3): Looks okay to me. listsnapshots should only show snapshots of live tables. As for clearsnapshot, once a table is dropped there is no point in keeping its backup. Is there any use case that requires clearsnapshot not to clear snapshots of dropped tables? > auto-snapshots for dropped tables don't appear in nodetool listsnapshots > > > Key: CASSANDRA-16843 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16843 > Project: Cassandra > Issue Type: Bug >Reporter: James Brown >Priority: Normal > > Auto snapshots from dropped tables don't seem to show up in {{nodetool > listsnapshots}} (even though they do get cleared by {{nodetool > clearsnapshot}}). This makes them kind of annoying to clean up, since you > need to muck about in the data directory to find them. > Erick on the mailing list said that this seems to be an oversight and that > clearsnapshot was fixed by > [CASSANDRA-6418|https://issues.apache.org/jira/browse/CASSANDRA-6418]. > I reproduced this both on 3.11.11 and 4.0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16843) auto-snapshots for dropped tables don't appear in nodetool listsnapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397097#comment-17397097 ] Manish Khandelwal commented on CASSANDRA-16843: --- Looks okay to me. listsnapshots should only show snapshots of live tables. As for clearsnapshot, once a table is dropped there is no point in keeping its backup. Is there any use case that requires clearsnapshot not to clear snapshots of dropped tables? > auto-snapshots for dropped tables don't appear in nodetool listsnapshots > > > Key: CASSANDRA-16843 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16843 > Project: Cassandra > Issue Type: Bug >Reporter: James Brown >Priority: Normal > > Auto snapshots from dropped tables don't seem to show up in {{nodetool > listsnapshots}} (even though they do get cleared by {{nodetool > clearsnapshot}}). This makes them kind of annoying to clean up, since you > need to muck about in the data directory to find them. > Erick on the mailing list said that this seems to be an oversight and that > clearsnapshot was fixed by > [CASSANDRA-6418|https://issues.apache.org/jira/browse/CASSANDRA-6418]. > I reproduced this both on 3.11.11 and 4.0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
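The listsnapshots/clearsnapshot asymmetry discussed in the thread above can be sketched in a few lines. This is a hypothetical Python model, not Cassandra code; the function names and data layout are invented for illustration.

```python
# listsnapshots enumerates snapshots of live tables only, while clearsnapshot
# sweeps everything found on disk, including snapshots of dropped tables.
def list_snapshots(snapshots, live_tables):
    """Mimics listsnapshots: dropped tables' snapshots are invisible here."""
    return {table: names for table, names in snapshots.items() if table in live_tables}

def clear_snapshots(snapshots):
    """Mimics clearsnapshot: removes every snapshot found in the data directory."""
    snapshots.clear()

snapshots = {("ks", "live_table"): ["snap1"], ("ks", "dropped_table"): ["auto-snap"]}
live = {("ks", "live_table")}

print(list_snapshots(snapshots, live))  # the dropped table's auto-snapshot is hidden
clear_snapshots(snapshots)
print(snapshots)                        # yet clearsnapshot still reclaimed its space
```

The mismatch the reporter found is exactly this: the listing is driven by the live schema, while the cleanup is driven by the filesystem.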
[jira] [Assigned] (CASSANDRA-15833) Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657
[ https://issues.apache.org/jira/browse/CASSANDRA-15833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Khandelwal reassigned CASSANDRA-15833: - Assignee: (was: Manish Khandelwal) > Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657 > > > Key: CASSANDRA-15833 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15833 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Jacek Lewandowski >Priority: Normal > Fix For: 3.11.x, 4.x > > Attachments: CASSANDRA-15833-3.11.patch, CASSANDRA-15833-4.0.patch > > > CASSANDRA-10657 introduced changes in how the ColumnFilter is interpreted. > This results in a digest mismatch when querying an incomplete set of columns from > a table at a consistency level that requires reaching instances running > pre-CASSANDRA-10657 code from nodes that include CASSANDRA-10657 (it was introduced in > Cassandra 3.4). > The fix is to bring back the previous behaviour until there are no instances > running a pre-CASSANDRA-10657 version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15833) Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657
[ https://issues.apache.org/jira/browse/CASSANDRA-15833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Khandelwal reassigned CASSANDRA-15833: - Assignee: Manish Khandelwal (was: Jacek Lewandowski) > Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657 > > > Key: CASSANDRA-15833 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15833 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Jacek Lewandowski >Assignee: Manish Khandelwal >Priority: Normal > Fix For: 3.11.x, 4.x > > Attachments: CASSANDRA-15833-3.11.patch, CASSANDRA-15833-4.0.patch > > > CASSANDRA-10657 introduced changes in how the ColumnFilter is interpreted. > This results in a digest mismatch when querying an incomplete set of columns from > a table at a consistency level that requires reaching instances running > pre-CASSANDRA-10657 code from nodes that include CASSANDRA-10657 (it was introduced in > Cassandra 3.4). > The fix is to bring back the previous behaviour until there are no instances > running a pre-CASSANDRA-10657 version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
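A toy illustration of the "false" digest mismatch described in the ticket above: a digest read compares hashes of replica responses, so if two nodes hash the same live data but serialize a different set of columns (as with the ColumnFilter interpretation change), their digests differ without any data being stale. This is a hypothetical Python sketch, not Cassandra's digest code; the row and column names are invented.

```python
import hashlib

row = {"k": 0, "a": "x", "b": "y"}  # the same up-to-date row on both nodes

def digest(columns):
    """Hash a fixed serialization of the selected columns, order-independent."""
    m = hashlib.md5()
    for c in sorted(columns):
        m.update(f"{c}={row[c]}".encode())
    return m.hexdigest()

old_node = digest(["k", "a", "b"])  # one interpretation: serialize all columns
new_node = digest(["k", "a"])       # the other: serialize only the queried columns
print(old_node != new_node)         # True: a mismatch with no stale data anywhere
```

Because the mismatch is caused by serialization rather than by divergent data, a full data read-repair cycle cannot resolve it, which is why the ticket calls it unresolvable during the upgrade.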
[jira] [Assigned] (CASSANDRA-14092) Max ttl of 20 years will overflow localDeletionTime
[ https://issues.apache.org/jira/browse/CASSANDRA-14092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Khandelwal reassigned CASSANDRA-14092: - Assignee: Paulo Motta (was: Manish Khandelwal) > Max ttl of 20 years will overflow localDeletionTime > --- > > Key: CASSANDRA-14092 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14092 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Urgent > Fix For: 2.1.20, 2.2.12, 3.0.16, 3.11.2 > > Attachments: 2.1-14092-dtest.png, 2.1-14092-testall.png, > 2.2-14092-dtest.png, 2.2-14092-testall.png, 3.0-14092-dtest.png, > 3.0-14092-testall.png, 3.11-14092-dtest.png, 3.11-14092-testall.png, > trunk-14092-dtest.png, trunk-14092-testall.png > > > CASSANDRA-4771 added a max value of 20 years for ttl to protect against [year > 2038 overflow bug|https://en.wikipedia.org/wiki/Year_2038_problem] for > {{localDeletionTime}}. > It turns out that next year the {{localDeletionTime}} will start overflowing > with the maximum ttl of 20 years ({{System.currentTimeMillis() + ttl(20 > years) > Integer.MAX_VALUE}}), so we should remove this limitation. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14092) Max ttl of 20 years will overflow localDeletionTime
[ https://issues.apache.org/jira/browse/CASSANDRA-14092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Khandelwal reassigned CASSANDRA-14092: - Assignee: Manish Khandelwal (was: Paulo Motta) > Max ttl of 20 years will overflow localDeletionTime > --- > > Key: CASSANDRA-14092 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14092 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Paulo Motta >Assignee: Manish Khandelwal >Priority: Urgent > Fix For: 2.1.20, 2.2.12, 3.0.16, 3.11.2 > > Attachments: 2.1-14092-dtest.png, 2.1-14092-testall.png, > 2.2-14092-dtest.png, 2.2-14092-testall.png, 3.0-14092-dtest.png, > 3.0-14092-testall.png, 3.11-14092-dtest.png, 3.11-14092-testall.png, > trunk-14092-dtest.png, trunk-14092-testall.png > > > CASSANDRA-4771 added a max value of 20 years for ttl to protect against [year > 2038 overflow bug|https://en.wikipedia.org/wiki/Year_2038_problem] for > {{localDeletionTime}}. > It turns out that next year the {{localDeletionTime}} will start overflowing > with the maximum ttl of 20 years ({{System.currentTimeMillis() + ttl(20 > years) > Integer.MAX_VALUE}}), so we should remove this limitation. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
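The overflow in the quoted description can be checked with simple arithmetic. Assuming the 20-year cap is 20 * 365 days of seconds and localDeletionTime is a signed 32-bit count of seconds since the epoch (both as described in the ticket), a write made in early 2018 with the maximum TTL already exceeds the representable range, while one at the start of 2018 still fits:

```python
# localDeletionTime is a signed 32-bit epoch-seconds value, so expiration
# times past 2^31 - 1 (2038-01-19) cannot be represented.
INT_MAX = 2**31 - 1                    # Integer.MAX_VALUE
MAX_TTL = 20 * 365 * 24 * 60 * 60      # assumed 20-year cap, in seconds

jan_2018 = 1514764800                  # 2018-01-01 00:00:00 UTC
feb_2018 = 1517443200                  # 2018-02-01 00:00:00 UTC

print(jan_2018 + MAX_TTL > INT_MAX)    # False: still representable
print(feb_2018 + MAX_TTL > INT_MAX)    # True: the deletion time overflows
```

This matches the ticket's claim that, starting the year after it was filed, writes with the maximum TTL would begin to overflow.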