[jira] [Commented] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138135#comment-17138135 ]

ZhaoYang commented on CASSANDRA-15861:
--

[~dcapwell] you are right. {{IndexSummary}} can definitely cause trouble for entire-sstable-streaming. Then the only option we have is to apply the first approach to {{IndexSummary}}, because we can't give {{IndexSummary}} a fixed-length encoding.

> Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
> ---
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair, Consistency/Streaming, Local/Compaction
> Reporter: ZhaoYang
> Assignee: ZhaoYang
> Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors:
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 CassandraEntireSSTableStreamReader.java:145 - [Stream 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>     at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>     at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>     at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>     at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>     at org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>     at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>     at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>     at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>     at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>     at org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>
> In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports a checksum validation failure on the sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id.
> 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}}
> 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction.
> 5. Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair.
> 6. Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest.
> 7. At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}.
> {code}
> This isn't a problem in legacy streaming as STATS file length didn't matter.
> Ideally it will be great
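Editorial sketch for the CASSANDRA-15861 message above: the seven steps in the "what happened" block can be modelled in a few lines of Java. This is only a toy model under stated assumptions, not Cassandra code. The class {{StatsRaceSketch}}, its methods, and the toy "payload plus 4-byte CRC32" layout are all hypothetical, and it assumes the sender transfers exactly the number of bytes recorded in the manifest (zero-padding here if the file shrank in the meantime).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical model of the race; none of these names are Cassandra APIs.
final class StatsRaceSketch {

    // Toy STATS layout (assumption): variable-length payload followed by a 4-byte CRC32 of it.
    static byte[] serialize(String pendingRepair) {
        byte[] payload = ("repairedAt=0;pendingRepair=" + pendingRepair)
                .getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 4)
                .put(payload)
                .putInt((int) crc.getValue())
                .array();
    }

    // Receiver-side validation: recompute the CRC32 and compare it with the trailing 4 bytes.
    static boolean checksumMatches(byte[] component) {
        if (component.length < 4) return false;
        int payloadLength = component.length - 4;
        CRC32 crc = new CRC32();
        crc.update(component, 0, payloadLength);
        int stored = ByteBuffer.wrap(component, payloadLength, 4).getInt();
        return stored == (int) crc.getValue();
    }

    // Assumption: the sender transfers exactly manifestLength bytes of whatever is on disk
    // at send time. Arrays.copyOf truncates or zero-pads to model that.
    static byte[] send(byte[] onDisk, int manifestLength) {
        return Arrays.copyOf(onDisk, manifestLength);
    }

    public static void main(String[] args) {
        byte[] beforeMutation = serialize("6f1c3360-a54f-11ea-a808-2f23710fdc90"); // step 1
        int manifestLength = beforeMutation.length;                                 // step 2
        byte[] afterMutation = serialize("null");                                   // step 5
        byte[] received = send(afterMutation, manifestLength);                      // step 6

        System.out.println("manifest length:  " + manifestLength);
        System.out.println("on-disk length:   " + afterMutation.length);
        System.out.println("checksum matches: " + checksumMatches(received));      // step 7
    }
}
```

The point of the sketch is only that a variable-length encoding lets a concurrent mutation change the file size after the manifest has been built, which is why a fixed-length encoding (or not mutating while streaming) is discussed in the ticket.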
[jira] [Comment Edited] (CASSANDRA-15782) Compression test failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138117#comment-17138117 ]

Caleb Rackliffe edited comment on CASSANDRA-15782 at 6/17/20, 5:48 AM:
---

[~Bereng] [~jolynch] I think [~djoshi] and I are seeing this pop up again here: https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/48/workflows/23de1e8d-108e-4138-8ea6-a650965920a5/jobs/2550/parallel-runs/2. The failure output seems to indicate the changes from this patch are present.

(The C* branch in question should be [very close to trunk|https://github.com/dineshjoshi/cassandra/tree/CASSANDRA-14888].)

was (Author: maedhroz):
[~Bereng] [~jolynch] I think [~djoshi] and I are seeing this pop up again here: https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/48/workflows/23de1e8d-108e-4138-8ea6-a650965920a5/jobs/2550/parallel-runs/2. The failure output seems to indicate the changes from this patch are present.

> Compression test failure
>
>
> Key: CASSANDRA-15782
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15782
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: Berenguer Blasi
> Assignee: Joey Lynch
> Priority: Normal
> Fix For: 4.0, 4.0-alpha5
>
>
> On CASSANDRA-15560 compression test failed. This was bisected to [9c1bbf3|https://github.com/apache/cassandra/commit/9c1bbf3ac913f9bdf7a0e0922106804af42d2c1e] from CASSANDRA-15379.
> Full details here
> CC/ [~jolynch] in case he can spot it quick.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138119#comment-17138119 ]

Caleb Rackliffe edited comment on CASSANDRA-14888 at 6/17/20, 5:46 AM:
---

I've made a note in CASSANDRA-15782, but I think it's pretty safe to say this patch isn't the cause of any of the regressions we're seeing.

I'd say we're ready to commit.

was (Author: maedhroz):
I've made a note in CASSANDRA-15782, but I think it's pretty safe to say this patch isn't the cause of any of the regressions we're seeing.

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Metrics
> Reporter: Ariel Weisberg
> Assignee: Alex Deparvu
> Priority: Urgent
> Labels: patch-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: CASSANDRA-14888.patch
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, PartitionsValidated, RepairPrepareTime, RepairSyncTime, RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the entire thing is kind of obscure. Fix it and also add a dtest that detects if any mbeans are left behind after dropping a table and keyspace.
[jira] [Commented] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138119#comment-17138119 ]

Caleb Rackliffe commented on CASSANDRA-14888:
-

I've made a note in CASSANDRA-15782, but I think it's pretty safe to say this patch isn't the cause of any of the regressions we're seeing.

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Metrics
> Reporter: Ariel Weisberg
> Assignee: Alex Deparvu
> Priority: Urgent
> Labels: patch-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: CASSANDRA-14888.patch
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, PartitionsValidated, RepairPrepareTime, RepairSyncTime, RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the entire thing is kind of obscure. Fix it and also add a dtest that detects if any mbeans are left behind after dropping a table and keyspace.
[jira] [Commented] (CASSANDRA-15782) Compression test failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138117#comment-17138117 ]

Caleb Rackliffe commented on CASSANDRA-15782:
-

[~Bereng] [~jolynch] I think [~djoshi] and I are seeing this pop up again here: https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/48/workflows/23de1e8d-108e-4138-8ea6-a650965920a5/jobs/2550/parallel-runs/2. The failure output seems to indicate the changes from this patch are present.

> Compression test failure
>
>
> Key: CASSANDRA-15782
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15782
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: Berenguer Blasi
> Assignee: Joey Lynch
> Priority: Normal
> Fix For: 4.0, 4.0-alpha5
>
>
> On CASSANDRA-15560 compression test failed. This was bisected to [9c1bbf3|https://github.com/apache/cassandra/commit/9c1bbf3ac913f9bdf7a0e0922106804af42d2c1e] from CASSANDRA-15379.
> Full details here
> CC/ [~jolynch] in case he can spot it quick.
[jira] [Comment Edited] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138058#comment-17138058 ]

Caleb Rackliffe edited comment on CASSANDRA-14888 at 6/17/20, 5:26 AM:
---

[~djoshi] After the known issues above, the only other failure appears to be [compression_test.TestCompression|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/48/workflows/23de1e8d-108e-4138-8ea6-a650965920a5/jobs/2550], but CASSANDRA-15782 should have addressed that. Not exactly sure what's going on there, given that fix was committed to {{cassandra-dtests}} at the [beginning of May|https://github.com/apache/cassandra-dtest/commit/da7fcefb16d16af8924cda35c0a6a63ad553694f].

was (Author: maedhroz):
[~djoshi] After the known issues above, the only other failure is that appears to be [compression_test.TestCompression|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/48/workflows/23de1e8d-108e-4138-8ea6-a650965920a5/jobs/2550], but CASSANDRA-15782 should have addressed that. Not exactly sure what's going on there yet...

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Metrics
> Reporter: Ariel Weisberg
> Assignee: Alex Deparvu
> Priority: Urgent
> Labels: patch-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: CASSANDRA-14888.patch
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, PartitionsValidated, RepairPrepareTime, RepairSyncTime, RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the entire thing is kind of obscure. Fix it and also add a dtest that detects if any mbeans are left behind after dropping a table and keyspace.
[jira] [Commented] (CASSANDRA-15871) Cassandra driver first connection get the user's own schema information
[ https://issues.apache.org/jira/browse/CASSANDRA-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138066#comment-17138066 ]

Brandon Williams commented on CASSANDRA-15871:
--

If they're both superusers, I'm not sure I understand the validity of the test.

> Cassandra driver first connection get the user's own schema information
>
>
> Key: CASSANDRA-15871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15871
> Project: Cassandra
> Issue Type: Improvement
> Components: Cluster/Schema, Messaging/Client
> Reporter: maxwellguo
> Priority: Normal
> Attachments: 1.jpg
>
>
> When the Cassandra driver first connects to the coordinator node, it may select all the keyspaces/tables/columns/types from the server and cache the data locally.
> Different users may have different tables and types, so caching all of the metadata is not appropriate; it is enough to cache only the user's own schema information.
> Doing this is safe and saves resources on the first connection.
[jira] [Comment Edited] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138058#comment-17138058 ]

Caleb Rackliffe edited comment on CASSANDRA-14888 at 6/17/20, 3:18 AM:
---

[~djoshi] After the known issues above, the only other failure is that appears to be [compression_test.TestCompression|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/48/workflows/23de1e8d-108e-4138-8ea6-a650965920a5/jobs/2550], but CASSANDRA-15782 should have addressed that. Not exactly sure what's going on there yet...

was (Author: maedhroz):
[~djoshi] After the known issues above, the only other failure is that appears to be [compression_test.TestCompression| https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/48/workflows/23de1e8d-108e-4138-8ea6-a650965920a5/jobs/2550], but CASSANDRA-15782 should have addressed that. Not exactly sure what's going on there yet...

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Metrics
> Reporter: Ariel Weisberg
> Assignee: Alex Deparvu
> Priority: Urgent
> Labels: patch-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: CASSANDRA-14888.patch
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, PartitionsValidated, RepairPrepareTime, RepairSyncTime, RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the entire thing is kind of obscure. Fix it and also add a dtest that detects if any mbeans are left behind after dropping a table and keyspace.
[jira] [Commented] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138058#comment-17138058 ]

Caleb Rackliffe commented on CASSANDRA-14888:
-

[~djoshi] After the known issues above, the only other failure is that appears to be [compression_test.TestCompression| https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/48/workflows/23de1e8d-108e-4138-8ea6-a650965920a5/jobs/2550], but CASSANDRA-15782 should have addressed that. Not exactly sure what's going on there yet...

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Metrics
> Reporter: Ariel Weisberg
> Assignee: Alex Deparvu
> Priority: Urgent
> Labels: patch-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: CASSANDRA-14888.patch
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, PartitionsValidated, RepairPrepareTime, RepairSyncTime, RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the entire thing is kind of obscure. Fix it and also add a dtest that detects if any mbeans are left behind after dropping a table and keyspace.
[jira] [Commented] (CASSANDRA-15871) Cassandra driver first connection get the user's own schema information
[ https://issues.apache.org/jira/browse/CASSANDRA-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138040#comment-17138040 ]

maxwellguo commented on CASSANDRA-15871:

In my test, I used userA to create kscas and userB to create ksgc; both are superusers. But when connecting for the first time as userA, all keyspaces, including userB's ksgc, are returned to the driver. It seems user permissions do not take effect.

> Cassandra driver first connection get the user's own schema information
>
>
> Key: CASSANDRA-15871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15871
> Project: Cassandra
> Issue Type: Improvement
> Components: Cluster/Schema, Messaging/Client
> Reporter: maxwellguo
> Priority: Normal
> Attachments: 1.jpg
>
>
> When the Cassandra driver first connects to the coordinator node, it may select all the keyspaces/tables/columns/types from the server and cache the data locally.
> Different users may have different tables and types, so caching all of the metadata is not appropriate; it is enough to cache only the user's own schema information.
> Doing this is safe and saves resources on the first connection.
[jira] [Comment Edited] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137986#comment-17137986 ]

Caleb Rackliffe edited comment on CASSANDRA-15879 at 6/17/20, 2:53 AM:
---

I pushed up [a PR|https://github.com/apache/cassandra/pull/638] that approximates the 2.2 version. (There have been a number of other changes to {{CorruptedSSTablesCompactionsTest}} since then to fix other kinds of flakiness.)

The only downside I can see to following through w/ this is that if we leave things as they are and there's another failure, we'd know exactly which seed broke things.

was (Author: maedhroz):
I pushed up [a PR|https://github.com/apache/cassandra/pull/638] that approximates the 2.2 version. (There have been a number of other changes to {{CorruptedSSTablesCompactionsTest}} since then to fix other kinds of flakiness.)

> Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
> Issue Type: Bug
> Components: Test/unit
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] on trunk.
> It looks like this should be a clean merge forward.
[jira] [Commented] (CASSANDRA-15685) flaky testWithMismatchingPending - org.apache.cassandra.distributed.test.PreviewRepairTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138038#comment-17138038 ]

Ekaterina Dimitrova commented on CASSANDRA-15685:
-

We might fix the IR, but it is not a blocker now, as there is no defect and it is a rare case. It will be taken care of in beta. That is what I meant.

> flaky testWithMismatchingPending - org.apache.cassandra.distributed.test.PreviewRepairTest
> --
>
> Key: CASSANDRA-15685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15685
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: Kevin Gallardo
> Assignee: Ekaterina Dimitrova
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0-beta
>
> Attachments: log-CASSANDRA-15685.txt, output
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Observed in: https://app.circleci.com/pipelines/github/newkek/cassandra/34/workflows/1c6b157d-13c3-48a9-85fb-9fe8c153256b/jobs/191/tests
> Failure:
> {noformat}
> testWithMismatchingPending - org.apache.cassandra.distributed.test.PreviewRepairTest
> junit.framework.AssertionFailedError
>     at org.apache.cassandra.distributed.test.PreviewRepairTest.testWithMismatchingPending(PreviewRepairTest.java:97)
> {noformat}
> [Circle CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FCASSANDRA-15685]
[jira] [Commented] (CASSANDRA-15685) flaky testWithMismatchingPending - org.apache.cassandra.distributed.test.PreviewRepairTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138020#comment-17138020 ]

David Capwell commented on CASSANDRA-15685:
--

Sorry for the delay. I am fine with the test getting fixed without changing IR, though this adds an unexpected edge case for users, albeit one that is expected to be rare in production.

> flaky testWithMismatchingPending - org.apache.cassandra.distributed.test.PreviewRepairTest
> --
>
> Key: CASSANDRA-15685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15685
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: Kevin Gallardo
> Assignee: Ekaterina Dimitrova
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0-beta
>
> Attachments: log-CASSANDRA-15685.txt, output
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Observed in: https://app.circleci.com/pipelines/github/newkek/cassandra/34/workflows/1c6b157d-13c3-48a9-85fb-9fe8c153256b/jobs/191/tests
> Failure:
> {noformat}
> testWithMismatchingPending - org.apache.cassandra.distributed.test.PreviewRepairTest
> junit.framework.AssertionFailedError
>     at org.apache.cassandra.distributed.test.PreviewRepairTest.testWithMismatchingPending(PreviewRepairTest.java:97)
> {noformat}
> [Circle CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FCASSANDRA-15685]
[jira] [Updated] (CASSANDRA-15852) Handle errors in StreamSession#prepare
[ https://issues.apache.org/jira/browse/CASSANDRA-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Capwell updated CASSANDRA-15852:
--
Status: Changes Suggested (was: Review In Progress)

> Handle errors in StreamSession#prepare
> --
>
> Key: CASSANDRA-15852
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15852
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Streaming
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
> Fix For: 4.0-beta
>
>
> Since CASSANDRA-12229 we don't handle errors in {{StreamSession#prepare}} - this makes a stream initiator hang forever if an error is thrown.
[jira] [Updated] (CASSANDRA-15851) Add bytebuddy support for in-jvm dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-15851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Capwell updated CASSANDRA-15851:
--
Reviewers: Alex Petrov, David Capwell, David Capwell (was: Alex Petrov, David Capwell)
           Alex Petrov, David Capwell, David Capwell (was: Alex Petrov)
Status: Review In Progress (was: Patch Available)

* https://github.com/apache/cassandra-in-jvm-dtest-api/pull/11/files#diff-d58040416ac6fdf2482ed7100441b555R265 this allows null, so we should add a null check here https://github.com/krummas/cassandra/commit/6899cfb970d5674e2f012371bd6ba23294cf882d#diff-e398a00672550f1911eb13e4d4aa86cbR159

Overall LGTM, only a small thing; +1

[~ifesdjeen] I have not been paying attention, are we planning to release .3 or are we going to start supporting snapshots? I know I have been the one opposing snapshots so not sure if it got brought up again.

> Add bytebuddy support for in-jvm dtests
> ---
>
> Key: CASSANDRA-15851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15851
> Project: Cassandra
> Issue Type: Improvement
> Components: Test/dtest
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
> Labels: pull-request-available
>
> Old python dtests support byteman, but that is quite horrible to work with, [bytebuddy|https://bytebuddy.net/#/] is much better, so we should add support for that in the in-jvm dtests.
[jira] [Updated] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Rackliffe updated CASSANDRA-14888:

Status: Review In Progress (was: Changes Suggested)

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Metrics
> Reporter: Ariel Weisberg
> Assignee: Alex Deparvu
> Priority: Urgent
> Labels: patch-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: CASSANDRA-14888.patch
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, PartitionsValidated, RepairPrepareTime, RepairSyncTime, RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the entire thing is kind of obscure. Fix it and also add a dtest that detects if any mbeans are left behind after dropping a table and keyspace.
[jira] [Commented] (CASSANDRA-15852) Handle errors in StreamSession#prepare
[ https://issues.apache.org/jira/browse/CASSANDRA-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137987#comment-17137987 ]

David Capwell commented on CASSANDRA-15852:
---

I am +1 assuming the exception type is changed, or an exception message is checked.

> Handle errors in StreamSession#prepare
> --
>
> Key: CASSANDRA-15852
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15852
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Streaming
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
> Fix For: 4.0-beta
>
>
> Since CASSANDRA-12229 we don't handle errors in {{StreamSession#prepare}} - this makes a stream initiator hang forever if an error is thrown.
[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Rackliffe updated CASSANDRA-15879:

Test and Documentation Plan: CircleCI: TODO
Status: Patch Available (was: In Progress)

> Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
> Issue Type: Bug
> Components: Test/unit
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] on trunk.
> It looks like this should be a clean merge forward.
[jira] [Commented] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137986#comment-17137986 ] Caleb Rackliffe commented on CASSANDRA-15879: - I pushed up [a PR|https://github.com/apache/cassandra/pull/638] that approximates the 2.2 version. (There have been a number of other changes to {{CorruptedSSTablesCompactionsTest}} since then to fix other kinds of flakiness.) > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward.
[jira] [Updated] (CASSANDRA-15852) Handle errors in StreamSession#prepare
[ https://issues.apache.org/jira/browse/CASSANDRA-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15852: -- Reviewers: David Capwell, David Capwell (was: David Capwell) David Capwell, David Capwell Status: Review In Progress (was: Patch Available) Mostly LGTM * https://github.com/krummas/cassandra/commit/ee0a5f2a849b8a11d760ca2975a61fd4bbdc1735#diff-040c2f1cb2ba51b14c7e249412d6574eR62 Can we use a custom exception here, or add a message and verify the message? RuntimeException can be thrown in many locations, so this test could pass without triggering this condition. > Handle errors in StreamSession#prepare > -- > > Key: CASSANDRA-15852 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15852 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Streaming >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0-beta > > > Since CASSANDRA-12229 we don't handle errors in {{StreamSession#prepare}} - > this makes a stream initiator hang forever if an error is thrown. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
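The review point above (a bare RuntimeException can come from many places, so a test that only expects that type may pass for the wrong reason) can be sketched in plain Java. This is a minimal illustration and not the code from the patch under review: PrepareFailedException, EXPECTED_MESSAGE, failingPrepare and unrelatedFailure are hypothetical names invented for this example.

```java
// Hypothetical exception type, introduced only for this illustration.
class PrepareFailedException extends RuntimeException {
    PrepareFailedException(String message) {
        super(message);
    }
}

class PrepareFailureCheck {
    // Hypothetical marker message that the test knows to look for.
    static final String EXPECTED_MESSAGE = "simulated failure in prepare";

    // Stand-in for the code path the test is meant to exercise.
    static void failingPrepare() {
        throw new PrepareFailedException(EXPECTED_MESSAGE);
    }

    // An unrelated failure that also happens to be a RuntimeException.
    static void unrelatedFailure() {
        throw new IllegalStateException("something else broke");
    }

    // Too broad: any RuntimeException counts as the expected failure.
    static boolean looseCheck(Runnable action) {
        try {
            action.run();
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    // Narrow: only the dedicated type carrying the expected message counts.
    static boolean strictCheck(Runnable action) {
        try {
            action.run();
            return false;
        } catch (PrepareFailedException e) {
            return EXPECTED_MESSAGE.equals(e.getMessage());
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The loose check is satisfied by the unrelated failure.
        System.out.println(looseCheck(PrepareFailureCheck::unrelatedFailure));  // true
        // The strict check rejects it and accepts only the intended failure.
        System.out.println(strictCheck(PrepareFailureCheck::unrelatedFailure)); // false
        System.out.println(strictCheck(PrepareFailureCheck::failingPrepare));   // true
    }
}
```

Either a dedicated exception type or a checked message would be enough on its own; the sketch combines both only to show the two options side by side.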
[jira] [Assigned] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe reassigned CASSANDRA-15879: --- Assignee: Caleb Rackliffe > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-15879: Source Control Link: https://github.com/apache/cassandra/pull/638 (3.0) > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe reassigned CASSANDRA-15879: --- Assignee: (was: Caleb Rackliffe) > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Priority: Normal > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release
[ https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137982#comment-17137982 ] Josh McKenzie commented on CASSANDRA-13994: --- [~djoshi] - are you reviewing this? Or Jordan, or Sylvain, or? :) Seems like we have a lot of hands on this one. Just want to clarify so I know who to -badger- follow up with about it. > Remove COMPACT STORAGE internals before 4.0 release > --- > > Key: CASSANDRA-13994 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13994 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Alex Petrov >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 4.0, 4.0-alpha > > > 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after > [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of > the related functionality is useless. > There are still some things to consider: > 1. One of the system tables (built indexes) was compact. For now, we just > added {{value}} column to it to make sure it's backwards-compatible, but we > might want to make sure it's just a "normal" table and doesn't have redundant > columns. > 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is > trivial, but this would mean that all built indexes will be defunct. We could > log a warning for now and ask users to migrate off those for now and > completely remove it from future releases. It's just a couple of classes > though. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137979#comment-17137979 ] David Capwell commented on CASSANDRA-15861: --- If I am reading this correctly, I wonder whether this is also an issue with org.apache.cassandra.io.sstable.format.SSTableReader#cloneWithNewSummarySamplingLevel, which is called by org.apache.cassandra.io.sstable.IndexSummaryRedistribution; this modifies the summary file in place. > Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > --- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). 
Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes 
"nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > This isn't a problem in legacy streaming as STATS file leng
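The sequence above comes down to a length being recorded before an in-place rewrite and then used to read the rewritten file. A minimal in-memory sketch of that effect follows. It does not model Cassandra's real STATS format or streaming code: the layout (payload followed by a 4-byte CRC32), the payload strings, and the names StatsRaceSketch, serialize, stream and checksumMatches are all assumptions made for illustration, and the direction of the size change is arbitrary.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

class StatsRaceSketch {
    // Assumed toy layout: payload bytes followed by a 4-byte CRC32 of the payload.
    static byte[] serialize(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return ByteBuffer.allocate(body.length + 4)
                         .put(body)
                         .putInt((int) crc.getValue())
                         .array();
    }

    // Sender side: ship only as many bytes as the manifest recorded earlier.
    static byte[] stream(byte[] fileOnDisk, int manifestLength) {
        return Arrays.copyOf(fileOnDisk, Math.min(manifestLength, fileOnDisk.length));
    }

    // Receiver side: recompute the checksum over the payload and compare it
    // with the trailing 4 bytes.
    static boolean checksumMatches(byte[] received) {
        if (received.length < 4) {
            return false;
        }
        int bodyLength = received.length - 4;
        CRC32 crc = new CRC32();
        crc.update(received, 0, bodyLength);
        int stored = ByteBuffer.wrap(received, bodyLength, 4).getInt();
        return stored == (int) crc.getValue();
    }

    public static void main(String[] args) {
        byte[] original = serialize("stats-v1");
        int manifestLength = original.length;                     // recorded in the manifest (step 2)

        byte[] mutated = serialize("stats-v2-with-more-bytes");   // in-place rewrite (step 5)

        // No race: the bytes sent match the length that was recorded.
        System.out.println(checksumMatches(stream(original, manifestLength)));
        // Race: the stale length cuts the rewritten file short, so the trailing
        // bytes the receiver treats as a checksum are really payload bytes.
        System.out.println(checksumMatches(stream(mutated, manifestLength)));
    }
}
```

In this toy model the first check passes and the second fails, which is the same kind of mismatch node3 reports in step 7.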
[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-15879: Status: Open (was: Resolved) It seems the test has been renamed to {{CorruptedSSTablesCompactionsTest}}. > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Issue Comment Deleted] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-15879: Comment: was deleted (was: Sorry for the noise. The fork where the failure occurred has not been sync'd with the upstream repo in quite a while. {{BlacklistingCompactionsTest}} no longer exists.) > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-15879: Resolution: Invalid Status: Resolved (was: Open) Sorry for the noise. The fork where the failure occurred has not been sync'd with the upstream repo in quite a while. {{BlacklistingCompactionsTest}} no longer exists. > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0-beta > > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-15879: Fix Version/s: (was: 4.0-beta) (was: 3.11.x) (was: 3.0.x) > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-14888: Test and Documentation Plan: Review and tests CircleCI: https://circleci.com/gh/dineshjoshi/cassandra/2534 was: Review and tests CircleCI: https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f > Several mbeans are not unregistered when dropping a keyspace and table > -- > > Key: CASSANDRA-14888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14888 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Ariel Weisberg >Assignee: Alex Deparvu >Priority: Urgent > Labels: patch-available > Fix For: 3.0.x, 3.11.x, 4.0-beta > > Attachments: CASSANDRA-14888.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > CasCommit, CasPrepare, CasPropose, ReadRepairRequests, > ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, > PartitionsValidated, RepairPrepareTime, RepairSyncTime, > RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, > WriteFailedIdealCL > Basically for 3 years people haven't known what they are doing because the > entire thing is kind of obscure. Fix it and also add a dtest that detects if > any mbeans are left behind after dropping a table and keyspace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
Caleb Rackliffe created CASSANDRA-15879: --- Summary: Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy Key: CASSANDRA-15879 URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 Project: Cassandra Issue Type: Bug Components: Test/unit Reporter: Caleb Rackliffe CASSANDRA-14238 addressed the failure in {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] on trunk. It looks like this should be a clean merge forward.
[jira] [Assigned] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe reassigned CASSANDRA-15879: --- Assignee: Caleb Rackliffe > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14238) Flaky Unittest: org.apache.cassandra.db.compaction.BlacklistingCompactionsTest
[ https://issues.apache.org/jira/browse/CASSANDRA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137951#comment-17137951 ] Caleb Rackliffe commented on CASSANDRA-14238: - Just created CASSANDRA-15879. Will put up the patch shortly... > Flaky Unittest: org.apache.cassandra.db.compaction.BlacklistingCompactionsTest > -- > > Key: CASSANDRA-14238 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14238 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Testing >Reporter: Jay Zhuang >Assignee: Marcus Eriksson >Priority: Low > Labels: testing > Fix For: 2.2.13 > > > The unittest is flaky > {noformat} > [junit] Testcase: > testBlacklistingWithSizeTieredCompactionStrategy(org.apache.cassandra.db.compaction.BlacklistingCompactionsTest): > FAILED > [junit] expected:<8> but was:<25> > [junit] junit.framework.AssertionFailedError: expected:<8> but was:<25> > [junit] at > org.apache.cassandra.db.compaction.BlacklistingCompactionsTest.testBlacklisting(BlacklistingCompactionsTest.java:170) > [junit] at > org.apache.cassandra.db.compaction.BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy(BlacklistingCompactionsTest.java:71) > {noformat}
[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-15879: Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Low Hanging Fruit Discovered By: Unit Test Fix Version/s: 4.0-beta 3.11.x 3.0.x Reviewers: Dinesh Joshi, Marcus Eriksson Severity: Normal Status: Open (was: Triage Needed) > Flaky unit test: > BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy > - > > Key: CASSANDRA-15879 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15879 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0-beta > > > CASSANDRA-14238 addressed the failure in > {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}}, > but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the > failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] > on trunk. > It looks like this should be a clean merge forward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137936#comment-17137936 ] Caleb Rackliffe commented on CASSANDRA-14888: - ...and {{hintedhandoff_test.TestHintedHandoffConfig}} appears to be covered by CASSANDRA-15865. > Several mbeans are not unregistered when dropping a keyspace and table > -- > > Key: CASSANDRA-14888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14888 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Ariel Weisberg >Assignee: Alex Deparvu >Priority: Urgent > Labels: patch-available > Fix For: 3.0.x, 3.11.x, 4.0-beta > > Attachments: CASSANDRA-14888.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > CasCommit, CasPrepare, CasPropose, ReadRepairRequests, > ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, > PartitionsValidated, RepairPrepareTime, RepairSyncTime, > RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, > WriteFailedIdealCL > Basically for 3 years people haven't known what they are doing because the > entire thing is kind of obscure. Fix it and also add a dtest that detects if > any mbeans are left behind after dropping a table and keyspace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137933#comment-17137933 ] Caleb Rackliffe edited comment on CASSANDRA-14888 at 6/16/20, 9:23 PM: --- The one failure in the unit tests appears to be CASSANDRA-14238 resurrected, which we've already noted. In the dtests, the {{TestPushedNotifications}} failures have already been noted in CASSANDRA-15877. There also appears to be some good evidence that {{test_simple_repair_order_preserving - repair_tests.repair_test.TestRepair}} is [flaky|https://issues.apache.org/jira/browse/CASSANDRA-15170?focusedCommentId=16909454&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16909454]. was (Author: maedhroz): The one failure in the unit tests appears to be CASSANDRA-14238 resurrected, which we've already noted. In the dtests, the {{pushed_notifications_test.TestPushedNotifications}} have already been noted in CASSANDRA-15877. There also appears to be some good evidence that {{test_simple_repair_order_preserving - repair_tests.repair_test.TestRepair}} is [flaky|https://issues.apache.org/jira/browse/CASSANDRA-15170?focusedCommentId=16909454&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16909454]. 
> Several mbeans are not unregistered when dropping a keyspace and table > -- > > Key: CASSANDRA-14888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14888 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Ariel Weisberg >Assignee: Alex Deparvu >Priority: Urgent > Labels: patch-available > Fix For: 3.0.x, 3.11.x, 4.0-beta > > Attachments: CASSANDRA-14888.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > CasCommit, CasPrepare, CasPropose, ReadRepairRequests, > ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, > PartitionsValidated, RepairPrepareTime, RepairSyncTime, > RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, > WriteFailedIdealCL > Basically for 3 years people haven't known what they are doing because the > entire thing is kind of obscure. Fix it and also add a dtest that detects if > any mbeans are left behind after dropping a table and keyspace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137933#comment-17137933 ] Caleb Rackliffe commented on CASSANDRA-14888: - The one failure in the unit tests appears to be CASSANDRA-14238 resurrected, which we've already noted. In the dtests, the {{pushed_notifications_test.TestPushedNotifications}} have already been noted in CASSANDRA-15877. There also appears to be some good evidence that {{test_simple_repair_order_preserving - repair_tests.repair_test.TestRepair}} is [flaky|https://issues.apache.org/jira/browse/CASSANDRA-15170?focusedCommentId=16909454&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16909454]. > Several mbeans are not unregistered when dropping a keyspace and table > -- > > Key: CASSANDRA-14888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14888 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Ariel Weisberg >Assignee: Alex Deparvu >Priority: Urgent > Labels: patch-available > Fix For: 3.0.x, 3.11.x, 4.0-beta > > Attachments: CASSANDRA-14888.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > CasCommit, CasPrepare, CasPropose, ReadRepairRequests, > ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, > PartitionsValidated, RepairPrepareTime, RepairSyncTime, > RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, > WriteFailedIdealCL > Basically for 3 years people haven't known what they are doing because the > entire thing is kind of obscure. Fix it and also add a dtest that detects if > any mbeans are left behind after dropping a table and keyspace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136793#comment-17136793 ]
Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-15874 at 6/16/20, 9:03 PM:
Thanks [~brandon.williams], can you please describe the symptoms of this race condition? In my case I see that only some portion of the data is not bootstrapped, but the rest of the data bootstrapped without any issues.
was (Author: jaid): Thanks [~brandon.williams], can you please describe the symptoms of this race condition? In my case I see that only some portion of the data is bootstrapped, but the rest of the data bootstrapped without any issues.
> Bootstrap completes Successfully without streaming all the data
> ---
>
> Key: CASSANDRA-15874
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15874
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Bootstrap and Decommission
> Reporter: Jai Bheemsen Rao Dhanwada
> Priority: Normal
>
> I am seeing a strange issue where adding a new node with auto_bootstrap: true does not stream all the data before it joins the cluster. I don't see any information in the logs about bootstrap failures.
> Here is the sequence of logs
>
> {code:java}
> INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: schema complete, ready to bootstrap
> INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: waiting for pending range calculation
> INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: calculation complete, ready to bootstrap
> INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: getting bootstrap token
> INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: Starting to bootstrap...
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId . If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation.
> INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] All sessions completed
> INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 StorageService.java:1505 - Bootstrap completed! for the tokens
> {code}
> Cassandra Version: 3.11.3
> I am not able to reproduce this issue all the time, but it happened a couple of times. Is there any race condition/corner case which could cause this issue?
[jira] [Updated] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-14888: - Status: Changes Suggested (was: Review In Progress)
[jira] [Commented] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137924#comment-17137924 ] Dinesh Joshi commented on CASSANDRA-14888: -- I kicked off a [test run|https://circleci.com/workflow-run/de5f7cdb-06b6-4869-9d19-81a145e79f3f]. However, I see a couple failures. Could you both please ensure that they are indeed unrelated / flaky?
[jira] [Updated] (CASSANDRA-15677) Topology events are not sent to clients if the nodes use the same network interface
[ https://issues.apache.org/jira/browse/CASSANDRA-15677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-15677: - Status: Open (was: Resolved) This broke two python dtests: test_restart_node_localhost - pushed_notifications_test.TestPushedNotifications, test_move_single_node_localhost - pushed_notifications_test.TestPushedNotifications. This is because those tests expect no notifications, where now some exist: https://app.circleci.com/pipelines/github/driftx/cassandra/29/workflows/e2a641fa-8100-49e6-8c5a-da46d3fcee5f/jobs/241 These both assert this way due to CASSANDRA-10052. Are we sure this new behavior is correct? > Topology events are not sent to clients if the nodes use the same network > interface > --- > > Key: CASSANDRA-15677 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15677 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Alan Boudreault >Assignee: Bryn Cooke >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha5 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > *This bug only happens when the cassandra nodes are configured to use a > single network interface (ip) but different ports. See CASSANDRA-7544.* > Issue: The topology events aren't sent to clients. 
The problem is that the port is not taken into account when determining if we send it or not:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/Server.java#L624
> To reproduce:
> {code}
> # I think the cassandra-test branch is required to get the -S option (USE_SINGLE_INTERFACE)
> ccm create -n4 local40 -v 4.0-alpha2 -S
> {code}
>
> Then run this small python driver script:
> {code}
> import time
> from cassandra.cluster import Cluster
> cluster = Cluster()
> session = cluster.connect()
> while True:
>     print(cluster.metadata.all_hosts())
>     print([h.is_up for h in cluster.metadata.all_hosts()])
>     time.sleep(5)
> {code}
> Then decommission a node:
> {code}
> ccm node2 nodetool disablebinary
> ccm node2 nodetool decommission
> {code}
>
> You should see that the node is never removed from the client cluster metadata and the reconnector started.
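To illustrate the problem described in the issue: when several nodes share one IP and differ only by port, a check keyed on the address alone cannot tell them apart. The following is a minimal sketch with hypothetical names (`should_notify`, `last_status`); it does not reproduce the actual check in Server.java, and only shows why the port has to be part of the key.

```python
def should_notify(last_status, endpoint, status, use_port=True):
    """Decide whether a status change for `endpoint` is new and should be pushed.

    `last_status` remembers the last status sent per node.
    `endpoint` is an (ip, port) tuple. Both names are illustrative only.
    """
    key = endpoint if use_port else endpoint[0]
    if last_status.get(key) == status:
        return False  # looks like a duplicate, so the event is suppressed
    last_status[key] = status
    return True


node1 = ("127.0.0.1", 9042)
node2 = ("127.0.0.1", 9043)

# Keyed by IP only: node2's event is mistaken for a repeat of node1's.
seen = {}
by_ip = [should_notify(seen, n, "DOWN", use_port=False) for n in (node1, node2)]

# Keyed by (ip, port): both events are delivered.
seen = {}
by_ip_port = [should_notify(seen, n, "DOWN") for n in (node1, node2)]

print(by_ip, by_ip_port)
```

With the address-only key the second node's event is dropped, which matches the symptom in the report: the client never learns that the node went away.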
[jira] [Comment Edited] (CASSANDRA-15794) Upgraded C* (4.x) fail to start because of Compact Tables & dropping compact tables in downgraded C* (3.11.4) introduces non-existent columns
[ https://issues.apache.org/jira/browse/CASSANDRA-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134575#comment-17134575 ]
Zhuqi Jin edited comment on CASSANDRA-15794 at 6/16/20, 7:57 PM:
Hi, [~ifesdjeen]. I created patches for 3.0 and 3.11, according to the third method we discussed earlier. [^CASSANDRA-15794-branch-3.0.patch] I've attached the patches. Would you mind reviewing them? And I'd like to move on to the second method. Could you please do me a favor? We don't want to generate new commit logs before we hit the error in 4.x, so I need to know when and where the commit logs were written.
was (Author: zhuqi1108): Hi, [~ifesdjeen]. I created patches for 3.0 and 3.11, according to the third method we discussed earlier. [^CASSANDRA-15794-branch-3.0.patch] I've attached the patches. Would you mind reviewing them?
> Upgraded C* (4.x) fail to start because of Compact Tables & dropping compact tables in downgraded C* (3.11.4) introduces non-existent columns
> ---
>
> Key: CASSANDRA-15794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15794
> Project: Cassandra
> Issue Type: Bug
> Reporter: Zhuqi Jin
> Priority: Normal
> Attachments: CASSANDRA-15794-branch-3.0.patch, CASSANDRA-15794-branch-3.11.patch
>
>
> We tried to test upgrading a 3.11.4 C* cluster to 4.x and ran into the following problems.
> * We started a single 3.11.4 C* node.
> * We ran cassandra-stress like this > {code:java} > ./cassandra-stress write n = 30 -rate threads = 10 -node 172.17.0.2 {code} > * We stopped this node, and started a C* node running C* compiled from trunk > (git commit: e394dc0bb32f612a476269010930c617dd1ed3cb) > * New C* failed to start with the following error message > {code:java} > ERROR [main] 2020-05-07 00:58:18,503 CassandraDaemon.java:245 - Error while > loading schema: ERROR [main] 2020-05-07 00:58:18,503 CassandraDaemon.java:245 > - Error while loading schema: java.lang.IllegalArgumentException: Compact > Tables are not allowed in Cassandra starting with 4.0 version. Use `ALTER ... > DROP COMPACT STORAGE` command supplied in 3.x/3.11 Cassandra in order to > migrate off Compact Storage. at > org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:965) > at > org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:924) > at > org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:883) > at > org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:874) > at > org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:862) > at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:102) at > org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:91) at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:241) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:653) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:770)Exception > (java.lang.IllegalArgumentException) encountered during startup: Compact > Tables are not allowed in Cassandra starting with 4.0 version. Use `ALTER ... 
> DROP COMPACT STORAGE` command supplied in 3.x/3.11 Cassandra in order to > migrate off Compact Storage.ERROR [main] 2020-05-07 00:58:18,520 > CassandraDaemon.java:792 - Exception encountered during > startupjava.lang.IllegalArgumentException: Compact Tables are not allowed in > Cassandra starting with 4.0 version. Use `ALTER ... DROP COMPACT STORAGE` > command supplied in 3.x/3.11 Cassandra in order to migrate off Compact > Storage. at > org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:965) > at > org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:924) > at > org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:883) > at > org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:874) > at > org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:862) > at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:102) at > org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:91) at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:241) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:653) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:770){code} > * We stopped the trunk version C* and started the 3.11.4 version C*. > * 3.11.4 C* failed to start with the following error messages: > {code:
[jira] [Comment Edited] (CASSANDRA-15877) Followup on CASSANDRA-15600
[ https://issues.apache.org/jira/browse/CASSANDRA-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137860#comment-17137860 ] Ekaterina Dimitrova edited comment on CASSANDRA-15877 at 6/16/20, 7:11 PM: --- Thank you for the review [~kornelpal]! After the change from random tokens to splits, TokenAllocatorDiagnostics.randomTokensGenerated does not seem to be used anymore. Could you please consider removing it, if not needed. I think actually I should create new method probably for an event to be published. I will check what is needed a bit later or Thursday (off tomorrow). Good catch! I've noticed that you added a new NoReplicationTokenAllocatorTest.failed field with assertions, but it does not seem to be set to true anywhere. Could you please check whether it is needed. I think this assertion is actually not needed anymore was (Author: e.dimitrova): Thank you for the review [~kornelpal]! ?? After the change from random tokens to splits, TokenAllocatorDiagnostics.randomTokensGenerated does not seem to be used anymore. Could you please consider removing it, if not needed.?? I think actually I should create new method probably for an event to be published. I will check what is needed a bit later or Thursday (off tomorrow). Good catch! ?? I've noticed that you added a new NoReplicationTokenAllocatorTest.failed field with assertions, but it does not seem to be set to true anywhere. Could you please check whether it is needed.?? 
I think this assertion is actually not needed anymore
> Followup on CASSANDRA-15600
> ---
>
> Key: CASSANDRA-15877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15877
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Virtual Nodes
> Reporter: Ekaterina Dimitrova
> Assignee: Ekaterina Dimitrova
> Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
> Attachments: Screen Shot 2020-06-12 at 3.21.18 PM.png
>
>
> As part of CASSANDRA-15600 the generateSplits method replaced generateRandomTokens for NoReplicationAwareTokenAllocator. generateSplits should also be used in ReplicationAwareTokenAllocator.
[jira] [Comment Edited] (CASSANDRA-15877) Followup on CASSANDRA-15600
[ https://issues.apache.org/jira/browse/CASSANDRA-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137860#comment-17137860 ] Ekaterina Dimitrova edited comment on CASSANDRA-15877 at 6/16/20, 7:11 PM: --- Thank you for the review [~kornelpal]! _After the change from random tokens to splits, TokenAllocatorDiagnostics.randomTokensGenerated does not seem to be used anymore. Could you please consider removing it, if not needed._ I think actually I should create new method probably for an event to be published. I will check what is needed a bit later or Thursday (off tomorrow). Good catch! _I've noticed that you added a new NoReplicationTokenAllocatorTest.failed field with assertions, but it does not seem to be set to true anywhere. Could you please check whether it is needed._ I think this assertion is actually not needed anymore was (Author: e.dimitrova): Thank you for the review [~kornelpal]! After the change from random tokens to splits, TokenAllocatorDiagnostics.randomTokensGenerated does not seem to be used anymore. Could you please consider removing it, if not needed. I think actually I should create new method probably for an event to be published. I will check what is needed a bit later or Thursday (off tomorrow). Good catch! I've noticed that you added a new NoReplicationTokenAllocatorTest.failed field with assertions, but it does not seem to be set to true anywhere. Could you please check whether it is needed. 
I think this assertion is actually not needed anymore
[jira] [Commented] (CASSANDRA-15877) Followup on CASSANDRA-15600
[ https://issues.apache.org/jira/browse/CASSANDRA-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137860#comment-17137860 ] Ekaterina Dimitrova commented on CASSANDRA-15877: - Thank you for the review [~kornelpal]! ?? After the change from random tokens to splits, TokenAllocatorDiagnostics.randomTokensGenerated does not seem to be used anymore. Could you please consider removing it, if not needed.?? I think actually I should create new method probably for an event to be published. I will check what is needed a bit later or Thursday (off tomorrow). Good catch! ?? I've noticed that you added a new NoReplicationTokenAllocatorTest.failed field with assertions, but it does not seem to be set to true anywhere. Could you please check whether it is needed.?? I think this assertion is actually not needed anymore
[jira] [Commented] (CASSANDRA-14238) Flaky Unittest: org.apache.cassandra.db.compaction.BlacklistingCompactionsTest
[ https://issues.apache.org/jira/browse/CASSANDRA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137841#comment-17137841 ]
Marcus Eriksson commented on CASSANDRA-14238:
[~maedhroz]/[~djoshi] could you open a new jira?
> Flaky Unittest: org.apache.cassandra.db.compaction.BlacklistingCompactionsTest
> ---
>
> Key: CASSANDRA-14238
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14238
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Testing
> Reporter: Jay Zhuang
> Assignee: Marcus Eriksson
> Priority: Low
> Labels: testing
> Fix For: 2.2.13
>
>
> The unittest is flaky
> {noformat}
> [junit] Testcase: testBlacklistingWithSizeTieredCompactionStrategy(org.apache.cassandra.db.compaction.BlacklistingCompactionsTest): FAILED
> [junit] expected:<8> but was:<25>
> [junit] junit.framework.AssertionFailedError: expected:<8> but was:<25>
> [junit] at org.apache.cassandra.db.compaction.BlacklistingCompactionsTest.testBlacklisting(BlacklistingCompactionsTest.java:170)
> [junit] at org.apache.cassandra.db.compaction.BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy(BlacklistingCompactionsTest.java:71)
> {noformat}
[jira] [Commented] (CASSANDRA-14238) Flaky Unittest: org.apache.cassandra.db.compaction.BlacklistingCompactionsTest
[ https://issues.apache.org/jira/browse/CASSANDRA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137825#comment-17137825 ] Caleb Rackliffe commented on CASSANDRA-14238: - [~marcuse] We've been able [to reproduce this|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests] in a branch based on the current trunk while working on CASSANDRA-14888. CC [~djoshi]
[jira] [Commented] (CASSANDRA-14238) Flaky Unittest: org.apache.cassandra.db.compaction.BlacklistingCompactionsTest
[ https://issues.apache.org/jira/browse/CASSANDRA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137822#comment-17137822 ] Dinesh Joshi commented on CASSANDRA-14238: -- trunk has a similar failure: https://circleci.com/gh/dineshjoshi/cassandra/2516#tests/containers/36 [~marcuse] can you please confirm?
[jira] [Commented] (CASSANDRA-15871) Cassandra driver first connection get the user's own schema information
[ https://issues.apache.org/jira/browse/CASSANDRA-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137803#comment-17137803 ]
Brandon Williams commented on CASSANDRA-15871:
bq. different users can get all the schemas information when first made a connection
Use permissions to prevent them from being able to do that.
> Cassandra driver first connection get the user's own schema information
> ---
>
> Key: CASSANDRA-15871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15871
> Project: Cassandra
> Issue Type: Improvement
> Components: Cluster/Schema, Messaging/Client
> Reporter: maxwellguo
> Priority: Normal
> Attachments: 1.jpg
>
>
> We know that when the Cassandra driver makes a connection with the coordinator node for the first time, the driver may select all the keyspaces/tables/columns/types from the server and cache the data locally.
> Different users may have different tables and types, so it is not suitable to cache all the metadata. It is fine to cache just the user's own schema information, not all of it.
> Doing this is safe and saves resources on the first connection.
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137801#comment-17137801 ] Brandon Williams commented on CASSANDRA-15874: -- That is the symptom.
[jira] [Updated] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-14888: Test and Documentation Plan: Review and tests CircleCI: https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f was: Review and tests CircleCI: https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=14888-maedhroz
[jira] [Updated] (CASSANDRA-15863) Bootstrap resume and TestReplaceAddress fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-15863:
Since Version: 3.0.0
Source Control Link: https://github.com/apache/cassandra/commit/eacdfc4978547b8e7be06c9ba9611c29963e6cc2
Resolution: Fixed
Status: Resolved (was: Ready to Commit)
Thanks, committed. I will note though that your 3.11 branch and beyond would not compile because your call to finishJoiningRing did not include the boolean (which 3.0 does not have.) I added that on commit.
> Bootstrap resume and TestReplaceAddress fixes
> ---
>
> Key: CASSANDRA-15863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15863
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Bootstrap and Decommission, Test/dtest
> Reporter: Berenguer Blasi
> Assignee: Berenguer Blasi
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-alpha
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This has been [broken|https://ci-cassandra.apache.org/job/Cassandra-trunk/159/testReport/dtest-large.replace_address_test/TestReplaceAddress/test_restart_failed_replace/history/] for ages
[jira] [Updated] (CASSANDRA-15863) Bootstrap resume and TestReplaceAddress fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-15863: - Status: Ready to Commit (was: Review In Progress)
[cassandra-dtest] branch master updated: Fix flaky replace address tests and bootstrap resume fixes
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git The following commit(s) were added to refs/heads/master by this push: new f6b79ab Fix flaky replace address tests and bootstrap resume fixes f6b79ab is described below commit f6b79abec6f059add754d5cceaab1009089c962b Author: Bereng AuthorDate: Wed Jun 10 14:59:34 2020 +0200 Fix flaky replace address tests and bootstrap resume fixes Patch by Berenguer Blasi, reviewed by brandonwilliams for CASSANDRA-15863 --- replace_address_test.py | 37 + 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/replace_address_test.py b/replace_address_test.py index bc122c7..0a34ac8 100644 --- a/replace_address_test.py +++ b/replace_address_test.py @@ -182,10 +182,13 @@ class BaseReplaceAddressTest(Tester): # a little hacky but grep_log returns the whole line... num_tokens = int(self.replacement_node.get_conf_option('num_tokens')) -logger.debug("Verifying {} tokens migrated sucessfully".format(num_tokens)) -logs = self.replacement_node.grep_log(r"Token (.*?) changing ownership from /{} to /{}" - .format(self.replaced_node.address(), - self.replacement_node.address())) +logger.debug("Verifying {} tokens migrated successfully".format(num_tokens)) +replmnt_address = ("/" + self.replacement_node.address()) if self.cluster.version() < '4.0' else self.replacement_node.address_and_port() +repled_address = ("/" + self.replaced_node.address()) if self.cluster.version() < '4.0' else self.replaced_node.address_and_port() +token_ownership_log = r"Token (.*?) 
changing ownership from {} to {}".format(repled_address, + replmnt_address) +logs = self.replacement_node.grep_log(token_ownership_log) + if (previous_log_size is not None): assert len(logs) == previous_log_size @@ -321,7 +324,9 @@ class TestReplaceAddress(BaseReplaceAddressTest): self._do_replace(replace_address='127.0.0.5', wait_for_binary_proto=False) logger.debug("Waiting for replace to fail") -self.replacement_node.watch_log_for("java.lang.RuntimeException: Cannot replace_address /127.0.0.5 because it doesn't exist in gossip") +node_log_str = "/127.0.0.5" if self.cluster.version() < '4.0' else "127.0.0.5:7000" +self.replacement_node.watch_log_for("java.lang.RuntimeException: Cannot replace_address " ++ node_log_str + " because it doesn't exist in gossip") assert_not_running(self.replacement_node) @since('3.6') @@ -464,17 +469,23 @@ class TestReplaceAddress(BaseReplaceAddressTest): self._stop_node_to_replace() logger.debug("Submitting byteman script to make stream fail") +btmmark = self.query_node.mark_log() if self.cluster.version() < '4.0': self.query_node.byteman_submit(['./byteman/pre4.0/stream_failure.btm']) self._do_replace(jvm_option='replace_address_first_boot', - opts={'streaming_socket_timeout_in_ms': 1000}) + opts={'streaming_socket_timeout_in_ms': 1000}, + wait_for_binary_proto=False, + wait_other_notice=True) else: self.query_node.byteman_submit(['./byteman/4.0/stream_failure.btm']) -self._do_replace(jvm_option='replace_address_first_boot') +self._do_replace(jvm_option='replace_address_first_boot', wait_for_binary_proto=False, wait_other_notice=True) # Make sure bootstrap did not complete successfully -assert_bootstrap_state(self, self.replacement_node, 'IN_PROGRESS') +self.query_node.watch_log_for("Triggering network failure", from_mark=btmmark) +self.query_node.watch_log_for("Stream failed", from_mark=btmmark) +self.replacement_node.watch_log_for("Stream failed") +self.replacement_node.watch_log_for("Some data streaming failed.*IN_PROGRESS$") if 
mode == 'reset_resume_state': mark = self.replacement_node.mark_log() @@ -498,12 +509,14 @@ class TestReplaceAddress(BaseReplaceAddressTest): self.replacement_node.stop() logger.debug("Waiting other nodes to detect node stopped") -self.query_node.watch_log_for("FatClient /{} has been silent for 3ms, removing from gossip".format(self.replacement_node.address()), timeout=120) -self.query_node.watch_log_for("Node /{} failed during replace.".format(self.replacement_node.address()), timeout=120, filename='debug.log')
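Note on the dtest diff above: it applies the same rule in several places. Pre-4.0 nodes are expected to log an address as "/<ip>", while 4.0 nodes log "<ip>:<port>". The sketch below pulls that rule into one standalone helper so the expected strings are easy to check. It is only an illustration: the names parse_version, log_address and token_ownership_pattern are hypothetical, the default port 7000 is taken from the "127.0.0.5:7000" literal in the patch, and the numeric version comparison is a simplification. The real test compares self.cluster.version() against '4.0' and calls address_and_port() on the node.

```python
def parse_version(version):
    """Turn '3.11.6' or '4.0-alpha5' into a comparable tuple of ints.

    Hypothetical helper; the dtest relies on the cluster's own version object.
    """
    parts = []
    for piece in version.split("-")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def log_address(address, version, port=7000):
    """Format a node address the way the patch expects it in the node logs.

    Assumption from the diff: pre-4.0 logs print '/<ip>', 4.0+ logs '<ip>:<port>'.
    """
    if parse_version(version) < (4, 0):
        return "/" + address
    return "{}:{}".format(address, port)


def token_ownership_pattern(replaced, replacement, version):
    """Build the regex used to grep for token ownership changes."""
    return r"Token (.*?) changing ownership from {} to {}".format(
        log_address(replaced, version), log_address(replacement, version))
```

With a helper like this, the "Cannot replace_address ... because it doesn't exist in gossip" check could build its expected string from log_address("127.0.0.5", version) instead of repeating the conditional inline.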
[cassandra] branch trunk updated (7cdad3c -> eacdfc4)
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


    from 7cdad3c  Avoid overflow when bloom filter size exceeds 2GB
     new 4f50a67  Catch exception on bootstrap resume and init native transport
     new eacdfc4  Merge branch 'cassandra-3.11' into trunk

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt                                        |  1 +
 .../apache/cassandra/service/CassandraDaemon.java  |  3 ++-
 .../apache/cassandra/service/StorageService.java   | 30 ++
 3 files changed, 23 insertions(+), 11 deletions(-)
[cassandra] branch cassandra-3.0 updated: Catch exception on bootstrap resume and init native transport
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-3.0 by this push:
     new 1843032  Catch exception on bootstrap resume and init native transport
1843032 is described below

commit 184303220b2995f411f51f675131007404372b3d
Author: Bereng
AuthorDate: Wed Jun 10 12:34:34 2020 +0200

    Catch exception on bootstrap resume and init native transport

    Patch by Berenguer Blasi, reviewed by brandonwilliams for CASSANDRA-15863
---
 CHANGES.txt                                        |  1 +
 .../apache/cassandra/service/CassandraDaemon.java  |  3 ++-
 .../apache/cassandra/service/StorageService.java   | 30 ++
 3 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index d506dc8..d1b1416 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.21
+ * Catch exception on bootstrap resume and init native transport (CASSANDRA-15863)
  * Fix replica-side filtering returning stale data with CL > ONE (CASSANDRA-8272, CASSANDRA-8273)
  * Fix duplicated row on 2.x upgrades when multi-rows range tombstones interact with collection ones (CASSANDRA-15805)
  * Rely on snapshotted session infos on StreamResultFuture.maybeComplete to avoid race conditions (CASSANDRA-15667)
diff --git a/src/java/org/apache/cassandra/service/CassandraDaemon.java b/src/java/org/apache/cassandra/service/CassandraDaemon.java
index 4a6e947..85a002f 100644
--- a/src/java/org/apache/cassandra/service/CassandraDaemon.java
+++ b/src/java/org/apache/cassandra/service/CassandraDaemon.java
@@ -421,7 +421,8 @@ public class CassandraDaemon
     public void initializeNativeTransport()
     {
         // Native transport
-        nativeTransportService = new NativeTransportService();
+        if (nativeTransportService == null)
+            nativeTransportService = new NativeTransportService();
     }
 
     /*
diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java
index f9efdb8..d287788 100644
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@ -1317,20 +1317,30 @@ public class StorageService extends NotificationBroadcasterSupport implements IE
                 @Override
                 public void onSuccess(StreamState streamState)
                 {
-                    bootstrapFinished();
-                    if (isSurveyMode)
+                    try
                     {
-                        logger.info("Startup complete, but write survey mode is active, not becoming an active ring member. Use JMX (StorageService->joinRing()) to finalize ring joining.");
+                        bootstrapFinished();
+                        if (isSurveyMode)
+                        {
+                            logger.info("Startup complete, but write survey mode is active, not becoming an active ring member. Use JMX (StorageService->joinRing()) to finalize ring joining.");
+                        }
+                        else
+                        {
+                            isSurveyMode = false;
+                            progressSupport.progress("bootstrap", ProgressEvent.createNotification("Joining ring..."));
+                            finishJoiningRing(bootstrapTokens);
+                        }
+                        progressSupport.progress("bootstrap", new ProgressEvent(ProgressEventType.COMPLETE, 1, 1, "Resume bootstrap complete"));
+                        if (!isNativeTransportRunning())
+                            daemon.initializeNativeTransport();
+                        daemon.start();
+                        logger.info("Resume complete");
                     }
-                    else
+                    catch(Exception e)
                     {
-                        isSurveyMode = false;
-                        progressSupport.progress("bootstrap", ProgressEvent.createNotification("Joining ring..."));
-                        finishJoiningRing(bootstrapTokens);
+                        onFailure(e);
+                        throw e;
                     }
-                    progressSupport.progress("bootstrap", new ProgressEvent(ProgressEventType.COMPLETE, 1, 1, "Resume bootstrap complete"));
-                    daemon.start();
-                    logger.info("Resume complete");
                 }
 
                 @Override
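Note on the commit above: the patch makes two changes. initializeNativeTransport() now creates the service only when none exists, and the body of onSuccess() is wrapped so that an exception is passed to onFailure() before being rethrown. The sketch below restates both ideas in Python, the language of the dtests in this thread, purely as an illustration. Daemon, ResumeCallback and their members are hypothetical stand-ins, not the Cassandra classes, and the sketch folds the isNativeTransportRunning() check into the guard inside the initializer.

```python
class Daemon:
    """Hypothetical stand-in for the daemon; only the guarded init is modelled."""

    def __init__(self):
        self.native_transport = None
        self.init_count = 0

    def initialize_native_transport(self):
        # Guard in the spirit of the patch: create the service only if it
        # does not exist yet, so a second call cannot replace a live one.
        if self.native_transport is None:
            self.native_transport = object()
            self.init_count += 1


class ResumeCallback:
    """Hypothetical callback that routes errors from the success path to on_failure."""

    def __init__(self, daemon, finish_bootstrap):
        self.daemon = daemon
        self.finish_bootstrap = finish_bootstrap
        self.failures = []

    def on_success(self):
        try:
            self.finish_bootstrap()
            self.daemon.initialize_native_transport()
        except Exception as exc:
            # Report through the failure path first, then re-raise so the
            # caller still sees the original exception.
            self.on_failure(exc)
            raise

    def on_failure(self, exc):
        self.failures.append(exc)
```

The point of the second change is that a failure while finishing the resume (for example while joining the ring) is reported through the same path as a streaming failure instead of surfacing only as an exception from the success callback.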
[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git commit eacdfc4978547b8e7be06c9ba9611c29963e6cc2 Merge: 7cdad3c 4f50a67 Author: Brandon Williams AuthorDate: Tue Jun 16 12:37:15 2020 -0500 Merge branch 'cassandra-3.11' into trunk CHANGES.txt| 1 + .../apache/cassandra/service/CassandraDaemon.java | 3 ++- .../apache/cassandra/service/StorageService.java | 30 ++ 3 files changed, 23 insertions(+), 11 deletions(-) diff --cc CHANGES.txt index 5f09cdc,0f730d4..e6ecb42 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,49 -1,10 +1,50 @@@ -3.11.7 +4.0-alpha5 + * Fix missing topology events when running multiple nodes on the same network interface (CASSANDRA-15677) + * Create config.yml.MIDRES (CASSANDRA-15712) + * Fix handling of fully purged static rows in repaired data tracking (CASSANDRA-15848) + * Prevent validation request submission from blocking ANTI_ENTROPY stage (CASSANDRA-15812) + * Add fqltool and auditlogviewer to rpm and deb packages (CASSANDRA-14712) + * Include DROPPED_COLUMNS in schema digest computation (CASSANDRA-15843) + * Fix Cassandra restart from rpm install (CASSANDRA-15830) + * Improve handling of 2i initialization failures (CASSANDRA-13606) + * Add completion_ratio column to sstable_tasks virtual table (CASANDRA-15759) + * Add support for adding custom Verbs (CASSANDRA-15725) + * Speed up entire-file-streaming file containment check and allow entire-file-streaming for all compaction strategies (CASSANDRA-15657,CASSANDRA-15783) + * Provide ability to configure IAuditLogger (CASSANDRA-15748) + * Fix nodetool enablefullquerylog blocking param parsing (CASSANDRA-15819) + * Add isTransient to SSTableMetadataView (CASSANDRA-15806) + * Fix tools/bin/fqltool for all shells (CASSANDRA-15820) + * Fix clearing of legacy size_estimates (CASSANDRA-15776) + * Update port when reconnecting to pre-4.0 SSL storage (CASSANDRA-15727) + * Only 
calculate dynamicBadnessThreshold once per loop in DynamicEndpointSnitch (CASSANDRA-15798) + * Cleanup redundant nodetool commands added in 4.0 (CASSANDRA-15256) + * Update to Python driver 3.23 for cqlsh (CASSANDRA-15793) + * Add tunable initial size and growth factor to RangeTombstoneList (CASSANDRA-15763) + * Improve debug logging in SSTableReader for index summary (CASSANDRA-15755) + * bin/sstableverify should support user provided token ranges (CASSANDRA-15753) + * Improve logging when mutation passed to commit log is too large (CASSANDRA-14781) + * replace LZ4FastDecompressor with LZ4SafeDecompressor (CASSANDRA-15560) + * Fix buffer pool NPE with concurrent release due to in-progress tiny pool eviction (CASSANDRA-15726) + * Avoid race condition when completing stream sessions (CASSANDRA-15666) + * Flush with fast compressors by default (CASSANDRA-15379) + * Fix CqlInputFormat regression from the switch to system.size_estimates (CASSANDRA-15637) + * Allow sending Entire SSTables over SSL (CASSANDRA-15740) + * Fix CQLSH UTF-8 encoding issue for Python 2/3 compatibility (CASSANDRA-15739) + * Fix batch statement preparation when multiple tables and parameters are used (CASSANDRA-15730) + * Fix regression with traceOutgoingMessage printing message size (CASSANDRA-15687) + * Ensure repaired data tracking reads a consistent amount of data across replicas (CASSANDRA-15601) + * Fix CQLSH to avoid arguments being evaluated (CASSANDRA-15660) + * Correct Visibility and Improve Safety of Methods in LatencyMetrics (CASSANDRA-15597) + * Allow cqlsh to run with Python2.7/Python3.6+ (CASSANDRA-15659,CASSANDRA-15573) + * Improve logging around incremental repair (CASSANDRA-15599) + * Do not check cdc_raw_directory filesystem space if CDC disabled (CASSANDRA-15688) + * Replace array iterators with get by index (CASSANDRA-15394) + * Minimize BTree iterator allocations (CASSANDRA-15389) +Merged from 3.11: * Fix CQL formatting of read command restrictions for slow query log 
(CASSANDRA-15503) - * Allow sstableloader to use SSL on the native port (CASSANDRA-14904) Merged from 3.0: + * Catch exception on bootstrap resume and init native transport (CASSANDRA-15863) * Fix replica-side filtering returning stale data with CL > ONE (CASSANDRA-8272, CASSANDRA-8273) - * Fix duplicated row on 2.x upgrades when multi-rows range tombstones interact with collection ones (CASSANDRA-15805) * Rely on snapshotted session infos on StreamResultFuture.maybeComplete to avoid race conditions (CASSANDRA-15667) * EmptyType doesn't override writeValue so could attempt to write bytes when expected not to (CASSANDRA-15790) * Fix index queries on partition key columns when some partitions contains only static data (CASSANDRA-13666) diff --cc src/java/org/apache/cassandra/service/CassandraDaemon.java index 2dbe217,1f93262..c7591d5
[cassandra] branch cassandra-3.11 updated: Catch exception on bootstrap resume and init native transport
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch cassandra-3.11 in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/cassandra-3.11 by this push: new 4f50a67 Catch exception on bootstrap resume and init native transport 4f50a67 is described below commit 4f50a6712ada5c4298ec860836015ea15049cbda Author: Bereng AuthorDate: Wed Jun 10 12:34:34 2020 +0200 Catch exception on bootstrap resume and init native transport Patch by Berenguer Blasi, reviewed by brandonwilliams for CASSANDRA-15863 --- CHANGES.txt| 1 + .../apache/cassandra/service/CassandraDaemon.java | 3 +- .../apache/cassandra/service/StorageService.java | 32 ++ 3 files changed, 23 insertions(+), 13 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index 7f54146..0f730d4 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -2,6 +2,7 @@ * Fix CQL formatting of read command restrictions for slow query log (CASSANDRA-15503) * Allow sstableloader to use SSL on the native port (CASSANDRA-14904) Merged from 3.0: + * Catch exception on bootstrap resume and init native transport (CASSANDRA-15863) * Fix replica-side filtering returning stale data with CL > ONE (CASSANDRA-8272, CASSANDRA-8273) * Fix duplicated row on 2.x upgrades when multi-rows range tombstones interact with collection ones (CASSANDRA-15805) * Rely on snapshotted session infos on StreamResultFuture.maybeComplete to avoid race conditions (CASSANDRA-15667) diff --git a/src/java/org/apache/cassandra/service/CassandraDaemon.java b/src/java/org/apache/cassandra/service/CassandraDaemon.java index b80580a..1f93262 100644 --- a/src/java/org/apache/cassandra/service/CassandraDaemon.java +++ b/src/java/org/apache/cassandra/service/CassandraDaemon.java @@ -450,7 +450,8 @@ public class CassandraDaemon public void initializeNativeTransport() { // Native transport -nativeTransportService = new NativeTransportService(); +if (nativeTransportService == 
null) +nativeTransportService = new NativeTransportService(); } /* diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java index 3d31596..d3c30a0 100644 --- a/src/java/org/apache/cassandra/service/StorageService.java +++ b/src/java/org/apache/cassandra/service/StorageService.java @@ -1592,22 +1592,30 @@ public class StorageService extends NotificationBroadcasterSupport implements IE @Override public void onSuccess(StreamState streamState) { -bootstrapFinished(); -// start participating in the ring. -// pretend we are in survey mode so we can use joinRing() here -if (isSurveyMode) +try { -logger.info("Startup complete, but write survey mode is active, not becoming an active ring member. Use JMX (StorageService->joinRing()) to finalize ring joining."); +bootstrapFinished(); +if (isSurveyMode) +{ +logger.info("Startup complete, but write survey mode is active, not becoming an active ring member. Use JMX (StorageService->joinRing()) to finalize ring joining."); +} +else +{ +isSurveyMode = false; +progressSupport.progress("bootstrap", ProgressEvent.createNotification("Joining ring...")); +finishJoiningRing(true, bootstrapTokens); +} +progressSupport.progress("bootstrap", new ProgressEvent(ProgressEventType.COMPLETE, 1, 1, "Resume bootstrap complete")); +if (!isNativeTransportRunning()) +daemon.initializeNativeTransport(); +daemon.start(); +logger.info("Resume complete"); } -else +catch(Exception e) { -isSurveyMode = false; -progressSupport.progress("bootstrap", ProgressEvent.createNotification("Joining ring...")); -finishJoiningRing(true, bootstrapTokens); +onFailure(e); +throw e; } -progressSupport.progress("bootstrap", new ProgressEvent(ProgressEventType.COMPLETE, 1, 1, "Resume bootstrap complete")); -daemon.start(); -logger.info("Resume complete"); } @Override
[jira] [Commented] (CASSANDRA-15833) Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657
    [ https://issues.apache.org/jira/browse/CASSANDRA-15833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137775#comment-17137775 ]

Jacek Lewandowski commented on CASSANDRA-15833:
-----------------------------------------------

Regarding your concern about using Gossiper; I was wondering whether we could just create a ColumnFilter factory and have different implementations, wdyt?

> Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15833
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 3.11.x, 4.0-alpha
>
>         Attachments: CASSANDRA-15833-3.11.patch, CASSANDRA-15833-4.0.patch
>
>
> CASSANDRA-10657 introduced changes in how the ColumnFilter is interpreted. This results in digest mismatch when querying incomplete set of columns from a table with consistency that requires reaching instances running pre CASSANDRA-10657 from nodes that include CASSANDRA-10657 (it was introduced in Cassandra 3.4).
> The fix is to bring back the previous behaviour until there are no instances running pre CASSANDRA-10657 version.
[jira] [Updated] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
     [ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andres de la Peña updated CASSANDRA-15459:
------------------------------------------
     Bug Category: Parent values: Correctness(12982)Level 1 values: Consistency(12989)
       Complexity: Normal
    Discovered By: Code Inspection
    Fix Version/s: 3.11.7
         Severity: Normal
           Status: Open  (was: Triage Needed)

> Short read protection doesn't work on group-by queries
> ------------------------------------------------------
>
>                 Key: CASSANDRA-15459
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15459
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Coordination
>            Reporter: ZhaoYang
>            Assignee: Andres de la Peña
>            Priority: Normal
>              Labels: correctness
>             Fix For: 3.11.7, 4.0-beta
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [DTest to reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: it affects all versions..
> {code}
> In a two-node cluster with RF = 2
> Execute only on Node1:
> * Insert pk=1 and ck=1 with timestamp 9
> * Delete pk=0 and ck=0 with timestamp 10
> * Insert pk=2 and ck=2 with timestamp 9
> Execute only on Node2:
> * Delete pk=1 and ck=1 with timestamp 10
> * Insert pk=0 and ck=0 with timestamp 9
> * Delete pk=2 and ck=2 with timestamp 10
> Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1"
> * Expect no live data, but got [0, 0]
> {code}
> Note: for group-by queries, SRP should use "group counted" to calculate limits used for SRP query, rather than "row counted".
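Note on the reproduction steps above: the expected result ("no live data") follows from last-write-wins reconciliation, since every row has a delete at timestamp 10 on one node and an insert at timestamp 9 on the other. The sketch below works through just that arithmetic. It is an illustration under stated assumptions: reconcile is a hypothetical helper, each write is modelled as an (op, pk, ck, timestamp) tuple, and a delete is assumed to win a timestamp tie. It does not model short read protection, GROUP BY counting, or token ordering.

```python
def reconcile(*replicas):
    """Merge per-replica writes with last-write-wins and return the live rows.

    Each replica is a list of (op, pk, ck, timestamp) tuples, where op is
    'insert' or 'delete'. Returns the sorted (pk, ck) keys that are still live.
    """
    latest = {}
    for writes in replicas:
        for op, pk, ck, ts in writes:
            key = (pk, ck)
            candidate = (ts, op == "delete")
            current = latest.get(key)
            # Higher timestamp wins; on a tie the delete wins (assumption).
            if current is None or candidate > current:
                latest[key] = candidate
    return sorted(key for key, (_, is_delete) in latest.items() if not is_delete)


node1 = [("insert", 1, 1, 9), ("delete", 0, 0, 10), ("insert", 2, 2, 9)]
node2 = [("delete", 1, 1, 10), ("insert", 0, 0, 9), ("delete", 2, 2, 10)]

print(reconcile(node1, node2))  # prints []
print(reconcile(node2))         # prints [(0, 0)]
```

Node2 on its own still sees (0, 0) as live, which matches the row in the reported wrong answer; a correct coordinator has to keep reading until the tombstone held by Node1 has been taken into account.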
[jira] [Updated] (CASSANDRA-15868) Update Netty version to 4.1.50 because there are security issues in 4.1.37
     [ https://issues.apache.org/jira/browse/CASSANDRA-15868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-15868:
-----------------------------------------
    Status: In Progress  (was: Changes Suggested)

> Update Netty version to 4.1.50 because there are security issues in 4.1.37
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15868
>             Project: Cassandra
>          Issue Type: Task
>          Components: Dependencies
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 3.11.7, 4.0-beta
>
>         Attachments: dependency-check-report.html
>
>
> Please see attached HTML report from OWASP dependency check.
[jira] [Updated] (CASSANDRA-15868) Update Netty version to 4.1.50 because there are security issues in 4.1.37
     [ https://issues.apache.org/jira/browse/CASSANDRA-15868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-15868:
-----------------------------------------
    Status: Patch Available  (was: In Progress)

> Update Netty version to 4.1.50 because there are security issues in 4.1.37
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15868
>             Project: Cassandra
>          Issue Type: Task
>          Components: Dependencies
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 3.11.7, 4.0-beta
>
>         Attachments: dependency-check-report.html
>
>
> Please see attached HTML report from OWASP dependency check.
[jira] [Updated] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Description: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] {code:java|title=stacktrace} Unexpected error found in node logs (see stdout for full details). Errors: [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 CassandraEntireSSTableStreamReader.java:145 - [Stream 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream for table = keyspace1.standard1 org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) at org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Checksums do not match for /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db {code} In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports checksum validation failure on sstable transferred from node1. {code:java|title=what happened} 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id. 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3. 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}} 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction. 5. Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair. 6. Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest. 7. At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}. {code} This isn't a problem in legacy streaming as STATS file length didn't matter. Ideally it will be great to make sstable STATS metadata immutable, just like other sstable components, so we don't have to worry this special case. 
I can think of 2 ways: # Make STATS mutation as a proper compaction to create hard link on the compacting sstable components with a new descriptor, except STATS files which will be copied entirely. Then mutation will be applied on the new STATS file. At the end, old sstable will be released. This ensures all sstable components are immutable and shouldn't make these special compaction tasks slower. # Change STATS metadata format to use fixed length encoding for repair info was: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] {code:title=stacktrace} Unexpected error found in node logs (see stdout for full details). Errors: [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 202
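Note on the second option in the description above (fixed length encoding for repair info): the race matters because the serialized size changes when the pending repair id goes from a session id to null, so a length recorded in the manifest earlier no longer matches the file. The sketch below shows the size effect only. It is not Cassandra's on-disk STATS layout: the field order, the 8-byte repairedAt, the presence byte and the function names are all assumptions made for illustration.

```python
import struct
import uuid


def encode_variable(repaired_at, pending_repair):
    """Variable-length form: presence byte, then the UUID only when one is set."""
    out = struct.pack(">q", repaired_at)
    if pending_repair is None:
        return out + b"\x00"
    return out + b"\x01" + pending_repair.bytes


def encode_fixed(repaired_at, pending_repair):
    """Fixed-length form: presence byte always followed by 16 bytes."""
    out = struct.pack(">q", repaired_at)
    if pending_repair is None:
        # Zero-fill the UUID slot so the record keeps the same size.
        return out + b"\x00" + bytes(16)
    return out + b"\x01" + pending_repair.bytes


session_id = uuid.uuid4()

# Size recorded while the repair is pending vs. after the id is cleared.
variable_sizes = (len(encode_variable(0, session_id)), len(encode_variable(0, None)))
fixed_sizes = (len(encode_fixed(0, session_id)), len(encode_fixed(0, None)))

print(variable_sizes)  # prints (25, 9)
print(fixed_sizes)     # prints (25, 25)
```

With the fixed-length form a mutation of the repair fields rewrites bytes in place without changing the file length, so a previously recorded length stays valid. The sketch says nothing about whether the bytes and checksum are read consistently while a mutation is in flight.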
[jira] [Updated] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Description: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] {code:title=stacktrace} Unexpected error found in node logs (see stdout for full details). Errors: [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 CassandraEntireSSTableStreamReader.java:145 - [Stream 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream for table = keyspace1.standard1 org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) at org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Checksums do not match for /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db {code} In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports checksum validation failure on sstable transferred from node1. {code:java|title=what happened} 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id. 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3. 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}} 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction. 5. Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair. 6. Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest. 7. At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}. {code} This isn't a problem in legacy streaming as STATS file length didn't matter. Ideally it will be great to make sstable STATS metadata immutable, just like other sstable components, so we don't have to worry this special case. 
I can think of 2 ways: # Change {{RepairFinishedCompactionTask}}, {{AntiCompaction}} and {{SingleSSTableLCSTask}} to create hard link on the compacting sstable components with a new descriptor, except STATS files which will be copied entirely. Then mutation will be applied on the new STATS file. At the end, old sstable will be released. This ensures all sstable components are immutable and shouldn't make these special compaction tasks slower. # Change STATS metadata format to use fixed length encoding for repair info was: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 report
[jira] [Updated] (CASSANDRA-15878) Ec2Snitch fails on upgrade in legacy mode
[ https://issues.apache.org/jira/browse/CASSANDRA-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-15878: --- Fix Version/s: 4.0-beta > Ec2Snitch fails on upgrade in legacy mode > - > > Key: CASSANDRA-15878 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15878 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Priority: Normal > Fix For: 4.0-beta > > > CASSANDRA-7839 changed the way the EC2 DC/Rack naming was handled in the > Ec2Snitch to match AWS conventions. > The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x and > keep the same naming as before (while the "standard" mode uses the new naming > convention). > When performing an upgrade in the us-west-2 region, the second node failed to > start with the following exception: > > {code:java} > ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled > snitch appears to be using the legacy naming scheme for regions, but existing > nodes in cluster are using the opposite: region(s) = [us-west-2], > availability zone(s) = [2a]. Please check the ec2_naming_scheme property in > the cassandra-rackdc.properties configuration file for more details. 
> ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception > encountered during startup > java.lang.IllegalStateException: null > at > org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573) > at > org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530) > at > org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:659) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:610) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767) > {code} > > The exception leads back to [this piece of > code|https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L183-L185]. > After adding some logging, it turned out the DC name of the first upgraded > node was considered invalid as a legacy one: > {code:java} > INFO [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC > us-west-2 > INFO [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - > dcUsesLegacyFormat=false / usingLegacyNaming=true > ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name > us-west-2 > {code} > > The problem is that the regex that's used to identify legacy dc names will > match both old and new names : > {code:java} > boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*"); > {code} > Knowing that some dc names didn't change between the two modes (us-west-2 for > example), I don't see how we can use the dc names to detect if the legacy > mode is being used by other nodes in the cluster. 
> > The rack names on the other hand are totally different in the legacy and > standard modes and can be used to detect mismatching settings. > > My go to fix would be to drop the check on datacenters by removing the > following lines: > [https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L172-L186] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15459) Short read protection doesn't work on group-by queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-15459: Fix Version/s: 4.0-beta > Short read protection doesn't work on group-by queries > -- > > Key: CASSANDRA-15459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15459 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: ZhaoYang >Assignee: Andres de la Peña >Priority: Normal > Labels: correctness > Fix For: 4.0-beta > > Time Spent: 0.5h > Remaining Estimate: 0h > > [DTest to > reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]: > it affects all versions.. > {code} > In a two-node cluster with RF = 2 > Execute only on Node1: > * Insert pk=1 and ck=1 with timestamp 9 > * Delete pk=0 and ck=0 with timestamp 10 > * Insert pk=2 and ck=2 with timestamp 9 > Execute only on Node2: > * Delete pk=1 and ck=1 with timestamp 10 > * Insert pk=0 and ck=0 with timestamp 9 > * Delete pk=2 and ck=2 with timestamp 10 > Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1" > * Expect no live data, but got [0, 0] > {code} > Note: for group-by queries, SRP should use "group counted" to calculate > limits used for SRP query, rather than "row counted".
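The "expect no live data" result in the CASSANDRA-15459 reproduction follows from reconciling the two replicas by write timestamp. A minimal sketch under that assumption: the class and method names are hypothetical, and it models only last-write-wins reconciliation of the six writes, not short read protection itself:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByReconcile {
    // One write as seen by a replica.
    static final class Write {
        final int pk;
        final long timestamp;
        final boolean live; // true for an insert, false for a delete

        Write(int pk, long timestamp, boolean live) {
            this.pk = pk;
            this.timestamp = timestamp;
            this.live = live;
        }
    }

    // The six writes from the reproduction steps in the ticket.
    static List<Write> ticketScenario() {
        return Arrays.asList(
            // Executed only on Node1
            new Write(1, 9, true),
            new Write(0, 10, false),
            new Write(2, 9, true),
            // Executed only on Node2
            new Write(1, 10, false),
            new Write(0, 9, true),
            new Write(2, 10, false));
    }

    // Keep the newest write per partition key, then report keys whose newest write is an insert.
    static List<Integer> livePartitions(List<Write> writes) {
        Map<Integer, Write> newest = new TreeMap<>();
        for (Write w : writes) {
            newest.merge(w.pk, w, (a, b) -> b.timestamp > a.timestamp ? b : a);
        }
        List<Integer> live = new ArrayList<>();
        for (Write w : newest.values()) {
            if (w.live) {
                live.add(w.pk);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        System.out.println("live partitions: " + livePartitions(ticketScenario()));
    }
}
```

Every partition's newest write is the delete at timestamp 10, so the reconciled result is empty. A reply of [0, 0] therefore means that a row which one replica returned was not shadowed by the other replica's delete.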
[jira] [Updated] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Description: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports checksum validation failure on sstable transferred from node1. {code:java|title=what happened} 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id. 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3. 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}} 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction. 5. Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair. 6. Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest. 7. At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}. {code} This isn't a problem in legacy streaming as STATS file length didn't matter. Ideally it will be great to make sstable STATS metadata immutable, just like other sstable components, so we don't have to worry this special case. 
I can think of 3 ways: # Change {{RepairFinishedCompactionTask}}, {{AntiCompaction}} and {{SingleSSTableLCSTask}} to create hard link on the compacting sstable components with a new descriptor, except STATS files which will be copied entirely. Then mutation will be applied on the new STATS file. At the end, old sstable will be released. This ensures all sstable components are immutable and shouldn't make these special compaction tasks slower. # Change STATS metadata format to use fixed length encoding for repair info # Hacky approach: load the small STATS file into memory when initializing {{CassandraOutgoingFile}} instead of relying on mutable on-disk STATS file. was: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports checksum validation failure on sstable transferred from node1. {code:java|title=what happened} 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id. 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3. 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}} 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction. 5. Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair. 6. 
Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest. 7. At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}. {code} This isn't a problem in legacy streaming as STATS file length didn't matter. Ideally it will be great to make sstable STATS metadata immutable, just like other sstable components, so we don't have to worry this special case. I can think of two ways: # Change {{RepairFinishedCompactionTask}}, {{AntiCompaction}} and {{SingleSSTableLCSTask}} to create hard link on the compacting sstable components with a new descriptor, except STATS files which will be copied entirely. Then mutation will be applied on the new STATS file. At the end, old sstable will be released. This ensures all sstable components are immutable and shouldn't make these special compaction tasks slower. # Hacky approach: load the small STATS file into memory when initi
[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136793#comment-17136793 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874: --- thanks [~brandon.williams] can you please provide the symptoms of this race conditions? in my case I see only some portion of the data is bootstrapped but rest of the data bootstrapped without any issues. > Bootstrap completes Successfully without streaming all the data > --- > > Key: CASSANDRA-15874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15874 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: Jai Bheemsen Rao Dhanwada >Priority: Normal > > I am seeing a strange issue where, adding a new node with auto_bootstrap: > true is not streaming all the data before it joins the cluster. Don't see any > information in the logs about bootstrap failures. > Here is the sequence of logs > > {code:java} > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > schema complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: > waiting for pending range calculation > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > calculation complete, ready to bootstrap > INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: > getting bootstrap token > INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: > Starting to bootstrap... > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for > cfId . If a table was just created, this is likely due to the schema > not being fully propagated. Please wait for schema agreement on table > creation. 
> INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] > All sessions completed INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 > StorageService.java:1505 - Bootstrap completed! for the tokens > {code} > Cassandra Version: 3.11.3 > I am not able to reproduce this issue all the time, but it happened couple of > times. Is there any race condition/corner case, which could cause this issue?
[jira] [Updated] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Description: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports checksum validation failure on sstable transferred from node1. {code:java|title=what happened} 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id. 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3. 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}} 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction. 5. Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair. 6. Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest. 7. At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}. {code} This isn't a problem in legacy streaming as STATS file length didn't matter. Ideally it will be great to make sstable STATS metadata immutable, just like other sstable components, so we don't have to worry this special case. 
I can think of two ways: # Change {{RepairFinishedCompactionTask}}, {{AntiCompaction}} and {{SingleSSTableLCSTask}} to create hard link on the compacting sstable components with a new descriptor, except STATS files which will be copied entirely. Then mutation will be applied on the new STATS file. At the end, old sstable will be released. This ensures all sstable components are immutable and shouldn't make these special compaction tasks slower. # Hacky approach: load the small STATS file into memory when initializing {{CassandraOutgoingFile}} instead of relying on mutable on-disk STATS file. was: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports checksum validation failure on sstable transferred from node1. {code:java|title=what happened} 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id. 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3. 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}} 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction. 5. Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair. 6. Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest. 7. 
At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}. {code} I believe similar race may happen with level compaction where it may directly mutate a sstable's level if it doesn't overlap with sstables at next level. (Note: this isn't a problem in legacy streaming as STATS file length didn't matter.) Also it impacts snapshot as well because snapshotted STATS file is hard linked. Ideally it will be great to make sstable STATS metadata immutable, just like other sstable components, so we don't have to worry this special case. I can think of two ways: # Change {{RepairFinishedCompactionTask}}, {{AntiCompaction}} and {{SingleSSTableLCSTask}} to create hard link on the compacting sstable components with a new descriptor, except STATS files which will be copied entirely. Then mutation will be applied on the new STATS file. At the end, old sstable will be released. Th
[jira] [Updated] (CASSANDRA-15877) Followup on CASSANDRA-15600
[ https://issues.apache.org/jira/browse/CASSANDRA-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-15877: Attachment: Screen Shot 2020-06-12 at 3.21.18 PM.png > Followup on CASSANDRA-15600 > --- > > Key: CASSANDRA-15877 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15877 > Project: Cassandra > Issue Type: Bug > Components: Feature/Virtual Nodes >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-alpha > > Attachments: Screen Shot 2020-06-12 at 3.21.18 PM.png > > > As part of CASSANDRA-15600 generateSplits method replaced the > generateRandomTokens for NoReplicationAwareTokenAllocator. generateSplits > should be used also in ReplicationAwareTokenAllocator.
[jira] [Updated] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Description: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports checksum validation failure on sstable transferred from node1. {code:java|title=what happened} 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id. 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3. 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}} 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction. 5. Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair. 6. Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest. 7. At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}. {code} I believe similar race may happen with level compaction where it may directly mutate a sstable's level if it doesn't overlap with sstables at next level. (Note: this isn't a problem in legacy streaming as STATS file length didn't matter.) 
Also it impacts snapshot as well because snapshotted STATS file is hard linked. Ideally it will be great to make sstable STATS metadata immutable, just like other sstable components, so we don't have to worry this special case. I can think of two ways: # Change {{RepairFinishedCompactionTask}}, {{AntiCompaction}} and {{SingleSSTableLCSTask}} to create hard link on the compacting sstable components with a new descriptor, except STATS files which will be copied entirely. Then mutation will be applied on the new STATS file. At the end, old sstable will be released. This ensures all sstable components are immutable and shouldn't make these special compaction tasks slower. # Hacky approach: load the small STATS file into memory when initializing {{CassandraOutgoingFile}} instead of relying on mutable on-disk STATS file. was: Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] In the above test, it executes "nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports checksum validation failure on sstable transferred from node1. {code:java|title=what happened} 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id. 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3. 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}} 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction. 5. 
Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair. 6. Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest. 7. At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}. {code} I believe similar race may happen with level compaction where it may directly mutate a sstable's level if it doesn't overlap with sstables at next level. (Note: this isn't a problem in legacy streaming as STATS file length didn't matter.) Ideally it will be great to make sstable STATS metadata immutable, just like other sstable components, so we don't have to worry this special case. I can think of two ways: # Change {{RepairFinishedCompactionTask}}, {{AntiCompaction}} and {{SingleSSTableLCSTask}} to create hard link on the compacting sstable components with a
[jira] [Created] (CASSANDRA-15878) Ec2Snitch fails on upgrade in legacy mode
Alexander Dejanovski created CASSANDRA-15878: Summary: Ec2Snitch fails on upgrade in legacy mode Key: CASSANDRA-15878 URL: https://issues.apache.org/jira/browse/CASSANDRA-15878 Project: Cassandra Issue Type: Bug Reporter: Alexander Dejanovski CASSANDRA-7839 changed the way the EC2 DC/Rack naming was handled in the Ec2Snitch to match AWS conventions. The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x and keep the same naming as before (while the "standard" mode uses the new naming convention). When performing an upgrade in the us-west-2 region, the second node failed to start with the following exception: {code:java} ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled snitch appears to be using the legacy naming scheme for regions, but existing nodes in cluster are using the opposite: region(s) = [us-west-2], availability zone(s) = [2a]. Please check the ec2_naming_scheme property in the cassandra-rackdc.properties configuration file for more details. 
ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception encountered during startup java.lang.IllegalStateException: null at org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573) at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530) at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:659) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:610) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767) {code} The exception leads back to [this piece of code|https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L183-L185]. After adding some logging, it turned out the DC name of the first upgraded node was considered invalid as a legacy one: {code:java} INFO [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC us-west-2 INFO [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - dcUsesLegacyFormat=false / usingLegacyNaming=true ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name us-west-2 {code} The problem is that the regex that's used to identify legacy dc names will match both old and new names : {code:java} boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*"); {code} Knowing that some dc names didn't change between the two modes (us-west-2 for example), I don't see how we can use the dc names to detect if the legacy mode is being used by other nodes in the cluster. The rack names on the other hand are totally different in the legacy and standard modes and can be used to detect mismatching settings. 
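The behaviour of that expression can be checked in isolation. A small sketch, where the class name {{Ec2DcNameCheck}} is hypothetical and {{us-east}} is assumed here purely as an example of a name without a trailing region number:

```java
class Ec2DcNameCheck {
    // Same expression as quoted above from Ec2Snitch.
    static boolean dcUsesLegacyFormat(String dc) {
        return !dc.matches("[a-z]+-[a-z].+-[\\d].*");
    }

    public static void main(String[] args) {
        for (String dc : new String[] { "us-west-2", "us-east" }) {
            System.out.println(dc + " -> dcUsesLegacyFormat=" + dcUsesLegacyFormat(dc));
        }
        // us-west-2 -> dcUsesLegacyFormat=false
        // us-east -> dcUsesLegacyFormat=true
    }
}
```

Since us-west-2 is a valid DC name in both modes, the {{false}} result says nothing about which naming scheme the other nodes actually use, which is consistent with the log output above.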
My go to fix would be to drop the check on datacenters by removing the following lines: [https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L172-L186]
[jira] [Commented] (CASSANDRA-14753) Document incremental repair session timeouts and repair_admin usage
[ https://issues.apache.org/jira/browse/CASSANDRA-14753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136632#comment-17136632 ] Berenguer Blasi commented on CASSANDRA-14753: - Hi [~bdeggleston], I was wondering if you'd mind if I take this one. Also, being a bit cheeky: even after reading through CASSANDRA-14685, I can't seem to pin down exactly which error you are referring to with "The sstable acquisition error"... > Document incremental repair session timeouts and repair_admin usage > --- > > Key: CASSANDRA-14753 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14753 > Project: Cassandra > Issue Type: Task > Components: Legacy/Documentation and Website >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Low > Fix For: 4.0 > > > As seen in CASSANDRA-14685, the behavior of incremental repair sessions with > failed streams is not obvious and appears to be a bug (although it's working > as expected). The incremental repair documentation should be updated to > describe what happens if an incremental repair session fails mid-stream, the > session timeouts, and how and when to use nodetool repair_admin. The sstable > acquisition error should also be updated to mention this.
[jira] [Commented] (CASSANDRA-15821) Metrics Documentation Enhancements
[ https://issues.apache.org/jira/browse/CASSANDRA-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136482#comment-17136482 ] Berenguer Blasi commented on CASSANDRA-15821: - Hi [~spmallette], I told you I would look into this, but I am afraid I can't help much beyond doing a sanity check. It looks good in that regard, but my metrics knowledge is no match for the requirements here. I will keep an eye on it, but you'll need somebody with deeper knowledge to chime in. > Metrics Documentation Enhancements > -- > > Key: CASSANDRA-15821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15821 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Stephen Mallette >Assignee: Stephen Mallette >Priority: Normal > Fix For: 4.0-beta > > > CASSANDRA-15582 involves quality around metrics and it was mentioned that > reviewing and [improving > documentation|https://github.com/apache/cassandra/blob/trunk/doc/source/operating/metrics.rst] > around metrics would fall into that scope. Please consider some of this > analysis in determining what improvements to make here: > Please see [this > spreadsheet|https://docs.google.com/spreadsheets/d/1iPWfCMIG75CI6LbYuDtCTjEOvZw-5dyH-e08bc63QnI/edit?usp=sharing] > that itemizes almost all of cassandra's metrics and whether they are > documented or not (and other notes). That spreadsheet is "almost all" > because there are some metrics that don't seem to initialize as part of > Cassandra startup (I was able to trigger some to initialize, but all were not > immediately obvious).
The missing metrics seem to be related to the following: > * ThreadPool metrics - only some initialize at startup the list of which > follow below > * Streaming Metrics > * HintedHandoff Metrics > * HintsService Metrics > Here are the ThreadPool scopes that get listed: > {code} > AntiEntropyStage > CacheCleanupExecutor > CompactionExecutor > GossipStage > HintsDispatcher > MemtableFlushWriter > MemtablePostFlush > MemtableReclaimMemory > MigrationStage > MutationStage > Native-Transport-Requests > PendingRangeCalculator > PerDiskMemtableFlushWriter_0 > ReadStage > Repair-Task > RequestResponseStage > Sampler > SecondaryIndexManagement > ValidationExecutor > ViewBuildExecutor > {code} > I noticed that Keyspace Metrics have this note: "Most of these metrics are > the same as the Table Metrics above, only they are aggregated at the Keyspace > level." I think I've isolated those metrics on table that are not on keyspace > to specifically be: > {code} > BloomFilterFalsePositives > BloomFilterFalseRatio > BytesAnticompacted > BytesFlushed > BytesMutatedAnticompaction > BytesPendingRepair > BytesRepaired > BytesUnrepaired > CompactionBytesWritten > CompressionRatio > CoordinatorReadLatency > CoordinatorScanLatency > CoordinatorWriteLatency > EstimatedColumnCountHistogram > EstimatedPartitionCount > EstimatedPartitionSizeHistogram > KeyCacheHitRate > LiveSSTableCount > MaxPartitionSize > MeanPartitionSize > MinPartitionSize > MutatedAnticompactionGauge > PercentRepaired > RowCacheHitOutOfRange > RowCacheHit > RowCacheMiss > SpeculativeSampleLatencyNanos > SyncTime > WaitingOnFreeMemtableSpace > DroppedMutations > {code} > Someone with greater knowledge of this area might consider it worth the > effort to see if any of these metrics should be aggregated to the keyspace > level in case they were inadvertently missed. In any case, perhaps the > documentation could easily now reflect which metric names could be expected > on Keyspace. 
> The DroppedMessage metrics have a much larger body of scopes than just what > were documented: > {code} > ASYMMETRIC_SYNC_REQ > BATCH_REMOVE_REQ > BATCH_REMOVE_RSP > BATCH_STORE_REQ > BATCH_STORE_RSP > CLEANUP_MSG > COUNTER_MUTATION_REQ > COUNTER_MUTATION_RSP > ECHO_REQ > ECHO_RSP > FAILED_SESSION_MSG > FAILURE_RSP > FINALIZE_COMMIT_MSG > FINALIZE_PROMISE_MSG > FINALIZE_PROPOSE_MSG > GOSSIP_DIGEST_ACK > GOSSIP_DIGEST_ACK2 > GOSSIP_DIGEST_SYN > GOSSIP_SHUTDOWN > HINT_REQ > HINT_RSP > INTERNAL_RSP > MUTATION_REQ > MUTATION_RSP > PAXOS_COMMIT_REQ > PAXOS_COMMIT_RSP > PAXOS_PREPARE_REQ > PAXOS_PREPARE_RSP > PAXOS_PROPOSE_REQ > PAXOS_PROPOSE_RSP > PING_REQ > PING_RSP > PREPARE_CONSISTENT_REQ > PREPARE_CONSISTENT_RSP > PREPARE_MSG > RANGE_REQ > RANGE_RSP > READ_REPAIR_REQ > READ_REPAIR_RSP > READ_REQ > READ_RSP > REPAIR_RSP > REPLICATION_DONE_REQ > REPLICATION_DONE_RSP > REQUEST_RSP > SCHEMA_PULL_REQ > SCHEMA_PULL_RSP > SCHEMA_PUSH_REQ > SCHEMA_PUSH_RSP > SCHEMA_VERSION_REQ > SCHEMA_VERSION_RSP > SNAPSHOT_MSG > SNAPSHOT_REQ > SNAPSHOT_RSP > STATUS_REQ > STATUS_RSP > SYNC_REQ > SYNC_RSP > TRUNCATE_REQ > TRUNCATE_RSP > VALIDATION_REQ > VALIDATION_RSP > _SAMPLE > _TEST_1 > _TEST_2 > _TRACE > {code} >
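The scope lists quoted above were gathered from JMX. As a rough illustration of how such a list could be collected, here is a minimal sketch that groups the `scope` key of metric MBeans by their `type` key. It assumes the metrics are registered under the `org.apache.cassandra.metrics` JMX domain with `type` and `scope` key properties; the class name `MetricScopeLister` and its methods are hypothetical and not part of Cassandra. The `main` method queries the platform MBean server of the current JVM, so it only finds Cassandra metrics when run inside a Cassandra process; a remote node would need a JMX connector instead.

```java
import java.lang.management.ManagementFactory;
import java.util.Collection;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper, for illustration only.
public class MetricScopeLister {

    // Groups the "scope" key property of each ObjectName by its "type" key property.
    // Names that lack either key are skipped.
    public static SortedMap<String, SortedSet<String>> groupScopes(Collection<ObjectName> names) {
        SortedMap<String, SortedSet<String>> byType = new TreeMap<>();
        for (ObjectName name : names) {
            String type = name.getKeyProperty("type");
            String scope = name.getKeyProperty("scope");
            if (type == null || scope == null) {
                continue;
            }
            byType.computeIfAbsent(type, k -> new TreeSet<>()).add(scope);
        }
        return byType;
    }

    // Queries every MBean registered in the given JMX domain and groups the scopes.
    public static SortedMap<String, SortedSet<String>> scopesByType(MBeanServerConnection conn,
                                                                    String domain) throws Exception {
        return groupScopes(conn.queryNames(new ObjectName(domain + ":*"), null));
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the code runs in the same JVM as the metrics it inspects.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        scopesByType(conn, "org.apache.cassandra.metrics")
            .forEach((type, scopes) -> System.out.println(type + " -> " + scopes));
    }
}
```

Because metrics that have not been initialized are not registered yet, a listing like this can only ever show what exists at the moment of the query, which matches the "almost all" caveat in the description.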
[jira] [Updated] (CASSANDRA-15821) Metrics Documentation Enhancements
[ https://issues.apache.org/jira/browse/CASSANDRA-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Berenguer Blasi updated CASSANDRA-15821: Reviewers: Berenguer Blasi > Metrics Documentation Enhancements > -- > > Key: CASSANDRA-15821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15821 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Stephen Mallette >Assignee: Stephen Mallette >Priority: Normal > Fix For: 4.0-beta > > > I suppose I may yet be missing some metrics as my knowledge of what's > available is limited to what I can get from JMX after Cassandra > initialization (and some initial starting commands) and what's in the > documentation. If something is present that is missing from both then I won't > know it's there. Anyway, pe
[jira] [Updated] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Component/s: Local/Compaction > Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > --- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > The above test executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports a checksum validation failure on > an sstable transferred from node1. > {code:java|title=what happened} > 1. When repair starts on node1, it performs anti-compaction, which sets the > sstable's repairedAt to 0 and its pending repair id to the session id. > 2. Then node1 creates a {{ComponentManifest}} which contains the lengths of the files to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast a repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}}, which triggers async background > compaction. > 5. Node1's background compaction mutates the sstable's repairedAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. 
Node1 actually sends the sstable to node3, where the sstable's STATS > component size is different from the original size recorded in the manifest. 
> 7. At the end, node3 reports a checksum validation failure when it tries to > mutate the sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > I believe a similar race may happen with leveled compaction, which may directly > mutate an sstable's level if it doesn't overlap with sstables at the next level. > (Note: this isn't a problem in legacy streaming as the STATS file length didn't > matter.) > Ideally it would be great to make sstable STATS metadata immutable, just like > other sstable components, so we don't have to worry about this special case. > I can think of two ways: > # Change {{RepairFinishedCompactionTask}}, {{AntiCompaction}} and > {{SingleSSTableLCSTask}} to create hard links to the compacting sstable's > components with a new descriptor, except the STATS file, which will be copied > entirely. The mutation will then be applied to the new STATS file. At the end, > the old sstable will be released. This ensures all sstable components are > immutable and shouldn't make these special compaction tasks slower. > # Hacky approach: load the small STATS file into memory when initializing > {{CassandraOutgoingFile}} instead of relying on the mutable on-disk STATS file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
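The race described in CASSANDRA-15861 above, and the second ("hacky") approach, can be illustrated with a toy sketch. It is not Cassandra code: the class `StatsSnapshotSketch`, its methods, and the text written to the file are all hypothetical stand-ins, and the concurrent mutation is simulated sequentially. The sketch records a file length (as a manifest would), takes an in-memory snapshot at the same time, rewrites the file in place, and then compares the three lengths.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; the file content is a toy format, not the real STATS serialization.
public class StatsSnapshotSketch {

    // Returns {length recorded in the manifest, on-disk length after mutation, snapshot length}.
    public static long[] simulate() throws IOException {
        Path stats = Files.createTempFile("na-2-big-Statistics", ".db");
        try {
            // Anti-compaction has set a pending repair id (steps 1 of the ticket).
            Files.write(stats, "repairedAt=0;pendingRepair=session-id".getBytes(StandardCharsets.UTF_8));

            // Streaming is planned: the manifest records the current length (step 2).
            long manifestLength = Files.size(stats);

            // Approach 2 from the ticket: snapshot the small file into memory at the same moment.
            byte[] snapshot = Files.readAllBytes(stats);

            // Background compaction clears the pending repair id in place (step 5).
            Files.write(stats, "repairedAt=0;pendingRepair=null".getBytes(StandardCharsets.UTF_8));

            // The sender now reads the file from disk (step 6): its length no longer matches.
            long onDiskLength = Files.size(stats);

            return new long[] { manifestLength, onDiskLength, snapshot.length };
        } finally {
            Files.deleteIfExists(stats);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] r = simulate();
        System.out.println("manifest length      = " + r[0]);
        System.out.println("on-disk length now   = " + r[1]);
        System.out.println("in-memory snapshot   = " + r[2]);
    }
}
```

In this sketch the snapshot stays consistent with the recorded length because both were captured before the mutation, while the on-disk file drifts. The first approach in the ticket reaches the same goal differently, by never mutating the original STATS file at all.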